Commit Graph

6825 Commits

Author SHA1 Message Date
David Xu
a9a48d6862 Lock and unlock sched_lock when walking through thread list, current we
insert kse upcall thread into thread list at mi_switch time, process lock
is not enough.
2003-12-07 23:47:15 +00:00
Don Lewis
50105bcf1a Pass MTX_DEF as the last argument to mtx_init() instead of 0. This
is not a functional change.  The code happened to work properly only
because MTX_DEF is defined as 0.
2003-12-07 21:53:41 +00:00
Poul-Henning Kamp
377e7be416 Make the DIAGNOSTIC code which complains about long {call|time}out(9)
functions less noisy:  We printf if a new function took longer than
the previous record holder, or of the previous record holder took
more than twice as long as the current record.
2003-12-07 20:03:28 +00:00
Marcel Moolenaar
cfa4b1e7b1 Regen due to kse_switchin(2). 2003-12-07 19:36:16 +00:00
Marcel Moolenaar
702b2a179c Add kse_switchin(2). This syscall can be used by KSE implementations
to have the kernel switch to a new thread, instead of doing it in
userland. It is in fact needed on ia64 where syscall restarts do not
return to userland first. It's completely handled inside the kernel.
As such, any context created by the kernel as part of an upcall and
caused by some syscall needs to be restored by the kernel.
2003-12-07 19:34:29 +00:00
Peter Wemm
a2640c9ba9 rqb_bits[] may be an int64_t (eg: on alpha, and recently on amd64).
Be sure to shift (long)1 << 33 and higher, not (int)1.  Otherwise bad
things happen(TM).  This is why beast.freebsd.org paniced with ULE.

Reviewed by:  jeff
2003-12-07 09:57:51 +00:00
Scott Long
774114995e Re-arrange and consolidate some random debugging stuff 2003-12-07 05:04:49 +00:00
Alan Cox
bca62663ab - Giant is no longer required by vm_thread_new(). 2003-12-07 04:16:49 +00:00
Robert Watson
56d9e93207 Rename mac_create_cred() MAC Framework entry point to mac_copy_cred(),
and the mpo_create_cred() MAC policy entry point to
mpo_copy_cred_label().  This is more consistent with similar entry
points for creation and label copying, as mac_create_cred() was
called from crdup() as opposed to during process creation.  For
a number of policies, this removes the requirement for special
handling when copying credential labels, and improves consistency.

Approved by:	re (scottl)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-12-06 21:48:03 +00:00
John Baldwin
b6c71225a9 Fix all users of mp_maxid to use the same semantics, namely:
1) mp_maxid is a valid FreeBSD CPU ID in the range 0 .. MAXCPU - 1.
2) For all active CPUs in the system, PCPU_GET(cpuid) <= mp_maxid.

Approved by:	re (scottl)
Tested on:	i386, amd64, alpha
2003-12-03 14:57:26 +00:00
John Baldwin
45c1c90f6a Export a few SMP related symbols in UP kernels as well. This is needed to
aid other kernel code, especially code which can be in a module such as
the acpi_cpu(4) driver, to work properly with both SMP and UP kernels.
The exported symbols include mp_ncpus, all_cpus, mp_maxid, smp_started, and
the smp_rendezvous() function.  This also means that CPU_ABSENT() is now
always implemented the same on all kernels.

Approved by:	re (scottl)
2003-12-03 14:55:31 +00:00
David Greenman
186e347f2c Fixed a bug in sendfile(2) where the sent data would be corrupted due
to sendfile(2) being erroneously automatically restarted after a signal
is delivered. Fixed by converting ERESTART to EINTR prior to exiting.

Updated manual page to indicate the potential EINTR error, its cause
and consequences.

Approved by: re@freebsd.org
2003-12-01 22:12:50 +00:00
Ian Dowse
25cb5d7a6b In dounmount(), only call checkdirs() prior to VFS_UNMOUNT() in the
forced unmount case. Otherwise, a file system that is referenced
only by process fd_cdir/fd_rdir references to the file system root
vnode will be successfully unmounted without the MNT_FORCE flag.

The previous behaviour was not compatible with the unmount semantics
required by amd(8), so file systems could be unexpectedly unmounted
while there were still references to the file system root directory.

Reported by:	Erez Zadok <ezk@cs.sunysb.edu>
Approved by:	re (scottl)
2003-11-30 23:30:09 +00:00
Jeff Roberson
a6c6a93c89 - Don't forget to unlock the vnode interlock in the LK_NOWAIT case.
Submitted by:	Stephan Uphoff <ups@stups.com>
Approved by:	re (rwatson)
2003-11-30 22:09:58 +00:00
Alexander Kabaev
97c43a540a Do not attempt to destroy NULL vfs options list.
Approved by: re (scottl)
Reported by: Christian Laursen <xi atborderworlds dot dk>
2003-11-23 17:13:48 +00:00
John Baldwin
798a45964d - Split cpu_mp_probe() into two parts. cpu_mp_setmaxid() is still called
very early (SI_SUB_TUNABLES - 1) and is responsible for setting mp_maxid.
  cpu_mp_probe() is now called at SI_SUB_CPU and determines if SMP is
  actually present and sets mp_ncpus and all_cpus.  Splitting these up
  allows an architecture to probe CPUs later than SI_SUB_TUNABLES by just
  setting mp_maxid to MAXCPU in cpu_mp_setmaxid().  This could allow the
  CPU probing code to live in a module, for example, since modules
  sysinit's in modules cannot be invoked prior to SI_SUB_KLD.  This is
  needed to re-enable the ACPI module on i386.
- For the alpha SMP probing code, use LOCATE_PCS() instead of duplicating
  its contents in a few places.  Also, add a smp_cpu_enabled() function
  to avoid duplicating some code.  There is room for further code
  reduction later since much of this code is also present in cpu_mp_start().
- All archs besides i386 still set mp_maxid to the same values they set it
  to before this change.  i386 now sets mp_maxid to MAXCPU.

Tested on:	alpha, amd64, i386, ia64, sparc64
Approved by:	re (scottl)
2003-11-21 22:23:26 +00:00
Mark Murray
4e3a7a14d9 Fix a major faux pas of mine. I was causing 2 very bad things to
happen in interrupt context; 1) sleep locks, and 2) malloc/free
calls.

1) is fixed by using spin locks instead.

2) is fixed by preallocating a FIFO (implemented with a STAILQ)
   and using elements from this FIFO instead. This turns out
   to be rather fast.

OK'ed by:	re (scottl)
Thanks to:	peter, jhb, rwatson, jake
Apologies to:	*
2003-11-20 15:35:48 +00:00
Mark Murray
3fed54aaaa Hackfix to patch around a kernel panic I introduced. Real fix to
follow. In the meanwhile, we are not harvesting interrupt entropy.

Approved by:	re (jhb)
2003-11-18 14:35:43 +00:00
Robert Watson
a557af222b Introduce a MAC label reference in 'struct inpcb', which caches
the   MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols.  This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks.  Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by:	sam, bms
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-18 00:39:07 +00:00
Robert Watson
64d19c2ea7 Add a sysctl, security.bsd.see_other_gids, similar in semantics
to see_other_uids but with the logical conversion.  This is based
on (but not identical to) the patch submitted by Samy Al Bahra.

Submitted by:	Samy Al Bahra <samy@kerneled.com>
2003-11-17 20:20:53 +00:00
Peter Wemm
0d2a298904 Initial landing of SMP support for FreeBSD/amd64.
- This is heavily derived from John Baldwin's apic/pci cleanup on i386.
- I have completely rewritten or drastically cleaned up some other parts.
  (in particular, bootstrap)
- This is still a WIP.  It seems that there are some highly bogus bioses
  on nVidia nForce3-150 boards.  I can't stress how broken these boards
  are.  I have a workaround in mind, but right now the Asus SK8N is broken.
  The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed.
- Most of my testing has been with SCHED_ULE.  SCHED_4BSD works.
- the apic and acpi components are 'standard'.
- If you have an nVidia nForce3-150 board, you are stuck with 'device
  atpic' in addition, because they somehow managed to forget to connect the
  8254 timer to the apic, even though its in the same silicon!  ARGH!
  This directly violates the ACPI spec.
2003-11-17 08:58:16 +00:00
Jeff Roberson
fa9c971710 - Mark ksq_assigned as volatile so that when this code is used without
sched_lock we can be sure that we'll pick up the new value.
2003-11-17 08:27:11 +00:00
Jeff Roberson
093c05e39d - Remove long dead code. rslices hasn't been used in some time and neither
has sched_pickcpu().
2003-11-17 08:24:14 +00:00
Peter Wemm
90e3387e54 Expand the argument to the ithread enable/disable helper hooks from an
int to something big enough to hold a pointer.  amd64 needs this.
2003-11-17 06:08:10 +00:00
Robert Watson
b0323ea3aa Implement sockets support for __mac_get_fd() and __mac_set_fd()
system calls, and prefer these calls over getsockopt()/setsockopt()
for ABI reasons.  When addressing UNIX domain sockets, these calls
retrieve and modify the socket label, not the label of the
rendezvous vnode.

- Create mac_copy_socket_label() entry point based on
  mac_copy_pipe_label() entry point, intended to copy the socket
  label into temporary storage that doesn't require a socket lock
  to be held (currently Giant).

- Implement mac_copy_socket_label() for various policies.

- Expose socket label allocation, free, internalize, externalize
  entry points as non-static from mac_net.c.

- Use mac_socket_label_set() in __mac_set_fd().

MAC-aware applications may now use mac_get_fd(), mac_set_fd(), and
mac_get_peer() to retrieve and set various socket labels without
directly invoking the getsockopt() interface.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-16 23:31:45 +00:00
Robert Watson
9e71dd0feb Reduce gratuitous redundancy and length in function names:
mac_setsockopt_label_set() -> mac_setsockopt_label()
  mac_getsockopt_label_get() -> mac_getsockopt_label()
  mac_getsockopt_peerlabel_get() -> mac_getsockopt_peerlabel()

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-16 18:25:20 +00:00
Alan Cox
e45db9b837 - Modify alpha's sf_buf implementation to use the direct virtual-to-
physical mapping.
 - Move the sf_buf API to its own header file; make struct sf_buf's
   definition machine dependent.  In this commit, we remove an
   unnecessary field from struct sf_buf on the alpha, amd64, and ia64.
   Ultimately, we may eliminate struct sf_buf on those architecures
   except as an opaque pointer that references a vm page.
2003-11-16 06:11:26 +00:00
Robert Watson
12cbb9dc56 When implementing getsockopt() for SO_LABEL and SO_PEERLABEL, make
sure to sooptcopyin() the (struct mac) so that the MAC Framework
knows which label types are being requested.  This fixes process
queries of socket labels.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-16 03:53:36 +00:00
Bruce Evans
416ab90e6b Localized the cy driver's locking. 2003-11-16 00:55:54 +00:00
Poul-Henning Kamp
d87526cf43 Rename the debugging mutex "callout_no_sleep" to "dont_sleep_in_callout". 2003-11-15 18:33:54 +00:00
Tim J. Robbins
4d93f53e74 Initialize sequence numbers to 0 in seminit() instead of using whatever
garbage happens to be in memory. This did not seem to cause any problems
except making semaphore ID's unpredictable (and ugly in ipcs(1) output).
2003-11-15 11:56:53 +00:00
Poul-Henning Kamp
00cbe31bd8 Send B_PHYS out to pasture, it no longer serves any function. 2003-11-15 09:28:09 +00:00
Alan Cox
28c9416429 - Remove the remaining now unnecessary checks for the buf's b_object being
NULL.  See revision 1.421 for more detail.
 - Remove GIANT_REQUIRED from vfs_unbusy_pages().  Discussed with: jeff
2003-11-15 08:45:36 +00:00
Jeff Roberson
155b9987a3 - Introduce kseq_runq_{add,rem}() which are used to insert and remove
kses from the run queues.  Also, on SMP, we track the transferable
   count here.  Threads are transferable only as long as they are on the
   run queue.
 - Previously, we adjusted our load balancing based on the transferable count
   minus the number of actual cpus.  This was done to account for the threads
   which were likely to be running.  All of this logic is simpler now that
   transferable accounts for only those threads which can actually be taken.
   Updated various places in sched_add() and kseq_balance() to account for
   this.
 - Rename kseq_{add,rem} to kseq_load_{add,rem} to reflect what they're
   really doing.  The load is accounted for seperately from the runq because
   the load is accounted for even as the thread is running.
 - Fix a bug in sched_class() where we weren't properly using the PRI_BASE()
   version of the kg_pri_class.
 - Add a large comment that describes the impact of a seemingly simple
   conditional in sched_add().
 - Also in sched_add() check the transferable count and KSE_CAN_MIGRATE()
   prior to checking kseq_idle.  This reduces the frequency of access for
   kseq_idle which is a shared resource.
2003-11-15 07:32:07 +00:00
Olivier Houchard
1a29c80648 Better fix than my previous commit:
in exit1(), make sure the p_klist is empty after sending NOTE_EXIT.
The process won't report fork() or execve() and won't be able to handle
NOTE_SIGNAL knotes anyway.
This fixes some race conditions with do_tdsignal() calling knote() while
the process is exiting.

Reported by:	Stefan Farfeleder <stefan@fafoe.narf.at>
MFC after:	1 week
2003-11-14 18:49:01 +00:00
Alexander Kabaev
3b39740df8 Fix a number of style(9) bugs introduced in r1.113 by me.
Suggested by:	bde
2003-11-14 05:27:41 +00:00
Jeff Roberson
808674fd0e - regen. 2003-11-14 03:49:41 +00:00
Jeff Roberson
5c49a0566a - Revision 1.156 marked ptrace() SMP safe. Unfortunately, alpha implements
parts of ptrace using proc_rwmem().  proc_rwmem() requires giant, and
   giant must be acquired prior to the proc lock, so ptrace must require giant
   still.
2003-11-14 03:48:37 +00:00
Poul-Henning Kamp
555a5de270 Various minor details:
Give the HZ/overflow check a 10% margin.
	Eliminate bogus newline.
	If timecounters have equal quality, prefer higher frequency.

Some inspiration from:	bde
2003-11-13 10:03:58 +00:00
John Baldwin
79a13d0182 - Close a race where a thread on another CPU could release a contested lock
and empty its turnstile while the blocking threads still pointed to the
  turnstile.  If the thread on the first CPU blocked on a lock owned by
  one of the threads blocked on the turnstile just woken up, then the
  first CPU could try to manipulate a bogus thread queue in the turnstile
  during priority propagation.
- Update locking notes for ts_owner and always clear ts_owner, not just
  under INVARIANTS.

Tested by:      sam (1)
2003-11-12 23:48:42 +00:00
Kirk McKusick
48b0f4b67d At the request of several developers, restore the DIAGNOSIC code
deleted in 1.81. Increase the initial timeout limit to 2ms to
eliminate spurious messages of excessive timeouts in the NFS
client code.

Requested by:	Poul-Henning Kamp <phk@phk.freebsd.dk>
Requested by:	Mike Silbersack <silby@silby.com>
Requested by:	Sam Leffler <sam@errno.com>
2003-11-12 22:28:27 +00:00
Robert Watson
f0ab044241 Mark __mac_get_pid() as MPSAFE in the comment, as it runs without
Giant and is also MPSAFE.

Push Giant further down into __mac_get_fd() and __mac_set_fd(),
grabbing it only for constrained regions dealing with VFS, and
dropping it entirely for operations related to labeling of pipes.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-11-12 22:19:15 +00:00
Peter Wemm
cde6302bf0 MNAMELEN is back to an int again after Kirk's statfs commit
kern/vfs_mount.c:1305: warning: signed size_t format, different type arg (arg 4)
*** Error code 1
2003-11-12 17:09:12 +00:00
John Baldwin
861a7db56f Fix a typo in a comment.
Submitted by:	das
2003-11-12 14:55:45 +00:00
Poul-Henning Kamp
1415a09d42 Replace B_PHYS conditional assignment to bio_offset with KASSERT check
to see that the originating code already did it right.
2003-11-12 10:27:06 +00:00
Kirk McKusick
1977597b34 Update the five files derived from /sys/kern/syscalls.master
after the additions made for the new statfs structure (version
1.157). These must be updated in a separate checkin after
syscalls.master has been checked in so that they reflect its
new CVS identity. As these are purely derived files, it is not
clear to me why they are under CVS at all. I presume that it has
something to do with having `make world' operate properly.
2003-11-12 08:09:19 +00:00
Kirk McKusick
fde81c7d8e Update the statfs structure with 64-bit fields to allow
accurate reporting of multi-terabyte filesystem sizes.

You should build and boot a new kernel BEFORE doing a `make world'
as the new kernel will know about binaries using the old statfs
structure, but an old kernel will not know about the new system
calls that support the new statfs structure. Running an old kernel
after a `make world' will cause programs such as `df' that do a
statfs system call to fail with a bad system call.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Tim Robbins <tjr@freebsd.org>
Reviewed by:	Julian Elischer <julian@elischer.org>
Reviewed by:	the hoards of <arch@freebsd.org>
Sponsored by:   DARPA & NAI Labs.
2003-11-12 08:01:40 +00:00
Robert Watson
eca8a663d4 Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c).  This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE.  This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time.  This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change.  Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from:	bmilekic
Obtained from:		TrustedBSD Project
Sponsored by:		DARPA, Network Associates Laboratories
2003-11-12 03:14:31 +00:00
Alexander Kabaev
5c957adbf1 1. Consolidate mount struct allocation/destruction into a common code in
vfs_mount_alloc/vfs_mount_destroy functions and take care to completely
destroy the mount point along with its locks. Mount struct has grown in
coplexity recently and depending on each failure path to destroy it
completely isn't working anymore.

2. Eliminate largely identical vfs_mount and vfs_unmount question by
moving the code to handle both cases into a newly introduced vfs_domount
function.

3. Simplify nfs_mount_diskless to always expect an allocated mount
struct and never attempt an allocation/destruction itself. The
vfs_allocroot allocation was there to support 'magic' swap space
configuration for diskless clients that was already removed by PHK some
time ago.

4. Include a vfs_buildopts cleanups by Peter Edwards to validate the
sanity of nmount parameters passed from userland.

Submitted by:  (4) Peter Edwards <peter.edwards@openet-telecom.com>
Reviewed by:    rwatson
2003-11-12 02:54:47 +00:00
John Baldwin
961a7b244d Add an implementation of turnstiles and change the sleep mutex code to use
turnstiles to implement blocking isntead of implementing a thread queue
directly.  These turnstiles are somewhat similar to those used in Solaris 7
as described in Solaris Internals but are also different.

Turnstiles do not come out of a fixed-sized pool.  Rather, each thread is
assigned a turnstile when it is created that it frees when it is destroyed.
When a thread blocks on a lock, it donates its turnstile to that lock to
serve as queue of blocked threads.  The queue associated with a given lock
is found by a lookup in a simple hash table.  The turnstile itself is
protected by a lock associated with its entry in the hash table.  This
means that sched_lock is no longer needed to contest on a mutex.  Instead,
sched_lock is only used when manipulating run queues or thread priorities.
Turnstiles also implement priority propagation inherently.

Currently turnstiles only support mutexes.  Eventually, however, turnstiles
may grow two queue's to support a non-sleepable reader/writer lock
implementation.  For more details, see the comments in sys/turnstile.h and
kern/subr_turnstile.c.

The two primary advantages from the turnstile code include: 1) the size
of struct mutex shrinks by four pointers as it no longer stores the
thread queue linkages directly, and 2) less contention on sched_lock in
SMP systems including the ability for multiple CPUs to contend on different
locks simultaneously (not that this last detail is necessarily that much of
a big win).  Note that 1) means that this commit is a kernel ABI breaker,
so don't mix old modules with a new kernel and vice versa.

Tested on:	i386 SMP, sparc64 SMP, alpha SMP
2003-11-11 22:07:29 +00:00