9959 Commits

Author SHA1 Message Date
andre
14d6d7db44 Make the TCP timer callout obtain Giant if the network stack is marked
as non-mpsafe.

This change is to be removed when all protocols are mp-safe.
2007-05-11 20:52:47 +00:00
rwatson
3cf529ac1e Remove more one more stale comment regarding unpcb type-safety. 2007-05-11 12:28:45 +00:00
rwatson
144e902f6c Clarify and update quite a few comments to reflect locking optimizations,
the addition of unpcb refcounts, and bug fixes.  Some of these fixes are
appropriate for MFC.

MFC after:	3 days
2007-05-11 12:10:45 +00:00
jhb
7b6b99abf6 Add destroyed cookie values for sx locks and rwlocks as well as extra
KASSERTs so that any lock operations on a destroyed lock will panic or
hang.
2007-05-08 21:51:37 +00:00
jhb
86f8e9262c Teach 'show lock' to properly handle a destroyed mutex. 2007-05-08 21:50:46 +00:00
jhb
b5754be873 Fix a potential LOR with sx_sleep() and cv_wait() with sx locks by
1) adding the thread to the sleepq via sleepq_add() before dropping the
lock, and 2) dropping the sleepq lock around calls to lc_unlock() for
sleepable locks (i.e. locks that use sleepq's in their implementation).
2007-05-08 21:49:59 +00:00
yongari
ff40eaca0e Add missing socket buffer unlock before returning to userland.
Reviewed by:    rwatson
2007-05-08 12:34:14 +00:00
piso
25c9d95cb5 Bring in the reminaing bits to make interrupt filtering work:
o push much of the i386 and amd64 MD interrupt handling code
  (intr_machdep.c::intr_execute_handlers()) into MI code
  (kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
  (intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
  and make them part of 'struct intr_event', passing them as arguments to
  kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
  with filter and ithread functions.

Approved by: re (implicit?)
2007-05-06 17:02:50 +00:00
wkoszek
a8246d5e35 Don't acquire Giant unconditionally.
Reviewed by:	rwatson
2007-05-06 12:00:38 +00:00
kib
cef0254760 Mark the filedescriptor table entries with VOP_OPEN being performed for them
as UF_OPENING. Disable closing of that entries. This should fix the crashes
caused by devfs_open() (and fifo_open()) dereferencing struct file * by
index, while the filedescriptor is closed by parallel thread.

Idea by:	tegge
Reviewed by:	tegge (previous version of patch)
Tested by:	Peter Holm
Approved by:	re (kensmith)
MFC after:	3 weeks
2007-05-04 14:23:29 +00:00
rwatson
20848234d9 sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags
on each socket buffer with the socket buffer's mutex.  This sleep lock is
used to serialize I/O on sockets in order to prevent I/O interlacing.

This change replaces the custom sleep lock with an sx(9) lock, which
results in marginally better performance, better handling of contention
during simultaneous socket I/O across multiple threads, and a cleaner
separation between the different layers of locking in socket buffers.
Specifically, the socket buffer mutex is now solely responsible for
serializing simultaneous operation on the socket buffer data structure,
and not for I/O serialization.

While here, fix two historic bugs:

(1) a bug allowing I/O to be occasionally interlaced during long I/O
    operations (discovere by Isilon).

(2) a bug in which failed non-blocking acquisition of the socket buffer
    I/O serialization lock might be ignored (discovered by sam).

SCTP portion of this patch submitted by rrs.
2007-05-03 14:42:42 +00:00
alc
76816978f2 Remove unneeded include files. 2007-05-01 06:35:54 +00:00
jmg
eac96af0bf Complete removal of restriction about overlaps to rman_manage_region:
remove comment and man page verbage...

Document return values for rman_init and rman_manage_region..

MFC after:	1 week
2007-04-28 07:37:49 +00:00
jhb
8d6f49b731 Avoid a lot of code duplication by using kern_open() to open /dev/null
in fdcheckstd() instead of a stripped down version of kern_open()'s code.

MFC after:	1 week
Reviewed by:	cperciva
2007-04-26 18:01:19 +00:00
kib
83b7f9371b Allow the dounmount() to proceed even for doomed coveredvp.
In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp
vnode may be VI_DOOMED due to one of the following:
- other thread finished unmount and vput()ed it, and vnode was chosen
  for recycling, while vn_lock() slept;
- forced unmount of the coveredvp->v_mount fs.
In the first case, next check for changed v_mountedhere or mnt_gen counter
would be successfull. In the second case, the unmount shall be allowed.

Submitted by:	sobomax
MFC after:	2 weeks
2007-04-26 08:56:56 +00:00
kib
543a201e4b Disable nesting of BOP_BDFLUSH(). VOP_FSYNC() call in bdwrite() could
result in bdwrite() being reentered, thus causing infinite recursion.

Reported and tested by:	Peter Holm
Reviewed by:	tegge
MFC after:	2 weeks
2007-04-24 10:59:21 +00:00
pjd
19d0863e4a Correct typo. 2007-04-23 12:53:00 +00:00
rwatson
d1196975a0 Remove MAC Framework access control check entry points made redundant with
the introduction of priv(9) and MAC Framework entry points for privilege
checking/granting.  These entry points exactly aligned with privileges and
provided no additional security context:

- mac_check_sysarch_ioperm()
- mac_check_kld_unload()
- mac_check_settime()
- mac_check_system_nfsd()

Add mpo_priv_check() implementations to Biba and LOMAC policies, which,
for each privilege, determine if they can be granted to processes
considered unprivileged by those two policies.  These mostly, but not
entirely, align with the set of privileges granted in jails.

Obtained from:	TrustedBSD Project
2007-04-22 15:31:22 +00:00
sepotvin
a1e73b1eaf Add support for specifying a minimal size for vm.kmem_size in the loader via
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.

Approved by: njl (mentor)
Reviewed by: alc
2007-04-21 01:14:48 +00:00
pjd
acc4c54fc5 Don't reinvent vm_page_grab().
Reviewed by:	ups
2007-04-20 19:49:20 +00:00
kmacy
d7e93cb21b Schedule the ithread on the same cpu as the interrupt
Tested by: kmacy
Submitted by: jeffr
2007-04-20 05:45:46 +00:00
jkoshy
4552216c34 Fix witness(4) warnings about mutex use.
Group mutexes used in hwpmc(4) into 3 "types" in the sense of
witness(4):

 - leaf spin mutexes---only one of these should be held at a time,
   so these mutexes are specified as belonging to a single witness
   type "pmc-leaf".

 - `struct pmc_owner' descriptors are protected by a spin mutex of
   witness type "pmc-owner-proc".  Since we call wakeup_one() while
   holding these mutexes, the witness type of these mutexes needs
   to dominate that of "sleepq chain" mutexes.

 - logger threads use a sleep mutex, of type "pmc-sleep".

Submitted by:	wkoszek (earlier patch)
2007-04-19 08:02:51 +00:00
pjd
e728588aa7 Fix a bug in sendfile(2) when files larger than page size and nbytes=0.
When nbytes=0, sendfile(2) should use file size. Because of the bug, it
was sending half of a file. The bug is that 'off' variable can't be used
for size calculation, because it changes inside the loop, so we should
use uap->offset instead.
2007-04-19 05:54:45 +00:00
njl
95e9f5610b Bump the interrupt storm detection counter to 1000. My slow fileserver
gets a bogus irq storm detected when periodic daily kicks off at 3 am
and disconnects the disk.  Change the print logic to print once per second
when the storm is occurring instead of only once.  Otherwise, it appeared
that something else was causing the errors each night at 3 am since the
print only occurred the first time.

Reviewed by:	jhb
MFC after:	1 week
2007-04-19 01:24:32 +00:00
pjd
4d856175c4 Export vfs_mount_alloc() as it is used in ZFS. 2007-04-17 21:14:06 +00:00
jhb
84f8e133d5 - Add a 'show rman <rm>' DDB command to dump the resources in a resource
manager similar to 'devinfo -u'.
- Add a 'show allrman' DDB command that effectively does 'show rman' on all
  resource managers in the system.
2007-04-16 21:09:03 +00:00
kmacy
6ebe8e3a88 remove now invalid check from m_sanity
panic on m_sanity check failure with INVARIANTS
2007-04-14 20:19:16 +00:00
pjd
ad49fbe326 Fix jails and jail-friendly file systems handling:
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
  for jailed root from different jails, etc.

OK'ed by:	rwatson
2007-04-13 23:54:22 +00:00
pjd
e140c1e4f1 When we are running low on vnodes, there is currently no way to ask other
subsystems to release some vnodes. Implement backpressure based on
vfs_lowvnodes event (similar to vm_lowmem for memory).
2007-04-13 08:38:48 +00:00
rwatson
9228026070 Remove now-obsolete comment regarding mqueue privileges in jail. 2007-04-11 16:22:59 +00:00
rwatson
2a40223841 Allow PRIV_NETINET_REUSEPORT in jail. 2007-04-10 15:59:49 +00:00
rwatson
7ce5969854 Do allow POSIX mqueue unlink privilege inside a jail, as we all all
other POSIX mqueue privileges inside a jail.
2007-04-10 15:40:27 +00:00
pjd
c6b82992cd Minor style cleanups (mostly removal of trailing whitespaces). 2007-04-10 15:29:37 +00:00
pjd
592c863ef3 Correct typos. 2007-04-10 15:22:40 +00:00
njl
7d4003b184 Restore the locking for the sleep/wakeup to avoid waiting an extra 1 sec
if a race was lost.  We're still single-threaded at this point, but just
be safe for the future.
2007-04-09 21:10:04 +00:00
njl
d6c7a51b9a Clean up the root mount and mount wait code. No mutexes are needed here
since a spurious wakeup() is the only possible outcome and this is fine in
the BSD programming model.
2007-04-09 19:23:52 +00:00
pjd
d74643dd36 Add kern.hostuuid sysctl, which will be used to keep host's UUID.
Reviewed by:	mlaier, rink, brooks, rwatson
2007-04-09 19:18:09 +00:00
pjd
6f7c2e9be3 Add root_mounted() function that returns true if the root file system is
already mounted.
2007-04-08 23:54:01 +00:00
pjd
7e25d4e142 prison_free() can be called with a mutex held. This wasn't a problem until
I converted allprison_mtx mutex to allprison_lock sx lock. To fix this LOR,
move prison removal to prison_complete() entirely. To ensure that noone
will reference this prison before it's beeing removed from the list skip
prisons with 'pr_ref == 0' in prison_find() and assert that pr_ref has to
greater than 0 in prison_hold().

Reported by:	kris
OK'ed by:	rwatson
2007-04-08 10:46:23 +00:00
pjd
313c37aa50 Only use prison mutex to protect the fields that need to be protected by it. 2007-04-08 10:21:38 +00:00
pjd
314f7e9104 pr_list is protected by the allprison_lock. 2007-04-08 02:13:32 +00:00
rwatson
033364d5a1 Remove XXX comment that changes to file fields should be protected with
the file lock rather than the filedesc lock: I fixed this in the last
revision.

Spotted by:	kris
2007-04-06 23:31:30 +00:00
pjd
5cde9c6089 allprison mutex was converted to sx(9) lock. 2007-04-05 23:32:32 +00:00
pjd
f9a3a5e1fc Implement functionality I called 'jail services'.
It may be used for external modules to attach some data to jail's in-kernel
structure.

- Change allprison_mtx mutex to allprison_sx sx(9) lock.
  We will need to call external functions while holding this lock, which may
  want to allocate memory.
  Make use of the fact that this is shared-exclusive lock and use shared
  version when possible.
- Implement the following functions:
  prison_service_register() - registers a service that wants to be noticed
	when a jail is created and destroyed
  prison_service_deregister() - deregisters service
  prison_service_data_add() - adds service-specific data to the jail structure
  prison_service_data_get() - takes service-specific data from the jail
	structure
  prison_service_data_del() - removes service-specific data from the jail
	structure

Reviewed by:	rwatson
2007-04-05 23:19:13 +00:00
pjd
21853869a0 Make prison_find() globally accessible. 2007-04-05 21:34:54 +00:00
pjd
4718e01f98 Implement SEEK_DATA and SEEK_HOLE extensions to lseek(2) as found in
OpenSolaris. For more information please refer to:

	http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data
2007-04-05 21:10:53 +00:00
pjd
7e73da14eb Add security.jail.mount_allowed sysctl, which allows to mount and
unmount jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.

A jail-friendly file system is a file system which driver registers
itself with VFCF_JAIL flag via VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.

There currently no jail-friendly file systems, ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.

Reviewed by:	rwatson
2007-04-05 21:03:05 +00:00
kmacy
3daa1603f7 Fix mb_ctor_clust and mb_dtor_clust to reference the appropriate zone,
simplify setting refcnt

Reviewed by: andre, rwatson, and glebius
MFC after: 3 days
2007-04-04 21:27:01 +00:00
rwatson
765a83fd79 Replace custom file descriptor array sleep lock constructed using a mutex
and flags with an sxlock.  This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention.  All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently.  Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.

- Generally eliminate the distinction between "fast" and regular
  acquisisition of the filedesc lock; the plan is that they will now all
  be fast.  Change all locking instances to either shared or exclusive
  locks.

- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
  was called without the mutex held; sx_sleep() is now always called with
  the sxlock held exclusively.

- Universally hold the struct file lock over changes to struct file,
  rather than the filedesc lock or no lock.  Always update the f_ops
  field last. A further memory barrier is required here in the future
  (discussed with jhb).

- Improve locking and reference management in linux_at(), which fails to
  properly acquire vnode references before using vnode pointers.  Annotate
  improper use of vn_fullpath(), which will be replaced at a future date.

In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).

Tested by:	kris
Discussed with:	jhb, kris, attilio, jeff
2007-04-04 09:11:34 +00:00
kmacy
16fabe1e36 fix typo 2007-04-04 00:11:22 +00:00