o push much of the i386 and amd64 MD interrupt handling code
(intr_machdep.c::intr_execute_handlers()) into MI code
(kern_intr.c::ithread_loop())
o move filter handling to kern_intr.c::intr_filter_loop()
o factor out the code necessary to mask and ack an interrupt event
(intr_machdep.c::intr_eoi_src() and intr_machdep.c::intr_disab_eoi_src()),
and make them part of 'struct intr_event', passing them as arguments to
kern_intr.c::intr_event_create().
o spawn a private ithread per handler (struct intr_handler::ih_thread)
with filter and ithread functions.
Approved by: re (implicit?)
as UF_OPENING. Disable closing of that entries. This should fix the crashes
caused by devfs_open() (and fifo_open()) dereferencing struct file * by
index, while the filedescriptor is closed by parallel thread.
Idea by: tegge
Reviewed by: tegge (previous version of patch)
Tested by: Peter Holm
Approved by: re (kensmith)
MFC after: 3 weeks
on each socket buffer with the socket buffer's mutex. This sleep lock is
used to serialize I/O on sockets in order to prevent I/O interlacing.
This change replaces the custom sleep lock with an sx(9) lock, which
results in marginally better performance, better handling of contention
during simultaneous socket I/O across multiple threads, and a cleaner
separation between the different layers of locking in socket buffers.
Specifically, the socket buffer mutex is now solely responsible for
serializing simultaneous operation on the socket buffer data structure,
and not for I/O serialization.
While here, fix two historic bugs:
(1) a bug allowing I/O to be occasionally interlaced during long I/O
operations (discovere by Isilon).
(2) a bug in which failed non-blocking acquisition of the socket buffer
I/O serialization lock might be ignored (discovered by sam).
SCTP portion of this patch submitted by rrs.
In dounmount(), before or while vn_lock(coveredvp) is called, coveredvp
vnode may be VI_DOOMED due to one of the following:
- other thread finished unmount and vput()ed it, and vnode was chosen
for recycling, while vn_lock() slept;
- forced unmount of the coveredvp->v_mount fs.
In the first case, next check for changed v_mountedhere or mnt_gen counter
would be successfull. In the second case, the unmount shall be allowed.
Submitted by: sobomax
MFC after: 2 weeks
the introduction of priv(9) and MAC Framework entry points for privilege
checking/granting. These entry points exactly aligned with privileges and
provided no additional security context:
- mac_check_sysarch_ioperm()
- mac_check_kld_unload()
- mac_check_settime()
- mac_check_system_nfsd()
Add mpo_priv_check() implementations to Biba and LOMAC policies, which,
for each privilege, determine if they can be granted to processes
considered unprivileged by those two policies. These mostly, but not
entirely, align with the set of privileges granted in jails.
Obtained from: TrustedBSD Project
vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem size will
be at least 256mb (for example) without forcing a particular value via vm.kmem_size.
Approved by: njl (mentor)
Reviewed by: alc
Group mutexes used in hwpmc(4) into 3 "types" in the sense of
witness(4):
- leaf spin mutexes---only one of these should be held at a time,
so these mutexes are specified as belonging to a single witness
type "pmc-leaf".
- `struct pmc_owner' descriptors are protected by a spin mutex of
witness type "pmc-owner-proc". Since we call wakeup_one() while
holding these mutexes, the witness type of these mutexes needs
to dominate that of "sleepq chain" mutexes.
- logger threads use a sleep mutex, of type "pmc-sleep".
Submitted by: wkoszek (earlier patch)
When nbytes=0, sendfile(2) should use file size. Because of the bug, it
was sending half of a file. The bug is that 'off' variable can't be used
for size calculation, because it changes inside the loop, so we should
use uap->offset instead.
gets a bogus irq storm detected when periodic daily kicks off at 3 am
and disconnects the disk. Change the print logic to print once per second
when the storm is occurring instead of only once. Otherwise, it appeared
that something else was causing the errors each night at 3 am since the
print only occurred the first time.
Reviewed by: jhb
MFC after: 1 week
- We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail.
- Move security checks to vfs_suser() and deny unmounting and updating
for jailed root from different jails, etc.
OK'ed by: rwatson
I converted allprison_mtx mutex to allprison_lock sx lock. To fix this LOR,
move prison removal to prison_complete() entirely. To ensure that noone
will reference this prison before it's beeing removed from the list skip
prisons with 'pr_ref == 0' in prison_find() and assert that pr_ref has to
greater than 0 in prison_hold().
Reported by: kris
OK'ed by: rwatson
It may be used for external modules to attach some data to jail's in-kernel
structure.
- Change allprison_mtx mutex to allprison_sx sx(9) lock.
We will need to call external functions while holding this lock, which may
want to allocate memory.
Make use of the fact that this is shared-exclusive lock and use shared
version when possible.
- Implement the following functions:
prison_service_register() - registers a service that wants to be noticed
when a jail is created and destroyed
prison_service_deregister() - deregisters service
prison_service_data_add() - adds service-specific data to the jail structure
prison_service_data_get() - takes service-specific data from the jail
structure
prison_service_data_del() - removes service-specific data from the jail
structure
Reviewed by: rwatson
unmount jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.
A jail-friendly file system is a file system which driver registers
itself with VFCF_JAIL flag via VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.
There currently no jail-friendly file systems, ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.
Reviewed by: rwatson
and flags with an sxlock. This leads to a significant and measurable
performance improvement as a result of access to shared locking for
frequent lookup operations, reduced general overhead, and reduced overhead
in the event of contention. All of these are imported for threaded
applications where simultaneous access to a shared file descriptor array
occurs frequently. Kris has reported 2x-4x transaction rate improvements
on 8-core MySQL benchmarks; smaller improvements can be expected for many
workloads as a result of reduced overhead.
- Generally eliminate the distinction between "fast" and regular
acquisisition of the filedesc lock; the plan is that they will now all
be fast. Change all locking instances to either shared or exclusive
locks.
- Correct a bug (pointed out by kib) in fdfree() where previously msleep()
was called without the mutex held; sx_sleep() is now always called with
the sxlock held exclusively.
- Universally hold the struct file lock over changes to struct file,
rather than the filedesc lock or no lock. Always update the f_ops
field last. A further memory barrier is required here in the future
(discussed with jhb).
- Improve locking and reference management in linux_at(), which fails to
properly acquire vnode references before using vnode pointers. Annotate
improper use of vn_fullpath(), which will be replaced at a future date.
In fcntl(), we conservatively acquire an exclusive lock, even though in
some cases a shared lock may be sufficient, which should be revisited.
The dropping of the filedesc lock in fdgrowtable() is no longer required
as the sxlock can be held over the sleep operation; we should consider
removing that (pointed out by attilio).
Tested by: kris
Discussed with: jhb, kris, attilio, jeff
- Close the new file objects created during socketpair() if the copyout of
the new file descriptors fails.
- Add a test to the socketpair regression test for this edge case.