freebsd-skq

Author	SHA1	Message	Date
rwatson	d1196975a0	Remove MAC Framework access control check entry points made redundant with the introduction of priv(9) and MAC Framework entry points for privilege checking/granting. These entry points exactly aligned with privileges and provided no additional security context: - mac_check_sysarch_ioperm() - mac_check_kld_unload() - mac_check_settime() - mac_check_system_nfsd() Add mpo_priv_check() implementations to Biba and LOMAC policies, which, for each privilege, determine if they can be granted to processes considered unprivileged by those two policies. These mostly, but not entirely, align with the set of privileges granted in jails. Obtained from: TrustedBSD Project	2007-04-22 15:31:22 +00:00
sepotvin	a1e73b1eaf	Add support for specifying a minimal size for vm.kmem_size in the loader via vm.kmem_size_min. Useful when using ZFS to make sure that vm.kmem_size will be at least 256 MB (for example) without forcing a particular value via vm.kmem_size. Approved by: njl (mentor) Reviewed by: alc	2007-04-21 01:14:48 +00:00
pjd	acc4c54fc5	Don't reinvent vm_page_grab(). Reviewed by: ups	2007-04-20 19:49:20 +00:00
kmacy	d7e93cb21b	Schedule the ithread on the same cpu as the interrupt Tested by: kmacy Submitted by: jeffr	2007-04-20 05:45:46 +00:00
jkoshy	4552216c34	Fix witness(4) warnings about mutex use. Group mutexes used in hwpmc(4) into 3 "types" in the sense of witness(4): - leaf spin mutexes---only one of these should be held at a time, so these mutexes are specified as belonging to a single witness type "pmc-leaf". - `struct pmc_owner' descriptors are protected by a spin mutex of witness type "pmc-owner-proc". Since we call wakeup_one() while holding these mutexes, the witness type of these mutexes needs to dominate that of "sleepq chain" mutexes. - logger threads use a sleep mutex, of type "pmc-sleep". Submitted by: wkoszek (earlier patch)	2007-04-19 08:02:51 +00:00
pjd	e728588aa7	Fix a bug in sendfile(2) when files are larger than the page size and nbytes=0. When nbytes=0, sendfile(2) should use the file size. Because of the bug, it was sending half of a file. The bug is that the 'off' variable can't be used for size calculation, because it changes inside the loop, so we should use uap->offset instead.	2007-04-19 05:54:45 +00:00
njl	95e9f5610b	Bump the interrupt storm detection counter to 1000. My slow fileserver gets a bogus irq storm detected when periodic daily kicks off at 3 am and disconnects the disk. Change the print logic to print once per second when the storm is occurring instead of only once. Otherwise, it appeared that something else was causing the errors each night at 3 am since the print only occurred the first time. Reviewed by: jhb MFC after: 1 week	2007-04-19 01:24:32 +00:00
pjd	4d856175c4	Export vfs_mount_alloc() as it is used in ZFS.	2007-04-17 21:14:06 +00:00
jhb	84f8e133d5	- Add a 'show rman <rm>' DDB command to dump the resources in a resource manager similar to 'devinfo -u'. - Add a 'show allrman' DDB command that effectively does 'show rman' on all resource managers in the system.	2007-04-16 21:09:03 +00:00
kmacy	6ebe8e3a88	Remove now-invalid check from m_sanity; panic on m_sanity check failure with INVARIANTS.	2007-04-14 20:19:16 +00:00
pjd	ad49fbe326	Fix jails and jail-friendly file systems handling: - We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail. - Move security checks to vfs_suser() and deny unmounting and updating for jailed root from different jails, etc. OK'ed by: rwatson	2007-04-13 23:54:22 +00:00
pjd	e140c1e4f1	When we are running low on vnodes, there is currently no way to ask other subsystems to release some vnodes. Implement backpressure based on vfs_lowvnodes event (similar to vm_lowmem for memory).	2007-04-13 08:38:48 +00:00
rwatson	9228026070	Remove now-obsolete comment regarding mqueue privileges in jail.	2007-04-11 16:22:59 +00:00
rwatson	2a40223841	Allow PRIV_NETINET_REUSEPORT in jail.	2007-04-10 15:59:49 +00:00
rwatson	7ce5969854	Do allow POSIX mqueue unlink privilege inside a jail, as we allow all other POSIX mqueue privileges inside a jail.	2007-04-10 15:40:27 +00:00
pjd	c6b82992cd	Minor style cleanups (mostly removal of trailing whitespaces).	2007-04-10 15:29:37 +00:00
pjd	592c863ef3	Correct typos.	2007-04-10 15:22:40 +00:00
njl	7d4003b184	Restore the locking for the sleep/wakeup to avoid waiting an extra 1 sec if a race was lost. We're still single-threaded at this point, but just be safe for the future.	2007-04-09 21:10:04 +00:00
njl	d6c7a51b9a	Clean up the root mount and mount wait code. No mutexes are needed here since a spurious wakeup() is the only possible outcome and this is fine in the BSD programming model.	2007-04-09 19:23:52 +00:00
pjd	d74643dd36	Add kern.hostuuid sysctl, which will be used to keep host's UUID. Reviewed by: mlaier, rink, brooks, rwatson	2007-04-09 19:18:09 +00:00
pjd	6f7c2e9be3	Add root_mounted() function that returns true if the root file system is already mounted.	2007-04-08 23:54:01 +00:00
pjd	7e25d4e142	prison_free() can be called with a mutex held. This wasn't a problem until I converted allprison_mtx mutex to allprison_lock sx lock. To fix this LOR, move prison removal to prison_complete() entirely. To ensure that no one will reference this prison before it's being removed from the list, skip prisons with 'pr_ref == 0' in prison_find() and assert that pr_ref has to be greater than 0 in prison_hold(). Reported by: kris OK'ed by: rwatson	2007-04-08 10:46:23 +00:00
pjd	313c37aa50	Only use prison mutex to protect the fields that need to be protected by it.	2007-04-08 10:21:38 +00:00
pjd	314f7e9104	pr_list is protected by the allprison_lock.	2007-04-08 02:13:32 +00:00
rwatson	033364d5a1	Remove XXX comment that changes to file fields should be protected with the file lock rather than the filedesc lock: I fixed this in the last revision. Spotted by: kris	2007-04-06 23:31:30 +00:00
pjd	5cde9c6089	allprison mutex was converted to sx(9) lock.	2007-04-05 23:32:32 +00:00
pjd	f9a3a5e1fc	Implement functionality I called 'jail services'. It may be used for external modules to attach some data to jail's in-kernel structure. - Change allprison_mtx mutex to allprison_sx sx(9) lock. We will need to call external functions while holding this lock, which may want to allocate memory. Make use of the fact that this is shared-exclusive lock and use shared version when possible. - Implement the following functions: prison_service_register() - registers a service that wants to be noticed when a jail is created and destroyed prison_service_deregister() - deregisters service prison_service_data_add() - adds service-specific data to the jail structure prison_service_data_get() - takes service-specific data from the jail structure prison_service_data_del() - removes service-specific data from the jail structure Reviewed by: rwatson	2007-04-05 23:19:13 +00:00
pjd	21853869a0	Make prison_find() globally accessible.	2007-04-05 21:34:54 +00:00
pjd	4718e01f98	Implement SEEK_DATA and SEEK_HOLE extensions to lseek(2) as found in OpenSolaris. For more information please refer to: http://blogs.sun.com/bonwick/entry/seek_hole_and_seek_data	2007-04-05 21:10:53 +00:00
pjd	7e73da14eb	Add security.jail.mount_allowed sysctl, which allows mounting and unmounting jail-friendly file systems from within a jail. Precisely, it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user. It is turned off by default. A jail-friendly file system is a file system whose driver registers itself with the VFCF_JAIL flag via the VFS_SET(9) API. The lsvfs(1) command can be used to see which file systems are jail-friendly ones. There are currently no jail-friendly file systems; ZFS will be the first one. In the future we may consider marking file systems like nullfs as jail-friendly. Reviewed by: rwatson	2007-04-05 21:03:05 +00:00
kmacy	3daa1603f7	Fix mb_ctor_clust and mb_dtor_clust to reference the appropriate zone, simplify setting refcnt Reviewed by: andre, rwatson, and glebius MFC after: 3 days	2007-04-04 21:27:01 +00:00
rwatson	765a83fd79	Replace custom file descriptor array sleep lock constructed using a mutex and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are important for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff	2007-04-04 09:11:34 +00:00
kmacy	16fabe1e36	fix typo	2007-04-04 00:11:22 +00:00
kmacy	f66541917b	Style fixes, and make sure that the lock is treated as released in the sharers == 0 case. Note that this is somewhat racy because a new sharer can come in while we're updating stats.	2007-04-04 00:01:05 +00:00
kmacy	836059f8b9	Fixes to sx for newsx - fix recursed case and move out of inline Submitted by: Attilio Rao <attilio@freebsd.org>	2007-04-03 22:58:21 +00:00
kmacy	58561e082b	move lock_profile calls out of the macros and into kern_mutex.c add check for mtx_recurse == 0 when releasing sleep lock	2007-04-03 22:52:31 +00:00
kmacy	8913ddf202	skip call to _lock_profile_obtain_lock_success entirely if acquisition time is non-zero (i.e. recursing or adding sharers)	2007-04-03 18:36:27 +00:00
pjd	aa77196921	Add root_mount_wait() function which can be used to wait until the root file system is mounted. This is useful for kernel modules loaded from /boot/loader.conf, that have to access file system.	2007-04-03 11:45:28 +00:00
jhb	b9a3a5afc7	Fix a fd leak in socketpair(): - Close the new file objects created during socketpair() if the copyout of the new file descriptors fails. - Add a test to the socketpair regression test for this edge case.	2007-04-02 19:15:47 +00:00
jhb	6d9ee961ef	Don't go to a whole lot of extra work to handle the race where the new file descriptor is closed out from under us in kern_open(). This race is already handled and the file will be closed when kern_open() does an fdrop just before returning.	2007-04-02 13:40:38 +00:00
wkoszek	aae26bbf9f	ng_node and ng_worklist locks both migrated from being spinning locks to adaptive mutexes. Let witness(4) calm down and bring proper types of those locks to the lock order database. Glanced at by: rwatson	2007-04-01 15:48:10 +00:00
pjd	b078229dad	I think the code I'm removing here is completely bogus. vfs_flags field is used for VFCF_* flags which are given at file system driver creation time (via the VFS_SET(9) macro). What this code did was basically this: If file system registers itself with VFCF_UNICODE flag (stores file names as Unicode), it will gain MNT_SOFTDEP flag (UFS soft-updates). If file system registers itself with VFCF_LOOPBACK flag (aliases some other mounted FS), it will gain MNT_SUIDDIR flag (special handling of SUID on dirs). The latter will be quite dangerous, but those flags are reset later in vfs_domount(). MFC after: 1 month	2007-04-01 13:08:05 +00:00
pjd	c20a93a345	Now that the vdropl() function is public, assert that the vnode interlock is held.	2007-04-01 10:45:32 +00:00
des	b0b258dcad	Make vdropl() public; zfs needs it. There is also plenty of existing file system code (mostly _reclaim()) which looks like this: VOP_LOCK(vp); /* examine vp */ VOP_UNLOCK(vp); vdrop(vp); This can now be rewritten to: VOP_LOCK(vp); /* examine vp */ vdropl(vp); /* will unlock vp */ MFC after: 1 week	2007-03-31 23:57:17 +00:00
jhb	b0b93a3c55	Optimize sx locks to use simple atomic operations for the common cases of obtaining and releasing shared and exclusive locks. The algorithms for manipulating the lock cookie are very similar to that of rwlocks. This patch also adds support for exclusive locks using the same algorithm as mutexes. A new sx_init_flags() function has been added so that optional flags can be specified to alter a given lock's behavior. The flags include SX_DUPOK, SX_NOWITNESS, SX_NOPROFILE, and SX_QUIET, which are all identical in nature to the similar flags for mutexes. Adaptive spinning on select locks may be enabled by enabling the ADAPTIVE_SX kernel option. Only locks initialized with the SX_ADAPTIVESPIN flag via sx_init_flags() will adaptively spin. The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock() are now performed inline in non-debug kernels. As a result, <sys/sx.h> now requires <sys/lock.h> to be included prior to <sys/sx.h>. The new kernel option SX_NOINLINE can be used to disable the aforementioned inlining in non-debug kernels. The size of struct sx has changed, so the kernel ABI is probably greatly disturbed. MFC after: 1 month Submitted by: attilio Tested by: kris, pjd	2007-03-31 23:23:42 +00:00
pjd	969ba66299	Make vfs_mount_destroy() and vfs_freeopts() non-static, I'd like to use them.	2007-03-31 22:44:45 +00:00
rwatson	76104e0492	Rather than ignoring any error return from getnewvnode() in nameiinit(), explicitly test and panic. This should not ever happen, but if it does, this is a preferred failure mode to a NULL pointer dereference in kernel. Coverity CID: 1716 Found with: Coverity Prevent(tm)	2007-03-31 16:08:50 +00:00
jhb	602dd12fbc	- Drop memory barriers in rw_try_upgrade(). We don't need an 'acq' memory barrier here as the earlier rw_rlock() already contained one. - Comment fix.	2007-03-30 18:08:55 +00:00
jhb	0d7ad36cd0	- Use lock_init/lock_destroy() to setup the lock_object inside of lockmgr. We can now use LOCK_CLASS() as a stronger check in lockmgr_chain() as a result. This required putting back lk_flags as lockmgr's use of flags conflicted with other flags in lo_flags otherwise. - Tweak 'show lock' output for lockmgr to match sx, rw, and mtx.	2007-03-30 18:07:24 +00:00
wkoszek	78bc9b6aa1	vm_map_delete should be used only internally, by the VM subsystem. Replace it with vm_map_remove, which not only embeds additional check, but also takes care of locking. Reviewed by: alc Approved by: alc, cognet (mentor)	2007-03-29 13:26:13 +00:00

9892 Commits