When maxusers was unrestricted and maxfiles was allowed to autotune
much higher, the result was that ncallout, which was based on maxfiles
and maxproc, grew much higher than needed.
To fix this, clip autotuning to the same number we would get with
the old maxusers algorithm, which stopped scaling at 384
maxusers.
Growing ncallout higher is not likely to be needed, since most consumers
of timeout(9) are gone, and any higher value for ncallout causes the
callwheel hashes to be much larger than will ever be needed by
most applications.
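A rough sketch of the intended clamp (the old formula is recalled from
memory and the cap name is a placeholder; only the shape of the change
is meant):

    /* autotuned value, roughly the old 16 + maxproc + maxfiles */
    ncallout = 16 + maxproc + maxfiles;
    /* clip to what the old algorithm produced at 384 maxusers */
    ncallout = imin(ncallout, ncallout_cap_384_maxusers);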
MFC after: 1 month
Reviewed by: mav
Set the v_hash for a new vnode in the getnewvnode() to the value
calculated based on the vnode structure address. Filesystems using
vfs_hash_insert() override the v_hash using the standard formula of
(inode_number + mnt_hashseed). For other filesystems, the
initialization allows vfs_hash_index() to provide a useful hash as well.
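A sketch of the kind of initialization meant here (the shift constant is
a placeholder, not the actual identifier):

    /*
     * Derive a default hash from the vnode address, dropping the low
     * bits that are identical for all vnodes due to allocation
     * alignment (VN_ADDR_SHIFT is hypothetical).
     */
    vp->v_hash = (u_int)((uintptr_t)vp >> VN_ADDR_SHIFT);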
Suggested, reviewed and tested by: peter
Sponsored by: The FreeBSD Foundation
MFC after: 5 days
pre-masked hash for the given vnode. The function assumes that
vp->v_hash is initialized by the filesystem vnode instantiation
function. At the moment, this is only done if the filesystem uses
vfs_hash_insert().
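Given the description above, the helper presumably reduces to something
along these lines:

    u_int
    vfs_hash_index(struct vnode *vp)
    {

            /* pre-masked: the caller applies the hash table mask */
            return (vp->v_hash + vp->v_mount->mnt_hashseed);
    }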
Reviewed by: peter
Tested by: peter, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 5 days
mount, which means that it must not be called while the snaplock is
owned. The vfs_write_resume(9) does call the function as the
VFS_SUSP_CLEAN() method, which is too early and falls into the region
still protected by snaplock.
Add yet another flag for vfs_write_resume_flags() to avoid calling the
suspension cleanup handler after the suspend is lifted, and use it in
the ffs_snapshot() call to vfs_write_resume().
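The resulting call in the snapshot code should look roughly like this
(flag name as described above, shown for illustration):

    /*
     * Resume writes, but skip the VFS_SUSP_CLEAN() handler because
     * the snaplock is still held at this point.
     */
    vfs_write_resume_flags(mp, VR_NO_SUSPCLR);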
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
the write start, by adding a variation of the vfs_write_resume(9)
which accepts flags.
Use the new function to prevent a deadlock between parallel suspension
and snapshotting a UFS mount. The ffs_snapshot() code performed
vfs_write_resume() followed by vn_start_write() while owning the
snaplock. If the suspension intervened between the resume and
vn_start_write(), the deadlock occurred after the suspending thread
tried to lock the snaplock, most typically during the write in the
ffs_copyonwrite().
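A sketch of the intended replacement in ffs_snapshot() (the flag name is
assumed from the description above):

    /*
     * Resume the suspension and start the write accounting in one
     * step, so another suspension cannot slip in between the two
     * while the snaplock is held.
     */
    vfs_write_resume_flags(mp, VR_START_WRITE);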
Reported and tested by: Andreas Longwitz <longwitz@incore.de>
Reviewed by: mckusick
MFC after: 2 weeks
X-MFC-note: make the vfs_write_resume(9) function a macro after the MFC,
in HEAD
case. There is no point in optimizing the code further and using a TRUE
literal for a path that does heavyweight stuff anyway (like lock
acquisition), at the price of obfuscated code.
Use the appropriate check where necessary and remove a macro.
Sponsored by: EMC / Isilon storage division
MFC after: 3 days
interrupt context can still be the idlethread. At that point, without the
panic condition, it can still happen that the idlethread will then try to
acquire some locks to carry on some operations.
Skip the idlethread check on block/sleep lock operations when KDB is
active.
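The adjusted assertion presumably takes roughly this shape (exact message
and placement vary per lock class):

    KASSERT(kdb_active != 0 || !TD_IS_IDLETHREAD(curthread),
        ("blockable sleep lock acquired by the idle thread"));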
Reported by: jh
Tested by: jh
MFC after: 1 week
When kern_yield() was introduced with the possibility to specify
a new priority, the behaviour changed by not lowering the priority at all
in the consumers, making the yielding mechanism highly ineffective for
high-priority kthreads like bufdaemon, syncer, vlrudaemon, etc.
There is no evidence that the consumers can cope with such a change in
semantics, and this situation could eventually lead to bugs similar to
the ones fixed in r244240.
Re-specify the userland priority for the kthreads involved.
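For the affected kthreads this amounts to passing an explicit priority
again, e.g.:

    if (should_yield())
            kern_yield(PRI_USER);   /* drop to a userland priority */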
Tested by: pho
Reviewed by: kib, mdf
MFC after: 1 week
manage a set of power-of-2 sized buffers for bus_dmamem_alloc().
This allows the caller to provide the back-end uma allocator,
allowing full control of the memory pages backing the pool. For
convenience, it includes an optional builtin allocator that provides pages
allocated with the VM_MEMATTR_UNCACHEABLE attribute, for managing pools of
DMA buffers for BUS_DMA_COHERENT or BUS_DMA_NOCACHE.
This also allows the caller to specify a minimum alignment, and it ensures
that all buffers start on a boundary and have a length that's a multiple of
that value, to avoid using buffers that trigger partial cache line flushes.
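A hypothetical illustration of the sizing rule (not the actual allocator
code): round each request up to a power of two that is also a multiple of
the minimum alignment, so buffers never share a partially flushed cache
line.

    static bus_size_t
    bufpool_roundup(bus_size_t size, bus_size_t min_align)
    {
            bus_size_t n;

            for (n = min_align; n < size; n <<= 1)
                    continue;
            return (n);
    }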
Submitted by: Ian Lepore <freebsd@damnhippie.dyndns.org>
only returns the name, but also the vnode of the corefile to use.
This simplifies the code and closes a few races, especially in %I handling.
Reviewed by: kib
Obtained from: WHEEL Systems
- Use this new format to automatically handle syscalls and VOPs. This
changes the earlier format but is still human readable.
Sponsored by: EMC / Isilon Storage Division
calls and turn it on.
- Do not allow them to be called inside a jail. [1]
Pointed out by: trasz [1]
Reviewed by: avg
Approved by: kib (mentor)
MFC after: 1 week
This fixes a panic where we hold a mutex (the process lock) and try to
obtain a sleepable lock (the vnode lock in expand_name()). The panic
could occur when %I was used in kern.corefile.
Additionally we avoid expand_name() overhead when coredumps are disabled.
Obtained from: WHEEL Systems
yields, specify the user priority for the yield. Otherwise, a
higher-priority (kernel) thread could fall into a priority inversion
with the thread owning the mutex lock.
On single-processor machines or UP kernels, do not loop adaptively
when the next vnode cannot be locked; instead, yield unconditionally.
Restructure the iteration initializer and the iterator to remove code
duplication. Put the code to fetch and lock a vnode next to the
current marker, into the mnt_vnode_next_active() function, and use it
instead of repeating the loop.
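Consumers then iterate along these lines (sketch only; the iterator hands
back the vnode with its interlock held):

    struct vnode *mvp, *vp;

    MNT_VNODE_FOREACH_ACTIVE(vp, mp, mvp) {
            /* ... per-vnode work with the interlock held ... */
            VI_UNLOCK(vp);
    }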
Reported by: hrs, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
was being copied from the wrong place. This patch fixes that.
This could cause access failures for mapped users, when the group
permissions were needed.
PR: 147998
Submitted by: Christopher Key (cjk32 at cam.ac.uk)
MFC after: 2 weeks
This is an ongoing effort to provide runtime debug information
useful in the field that does not panic existing installations.
This gives us the flexibility needed when shipping images to a
potentially large audience with WITNESS enabled without worrying
about formerly non-fatal LORs hurting a release.
Sponsored by: iXsystems
kern_yield() is problematic then.
The owned mutex is the mount interlock, and it is in fact not needed
to guarantee the stability of the mount list of active vnodes, so fix
the issue by only taking the mount interlock for MNT_REF and
MNT_REL operations.
While there, augment the unconditional yield with some amount of
spinning [1].
Reported and tested by: pho
Reviewed by: attilio
Submitted by: attilio [1]
MFC after: 3 days
call. The function indicates a failure by the TRUE return value. To
be extra safe, assert that the return value from the following
vm_map_insert() indicates success.
Fix style issues in the nearby lines, reformulate the comment.
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
- unp_zone: kern.ipc.maxsockets limit reached
- socket_zone: kern.ipc.maxsockets limit reached
- zone_mbuf: kern.ipc.nmbufs limit reached
- zone_clust: kern.ipc.nmbclusters limit reached
- zone_jumbop: kern.ipc.nmbjumbop limit reached
- zone_jumbo9: kern.ipc.nmbjumbo9 limit reached
- zone_jumbo16: kern.ipc.nmbjumbo16 limit reached
Note that those warnings are printed no more often than every five minutes
and can be globally turned off by setting the sysctl/tunable
vm.zone_warnings to 0.
Discussed on: arch
Obtained from: WHEEL Systems
MFC after: 2 weeks
This is to allow debug images to be used without taking down the
system when non-fatal asserts are hit.
The following sysctls are added:
debug.kassert.warn_only: 1 = log, 0 = panic
debug.kassert.do_ktr: set to a ktr mask for logging via KTR
debug.kassert.do_log: 1 = log, 0 = quiet
debug.kassert.warnings: stats, number of kasserts hit
debug.kassert.log_panic_at:
number of kasserts before we actually panic, 0 = never
debug.kassert.log_pps_limit: pps limit for log messages
debug.kassert.log_mute_at: stop warning after N kasserts, 0 = never stop
debug.kassert.kassert: set this sysctl to trigger a kassert
Discussed with: scottl, gnn, marcel
Sponsored by: iXsystems
EPROTONOSUPPORT if the address family is not supported.
- introduce pffinddomain() to find a domain by family and use it as
appropriate.
Reviewed by: glebius
- As the comment notes, CALLOUT_LOCAL_ALLOC cannot be checked
directly from the callout flags but must be checked via a cached
value. Hence, do so before actually removing the callout, when
needed, in softclock_call_cc().
- In softclock_call_cc() also add a comment in the waiting and deferred
migration case explaining that the dereference should be safe
because of the migration dereference invariants.
Additionally:
- In softclock_call_cc(), for the deferred migration case, move all the
accesses to the callout structure after the comment stating the callout
must not be destroyed.
- For consistency with this last tweak, use cached c_flags for the
KASSERT() in the deferred migration case. It is not strictly necessary
but this way all the callout accesses happen after the above-mentioned
comment, improving consistency.
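The pattern being enforced is roughly the following (sketch; local
variable name illustrative):

    /* cache the flags before the handler can free or reuse the callout */
    c_flags = c->c_flags;

    /* ... handler runs, callout may be freed or re-armed ... */

    if ((c_flags & CALLOUT_LOCAL_ALLOC) != 0)
            callout_cc_del(c, cc);  /* timeout(9)-style callout, free it */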
Pointy hat to: me
Sponsored by: Isilon Systems / EMC Corporation
Reviewed by: kib
MFC after: 2 weeks
X-MFC: 243901
from the callwheel. Calculate the cc->cc_next before removing the
callout, otherwise the code followed invalid tailq links. After
this, make softclock_call_cc() return void, since it always returned
cc->cc_next, which is immediately available to softclock()
anyway. This also allows eliminating a label under #ifdef SMP.
Remove the assignment of cc->cc_next from callout_cc_del(), since the
function is called with the callout already removed from the callwheel.
If cancelling the migration, also clear the CALLOUT_DFRMIGRATION flag.
Postpone the freeing of the timeout(9)-allocated callouts until after the
migration checks are done.
Add some more strict asserts about the state of the callout in
callout_call_cc().
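The fixed removal sequence is, schematically (field and bucket names are
from memory and only illustrate the ordering):

    /* pick up the successor before the links become invalid */
    cc->cc_next = TAILQ_NEXT(c, c_links.tqe);
    TAILQ_REMOVE(&cc->cc_callwheel[bucket], c, c_links.tqe);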
Reviewed by: attilio
Reported and tested by: pho (previous version)
MFC after: 2 weeks
callout is started before kern_setitimer() acquires the process mutex, but
loses a race and kern_setitimer() gets the process mutex before the
callout. Then, assuming that the newly specified struct itimerval has
it_interval zero, but it_value non-zero, the callout, after it starts
executing again, clears p->p_realtimer.it_value, but kern_setitimer()
already rescheduled the callout.
As a result of the race, p_realtimer is zero, and yet the callout
is rescheduled. Then, in exit1(), the exit code sees that it_value
is zero and does not even try to stop the callout. This allows the
struct proc to be reused and eventually the armed callout is
re-initialized. The consequence is the corrupted callwheel tailq.
Use process mutex to interlock the callout start, which fixes the race.
Reported and tested by: pho
Reviewed by: jhb
MFC after: 2 weeks
over the active list. The mount interlock is not enough to guarantee
the validity of the tailq link pointers. The __mnt_vnode_next_active()
and __mnt_vnode_first_active() active list iterator helper functions
did not provide the necessary stability for the list, allowing the
iterators to pick up garbage.
This was uncovered after r243599 made the active list iterators
non-nop.
Since a vnode interlock is ordered before vnode_free_list_mtx, obtain the
vnode interlock in a non-blocking manner when under vnode_free_list_mtx,
and restart the iteration after yielding if the lock attempt failed.
Assert that a vnode found on the list is active, and assert that the
helpers return the vnode with the interlock owned.
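The non-blocking acquisition and restart described above is,
schematically:

    if (VI_TRYLOCK(vp) == 0) {
            /* lock order forbids sleeping for the interlock here */
            mtx_unlock(&vnode_free_list_mtx);
            kern_yield(PRI_USER);
            mtx_lock(&vnode_free_list_mtx);
            goto restart;   /* re-find the marker and continue */
    }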
Reported and tested by: pho
MFC after: 1 week
Fix path handling for *at() syscalls.
Before the change, the directory descriptor was totally ignored,
so the relative path argument was appended to the current working
directory path and not to the path provided by the descriptor; thus
wrong paths were stored in audit logs.
Now that we use the directory descriptor in vfs_lookup(), move the
AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() calls to the place where
we hold the file descriptor table lock, so we are sure paths will
be resolved relative to the same directory in the audit record and
in the actual operation.
Sponsored by: FreeBSD Foundation (auditdistd)
Reviewed by: rwatson
MFC after: 2 weeks
Remove redundant call to AUDIT_ARG_UPATH1().
Path will be remembered by the following NDINIT(AUDITVNODE1) call.
Sponsored by: FreeBSD Foundation (auditdistd)
MFC after: 2 weeks
variable as they may overflow on i386/PAE and i386 with > 2GB RAM.
Use the 64-bit quad_t instead. It has broader kernel infrastructure support,
with TUNABLE_QUAD_FETCH() and qmin()/qmax(), than other available types.
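Typical usage after the conversion (variable and tunable names here are
placeholders):

    quad_t maxfoomem, foolimit;

    TUNABLE_QUAD_FETCH("kern.maxfoomem", &maxfoomem);
    maxfoomem = qmin(maxfoomem, foolimit);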
Pointed out by: alc, bde
kernel memory, whichever is lower. The overall mbuf-related memory
limit must be set so that mbufs (and clusters of various sizes)
can't exhaust physical RAM or KVM.
The limit is set to half of the physical RAM or KVM (whichever is
lower) as the baseline. In any normal scenario we want to leave
at least half of the physmem/kvm for other kernel functions and
userspace to prevent it from swapping too easily. Via a tunable
kern.maxmbufmem the limit can be upped to at most 3/4 of physmem/kvm.
At the same time divorce maxfiles from maxusers and set maxfiles to
physpages / 8 with a floor based on maxusers. This way busy servers
can make use of the significantly increased mbuf limits with a much
larger number of open sockets.
Tidy up ordering in init_param2() and check up on some users of
those values calculated here.
Out of the overall mbuf memory limit, 2K clusters and 4K (page size)
clusters get 1/4 each because these are the most heavily used mbuf
sizes. 2K clusters are used for MTU 1500 ethernet inbound packets.
4K clusters are used whenever possible for sends on sockets and thus
outbound packets. The larger cluster sizes of 9K and 16K are limited
to 1/6 of the overall mbuf memory limit. When jumbo MTUs are used,
these large clusters will end up only on the inbound path. They are
not used on outbound; there it's still 4K. Yes, that will stay that
way because otherwise we run into lots of complications in the
stack. And it really isn't a problem, so don't make a scene.
Normal mbufs (256B) weren't limited at all previously. This was
problematic as there are certain places in the kernel that on
allocation failure of clusters try to piece together their packet
from smaller mbufs.
The mbuf limit is the number of all other mbuf sizes together plus
some more to allow for standalone mbufs (ACK for example) and to
send off a copy of a cluster. Unfortunately there isn't a way to
set an overall limit for all mbuf memory together, as UMA doesn't
support such a limit.
NB: Every cluster also has an mbuf associated with it.
Two examples of the revised mbuf sizing limits:
1GB KVM:
  512MB limit for mbufs
  419,430 mbufs
   65,536 2K mbuf clusters
   32,768 4K mbuf clusters
    9,709 9K mbuf clusters
    5,461 16K mbuf clusters
16GB RAM:
  8GB limit for mbufs
  33,554,432 mbufs
   1,048,576 2K mbuf clusters
     524,288 4K mbuf clusters
     155,344 9K mbuf clusters
      87,381 16K mbuf clusters
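For illustration, the 1GB KVM figures above follow from the fractions
described earlier (taking 2K/4K/9K/16K as 2048/4096/9216/16384 bytes and
the 1/6 share as applying to each of the two jumbo sizes):

    512MB * 1/4 / 2048  =  65,536  2K clusters
    512MB * 1/4 / 4096  =  32,768  4K clusters
    512MB * 1/6 / 9216  ~=  9,709  9K clusters
    512MB * 1/6 / 16384 ~=  5,461  16K clusters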
These defaults should be sufficient for even the most demanding
network loads.
MFC after: 1 month
accept queues, a new socket/connection may be added to the queue
due to a race on the ACCEPT_LOCK.
The submitted patch is slightly changed in comments, teardown,
and locking order, and extended with KASSERTs.
Submitted by: Vijay Singh <vijju.singh-at-gmail-dot-com>
Found by: His team.
MFC after: 1 week
is in capability mode.
- Add a VN_OPEN_NOCAPCHECK flag for vn_open_cred() that will be converted
into the NOCAPCHECK namei flag.
This functionality will be used to enable core dumps for sandboxed processes.
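The coredump path can then open the corefile along these lines (sketch;
surrounding setup omitted):

    /* bypass the capability-mode check for this lookup only */
    error = vn_open_cred(&nd, &flags, cmode, VN_OPEN_NOCAPCHECK, cred, NULL);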
Reviewed by: rwatson
Obtained from: WHEEL Systems
MFC after: 2 weeks
to itself. For example, abort(3) at first tries to do kill(getpid(), SIGABRT),
which was failing in capability mode, so the code was falling back to exit(1).
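The relaxed check presumably becomes something like (sketch, not the
literal diff):

    if (IN_CAPABILITY_MODE(td) && pid != td->td_proc->p_pid)
            return (ECAPMODE);      /* only signals to self are allowed */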
Reviewed by: rwatson
Obtained from: WHEEL Systems
MFC after: 2 weeks
There have not been any complaints about the default behavior, so there
is no need to keep a knob that enables the worse alternative.
Now that the hard-stopping of other CPUs is the only behavior, the panic_cpu
spinlock-like logic can be dropped, because only a single CPU is
supposed to win the stop_cpus_hard(other_cpus) race and proceed past that
call.
MFC after: 1 month
the unix domain sockets to the next tick, coalescing the serial calls
until the collection fires. The thought is that more work for the
collector could arise in the near future, allowing more to be cleaned and
not spending too much CPU on repeated collection when there is no garbage.
Currently the collection task is fired immediately upon unix domain
socket close if there are any rights in flight, which causes excessive
CPU usage and overly long blocking of the threads waiting for
unp_list_lock and unp_link_rwlock in write mode.
Robert noted that it would be nice if we could find some heuristic by
which we decide whether to run GC a bit more quickly. E.g., if the
number of UNIX domain sockets is close to its resource limit, but not
quite.
Reported and tested by: Markus Gebert <markus.gebert@hostpoint.ch>
Reviewed by: rwatson
MFC after: 2 weeks
taskqueue_enqueue_timeout(). Do not rearm the callout if it is
already armed and ticks is negative. Otherwise rearm it to fire
in abs(ticks) ticks in the future.
The intended use is to call taskqueue_enqueue_timeout() for the given
timeout_task with the same negative ticks argument. As a result, the
task is scheduled to execute no later than abs(ticks) ticks in the
future, and subsequent enqueues are coalesced until the already
scheduled task is finished.
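Intended usage sketch (queue and task names are hypothetical):

    /*
     * Run the GC no later than one tick from now; repeated calls while
     * it is pending are coalesced into the already scheduled run.
     */
    taskqueue_enqueue_timeout(taskqueue_thread, &foo_gc_task, -1);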
Reviewed by: rwatson
Tested by: Markus Gebert <markus.gebert@hostpoint.ch>
MFC after: 2 weeks
then:
- assume the lock is held in exclusive mode and remove a moot check
about the lock acquisition.
- in the destructor, remove the !MPSAFE-specific chunk.
Reviewed by: kib
MFC after: 2 weeks
Also remove the checks from vop_lookup_pre and vop_lookup_post, which
are now completely redundant (before this change they were partially
redundant).
Discussed with: kib
MFC after: 10 days