freebsd-skq

Author	SHA1	Message	Date
pjd	dbdae53685	Minor style tweaks. Obtained from: WHEEL Systems	2012-12-17 10:51:22 +00:00
pjd	ec8b5e0006	Better variables naming in expand_name() to be more consistent with coredump(). Obtained from: WHEEL Systems	2012-12-17 10:48:10 +00:00
pjd	3e4021fec3	Move expand_name() after process lock is released. This fixed panic where we hold mutex (process lock) and try to obtain sleepable lock (vnode lock in expand_name()). The panic could occur when %I was used in kern.corefile. Additionally we avoid expand_name() overhead when coredumps are disabled. Obtained from: WHEEL Systems	2012-12-16 14:53:27 +00:00
pjd	a198747e4a	Don't add audit record when coredumps are disabled or name cannot be expanded. Discussed with: rwatson Obtained from: WHEEL Systems	2012-12-16 14:24:59 +00:00
pjd	c6ce471b73	Make the check easier to read. Obtained from: WHEEL Systems	2012-12-16 14:14:18 +00:00
pjd	b6b2ae8e26	Use 'cred' variable. Obtained from: WHEEL Systems	2012-12-16 13:56:38 +00:00
kib	2818562651	When mnt_vnode_next_active iterator cannot lock the next vnode and yields, specify the user priority for the yield. Otherwise, a higher-priority (kernel) thread could fall into the priority-inversion with the thread owning the mutex lock. On single-processor machines or UP kernels, do not loop adaptively when the next vnode cannot be locked, instead yield unconditionally. Restructure the iteration initializer and the iterator to remove code duplication. Put the code to fetch and lock a vnode next to the current marker, into the mnt_vnode_next_active() function, and use it instead of repeating the loop. Reported by: hrs, rmacklem Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2012-12-15 02:04:46 +00:00
kib	fc4c8c168f	Remove a special case for XEN, which is erronous and makes vfork(2) behaviour to differ from the documented, only on XEN. If there are any issues with XEN pmap left, they should be fixed in pmap. MFC after: 2 weeks	2012-12-15 02:02:11 +00:00
rmacklem	868038e38d	The group list for a non-default export entry (a host/subnet one) was being copied from the wrong place. This patch fixes that. This could cause access failures for mapped users, when the group permissions were needed. PR: 147998 Submitted by: Christopher Key (cjk32 at cam.ac.uk) MFC after: 2 weeks	2012-12-14 21:49:06 +00:00
alfred	778f6b756b	Cleanup more of the kassert_panic. fix compile warnings on !amd64 and NULL derefs that would happen if kassert_panic() would return.	2012-12-11 07:08:14 +00:00
alfred	10171a489d	Fix WITNESS when INVARIANT_SUPPORT is defined. This fixes tinderbox breakage from r244105. Pointed out by: adrian	2012-12-11 05:59:16 +00:00
alfred	f63b721d8a	Switch the hardwired WITNESS panics to kassert_panic. This is an ongoing effort to provide runtime debug information useful in the field that does not panic existing installations. This gives us the flexibility needed when shipping images to a potentially large audience with WITNESS enabled without worrying about formerly non-fatal LORs hurting a release. Sponsored by: iXsystems	2012-12-11 01:23:50 +00:00
alfred	861ae338d3	back out half of 244098. kern.bootfile needs to be rw for installkernel. Pointed out by: kib, flo	2012-12-11 00:10:20 +00:00
alfred	0edf2075d5	allow KASSERT to enter KDB.	2012-12-10 23:11:26 +00:00
alfred	255d7c4222	make sysctls kern.{bootfile,conftxt} read-only MFC after: 1 month	2012-12-10 23:09:55 +00:00
kib	ac910a885b	Do not yield while owning a mutex. The Giant reacquire in the kern_yield() is problematic than. The owned mutex is the mount interlock, and it is in fact not needed to guarantee the stability of the mount list of active vnodes, so fix the the issue by only taking the mount interlock for MNT_REF and MNT_REL operations. While there, augment the unconditional yield by some amount of spinning [1]. Reported and tested by: pho Reviewed by: attilio Submitted by: attilio [1] MFC after: 3 days	2012-12-10 20:44:09 +00:00
andre	62886c7ac9	Prevent long type overflow of realmem calculation on ILP32 by forcing calculation to be in quad_t space. Fix style issue with second parameter to qmin(). Reported by: alc Reviewed by: bde, alc	2012-12-10 12:19:03 +00:00
kib	d4cecb240c	Do not ignore zero address, possibly returned by the vm_map_find() call. The function indicates a failure by the TRUE return value. To be extra safe, assert that the return value from the following vm_map_insert() indicates success. Fix style issues in the nearby lines, reformulate the comment. Reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2012-12-10 05:14:04 +00:00
kib	2eb68d2f70	Remove useless comment. MFC after: 3 days	2012-12-09 20:34:11 +00:00
kib	868d4dc870	Fix typo. MFC after: 3 days	2012-12-09 20:26:51 +00:00
attilio	230b7eda6f	Add a comment on why inlining critical_enter() may not be a good idea for the general case. Reviewed by: bde MFC after: 1 week	2012-12-09 04:54:22 +00:00
pjd	24b0c94606	Configure UMA warnings for the following zones: - unp_zone: kern.ipc.maxsockets limit reached - socket_zone: kern.ipc.maxsockets limit reached - zone_mbuf: kern.ipc.nmbufs limit reached - zone_clust: kern.ipc.nmbclusters limit reached - zone_jumbop: kern.ipc.nmbjumbop limit reached - zone_jumbo9: kern.ipc.nmbjumbo9 limit reached - zone_jumbo16: kern.ipc.nmbjumbo16 limit reached Note that those warnings are printed not often than every five minutes and can be globally turned off by setting sysctl/tunable vm.zone_warnings to 0. Discussed on: arch Obtained from: WHEEL Systems MFC after: 2 weeks	2012-12-07 22:30:30 +00:00
pjd	0ae3a2ffee	Make use of the fact that uma_zone_set_max(9) already returns actual limit set.	2012-12-07 22:23:53 +00:00
pjd	d71ec288dc	More style cleanups.	2012-12-07 22:22:04 +00:00
pjd	8cb7b219b4	Style cleanups.	2012-12-07 22:19:41 +00:00
pjd	4a320b1cc2	- Make socket_zone static - it is used only in this file. - Update maxsockets on uma_zone_set_max(). Obtained from: WHEEL Systems	2012-12-07 22:15:51 +00:00
pjd	e7a7ec6407	Style cleanups.	2012-12-07 22:13:33 +00:00
pjd	3702962778	There is no need anymore to include vm/uma.h after r241726. Obtained from: WHEEL Systems	2012-12-07 22:05:42 +00:00
alfred	d4467dc033	Allow KASSERT to log instead of panic. This is to allow debug images to be used without taking down the system when non-fatal asserts are hit. The following sysctls are added: debug.kassert.warn_only: 1 = log, 0 = panic debug.kassert.do_ktr: set to a ktr mask for logging via KTR debug.kassert.do_log: 1 = log, 0 = quiet debug.kassert.warnings: stats, number of kasserts hit debug.kassert.log_panic_at: number of kasserts before we actually panic, 0 = never debug.kassert.log_pps_limit: pps limit for log messages debug.kassert.log_mute_at: stop warning after N kasserts, 0 = never stop debug.kassert.kassert: set this sysctl to trigger a kassert Discussed with: scottl, gnn, marcel Sponsored by: iXsystems	2012-12-07 08:25:08 +00:00
alfred	471892517a	Use uint instead of int for flags exported via sysctl.	2012-12-07 05:55:48 +00:00
kevlo	c71d284884	- according to POSIX, make socket(2) return EAFNOSUPPORT rather than EPROTONOSUPPORT if the address family is not supported. - introduce pffinddomain() to find a domain by family and use it as appropriate. Reviewed by: glebius	2012-12-07 02:22:48 +00:00
davidxu	386e9d0db5	Eliminate superfluous code.	2012-12-06 06:29:08 +00:00
attilio	3145eea56c	Fixup r243901: - As the comment report, CALLOUT_LOCAL_ALLOC cannot be checked directly from the callout flags but might be checked by a cached value. Hence, do so before to actually remove the callout, when needed, in softclock_call_cc(). - In softclock_call_cc() also add a comment in the waiting and deferred migration case explaining that the dereference should be safe because of the migration dereference invariants. Additively: - In softclock_call_cc(), for the deferred migration case, move all the accesses to callout structure after the comment stating the callout must not be destroyed. - For consistency with this last tweak, use cached c_flags for the KASSERT() in the deferred migration case. It is not strictly necessary but this way all the callout accesses happen after the above mentioned comment, improving consistency. Pointy hat to: me Sponsored by: Isilon Systems / EMC Corporation Reviewed by: kib MFC after: 2 weeks X-MFC: 243901	2012-12-05 22:32:12 +00:00
kib	efc06bd801	The softclock_call_cc() is executing with the callout already removed from the callwheel. Calculate the cc->cc_next before removing the callout, otherwise the code followed the invalid tailq links. After this, make softclock_call_cc() return void, since it always return cc->cc_next, which is immediately available to the softclock() anyway. This also allows to eliminate a label under #ifdef SMP. Remove the assignment of cc->cc_next from callout_cc_del(), since the function is called with the callout already removed from callwheel. If cancelling the migration, also clear the CALLOUT_DFRMIGRATION flag. Postpone the free of the timeout(9) allocated callouts after the migration checks are done. Add some more strict asserts about the state of the callout in callout_call_cc(). Reviewed by: attilio Reported and tested by: pho (previous version) MFC after: 2 weeks	2012-12-05 19:02:22 +00:00
attilio	97d8ae3890	Check for lockmgr recursion in case of disown and downgrade and panic also in !debugging kernel rather than having "undefined" behaviour. Tested by: avg MFC after: 1 week	2012-12-05 15:11:01 +00:00
glebius	8e20fa5ae9	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
kib	54d4ef7790	Fix a race between kern_setitimer() and realitexpire(), where the callout is started before kern_setitimer() acquires process mutex, but looses a race and kern_setitimer() gets the process mutex before the callout. Then, assuming that new specified struct itimerval has it_interval zero, but it_value non-zero, the callout, after it starts executing again, clears p->p_realtimer.it_value, but kern_setitimer() already rescheduled the callout. As the result of the race, both p_realtimer is zero, and the callout is rescheduled. Then, in the exit1(), the exit code sees that it_value is zero and does not even try to stop the callout. This allows the struct proc to be reused and eventually the armed callout is re-initialized. The consequence is the corrupted callwheel tailq. Use process mutex to interlock the callout start, which fixes the race. Reported and tested by: pho Reviewed by: jhb MFC after: 2 weeks	2012-12-04 20:49:39 +00:00
kib	f366a4aadf	Do not allocate buffer of the 255 bytes length on the stack. Reported and tested by: sig6247@gmail.com MFC after: 1 week	2012-12-04 20:49:04 +00:00
alfred	d7137f4490	replace bit shifting loop with 1<<fls(n), improve comments. Reviewed by: davide	2012-12-04 05:28:20 +00:00
kib	69dfcf0272	The vnode_free_list_mtx is required unconditionally when iterating over the active list. The mount interlock is not enough to guarantee the validity of the tailq link pointers. The __mnt_vnode_next_active() and __mnt_vnode_first_active() active lists iterators helper functions did not provided the neccessary stability for the list, allowing the iterators to pick garbage. This was uncovered after the r243599 made the active list iterators non-nop. Since a vnode interlock is before the vnode_free_list_mtx, obtain the vnode ilock in the non-blocking manner when under vnode_free_list_mtx, and restart iteration after the yield if the lock attempt failed. Assert that a vnode found on the list is active, and assert that the helpers return the vnode with interlock owned. Reported and tested by: pho MFC after: 1 week	2012-12-03 22:15:16 +00:00
pjd	c6ea39d1ef	Fix one more compilation issue.	2012-12-01 08:59:36 +00:00
pjd	632d7191a2	IFp4 @208451: Fix path handling for *at() syscalls. Before the change directory descriptor was totally ignored, so the relative path argument was appended to current working directory path and not to the path provided by descriptor, thus wrong paths were stored in audit logs. Now that we use directory descriptor in vfs_lookup, move AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() calls to the place where we hold file descriptors table lock, so we are sure paths will be resolved according to the same directory in audit record and in actual operation. Sponsored by: FreeBSD Foundation (auditdistd) Reviewed by: rwatson MFC after: 2 weeks	2012-11-30 23:18:49 +00:00
pjd	a3ce94291c	IFp4 @208450: Remove redundant call to AUDIT_ARG_UPATH1(). Path will be remembered by the following NDINIT(AUDITVNODE1) call. Sponsored by: FreeBSD Foundation (auditdistd) MFC after: 2 weeks	2012-11-30 22:49:28 +00:00
andre	051df828de	Using a long is the wrong type to represent the realmem and maxmbufmem variable as they may overflow on i386/PAE and i386 with > 2GB RAM. Use 64bit quad_t instead. It has broader kernel infrastructure support with TUNABLE_QUAD_FETCH() and qmin/qmax() than other available types. Pointed out by: alc, bde	2012-11-29 07:30:42 +00:00
andre	b0ff37a790	Complete r243631 by applying the remainder of kern_mbuf.c that got lost while merging into the commit tree. MFC after: 1 month X-MFC-with: r243631	2012-11-27 23:16:56 +00:00
andre	48786251b2	Fix r243627 by testing against the head socket instead of the socket just created. MFC after: 1 week X-MFC-with: r243627	2012-11-27 22:35:48 +00:00
andre	d85b510608	Base the mbuf related limits on the available physical memory or kernel memory, whichever is lower. The overall mbuf related memory limit must be set so that mbufs (and clusters of various sizes) can't exhaust physical RAM or KVM. The limit is set to half of the physical RAM or KVM (whichever is lower) as the baseline. In any normal scenario we want to leave at least half of the physmem/kvm for other kernel functions and userspace to prevent it from swapping too easily. Via a tunable kern.maxmbufmem the limit can be upped to at most 3/4 of physmem/kvm. At the same time divorce maxfiles from maxusers and set maxfiles to physpages / 8 with a floor based on maxusers. This way busy servers can make use of the significantly increased mbuf limits with a much larger number of open sockets. Tidy up ordering in init_param2() and check up on some users of those values calculated here. Out of the overall mbuf memory limit 2K clusters and 4K (page size) clusters to get 1/4 each because these are the most heavily used mbuf sizes. 2K clusters are used for MTU 1500 ethernet inbound packets. 4K clusters are used whenever possible for sends on sockets and thus outbound packets. The larger cluster sizes of 9K and 16K are limited to 1/6 of the overall mbuf memory limit. When jumbo MTU's are used these large clusters will end up only on the inbound path. They are not used on outbound, there it's still 4K. Yes, that will stay that way because otherwise we run into lots of complications in the stack. And it really isn't a problem, so don't make a scene. Normal mbufs (256B) weren't limited at all previously. This was problematic as there are certain places in the kernel that on allocation failure of clusters try to piece together their packet from smaller mbufs. The mbuf limit is the number of all other mbuf sizes together plus some more to allow for standalone mbufs (ACK for example) and to send off a copy of a cluster. Unfortunately there isn't a way to set an overall limit for all mbuf memory together as UMA doesn't support such a limiting. NB: Every cluster also has an mbuf associated with it. Two examples on the revised mbuf sizing limits: 1GB KVM: 512MB limit for mbufs 419,430 mbufs 65,536 2K mbuf clusters 32,768 4K mbuf clusters 9,709 9K mbuf clusters 5,461 16K mbuf clusters 16GB RAM: 8GB limit for mbufs 33,554,432 mbufs 1,048,576 2K mbuf clusters 524,288 4K mbuf clusters 155,344 9K mbuf clusters 87,381 16K mbuf clusters These defaults should be sufficient for even the most demanding network loads. MFC after: 1 month	2012-11-27 21:19:58 +00:00
andre	e323c31816	Fix a race on listen socket teardown where while draining the accept queues a new socket/connection may be added to the queue due to a race on the ACCEPT_LOCK. The submitted patch is slightly changed in comments, teardown and locking order and extended with KASSERT's. Submitted by: Vijay Singh <vijju.singh-at-gmail-dot-com> Found by: His team. MFC after: 1 week	2012-11-27 20:04:52 +00:00
pjd	e4de9f38a2	Add kern.capmode_coredump sysctl/tunable to allow processes in capability mode to dump core. Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks	2012-11-27 10:38:11 +00:00
pjd	7a831b4b8c	- Add NOCAPCHECK flag to namei that allows lookup to work even if the process is in capability mode. - Add VN_OPEN_NOCAPCHECK flag for vn_open_cred() to will ne converted into NOCAPCHECK namei flag. This functionality will be used to enable core dumps for sandboxed processes. Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks	2012-11-27 10:32:35 +00:00

1 2 3 4 5 ...

12969 Commits