freebsd-nq

Author	SHA1	Message	Date
Andrey V. Elsukov	da0770bd57	Fix copy/paste typo. MFC after: 1 week	2013-12-17 16:45:19 +00:00
Attilio Rao	e7a9eed7a8	- Assert for not leaking readers rw locks counter on userland return. - Use a correct spin_cnt for KDTRACE_HOOK case in rw read lock. Sponsored by: EMC / Isilon storage division	2013-12-17 13:37:02 +00:00
Adrian Chadd	dc3bdd4ad9	Remove the invariants stuff I copy/paste'd from the mbuf code when setting up the UMA zone. This should (a) be correct(er) and (b) it should build on non-amd64. Pointed out by: glebius	2013-12-17 03:06:21 +00:00
Adrian Chadd	73242a5ee1	Migrate the sendfile_sync struct to use a UMA zone rather than M_TEMP. This allows it to be better tracked as well as being able to leverage UMA for more interesting/useful behaviour at a later date. Sponsored by: Netflix, Inc.	2013-12-16 19:31:23 +00:00
Alexander Motin	e37e08c7bf	Fix periodic per-CPU timers startup on boot. Reported by: neel MFC after: 2 weeks	2013-12-16 13:52:18 +00:00
Marcel Moolenaar	15773775f7	Properly drain the TTY when both revoke(2) and close(2) end up closing the TTY. In such a case, ttydev_close() is called multiple times and each time, t_revokecnt is incremented and cv_broadcast() is called for both the t_outwait and t_inwait condition variables. Let's say revoke(2) comes in first and gets to call tty_drain() from ttydev_leave(). Let's say that the revoke comes from init(8) as the result of running "shutdown -r now". Since shutdown prints various messages to the console before announing that the machine will reboot immediately, let's also say that the output queue is not empty and that tty_drain() has something to do. Let's assume this all happens on a 9600 baud serial console, so it takes a time to drain. The shutdown command will exit(2) and as such will end up closing stdout. Let's say this close will come in second, bump t_revokecnt and call tty_wakeup(). This has tty_wait() return prematurely and the next thing that will happen is that the thread doing revoke(2) will flush the TTY. Since the drain wasn't complete, the flush will effectively drop whatever is left in t_outq. This change takes into account that tty_drain() will return ERESTART due to the fact that t_revokecnt was bumped and in that case simply call tty_drain() again. The thread in question is already performing the close so it can safely finish draining the TTY before destroying the TTY structure. Now all messages from shutdown will be printed on the serial console. Obtained from: Juniper Networks, Inc.	2013-12-16 00:50:14 +00:00
Pawel Jakub Dawidek	007e4f41a7	Regenerate after r259438.	2013-12-15 23:20:26 +00:00
Pawel Jakub Dawidek	82845da3fa	Fix syscalls that can be loaded as kernel modules - they were not given the flag allowing to call them from capability mode sandbox. Noticed by: David Drysdale <drysdale@google.com>	2013-12-15 23:19:42 +00:00
Pawel Jakub Dawidek	61a9fc8fe2	Regenerate after r259436.	2013-12-15 23:15:12 +00:00
Pawel Jakub Dawidek	e1e16d2419	Allow for pselect(2) in capability mode. Noticed by: David Drysdale <drysdale@google.com>	2013-12-15 23:14:27 +00:00
Pawel Jakub Dawidek	73a4fbbb39	Forgot to regenerate after r257736.	2013-12-15 23:12:42 +00:00
Mateusz Guzik	374ce66b66	proc exit: don't take PROC_LOCK while freeing rlimits Code wishing to check rlimits of some process should check whether it is exiting first, which current consumers do. MFC after: 2 weeks	2013-12-15 04:11:43 +00:00
Mateusz Guzik	c2a48c0d1b	rlimit: avoid unnecessary copying of rlimits If refcount is 1 just modify rlimits in place. MFC after: 2 weeks	2013-12-13 20:54:45 +00:00
Mateusz Guzik	3318a9c895	rlimit: add and utilize lim_shared MFC after: 2 weeks	2013-12-13 20:53:31 +00:00
Alexander Motin	1cf78c85c5	Create own free list for each of the first 32 possible allocation sizes. In case of 4K allocation quantum that means for allocations up to 128K. With growth of memory fragmentation these lists may grow to quite a large sizes (tenths and hundreds of thousands items). Having in one list items of different sizes in worst case may require full linear list traversal, that may be very expensive. Having lists for items of single size means that unless user specify some alignment or border requirements (that are very rare cases) first item found on the list should satisfy the request. While running SPEC NFS benchmark on top of ZFS on 24-core machine with 84GB RAM this change reduces CPU time spent in vmem_xalloc() from 8% and lock congestion spinning around it from 20% to invisible levels. And that all is by the cost of just 26 more pointers per vmem instance. If at some point our kernel will start to actively use KVA allocations with odd sizes above 128K, something may need to be done to bigger lists also.	2013-12-11 21:48:04 +00:00
Konstantin Belousov	b4aa4fed2b	Fix detection of EOF in kern_physio(). If bio_length was clipped by the excess code in g_io_check(), bio_resid is also truncated by g_io_deliver(). As result, bufdonebio() assigns truncated value to the buffer b_resid field. Use the residual bio_completed to calculate buffer b_resid from b_bcount in bufdonebio(), instead of bio_resid, calculated from bio_length in g_io_deliver(). The issue is seemingly caused by the code rearrange into g_io_check(), which is not present in stable/10. The change still looks as the useful change to have in 10 nevertheless. Reported by: Stefan Hegnauer <stefan.hegnauer@gmx.ch> Tested by: pho, Stefan Hegnauer <stefan.hegnauer@gmx.ch> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-12-10 21:15:18 +00:00
Aleksandr Rybalko	27cf7d04ef	Merge VT(9) project (a.k.a. newcons). Reviewed by: nwhitehorn MFC_to_10_after: re approval Sponsored by: The FreeBSD Foundation	2013-12-05 22:38:53 +00:00
Colin Percival	3b251028e2	Make panic_reboot_wait_time static. Submitted by: jhb	2013-12-05 03:01:41 +00:00
Nathan Whitehorn	fec27435ab	Rename sysctl kern.supported_abis to kern.supported_archs, since it gives the set of MACHINE_ARCH values that can be run.	2013-12-04 16:38:40 +00:00
Pawel Jakub Dawidek	53449c98b7	Break the loop once we know we have the SYF_CAPENABLED flag.	2013-12-04 00:10:37 +00:00
Colin Percival	1cdbb9ed2b	Add a new sysctl / loader tunable kern.panic_reboot_wait_time which defaults to PANIC_REBOOT_WAIT_TIME (a long-existing kernel config setting). Use this now-variable value in place of the defined constant to control how long the system waits after a panic before rebooting.	2013-12-03 21:35:25 +00:00
John Baldwin	5457fa234b	Fix an off-by-one error in r228960. The maximum priority delta provided by SCHED_PRI_TICKS should be SCHED_PRI_RANGE - 1 so that the resulting priority value (before nice adjustment) is between SCHED_PRI_MIN and SCHED_PRI_MAX, inclusive. Submitted by: kib Reported by: pho MFC after: 1 week	2013-12-03 14:50:12 +00:00
Nathan Whitehorn	3cb6654d23	Add new sysctl, kern.supported_abis, containing the list of FreeBSD MACHINE_ARCH values whose binaries this kernel can run. This patch provides a feature requested for implementing pkgng ABI identifiers in a robust way. The list is designed to indicate whether, say, an i386 package can be run on the current system. If kern.supported_abis contains "i386", then the answer is yes. Otherwise, the answer is no. At the moment, this only supports MACHINE_ARCH and MACHINE_ARCH32. As we gain support for more interesting combinations, this needs to become more flexible, possibily through the sysent framework, along with the hw.machine_arch emulation immediately preceding this code in kern_mib.c. Reviewed by: imp MFC after: 3 days	2013-12-02 00:44:36 +00:00
Gleb Smirnoff	ad4804a001	Remove unused variable.	2013-12-01 20:03:00 +00:00
Adrian Chadd	79750e3b36	Migrate the sendfile_sync structure into a public(ish) API in preparation for extending and reusing it. The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper, used to indicate that the backing store for a group of mbufs has completed. It's only being used by sendfile for now and it's only implementing a sleep/wakeup rendezvous. However, there are other potential signaling paths (kqueue) and other potential uses (socket zero-copy write) where the same mechanism would also be useful. So, with that in mind: * extract the sendfile_sync code out into sf_sync_() methods teach the sf_sync_alloc method about the current config flag - it will eventually know about kqueue. * move the sendfile_sync code out of do_sendfile() - the only thing it now knows about is the sfs pointer. The guts of the sync rendezvous (setup, rendezvous/wait, free) is now done in the syscall wrapper. * .. and teach the 32-bit compat sendfile call the same. This should be a no-op. It's primarily preparation work for teaching the sendfile_sync about kqueue notification. Tested: * Peter Holm's sendfile stress / regression scripts Sponsored by: Netflix, Inc.	2013-12-01 03:53:21 +00:00
Pawel Jakub Dawidek	f2b525e6b9	Make process descriptors standard part of the kernel. rwhod(8) already requires process descriptors to work and having PROCDESC in GENERIC seems not enough, especially that we hope to have more and more consumers in the base. MFC after: 3 days	2013-11-30 15:08:35 +00:00
Peter Wemm	b5019bc45b	jail_v0.ip_number was always in host byte order. This was handled in one of the many layers of indirection and shims through stable/7 in jail_handle_ips(). When it was cleaned up and unified through kern_jail() for 8.x, the byte order swap was lost. This only matters for ancient binaries that call jail(2) themselves internally.	2013-11-28 19:40:33 +00:00
Andriy Gapon	73f82099ea	add taskqueue_drain_all This API has semantics similar to that of taskqueue_drain but acts on all tasks that might be queued or running on a taskqueue. A caller must ensure that no new tasks are being enqueued otherwise this call would be totally meaningless. For example, if the tasks are enqueued by an interrupt filter then its interrupt must be disabled. MFC after: 10 days	2013-11-28 18:56:34 +00:00
Konstantin Belousov	80c3af4e80	Add an kinfo sysctl to retrieve signal trampoline location for the given process. Note that the correctness of the trampoline length returned for ABIs which do not use shared page depends on the correctness of the struct sysvec sv_szsigcodebase member, which will be fixed on as-need basis. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-26 19:47:09 +00:00
Andriy Gapon	55050ab560	use saner calculations in should_yield This is based on feedback from bde. MFC after: 6 days	2013-11-26 14:00:50 +00:00
Andriy Gapon	a776a1c1c5	sdt: add support for solaris/illumos style DTRACE_PROBE macros The new macros are implemented in terms of SDT_PROBE_DEFINE and SDT_PROBE. Probes defined in this way will appear under SDT provider named "sdt". Parameter types are exposed via SDT_PROBE_ARGTYPE. This is something that illumos does not have by default. This kind of SDT probes is already present in ZFS code, so those probes will now be available if KDTRACE_HOOKS options is enabled. A potential future illumos compatibility enhancement is to encode a provider name as a prefix in a probe name. Reviewed by: markj MFC after: 3 weeks X-MFC after: r258622	2013-11-26 08:49:53 +00:00
Andriy Gapon	d9fae5ab88	dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINE In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks	2013-11-26 08:46:27 +00:00
Adrian Chadd	3287361e38	Refactor out the sendfile copyout in order to make vn_sendfile() callable from the kernel. Right now vn_sendfile() can't be called from anything other than a syscall handler _and_ return the number of bytes queued. This simply moves the copyout() to do_sendfile() so that any kernel code can initiate vn_sendfile() outside of a syscall context. Tested: * tiny little sendfile program spitting things out a tcp socket Sponsored by: Netflix, Inc.	2013-11-26 02:02:05 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Konstantin Belousov	7e14088d93	Revert back to use int for the page counts. In vn_io_fault(), the i/o is chunked to pieces limited by integer io_hold_cnt tunable, while vm_fault_quick_hold_pages() takes integer max_count as the upper bound. Rearrange the checks to correctly handle overflowing address arithmetic. Submitted by: bde Tested by: pho Discussed with: alc MFC after: 1 week	2013-11-20 08:45:26 +00:00
Andriy Gapon	4c47024ce0	taskqueue_cancel: garbage collect a write-only variable MFC after: 3 days	2013-11-19 18:45:29 +00:00
Jilles Tjoelker	b20a9aa92a	Fix siginfo_t.si_status for wait6/waitid/SIGCHLD. Per POSIX, si_status should contain the value passed to exit() for si_code==CLD_EXITED and the signal number for other si_code. This was incorrect for CLD_EXITED and CLD_DUMPED. This is still not fully POSIX-compliant (Austin group issue #594 says that the full value passed to exit() shall be returned via si_status, not just the low 8 bits) but is sufficient for a si_status-related test in libnih (upstart, Debian/kFreeBSD). PR: kern/184002 Reported by: Dmitrijs Ledkovs Tested by: Dmitrijs Ledkovs	2013-11-17 22:31:23 +00:00
Pawel Jakub Dawidek	ed5848c835	Replace CAP_POLL_EVENT and CAP_POST_EVENT capability rights (which I had a very hard time to fully understand) with much more intuitive rights: CAP_EVENT - when set on descriptor, the descriptor can be monitored with syscalls like select(2), poll(2), kevent(2). CAP_KQUEUE_EVENT - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue to with the eventlist argument set to non-NULL value; in other words the given kqueue descriptor can be used to monitor other descriptors. CAP_KQUEUE_CHANGE - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue to with the changelist argument set to non-NULL value; in other words it allows to modify events monitored with the given kqueue descriptor. Add alias CAP_KQUEUE, which allows for both CAP_KQUEUE_EVENT and CAP_KQUEUE_CHANGE. Add backward compatibility define CAP_POLL_EVENT which is equal to CAP_EVENT. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2013-11-15 19:55:35 +00:00
John Baldwin	ba9593f3bd	Don't allow vfs.lorunningspace or vfs.hirunningspace to be set such that lorunningspace is greater than hirunningspace as the system performs terribly if it is mistuned in this fashion. MFC after: 1 week	2013-11-15 15:29:53 +00:00
Pawel Jakub Dawidek	d673cd358c	Change cap_rights_merge(3) and cap_rights_remove(3) to return pointer to the destination cap_rights_t structure. This already matches manual page. MFC after: 3 days	2013-11-14 22:59:20 +00:00
Pawel Jakub Dawidek	98b74f0d1d	Add a note that this file is compiled as part of the kernel and libc. Requested by: kib MFC after: 3 days	2013-11-14 22:57:07 +00:00
Gleb Smirnoff	77badb18cd	Fix a very bad typo from r248887. Submitted by: art	2013-11-14 09:45:33 +00:00
Sergey Kandaurov	903093ecec	Add VM_LAST, a special last element in enum VM_GUEST and use it in CTASSERT to ensure that vm_guest range is covered by vm_guest_sysctl_names. Suggested by: mjg	2013-11-12 20:13:10 +00:00
Alan Cox	63b9ae948d	Eliminate the gratuitous use of mmap(2) flags from the implementation of kern_shmat(). Use a simpler approach to determine whether to pass VMFS_NO_SPACE or VMFS_OPTIMAL_SPACE to vm_map_find().	2013-11-12 17:46:11 +00:00
Konstantin Belousov	d005ed537c	Avoid overflow for the page counts. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2013-11-12 08:47:58 +00:00
Sergey Kandaurov	40b4cb0f54	Set description string for VM_GUEST_HV (HyperV guest). This fixes fallout from r256425. Reported by: Pavel Timofeev <timp87@gmail com> Tested by: Pavel Timofeev <timp87@gmail com> Reviewed by: Roger Pau Monnц╘ MFC after: 3 days	2013-11-11 16:14:33 +00:00
Konstantin Belousov	1bd7d0b7db	If filesystem declares that it supports shared locking for writes, use shared vnode lock for VOP_PUTPAGES() as well. The only such filesystem in the tree is ZFS, and it uses vnode_pager_generic_putpages(), which performs the pageout with VOP_WRITE(). Reviewed by: alc Discussed with: avg Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-11-09 20:36:29 +00:00
Konstantin Belousov	6272798a3f	Both vn_close() and VFS_PROLOGUE() evaluate vp->v_mount twice, without holding the vnode lock; vp->v_mount is checked first for NULL equiality, and then dereferenced if not NULL. If vnode is reclaimed meantime, second dereference would still give NULL. Change VFS_PROLOGUE() to evaluate the mp once, convert MNTK_SHARED_WRITES and MNTK_EXTENDED_SHARED tests into inline functions. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-11-09 20:30:13 +00:00
Hiren Panchasara	16ef0fa833	Fix typo in a comment.	2013-11-08 20:11:15 +00:00
Alan Cox	c70af4875e	As of r257209, all architectures have defined VM_KMEM_SIZE_SCALE. In other words, every architecture is now auto-sizing the kmem arena. This revision changes kmeminit() so that the definition of VM_KMEM_SIZE_SCALE becomes mandatory and the definition of VM_KMEM_SIZE becomes optional. Replace or eliminate all existing definitions of VM_KMEM_SIZE. With auto-sizing enabled, VM_KMEM_SIZE effectively became an alternate spelling for VM_KMEM_SIZE_MIN on most architectures. Use VM_KMEM_SIZE_MIN for clarity. Change kmeminit() so that the effect of defining VM_KMEM_SIZE is similar to that of setting the tunable vm.kmem_size. Whereas the macros VM_KMEM_SIZE_{MAX,MIN,SCALE} have had the same effect as the tunables vm.kmem_size_{max,min,scale}, the effects of VM_KMEM_SIZE and vm.kmem_size have been distinct. In particular, whereas VM_KMEM_SIZE was overridden by VM_KMEM_SIZE_{MAX,MIN,SCALE} and vm.kmem_size_{max,min,scale}, vm.kmem_size was not. Remedy this inconsistency. Now, VM_KMEM_SIZE can be used to set the size of the kmem arena at compile-time without that value being overridden by auto-sizing. Update the nearby comments to reflect the kmem submap being replaced by the kmem arena. Stop duplicating the auto-sizing formula in every machine- dependent vmparam.h and place it in kmeminit() where auto-sizing takes place. Reviewed by: kib (an earlier version) Sponsored by: EMC / Isilon Storage Division	2013-11-08 16:25:00 +00:00

1 2 3 4 5 ...

13553 Commits