freebsd-skq

Author	SHA1	Message	Date
mjg	630e712410	mtx: implement thread lock fastpath MFC after: 1 week	2017-10-21 22:40:09 +00:00
mmel	be76b77ce7	Add AT_HWCAP2 ELF auxiliary vector. - allocate value for new AT_HWCAP2 auxiliary vector on all platforms. - expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly same way as for AT_HWCAP. MFC after: 1 month Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D12699	2017-10-21 12:05:01 +00:00
markj	ce2d698d24	Avoid the nbp lookup in the final loop iteration in flushbuflist(). The end of the loop must re-lookup the next buf since the bufobj lock is dropped in the loop body. If the lookup fails, the loop is restarted. This mechanism non-obviously also terminates the loop when the end of the buf list is reached. Split up the two loop termination cases to make the code a bit less fragile. No functional change intended. Reviewed by: kib MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12730	2017-10-20 14:56:13 +00:00
mjg	002d36b454	mtx: fix up UP build after r324778 Reported by: Michael Butler	2017-10-20 14:04:01 +00:00
mjg	7c60734b41	Mark kdb_active as __read_frequently and switch to bool to eat less space.	2017-10-20 04:02:53 +00:00
mjg	abcba1b168	rwlock: reduce lockstat branches in the slowpath MFC after: 1 week	2017-10-20 03:32:42 +00:00
mjg	cb00d2eba3	mtx: stop testing SCHEDULER_STOPPED in kabi funcs for spin mutexes There is nothing panic-breaking to do in the unlock case and the lock case will fallback to the slow path doing the check already. MFC after: 1 week	2017-10-20 00:34:25 +00:00
mjg	96ff69bdc9	mtx: clean up locking spin mutexes 1) shorten the fast path by pushing the lockstat probe to the slow path 2) test for kernel panic only after it turns out we will have to spin, in particular test only after we know we are not recursing MFC after: 1 week	2017-10-20 00:30:35 +00:00
mjg	2b6199af16	sysctl: only take mem lock if oldlen is > 4 * PAGE_SIZE The previous limit of just one page is hit by ps. The entire mechanism should be reworked, if not whacked. It seems the intent is to reduce kernel dos-ability - some handlers wire the amount of memory passed here. Handlers should probably stop wiring in the first place or in the worst case indicate they are doing so so that the check is done only if necessary. It should also probably be a counter, not a lock. MFC after: 1 week	2017-10-19 01:38:31 +00:00
mjg	a5e44043ca	execve: avoid one proc lock/unlock trip unless PTRACE_EXEC is set MFC after: 1 week	2017-10-19 00:46:15 +00:00
mjg	7b0fc84b9d	Tidy up pmc support at execve. The proc-specific check is inherently racy, so the code can just unlock beforehand. MFC after: 1 week	2017-10-19 00:38:14 +00:00
mjg	e611bbff76	sysvsem: check if semu_list has anything on it before grabbing the lock This should get a process-specific support instead. MFC after: 1 week	2017-10-19 00:31:00 +00:00
mjg	8f29191fc8	Don't take Giant for SMP status and cpu topology sysctls. Not only does this lock play no role here, but dirtying it also slows down other things a little bit, as giant-held checks (e.g. DROP_GIANT) are spread all over the kernel. MFC after: 1 week	2017-10-18 22:00:44 +00:00
markj	c87fb69add	Move kernel dump offset tracking into MI code. All of the kernel dump implementations keep track of the current offset ("dumplo") within the dump device. However, except for textdumps, they all write the dump sequentially, so we can reduce code duplication by having the MI code keep track of the current offset. The new dump_append() API can be used to write at the current offset. This is needed to implement support for kernel dump compression in the MI kernel dump code. Also simplify dump_encrypted_write() somewhat: use dump_write() instead of duplicating its bounds checks, and get rid of the redundant offset tracking. Reviewed by: cem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11722	2017-10-18 15:38:05 +00:00
brooks	fb0dea4f57	Remove mbpool(9) now that it has no consumers. mbpool existed to support NICs with memory interfaces and all remaining consumers were removed earlier this year with NATM. Reviewed by: jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D10513	2017-10-18 00:18:03 +00:00
markj	9fce266679	Fix a racy VI_DOOMED check in MNT_VNODE_FOREACH_ALL(). MNT_VNODE_FOREACH_ALL() is supposed to avoid returning doomed vnodes, but the VI_DOOMED check it used was done without the vnode interlock held, so it could race with a concurrent vgone(). Submitted by: Don Morris <don.morris@isilon.com> Reviewed by: kib, mckusick MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12704	2017-10-17 19:41:45 +00:00
avos	17ed4f0710	mbuf(9): unbreak m_fragment() - Fix it by replacing m_cat() with m_prev->m_next = m_new (m_cat() will try to append data - as a result, there will be no fragmentation). - Move some constants out of the loop. Was previously tested with D4077. Differential Revision: https://reviews.freebsd.org/D4090	2017-10-16 21:46:11 +00:00
kib	49a0c8651e	Re-evaluate thread's signal mask after ptracestop(). The stop drops process lock, which allows the signal mask to be changed and our selected signal might become blocked, i.e. should be returned to the process queue instead of delivery. Also, for the existing check of the process no longer having an attached debugger, we should not lose the signal, but requeue it. Reported and tested by: bdrewery Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-16 20:21:51 +00:00
kib	30cdab3f84	Improve assertion that an ignored or blocked signal is not delivered. Split two conditions into separate asserts. Print additional details, like the signal number and action value. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-16 20:15:19 +00:00
kib	27e9b9ded2	Style. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-16 20:11:29 +00:00
mjoras	88e3a903b6	Properly reset the fields in clean_unrhdr. In r324542 I neglected to reset the first and last fields of struct unrhdr. This causes a tmpfs to fail the unr(9) consistency checks with DIAGNOSTIC on. Fix this by resetting the fields by calling init_unrhdr. While here, change a loop to use TAILQ_FOREACH_SAFE since it is more readable and equally fast. Reported by: David Wolfskill <david@catwhisker.org> Approved by: rstone (mentor) Sponsored by: Dell EMC Isilon	2017-10-16 16:14:50 +00:00
tijl	5b4d8cee9d	When a Linux program tries to access a /path the kernel tries /compat/linux/path before /path. Stop following symbolic links when looking up /compat/linux/path so dead symbolic links aren't ignored. This allows syscalls like readlink(2) and lstat(2) to work on such links. And open(2) will return an error now instead of trying /path.	2017-10-15 18:53:21 +00:00
mjg	e99dca871b	mtx: fix up owner_mtx after r324609 Now that MTX_UNOWNED is 0 the test was always false.	2017-10-14 00:47:30 +00:00
alc	e64f2e0930	Address two problems with sendfile(..., SF_NOCACHE) and apply one "optimization". First, sendfile(..., SF_NOCACHE) frees pages without checking whether those pages are mapped. This can leave the system with mappings to free or repurposed pages. Second, a page can be busied between the time of the current busy test and acquiring the object lock. Essentially, the test performed before the object lock is acquired can only be regarded as an optimization to short-circuit further work on the page. It cannot, however, be relied upon to prove that it is safe to free the page. Third, when sendfile(..., SF_NOCACHE) was originally implemented, vm_page_deactivate_noreuse() did not yet exist. Use vm_page_deactivate_noreuse() instead of vm_page_deactivate(), because it comes closer to freeing the page. In collaboration with: glebius Discussed with: gallatin, kib, markj X-MFC after: r324448	2017-10-13 16:31:50 +00:00
avg	13558f906a	remove process and jail directory machinations from dounmount The manipulations done by mountcheckdirs() are not that useful during the unmount, they can bring about unexpected security consequences. This change effectively reverts the change in r73241. The change also allows to simplify the handling of rootvnode global variable. Discussed with: mckusick, mjg, kib Reviewed by: trasz MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12366	2017-10-13 09:42:05 +00:00
emaste	a97448391f	regen init_sysent.c r324560	2017-10-12 15:48:37 +00:00
emaste	5d14c78c8e	allow posix_fallocate in capability mode posix_fallocate is logically equivalent to writing zero blocks to the desired file size and there is no reason to prevent calling it in capability mode. posix_fallocate already checked for the CAP_WRITE right, so we merely need to list it in capabilities.conf. Reviewed by: allanjude MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D12640	2017-10-12 15:45:53 +00:00
mjoras	1bac95586a	Add clearing function for unr(9). Previously before you could call unrhdr_delete you needed to individually free every allocated unit. It is useful to be able to tear down the unr without having to go through this process, as it is significantly faster than freeing the individual units. Reviewed by: cem, lidl Approved by: rstone (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12591	2017-10-11 21:53:50 +00:00
kib	49c3eee83d	The th_bintime, th_microtime and th_nanotime members of the timehand all cache the last system time (uptime + boottime). Only the format differs. Do not re-calculate the bintime and simply use the value used to calculate the microtime and nanotime. Group all the updates under the relevant comment. Remove obsoleted XXX part. Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de> MFC after: 1 week	2017-10-11 11:03:11 +00:00
sbruno	89fa9e690e	match sendfile() error handling to send(). Sendfile() should match the error checking order of send() which is currently: SBS_CANTSENDMORE so_error SS_ISCONNECTED Submitted by: Jason Eggleston <jason@eggnet.com> Reviewed by: glebius MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12633	2017-10-10 22:21:05 +00:00
sbruno	1b75921a14	Revert r324405 at the request of the submitter pending better solution. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks	2017-10-10 00:32:21 +00:00
glebius	4bbcc4b9b3	Improvements to sendfile(2) mbuf free routine. o Fall back to default m_ext free mech, using function pointer in m_ext_free, and remove sf_ext_free() called directly from mbuf code. Testing on modern CPUs showed no regression. o Provide internally used flag EXT_FLAG_SYNC, to mark that I/O uses SF_SYNC flag. Lack of the flag allows us not to dereference ext_arg2, saving from a cache line miss. o Create function sendfile_free_page() that later will be used, for multi-page mbufs. For now compiler will inline it into sendfile_free_mext(). In collaboration with: gallatin Differential Revision: https://reviews.freebsd.org/D12615	2017-10-09 21:06:16 +00:00
glebius	492b1ec8b6	In mb_dupcl() don't copy full m_ext, to avoid cache miss. Respectively, in mb_free_ext() always use fields from the original refcount holding mbuf (see r296242). Cuts another cache miss from mb_free_ext(). However, treat EXT_EXTREF mbufs differently, since they are different - they don't have a refcount holding mbuf. Provide longer comments in m_ext declaration to explain this change and change from r296242. In collaboration with: gallatin Differential Revision: https://reviews.freebsd.org/D12615	2017-10-09 20:51:58 +00:00
glebius	a9152a7f99	Shorten list of arguments to mbuf external storage freeing function. All of these arguments are stored in m_ext, so there is no reason to pass them in the argument list. Not all functions need the second argument, some don't even need the first one. The second argument lives in next cache line, so not dereferencing it is a performance gain. This was discovered in sendfile(2), which will be covered by next commits. The second goal of this commit is to bring even more flexibility to m_ext mbufs, allowing to create more fields in m_ext, opaque to the generic mbuf code, and potentially set and dereferenced by subsystems. Reviewed by: gallatin, kbowling Differential Revision: https://reviews.freebsd.org/D12615	2017-10-09 20:35:31 +00:00
hselasky	69fd8ab233	When showing the sleepqueues from the in-kernel debugger, properly dump all the sleepqueues and not just the first one History: It appears that in the commit which introduced the code, r165272, the array indexes of "sq_blocked[0]" and "td_name[i]" were interchanged. In r180927 "td_name[i]" was corrected to "td_name[0]", but "sq_blocked[0]" was left unchanged. PR: 222624 Discussed with: kmacy @ MFC after: 1 week Sponsored by: Mellanox Technologies	2017-10-09 18:33:29 +00:00
alc	6a7e568f99	The recent change to initialization of blists (r324420) relied on '-1' appearing only where the code explicitly set it, but since much of the data was not initialized, '-1' appeared other places too, and led to panics. Clear the allocated data before initializing nonzero values by allocating with M_ZERO. Submitted by: Doug Moore <dougm@rice.edu> Reported by: Oleg V. Nauman <oleg@theweb.org.ua>, cy Tested by: Oleg V. Nauman <oleg@theweb.org.ua> MFC after: 1 week X-MFC with: r324420 Differential Revision: https://reviews.freebsd.org/D12627	2017-10-09 18:19:06 +00:00
alc	008747750e	The blst_radix_init function has two purposes - to compute the number of nodes to allocate for the blist, and to initialize them. The computation can be done much more quickly by identifying the terminating node, if any, at every level of the tree and then summing the number of nodes at each level that precedes the topmost terminator. The initialization can also be done quickly, since settings at the root mark the tree as all-allocated, and only a few terminator nodes need to be marked in the rest of the tree. Eliminate blst_radix_init, and perform its two functions more simply in blist_create. The allocation of the blist takes place in two pieces, but there's no good reason to do so, when a single allocation is sufficient, and simpler. Allocate the blist struct, and the array of nodes associated with it, with a single allocation. Submitted by: Doug Moore <dougm@rice.edu> Reviewed by: markj (an earlier version) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11968	2017-10-08 22:17:39 +00:00
ian	5cc2194826	Add eventhandler notifications for newbus device attach/detach. The detach case is slightly complicated by the fact that some in-kernel consumers may want to know before a device detaches (so they can release related resources, stop using the device, etc), but the detach can fail. So there are pre- and post-detach notifications for those consumers who need to handle all cases. A couple salient comments from the review, they amount to some helpful documentation about these events, but there's currently no good place for such documentation... Note that in the current newbus locking model, DETACH_BEGIN and DETACH_COMPLETE/FAILED sequence of event handler invocation might interweave with other attach/detach events arbitrarily. The handlers should be prepared for such situations. Also should note that detach may be called after the parent bus knows the hardware has left the building. In-kernel consumers have to be prepared to cope with this race. Differential Revision: https://reviews.freebsd.org/D12557	2017-10-08 17:33:49 +00:00
ian	69fb8e033a	Restore the ability to deregister an eventhandler from within the callback. When the EVENTHANDLER(9) subsystem was created, it was a documented feature that an eventhandler callback function could safely deregister itself. In r200652 that feature was inadvertently broken by adding drain-wait logic to eventhandler_deregister(), so that it would be safe to unload a module upon return from deregistering its event handlers. There are now 145 callers of EVENTHANDLER_DEREGISTER(), and it's likely many of them are depending on the drain-wait logic that has been in place for 8 years. So instead of creating a separate eventhandler_drain() and adding it to some or all of those 145 call sites, this creates a separate eventhandler_deregister_nowait() function for the specific purpose of deregistering a callback from within the running callback. Differential Revision: https://reviews.freebsd.org/D12561	2017-10-08 17:21:16 +00:00
sbruno	692f29cb6e	Check so_error early in sendfile() call. Prior to this patch, if a connection was reset by the remote end, sendfile() would just report ENOTCONN instead of ECONNRESET. Submitted by: Jason Eggleston <jason@eggnet.com> Reviewed by: glebius Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12575	2017-10-07 23:30:57 +00:00
mjg	6523dff125	namecache: factor out ~MAKEENTRY lookups from the common path Lookups of the sort are rare compared to regular ones and successful ones result in removing entries from the cache. In the current code buckets are rlocked and a trylock dance is performed, which can fail and cause a restart. Fixing it will require a little bit of surgery and in order to keep the code maintainable the 2 cases have to be split. MFC after: 1 week	2017-10-06 23:05:55 +00:00
markj	08a37c1513	Let stack_create(9) take a malloc flags argument. Reviewed by: cem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12614	2017-10-06 21:52:28 +00:00
manu	13125fb720	vfs_export_lookup: Fix r324054 When using the default address list nam is still valid, the code in r324054 assumed that is was NULL. Reported by: Guy Yur <guyyur@gmail.com> Tested by: Guy Yur <guyyur@gmail.com>	2017-10-06 09:02:36 +00:00
mjg	6f516b023d	locks: take the number of readers into account when waiting Previous code would always spin once before checking the lock. But a lock with e.g. 6 readers is not going to become free in the duration of one spin even if they start draining immediately. Conservatively perform one for each reader. Note that the total number of allowed spins is still extremely small and is subject to change later. MFC after: 1 week	2017-10-05 19:18:02 +00:00
shurd	bb993c697a	Fix "taskqgroup_attach: setaffinity failed: 3" with iflib drivers Improved logging added in r323879 exposed an error during attach. We need the irq, not the rid to work correctly. em uses shared irqs, so it will use the same irq for TX as RX. bnxt does not use shared irqs, or TX irqs at all, so there's no need to set the TX irq affinity. Reviewed by: sbruno Approved by: sbruno (mentor) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12496	2017-10-05 14:43:30 +00:00
mjg	a059e2d8eb	locks: partially tidy up waiting on readers spin first instead of instantly re-reading and don't re-read after spinning is finished - the state is already known. Note the code is subject to significant changes later. MFC after: 1 week	2017-10-05 13:01:18 +00:00
avg	341d9df85b	sysctl-s in a module should be accessible only when the module is initialized A sysctl can have a custom handler that may access data that is initialized via SYSINIT(9) or via a module event handler (also invoked via SYSINIT). Thus, it is not safe to allow access to the module's sysctl-s until the initialization is performed. Likewise, we should not allow access to the sysctl-s after the module is uninitialized. The latter is easy to achieve by properly ordering linker_file_unregister_sysctls and linker_file_sysuninit. The former is not as easy for two reasons: - the initialization may depend on tunables which get set when sysctl-s are registered, so we need to set the tunables before running sysinit-s - the initialization may try to dynamically add more sysctl-s under statically defined sysctl nodes So, this change splits the sysctl setup into two phases. In the first phase the sysctl-s are registered as before but they are disabled and hidden from consumers. In the second phase, done after sysinit-s, normal access to the sysctl-s is enabled. The change should affect only dynamic module loading and unloading after the system boot-up. Nothing changes for sysctl-s compiled into the kernel and sysctl-s in preloaded modules. Discussed with: hselasky, ian, jhb Reviewed by: julian, kib MFC after: 2 weeks Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D12545	2017-10-05 12:32:14 +00:00
glebius	7168fac388	Hide struct socket and struct unpcb from the userland. Violators may define _WANT_SOCKET and _WANT_UNPCB respectively and are not guaranteed for stability of the structures. The violators list is the usual one: libprocstat(3) and netstat(1) internally and lsof in ports. In struct xunpcb remove the inclusion of kernel structure and add a bunch of spare fields. The xsocket already has socket not included, but add there spares as well. Embed xsockbuf into xsocket. Sort declarations in sys/socketvar.h to separate kernel only from userland available ones. PR: 221820 (exp-run)	2017-10-02 23:29:56 +00:00
alc	513b841b01	Use vm_page_active() rather than directly accessing the page's queue field. Reviewed by: kib, markj MFC after: 2 weeks X-MFC with: r324146	2017-10-02 07:30:21 +00:00
avg	8e8e725268	revert r324166, it has an unrelated change in it	2017-10-01 16:37:54 +00:00

15682 Commits
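
Note on 69fb8e033a (deregistering an eventhandler from within its own callback): the idea can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not the kernel code: the class `EventHandlerList` and its methods are hypothetical names, and the mark-dead-then-prune strategy shown here is one common way to make self-deregistration safe while the handler list is being walked.

```python
class EventHandlerList:
    """Toy model (hypothetical) of a handler list that tolerates
    deregistration from inside a running callback."""

    def __init__(self):
        self._entries = []   # each entry: {"fn": callable, "dead": bool}
        self._running = 0    # nesting depth of in-progress invoke() calls

    def register(self, fn):
        entry = {"fn": fn, "dead": False}
        self._entries.append(entry)
        return entry         # acts as the tag used for deregistration

    def deregister_nowait(self, entry):
        # Mark the entry dead instead of unlinking it and do not wait for
        # running callbacks to finish; waiting would deadlock when the
        # caller is the running callback itself.
        entry["dead"] = True
        if self._running == 0:
            self._prune()

    def invoke(self, *args):
        self._running += 1
        try:
            # Walk a snapshot so the walk is unaffected by list changes.
            for entry in list(self._entries):
                if not entry["dead"]:
                    entry["fn"](*args)
        finally:
            self._running -= 1
            if self._running == 0:
                self._prune()   # dead entries are unlinked after the walk

    def _prune(self):
        self._entries = [e for e in self._entries if not e["dead"]]

    def __len__(self):
        return len(self._entries)


def demo():
    handlers = EventHandlerList()
    calls = []

    def one_shot(event):
        calls.append(("one_shot", event))
        handlers.deregister_nowait(tag)   # deregister from inside the callback

    def persistent(event):
        calls.append(("persistent", event))

    tag = handlers.register(one_shot)
    handlers.register(persistent)
    handlers.invoke("attach")
    handlers.invoke("detach")
    return calls, len(handlers)


if __name__ == "__main__":
    print(demo())
```

In this model the one-shot handler runs for the first event only, while the other handler keeps running; a blocking deregister that waited for all running callbacks to drain could not be called from `one_shot` without waiting on itself.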