freebsd-nq

Author	SHA1	Message	Date
Warner Losh	7d41b6f078	Handle RB_POWERCYCLE in the MI part of the kernel Signal init with SIGWINCH in shutdown_nice for RB_POWERCYCLE. Sponsored by: Netflix	2017-10-25 15:30:44 +00:00
Mark Johnston	64a16434d8	Add support for compressed kernel dumps. When using a kernel built with the GZIO config option, dumpon -z can be used to configure gzip compression using the in-kernel copy of zlib. This is useful on systems with large amounts of RAM, which require a correspondingly large dump device. Recovery of compressed dumps is also faster since fewer bytes need to be copied from the dump device. Because we have no way of knowing the final size of a compressed dump until it is written, the kernel will always attempt to dump when compression is configured, regardless of the dump device size. If the dump is aborted because we run out of space, an error is reported on the console. savecore(8) is modified to handle compressed dumps and save them to vmcore.<index>.gz, as it does when given the -z option. A new rc.conf variable, dumpon_flags, is added. Its value is added to the boot-time dumpon(8) invocation that occurs when a dump device is configured in rc.conf. Reviewed by: cem (earlier version) Discussed with: def, rgrimes Relnotes: yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11723	2017-10-25 00:51:00 +00:00
Alan Somers	913b932900	Remove artificial restriction on lio_listio's operation count In r322258 I made p1003_1b.aio_listio_max a tunable. However, further investigation shows that there was never any good reason for that limit to exist in the first place. It's used in two completely different ways: * To size a UMA zone, which globally limits the number of concurrent aio_suspend calls. * To artificially limit the number of operations in a single lio_listio call. There doesn't seem to be any memory allocation associated with this limit. This change does two things: * Properly names aio_suspend's UMA zone, and sizes it based on a new constant. * Eliminates the artificial restriction on lio_listio. Instead, lio_listio calls will now be limited by the more generous max_aio_queue_per_proc. The old p1003_1b.aio_listio_max is now an alias for vfs.aio.max_aio_queue_per_proc, so sysconf(3) will still work with _SC_AIO_LISTIO_MAX. Reported by: bde Reviewed by: jhb MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D12120	2017-10-23 23:12:01 +00:00
Mateusz Guzik	5132933a08	Bump WITNESS_PENDLIST to accommodate sleepq chain bump. Reported by: ngie	2017-10-23 01:00:35 +00:00
Mateusz Guzik	9e68989764	Make the sleepq chain hash size configurable per-arch and bump on amd64. While here cache-align chains. This shortens longest found chain during poudriere -j 80 from 32 to 16. Pushing this higher up will probably require allocation on boot.	2017-10-22 20:43:50 +00:00
Mateusz Guzik	5a17c5524f	sdt: make all sdt probe sites test one variable This saves on cache misses at the expense of a slight growth of .text. Note this is a bandaid for lack of hotpatching. Discussed with: markj	2017-10-22 20:22:23 +00:00
Mateusz Guzik	614e1868d6	Change kdb_active type to u_char. Fixes warnings from gcc and keeps the small size. Perhaps nesting should be moved to another variable. Reported by: ngie	2017-10-22 13:42:56 +00:00
Enji Cooper	f2374e0cc5	Clean up trailing whitespace in kdb_thr_ctx(..) MFC after: 1 week	2017-10-22 12:12:52 +00:00
Konstantin Belousov	456a73ef01	Remove the support for mknod(S_IFMT), which created dummy vnodes with VBAD type. FFS ffs_write() VOP catches such vnodes and panics, other VOPs do not check for the type and their behaviour is really undefined. The comment claims that this support was done for 'badsect' to flag bad sectors, we do not have such facility in kernel anyway. Reported by: Dmitry Vyukov <dvyukov@google.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-22 08:11:45 +00:00
Mateusz Guzik	be49509eea	mtx: implement thread lock fastpath MFC after: 1 week	2017-10-21 22:40:09 +00:00
Michal Meloun	904d8c492f	Add AT_HWCAP2 ELF auxiliary vector. - allocate value for new AT_HWCAP2 auxiliary vector on all platforms. - expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly same way as for AT_HWCAP. MFC after: 1 month Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D12699	2017-10-21 12:05:01 +00:00
Mark Johnston	a3e8a25a52	Avoid the nbp lookup in the final loop iteration in flushbuflist(). The end of the loop must re-lookup the next buf since the bufobj lock is dropped in the loop body. If the lookup fails, the loop is restarted. This mechanism non-obviously also terminates the loop when the end of the buf list is reached. Split up the two loops termination cases to make the code a bit less fragile. No functional change intended. Reviewed by: kib MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12730	2017-10-20 14:56:13 +00:00
Mateusz Guzik	62bf13cbf9	mtx: fix up UP build after r324778 Reported by: Michael Butler	2017-10-20 14:04:01 +00:00
Mateusz Guzik	c48a94251d	Mark kdb_active as __read_frequently and switch to bool to eat less space.	2017-10-20 04:02:53 +00:00
Mateusz Guzik	2567807c32	rwlock: reduce lockstat branches in the slowpath MFC after: 1 week	2017-10-20 03:32:42 +00:00
Mateusz Guzik	cbc2d7c218	mtx: stop testing SCHEDULER_STOPPED in kabi funcs for spin mutexes There is nothing panic-breaking to do in the unlock case and the lock case will fallback to the slow path doing the check already. MFC after: 1 week	2017-10-20 00:34:25 +00:00
Mateusz Guzik	0d74fe267b	mtx: clean up locking spin mutexes 1) shorten the fast path by pushing the lockstat probe to the slow path 2) test for kernel panic only after it turns out we will have to spin, in particular test only after we know we are not recursing MFC after: 1 week	2017-10-20 00:30:35 +00:00
Mateusz Guzik	9b8de76beb	sysctl: only take mem lock if oldlen is > 4 * PAGE_SIZE The previous limit of just one page is hit by ps. The entire mechanism should be reworked, if not whacked. It seems the intent is to reduce kernel dos-ability - some handlers wire the amount of memory passed here. Handlers should probably stop wiring in the first place or in the worst case indicate they are doing so so that the check is done only if necessary. It should also probably be a counter, not a lock. MFC after: 1 week	2017-10-19 01:38:31 +00:00
Mateusz Guzik	e6b645ae89	execve: avoid one proc lock/unlock trip unless PTRACE_EXEC is set MFC after: 1 week	2017-10-19 00:46:15 +00:00
Mateusz Guzik	80a2397a38	Tidy up pmc support at execve. The proc-specific check is inherently racy, so the code can just unlock beforehand. MFC after: 1 week	2017-10-19 00:38:14 +00:00
Mateusz Guzik	cb1c79008e	sysvsem: check if semu_list has anything on it before grabbing the lock This should get a process-specific support instead. MFC after: 1 week	2017-10-19 00:31:00 +00:00
Mateusz Guzik	c69a1a50cd	Don't take Giant for SMP status and cpu topology sysctls. Not only this lock doesn't play any role here, dirtying it slows down other things a little bit as giant-held checks (e.g. DROP_GIANT) are spread all over the kernel. MFC after: 1 week	2017-10-18 22:00:44 +00:00
Mark Johnston	46fcd1af63	Move kernel dump offset tracking into MI code. All of the kernel dump implementations keep track of the current offset ("dumplo") within the dump device. However, except for textdumps, they all write the dump sequentially, so we can reduce code duplication by having the MI code keep track of the current offset. The new dump_append() API can be used to write at the current offset. This is needed to implement support for kernel dump compression in the MI kernel dump code. Also simplify dump_encrypted_write() somewhat: use dump_write() instead of duplicating its bounds checks, and get rid of the redundant offset tracking. Reviewed by: cem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D11722	2017-10-18 15:38:05 +00:00
Brooks Davis	39ed7f250a	Remove mbpool(9) now that it has no consumers. mbpool existed to support NICs with memory interfaces and all remaining consumers were removed earlier this year with NATM. Reviewed by: jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D10513	2017-10-18 00:18:03 +00:00
Mark Johnston	fa00affd18	Fix a racy VI_DOOMED check in MNT_VNODE_FOREACH_ALL(). MNT_VNODE_FOREACH_ALL() is supposed to avoid returning doomed vnodes, but the VI_DOOMED check it used was done without the vnode interlock held, so it could race with a concurrent vgone(). Submitted by: Don Morris <don.morris@isilon.com> Reviewed by: kib, mckusick MFC after: 1 week Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12704	2017-10-17 19:41:45 +00:00
Andriy Voskoboinyk	6623429867	mbuf(9): unbreak m_fragment() - Fix it by replacing m_cat() with m_prev->m_next = m_new (m_cat() will try to append data - as a result, there will be no fragmentation). - Move some constants out of the loop. Was previously tested with D4077. Differential Revision: https://reviews.freebsd.org/D4090	2017-10-16 21:46:11 +00:00
Konstantin Belousov	e9445808a8	Re-evaluate thread's signal mask after ptracestop(). The stop drops process lock, which allows the signal mask to be changed and our selected signal might become blocked, i.e. should be returned to the process queue instead of delivery. Also, for the existing check of the process no longer having an attached debugger, we should not lose the signal, but requeue it. Reported and tested by: bdrewery Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-16 20:21:51 +00:00
Konstantin Belousov	cd735d8f5a	Improve assertion that an ignored or blocked signal is not delivered. Split two conditions into separate asserts. Print additional details, like the signal number and action value. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-16 20:15:19 +00:00
Konstantin Belousov	0167b33b81	Style. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-10-16 20:11:29 +00:00
Matt Joras	0d8e04054e	Properly reset the fields in clean_unrhdr. In r324542 I neglected to reset the first and last fields of struct unrhdr. This causes a tmpfs to fail the unr(9) consistency checks with DIAGNOSTIC on. Fix this by resetting the fields by calling init_unrhdr. While here, change a loop to use TAILQ_FOREACH_SAFE since it is more readable and equally fast. Reported by: David Wolfskill <david@catwhisker.org> Approved by: rstone (mentor) Sponsored by: Dell EMC Isilon	2017-10-16 16:14:50 +00:00
Tijl Coosemans	11ce4d9f39	When a Linux program tries to access a /path the kernel tries /compat/linux/path before /path. Stop following symbolic links when looking up /compat/linux/path so dead symbolic links aren't ignored. This allows syscalls like readlink(2) and lstat(2) to work on such links. And open(2) will return an error now instead of trying /path.	2017-10-15 18:53:21 +00:00
Mateusz Guzik	e280ce465b	mtx: fix up owner_mtx after r324609 Now that MTX_UNOWNED is 0 the test was always false.	2017-10-14 00:47:30 +00:00
Alan Cox	41bf90bb78	Address two problems with sendfile(..., SF_NOCACHE) and apply one "optimization". First, sendfile(..., SF_NOCACHE) frees pages without checking whether those pages are mapped. This can leave the system with mappings to free or repurposed pages. Second, a page can be busied between the time of the current busy test and acquiring the object lock. Essentially, the test performed before the object lock is acquired can only be regarded as an optimization to short-circuit further work on the page. It cannot, however, be relied upon to prove that it is safe to free the page. Third, when sendfile(..., SF_NOCACHE) was originally implemented, vm_page_deactivate_noreuse() did not yet exist. Use vm_page_deactivate_noreuse() instead of vm_page_deactivate(), because it comes closer to freeing the page. In collaboration with: glebius Discussed with: gallatin, kib, markj X-MFC after: r324448	2017-10-13 16:31:50 +00:00
Andriy Gapon	f92e3400bc	remove process and jail directory machinations from dounmount The manipulations done by mountcheckdirs() are not that useful during the unmount, they can bring about unexpected security consequences. This change effectively reverts the change in r73241. The change also allows to simplify the handling of rootvnode global variable. Discussed with: mckusick, mjg, kib Reviewed by: trasz MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D12366	2017-10-13 09:42:05 +00:00
Ed Maste	05e47051a2	regen init_sysent.c r324560	2017-10-12 15:48:37 +00:00
Ed Maste	5532aa9bb4	allow posix_fallocate in capability mode posix_fallocate is logically equivalent to writing zero blocks to the desired file size and there is no reason to prevent calling it in capability mode. posix_fallocate already checked for the CAP_WRITE right, so we merely need to list it in capabilities.conf. Reviewed by: allanjude MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D12640	2017-10-12 15:45:53 +00:00
Matt Joras	333dcaa498	Add clearing function for unr(9). Previously before you could call unrhdr_delete you needed to individually free every allocated unit. It is useful to be able to tear down the unr without having to go through this process, as it is significantly faster than freeing the individual units. Reviewed by: cem, lidl Approved by: rstone (mentor) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12591	2017-10-11 21:53:50 +00:00
Konstantin Belousov	70e3b262d1	The th_bintime, th_microtime and th_nanotime members of the timehand all cache the last system time (uptime + boottime). Only the format differs. Do not re-calculate the bintime and simply use the value used to calculate the microtime and nanotime. Group all the updates under the relevant comment. Remove obsoleted XXX part. Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de> MFC after: 1 week	2017-10-11 11:03:11 +00:00
Sean Bruno	1f9916ed08	match sendfile() error handling to send(). Sendfile() should match the error checking order of send() which is currently: SBS_CANTSENDMORE so_error SS_ISCONNECTED Submitted by: Jason Eggleston <jason@eggnet.com> Reviewed by: glebius MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12633	2017-10-10 22:21:05 +00:00
Sean Bruno	009ad5724d	Revert r324405 at the request of the submitter pending better solution. Submitted by: Jason Eggleston <jason@eggnet.com> Sponsored by: Limelight Networks	2017-10-10 00:32:21 +00:00
Gleb Smirnoff	9c82bec42d	Improvements to sendfile(2) mbuf free routine. o Fall back to default m_ext free mech, using function pointer in m_ext_free, and remove sf_ext_free() called directly from mbuf code. Testing on modern CPUs showed no regression. o Provide internally used flag EXT_FLAG_SYNC, to mark that I/O uses SF_SYNC flag. Lack of the flag allows us not to dereference ext_arg2, saving from a cache line miss. o Create function sendfile_free_page() that later will be used, for multi-page mbufs. For now compiler will inline it into sendfile_free_mext(). In collaboration with: gallatin Differential Revision: https://reviews.freebsd.org/D12615	2017-10-09 21:06:16 +00:00
Gleb Smirnoff	07e87a1d55	In mb_dupcl() don't copy full m_ext, to avoid cache miss. Respectively, in mb_free_ext() always use fields from the original refcount holding mbuf (see r296242). Cuts another cache miss from mb_free_ext(). However, treat EXT_EXTREF mbufs differently, since they are different - they don't have a refcount holding mbuf. Provide longer comments in m_ext declaration to explain this change and change from r296242. In collaboration with: gallatin Differential Revision: https://reviews.freebsd.org/D12615	2017-10-09 20:51:58 +00:00
Gleb Smirnoff	e8fd18f306	Shorten list of arguments to mbuf external storage freeing function. All of these arguments are stored in m_ext, so there is no reason to pass them in the argument list. Not all functions need the second argument, some don't even need the first one. The second argument lives in next cache line, so not dereferencing it is a performance gain. This was discovered in sendfile(2), which will be covered by next commits. The second goal of this commit is to bring even more flexibility to m_ext mbufs, allowing to create more fields in m_ext, opaque to the generic mbuf code, and potentially set and dereferenced by subsystems. Reviewed by: gallatin, kbowling Differential Revision: https://reviews.freebsd.org/D12615	2017-10-09 20:35:31 +00:00
Hans Petter Selasky	32b413d7f0	When showing the sleepqueues from the in-kernel debugger, properly dump all the sendqueues and not just the first one History: It appears that in the commit which introduced the code, r165272, the array indexes of "sq_blocked[0]" and "td_name[i]" were interchanged. In r180927 "td_name[i]" was corrected to "td_name[0]", but "sq_blocked[0]" was left unchanged. PR: 222624 Discussed with: kmacy @ MFC after: 1 week Sponsored by: Mellanox Technologies	2017-10-09 18:33:29 +00:00
Alan Cox	03ca213761	The recent change to initialization of blists (r324420) relied on '-1' appearing only where the code explicitly set it, but since much of the data was not initialized, '-1' appeared other places too, and led to panics. Clear the allocated data before initializing nonzero values by allocating with M_ZERO. Submitted by: Doug Moore <dougm@rice.edu> Reported by: Oleg V. Nauman <oleg@theweb.org.ua>, cy Tested by: Oleg V. Nauman <oleg@theweb.org.ua> MFC after: 1 week X-MFC with: r324420 Differential Revision: https://reviews.freebsd.org/D12627	2017-10-09 18:19:06 +00:00
Alan Cox	8eefcd407b	The blst_radix_init function has two purposes - to compute the number of nodes to allocate for the blist, and to initialize them. The computation can be done much more quickly by identifying the terminating node, if any, at every level of the tree and then summing the number of nodes at each level that precedes the topmost terminator. The initialization can also be done quickly, since settings at the root mark the tree as all-allocated, and only a few terminator nodes need to be marked in the rest of the tree. Eliminate blst_radix_init, and perform its two functions more simply in blist_create. The allocation of the blist takes place in two pieces, but there's no good reason to do so, when a single allocation is sufficient, and simpler. Allocate the blist struct, and the array of nodes associated with it, with a single allocation. Submitted by: Doug Moore <dougm@rice.edu> Reviewed by: markj (an earlier version) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D11968	2017-10-08 22:17:39 +00:00
Ian Lepore	7f92689427	Add eventhandler notifications for newbus device attach/detach. The detach case is slightly complicated by the fact that some in-kernel consumers may want to know before a device detaches (so they can release related resources, stop using the device, etc), but the detach can fail. So there are pre- and post-detach notifications for those consumers who need to handle all cases. A couple salient comments from the review, they amount to some helpful documentation about these events, but there's currently no good place for such documentation... Note that in the current newbus locking model, DETACH_BEGIN and DETACH_COMPLETE/FAILED sequence of event handler invocation might interweave with other attach/detach events arbitrarily. The handlers should be prepared for such situations. Also should note that detach may be called after the parent bus knows the hardware has left the building. In-kernel consumers have to be prepared to cope with this race. Differential Revision: https://reviews.freebsd.org/D12557	2017-10-08 17:33:49 +00:00
Ian Lepore	fc09164658	Restore the ability to deregister an eventhandler from within the callback. When the EVENTHANDLER(9) subsystem was created, it was a documented feature that an eventhandler callback function could safely deregister itself. In r200652 that feature was inadvertently broken by adding drain-wait logic to eventhandler_deregister(), so that it would be safe to unload a module upon return from deregistering its event handlers. There are now 145 callers of EVENTHANDLER_DEREGISTER(), and it's likely many of them are depending on the drain-wait logic that has been in place for 8 years. So instead of creating a separate eventhandler_drain() and adding it to some or all of those 145 call sites, this creates a separate eventhandler_deregister_nowait() function for the specific purpose of deregistering a callback from within the running callback. Differential Revision: https://reviews.freebsd.org/D12561	2017-10-08 17:21:16 +00:00
Sean Bruno	75c8dfb6ae	Check so_error early in sendfile() call. Prior to this patch, if a connection was reset by the remote end, sendfile() would just report ENOTCONN instead of ECONNRESET. Submitted by: Jason Eggleston <jason@eggnet.com> Reviewed by: glebius Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D12575	2017-10-07 23:30:57 +00:00
Mateusz Guzik	709939a7b7	namecache: factor out ~MAKEENTRY lookups from the common path Lookups of the sort are rare compared to regular ones and successful ones result in removing entries from the cache. In the current code buckets are rlocked and a trylock dance is performed, which can fail and cause a restart. Fixing it will require a little bit of surgery and in order to keep the code maintainable the 2 cases have to be split. MFC after: 1 week	2017-10-06 23:05:55 +00:00

15964 Commits