freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	08da267775	mtx: move lockstat handling out of inline primitives Lockstat requires checking if it is enabled and if so, calling a 6 argument function. Further, determining whether to call it on unlock requires pre-reading the lock value. This is problematic in at least 3 ways: - more branches in the hot path than necessary - additional cacheline ping pong under contention - bigger code Instead, check first if lockstat handling is necessary and if so, just fall back to regular locking routines. For this purpose a new macro is introduced (LOCKSTAT_PROFILE_ENABLED). LOCK_PROFILING uninlines all primitives. Fold in the current inline lock variant into the _mtx_lock_flags to retain the support. With this change the inline variants are not used when LOCK_PROFILING is defined and thus can ignore its existence. This results in: text data bss dec hex filename 22259667 1303208 4994976 28557851 1b3c21b kernel.orig 21797315 1303208 4994976 28095499 1acb40b kernel.patched i.e. about 3% reduction in text size. A remaining action is to remove spurious arguments for internal kernel consumers.	2017-02-05 08:04:11 +00:00
Mateusz Guzik	3ae56ce958	sx: add witness support missed in r313272	2017-02-05 06:51:45 +00:00
Mateusz Guzik	9d2e4290ff	sx: uninline slock/sunlock Shared locking routines explicitly read the value and test it. If the change attempt fails, they fall back to a regular function which would retry in a loop. The problem is that with many concurrent readers the risk of failure is pretty high and even the value returned by fcmpset is very likely going to be stale by the time the loop in the fallback routine is reached. Uninline said primitives. It gives a throughput increase when doing concurrent slocks/sunlocks with 80 hardware threads from ~50 mln/s to ~56 mln/s. Interestingly, rwlock primitives are already not inlined.	2017-02-05 05:20:29 +00:00
Mateusz Guzik	fa47404353	sx: switch to fcmpset Discussed with: jhb Tested by: pho (previous version)	2017-02-05 04:54:20 +00:00
Mateusz Guzik	c84f347985	rwlock: switch to fcmpset Discussed with: jhb Tested by: pho	2017-02-05 04:53:13 +00:00
Mateusz Guzik	90836c3270	mtx: switch to fcmpset The found value is passed to locking routines in order to reduce cacheline accesses. mtx_unlock grows an explicit check for regular unlock. On ll/sc architectures the routine can fail even if the lock could have been handled by the inline primitive. Discussed with: jhb Tested by: pho (previous version)	2017-02-05 03:26:34 +00:00
Mateusz Guzik	2d78a5531e	vfs: use atomic_fcmpset in vfs_refcount_*	2017-02-05 03:23:16 +00:00
Mark Johnston	69d2418faa	Make witness_warn() always print to the console. witness_warn() either breaks into the debugger or panics the system, so its output should go to the console regardless of the witness(4) output channel configuration. MFC after: 1 week Sponsored by: Dell EMC Isilon	2017-02-05 02:27:04 +00:00
Mateusz Guzik	3a2f282532	fd: switch fget_unlocked to atomic_fcmpset	2017-02-05 01:40:27 +00:00
Jason A. Harmening	ad62ba6e96	Revert r313037 The switch to get_pcpu() in MI code seems to cause hangs on MIPS. Back out until we can get a better idea of what's happening there. Reported by: kan, lidl	2017-02-04 06:24:49 +00:00
Hartmut Brandt	4b481ba0ed	Merge filt_soread and filt_solisten and decide what to do when checking for EVFILT_READ at the point of the check not when the event is registers. This fixes a problem with asio when accepting a connection. Reviewed by: kib@, Scott Mitchell	2017-02-01 13:12:07 +00:00
Jason A. Harmening	65ed483615	Implement get_pcpu() for the remaining architectures and use it to replace pcpu_find(curcpu) in MI code.	2017-02-01 03:32:49 +00:00
Edward Tomasz Napierala	b38b22b0b2	Add kern_pread() and kern_pwrite(), and use it in compats instead of their sys_*() counterparts. The svr4 is left unchanged. Reviewed by: kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9379	2017-01-31 15:35:18 +00:00
Edward Tomasz Napierala	fc8bde8ffe	Replace calls to sys_truncate() with kern_truncate(). Reviewed by: kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9371	2017-01-31 15:19:44 +00:00
Edward Tomasz Napierala	ea2ebdc19e	Add kern_cpuset_getid() and kern_cpuset_setid(), and use them in compat32 instead of their sub_*() counterparts. Reviewed by: jhb@, kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9382	2017-01-31 15:11:23 +00:00
Andriy Gapon	826b3d3187	put very expensive sanity checks of advisory locks under DIAGNOSTIC The checks have quadratic complexity over a number of advisory locks active for a file and that could be a lot. What's the worse is that the checks are done while holding ls_lock. That could lead to a long a very long backlog and performance degradation even if all requested locks are compatible (e.g. all shared locks). The checks used to be under INVARIANTS. Discussed with: kib MFC after: 2 weeks Sponsored by: Panzura	2017-01-30 15:20:13 +00:00
Edward Tomasz Napierala	d293f35c09	Add kern_listen(), kern_shutdown(), and kern_socket(), and use them instead of their sys_*() counterparts in various compats. The svr4 is left untouched, because there's no point. Reviewed by: ed@, kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9367	2017-01-30 12:57:22 +00:00
Edward Tomasz Napierala	f67d6b5f12	Add kern_lseek() and use it instead of sys_lseek() in various compats. I didn't touch svr4/, there's no point. Reviewed by: ed@, kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9366	2017-01-30 12:24:47 +00:00
Edward Tomasz Napierala	ae6b6ef6cb	Replace sys_ftruncate() with kern_ftruncate() in various compats. Reviewed by: kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9368	2017-01-30 11:50:54 +00:00
Mateusz Guzik	dfecf51dd0	cache: use vrefact for '.' lookups and refing the rdir in fullpath	2017-01-30 03:20:05 +00:00
Mateusz Guzik	3071469d57	fd: sprinkle __read_mostly and __exclusive_cache_line	2017-01-30 03:07:32 +00:00
Baptiste Daroussin	b4b4b5304b	Revert crap accidentally committed	2017-01-28 16:31:23 +00:00
Baptiste Daroussin	814aaaa7da	Revert r312923 a better approach will be taken later	2017-01-28 16:30:14 +00:00
Mateusz Guzik	95839d3d25	hwpmc: annotate pmc_hook and pmc_intr as __read_mostly MFC after: 1 month	2017-01-27 22:14:42 +00:00
Mateusz Guzik	f1f7f1cb29	hwpmc: partially depessimize mmap handling if the module is not loaded In particular this means the pmc sx lock is no longer taken when an executable mapping succeeds. MFC after: 1 week	2017-01-27 22:13:15 +00:00
Mateusz Guzik	290511163d	Sprinkle __read_mostly on backoff and lock profiling code. MFC after: 1 month	2017-01-27 15:03:51 +00:00
Mateusz Guzik	17071ff298	cache: annotate with __read_mostly and __exclusive_cache_line MFC after: 1 month	2017-01-27 14:56:36 +00:00
Sean Bruno	de414cfe14	A few more style bugs lying around in here. Submitted by: bde	2017-01-26 13:48:45 +00:00
Gleb Smirnoff	beb4b31200	For non-listening AF_UNIX sockets return error code EOPNOTSUPP to match documentation and SUS.	2017-01-25 22:26:45 +00:00
Ed Maste	f27ac8e297	ANSIfy kern_ntptime.c	2017-01-25 20:22:32 +00:00
Sean Bruno	06bb7c507a	Replace overlooked smp_started checks and variable use in a print with the now used tqg_smp_started. Submitted by: bde	2017-01-25 15:54:44 +00:00
Ed Maste	77ebe276ba	imgact_elf: refactor et_dyn_addr calculation This simplifies the logic somewhat. It is extracted from the change in review in D5603. Differential Revision: https://reviews.freebsd.org/D9321	2017-01-24 22:46:43 +00:00
Mateusz Guzik	543b2f425d	proc: perform a lockless check in sys_issetugid Discussed with: kib MFC after: 1 week	2017-01-24 21:48:57 +00:00
Conrad Meyer	90a79ac576	Use time_t for intermediate values to avoid overflow in clock_ts_to_ct Add additionally safety and overflow checks to clock_ts_to_ct and the BCD routines while we're here. Perform a safety check in sys_clock_settime() first to avoid easy local root panic, without having to propagate an error value back through dozens of APIs currently lacking error returns. PR: 211960, 214300 Submitted by: Justin McOmie <justin.mcomie at gmail.com>, kib@ Reported by: Tim Newsham <tim.newsham at nccgroup.trust> Reviewed by: kib@ Sponsored by: Dell EMC Isilon, FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D9279	2017-01-24 18:05:29 +00:00
Sean Bruno	bd84f70044	iflib: Add internal tracking of smp startup status to reliably figure out what methods are to be used to get gtaskqueue up and running. e1000: Calculating this pointer gives undefined behaviour when (last == -1) (it is before the buffer). The pointer is always followed. Panics occurred when it points to an unmapped page. Otherwise, the pointed-to garbage tends to not have the E1000_TXD_STAT_DD bit set in it, so in the broken case the loop was usually null and the function just returned, and this was acidentally correct. Submitted by: bde Reported by: Matt Macy <mmacy@nextbsd.org>	2017-01-24 16:05:42 +00:00
Sean Bruno	36fa5d5b64	Revert 312696 due to build tests.	2017-01-24 15:55:52 +00:00
Sean Bruno	562a3182f6	iflib: Add internal tracking of smp startup status to reliably figure out what methods are to be used to get gtaskqueue up and running. e1000: Calculating this pointer gives undefined behaviour when (last == -1) (it is before the buffer). The pointer is always followed. Panics occurred when it points to an unmapped page. Otherwise, the pointed-to garbage tends to not have the E1000_TXD_STAT_DD bit set in it, so in the broken case the loop was usually null and the function just returned, and this was acidentally correct. Submitted by: bde Reviewed by: Matt Macy <mmacy@nextbsd.org>	2017-01-24 14:48:32 +00:00
Konstantin Belousov	3467f88cd6	Add comments explaining unobvious td_critnest adjustments in critical_exit(). Based on the discussion with: jhb Reviewed by: imp Sponsored by: The FreeBSD Foundation Differential revision: D9276 MFC after: 1 week	2017-01-22 19:41:42 +00:00
Konstantin Belousov	25c6816845	More style cleanup. Use ANSI C definition for vn_closefile(). Switch to VNASSERT in _vn_lock(), simplify messages. Sponsored by: The FreeBSD Foundation X-MFC with: r312600, r312601, r312602, r312606	2017-01-22 19:38:45 +00:00
Konstantin Belousov	aec8391d46	Provide fallback VOP methods for crossmp vnode. In particular, crossmp vnode might leak into rename code. PR: 216380 Reported by: fnacl@protonmail.com Sponsored by: The FreeBSD Foundation X-MFC with: r309425	2017-01-22 19:36:02 +00:00
Edward Tomasz Napierala	5c93966020	Remove redundant KASSERT.	2017-01-22 15:35:51 +00:00
Edward Tomasz Napierala	8acac5a9f5	Improve debugging printf.	2017-01-22 15:27:14 +00:00
Mateusz Guzik	eaf0969bda	vfs: fix LK_RETRY logic braino in r312600	2017-01-21 20:34:20 +00:00
Mateusz Guzik	829857c893	vfs: __predict_false the need to handle F_HASLOCK Also reorder the check with DTYPE_VNODE. Passed files are vnodes vast majority of the time, so it is typically true.	2017-01-21 19:01:42 +00:00
Mateusz Guzik	abbc538d9a	vfs: fix whitespace damage in r312600 While here wrap the previously overly long line so that it fits 80 chars.	2017-01-21 18:56:58 +00:00
Mateusz Guzik	1091fb52c1	vfs: refactor _vn_lock Stop testing for LK_RETRY and error multiple times. Also postpone the VI_DOOMED until after LK_RETRY was seen as it reads from the vnode. No functional changes.	2017-01-21 18:38:16 +00:00
Mateusz Guzik	067115e050	vfs: hide the getvnode NULL mp message behind DIAGNOSTIC Since crossmp vnode changes the message was being printed on each boot. Reported by: trasz Discussed with: kib	2017-01-21 16:59:50 +00:00
Hans Petter Selasky	10c8755706	Fix for race leading to endless timer interrupts related to configtimer(). During normal operation "state->nextcallopt" will always be less than or equal to "state->nextcall" and checking only "state->nextcallopt" before calling "callout_process()" is sufficient. However when "configtimer()" is called a race might happen requiring both of these binary times to be checked. Short description of race: 1) A configtimer() call will reset both "state->nextcall" and "state->nextcallopt" to the same binary time. 2) If a "callout_reset()" call happens between "configtimer()" and the next "callout_process()" call, "state->nextcallopt" will get updated and "state->nextcall" will remain at the current time. Refer to logic inside cpu_new_callout(). 3) getnextcpuevent() only respects "state->nextcall" and returns this value over and over again, even if it is in the past, until "now >= state->nextcallopt" becomes true. Then these two time variables are corrected by a "callout_process()" call and the situation goes back to normal. The problem manifests itself in different ways. The common factor is the timer process(es) consume all CPU on one or more CPU cores for a long time, blocking other kernel processes from getting execution time. This can be seen by very high interrupt counts as displayed by "vmstat -i \| grep timer" right after boot. When EARLY_AP_STARTUP was enabled in r310177 the likelyhood of hitting this bug apparently increased. Example output from "vmstat -i" before patch: cpu0:timer 7591 69 cpu9:timer 39031773 358089 cpu4:timer 9359 85 cpu3:timer 9100 83 cpu2:timer 9620 88 Example output from "vmstat -i" after patch: cpu0:timer 4242 34 cpu6:timer 5531 44 cpu3:timer 6450 52 cpu1:timer 4545 36 cpu9:timer 7153 58 Before the patch cpu9 in the example above, was spinning in a loop in order to reach 39 million interrupts just a few seconds after bootup. After the patch the timer interrupt counts are more or less consistent. Discussed with: mav @ Reported by: several people MFC after: 1 week Sponsored by: Mellanox Technologies	2017-01-20 17:40:31 +00:00
Ed Maste	039644eca9	ANSYfy kern_ktrace.c and remove archaic register keyword Sponsored by: The FreeBSD Foundation	2017-01-20 14:59:56 +00:00
Andriy Gapon	c468ff880a	don't abort writing of a core dump after EFAULT It's possible to get EFAULT when writing a segment backed by a file if the segment extends beyond the file. The core dump could still be useful if we skip the rest of the segment and proceed to other segements. The skipped segment (or a portion of it) will be zero-filled. While there, use 'const' to signify that core_write() only reads the buffer and use __DECONST before calling vn_rdwr_inchunks() because it can be used for both reading and writing. Before the change: kernel: Failed to write core file for process mmap_trunc_core (error 14) kernel: pid 77718 (mmap_trunc_core), uid 1001: exited on signal 6 After the change: kernel: Failed to fully fault in a core file segment at VA 0x800645000 with size 0x4000 to be written at offset 0x29000 for process mmap_trunc_core kernel: pid 4901 (mmap_trunc_core), uid 1001: exited on signal 6 (core dumped) Reviewed by: julian, kib Obtained from: Panzura (older version of the change) MFC after: 5 days Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D9233	2017-01-20 13:39:07 +00:00

1 2 3 4 5 ...

15285 Commits