freebsd-skq

Author	SHA1	Message	Date
Mark Johnston	02315a6759	Use a consistent snapshot of the lock state in owner_mtx(). MFC after: 2 weeks	2016-12-10 02:59:34 +00:00
Mark Johnston	c365a2934e	Return a non-NULL owner only if the lock is exclusively held in owner_sx(). Fix some whitespace bugs while here. MFC after: 2 weeks	2016-12-10 02:56:44 +00:00
Gleb Smirnoff	5040da77c1	Use acquire write to cr_lock to complement with release write at end of locked region. Submitted by: kib	2016-12-09 19:07:31 +00:00
Gleb Smirnoff	169170209c	Provide counter_ratecheck(), a MP-friendly substitution to ppsratecheck(). When rated event happens at a very quick rate, the ppsratecheck() is not only racy, but also becomes a performance bottleneck. Together with: rrs, jtl	2016-12-09 17:58:34 +00:00
Robert Watson	52b42f6287	Regnerate system-call definitions following r309677 correcting a whitespace glitch in syscalls.master.	2016-12-07 16:12:27 +00:00
Robert Watson	82d8d2b8bc	Replace spaces with tabs in definition of SCTP system calls, for consistency with the remainder of the syscalls.master file. This problem does not occur in the freebsd32 version of the same system calls.	2016-12-07 16:11:55 +00:00
Eric van Gyzen	3d32d4a7c9	Export the whole thread name in kinfo_proc kinfo_proc::ki_tdname is three characters shorter than thread::td_name. Add a ki_moretdname field for these three extra characters. Add the new field to kinfo_proc32, as well. Update all in-tree consumers to read the new field and assemble the full name, except for lldb's HostThreadFreeBSD.cpp, which I will handle separately. Bump __FreeBSD_version. Reviewed by: kib MFC after: 1 week Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D8722	2016-12-07 15:04:22 +00:00
Konstantin Belousov	435da98564	Restructure the code to handle reporting of non-exited processes from wait(2). - Do not acquire the process spinlock if neither WTRAPPED nor WUNTRACED options were passed [1]. - Extract the code to report alive process into a new helper report_alive_proc() and use it for trapped, stopped and continued childrens. Note that the process spinlock is required around the WTRAPPED and WUNTRACED tests, because P_STOPPED_TRACE and P_STOPPED_SIG flags are set before other threads are stopped at the suspension point, and that threads increment p_suspcount while owning only the process spinlock, the process lock is dropped by them. If the spinlock is not taken for tests, the syscall thread might miss both p_suspcount increment and wakeup in wakeup in thread_suspend_switch(). Based on the submission by: mjg [1] Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-12-04 20:44:58 +00:00
Eric van Gyzen	ff07dd913e	thr_set_name(): silently truncate the given name as needed Instead of failing with ENAMETOOLONG, which is swallowed by pthread_set_name_np() anyway, truncate the given name to MAXCOMLEN+1 bytes. This is more likely what the user wants, and saves the caller from truncating it before the call (which was the only recourse). Polish pthread_set_name_np(3) and add a .Xr to thr_set_name(2) so the user might find the documentation for this behavior. Reviewed by: jilles MFC after: 3 days Sponsored by: Dell EMC	2016-12-03 01:14:21 +00:00
Mateusz Guzik	a2d3554542	vfs: provide fake locking primitives for the crossmp vnode Since the vnode is only expected to be shared locked, we can save a little overhead by only pretending we are locking in the first place. Reviewed by: kib Tested by: pho	2016-12-02 18:03:15 +00:00
Mateusz Guzik	a4ce25b5b0	vfs: fix a whitespace nit in r309307	2016-11-30 02:17:03 +00:00
Mateusz Guzik	1babea0341	vfs: avoid VOP_ISLOCKED in the common case in lookup	2016-11-30 02:14:53 +00:00
Mark Johnston	64910ddbff	Launder VPO_NOSYNC pages upon vnode deactivation. As of r234483, vnode deactivation causes non-VPO_NOSYNC pages to be laundered. This behaviour has two problems: 1. Dirty VPO_NOSYNC pages must be laundered before the vnode can be reclaimed, and this work may be unfairly deferred to the vnlru process or an unrelated application when the system is under vnode pressure. 2. Deactivation of a vnode with dirty VPO_NOSYNC pages requires a scan of the corresponding VM object's memq for non-VPO_NOSYNC dirty pages; if the laundry thread needs to launder pages from an unreferenced such vnode, it will reactivate and deactivate the vnode with each laundering, potentially resulting in a large number of expensive scans. Therefore, ensure that all dirty pages are laundered upon deactivation, i.e., when all maps of the vnode are removed and all references are released. Reviewed by: alc, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8641	2016-11-26 21:00:27 +00:00
John Baldwin	9f3aabb9eb	Permit timed sleeps for threads other than thread0 before timers are working. The callout subsystem already handles early callouts and schedules the first clock interrupt appropriately based on the currently pending callouts. The one nit to fix was that callouts scheduled via C_HARDCLOCK during early boot could fire too early once timers were enabled as the per-CPU base time is always zero until timers are initialized. The change in callout_when() handles this case by using the current uptime as the base time of the callout during bootup if the per-CPU base time is zero. Reviewed by: kib MFC after: 2 weeks Sponsored by: Netflix	2016-11-25 18:02:43 +00:00
Mateusz Guzik	746b6e8176	wait: avoid relocking the child if proc_to_reap returns 1 proc_to_reap would always unlock. However, if it returned 1, kern_wait6 would immediately lock it again. Save the dance. Reviewed by: kib	2016-11-24 18:21:48 +00:00
Mateusz Guzik	8b0e0c91e0	cache: ensure that the number of bucket locks does not exceed hash size The size can be changed by side effect of modifying kern.maxvnodes. Since numbucketlocks was not modified, setting a sufficiently low value would give more locks than actual buckets, which would then lead to corruption. Force the number of buckets to be not smaller. Note this should not matter for real world cases. Reported and tested by: pho	2016-11-23 19:50:12 +00:00
Mark Johnston	99e6e1930c	Release laundered vnode pages to the head of the inactive queue. The swap pager enqueues laundered pages near the head of the inactive queue to avoid another trip through LRU before reclamation. This change adds support for this behaviour to the vnode pager and makes use of it in UFS and ext2fs. Some ioflag handling is consolidated into a common subroutine so that this support can be easily extended to other filesystems which make use of the buffer cache. No changes are needed for ZFS since its putpages routine always undirties the pages before returning, and the laundry thread requeues the pages appropriately in this case. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D8589	2016-11-23 17:53:07 +00:00
Ruslan Bukin	dd7d4f199e	Revert r306186 ("Adjust the sopt_val pointer on bigendian systems"). This logic doesn't work with bigger sopt_valsize (e.g. when ipfw passing 2048 bytes rule). Reported by: adrian Sponsored by: DARPA, AFRL	2016-11-22 18:31:43 +00:00
Konstantin Belousov	eb962424ba	Restore vnode pager statistic for buffer pagers. Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8585	2016-11-22 10:06:39 +00:00
John Baldwin	5d8cce1764	Initialize 'ticks' earlier in boot after 'hz' is set. This avoids the time-warp after kthreads have started running and the required fixup to td_slptick and td_blktick in the EARLY_AP_STARTUP case. Now, 'ticks' is initialized before any kthreads are created or any context switches are performed. Tested by: gavin MFC after: 2 weeks Sponsored by: Netflix	2016-11-22 01:02:59 +00:00
Robert Watson	1279fdafce	Audit 'fd' and 'cmd' arguments to fcntl(2), and when generating BSM, always audit the file-descriptor number and vnode information for all fnctl(2) commands, not just locking-related ones. This was likely an oversight in the original adaptation of this code from XNU. MFC after: 3 days Sponsored by: DARPA, AFRL	2016-11-22 00:41:24 +00:00
Gleb Smirnoff	00b5ffde8e	Add flag SF_USER_READAHEAD to sendfile(2). When specified, the syscall won't do any speculations about readahead, and use exactly the amount of readahead specified by user. E.g. setting SF_FLAGS(0, SF_USER_READAHEAD) will guarantee that no readahead at all will be performed.	2016-11-17 21:36:18 +00:00
Gleb Smirnoff	5dba303d01	Use bogus_page to properly reduce number of I/Os in sendfile(2). The new sendfile_swapin() loop works this way: - Find first invalid page in the request. - Do vm_pager_has_page() and get count of pages, that can be taken in single I/O. - Trim valid pages from the end of the request. - Cycle through the request and substitute to bogus_page all valid pages that are in the middle of the request. - After I/O launched (pager copies array of pages into buf(9), it is important to restore proper page pointers with help vm_page_lookup(). Count bogus pages used and report them in sendfile stats.	2016-11-17 21:02:55 +00:00
Ruslan Bukin	6e18247a3d	Fix build when no INET and INET6 in kernel config. Submitted by: kan Sponsored by: DARPA, AFRL	2016-11-17 16:13:30 +00:00
Alan Cox	7667839a7e	Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497	2016-11-15 18:22:50 +00:00
Mateusz Guzik	6ce45c6ac3	cache: plug a write-only variable in cache_negative_zap_one	2016-11-15 03:43:10 +00:00
Mateusz Guzik	317cac6d5a	cache: fix a race between entry removal and demotion The negative list shrinker can demote an entry with only hotlist + neglist locks held. On the other hand entry removal possibly sets the NCF_DVDROP without aformentioned locks held prior to detaching it from the respective netlist., which can lose the update made by the shrinker. Reported and tested by: truckman	2016-11-15 03:38:05 +00:00
Adrian Chadd	8ffa01a061	[mips] enable relbuf on mips for now to work around page aliasing in mips hardware. Although the higher end MIPS hardware handles cache aliasing issues in hardware, the older cores (r4k, etc) and some compile versions of the newer cores (mips24k, mips34k, mips74k) don't have this feature. This means we end up with some very unfortunate behaviour that was made very obvious by some recent changes to the FFS pager by kib. So, flip this off until we get our MIPS pmap/cache code upgraded to handle aliased pages in software. Discussed with: kib, bsdimp, juli	2016-11-15 01:41:45 +00:00
Adrian Chadd	0046bef85a	[mips] make UMTX_CHAINS configurable at compile time. The default (512) wastes quite a bit of space which doesn't really buy us much on highly embedded systems which don't take a lot of locks in parallel. This makes it at least build time configurable so people can experiment.	2016-11-15 01:34:38 +00:00
Konstantin Belousov	ae44bb0146	Initialize reserved bytes in struct mq_attr and its 32compat counterpart, to avoid kernel stack content leak in kmq_setattr(2) syscall. Also slightly simplify the checks around copyout()s. Reported by: Vlad Tsyrklevich <vlad902+spam@gmail.com> PR: 214488 MFC after: 1 week	2016-11-14 13:20:10 +00:00
Konstantin Belousov	714b7df502	Provide simple mutual exclusion between mount point update and unmount. Currently mount update keeps vfs_busy(9) reference on the mount point during MNT_UPDATE VFS_MOUNT() vfsops call. This already provides the exclusion, but is problematic for filesystems which need to perform namei(9) during VFS_MOUNT(MNT_UPDATE) operations, e.g. to refresh mnt_from path, because namei(9) must not be called while the vfs_busy(9) reference is owned. Check for MNT_UPDATE flag before setting MNTK_UNMOUNT, and for MNTK_UNMOUNT before entering innards of vfs_domount_update(), failing syscalls with EBUSY if conflict is detected. Keep vfs_busy(9) reference around VFS_MOUNT(MNT_UPDATE) calls still to not change VFS KPI. In the update path in ffs_mount(), drop vfs_busy() reference around namei(), which is now safe due to unmount never executing in parallel with VFS_MOUNT(MNT_UPDATE), and which avoids the deadlock. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-11-13 21:49:51 +00:00
Konstantin Belousov	9eb8f495b8	Move common cleanup code into helper. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-13 21:39:55 +00:00
Justin Hibbits	6487a709f3	Add two new ddb commands: show device/show all devices Shows several useful pieces of information from the device including the softc and ivars pointers.	2016-11-13 00:46:11 +00:00
John Baldwin	892f0ab0ab	Allow scheduling during early boot. - Send IPI wakeups once SMP is started even if cold is true. - Permit preemptions when cold is true. These changes are needed for EARLY_AP_STARTUP. MFC after: 2 weeks Sponsored by: Netflix	2016-11-12 00:23:09 +00:00
John Baldwin	a6b91f0f45	Don't place threads on the run queue after waking up other CPUs. The other CPU might resume and see a still-empty runq and go back to sleep before sched_add() adds the thread to the runq. This results in a lost wakeup and a potential hang if the system is otherwise completely idle. The race originated due to a micro-optimization (my fault) in 4BSD in that it avoided putting a thread on the run queue if the scheduler was going to preempt to the new thread. To avoid complexity while fixing this race, just drop this optimization. 4BSD now always sets the "owepreempt" flag when a preemption is warranted and defers the actual preemption to the thread_unlock of the caller the same as ULE. MFC after: 2 weeks Sponsored by: Netflix	2016-11-12 00:14:13 +00:00
Bryan Drewery	28323add09	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
Konstantin Belousov	9a639daf77	Tweaks for the buffer pager. Pass current thread credentials instead of NOCRED. Only allow unmapped buffers for filesystem which proclaimed the support. For all filesystems which currently use buffer pager (UFS, msdosfs and cd9660), the changes are effectively nop. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-08 10:10:55 +00:00
Konstantin Belousov	9bd4f0a2c6	vn_fullpath1() checked VV_ROOT and then unreferenced vp->v_mount->mnt_vnodecovered unlocked. This allowed unmount to race. Lock vnode after we noticed the VV_ROOT flag. See comments for explanation why unlocked check for the flag is considered safe. Reported and tested by: avg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-07 10:55:56 +00:00
Konstantin Belousov	75409ce1dd	Remove remnants of the recursive sleep support. Instead assert that we never try to sleep while the thread is on a sleepqueue. Reviewed by: jhb Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D8422	2016-11-02 20:57:20 +00:00
Konstantin Belousov	7359fdcf5f	Allow some dotdot lookups in capability mode. If dotdot lookup does not escape from the file descriptor passed as the lookup root, we can allow the component traversal. Track the directories traversed, and check the result of dotdot lookup against the recorded list of the directory vnodes. Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently disabled by default until more verification of the approach is done. Disallow non-local filesystems for dotdot, since remote server might conspire with the local process to allow it to escape the namespace. This might be too cautious, provide the knob vfs.lookup_cap_dotdot_nonlocal to override as well. Idea by: rwatson Discussed with: emaste, jonathan, rwatson Reviewed by: mjg (previous version) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 week Differential revision: https://reviews.freebsd.org/D8110	2016-11-02 12:43:15 +00:00
Konstantin Belousov	1bf6a0900d	Remove tautological casts. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-02 12:10:39 +00:00
Konstantin Belousov	ec84693535	Style fixes. Discussed with: emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-02 12:02:31 +00:00
Edward Tomasz Napierala	53ae7e833c	Fix getfsstat(2) with MNT_WAIT to not skip filesystems that are in the process of being unmounted. Previously it would skip them, even if the unmount eventually failed eg due to the filesystem being busy. This behaviour broke autounmountd(8) - if you tried to manually unmount a mounted filesystem, using 'automount -u', and the autounmountd attempted to refresh the filesystem list in that very moment, it would conclude that the filesystem got unmounted and not try to unmount it afterwards. Reviewed by: kib@ Tested by: pho@ MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8030	2016-11-02 09:43:19 +00:00
Conrad Meyer	8532d381a9	Add BUF_TRACKING and FULL_BUF_TRACKING buffer debugging Upstream the BUF_TRACKING and FULL_BUF_TRACKING buffer debugging code. This can be handy in tracking down what code touched hung bios and bufs last. The full history is especially useful, but adds enough bloat that it shouldn't be enabled in release builds. Function names (or arbitrary string constants) are tracked in a fixed-size ring in bufs. Bios gain a pointer to the upper buf for tracking. SCSI CCBs gain a pointer to the upper bio for tracking. Reviewed by: markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8366	2016-10-31 23:09:52 +00:00
Mark Johnston	ac91917211	Fix WITNESS hints for pagequeue locks. MFC after: 1 week	2016-10-29 20:01:48 +00:00
Edward Tomasz Napierala	6eeff7a7b2	Fix getfsstat(2) handling of flags. The 'flags' argument is an enum, not a bitfield. For the intended usage - being passed either MNT_WAIT, or MNT_NOWAIT - this shouldn't introduce any changes in behaviour. Reviewed by: jhb@ MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8373	2016-10-29 12:38:30 +00:00
Konstantin Belousov	c39baa7480	Generalize UFS buffer pager to allow it serving other filesystems which also use buffer cache. Most important addition to the code is the handling of filesystems where the block size is less than the machine page size, which might require reading several buffers to validate single page. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:43:59 +00:00
Marcel Moolenaar	07f862a769	Include <stdarg.h> instead of <machine/stdarg.h> when compiled as part of libsbuf. The former is the standard header, and allows us to compile libsbuf on macOS/linux.	2016-10-24 18:03:04 +00:00
Konstantin Belousov	835c2787be	Handle broadcast NMIs. On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs reporting hardware errors are broadcasted to all CPUs. When kernel is configured to enter kdb on NMI, the outcome is problematic, because each CPU tries to enter kdb. All CPUs are executing NMI handlers, which set the latches disabling the nested NMI delivery; this means that stop_cpus_hard(), used by kdb_enter() to stop other cpus by broadcasting IPI_STOP_HARD NMI, cannot work. One indication of this is the harmless but annoying diagnostic "timeout stopping cpus". Much more harming behaviour is that because all CPUs try to enter kdb, and if ddb is used as debugger, all CPUs issue prompt on console and race for the input, not to mention the simultaneous use of the ddb shared state. Try to fix this by introducing a pseudo-lock for simultaneous attempts to handle NMIs. If one core happens to enter NMI trap handler, other cores see it and simulate reception of the IPI_STOP_HARD. More, generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting for the acknowledgement, relying on the nmi handler on other cores suspending and then restarting the CPU. Since it is impossible to detect at runtime whether some stray NMI is broadcast or unicast, add a knob for administrator (really developer) to configure debugging NMI handling mode. The updated patch was debugged with the help from Andrey Gapon (avg) and discussed with him. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D8249	2016-10-24 16:40:27 +00:00
Konstantin Belousov	55ee7a4c5f	In the fueword64(9) wrapper for architectures which do not implemented native fueword64(9) still, use proper type for local where fuword64() result is stored. Note that fueword64() is unused in the tree. Submitted by: Chunhui He <hchunhui@mail.ustc.edu.cn> PR: 212520 MFC after: 1 week	2016-10-23 11:23:17 +00:00

1 2 3 4 5 ...

15161 Commits