freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	e3d8f8fed4	When deallocating the vm object in elf_map_insert() due to vm_map_insert() failure, drop the vnode lock around the call to vm_object_deallocate(). Since the deallocated object is the vm object of the vnode, we might get the vnode lock recursion there. In fact, it is almost impossible to make vm_map_insert() failing there on stock kernel. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-03-01 10:22:07 +00:00
Mateusz Guzik	a21018063b	locks: ensure proper barriers are used with atomic ops when necessary Unclear how, but the locking routine for mutexes was using the release barrier instead of acquire. This must have been either a copy-pasto or bad completion. Going through other uses of atomics shows no barriers in: - upgrade routines (addressed in this patch) - sections protected with turnstile locks - this should be fine as necessary barriers are in the worst case provided by turnstile unlock I would like to thank Mark Millard and andreast@ for reporting the problem and testing previous patches before the issue got identified. ps. .-'---`-. ,' `. \| \ \| \ \ _ \ ,\ _ ,'-,/-)\ ( * \ \,' ,' ,'-) `._,) -',-') \/ ''/ ) / / / ,'-' Hardware provided by: IBM LTC	2017-03-01 05:06:21 +00:00
Scott Long	38e41e66e5	Provide a comment on why stdio.h needs to be included.	2017-02-28 21:27:51 +00:00
Jung-uk Kim	c4e929946c	Include stdio.h to fix libsbuf build. Reviewed by: scottl	2017-02-28 21:18:45 +00:00
Scott Long	388f3ce6c3	Implement sbuf_prf(), which takes an sbuf and outputs it to stdout in the non-kernel case and to the console+log in the kernel case. For the kernel case it hooks the putbuf() machinery underneath printf(9) so that the buffer is written completely atomically and without a copy into another temporary buffer. This is useful for fixing compound console/log messages that become broken and interleaved when multiple threads are competing for the console. Reviewed by: ken, imp Sponsored by: Netflix	2017-02-28 18:25:06 +00:00
Gleb Smirnoff	efe3b0de14	Remove SVR4 (System V Release 4) binary compatibility support. UNIX System V Release 4 is operating system released in 1988. It ceased to exist in early 2000-s.	2017-02-28 05:14:42 +00:00
Konstantin Belousov	aca4bb9112	Do not leak mount references for dying threads. Thread might create a condition for delayed SU cleanup, which creates a reference to the mount point in td_su, but exit without returning through userret(), e.g. when terminating due to single-threading or process exit. In this case, td_su reference is not dropped and mount point cannot be freed. Handle the situation by clearing td_su also in the thread destructor and in exit1(). softdep_ast_cleanup() has to receive the thread as argument, since e.g. thread destructor is executed in different context. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-02-25 10:38:18 +00:00
Konstantin Belousov	8cd5962571	Remove cpu_deepest_sleep variable. On Core2 and older Intel CPUs, where TSC stops in C2, system does not allow C2 entrance if timecounter hardware is TSC. This is done by tc_windup() which tests for TC_FLAGS_C2STOP flag of the new timecounter and increases cpu_disable_c2_sleep if flag is set. Right now init_TSC_tc() only sets the flag if cpu_deepest_sleep >= 2, but TSC is initialized too early for this variable to be set by acpi_cpu.c. There is no reason to require that ACPI reported C2 and deeper states to set TC_FLAGS_C2STOP, so remove cpu_deepest_sleep test from init_TSC_tc() condition. And since this is the only use of the variable, remove it at all. Reported and submitted by: Jia-Shiun Li <jiashiun@gmail.com> Suggested by: jhb MFC after: 2 weeks	2017-02-24 16:11:55 +00:00
Warner Losh	bbf6e5144e	Cast values to (int) before comparing them to the range of the enum. This ensures they are in range w/o the warnings.	2017-02-24 01:39:12 +00:00
Warner Losh	df1c30f6bd	KDTRACE_HOOKS isn't guaranteed to be defined. Change to check to see if it is defined or not rather than if it is non-zero. Sponsored by: Netflix, Inc	2017-02-24 01:39:08 +00:00
Mateusz Guzik	dfaa7859d6	mtx: microoptimize lockstat handling in spin mutexes and thread lock While here make the code compilablle on kernels with LOCK_PROFILING but without KDTRACE_HOOKS.	2017-02-23 22:46:01 +00:00
Eric van Gyzen	b215ceaaec	Add sem_clockwait_np() This function allows the caller to specify the reference clock and choose between absolute and relative mode. In relative mode, the remaining time can be returned. The API is similar to clock_nanosleep(3). Thanks to Ed Schouten for that suggestion. While I'm here, reduce the sleep time in the semaphore "child" test to greatly reduce its runtime. Also add a reasonable timeout. Reviewed by: ed (userland) MFC after: 2 weeks Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9656	2017-02-23 19:36:38 +00:00
Jonathan T. Looney	c9cde8251c	Fix a panic during boot caused by inadequate locking of some vt(4) driver data structures. vt_change_font() calls vtbuf_grow() to change some vt driver data structures. It uses TF_MUTE to prevent the console from trying to use those data structures while it changes them. During the early stage of the boot process, the vt driver's tc_done routine uses those data structures; however, it is currently called outside the TF_MUTE check. Move the tc_done routine inside the locked TF_MUTE check. PR: 217282 Reviewed by: ed, ray Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9709	2017-02-23 01:18:47 +00:00
Warner Losh	6fec662c86	Make the code match the comments: If we have ANY buf's that failed then return EAGAIN. The current code just returns that if the LAST buf failed. Reviewed by: kib@, trasz@ Differential Revision: https://reviews.freebsd.org/D9677	2017-02-21 18:56:06 +00:00
John Baldwin	150599be12	Consolidate statements to initialize files. Previously, the first lines of various generated files from system call tables were generated in two sections. Some of the initialization was done in BEGIN, and the rest was done when the first line was encountered. The main reason for this split before r313564 was that most of the initialization done in the second section depended on the $FreeBSD$ tag extracted from the system call table. Now that the $FreeBSD$ tag is no longer used, consolidate all of the file initialization in the BEGIN section. This change was tested by confirming that the content of generated files did not change.	2017-02-20 20:37:25 +00:00
Mateusz Guzik	13d2ef0f3a	mtx: fix spin mutexes interaction with failed fcmpset While doing so move recursion support down to the fallback routine.	2017-02-20 19:08:36 +00:00
Eric Badger	82a4538f31	Defer ptracestop() signals that cannot be delivered immediately When a thread is stopped in ptracestop(), the ptrace(2) user may request a signal be delivered upon resumption of the thread. Heretofore, those signals were discarded unless ptracestop()'s caller was issignal(). Fix this by modifying ptracestop() to queue up signals requested by the ptrace user that will be delivered when possible. Take special care when the signal is SIGKILL (usually generated from a PT_KILL request); no new stop events should be triggered after a PT_KILL. Add a number of tests for the new functionality. Several tests were authored by jhb. PR: 212607 Reviewed by: kib Approved by: kib (mentor) MFC after: 2 weeks Sponsored by: Dell EMC In collaboration with: jhb Differential Revision: https://reviews.freebsd.org/D9260	2017-02-20 15:53:16 +00:00
Konstantin Belousov	ecc6c515ab	Apply noexec mount option for mmap(PROT_EXEC). Right now the noexec mount option disallows image activators to try execve the files on the mount point. Also, after r127187, noexec also limits max_prot map entries permissions for mappings of files from such mounts, but not the actual mapping permissions. As result, the API behaviour is inconsistent. The files from noexec mount can be mapped with PROT_EXEC, but if mprotect(2) drops execution permission, it cannot be re-enabled later. Make this consistent logically and aligned with behaviour of other systems, by disallowing PROT_EXEC for mmap(2). Note that this change only ensures aligned results from mmap(2) and mprotect(2), it does not prevent actual code execution from files coming from noexec mount. Such files can always be read into anonymous executable memory and executed from there. Reported by: shamaz.mazum@gmail.com PR: 217062 Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-02-19 20:51:04 +00:00
Mateusz Guzik	b247fd395d	locks: make trylock routines check for 'unowned' value Since fcmpset can fail without lock contention e.g. on arm, it was possible to get spurious failures when the caller was expecting the primitive to succeed. Reported by: mmel	2017-02-19 16:28:46 +00:00
Hans Petter Selasky	316e092a77	Make sure the thread constructor and destructor eventhandlers are called for all threads belonging to a procedure. Currently the first thread in a procedure is kept around as an optimisation step and is never freed. Because the first thread in a procedure is never freed nor allocated, its destructor and constructor callbacks are never called which means per thread structures allocated by dtrace and the Linux emulation layers for example, might be present for threads which don't need these structures. This patch adds a thread construction and destruction call for the first thread in a procedure. Tested: dtrace, linux emulation Reviewed by: kib @ MFC after: 1 week Sponsored by: Mellanox Technologies	2017-02-19 13:15:33 +00:00
Jason A. Harmening	e2a8d17887	Bring back r313037, with fixes for mips: Implement get_pcpu() for amd64/sparc64/mips/powerpc, and use it to replace pcpu_find(curcpu) in MI code. Reviewed by: andreast, kan, lidl Tested by: lidl(mips, sparc64), andreast(powerpc) Differential Revision: https://reviews.freebsd.org/D9587	2017-02-19 02:03:09 +00:00
Mateusz Guzik	5c5df0d99b	locks: clean up trylock primitives In particular thius reduces accesses of the lock itself.	2017-02-18 22:06:03 +00:00
Bryan Drewery	8e31b510b0	Fix panic with unlocked vnode to vrecycle(). MFC after: 2 weeks	2017-02-18 05:07:53 +00:00
Mateusz Guzik	a24c8eb847	mtx: plug the 'opts' argument when not used	2017-02-18 01:52:10 +00:00
Mateusz Guzik	cbebea4e67	mtx: get rid of file/line args from slow paths if they are unused This denotes changes which went in by accident in r313877. On most production kernels both said parameters are zeroed and have nothing reading them in either __mtx_lock_sleep or __mtx_unlock_sleep. Thus this change stops passing them by internal consumers which this is the case. Kernel modules use _flags variants which are not affected kbi-wise.	2017-02-17 15:40:24 +00:00
Mateusz Guzik	09f1319acd	mtx: restrict r313875 to kernels without LOCK_PROFILING	2017-02-17 15:34:40 +00:00
Mateusz Guzik	7640beb920	mtx: microoptimize lockstat handling in __mtx_lock_sleep This saves a function call and multiple branches after the lock is acquired.	2017-02-17 14:55:59 +00:00
Mateusz Guzik	0108a98012	sx: fix compilation on UP kernels after r313855 sx primitives use inlines as opposed to macros. Change the tested condition to LOCK_DEBUG which covers the case, but is slightly overzelaous. Reported by: kib	2017-02-17 10:58:12 +00:00
Mateusz Guzik	91fa47076d	Introduce SCHEDULER_STOPPED_TD for use when the thread pointer was already read Sprinkle in few places.	2017-02-17 06:45:04 +00:00
Mateusz Guzik	ffd5c94c4f	locks: let primitives for modules unlock without always goging to the slsow path It is only needed if the LOCK_PROFILING is enabled. It has to always check if the lock is about to be released which requires an avoidable read if the option is not specified..	2017-02-17 05:39:40 +00:00
Mateusz Guzik	afa39f7a32	locks: remove SCHEDULER_STOPPED checks from primitives for modules They all fallback to the slow path if necessary and the check is there. This means a panicked kernel executing code from modules will be able to succeed doing actual lock/unlock, but this was already the case for core code which has said primitives inlined.	2017-02-17 05:09:51 +00:00
Ryan Stone	27ee18ad33	Revert r313814 and r313816 Something evidently got mangled in my git tree in between testing and review, as an old and broken version of the patch was apparently submitted to svn. Revert this while I work out what went wrong. Reported by: tuexen Pointy hat to: rstone	2017-02-16 21:18:31 +00:00
Eric van Gyzen	8144690af4	Use inet_ntoa_r() instead of inet_ntoa() throughout the kernel inet_ntoa() cannot be used safely in a multithreaded environment because it uses a static local buffer. Instead, use inet_ntoa_r() with a buffer on the caller's stack. Suggested by: glebius, emaste Reviewed by: gnn MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9625	2017-02-16 20:47:41 +00:00
Ryan Stone	3600f4ba35	Fix a typo in my previous commit Somehow in the late stages of testing my sched_ule patch, a character was accidentally deleted from the file. Correct this. While I'm committing anyway, the previous commit message requires some clarification: in the normal case of unlending priority after releasing a mutex, the thread that was doing the lending will be woken up and immediately become the highest-priority thread, and in that case no priority inversion would take place. However, if that thread is pinned to a different CPU, then the currently running thread that just had its priority lowered will not be preempted and then priority inversion can occur. Reported by: O. Hartmann (typo), jhb (scheduler clarification) MFC after: 1 month Pointy hat to: rstone	2017-02-16 20:06:21 +00:00
Ryan Stone	09ae7c4814	Check for preemption after lowering a thread's priority When a high-priority thread is waiting for a mutex held by a low-priority thread, it temporarily lends its priority to the low-priority thread to prevent priority inversion. When the mutex is released, the lent priority is revoked and the low-priority thread goes back to its original priority. When the priority of that thread is lowered (through a call to sched_priority()), the schedule was not checking whether there is now a high-priority thread in the run queue. This can cause threads with real-time priority to be starved in the run queue while the low-priority thread finishes its quantum. Fix this by explicitly checking whether preemption is necessary when a thread's priority is lowered. Sponsored by: Dell EMC Isilon Obtained from: Sandvine Inc Differential Revision: https://reviews.freebsd.org/D9518 Reviewed by: Jeff Roberson (ule) MFC after: 1 month	2017-02-16 19:41:13 +00:00
Mark Johnston	c6a4ba5a38	Apply MADV_FREE to exec_map entries only after a lowmem event. This effectively provides the same benefit as applying MADV_FREE inline upon every execve, since the page daemon invokes lowmem handlers prior to scanning the inactive queue. It also has less overhead; the cost of applying MADV_FREE is very noticeable on many-CPU systems since it includes that of a TLB shootdown of global PTEs. For instance, this change nearly halves the system CPU usage during a buildkernel on a 128-vCPU EC2 instance (with some other patches applied). Benchmarked by: cperciva (earlier version) Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D9586	2017-02-15 01:50:58 +00:00
Eric Badger	28d2efa983	sleepq_catch_signals: do thread suspension before signal check Since locks are dropped when a thread suspends, it's possible for another thread to deliver a signal to the suspended thread. If the thread awakens from suspension without checking for signals, it may go to sleep despite having a pending signal that should wake it up. Therefore the suspension check is done first, so any signals sent while suspended will be caught in the subsequent signal check. Reviewed by: kib Approved by: kib (mentor) MFC after: 2 weeks Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D9530	2017-02-14 17:13:23 +00:00
Andriy Gapon	937c1b0757	try to fix RACCT_RSS accounting There could be a race between the vm daemon setting RACCT_RSS based on the vm space and vmspace_exit (called from exit1) resetting RACCT_RSS to zero. In that case we can get a zombie process with non-zero RACCT_RSS. If the process is jailed, that may break accounting for the jail. There could be other consequences. Fix this race in the vm daemon by updating RACCT_RSS only when a process is in the normal state. Also, make accounting a little bit more accurate by refreshing the page resident count after calling vm_pageout_map_deactivate_pages(). Finally, add an assert that the RSS is zero when a process is reaped. PR: 210315 Reviewed by: trasz Differential Revision: https://reviews.freebsd.org/D9464	2017-02-14 13:54:05 +00:00
Konstantin Belousov	496ab0532d	Rework r313352. Rename kern_vm_* functions to kern_*. Move the prototypes to syscallsubr.h. Also change Mach VM types to uintptr_t/size_t as needed, to avoid headers pollution. Requested by: alc, jhb Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D9535	2017-02-13 09:04:38 +00:00
Konstantin Belousov	987ff18184	Consistently handle negative or wrapping offsets in the mmap(2) syscalls. For regular files and posix shared memory, POSIX requires that [offset, offset + size) range is legitimate. At the maping time, check that offset is not negative. Allowing negative offsets might expose the data that filesystem put into vm_object for internal use, esp. due to OFF_TO_IDX() signess treatment. Fault handler verifies that the mapped range is valid, assuming that mmap(2) checked that arithmetic gives no undefined results. For device mappings, leave the semantic of negative offsets to the driver. Correct object page index calculation to not erronously propagate sign. In either case, disallow overflow of offset + size. Update mmap(2) man page to explain the requirement of the range validity, and behaviour when the range becomes invalid after mapping. Reported and tested by: royger (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-02-12 21:05:44 +00:00
Konstantin Belousov	f2277b64ec	Switch copyout_map() to use vm_mmap_object() instead of vm_mmap(). This is both a microoptimization and a move of the consumer to more commonly used vm function. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-02-12 20:54:31 +00:00
Mateusz Guzik	c4a48867f1	lockmgr: implement fast path The main lockmgr routine takes 8 arguments which makes it impossible to tail-call it by the intermediate vop_stdlock/unlock routines. The routine itself starts with an if-forest and reads from the lock itself several times. This slows things down both single- and multi-threaded. With the patch single-threaded fstats go 4% up and multithreaded up to ~27%. Note that there is still a lot of room for improvement. Reviewed by: kib Tested by: pho	2017-02-12 09:49:44 +00:00
John Baldwin	bb9b710477	Regenerate all the system call tables to drop "created from" lines. One of the ibcs2 files contains some actual changes (new headers) as it hasn't been regenerated after older changes to makesyscalls.sh.	2017-02-10 19:45:02 +00:00
John Baldwin	807a7231f2	Drop the "created from" line from files generated by makesyscalls.sh. This information is less useful when the generated files are included in source control along with the source. If needed it can be reconstructed from the $FreeBSD$ tag in the generated file. Removing this information from the generated output permits committing the generated files along with the change to the system call master list without having inconsistent metadata in the generated files. Reviewed by: emaste, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D9497	2017-02-10 19:25:52 +00:00
Konstantin Belousov	e83a71c656	Fix r313495. The file type DTYPE_VNODE can be assigned as a fallback if VOP_OPEN() did not initialized file type. This is a typical code path used by normal file systems. Also, change error returned for inappropriate file type used for O_EXLOCK to EOPNOTSUPP, as declared in the open(2) man page. Reported by: cy, dhw, Iblis Lin <iblis@hs.ntnu.edu.tw> Tested by: dhw Sponsored by: The FreeBSD Foundation MFC after: 13 days	2017-02-10 14:49:04 +00:00
Konstantin Belousov	e628e1b919	Increase a chance of devfs_close() calling d_close cdevsw method. If a file opened over a vnode has an advisory lock set at close, vn_closefile() acquires additional vnode use reference to prevent freeing the vnode in vn_close(). Side effect is that for device vnodes, devfs_close() sees that vnode reference count is greater than one and refuses to call d_close(). Create internal version of vn_close() which can avoid dropping the vnode reference if needed, and use this to execute VOP_CLOSE() without acquiring a new reference. Note that any parallel reference to the vnode would still prevent d_close call, if the reference is not from an opened file, e.g. due to stat(2). Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-02-09 23:36:50 +00:00
Konstantin Belousov	7903b00087	Do not establish advisory locks when doing open(O_EXLOCK) or open(O_SHLOCK) for files which do not have DTYPE_VNODE type. Both flock(2) and fcntl(2) syscalls refuse to acquire advisory lock on a file which type is not DTYPE_VNODE. Do the same when lock is requested from open(2). Restructure the block in vn_open_vnode() which handles O_EXLOCK and O_SHLOCK open flags to make it easier to quit its execution earlier with an error. Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2017-02-09 23:35:57 +00:00
Mateusz Guzik	8eaaf58a5f	rwlock: fix r313454 The runlock slow path would update wrong variable before restarting the loop, in effect corrupting the state. Reported by: pho	2017-02-09 13:32:19 +00:00
Mateusz Guzik	3b3cf014fc	locks: tidy up unlock fallback paths Update comments to note these functions are reachable if lockstat is enabled. Check if the lock has any bits set before attempting unlock, which saves an unnecessary atomic operation.	2017-02-09 08:19:30 +00:00
Mateusz Guzik	834f70f32f	sx: implement slock/sunlock fast path See r313454.	2017-02-08 19:29:34 +00:00

1 2 3 4 5 ...

15351 Commits