freebsd-nq

Author	SHA1	Message	Date
Mateusz Guzik	17182b1351	session: avoid proctree lock on proc exit when possible We can get away with the common case with only proc lock held. Reviewed by: kib	2016-01-20 23:33:58 +00:00
Mateusz Guzik	6760182c5d	session: tidy up fixjobc This stops abusing the 'p' pointer for iteration over children processes and gets rid of useless locking around PRS_ZOMBIE check. Suggested by: kib	2016-01-20 23:22:36 +00:00
Marius Strobl	9750d9e5b6	Fix tty_drain() and, thus, TIOCDRAIN of the current tty(4) incarnation to actually wait until the TX FIFOs of UARTs have be drained before returning. This is done by bringing the equivalent of the TS_BUSY flag found in the previous implementation back in an ABI-preserving way. Reported and tested by: Patrick Powell Most likely, drivers for USB-serial-adapters likewise incorporating TX FIFOs as well as other terminal devices that buffer output in some form should also provide implementations of tsw_busy. MFC after: 3 days	2016-01-19 23:34:27 +00:00
John Baldwin	8a4dc40ff4	Various cleanups to the main function for AIO kernel processes: - Pull the vmspace logic out into helper functions and reduce duplication. Operations on the vmspace are all isolated to vm_map.c, but it now exports a new 'vmspace_switch_aio' for use by AIO kernel processes. - When an AIO kernel process wants to exit, break out of the main loop and perform cleanup after the loop end. This reduces a lot of indentation and allows cleanup to more closely mirror setup actions before the loop starts. - Convert a DIAGNOSTIC to KASSERT(). - Replace mycp with more typical 'p'. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4990	2016-01-19 21:37:51 +00:00
John Baldwin	f2e7f06a0d	Don't create a dedicated session for each AIO kernel process. This code dates back to the initial AIO support and the commit log does not explain why it is needed. However, I cannot find anything in the AIO code or the various file methods (fo_read/fo_write) that would change behavior due to using a private session instead of proc0's session. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4988	2016-01-19 20:46:30 +00:00
Mark Johnston	793c381706	Add vrefl(), a locked variant of vref(9). This API has no in-tree consumers at the moment but is useful to at least one out-of-tree consumer, and naturally complements existing vnode refcount functions (vholdl(9), vdropl(9)). Obtained from: kib (sys/ portion) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4947 Differential Revision: https://reviews.freebsd.org/D4953	2016-01-18 22:21:46 +00:00
Konstantin Belousov	ce958bdefe	When cleaning up from failed adv locking and checking for write, do not call VOP_CLOSE() manually. Instead, delegate the close to fo_close() performed as part of the fdrop() on the file failed to open. For this, finish constructing file on error, in particular, set f_vnode and f_ops. Forcibly resetting f_ops to badfileops disabled additional cleanups performed by fo_close() for some file types, in this case it was noted that cdevpriv data was corrupted. Since fo_close() call must be enabled for some file types, it makes more sense to enable it for all files opened through vn_open_cred(). In collaboration with: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-01-17 08:40:51 +00:00
John Baldwin	6c8fd02283	Remove aiod_timeout. It hasn't been used since the AIO code was made MPSAFE 10 years ago. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4946	2016-01-14 21:28:56 +00:00
John Baldwin	c85650cacc	Rename aiod_bio taskqueue to aiod_kick. This taskqueue is not used to handle bio requests. It is only used to run aio_kick_nowait() to spin up new aio daemon processes. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D4904	2016-01-14 20:51:48 +00:00
Gleb Smirnoff	c8358c6e0d	Call crextend() before copying old credentials to the new credentials and replace crcopysafe by crcopy as crcopysafe is is not intended to be safe in a threaded environment, it drops PROC_LOCK() in while() that can lead to unexpected results, such as overwrite kernel memory. In my POV crcopysafe() needs special attention. For now I do not see any problems with this function, but who knows. Submitted by: dchagin Found by: trinity Security: SA-16:04.linux	2016-01-14 10:16:25 +00:00
Colin Percival	c0ada0377a	Fix a bug introduced in r291716: "The problem with the approach taken both in _bus_dmamap_load_pages and bus_dmamap_load_ma_triv is that they split the request buffer into arbitrary chunks based on page boundaries, creating segments that no longer have a size that's a multiple of the sector size. This breaks drivers like blkfront (and probably other stuff)." [1] This was most easily triggered by running `fsck /` on a system running in Xen (e.g. Amazon EC2) but also showed up via growfs(8) and probably many other userland tools which access the disk directly. Patch by: royger [1] "Thinks this should be fine" by: ken	2016-01-11 20:38:39 +00:00
Dmitry Chagin	038c720553	Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall instead of vdso. An upcoming linux_base-c6 needs it. Differential Revision: https://reviews.freebsd.org/D1090 Reviewed by: kib, trasz MFC after: 1 week	2016-01-09 20:18:53 +00:00
Mark Johnston	11f9ca696e	Prevent cv_waiters wraparound. r282971 attempted to fix this problem by decrementing cv_waiters after waking up from sleeping on a condition variable, but this can result in a use-after-free if the CV is freed before all woken threads have had a chance to run. Instead, avoid incrementing cv_waiters past INT_MAX, and have cv_signal() explicitly check for sleeping threads once cv_waiters has reached this bound. Reviewed by: jhb MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4822	2016-01-09 01:56:46 +00:00
Gleb Smirnoff	2bab0c5535	New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and up to now. The new sendfile is the code that Netflix uses to send their multiple tens of gigabits of data per second. The new implementation features asynchronous I/O, when I/O operations are launched, but not awaited to be complete. An explanation of why such behavior is beneficial compared to old one is going to be too long for a commit message, so we will skip it here. Additional features of new syscall are extra flags, which provide an application more control over data sent. The SF_NOCACHE flag tells kernel that data shouldn't be cached after it was sent. The SF_READAHEAD() macro allows to specify readahead size in pages. The new syscalls is a drop in replacement. No modifications are required to applications. One can take nginx binary for stable/10 and run it successfully on head. Although SF_NODISKIO lost its original sense, as now sendfile doesn't block, and now means something completely different (tm), using the new sendfile the old way is absolutely safe. Celebrates: Netflix global launch! Sponsored by: Nginx, Inc. Sponsored by: Netflix Relnotes: yes	2016-01-08 20:34:57 +00:00
Gleb Smirnoff	829fae9063	Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM sockets, due to sendfile(2) supporting only the latter, there is a corner case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake of control data, albeit being stream socket. Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY, similar to m_demote().	2016-01-08 19:03:20 +00:00
Gleb Smirnoff	ffd1c319a9	Revert r293405: it breaks socket buffer INVARIANTS when sending control data over local sockets.	2016-01-08 17:27:23 +00:00
Gleb Smirnoff	2f2edf0a08	For SOCK_STREAM socket use sbappendstream() instead of sbappend().	2016-01-08 01:16:03 +00:00
Konstantin Belousov	0de1455462	Convert tty common code to use make_dev_s(). Tty.c was untypical in that it handled the si_drv1 issue consistently and correctly, by always checking for si_drv1 being non-NULL and sleeping if NULL. The removed code also illustrated unneeded complications in drivers which are eliminated by the use of new KPI. Reviewed by: hps, jhb Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D4746	2016-01-07 20:15:09 +00:00
Konstantin Belousov	48ce5d4cac	Provide yet another KPI for cdev creation, make_dev_s(9). Immediate problem fixed by the new KPI is the long-standing race between device creation and assignments to cdev->si_drv1 and cdev->si_drv2, which allows the window where cdevsw methods might be called with si_drv1,2 fields not yet set. Devices typically checked for NULL and returned spurious errors to usermode, and often left some methods unchecked. The new function interface is designed to be extensible, which should allow to add more features to make_dev_s(9) without inventing yet another name for function to create devices, while maintaining KPI and even KBI backward-compatibility. Reviewed by: hps, jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D4746	2016-01-07 20:08:02 +00:00
Mateusz Guzik	6b53d1bc6f	cache: ansify functions and fix some style issues No functional changes.	2016-01-07 02:04:17 +00:00
Konstantin Belousov	1041e09089	Two fixes for excessive iterations after r292326. Advance the logical block number to the lblkno of the found block plus one, instead of incrementing the block number which was used for lookup. This change skips sparcely populated buffer ranges, similar to r292325, instead of doing useless lookups. Do not restart the bnoreuselist() from the start of the range if buffer lock cannot be obtained without sleep. Only retry lookup and lock for the same queue and same logical block number. Reported by: benno Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-01-05 14:48:40 +00:00
Ian Lepore	69dcb7e771	Make the 'env' directive described in config(5) work on all architectures, providing compiled-in static environment data that is used instead of any data passed in from a boot loader. Previously 'env' worked only on i386 and arm xscale systems, because it required the MD startup code to examine the global envmode variable and decide whether to use static_env or an environment obtained from the boot loader, and set the global kern_envp accordingly. Most startup code wasn't doing so. Making things even more complex, some mips startup code uses an alternate scheme that involves calling init_static_kenv() to pass an empty buffer and its size, then uses a series of kern_setenv() calls to populate that buffer. Now all MD startup code calls init_static_kenv(), and that routine provides a single point where envmode is checked and the decision is made whether to use the compiled-in static_kenv or the values provided by the MD code. The routine also continues to serve its original purpose for mips; if a non-zero buffer size is passed the routine installs the empty buffer ready to accept kern_setenv() values. Now if the size is zero, the provided buffer full of existing env data is installed. A NULL pointer can be passed if the boot loader provides no env data; this allows the static env to be installed if envmode is set to do so. Most of the work here is a near-mechanical change to call the init function instead of directly setting kern_envp. A notable exception is in xen/pv.c; that code was originally installing a buffer full of preformatted env data along with its non-zero size (like mips code does), which would have allowed kern_setenv() calls to wipe out the preformatted data. Now it passes a zero for the size so that the buffer of data it installs is treated as non-writeable.	2016-01-02 02:53:48 +00:00
Marius Strobl	45005f7907	- (Ab)use udivx for dividing the u_int pc_cpuid when implementing CPU_ISSET(), CPU_SET etc. in sparc64 asm. This approach has the benefit of not clobbering %y, allowing to revert r222827 and partially r222828. - In r222828, CATR() already was changed to use the equivalent of PCPU_GET(cpuid) instead of the MD module ID for KTR_CPU, so belatedly also catch up with the C side of ktr(9). Originally, in r203838 CATR() was moved away from directly reading the module ID or equivalent as that became impractical with other CPU types than USI/II supported. With r222828 in place, per-CPU data generally is set up soon enough, though, that employing PCPU things in ktr(9) also for use during early stages works. - Unfortunately, an exception to the latter is the ktr(9) use in pmap_bootstrap(), which actually is run so early that even checking for bootverbose being set via the loader doesn't work. Consequently, replace the ktr(9) use in pmap_bootstrap() with OF_printf(9) and put it under #ifdef DIAGNOSTIC instead. MFC after: 3 days	2015-12-30 13:49:20 +00:00
John Baldwin	5fcfab6e32	Add ptrace(2) reporting for LWP events. Add two new LWPINFO flags: PL_FLAG_BORN and PL_FLAG_EXITED for reporting thread creation and destruction. Newly created threads will stop to report PL_FLAG_BORN before returning to userland and exiting threads will stop to report PL_FLAG_EXIT before exiting completely. Both of these events are only enabled and reported if PT_LWP_EVENTS is enabled on a process.	2015-12-29 23:25:26 +00:00
John Baldwin	d1e7a4a553	Call kern_thr_exit() instead of duplicating it. This code is missing the racct_subr() call from kern_thr_exit() and would require further code duplication in future changes. Reviewed by: kib MFC after: 1 week	2015-12-29 23:16:20 +00:00
Dmitry Chagin	3e18d701de	Verify that tv_sec value specified in settimeofday() and clock_settime() (CLOCK_REALTIME case) system calls is non negative. This commit hides a kernel panic in atrtc_settime() as the clock_ts_to_ct() does not properly convert negative tv_sec. ps. in my opinion clock_ts_to_ct() should be rewritten to properly handle negative tv_sec values. Differential Revision: https://reviews.freebsd.org/D4714 Reviewed by: kib MFC after: 1 week	2015-12-27 15:37:07 +00:00
Konstantin Belousov	1899507706	Do not substitute interpeter if the brand interpreter path is different from the interpreter path requested by the binary. Before this change, it is impossible to activate non-default interpreter for 32bit image on amd64, when /libexec/ld-elf32.so.1 file exists. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-12-26 15:40:12 +00:00
Jonathan T. Looney	d3ee0a15a4	Only allow one PT_INTERP ELF program header. This also fixes a potential memory leak for interp_buf. Differential Revision: https://reviews.freebsd.org/D4692 Reviewed by: kib MFC after: 2 weeks Sponsored by: Juniper Networks	2015-12-24 00:58:11 +00:00
Enji Cooper	853a17ad6f	Fix r292640 vim overzealously removed some trailing `+' and I didn't check the diff MFC after: 1 week X-MFC with: r292640 Pointyhat to: ngie Sponsored by: EMC / Isilon Storage Division	2015-12-23 03:34:43 +00:00
Enji Cooper	905b145f0a	Clean up trailing whitespace; no functional change MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-12-23 03:29:37 +00:00
Enji Cooper	20b4c1d2cb	Fold lim_shared into lim_copy to mute a -Wunused compiler warning from clang when the kernel is compiled without INVARIANTS Differential Revision: https://reviews.freebsd.org/D4683 Reviewed by: kib, jhb MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-12-22 21:07:33 +00:00
Konstantin Belousov	d943fa35ad	If we annoy user with the terminal output due to failed load of interpreter, also show the actual error code instead of some interpretation. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-12-22 20:12:52 +00:00
Jonathan T. Looney	54503a13d8	Add a safety net to reclaim mbufs when one of the mbuf zones become exhausted. It is possible for a bug in the code (or, theoretically, even unusual network conditions) to exhaust all possible mbufs or mbuf clusters. When this occurs, things can grind to a halt fairly quickly. However, we currently do not call mb_reclaim() unless the entire system is experiencing a low-memory condition. While it is best to try to prevent exhaustion of one of the mbuf zones, it would also be useful to have a mechanism to attempt to recover from these situations by freeing "expendable" mbufs. This patch makes two changes: a) The patch adds a generic API to the UMA zone allocator to set a function that should be called when an allocation fails because the zone limit has been reached. Because of the way this function can be called, it really should do minimal work. b) The patch uses this API to try to free mbufs when an allocation fails from one of the mbuf zones because the zone limit has been reached. The function schedules a callout to run mb_reclaim(). Differential Revision: https://reviews.freebsd.org/D3864 Reviewed by: gnn Comments by: rrs, glebius MFC after: 2 weeks Sponsored by: Juniper Networks	2015-12-20 02:05:33 +00:00
Mateusz Guzik	aa0241d623	proc: fix a race which could result in dereference of bad p_pgrp pointer on fork During fork p_starcopy - p_endcopy area of a process is populated with bcopy with only proc lock held. Another forking thread can find such a process and proceed to access p_pgrp included in said area. Fix the problem by moving the field outside. It is being properly assigned later. Reviewed by: kib Diagnosed by: kib Tested by: Fabian Keil <freebsd-listen fabiankeil.de> MFC after: 10 days	2015-12-18 16:33:15 +00:00
Adrian Chadd	2b3ad18853	[intrng] Migrate the intrng code from sys/arm/arm to sys/kern/subr_intr.c. The ci20 port (by kan@) is going to reuse almost all of the intrng code since the SoC in question looks suspiciously like someone took an ARM SoC design and replaced the ARM core with a MIPS core. * migrate out the code; * rename ARM_ -> INTR_; * rename arm_ -> intr_; * move the interrupt flush routine from intr.c / intrng.c into arm/machdep_intr.c - removing the code duplication and removing the ARM specific bits from here. Thanks to the Star Wars: The Force Awakens premiere line for allowing me a couple hours of quiet time to finish the universe builds. Tested: * make universe TODO: * The structure definitions in subr_intr.c still includes machine/intr.h which requires one duplicates all of the intrng definitions in the platform code (which kan has done, and I think we don't have to.) Instead I should break out the generic things (function declarations, common intr structures, etc) into a separate header. * Kan has requested I make the PIC based IPI stuff optional.	2015-12-18 05:43:59 +00:00
Mark Johnston	8ff6d9dd22	Support an arbitrary number of arguments to DTrace syscall probes. Rather than pushing all eight possible arguments into dtrace_probe()'s stack frame, make the syscall_args struct for the current syscall available via the current thread. Using a custom getargval method for the systrace provider, this allows any syscall argument to be fetched, even in kernels that have modified the maximum number of system call arguments. Sponsored by: EMC / Isilon Storage Division	2015-12-17 00:00:27 +00:00
Mark Johnston	3616095801	Fix style issues around existing SDT probes. - Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect at the moment, but will be needed for some future changes. - Don't hardcode the module component of the probe identifier. This is set automatically by the SDT framework. MFC after: 1 week	2015-12-16 23:39:27 +00:00
Gleb Smirnoff	b0cd20172d	A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES(). o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceeded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strenghtening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-12-16 21:30:45 +00:00
Konstantin Belousov	106ebb761a	Optimize vop_stdadvise(POSIX_FADV_DONTNEED). Instead of looking up a buffer for each block number in the range with gbincore(), look up the next instantiated buffer with the logical block number which is greater or equal to the next lblkno. This significantly speeds up the iteration for sparce-populated range. Move the iteration into new helper bnoreuselist(), which is structured similarly to flushbuflist(). Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation	2015-12-16 08:48:37 +00:00
Konstantin Belousov	8549b4b9fe	Simplify the loop step in the flushbuflist() and make it independed on the type stability of the buffers memory. Instead of memoizing pointer to the next buffer and validating it, remember the next logical block number in the bo list and re-lookup. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation	2015-12-16 08:39:51 +00:00
Adrian Chadd	45130aa12e	Don't call wakeup if we're just returning reserved space; just return the reservation and wait for more space to appear. Submitted by: jeff Reviewed by: kib	2015-12-16 00:13:16 +00:00
Jamie Gritton	6ab6058ec4	Fix jail name checking that disallowed anything that starts with '0'. The intention was to just limit leading zeroes on numeric names. That check is now improved to also catch the leading spaces and '+' that strtoul can pass through. PR: 204897 MFC after: 3 days	2015-12-15 17:25:00 +00:00
Edward Tomasz Napierala	af1a7b2526	Tweak comments. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-12-13 11:30:36 +00:00
Edward Tomasz Napierala	a8e0740e85	Actually make the 'amount' argument to racct_adjust_resource() signed, as it was always supposed to be. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-12-13 11:21:13 +00:00
Edward Tomasz Napierala	57e2ebdc08	Avoid useless relocking. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-12-13 11:08:29 +00:00
Mark Johnston	d9e2e68d38	Don't make assertions about td_critnest when the scheduler is stopped. A panicking thread always executes with a critical section held, so any attempt to allocate or free memory while dumping will otherwise cause a second panic. This can occur, for example, if xpt_polled_action() completes non-dump I/O that was pending at the time of the panic. The fact that this can occur is itself a bug, but asserting in this case does little but reduce the reliability of kernel dumps. Suggested by: kib Reported by: pho	2015-12-11 20:05:07 +00:00
Warner Losh	da82615ae2	Create the MDT_PNP_INFO metadata record to communicate PNP info about modules. External agents may use this data to automatically load those modules. Differential Review: https://reviews.freebsd.org/D3461	2015-12-11 05:27:53 +00:00
Steven Hartland	f1b13a89b0	Don't use 0 for pointer comparison Use NULL instead of 0 for comparison with panicstr. MFC after: 1 week Sponsored by: Multiplay	2015-12-08 18:38:33 +00:00
Mark Johnston	711fbd17ec	Add helper functions proc_readmem() and proc_writemem(). These helper functions can be used to read in or write a buffer from or to an arbitrary process' address space. Without them, this can only be done using proc_rwmem(), which requires the caller to fill out a uio. This is onerous and results in code duplication; the new functions provide a simpler interface which is sufficient for most existing callers of proc_rwmem(). This change also adds a manual page for proc_rwmem() and the new functions. Reviewed by: jhb, kib Differential Revision: https://reviews.freebsd.org/D4245	2015-12-07 21:33:15 +00:00
Ed Maste	4c22b4686b	Replace magic value ELF note type with NT_FREEBSD_ABI_TAG As of r291909 elf_common.h provides a definition. Suggested by: kib Sponsored by: The FreeBSD Foundation	2015-12-07 18:43:27 +00:00

1 2 3 4 5 ...

14616 Commits