freebsd-skq

Author	SHA1	Message	Date
trasz	8e0224f2c9	Fix the way RCTL handles rules' rrl_exceeded on credenials change. Because of what this variable does, it was probably harmless - but still incorrect. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-01-26 11:28:55 +00:00
kib	d4a0747609	Restore flushing of output for revoke(2) again. Document revoke()'s intended behaviour in its man page. Simplify tty_drain() to match. Don't call ttydevsw methods in tty_flush() if the device is gone since we now sometimes call it then. The flushing was supposed to be implemented by passing the FNONBLOCK flag to VOP_CLOSE() for revoke(). The tty driver is one of the few that can block in close and was one of the fewer that knew about this. This almost worked in FreeBSD-1 and similarly in Net/2. These versions only almost worked because there was and is considerable confusion between IO_NDELAY and FNONBLOCK (aka O_NONBLOCK). IO_NDELAY is only valid for VOP_READ() and VOP_WRITE(). For other VOPs it has the same value as O_SHLOCK. But since vfs_subr.c and tty.c consistently used the wrong flag and the O_SHLOCK flag is rarely set, this mostly worked. It also gave the feature than applications could get the non-blocking close by abusing O_SHLOCK. This was first broken then fixed in 1995. I changed only the tty driver to use FNONBLOCK, as a hack to get non-blocking via the normal flag FNONBLOCK for last closes. I didn't know about revoke()'s use of IO_NDELAY or change it to be consistent, so revoke() was broken. Then I changed revoke() to match. This was next broken in 1997 then fixed in 1998. Importing Lite2 made the flags inconsistent again by undoing the fix only in vfs_subr.c. This was next broken in 2008 by replacing everything in tty.c and not checking any flags in last close. Other bugs in draining limited the resulting unbounded waits to drain in some cases. It is now possible to fix this better using the new FREVOKE flag. Just restore flushing for revoke() for now. Don't restore or undo any hacks for ordinary last closes yet. But remove dead code in the 1-second relative timeout (r272789). This did extra work to extend the buggy draining for revoke() for as long as possible. The 1-second timeout made this not very long by usually flushing after 1 second. Submitted by: bde MFC after: 2 weeks	2016-01-26 07:57:44 +00:00
markj	249795444f	Evaluate the sysctl_running fail point before taking the sysctl lock. The fail point handler may sleep, but this is not permitted while holding a rm read lock. MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2016-01-26 01:15:18 +00:00
marius	e5b176f8bf	- Make the code consistent with itself style-wise and bring it closer to style(9). - Mark unused arguments as such. - Make the ttystates table const.	2016-01-25 22:58:06 +00:00
kib	8d218f7844	Don't allow opening the callout device when the callin device is already open (in disguise as the console device). The only allowed combination was supposed to be the callin device with the console. Fix the assertion in ttydev_close() that was meant to detect this (it only detected all 3 devices being open). Assert this in ttydev_open() too. Submitted by: bde MFC after: 2 weeks	2016-01-25 16:47:20 +00:00
kib	265360f6c0	Fix the %b flags string for ddb. All bits above the 5th (TF_OPENED_CONS) were broken in r188147 by adding TF_OPENED_CONS without updating the string. It was especially confusing to display OPENED_CONS as GONE and BYPASS as ZOMBIE. 2 flags at the end were not updated in r188487. Don't print an extra 0x prefix for %p in a ddb command. In the rest of the kernel there are more than 6000 lines with %p and only about 40 with this bug. Print a non-extra 0x prefix for %b in a ddb command. In the rest of the kernel, there are approx. 180 lines with %b and 2/3 of them have this bug. Submitted by: bde MFC after: 2 weeks	2016-01-25 15:37:01 +00:00
melifaro	23582454c7	MFP r287070,r287073: split radix implementation and route table structure. There are number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don't have _any_ requirements and first 2 does not use radix locking. On the other hand, routing structure do have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not known anything about its consumers internals. So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still uses the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks. Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base). New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.	2016-01-25 06:33:15 +00:00
kib	66dd616ca8	In tty_dealloc(), clear the queues. See the comment for a scenario which explains why ttydev_leave() cleanup might not happen. Submitted by: bde MFC after: 3 weeks	2016-01-22 20:38:46 +00:00
kib	f011d9b5bd	The struct file f_advice member is overlaid with the devfs f_cdevpriv data. If vnode bypass for devfs file failed, vn_read/vn_write are called and might try to dereference f_advice. Limit the accesses to f_advice to VREG vnodes only, which is the type ensured by posix_fadvise(). The f_advice for regular files is protected by mtxpool lock. Recheck that f_advice is not NULL after lock is taken. Reported and tested by: bde Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2016-01-22 20:35:20 +00:00
glebius	0d9f222414	- Separate sendfile(2) implementation from uipc_syscalls.c into separate file. Claim my copyright. - Provide more comments, better function and structure names. - Sort out unneeded includes from resulting two files. No functional changes.	2016-01-22 02:23:18 +00:00
jhb	7a18594de0	AIO daemons have always been kernel processes to facilitate switching to user VM spaces while servicing jobs. Update various comments and data structures that refer to AIO daemons as threads to refer to processes instead. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4999	2016-01-21 02:20:38 +00:00
jhb	e4c5597b37	Remove unused variables for socket AIO. In r55943, a per-process queue of pending socket AIO requests (requests waiting for the socket to become ready) was added so that they could be cancelled during process rundown. In r154765, the rundown code was changed to handle jobs in this state (JOBST_JOBQSOCK) directly removing the need for the extra queue. However, the per-process queue head and global lock were never removed. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4997	2016-01-21 01:28:31 +00:00
mjg	0b0166a192	cache: minor changes 1. vhold and zap immediately instead of postponing few lines later 2. increment numneg after new entry is added No functional changes. No objections: kib	2016-01-21 01:09:39 +00:00
mjg	fc47375f70	cache: perform . lockup without the namecache lock Reviewed by: kib	2016-01-21 01:07:05 +00:00
mjg	a7359714be	cache: provide a helper for computing the hash Reviewed by: kib	2016-01-21 01:05:41 +00:00
mjg	5ca67cac1e	cache: use counter(9) API to maintain statistics Previously the code would just increment statistics while only holding a shared lock, in effect losing updates. Separate tracking for nchstats is removed as values can be obtained from existing counters. Note that some fields are updated by external consumers and are left unfixed. This should not be a serious issue as this structure looks quite obsolete. No strong objections: kib	2016-01-21 01:04:03 +00:00
mjg	eabf748a18	session: avoid proctree lock on proc exit when possible We can get away with the common case with only proc lock held. Reviewed by: kib	2016-01-20 23:33:58 +00:00
mjg	1a52d1b25e	session: tidy up fixjobc This stops abusing the 'p' pointer for iteration over children processes and gets rid of useless locking around PRS_ZOMBIE check. Suggested by: kib	2016-01-20 23:22:36 +00:00
marius	c9d9d68bae	Fix tty_drain() and, thus, TIOCDRAIN of the current tty(4) incarnation to actually wait until the TX FIFOs of UARTs have be drained before returning. This is done by bringing the equivalent of the TS_BUSY flag found in the previous implementation back in an ABI-preserving way. Reported and tested by: Patrick Powell Most likely, drivers for USB-serial-adapters likewise incorporating TX FIFOs as well as other terminal devices that buffer output in some form should also provide implementations of tsw_busy. MFC after: 3 days	2016-01-19 23:34:27 +00:00
jhb	897a9bc5d3	Various cleanups to the main function for AIO kernel processes: - Pull the vmspace logic out into helper functions and reduce duplication. Operations on the vmspace are all isolated to vm_map.c, but it now exports a new 'vmspace_switch_aio' for use by AIO kernel processes. - When an AIO kernel process wants to exit, break out of the main loop and perform cleanup after the loop end. This reduces a lot of indentation and allows cleanup to more closely mirror setup actions before the loop starts. - Convert a DIAGNOSTIC to KASSERT(). - Replace mycp with more typical 'p'. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4990	2016-01-19 21:37:51 +00:00
jhb	ea0e31cb8f	Don't create a dedicated session for each AIO kernel process. This code dates back to the initial AIO support and the commit log does not explain why it is needed. However, I cannot find anything in the AIO code or the various file methods (fo_read/fo_write) that would change behavior due to using a private session instead of proc0's session. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4988	2016-01-19 20:46:30 +00:00
markj	09fb369fc5	Add vrefl(), a locked variant of vref(9). This API has no in-tree consumers at the moment but is useful to at least one out-of-tree consumer, and naturally complements existing vnode refcount functions (vholdl(9), vdropl(9)). Obtained from: kib (sys/ portion) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4947 Differential Revision: https://reviews.freebsd.org/D4953	2016-01-18 22:21:46 +00:00
kib	7d0828c94e	When cleaning up from failed adv locking and checking for write, do not call VOP_CLOSE() manually. Instead, delegate the close to fo_close() performed as part of the fdrop() on the file failed to open. For this, finish constructing file on error, in particular, set f_vnode and f_ops. Forcibly resetting f_ops to badfileops disabled additional cleanups performed by fo_close() for some file types, in this case it was noted that cdevpriv data was corrupted. Since fo_close() call must be enabled for some file types, it makes more sense to enable it for all files opened through vn_open_cred(). In collaboration with: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-01-17 08:40:51 +00:00
jhb	ea7fa1c904	Remove aiod_timeout. It hasn't been used since the AIO code was made MPSAFE 10 years ago. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D4946	2016-01-14 21:28:56 +00:00
jhb	61577b76c5	Rename aiod_bio taskqueue to aiod_kick. This taskqueue is not used to handle bio requests. It is only used to run aio_kick_nowait() to spin up new aio daemon processes. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D4904	2016-01-14 20:51:48 +00:00
glebius	796cbcc738	Call crextend() before copying old credentials to the new credentials and replace crcopysafe by crcopy as crcopysafe is is not intended to be safe in a threaded environment, it drops PROC_LOCK() in while() that can lead to unexpected results, such as overwrite kernel memory. In my POV crcopysafe() needs special attention. For now I do not see any problems with this function, but who knows. Submitted by: dchagin Found by: trinity Security: SA-16:04.linux	2016-01-14 10:16:25 +00:00
cperciva	9ca3584fdd	Fix a bug introduced in r291716: "The problem with the approach taken both in _bus_dmamap_load_pages and bus_dmamap_load_ma_triv is that they split the request buffer into arbitrary chunks based on page boundaries, creating segments that no longer have a size that's a multiple of the sector size. This breaks drivers like blkfront (and probably other stuff)." [1] This was most easily triggered by running `fsck /` on a system running in Xen (e.g. Amazon EC2) but also showed up via growfs(8) and probably many other userland tools which access the disk directly. Patch by: royger [1] "Thinks this should be fine" by: ken	2016-01-11 20:38:39 +00:00
dchagin	e706df7b9a	Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall instead of vdso. An upcoming linux_base-c6 needs it. Differential Revision: https://reviews.freebsd.org/D1090 Reviewed by: kib, trasz MFC after: 1 week	2016-01-09 20:18:53 +00:00
markj	e38d62e90d	Prevent cv_waiters wraparound. r282971 attempted to fix this problem by decrementing cv_waiters after waking up from sleeping on a condition variable, but this can result in a use-after-free if the CV is freed before all woken threads have had a chance to run. Instead, avoid incrementing cv_waiters past INT_MAX, and have cv_signal() explicitly check for sleeping threads once cv_waiters has reached this bound. Reviewed by: jhb MFC after: 2 weeks Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4822	2016-01-09 01:56:46 +00:00
glebius	aaa09777e1	New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and up to now. The new sendfile is the code that Netflix uses to send their multiple tens of gigabits of data per second. The new implementation features asynchronous I/O, when I/O operations are launched, but not awaited to be complete. An explanation of why such behavior is beneficial compared to old one is going to be too long for a commit message, so we will skip it here. Additional features of new syscall are extra flags, which provide an application more control over data sent. The SF_NOCACHE flag tells kernel that data shouldn't be cached after it was sent. The SF_READAHEAD() macro allows to specify readahead size in pages. The new syscalls is a drop in replacement. No modifications are required to applications. One can take nginx binary for stable/10 and run it successfully on head. Although SF_NODISKIO lost its original sense, as now sendfile doesn't block, and now means something completely different (tm), using the new sendfile the old way is absolutely safe. Celebrates: Netflix global launch! Sponsored by: Nginx, Inc. Sponsored by: Netflix Relnotes: yes	2016-01-08 20:34:57 +00:00
glebius	e25e77f91d	Make it possible for sbappend() to preserve M_NOTREADY on mbufs, just like sbappendstream() does. Although, M_NOTREADY may appear only on SOCK_STREAM sockets, due to sendfile(2) supporting only the latter, there is a corner case of AF_UNIX/SOCK_STREAM socket, that still uses records for the sake of control data, albeit being stream socket. Provide private version of m_clrprotoflags(), which understands PRUS_NOTREADY, similar to m_demote().	2016-01-08 19:03:20 +00:00
glebius	088235535d	Revert r293405: it breaks socket buffer INVARIANTS when sending control data over local sockets.	2016-01-08 17:27:23 +00:00
glebius	a4cad9f2ef	For SOCK_STREAM socket use sbappendstream() instead of sbappend().	2016-01-08 01:16:03 +00:00
kib	eb437d36bf	Convert tty common code to use make_dev_s(). Tty.c was untypical in that it handled the si_drv1 issue consistently and correctly, by always checking for si_drv1 being non-NULL and sleeping if NULL. The removed code also illustrated unneeded complications in drivers which are eliminated by the use of new KPI. Reviewed by: hps, jhb Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D4746	2016-01-07 20:15:09 +00:00
kib	3277da17a1	Provide yet another KPI for cdev creation, make_dev_s(9). Immediate problem fixed by the new KPI is the long-standing race between device creation and assignments to cdev->si_drv1 and cdev->si_drv2, which allows the window where cdevsw methods might be called with si_drv1,2 fields not yet set. Devices typically checked for NULL and returned spurious errors to usermode, and often left some methods unchecked. The new function interface is designed to be extensible, which should allow to add more features to make_dev_s(9) without inventing yet another name for function to create devices, while maintaining KPI and even KBI backward-compatibility. Reviewed by: hps, jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D4746	2016-01-07 20:08:02 +00:00
mjg	cbad85009d	cache: ansify functions and fix some style issues No functional changes.	2016-01-07 02:04:17 +00:00
kib	8c46f725d5	Two fixes for excessive iterations after r292326. Advance the logical block number to the lblkno of the found block plus one, instead of incrementing the block number which was used for lookup. This change skips sparcely populated buffer ranges, similar to r292325, instead of doing useless lookups. Do not restart the bnoreuselist() from the start of the range if buffer lock cannot be obtained without sleep. Only retry lookup and lock for the same queue and same logical block number. Reported by: benno Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days	2016-01-05 14:48:40 +00:00
ian	3d96cedc35	Make the 'env' directive described in config(5) work on all architectures, providing compiled-in static environment data that is used instead of any data passed in from a boot loader. Previously 'env' worked only on i386 and arm xscale systems, because it required the MD startup code to examine the global envmode variable and decide whether to use static_env or an environment obtained from the boot loader, and set the global kern_envp accordingly. Most startup code wasn't doing so. Making things even more complex, some mips startup code uses an alternate scheme that involves calling init_static_kenv() to pass an empty buffer and its size, then uses a series of kern_setenv() calls to populate that buffer. Now all MD startup code calls init_static_kenv(), and that routine provides a single point where envmode is checked and the decision is made whether to use the compiled-in static_kenv or the values provided by the MD code. The routine also continues to serve its original purpose for mips; if a non-zero buffer size is passed the routine installs the empty buffer ready to accept kern_setenv() values. Now if the size is zero, the provided buffer full of existing env data is installed. A NULL pointer can be passed if the boot loader provides no env data; this allows the static env to be installed if envmode is set to do so. Most of the work here is a near-mechanical change to call the init function instead of directly setting kern_envp. A notable exception is in xen/pv.c; that code was originally installing a buffer full of preformatted env data along with its non-zero size (like mips code does), which would have allowed kern_setenv() calls to wipe out the preformatted data. Now it passes a zero for the size so that the buffer of data it installs is treated as non-writeable.	2016-01-02 02:53:48 +00:00
marius	05a298f61f	- (Ab)use udivx for dividing the u_int pc_cpuid when implementing CPU_ISSET(), CPU_SET etc. in sparc64 asm. This approach has the benefit of not clobbering %y, allowing to revert r222827 and partially r222828. - In r222828, CATR() already was changed to use the equivalent of PCPU_GET(cpuid) instead of the MD module ID for KTR_CPU, so belatedly also catch up with the C side of ktr(9). Originally, in r203838 CATR() was moved away from directly reading the module ID or equivalent as that became impractical with other CPU types than USI/II supported. With r222828 in place, per-CPU data generally is set up soon enough, though, that employing PCPU things in ktr(9) also for use during early stages works. - Unfortunately, an exception to the latter is the ktr(9) use in pmap_bootstrap(), which actually is run so early that even checking for bootverbose being set via the loader doesn't work. Consequently, replace the ktr(9) use in pmap_bootstrap() with OF_printf(9) and put it under #ifdef DIAGNOSTIC instead. MFC after: 3 days	2015-12-30 13:49:20 +00:00
jhb	fb5720f7be	Add ptrace(2) reporting for LWP events. Add two new LWPINFO flags: PL_FLAG_BORN and PL_FLAG_EXITED for reporting thread creation and destruction. Newly created threads will stop to report PL_FLAG_BORN before returning to userland and exiting threads will stop to report PL_FLAG_EXIT before exiting completely. Both of these events are only enabled and reported if PT_LWP_EVENTS is enabled on a process.	2015-12-29 23:25:26 +00:00
jhb	79ec12eeb6	Call kern_thr_exit() instead of duplicating it. This code is missing the racct_subr() call from kern_thr_exit() and would require further code duplication in future changes. Reviewed by: kib MFC after: 1 week	2015-12-29 23:16:20 +00:00
dchagin	dad1819732	Verify that tv_sec value specified in settimeofday() and clock_settime() (CLOCK_REALTIME case) system calls is non negative. This commit hides a kernel panic in atrtc_settime() as the clock_ts_to_ct() does not properly convert negative tv_sec. ps. in my opinion clock_ts_to_ct() should be rewritten to properly handle negative tv_sec values. Differential Revision: https://reviews.freebsd.org/D4714 Reviewed by: kib MFC after: 1 week	2015-12-27 15:37:07 +00:00
kib	cc13042464	Do not substitute interpeter if the brand interpreter path is different from the interpreter path requested by the binary. Before this change, it is impossible to activate non-default interpreter for 32bit image on amd64, when /libexec/ld-elf32.so.1 file exists. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-12-26 15:40:12 +00:00
jtl	f41bf39357	Only allow one PT_INTERP ELF program header. This also fixes a potential memory leak for interp_buf. Differential Revision: https://reviews.freebsd.org/D4692 Reviewed by: kib MFC after: 2 weeks Sponsored by: Juniper Networks	2015-12-24 00:58:11 +00:00
ngie	b78f13918e	Fix r292640 vim overzealously removed some trailing `+' and I didn't check the diff MFC after: 1 week X-MFC with: r292640 Pointyhat to: ngie Sponsored by: EMC / Isilon Storage Division	2015-12-23 03:34:43 +00:00
ngie	e1cc5a3ca1	Clean up trailing whitespace; no functional change MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-12-23 03:29:37 +00:00
ngie	9273c09a18	Fold lim_shared into lim_copy to mute a -Wunused compiler warning from clang when the kernel is compiled without INVARIANTS Differential Revision: https://reviews.freebsd.org/D4683 Reviewed by: kib, jhb MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-12-22 21:07:33 +00:00
kib	bcb048ba0c	If we annoy user with the terminal output due to failed load of interpreter, also show the actual error code instead of some interpretation. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-12-22 20:12:52 +00:00
jtl	94d8d1452b	Add a safety net to reclaim mbufs when one of the mbuf zones become exhausted. It is possible for a bug in the code (or, theoretically, even unusual network conditions) to exhaust all possible mbufs or mbuf clusters. When this occurs, things can grind to a halt fairly quickly. However, we currently do not call mb_reclaim() unless the entire system is experiencing a low-memory condition. While it is best to try to prevent exhaustion of one of the mbuf zones, it would also be useful to have a mechanism to attempt to recover from these situations by freeing "expendable" mbufs. This patch makes two changes: a) The patch adds a generic API to the UMA zone allocator to set a function that should be called when an allocation fails because the zone limit has been reached. Because of the way this function can be called, it really should do minimal work. b) The patch uses this API to try to free mbufs when an allocation fails from one of the mbuf zones because the zone limit has been reached. The function schedules a callout to run mb_reclaim(). Differential Revision: https://reviews.freebsd.org/D3864 Reviewed by: gnn Comments by: rrs, glebius MFC after: 2 weeks Sponsored by: Juniper Networks	2015-12-20 02:05:33 +00:00
mjg	e70da8e2e9	proc: fix a race which could result in dereference of bad p_pgrp pointer on fork During fork p_starcopy - p_endcopy area of a process is populated with bcopy with only proc lock held. Another forking thread can find such a process and proceed to access p_pgrp included in said area. Fix the problem by moving the field outside. It is being properly assigned later. Reviewed by: kib Diagnosed by: kib Tested by: Fabian Keil <freebsd-listen fabiankeil.de> MFC after: 10 days	2015-12-18 16:33:15 +00:00

1 2 3 4 5 ...

14659 Commits