freebsd-skq

Author	SHA1	Message	Date
jamie	c7d0935d11	Add allow.mount.fdescfs jail flag. PR: 192951 Submitted by: ruben@verweg.com MFC after: 3 days	2015-01-28 21:08:09 +00:00
jhb	c5ac2eb628	Fix a couple of panics when detaching from a cxgbe/cxl interface that was never brought up: - Allow NULL to be passed to sglist_free(). - Don't try to stop an interface that was never fully initialized. Reviewed by: np	2015-01-26 16:26:28 +00:00
adrian	70dc4fad7a	Call WITNESS_WARN() in callout_drain() to check whether any locks are being held before sleeping. This has bitten me (in ath(4)) once before and I'd like to see this not bite anyone else. Differential Revision: D1638 Reviewed by: jhb, hselasky MFC after: 1 week	2015-01-26 04:04:57 +00:00
jhb	71715e274d	Change the default VFS timestamp precision from seconds to microseconds. Discussed on: arch@ MFC after: 2 weeks	2015-01-25 19:56:45 +00:00
jilles	6ad32c1c79	Run make sysent.	2015-01-23 21:08:24 +00:00
jilles	67db24d0f2	Add futimens and utimensat system calls. The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support. A new UTIME_* constant might allow setting birthtimes in future. Differential Revision: https://reviews.freebsd.org/D1426 Submitted by: pluknet (partially) Reviewed by: delphij, pluknet, rwatson Relnotes: yes	2015-01-23 21:07:08 +00:00
danfe	d20a416d73	Fix usage example in kvprintf(9) and its copy in libstand(3): trailing '\n' in bitfield argument is wrong, as it will be treated as bit 10, causing any code printing >=10 bits with bit 10 on as having a trailing comma. Newline (intended one) should be part of the format string (already present in the examples). Also fix grammar and kill EOL whitespace in comment while here. PR: 195005 Approved by: bdrewery	2015-01-23 07:30:57 +00:00
hselasky	c0aba3b50d	Revert for r277213: FreeBSD developers need more time to review patches in the surrounding areas like the TCP stack which are using MPSAFE callouts to restore distribution of callouts on multiple CPUs. Bump the __FreeBSD_version instead of reverting it. Suggested by: kmacy, adrian, glebius and kib Differential Revision: https://reviews.freebsd.org/D1438	2015-01-22 11:12:42 +00:00
mjg	9bc86796d3	filedesc: avoid spurious copying of capabilities in fget_unlocked We obtain a stable copy and store it in local 'fde' variable. Storing another copy (based on aforementioned variable) does not serve any purpose. No functional changes.	2015-01-21 18:32:53 +00:00
mjg	e15a87cc6a	filedesc: return 0 from badfo_close The only potential in-tree consumer (_fdrop) special-cased it and returns 0 on its own instead of calling badfo_close. Remove the special case since it is not needed and very unlikely to encounter anyway. No objections from: kib	2015-01-21 18:05:42 +00:00
mjg	4b90cc79ee	filedesc: fix whitespace nits in fget and fget_read No functional changes.	2015-01-21 18:02:28 +00:00
kib	ef36961ce1	Do not assert that the new pipepair mutex is not initialized. The backing memory contains garbage and might trigger the assertion. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-01-21 16:32:54 +00:00
mjg	03fe27a773	filedesc: plug a test for impossible condition in _fget	2015-01-21 01:06:14 +00:00
neel	1fe4c6403b	Update the vdso timehands only via tc_windup(). Prior to this change CLOCK_MONOTONIC could go backwards when the timecounter hardware was changed via 'sysctl kern.timecounter.hardware'. This happened because the vdso timehands update was missing the special treatment in tc_windup() when changing timecounters. Reviewed by: kib	2015-01-20 03:54:30 +00:00
kib	f748dc7ade	Stop enforcing additional reference on all cdevs, which was introduced in r277199. Acquire the necessary reference in delist_dev_locked() and inform destroy_devl() about it using CDP_UNREF_DTR flag. Fix some style nits, add asserts. Discussed with: hselasky Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-19 17:36:52 +00:00
kib	aa0ac99391	Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger attachment to the process. Note that the command is not intended to be a security measure, rather it is an obfuscation feature, implemented for parity with other operating systems. Discussed with: jilles, rwatson Man page fixes by: rwatson Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-18 15:13:11 +00:00
kib	53832db395	Make SIGSTOP work for sleeps done while waiting for fifo readers or writers in open(2), when the fifo is located on an NFS mount. Reported by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-18 15:03:26 +00:00
kib	f4ea6035eb	For sigaction(2), ignore possible garbage in sa_flags for sa_handler == SIG_DFL or SIG_IGN. Sloppy code does not fully initialize struct sigaction for such cases, and being too demanding in the case of default handler does not catch anything. Reported and tested by: Alex Tutubalin <lexa@lexa.ru> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-16 07:06:58 +00:00
hselasky	a9eed96a6b	Major callout subsystem cleanup and rewrite: - Close a migration race where callout_reset() failed to set the CALLOUT_ACTIVE flag. - Callout callback functions are now allowed to be protected by spinlocks. - Switching the callout CPU number cannot always be done on a per-callout basis. See the updated timeout(9) manual page for more information. - The timeout(9) manual page has been updated to reflect how all the functions inside the callout API are working. The manual page has been made function oriented to make it easier to deduce how each of the functions making up the callout API are working without having to first read the whole manual page. Group all functions into a handful of sections which should give a quick top-level overview when the different functions should be used. - The CALLOUT_SHAREDLOCK flag and its functionality has been removed to reduce the complexity in the callout code and to avoid problems about atomically stopping callouts via callout_stop(). If someone needs it, it can be re-added. From my quick grep there are no CALLOUT_SHAREDLOCK clients in the kernel. - A new callout API function named "callout_drain_async()" has been added. See the updated timeout(9) manual page for a complete description. - Update the callout clients in the "kern/" folder to use the callout API properly, like cv_timedwait(). Previously there was some custom sleepqueue code in the callout subsystem, which has been removed, because we now allow callouts to be protected by spinlocks. This allows us to tear down the callout like done with regular mutexes, and a "td_slpmutex" has been added to "struct thread" to atomically teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and "SWT_SLEEPQTIMO" states can now be completely removed. Currently they are marked as available and will be cleaned up in a follow up commit. - Bump the __FreeBSD_version to indicate kernel modules need recompilation. - There have been several reports that this patch "seems to squash a serious bug leading to a callout timeout and panic". Kernel build testing: all architectures were built MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D1438 Sponsored by: Mellanox Technologies Reviewed by: jhb, adrian, sbruno and emaste	2015-01-15 15:32:30 +00:00
rwatson	08b6ceecbb	In order to support ongoing work to implement variable-size mbufs, and more generally make it easier to extend 'struct mbuf' in the future, make a number of changes to the data structure: - As we anticipate embedding mbuf headers within variable-size regions of memory in the future, change the definitions of byte arrays embedded in mbufs to be of size [0] rather than [MLEN] and [MHLEN]. In fact, the cxgbe driver already uses 'struct mbuf' on the front of other storage sizes, but we would like the global mbuf allocator to be able to do this as well. - Fold 'struct m_hdr' into 'struct mbuf' itself, eliminating a set of macros that aliased 'mh_foo' field names to 'm_foo' names such as 'm_next'. These present a particular problem as we would like to add new mbuf-header fields -- e.g., 'm_size' -- that, if similarly named via macros, would introduce collisions with many other variable names in the kernel. - Rename 'struct m_ext' to 'struct struct_m_ext' so that we can add compile-time assertions without bumping into the still-extant 'm_ext' macro. - Remove the MSIZE compile-time assertion for 'struct mbuf', but add new assertions for alignment of embedded data arrays (64-bit alignment even on 32-bit platforms), and for the sizes of the mbuf header, packet header, and m_ext structure. - Document that these assertions exist in comments in mbuf.h. This change is not intended to cause (non-trivial) behavioural differences, but is a precursor to further mbuf-allocator work. Differential Revision: https://reviews.freebsd.org/D1483 Reviewed by: bz, gnn, np, glebius ("go ahead, I trust you") Sponsored by: EMC / Isilon Storage Division	2015-01-14 23:44:00 +00:00
hselasky	b04cbf0c36	Avoid race with "dev_rel()" when using the recently added "delist_dev()" function. Make sure the character device structure doesn't go away until the end of the "destroy_dev()" function due to concurrently running cleanup code inside "devfs_populate()". MFC after: 1 week Reported by: dchagin@	2015-01-14 22:07:13 +00:00
hselasky	99b9110513	Add a kernel function to delist our kernel character devices, so that the device name can be re-used right away in case we are destroying the character devices in the background. MFC after: 4 days Reported by: dchagin@	2015-01-14 14:04:29 +00:00
jamie	15f3ae0c52	Remove the prison flags PR_IP4_DISABLE and PR_IP6_DISABLE, which have been write-only for as long as they've existed.	2015-01-14 04:50:28 +00:00
jamie	20db074137	Don't set prison's pr_ip4s or pr_ip6s to -1. PR: 196474 MFC after: 3 days	2015-01-14 03:52:41 +00:00
kib	79db3369f9	Revert r263475: TDP_DEVMEMIO no longer needed, since amd64 /dev/kmem does not access kernel mappings directly. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-12 08:58:07 +00:00
ian	f0876ef9ac	Fix an off by one in ppsratecheck(). If you asked for N=1 you'd get one, but for any N>1 you'd get N-1 packets/events per second.	2015-01-11 20:48:29 +00:00
rwatson	ac02dc452c	Garbage collect m_copymdata(), an mbuf utility routine introduced in FreeBSD 7 that has not been used since. It contains a number of unresolved bugs including an inverted bcopy() and incorrect handling of read-only mbufs using internal storage. Removing this unused code is substantially easier than fixing it in order to update it to the coming mbuf world order -- but it can always be restored from revision history if it turns out to prove useful for future work. Pointed out by: jmallett Sponsored by: EMC / Isilon Storage Division	2015-01-10 10:41:23 +00:00
dchagin	990c8c285b	Allow clock_getcpuclockid() on the CPU-time clock for zombie process. Posix does not prohibit this. Differential Revision: https://reviews.freebsd.org/D1470 Reviewed by: kib MFC after: 1 week	2015-01-10 07:22:38 +00:00
delphij	27945e456a	Improve style and fix a possible use-after-free case introduced in r268384 by reinitializing the 'freestate' pointer after freeing the memory. Obtained from: HardenedBSD (71fab80c5dd3034b71a29a61064625018671bbeb) PR: 194525 Submitted by: Oliver Pinter <oliver.pinter@hardenedbsd.org> MFC after: 2 weeks	2015-01-10 06:48:35 +00:00
rwatson	48e613fbd8	Remove a 'This is dumb' comment that has been incorrect for at least a decade: m_pulldown() is willing to consider ordinary mbufs writable. Retain another, related, and also outdated comment, but with a caveat that it is partially stale. Do not, for now, address the problem that it raises (that only EXT_CLUSTER external storage is considered writable, regardless of the results of M_WRITABLE() on the mbuf). MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2015-01-09 12:08:51 +00:00
jhb	69eebc007b	Change the default method for device_quiesce() to return 0 instead of EOPNOTSUPP. The current behavior can mask real quiesce errors since devclass_quiesce_driver() stops iterating over drivers as soon as it gets an error (including EOPNOTSUPP), but the caller it returns the error to explicitly ignores EOPNOTSUPP. Reviewed by: imp	2015-01-08 21:46:28 +00:00
jhb	8189659be8	Reject attempts to read the cpuset mask of a negative domain ID.	2015-01-08 19:11:14 +00:00
jhb	06e75f0dba	Create a cpuset mask for each NUMA domain that is available in the kernel via the global cpuset_domain[] array. To export these to userland, add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a specific domain. Add a -d flag to cpuset(1) that can be used to fetch the mask for a given domain. Differential Revision: https://reviews.freebsd.org/D1232 Submitted by: jeff (kernel bits) Reviewed by: adrian, jeff	2015-01-08 15:53:13 +00:00
rwatson	a546fbcd7c	Replace hand-crafted versions of M_SIZE() and M_START() in uipc_mbuf.c with calls to the centralised macros, reducing direct use of MLEN and MHLEN. Differential Revision: https://reviews.freebsd.org/D1444 Reviewed by: bz Sponsored by: EMC / Isilon Storage Division	2015-01-08 11:16:21 +00:00
markj	7e7e145818	Factor out duplicated code from dumpsys() on each architecture into generic code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly identical and simply redefine a number of constants and helper subroutines; a generic implementation will make it easier to implement features around kernel core dumps. This change does not alter any minidump code and should have no functional impact. PR: 193873 Differential Revision: https://reviews.freebsd.org/D904 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Reviewed by: jhibbits (earlier version) Sponsored by: EMC / Isilon Storage Division	2015-01-07 01:01:39 +00:00
markj	457a8d982b	Use crcopysafe(9) to make a copy of a process' credential struct. crcopy(9) may perform a blocking memory allocation, which is unsafe when holding a mutex. Differential Revision: https://reviews.freebsd.org/D1443 Reviewed by: rwatson MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-01-05 23:07:22 +00:00
jhb	b607899981	Trim trailing whitespace.	2015-01-05 20:50:44 +00:00
jhb	55d0376a65	On some Intel CPUs with a P-state but not C-state invariant TSC the TSC may also halt in C2 and not just C3 (it seems that in some cases the BIOS advertises its C3 state as a C2 state in _CST). Just play it safe and disable both C2 and C3 states if a user forces the use of the TSC as the timecounter on such CPUs. PR: 192316 Differential Revision: https://reviews.freebsd.org/D1441 No objection from: jkim MFC after: 1 week	2015-01-05 20:44:44 +00:00
rwatson	1c44e71143	To ease changes to underlying mbuf structure and the mbuf allocator, reduce the knowledge of mbuf layout, and in particular constants such as M_EXT, MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN()) in a single M_ALIGN() macro, implemented by a now-inlined m_align() function: - Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline. - Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align(). - Update consumers around the tree to simply use M_ALIGN(). This change eliminates a number of cases where mbuf consumers must be aware of whether or not mbufs returned by the allocator use external storage, but also assumptions about the size of the returned mbuf. This will make it easier to introduce changes in how we use external storage, as well as features such as variable-size mbufs. Differential Revision: https://reviews.freebsd.org/D1436 Reviewed by: glebius, trasz, gnn, bz Sponsored by: EMC / Isilon Storage Division	2015-01-05 09:58:32 +00:00
gibbs	7646916ff0	Prevent live-lock and access of destroyed data in taskqueue_drain_all(). Phabric: https://reviews.freebsd.org/D1247 Reviewed by: jhb, avg Sponsored by: Spectra Logic Corporation sys/kern_subr_taskqueue.c: Modify taskqueue_drain_all() processing to use a temporary "barrier task", rather than rely on a user task that may be destroyed during taskqueue_drain_all()'s execution. The barrier task is queued behind all previously queued tasks and then has its priority elevated so that future tasks cannot pass it in the queue. Use a similar barrier scheme to drain threads processing current tasks. This requires taskqueue_run_locked() to insert and remove the taskqueue_busy object for the running thread for every task processed. share/man/man9/taskqueue.9: Remove warning about live-lock issues with taskqueue_drain_all() and indicate that it does not wait for tasks queued after it begins processing.	2015-01-04 19:55:44 +00:00
dchagin	6996603344	Regen for r276654 (__getcwd()).	2015-01-04 10:40:23 +00:00
dchagin	e777b160fa	Indeed, instead of hiding the kern___getcwd() bug by bogus cast in r276564, change path type to char * (pathnames are always char *). And remove bogus casts of malloc(). kern___getcwd() internally doesn't actually use or support u_char paths, except to copy them to a normal char * path. These changes are not visible to libc as libc/gen/getcwd.c misdeclares __getcwd() as taking a plain char * path. While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as we always have sysproto.h. Pointed out by: bde MFC after: 1 week	2015-01-04 10:34:02 +00:00
hselasky	dd66ca979a	Rework r276532 a bit. Always avoid recursing into the console driver clients, since they might not handle it very well. This change allows debugging mutex problems with kernel console drivers when "debug.witness.skipspin=0" is set in the boot environment. MFC after: 1 week	2015-01-03 17:21:19 +00:00
hselasky	c81bb57ba0	The "cnputs_mtx" mutex must be allowed to recurse. Debug prints and/or witness printouts in the console driver clients can cause this mutex to recurse by calls to "printf()" from witness for example. In particular this can happen if "debug.witness.skipspin=0" is set in the boot environment. MFC after: 1 week	2015-01-02 13:10:33 +00:00
mjg	09f7a46ec7	Convert vfs hash lock from a mutex to an rwlock.	2014-12-30 21:40:45 +00:00
imp	e0fda90335	Turns out, this isn't only called from i386...	2014-12-30 02:39:47 +00:00
mjg	1fcfd71f4d	sysctl: don't modify oid_running for static nodes It is necessary to prevent nodes from being destroyed while used, but static ones cannot be destroyed.	2014-12-28 19:24:01 +00:00
rmacklem	ada0161b9f	Fix the comment introduced in r276192 so that it clearly states that the change is needed to avoid a deadlock. Suggested by: kib MFC after: 1 week	2014-12-25 14:44:04 +00:00
rmacklem	dac4c84940	Modify vop_stdadvlock{async}() so that it only locks/unlocks the vnode and does a VOP_GETATTR() for the SEEK_END case. This is safe to do, since lf_advlock{async}() only uses the size argument for the SEEK_END case. The NFSv4 server needs this when vfs.nfsd.enable_locallocks!=0 since locking the vnode results in a LOR that can cause a deadlock for the nfsd threads. Reviewed by: kib MFC after: 1 week	2014-12-24 22:58:08 +00:00
glebius	cca3a2b04d	In sbappend*() family of functions clear M_PROTO flags of incoming mbufs. sbappendstream() already does this in m_demote(). PR: 196174 Sponsored by: Nginx, Inc.	2014-12-22 15:39:24 +00:00
kib	1844398e91	Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that the created file name be cached. Use the flag for core dumps. Requested by: rpaulo Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-21 13:32:07 +00:00
imp	256101c95b	Where appropriate, use the modern terms for the one true time base (UTC) rather than the archaic (GMT) in comments. Except where the comments are making fun of people doing this (and pedants who insist on the new terms).	2014-12-21 05:07:11 +00:00
glebius	50e87fd952	Revert r274494, r274712, r275955 and provide extra comments explaining why zero-sized mbufs could appear in socket buffers. A proper fix would be to divorce record socket buffers and stream socket buffers, and divorce pru_send that accepts normal data from pru_send that accepts control data.	2014-12-20 22:12:04 +00:00
glebius	b008919e36	Add to sbappendstream_locked() a check against NULL mbuf, like it is done in sbappend_locked() and sbappendrecord_locked(). This is a quick fix to the panic introduced by r274712. A proper solution should be to make sosend_generic() avoid calling pru_send() with NULL mbuf for the protocols that do not understand control messages. Those protocols that understand control messages, should be able to receive NULL mbuf, if control is non-NULL.	2014-12-20 14:19:46 +00:00
kib	77c9d3f4e8	The VOP_LOOKUP() implementations for CREATE op do not put the name into namecache, to avoid cache trashing when doing large operations. E.g., tar archive extraction is not usually followed by access to many of the files created. Right now, each VOP_LOOKUP() implementation explicitly knows about this quirk and tests for both MAKEENTRY flag presence and op != CREATE to make the call to cache_enter(). Centralize the handling of the quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP. VFS now sets NOCACHE flag for CREATE namei() calls. Note that the change in semantic is backward-compatible and could be merged to the stable branch, and is compatible with non-changed third-party filesystems which correctly handle MAKEENTRY. Suggested by: Chris Torek <torek@pi-coral.com> Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-18 10:01:12 +00:00
gleb	5c99f46b3b	Adjust printf format specifiers for dev_t and ino_t in kernel. ino_t and dev_t are about to become uint64_t. Reviewed by: kib, mckusick	2014-12-17 07:27:19 +00:00
kib	57bd8c99cc	Add missed break. CID: 1258587 Sponsored by: The FreeBSD Foundation MFC after: 20 days	2014-12-16 09:49:07 +00:00
kib	26b9a952c5	Add missed break. CID: 1258586 Sponsored by: The FreeBSD Foundation MFC after: 4 days	2014-12-16 09:48:23 +00:00
jhb	93ef12f586	Check for SS_NBIO in so->so_state instead of sb->sb_flags in soreceive_stream(). Differential Revision: https://reviews.freebsd.org/D1299 Reviewed by: bz, gnn MFC after: 1 week	2014-12-15 17:52:08 +00:00
kib	c014fd46ec	Add a facility for non-init process to declare itself the reaper of the orphaned descendants. Base of the API is modelled after the same feature from the DragonFlyBSD. Requested by: bapt Reviewed by: jilles (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-12-15 12:01:42 +00:00
kib	3e18aeb7c2	Fix gcc build. Sponsored by: The FreeBSD Foundation MFC after: 13 days	2014-12-14 08:43:13 +00:00
dchagin	6f2b57128c	Add _NEW flag to mtx(9), sx(9), rmlock(9) and rwlock(9). A _NEW flag passed to _init_flags() to avoid check for double-init. Differential Revision: https://reviews.freebsd.org/D1208 Reviewed by: jhb, wblock MFC after: 1 Month	2014-12-13 21:00:10 +00:00
kib	d41bf48327	Add facility to stop all userspace processes. The supposed use of the feature is to quiesce the system before suspend. Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) caller to operate on other process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume. Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added. In collaboration with: pho Discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-13 16:18:29 +00:00
kib	07899121cb	Only sleep interruptible while waiting for suspension end when filesystem specified VFCF_SBDRY flag, i.e. for NFS. There are two issues with the sleeps. First, applications may get unexpected EINTR from the disk i/o syscalls. Second, interruptible sleep allows the stop of the process, and since mount point is referenced while thread sleeps, unmount cannot free mount point structure's memory, blocking unmount indefinitely. Even for NFS, it is probably only reasonable to enable PCATCH for intr mounts, but this information is currently not available at VFS level. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-13 16:07:01 +00:00
kib	ca2d4fd9c1	The vinactive() call in vgonel() may start writes for the dirty pages, creating delayed write buffers belonging to the reclaimed vnode. Put the buffer cleanup code after inactivation. Add asserts that ensure that buffer queues are empty and add BO_DEAD flag for bufobj to check that no buffers are added after the cleanup. BO_DEAD is only used by INVARIANTS-enabled kernels. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-13 16:02:37 +00:00
kib	13be1bdb02	For architectures where time_t is wide enough, in particular, 64bit platforms, avoid overflow after year 2038 in clock_ct_to_ts(). PR: 195868 Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-12 09:37:18 +00:00
kib	c1b643c449	Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount. Since VFS does not/cannot stop writes, sync might run indefinitely, or be a wrong thing to do at all. E.g. NFS ignores VFS_SYNC() for forced unmounts, since non-responding server does not allow sync to finish. On the other hand, filesystems can and do stop writes using fs-specific facilities, and should already fully flush caches in VFS_UNMOUNT() due to the race. Adjust msdosfs to sync in unmount for forced call, to accommodate the new behaviour. Note that it is still racy, since writes are not stopped. Discussed with: avg, bjk, mckusick Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-12-09 10:00:47 +00:00
kib	86353785aa	Apply chunk forgotten in r275620. Remove local variable for real. CID: 1257462 Sponsored by: The FreeBSD Foundation	2014-12-09 09:36:28 +00:00
kib	1a8d4344d0	Add functions syncer_suspend() and syncer_resume(), which are supposed to be called before suspension and after resume, correspondingly. The syncer_suspend() ensures that all filesystems dirty data and metadata are saved to the permanent storage, and stops kernel threads which might modify filesystems. The syncer_resume() restores stopped threads. For now, only syncer is stopped. This is needed, because each sync loop causes superblock updates for UFS. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:48:57 +00:00
kib	0957112e3c	When getnewbuf_reuse_bp() is called to reclaim some (clean) buffer, the vnode owning the buffer is not locked. More, it cannot be locked safely, since getnewbuf_reuse_bp() is called from newbuf(), and some other vnode is already locked, for which reused buffer will be reassigned. As the consequence, reclamation of the owning vnode could go in parallel, in particular, the call to vnode_destroy_vobject(), which deallocates the vm object and zeroes the v_bufobj->bo_object. Note that the pages wired by the buffer are left wired and can be safely freed by the vfs_vmio_release() without the need for the vm object lock. Also, seeing stale pointer to the v_object is safe due to vm object type stability. Check for bo_bufobj != NULL and cache the value in local variable to avoid trying to lock NULL vm object. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:42:34 +00:00
kib	6ad3754659	Do some refactoring and minor cleanups of the thread_single() code in preparation for the global stop commit. Move the code to weed suspended or sleeping threads into the appropriate state, into the helper weed_inhib(). Current code already has deep nesting and is hard to follow [1]. Add currently useless helper remain_for_mode(), which returns the count of threads which are allowed to run, according to the single-threading mode. In thread_single_end(), do not save curthread into local variable, it is unused after, except to find curproc. Remove stray empty line. Requested by: avg [1] Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:27:43 +00:00
kib	042604e2ce	Thread waiting for the vfork(2)-ed child to exec or exit, must allow for the suspension. Currently, the loop performs uninterruptible cv_wait(9) call, which prevents suspension until child allows further execution of parent. If child is stopped, suspension or single-threading is delayed indefinitely. Create a helper thread_suspend_check_needed() to identify the need for a call to thread_suspend_check(). It is required since call to the thread_suspend_check() cannot be safely done while owning the child (p2) process lock. Only when suspension is needed, drop p2 lock and call thread_suspend_check(). Perform wait for cv with timeout, in case suspend is requested after wait started; I do not see a better way to interrupt the wait. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:18:05 +00:00
kib	efcd39f691	When process is exiting, check for suspension regardless of multithreaded status of the process. The stopped state must be cleared before P_WEXIT is set. A stop signal delivered just before first PROC_LOCK() block in exit1(9) would put the process into pending stop with P_WEXIT set or assertion triggered. Also recheck for the suspension after failed thread_single(9) call, since process lock could be dropped. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:02:02 +00:00
avg	15bfe2d262	remove opensolaris cyclic code, replace with high-precision callouts In the old days callout(9) had 1 tick precision and that was inadequate for some uses, e.g. DTrace profile module, so we had to emulate cyclic API and behavior. Now we can directly use callout(9) in the very few places where cyclic was used. Differential Revision: https://reviews.freebsd.org/D1161 Reviewed by: gnn, jhb, markj MFC after: 2 weeks	2014-12-07 11:21:41 +00:00
imp	46a8e0560d	Const poison in a few places to ensure we don't modify things through the module data pointer.	2014-12-03 22:14:13 +00:00
jhb	1e8b1cd510	Revert device_getenv_int() for now as it duplicates resource_int_value(). We should perhaps implement a device_getenv_*() and device_setenv_*() API as a convenience wrapper on top of resource_*_value() and resource_set_*().	2014-12-03 15:29:53 +00:00
kib	1941346c9d	Disable recursion for the process spinlock. Tested by: pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 month	2014-12-01 17:36:10 +00:00
gibbs	36e4267804	Remove trailing whitespace.	2014-11-30 19:32:00 +00:00
glebius	3dd3d6d9ff	Merge from projects/sendfile: Provide pru_ready for AF_LOCAL sockets. Local sockets send data directly to the receive buffer of the peer, thus pru_ready also works on the peer socket. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 13:40:58 +00:00
glebius	9cadf1b974	Merge from projects/sendfile: extend protocols API to support sending not ready data: o Add new flag to pru_send() flags - PRUS_NOTREADY. o Add new protocol method pru_ready(). Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-30 13:24:21 +00:00
glebius	25da94eb3e	Merge from projects/sendfile: o Introduce a notion of "not ready" mbufs in socket buffers. These mbufs are now being populated by some I/O in background and are referenced outside. This forces following implications: - An mbuf which is "not ready" can't be taken out of the buffer. - An mbuf that is behind a "not ready" in the queue neither. - If socket buffer is flushed, then "not ready" mbufs shouldn't be freed. o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc. The sb_ccc stands for "claimed character count", or "committed character count". And the sb_acc is "available character count". Consumers of socket buffer API shouldn't already access them directly, but use sbused() and sbavail() respectively. o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones with M_BLOCKED. o New field sb_fnrdy points to the first not ready mbuf, to avoid linear search. o New function sbready() is provided to activate certain amount of mbufs in a socket buffer. A special note on SCTP: SCTP has its own sockbufs. Unfortunately, FreeBSD stack doesn't yet allow protocol specific sockbufs. Thus, SCTP does some hacks to make itself compatible with FreeBSD: it manages sockbufs on its own, but keeps sb_cc updated to inform the stack of amount of data in them. The new notion of "not ready" data isn't supported by SCTP. Instead, only a mechanical substitute is done: s/sb_cc/sb_ccc/. A proper solution would be to take away struct sockbuf from struct socket and allow protocols to implement their own socket buffers, like SCTP already does. This was discussed with rrs@. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 12:52:33 +00:00
glebius	176ae2299c	- Move sbcheck() declaration under SOCKBUF_DEBUG. - Improve SOCKBUF_DEBUG macros. - Improve sbcheck(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 11:22:39 +00:00
glebius	843bdaf93c	Make sballoc() and sbfree() functions. Ideally, they could be marked as static, but unfortunately Infiniband (ab)uses them. Sponsored by: Nginx, Inc.	2014-11-30 11:02:07 +00:00
imp	d737995628	The current limit of 100k for the linker hints file is getting a bit crowded as we now are at about 70k. Bump the limit to 1MB instead which is still quite a reasonable limit and allows for future growth of this file and possible future expansion to additional data. MFC After: 2 weeks	2014-11-29 17:29:30 +00:00
kib	9127a1f190	Remove lock recursion for the pipe pair mutex, and disable the recursion on mutex initialization. The only places where the recursive acquire is performed are read and write filters, since knlist, which uses the pipe pair mutex as lock, is locked when filter is called. The recursion was added in r93296, and consistent locking for kn_fop->f_event() introduced in r133741. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 month	2014-11-29 17:18:20 +00:00
kib	1642e3809f	Assert the state of the process lock and sigact mutex in kern_sigprocmask() and reschedule_signals(). Discussed with: rea Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-28 10:20:00 +00:00
hselasky	f4e178e921	Style changes: - Move two IOCTL related defines to the top of the C-file - Add more comments describing the recently added IOCTL small size and small align macros	2014-11-28 09:32:07 +00:00
alfred	1413b742e9	Make igb and ixgbe check tunables at probe time. This allows one to make a kernel module to tune the number of queues before the driver loads. This is needed so that a module at SI_SUB_CPU can set tunables for these drivers to take. Otherwise getenv is called too early by the TUNABLE macros. Reviewed by: smh Phabric: https://reviews.freebsd.org/D1149	2014-11-26 20:19:36 +00:00
kib	11cee2ecf7	The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). The split is done mostly for code clarity, and should not affect scalability. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-26 14:10:00 +00:00
kib	4501dadd00	Fix SA_SIGINFO | SA_RESETHAND handling. The sysent's sv_sendsig() method needs pre-reset state of the ps_siginfo to correctly construct signal frame. Move sigdflt() call after the sv_sendsig() invocation in postsig(). Simultaneously extract common code from trapsignal() and postsig() into new helper postsig_done(). Submitted by: rea MFC after: 1 week	2014-11-26 14:09:04 +00:00
jhb	5ffe4e5562	Add a bus_get_domain() wrapper around BUS_GET_DOMAIN(). Use this to add a new per-device '%domain' sysctl node that returns the NUMA domain a device is associated with if it is associated with one. Note that this API is still a WIP and might change before 11.0 actually ships. Differential Revision: https://reviews.freebsd.org/D930 Reviewed by: kib, adrian	2014-11-24 19:55:45 +00:00
jhb	c5e82d754f	Properly initialize the capability rights for vnodes exported to procstat that aren't for file descriptors (cwd, jdir, tracevp, etc.). Submitted by: Mikhail <mp@lenta.ru>	2014-11-24 18:34:11 +00:00
glebius	b4ef8e602d	Merge from projects/sendfile: o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but doesn't sleep. It returns immediately, and will execute the I/O done handler function that must be supplied as argument. o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager. o Extend pagertab to support pgo_getpages_async method, and implement this method for vnode_pager. Reviewed by: kib Tested by: pho Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-23 12:01:52 +00:00
mjg	a4324a5514	ifdef RACCT ui_racct_foreach and struct uidinfo's ui_racct Change racct_create and racct_destroy to macros evaluating to nothing without RACCT so that their callers passing ui_racct don't have to be ifdefed.	2014-11-23 08:25:44 +00:00
mjg	e31a493d7e	filedesc: plug a test for impossible condition in fgetvp_rights	2014-11-23 00:12:27 +00:00
kib	0a8ab23540	The size value should be asserted when it is known. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2014-11-22 18:15:02 +00:00
jhb	1671ac9155	Improve support for XSAVE with debuggers. - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux. - Teach readelf() to recognize NT_X86_XSTATE notes. - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state. - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0). Differential Revision: https://reviews.freebsd.org/D1193 Reviewed by: kib MFC after: 2 weeks	2014-11-21 20:53:17 +00:00
glebius	79b6f9c70a	Do not allocate zero-length mbuf in sosend_generic(). Found by: pho Sponsored by: Nginx, Inc.	2014-11-19 14:27:38 +00:00
zbb	aee9f2a5a3	Stop using early_putc immediately after configuring console with cninit() Early UART should be released right after system console initialization is completed. Otherwise, after cninit() both early and system console coexist, which may lead to various issues (e.g. writing to unmapped early UART address). This cannot be done in cninit_finish() since it can be called late at the end of MI configuration. Obtained from: Semihalf Reviewed by: andrew Sponsored by: The FreeBSD Foundation	2014-11-19 14:23:29 +00:00
imp	e1fec13f7c	opt_global.h is included automatically in the build. No need to explicitly include it in these places. Sponsored by: Netflix	2014-11-18 17:06:56 +00:00
jmg	39fa4746e5	prevent doing filter ops locking for statically compiled filter ops... This significantly reduces lock contention when adding/removing knotes on busy multi-kq system... Next step is to cache these references per kq.. i.e. kq refs it once and keeps a local ref count so that the same refs don't get accessed by many cpus... only allocate a knote when we might use it... Add a new flag, _FORCEONESHOT.. This allows a thread to force the delivery of another event in a safe manner, say waking up an idle http connection to force it to be reaped... If we are _DISABLE'ing a knote, don't bother to call f_event on it, it's disabled, so won't be delivered anyways.. Tested by: adrian	2014-11-16 01:18:41 +00:00
glebius	d69e7f6f82	- Use NULL to compare a pointer. - Use KASSERT() instead of panic. - Remove useless 'continue', no need to restart cycle here. Sponsored by: Nginx, Inc.	2014-11-14 15:44:19 +00:00
glebius	de0e48fa9a	Merge from projects/sendfile: Use sbcut_locked() instead of manually editing a sockbuf. Sponsored by: Nginx, Inc.	2014-11-14 15:33:40 +00:00
kib	d97b2c5d8d	In vfs_write_suspend_umnt(), if suspension cannot be established, do not forget to restore write ops count when returning the error. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-14 11:31:10 +00:00
glebius	3bc8a689c1	There should not be zero length mbufs in socket buffers. The code comes from r1451, and thus can't be explained. A patch with explicit panic() here survived all tests. Tested by: pho Sponsored by: Nginx, Inc.	2014-11-14 06:02:29 +00:00
jkim	66032933f9	Correct a typo to fix chown(2). It was broken since r274476. Pointy hat to: kib X-MFC-With: r274476	2014-11-13 23:51:13 +00:00
mjg	077a8b14ec	filedesc: fixup fdinit to lock fdp and prepare files conditionally Not all consumers providing fdp to copy from want files. Perhaps these functions should be reorganized to better express the outcome. This fixes up panics after r273895. Reported by: markj	2014-11-13 21:15:09 +00:00
kib	de34fc931d	Fix assertion, &uc->uc_busy is never zero, the intent is to test the uc_busy value, and not its address [1]. Remove the single use of the macro, write KASSERT() explicitly in the code of umtxq_sleep_pi(). Submitted by: Eric van Gyzen <eric@vangyzen.net> [1] MFC after: 1 week	2014-11-13 18:51:09 +00:00
kib	b4ef709604	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 18:01:51 +00:00
kib	6cedba80db	Do not try to dereference thread pointer when the value is not a pointer. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 17:44:35 +00:00
kib	e257542e11	Remove fossil. It has been present in 4.4Lite2, but its use was removed for some time. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 17:43:37 +00:00
dchagin	c0a51053a4	Regen for r274462.	2014-11-13 05:28:06 +00:00
dchagin	162012051b	Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change. Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month	2014-11-13 05:26:14 +00:00
kib	ff19294d91	For posix_fallocate(2) and posix_fadvise(2), return ESPIPE when underlying file does not have DFLAG_SEEKABLE set [1]. For posix_fallocate(2), simplify error handling logic. Do return when fp is not yet referenced. Noted by: bde [1] Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-12 17:31:38 +00:00
glebius	834e6d1d30	Merge from projects/sendfile: - Use KASSERT()s instead of panic(). - Use sbavail() instead of sb_cc. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-12 10:17:46 +00:00
glebius	c0b38b545a	In preparation of merging projects/sendfile, transform bare access to sb_cc member of struct sockbuf to a couple of inline functions: sbavail() and sbused() Right now they are equal, but once notion of "not ready socket buffer data", will be checked in, they are going to be different. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-12 09:57:15 +00:00
glebius	b8af75c693	Fix build.	2014-11-11 22:08:18 +00:00
glebius	53273c84d0	Remove SF_KQUEUE code. This code was developed at Netflix, but was not ever used. It didn't go into stable/10, neither was documented. It might be useful, but we collectively decided to remove it, rather than leave it abandoned and unmaintained. It is removed in one single commit, so restoring it should be easy, if anyone wants to reopen this idea. Sponsored by: Netflix	2014-11-11 20:32:46 +00:00
pjd	cb36b2a5c4	Add missing privilege check when setting the dump device. Before that change it was possible for a regular user to setup the dump device if he had write access to the given device. In theory it is a security issue as user might get access to kernel's memory after provoking kernel crash, but in practice it is not recommended to give regular users direct access to storage devices. Rework the code so that we do privileges check within the set_dumper() function to avoid similar problems in the future. Discussed with: secteam	2014-11-11 04:48:09 +00:00
kib	4c07fb2889	When sleeping waiting for the profiling stop, always set P_STOPPROF before dropping process lock. Clear P_STOPPROF when doing wakeup. Both issues caused thread to hang in stopprofclock() "stopprof" sleep. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-10 14:11:17 +00:00
melifaro	b7d1bcf8b2	Finish r274118#2: commit forgotten uipc_debug.c	2014-11-06 15:17:04 +00:00
bz	b9096df681	After the changes in r274118 make NOIP kernels compile by hiding an otherwise unused variable declaration behind INET6 || INET. MFC after: 27 days X-MFS with: r274118	2014-11-06 12:19:39 +00:00
mjg	7e57127b46	Add sysctl kern.proc.cwd It returns only current working directory of given process which saves a lot of overhead over kern.proc.filedesc if given proc has a lot of open fds. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified) X-Additional: JuniorJobs project	2014-11-06 08:12:34 +00:00
mjg	48a19ff17a	filedesc: avoid taking fdesc_mtx when not necessary in fddrop No functional changes.	2014-11-06 07:44:10 +00:00
mjg	355e7bb005	filedesc: just free old tables without altering the list which is freed anyway No functional changes.	2014-11-06 07:37:31 +00:00
mjg	dd190ce5d4	Extend struct ucred with group table. This saves one malloc + free with typical cases and better utilizes memory. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified) X-Additional: JuniorJobs project	2014-11-05 02:08:37 +00:00
melifaro	c2069a39a4	Remove old hack abusing domattach from NFS code. According to IANA RPC uaddr registry, there are no AFs except IPv4 and IPv6, so it's not worth being too abstract here. Remove ne_rtable[AF_MAX+1] and use explicit per-AF radix tries. Use own initialization without relying on domattach code. While I admit that this was one of the rare places in kernel networking code which really was capable of doing multi-AF without any AF-dependent code, it is not possible anymore to rely on dom* code. While here, change terrifying "Invalid radix node head, rn:" message, to different non-understandable "netcred already exists for given addr/mask", but less terrifying. Since we know that rn_addaddr() returns NULL if the same record already exists, we should provide more friendly error. MFC after: 1 month	2014-11-05 00:58:01 +00:00
des	95b02b5b83	[SA-14:25] Fix kernel stack disclosure in setlogin(2) / getlogin(2). [SA-14:26] Fix remote command execution in ftp(1). Approved by: so (des)	2014-11-04 23:29:29 +00:00
jhb	abae099c34	Add a new thread state "spinning" to schedgraph and add tracepoints at the start and stop of spinning waits in lock primitives.	2014-11-04 16:35:56 +00:00
hselasky	862145edac	Simplify logic a bit. Ensure data buffer is properly aligned, especially for platforms where unaligned access is not allowed. Make it possible to override the small buffer size. A simple continuous read string test using libusb showed a reduction in CPU usage from roughly 10% to less than 1% using a dual-core GHz CPU, when the malloc() operation was skipped for small buffers. MFC after: 2 weeks	2014-11-04 11:29:49 +00:00
dumbbell	5f06d19789	Enable vt(4) by default vt(4) is a new console driver which brings features such as: o Support for Unicode and double-width characters o Integration with the KMS kernel video drivers o Support for UEFI You may need to update your console settings in /etc/rc.conf, most probably the keymap. During boot, /etc/rc.d/syscons will indicate what you need to do. vt(4) still has issues and lacks some features compared to syscons(4). See the wiki for up-to-date information: https://wiki.freebsd.org/Newcons If you want to keep using syscons(4), you can do so by adding the following line to /boot/loader.conf: kern.vty=sc Differential Revision: https://reviews.freebsd.org/D1005 Discussed with: emaste@, nwhitehorn@, ray@ Relnotes: yes	2014-11-04 10:18:03 +00:00
kib	649fe8c57c	Clean up confusing comment. Move it to the place of code which is talked about. Explain where the mentioned trampoline located (usermode), and the fact that attempt to exit last thread is denied in kernel (by delegating the work to usermode). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-03 11:29:08 +00:00
kib	c852dfee5d	When other end of the pipe closed during the write, but some bytes were written, return short write instead of EPIPE. Update comment. Discussed with: bde (long time ago) MFC after: 2 weeks	2014-11-03 10:01:56 +00:00
mjg	82ce21e1bc	Provide an on-stack temporary buffer for small ioctl requests.	2014-11-03 07:46:51 +00:00
mjg	0983cfdba1	filedesc: plug sys/kdb.h include which crept in with r274007	2014-11-03 06:24:43 +00:00
mjg	04a088dde4	filedesc: plug unnecessary fdp NULL checks in fdescfree and fdcopy Anything reaching these functions has fd table.	2014-11-03 05:12:17 +00:00
mjg	120816c07f	filedesc: create a dedicated zone for struct filedesc0 Currently sizeof(struct filedesc0) is 1096 bytes, which means allocations from malloc use 2048 bytes. There is no easy way to shrink the structure <= 1024 and it is likely to grow in the future.	2014-11-03 04:16:04 +00:00
kib	d83157092e	Followup to r273966. Fix the build with ADAPTIVE_LOCKMGRS kernel option. Note that the option is currently not used in any in-tree kernel configs, including LINTs. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-11-02 19:51:33 +00:00
mjg	d8d7f263db	filedesc: move freeing old tables to fdescfree They cannot be accessed by anyone and hold count only protects the structure from being freed.	2014-11-02 14:12:03 +00:00
mjg	31183326d5	filedesc: factor out some code out of fdescfree Previously it had a huge self-contained chunk dedicated to dealing with shared tables. No functional changes.	2014-11-02 13:43:04 +00:00
kib	cf11d25e18	Fix two issues with lockmgr(9) LK_CAN_SHARE() test, which determines whether the shared request for already shared-locked lock could be granted. Both problems result in the exclusive locker starvation. The concurrent exclusive request is indicated by either LK_EXCLUSIVE_WAITERS or LK_EXCLUSIVE_SPINNERS flags. The reverse condition, i.e. no exclusive waiters, must check that both flags are cleared. Add a flag LK_NODDLKTREAT for shared lock request to indicate that current thread guarantees that it does not own the lock in shared mode. This turns back the exclusive lock starvation avoidance code; see man page update for detailed description. Use LK_NODDLKTREAT when doing lookup(9). Reported and tested by: pho No objections from: attilio Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-11-02 13:10:31 +00:00
mjg	79f817d7d7	filedesc: tidy up fdcheckstd No functional changes.	2014-11-02 02:32:33 +00:00
mjg	22a53e3b5a	filedesc: lock filedesc lock in fdcloseexec only when needed	2014-11-02 01:13:11 +00:00
mjg	63b330d2cc	Fix up module unload for syscall_module_handler consumers. After r273707 it was registering syscalls as static. This fixes hwpmc module unload. Reported by: markj	2014-11-01 22:36:40 +00:00
dumbbell	035cb01fbb	vt(4): Adjust the cursor position after changing the window size A new terminal_set_cursor() is added: it wraps the existing teken_set_cursor() function. In vtbuf_grow(), the cursor position is adjusted at the end of the function. In vt_change_font(), we call terminal_set_cursor() just after terminal_set_winsize_blank(), while the terminal is mute. This fixes a bug where, after loading a kernel video driver which increases the terminal window size, the cursor remains at its old position, in other words, in the middle of the display content. PR: 194421 MFC after: 1 week	2014-11-01 17:05:15 +00:00
kib	888be1193f	Add type qualifier volatile to the base (userspace) address argument of fuword(9) and suword(9). This makes the functions type-compatible with volatile objects and does not require devolatile force, e.g. in kern_umtx.c. Requested by: bde Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-10-31 17:43:21 +00:00
mjg	5b231323b2	filedesc: drop retval argument from do_dup It was almost always td_retval anyway. For the one case where it is not, preserve the old value across the call.	2014-10-31 10:35:01 +00:00
mjg	6b53d30f11	filedesc: fix missed comments about fdsetugidsafety While here just note that both fdsetugidsafety and fdcheckstd take sleepable locks.	2014-10-31 09:56:00 +00:00
mjg	efbe4d69c8	filedesc: make fdinit return with source filedesc locked and new one sized appropriately Assert FILEDESC_XLOCK_ASSERT only for already used tables in fdgrowtable. We don't have to call it with the lock held if we are just creating new filedesc. As a side note, strictly speaking processes can have fdtables with fd_lastfile = -1, but then they cannot enter fdgrowtable. Very first file descriptor they get will be 0 and the only syscall allowing to choose fd number requires an active file descriptor. Should this ever change, we can add an 'init' (or similar) parameter to fdgrowtable.	2014-10-31 09:25:28 +00:00
mjg	9772964585	filedesc: iterate over fd table only once in fdcopy While here add 'fdused_init' which does not perform unnecessary work. Drop FILEDESC_LOCK_ASSERT from fdisused and rely on callers to hold it when appropriate. This function is only used with INVARIANTS. No functional changes intended.	2014-10-31 09:19:46 +00:00
mjg	94f45340d9	filedesc: tidy up fdfree Implement fdefree_last variant and get rid of 'last' parameter. No functional changes.	2014-10-31 09:15:59 +00:00
mjg	02363563c8	filedesc: tidy up fdcopy a little bit Test for file availability by fde_file != NULL instead of fdisused, this is consistent with similar checks later. Drop badfileops check. badfileops don't have DFLAG_PASSABLE set, so it was never reached in practice. fdisused is now only used in some KASSERTS, so ifdef it under INVARIANTS. No functional changes.	2014-10-31 05:41:27 +00:00
markm	fce6747f55	This is the much-discussed major upgrade to the random(4) device, known to you all as /dev/random. This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware source any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources. The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people. The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway. Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to. My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise. My Nomex pants are on. Let the feedback commence! Reviewed by: trasz,des(partial),imp(partial?),rwatson(partial?) Approved by: so(des)	2014-10-30 21:21:53 +00:00
mjg	cda1078a58	filedesc: make sure to force table reload in fget_unlocked when count == 0 This is a fixup to r273843.	2014-10-30 07:21:38 +00:00
mjg	569cf8ac16	filedesc: microoptimize fget_unlocked by retrying obtaining reference count without restarting whole lookup Restart is only needed when fp was closed by current process, which is a much rarer event than ref/deref by some other thread.	2014-10-30 05:21:12 +00:00
mjg	5bb6a8bca1	filedesc: get rid of atomic_load_acq_int from fget_unlocked A read barrier was necessary because fd table pointer and table size were updated separately, opening a window where fget_unlocked could read new size and old pointer. This patch puts both these fields into one dedicated structure, pointer to which is later atomically updated. As such, fget_unlocked only needs a data dependency barrier which is a noop on all supported architectures. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-30 05:10:33 +00:00
jhb	d47eb7d2d4	Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel	2014-10-28 19:17:44 +00:00
kib	95304fc8a8	Convert kern_umtx.c to use fueword() and casueword(). Also fix some mishandling of suword(9) errors as errno, which resulted in spurious ERESTART. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:30:33 +00:00
kib	ad7bf17db7	Replace some calls to fuword() by fueword() with proper error checking. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:28:20 +00:00
kib	29a659ef8e	Add fueword(9) and casueword(9) functions. They are like fuword(9) and casuword(9), but do not mix value read and indication of fault. I know (or remember) enough assembly to handle x86 and powerpc. For arm, mips and sparc64, implement fueword() and casueword() as wrappers around fuword() and casuword(), which means that the functions cannot distinguish between -1 and fault. On architectures where fueword() and casueword() are native, implement fuword() and casuword() using fueword() and casueword(), to reduce assembly code duplication. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks (ia64 needs treating)	2014-10-28 15:22:13 +00:00
hselasky	a0b8ff0c54	The SYSCTL data pointers can come from userspace and must not be directly accessed. Although this will work on some platforms, it can throw an exception if the pointer is invalid and then panic the kernel. Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-28 12:00:39 +00:00
mjg	bf3b8650d6	Simplify sys_getloginclass. Just use current thread credentials as they have the same accuracy as the ones obtained from proc.	2014-10-28 04:59:33 +00:00
mjg	37841a11a2	Change loginclass mutex to an rwlock. While here reduce nesting in loginclass_free. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> X-Additional: JuniorJobs project MFC after: 2 weeks	2014-10-28 04:33:57 +00:00
mjg	8c32132302	Tidy up functions related to uidinfo management. - reference found uidinfo in uilookup - reduce nesting by handling shorter cases first	2014-10-27 20:20:05 +00:00
mjg	26906a3e9c	De-k&r-ify function definitions in kern/kern_resource.c No functional changes.	2014-10-27 20:18:30 +00:00
mjg	a9faac8f4b	Avoid dynamic syscall overhead for statically compiled modules. The kernel tracks syscall users so that modules can safely unregister them. But if the module is not unloadable or was compiled into the kernel, there is no need to do this. Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC during kernel build and 0 otherwise. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-26 19:42:44 +00:00
mjg	b6584b3e11	Fix up an assertion in kern_setgroups, it should compare with ngroups_max + 1 Bug introduced in r273685. Noted by: Tiwei Bie <btw mail.ustc.edu.cn>	2014-10-26 14:25:42 +00:00
mjg	f0db1caf67	Tidy up sys_setgroups and kern_setgroups. - 'groups' initialization to NULL is always overwritten before use, so plug it - get rid of 'goto out' - kern_setgroups's callers already validate ngrp, so only assert the condition - ngrp is an u_int, so 'ngrp < 1' is more readable as 'ngrp == 0' No functional changes.	2014-10-26 06:04:09 +00:00
mjg	db02fb1250	Use a temporary buffer in sys_setgroups for requests with <= XU_NGROUPS groups. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> X-Additional: JuniorJobs project MFC after: 2 weeks	2014-10-26 05:39:42 +00:00
mjg	5b3100bae0	Now that sysctl_root is only called with sysctl lock in shared mode, update its assertion to require that. Update comment missed in r273400: sysctl_xlock/unlock -> sysctl_xlock/xunlock Noted by: jhb	2014-10-26 01:47:55 +00:00
jhb	f014f5242a	Use correct type in __DEVOLATILE().	2014-10-25 20:42:47 +00:00
mav	f21b293af4	Revert somewhat hackish geom_disk optimization, committed as part of r256880, and the following r273143 commit, supposed to workaround introduced issue by quite innocent-looking change. While there is no clear understanding why, r273143 is accused in data corruption in some environments with high I/O load. I personally don't see any problem in that commit, and possibly it is just a trigger to some other bug somewhere, but better safe than sorry for now. Requested by: scottl@ MFC after: 3 days	2014-10-25 15:16:19 +00:00
mjg	909f8c4c4a	rlimit: plug duplicate assertion counter sanity is already checked by refcount_release.	2014-10-25 05:56:21 +00:00
delphij	8cf66ad851	Fix build.	2014-10-25 00:16:36 +00:00
jhb	5dd26e948d	The current POSIX semaphore implementation stores the _has_waiters flag in a separate word from the _count. This does not permit both items to be updated atomically in a portable manner. As a result, sem_post() must always perform a system call to safely clear _has_waiters. This change removes the _has_waiters field and instead uses the high bit of _count as the _has_waiters flag. A new umtx object type (_usem2) and two new umtx operations are added (SEM_WAIT2 and SEM_WAKE2) to implement these semantics. The older operations are still supported under the COMPAT_FREEBSD9/10 options. The POSIX semaphore API in libc has been updated to use the new implementation. Note that the new implementation is not compatible with the previous implementation. However, this only affects static binaries (which cannot be helped by symbol versioning). Binaries using a dynamic libc will continue to work fine. SEM_MAGIC has been bumped so that mismatched binaries will error rather than corrupting a shared semaphore. In addition, a padding field has been added to sem_t so that it remains the same size. Differential Revision: https://reviews.freebsd.org/D961 Reported by: adrian Reviewed by: kib, jilles (earlier version) Sponsored by: Norse	2014-10-24 20:02:44 +00:00
des	3312d35f6b	In all cases except CTLTYPE_STRING, penv is NULL here, so passing it indiscriminately to printf() and freeenv() is incorrect. Add a NULL check before freeenv(); as for printf(), we could use req.newptr instead, but we'd have to select the correct format string based on the type, and that's too much work for an error message, so just remove it.	2014-10-23 22:42:56 +00:00
mjg	d40cc92af6	In selfdfree re-evaluate sf_si after taking the lock. Otherwise we can race with doselwakeup. This is a fixup to r273549 Reviewed by: jhb Reported by: everyone and their dog	2014-10-23 19:06:08 +00:00
delphij	466c8cbf48	Test if 'env' is NULL before doing memset() and strlen(), the caller may pass NULL to freeenv().	2014-10-23 18:23:50 +00:00
mjg	b26a2106d9	Avoid taking the lock in selfdfree when not needed.	2014-10-23 15:35:47 +00:00
cperciva	93829a91a2	Avoid leaking data from the kernel environment: When we convert the initial static environment to a dynamic one, zero the static environment buffer, and zero individual values when kern_unsetenv and freeenv are called. Tested by: kmoore (VM memory dump + grep) Tested by: cperciva (kernel panic dump + grep)	2014-10-22 23:35:32 +00:00
mjg	04223abe34	filedesc assert that table size is at least 3 in fdsetugidsafety Requested by: kib	2014-10-22 08:56:57 +00:00
mjg	71643d972a	Plug unnecessary PRS_NEW check in kern_procctl. pfind does not return processes in such state.	2014-10-22 04:16:09 +00:00
mjg	b43c178ac3	Reduce nesting in vn_access. No functional changes.	2014-10-22 01:53:00 +00:00
mjg	a5dd454a60	Avoid crdup when possible in kern_accessat. While here tidy up a little.	2014-10-22 01:09:07 +00:00
mjg	2ebe66c290	filedesc: cleanup setugidsafety a little Rename it to fdsetugidsafety for consistency with other functions. There is no need to take filedesc lock if not closing any files. The loop has to verify each file and we are guaranteed fdtable has space for at least 20 fds. As such there is no need to check fd_lastfile. While here tidy up is_unsafe.	2014-10-22 00:23:43 +00:00
mjg	4386bf043d	Eliminate unnecessary memory allocation in sys_getgroups and its ibcs2 counterpart.	2014-10-21 23:08:46 +00:00
mjg	421d04dd11	Take the lock shared in linker_search_symbol_name. This helps sysctl kern.proc.stack.	2014-10-21 21:29:20 +00:00
mjg	baf980585a	Mark some more sysctl stuff shared-locked and MPSAFE.	2014-10-21 21:08:45 +00:00
mjg	32838cf097	Make sysctl name2oid shared-locked as well. This is a follow-up to r273401.	2014-10-21 19:45:08 +00:00
mjg	96cc1d223f	Implement shared locking for sysctl.	2014-10-21 19:05:44 +00:00
mjg	0aa65e85f3	Rename sysctl_lock and _unlock to sysctl_xlock and _xunlock.	2014-10-21 19:02:26 +00:00
hselasky	49c137f7be	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
mjg	a21afe6138	Plug unnecessary binvp NULL initialization and test. Reported by: Coverity CID: 1018889	2014-10-20 22:52:15 +00:00
mjg	f1a57b3826	filedesc: plug 2 write-only variables Reported by: Coverity CID: 1245745, 1245746	2014-10-20 21:57:24 +00:00
markj	7e306363d8	Fix a typo from r189544, which replaced unp_global_rwlock with unp_list_lock and unp_link_rwlock. MFC after: 3 days	2014-10-20 20:21:40 +00:00
mjg	12e0034dd0	Provide vfs suspension support only for filesystems which need it, take two. nullfs and unionfs need to request suspension if underlying filesystem(s) use it. Utilize mnt_kern_flag for this purpose. This is a fixup for 273271. No strong objections from: kib Pointy hat to: mjg MFC after: 2 weeks	2014-10-20 18:00:50 +00:00
marcel	d2387926ba	Fully support constructors for the purpose of code coverage analysis. This involves: 1. Have the loader pass the start and size of the .ctors section to the kernel in 2 new metadata elements. 2. Have the linker backends look for and record the start and size of the .ctors section in dynamically loaded modules. 3. Have the linker backends call the constructors as part of the final work of initializing preloaded or dynamically loaded modules. Note that LLVM appends the priority of the constructors to the name of the .ctors section. Not so when compiling with GCC. The code currently works for GCC and not for LLVM. Submitted by: Dmitry Mikulin <dmitrym@juniper.net> Obtained from: Juniper Networks, Inc.	2014-10-20 17:04:03 +00:00
mjg	65ead9d18a	Provide vfs suspension support only for filesystems which need it. Need is expressed by providing vfs_susp_clean function in vfsops. Differential Revision: D952 Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-19 06:59:33 +00:00
adrian	9343e2e256	Convert a missed u_char cpu -> int cpu. This was caught by a gcc build. Reported by: luigi Sponsored by: Norse Corp, Inc.	2014-10-19 04:38:02 +00:00
adrian	9b44fe556b	Update the ULE scheduler + thread and kinfo structs to use int for cpuid rather than u_char. To try and play nice with the ABI, the u_char CPU ID values are clamped at 254. The new fields now contain the full CPU ID, or -1 for no cpu. Differential Revision: D955 Reviewed by: jhb, kib Sponsored by: Norse Corp, Inc.	2014-10-18 19:36:11 +00:00
davide	e88bd26b3f	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe	2014-10-16 18:04:43 +00:00
mav	cfb0e0275e	Remove setting BIO_DONE flag for BIOs that have done() method. This fixes use-after-free, caused by geom_disk, completing same BIO twice to save extra allocation, and getting BIO_DONE set after the first. MFC after: 1 week	2014-10-15 18:36:34 +00:00
kib	5f9c41699a	Implement FIODTYPE for master ptys. Requested and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-10-15 12:38:26 +00:00
mjg	143261aa90	Don't take devmtx unnecessarily in vn_isdisk. MFC after: 1 week	2014-10-15 05:17:36 +00:00
mjg	ece6d4cf1c	filedesc: plug 2 assignments to M_ZERO-ed pointers in falloc_noinstall No functional changes.	2014-10-15 01:16:11 +00:00
marcel	889914101e	Fix nits in previous commit: 1. Remove initializer for badstack_sbuf_size; it gets set unconditionally. 2. Remove meaningless comment. 3. Group witness_count and its sysctl together. 4. Fix spacing in for statements (space after for and within condition). 5. Change all M_NOWAIT usages in witness_initialize() to M_WAITOK; not just those that were newly introduced -- the allocation is assumed to succeed for all allocations. 6. Avoid using uint8_t as the base type in sizeof() expressions; Use the variable name (w_rmatrix) as much as possible. Pointed out by: jhb@ (thanks!)	2014-10-11 16:34:01 +00:00
marcel	358ea67a98	Turn WITNESS_COUNT into a tunable and sysctl. This allows adjusting the value without recompiling the kernel. This is useful when recompiling is not possible as an immediate solution. When we run out of witness objects, witness is completely disabled. Not having an immediate solution can therefore be problematic. Submitted by: Sreekanth Rupavatharam <rupavath@juniper.net> Obtained from: Juniper Networks, Inc.	2014-10-11 02:02:58 +00:00
marcel	005c9e3ebe	Regenerate after r272823: Move the SCTP syscalls to netinet with the rest of the SCTP code. Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:19:35 +00:00
marcel	42d9d5479e	Move the SCTP syscalls to netinet with the rest of the SCTP code. The syscalls themselves are tightly coupled with the network stack and therefore should not be in the generic socket code. The following four syscalls have been marked as NOSTD so they can be dynamically registered in sctp_syscalls_init() function: sys_sctp_peeloff sys_sctp_generic_sendmsg sys_sctp_generic_sendmsg_iov sys_sctp_generic_recvmsg The syscalls are also set up to be dynamically registered when COMPAT32 option is configured. As a side effect of moving the SCTP syscalls, getsock_cap needs to be made available outside of the uipc_syscalls.c source file. A proper prototype has been added to the sys/socketvar.h header file. API tests from the SCTP reference implementation have been run to ensure compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout) Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:16:52 +00:00
adrian	b0c040ce18	Add a bus method to fetch the VM domain for the given device/bus. * Add a bus_if.m method - get_domain() - returning the VM domain or ENOENT if the device isn't in a VM domain; * Add bus methods to print out the domain of the device if appropriate; * Add code in srat.c to save the PXM -> VM domain mapping that's done and expose a function to translate VM domain -> PXM; * Add ACPI and ACPI PCI methods to check if the bus has a _PXM attribute and if so map it to the VM domain; * (.. yes, this works recursively.) * Have the pci bus glue print out the device VM domain if present. Note: this is just the plumbing to start enumerating information - it doesn't at all modify behaviour. Differential Revision: D906 Reviewed by: jhb Sponsored by: Norse Corp	2014-10-09 05:33:25 +00:00
marcel	9baa5db055	Fix draining in ttydev_leave(): 1. ERESTART is not only returned when the revoke count changed. It is also returned when a signal is received. While a change in the revoke count should be ignored, a signal should not. 2. Waiting until the output queue is entirely drained can cause a hang when the underlying device is stuck or broken. Have tty_drain() take care of this by telling it when we're leaving. When leaving, tty_drain() will use a timed wait to address point 2 above and it will check the revoke count to handle point 1 above. The timeout is set to 1 second, which is arbitrary and long enough to expect a change in the output queue. Discussed with: jilles@ Reported by: Yamagi Burmeister <lists@yamagi.org>	2014-10-09 02:30:38 +00:00
marcel	6c9a9abc7c	Apply r269126 to tty_timedwait(): Don't return ERESTART when the device is gone.	2014-10-09 01:59:25 +00:00
jhb	877e018654	Add schedgraph traces for callout handlers. Specifically, a callwheel logs a running event each time it executes a callout function. The event includes the function pointer, argument, and whether or not it was run from hardware interrupt context. The callwheel is marked idle when each handler completes. This effectively logs the duration of each callout routine in the graph.	2014-10-08 16:22:59 +00:00
jkim	52fd30aa97	Make kern.nswbuf tunable from loader. MFC after: 1 week	2014-10-07 20:13:47 +00:00
mjg	7c659f8738	Convert racct stubs to inline functions. This saves some symbols and function calls for kernel without RACCT. MFC after: 1 week	2014-10-06 02:31:33 +00:00
mjg	98fa5f5d8b	filedesc: fix up breakage introduced in 272505 Include sequence counter support unconditionally [1]. This fixes reported build problems with e.g. nvidia driver due to missing opt_capsicum.h. Replace fishy looking sizeof with offsetof. Make fde_seq the last member in order to simplify calculations. Suggested by: kib [1] X-MFC: with 272505	2014-10-05 19:40:29 +00:00
kib	2ad09fbf89	On error, sbuf_bcat() returns -1. Some callers returned this -1 to the upper layers, which interpret it as errno value, which happens to be ERESTART. The result was spurious restarts of the sysctls in loop, e.g. kern.proc.proc, instead of returning ENOMEM to caller. Convert -1 from sbuf_bcat() to ENOMEM, when returning to the callers expecting errno. In collaboration with: pho Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week	2014-10-05 17:35:59 +00:00
mjg	e42745632d	Avoid unnecessary ppeers_lock acquisition in exit1. MFC after: 1 week	2014-10-05 07:21:41 +00:00
mjg	698b287014	Get rid of crshared.	2014-10-05 02:16:53 +00:00
kib	b64b91af74	Slightly reword comment. Move code, which is described by the comment, after it. Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-10-04 18:51:55 +00:00
kib	feb6ca868c	Add kernel option KSTACK_USAGE_PROF to sample the stack depth on interrupts and report the largest value seen as sysctl debug.max_kstack_used. Useful to estimate how close the kernel stack size is to overflow. In collaboration with: Larry Baird <lab@gta.com> Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week	2014-10-04 18:38:14 +00:00
kib	18bdc94c19	Fixes for i/o during coredumping: - Do not dump into system files. - Do not acquire write reference to the mount point where img.core is written, in the coredump(). The vn_rdwr() calls from ELF imgact request the write ref from vn_rdwr(). Recursive acquisition of the write ref deadlocks with the unmount. - Instead, take the range lock for the whole core file. This prevents parallel dumping from two processes executing the same image, converting the useless interleaved dump into sequential dumping, with second core overwriting the first. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-10-04 18:35:00 +00:00
kib	6be2713240	Add IO_RANGELOCKED flag for vn_rdwr(9), which specifies that vnode is not locked, but range is. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-10-04 18:28:27 +00:00
ian	e35c0ad650	Make kevent(2) periodic timer events more reliably periodic. The event callout is now scheduled using the C_ABSOLUTE flag, and the absolute time of each event is calculated as the time the previous event was scheduled for plus the interval. This ensures that latency in processing a given event doesn't perturb the arrival time of any subsequent events. Reviewed by: jhb	2014-10-04 15:59:15 +00:00
mjg	c0fe514f04	Plug capability races. fp and appropriate capability lookups were not atomic, which could result in improper capabilities being checked. This could result either in protection bypass or in a spurious ENOTCAPABLE. Make fp + capability check atomic with the help of sequence counters. Reviewed by: kib MFC after: 3 weeks	2014-10-04 08:08:56 +00:00
jhb	08d8637e39	Require p_cansched() for changing a process' protection status via procctl() rather than p_cansee(). Submitted by: rwatson MFC after: 3 days	2014-10-02 21:18:16 +00:00
will	eba83cccb3	In the syncer, drop the sync mutex while patting the watchdog. Some watchdog drivers (like ipmi) need to sleep while patting the watchdog. See sys/dev/ipmi/ipmi.c:ipmi_wd_event(), which calls malloc(M_WAITOK). Submitted by: asomers MFC after: 1 month Sponsored by: Spectra Logic MFSpectraBSD: 637548 on 2012/10/04	2014-10-01 15:32:28 +00:00
np	8780788a34	Test for absence of M_NOFREE before attempting to purge the mbuf's tags. This will leave more state intact should the assertion go off. MFC after: 1 month	2014-09-30 23:16:26 +00:00
mjg	24150776bd	Use bzero instead of explicitly zeroing stuff in do_execve. While strictly speaking this is not correct since some fields are pointers, it makes no difference on all supported archs and we already rely on it doing the right thing in other places. No functional changes.	2014-09-29 23:59:19 +00:00
neel	566859d273	tty_rel_free() can be called more than once for the same tty so make sure that the tty is dequeued from 'tty_list' only the first time. The panic below was seen when a revoke(2) was issued on an nmdm device. In this case there was also a thread that was blocked on a read(2) on the device. The revoke(2) woke up the blocked thread which would typically return an error to userspace. In this case the reader also held the last reference on the file descriptor so fdrop() ended up calling tty_rel_free() via ttydev_close(). tty_rel_free() then tried to dequeue 'tp' again which led to the panic. panic: Bad link elm 0xfffff80042602400 prev->next != elm cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f9c90460 kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00f9c90510 vpanic() at vpanic+0x189/frame 0xfffffe00f9c90590 panic() at panic+0x43/frame 0xfffffe00f9c905f0 tty_rel_free() at tty_rel_free+0x29b/frame 0xfffffe00f9c90640 ttydev_close() at ttydev_close+0x1f9/frame 0xfffffe00f9c90690 devfs_close() at devfs_close+0x298/frame 0xfffffe00f9c90720 VOP_CLOSE_APV() at VOP_CLOSE_APV+0x13c/frame 0xfffffe00f9c90770 vn_close() at vn_close+0x194/frame 0xfffffe00f9c90810 vn_closefile() at vn_closefile+0x48/frame 0xfffffe00f9c90890 devfs_close_f() at devfs_close_f+0x2c/frame 0xfffffe00f9c908c0 _fdrop() at _fdrop+0x29/frame 0xfffffe00f9c908e0 sys_read() at sys_read+0x63/frame 0xfffffe00f9c90980 amd64_syscall() at amd64_syscall+0x2b3/frame 0xfffffe00f9c90ab0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00f9c90ab0 --- syscall (3, FreeBSD ELF64, sys_read), rip = 0x800b78d8a, rsp = 0x7fffffbfdaf8, rbp = 0x7fffffbfdb30 --- CR: https://reviews.freebsd.org/D851 Reviewed by: glebius, ed Reported by: Leon Dang Sponsored by: Nahanni Systems MFC after: 1 week	2014-09-28 21:12:23 +00:00
glebius	0f9d61b26b	- Remove empty wrappers ether_poll_[de]register_drv(). [1] - Move polling(9) declarations out of ifq.h back to if_var.h they are absolutely unrelated to queues. Submitted by: Mikhail <mp lenta.ru> [1]	2014-09-28 14:05:18 +00:00
mjg	54f38c8738	Make do_dup() static and move relevant macros to kern_descrip.c No functional changes.	2014-09-26 19:48:47 +00:00
jhb	b1e77b05a1	Don't panic if a resource is allocated twice. Instead, print a warning and fail the allocation request. Allocations of "reserved" resources such as PCI BARs already fail the request instead of panic'ing in this case. MFC after: 1 week	2014-09-26 18:37:49 +00:00
kib	d972eee1e7	Fix fcntl(2) compat32 after r270691. The copyin and copyout of the struct flock are done in the sys_fcntl(), which means that compat32 used direct access to userland pointers. Move code from sys_fcntl() to new wrapper, kern_fcntl_freebsd(), which performs necessary userland memory accesses, and use it from both native and compat32 fcntl syscalls. Reported by: jhibbits Sponsored by: The FreeBSD Foundation MFC after: 3 days	2014-09-25 21:07:19 +00:00
kib	0529718f1d	In kern_linkat() and kern_renameat(), do not call namei(9) while holding a write reference on the filesystem. Try to get write reference in unblocked way after all vnodes are resolved; if failed, drop all locks and retry after waiting for suspension end. The VFS_UNMOUNT() methods for UFS and tmpfs try to establish suspension on unmount, while covered vnode is locked by VFS, which prevents namei() from stepping over the mount point. The thread doing namei() sleeps on the covered vnode lock, owning the write ref. Reported by: bdrewery Tested by: bdrewery (previous version), pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-09-25 20:42:25 +00:00
jhibbits	6208989a41	Stage one of multipass suspend/resume Summary: Add the beginnings of multipass suspend/resume, by introducing BUS_SUSPEND_CHILD/BUS_RESUME_CHILD, and move the PCI driver to this. Reviewers: jhb Reviewed By: jhb Differential Revision: https://reviews.freebsd.org/D590	2014-09-23 02:56:40 +00:00
jhb	8f082668d0	Add a new fo_fill_kinfo fileops method to add type-specific information to struct kinfo_file. - Move the various fill_*_info() methods out of kern_descrip.c and into the various file type implementations. - Rework the support for kinfo_ofile to generate a suitable kinfo_file object for each file and then convert that to a kinfo_ofile structure rather than keeping a second, different set of code that directly manipulates type-specific file information. - Remove the shm_path() and ksem_info() layering violations. Differential Revision: https://reviews.freebsd.org/D775 Reviewed by: kib, glebius (earlier version)	2014-09-22 16:20:47 +00:00
jhb	d08fb7f877	Convert from timeout(9) to callout(9).	2014-09-22 14:27:26 +00:00
hselasky	bdacf9ba4d	Improve transmit sending offload, TSO, algorithm in general. The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware are limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot be loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. Reviewed by: adrian, rmacklem Sponsored by: Mellanox Technologies MFC after: 1 week	2014-09-22 08:27:27 +00:00
sbruno	3118b08dd6	svn revisions r269964 and r269963 seemed to have impaired small memory footprint systems (32M/64M) and didn't leave enough free memory to load modules when it was setting up page tables for sizes that are never used on these smallish boards. Set kmem_zmax to PAGE_SIZE on these smaller systems (< 128M) to keep this from happening. Verified on mips32 h/w. PR: 193465 Submitted by: delphij Reviewed by: adrian	2014-09-22 05:07:22 +00:00
mav	d4e6695660	Rephrase r271616 comments. Submitted by: alc MFC after: 1 month	2014-09-17 17:43:32 +00:00
adrian	e4c630d701	Migrate ie->ie_assign_cpu and associated code to use an int for CPU rather than u_char. Migrate post_filter to use an int for a CPU rather than u_char. Change intr_event_bind() to use an int for CPU rather than u_char. It touches the ppc, sparc64, arm and mips machdep code but it should (hah!) be a no-op. Tested: * i386, AMD64 laptops Reviewed by: jhb	2014-09-17 17:33:22 +00:00
adrian	d3fedbed40	Modify cpuset_setithread() to take a CPU ID as an integer, not a char. We're going to end up having > 254 CPUs at some point.	2014-09-16 01:21:47 +00:00
ngie	356c289c25	Validate the mode argument in access, eaccess, and faccessat for optional POSIX compliance and to improve compatibility with Linux and NetBSD The issue was identified with lib/libc/sys/t_access:access_inval from NetBSD Update the manpage accordingly PR: 181155 Reviewed by: jilles (code), jmmv (code), wblock (manpage), wollman (code) MFC after: 4 weeks Phabric: D678 (code), D786 (manpage) Sponsored by: EMC / Isilon Storage Division	2014-09-16 00:56:47 +00:00
mav	023a2a140b	Add comments describing r271604 change. MFC after: 3 days	2014-09-15 11:17:36 +00:00
mav	1cfaa7d62b	Add a couple of memory barriers to serialize tdq_cpu_idle and tdq_load accesses. This change fixes transient performance drops in some of my benchmarks, vanishing as soon as I am trying to collect any stats from the scheduler. It looks like reordered access to those variables sometimes caused loss of IPI_PREEMPT, that delayed thread execution until some later interrupt. MFC after: 3 days	2014-09-14 22:13:19 +00:00
melifaro	0a46d9d7d5	Fix error handling in cpuset_setithread() introduced in r267716. Noted by: kib MFC after: 1 week	2014-09-13 13:46:16 +00:00
jhb	4cd91e9d81	Fix various issues with invalid file operations: - Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(), and invfo_kqfilter() for use by file types that do not support the respective operations. Home-grown versions of invfo_poll() were universally broken (they returned an errno value, invfo_poll() uses poll_no_poll() to return an appropriate event mask). Home-grown ioctl routines also tended to return an incorrect errno (invfo_ioctl returns ENOTTY). - Use the invfo_() functions instead of local versions for unsupported file operations. - Reorder fileops members to match the order in the structure definition to make it easier to spot missing members. - Add several missing methods to linuxfileops used by the OFED shim layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat(). Most of these used invfo_(), but a dummy fo_stat() implementation was added.	2014-09-12 21:29:10 +00:00
jhb	94540ef6a5	Tweak pipe_truncate() to more closely match pipe_chown() and pipe_chmod() by checking PIPE_NAMED and using invfo_truncate() for unnamed pipes.	2014-09-12 21:20:36 +00:00
jhb	a17a2d5156	Simplify vntype_to_kinfo() by returning when the desired value is found instead of breaking out of the loop and then immediately checking the loop index so that if it was broken out of the proper value can be returned. While here, use nitems().	2014-09-12 20:56:09 +00:00

14303 Commits