freebsd-nq

Author	SHA1	Message	Date
Hans Petter Selasky	38668c6044	Implement a simple OID number garbage collector. Given the increasing number of SYSCTLs dynamically created and destroyed at runtime, the current limit of 0x7fffffff for new OID numbers can realistically be reached, especially when dynamic OID creation and destruction is driven by automated tests. Additional changes: - Optimize the typical use case by decrementing the next automatic OID sequence number instead of incrementing it. This saves search time when inserting new OIDs into a fresh parent OID node. - Add a simple check for duplicate non-automatic OID numbers. MFC after: 1 week	2015-03-25 08:55:34 +00:00
Hans Petter Selasky	502702c644	Make sure tunable sysctls are only fetched once. The existing code can re-register sysctls when destroying sysctl contexts or when moving sysctls from one tree to another.	2015-03-24 17:42:53 +00:00
Gleb Smirnoff	a2d4a7e456	Do not include if_var.h and in6_var.h in kern_jail.c. This is possible as of r280444. Sponsored by: Nginx, Inc.	2015-03-24 16:46:40 +00:00
Hans Petter Selasky	ab91c9a743	Correct string pointer offset for error printout.	2015-03-24 16:37:19 +00:00
Rui Paulo	0da9e11b7e	Disable coredump_devctl because it could lead to leaking paths to jails.	2015-03-24 02:17:17 +00:00
Mateusz Guzik	ea926658ff	filedesc: microoptimize fget_unlocked by getting rid of the fd < 0 branch. Casting fd to an unsigned type reduces the fd range comparison to a mere check of whether the result is beyond the end of the table.	2015-03-24 00:10:11 +00:00
Ian Lepore	296f235de0	The sysctls that return process argv and envv return binary data, so clear the SBUF_INCLUDENUL flag. Pointed out by: tijl@	2015-03-22 21:18:44 +00:00
Hans Petter Selasky	2793ea13aa	Fix out-of-order device destruction notifications when using the delist_dev() function. In addition to this change: - add a proper description of this function - add a proper witness assert inside this function - switch a nearby line to use the "cdp" pointer instead of cdev2priv() MFC after: 3 days	2015-03-22 13:11:56 +00:00
Mateusz Guzik	f97af9706b	proc: use MTX_NEW flag in proc_init. This allows us to get rid of the bzero which was added specifically to make mtx_init on p_mtx reliable. This also fixes a potential problem where mtx_init on other mutexes could trip over uninitialized memory and fire an assertion. Reviewed by: kib	2015-03-21 20:25:34 +00:00
Mateusz Guzik	ffb34484ee	cred: add proc_set_cred_init helper. proc_set_cred_init can be used to set the first credentials of a new process. Update proc_set_cred assertions so that it expects only already-used processes. This fixes panics where p_ucred of a new process happens to be non-NULL. Reviewed by: kib	2015-03-21 20:24:54 +00:00
Mateusz Guzik	12cec311e6	fork: assign refed credentials earlier. Prior to this change the kernel would take p1's credentials and assign them temporarily to p2. However, p1 could change its credentials in the meantime, in effect giving us a use-after-free. No objections from: kib	2015-03-21 20:24:03 +00:00
Alan Cox	3d653db063	Introduce vm_object_color() and use it in mmap(2) to set the color of named objects to zero before the virtual address is selected. Previously, the color setting was delayed until after the virtual address was selected. In rtld, this delay effectively prevented the mapping of a shared library's code section using superpages. Now, for example, we see the first 1 MB of libc's code on armv6 mapped by a superpage after we've gotten through the initial cold misses that bring the first 1 MB of code into memory. (With the page clustering that we perform on read faults, this happens quickly.) Differential Revision: https://reviews.freebsd.org/D2013 Reviewed by: jhb, kib Tested by: Svatopluk Kraus (armv6) MFC after: 6 weeks	2015-03-21 17:56:55 +00:00
Olivier Houchard	d8d2f47629	error is only used if MAC is defined, so make its declaration conditional as well.	2015-03-21 16:16:17 +00:00
Konstantin Belousov	0555fb3523	Somewhat modernize the SysV shm code: - Use real locking: replace Giant with a global sx lock protecting the subsystem. Since the subsystem's lock is no longer dropped during sleeps, remove the unneeded SHMSEG_WANTED segment flag, and revert r278963. - To take advantage of the code simplification made possible by the lock change, restructure several functions into a _locked body and an originally-named wrapper which calls into the _locked variant. This eliminates the 'goto done2' spread over the code. - Merge shm_find_segment_by_shmid() and shm_find_segment_by_shmidx(). - Consistently change all function prototypes to ANSI C. Reviewed by: mjg (who had an earlier version of a similar patch to introduce real locking) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-03-21 15:01:19 +00:00
Mateusz Guzik	5bc0ff888a	coredump: protect corefilename access with a lock Previously format string traversal could happen while the string itself was being modified. Use allproc_lock as coredumping is a rare operation and as such we don't have to create a dedicated lock. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> Reviewed by: kib X-Additional: JuniorJobs project	2015-03-21 04:39:33 +00:00
Ian Lepore	612d9391a4	The minimum sbuf buffer size is 2 bytes (a byte plus a nulterm), assert that. Values smaller than two lead to strange asserts that have nothing to do with the actual problem (in the case of size=0), or to writing beyond the end of the allocated buffer in sbuf_finish() (in the case of size=1).	2015-03-17 21:00:31 +00:00
Ian Lepore	2834924513	In sbuf_new_for_sysctl(), default the buffer size to 64 bytes if the passed-in pointer is NULL and the length is zero.	2015-03-17 20:56:24 +00:00
Gleb Smirnoff	8c4df6296b	Reduce header pollution.	2015-03-17 14:16:50 +00:00
Benno Rice	43348dc2ad	Reset bp->bio_data to unmapped_buf when removing a transient map in biodone. Submitted by: Scott Ferris <scott.ferris@isilon.com> Sponsored by: EMC / Isilon Storage Division Reviewed by: kib	2015-03-16 20:00:09 +00:00
Ian Lepore	f62fbd30cb	Trivial change / forced commit to document a prior change that slipped in without a commit message: Use sbuf_new() + SYSCTL_OUT() instead of wiring the userland buffer and using sbuf_new_for_sysctl(). The preallocated 256-byte buffer is always going to be big enough to hold these results, and this should be more efficient than wiring the old buffer.	2015-03-16 19:29:19 +00:00
Ian Lepore	ff352d8978		2015-03-16 19:25:03 +00:00
Ian Lepore	ba00885515	Use a regular sbuf + SYSCTL_OUT() rather than sbuf_new_for_sysctl() with auto-draining, to avoid a potential copyout fault while holding a lock. Pointed out by: jhb Pointy hat to: ian	2015-03-16 19:18:45 +00:00
Ian Lepore	8d5628fdb8	Update an sbuf assertion to allow for the new SBUF_INCLUDENUL flag. If INCLUDENUL is set and sbuf_finish() has been called, the length has been incremented to count the nulterm byte, and in that case the current length is allowed to equal the buffer size; otherwise it must be less than the buffer size. Add a predicate macro to test for SBUF_INCLUDENUL, and use it in tests, to be consistent with the style in the rest of this file.	2015-03-16 17:45:41 +00:00
Mateusz Guzik	fbe503d462	proc: get rid of proc lock + unlock pair in proc_reap. A comment in the code stated that we take PROC_LOCK and, as a side effect, guarantee that all writers have released the process lock. But at that point the lock had already been taken while the process was being removed from all lists, so the process should already be unreachable.	2015-03-16 01:09:49 +00:00
Mateusz Guzik	daf63fd2f9	cred: add proc_set_cred helper. The goal here is to provide one place for altering process credentials. This eases debugging and opens up possibilities to do additional work when such an action is performed.	2015-03-16 00:10:03 +00:00
Ian Lepore	e5197e3a08	Add a nulterm byte to the returned sysctl string. PR: 195668	2015-03-15 00:39:18 +00:00
Ian Lepore	657282e062	Include the nulterm byte in the sysctl string. PR: 195668	2015-03-15 00:36:08 +00:00
Ian Lepore	91d9eda200	Use sbuf_printf() for sysctl strings instead of stack buffers and snprintf().	2015-03-14 23:16:12 +00:00
Ian Lepore	acfc962f82	Use SYSCTL_OUT_STR() to return strings. PR: 195668	2015-03-14 21:40:01 +00:00
Ian Lepore	b773372938	Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl string returned to userland is nulterminated. PR: 195668	2015-03-14 18:46:33 +00:00
Ian Lepore	b97fa22cd6	Use sbuf_new_for_sysctl() instead of plain sbuf_new() to ensure sysctl string returned to userland is nulterminated. PR: 195668	2015-03-14 18:42:30 +00:00
Ian Lepore	1eafc07856	Set the SBUF_INCLUDENUL flag in sbuf_new_for_sysctl() so that sysctl strings returned to userland include the nulterm byte. Some uses of sbuf_new_for_sysctl() write binary data rather than strings; clear the SBUF_INCLUDENUL flag after calling sbuf_new_for_sysctl() in those cases. (Note that the sbuf code still automatically adds a nulterm byte in sbuf_finish(), but since it's not included in the length it won't get copied to userland along with the binary data.) Remove explicit adding of a nulterm byte in a couple places now that it gets done automatically by the sbuf drain code. PR: 195668	2015-03-14 17:08:28 +00:00
Ian Lepore	f4d281428f	Add a new flag, SBUF_INCLUDENUL, and new get/set/clear functions for flags. The SBUF_INCLUDENUL flag causes the nulterm byte at the end of the string to be counted in the length of the data. If copying the data using the sbuf_data() and sbuf_len() functions, or if writing it automatically with a drain function, the net effect is that the nulterm byte is copied along with the rest of the data.	2015-03-14 16:02:11 +00:00
Hans Petter Selasky	b7ba031ff7	Factor out mbuf hashing code from the LAGG driver so that other network drivers can use it. This avoids some code duplication. Add a missing default case to all switch statements while at it. Also move the hashing of the IPv6 flow field to layer 4, because the IPv6 flow field is constant per L4 connection rather than per L3 network. Differential Revision: https://reviews.freebsd.org/D1987 Sponsored by: Mellanox Technologies MFC after: 1 month	2015-03-11 16:02:24 +00:00
Ryan Stone	1c229658b9	Fix SR-IOV passthrough devices to allow ppt to attach A late change to the SR-IOV infrastructure broke passthrough of VFs. device_set_devclass() was being used to try to force the ppt driver to attach to the device, but this didn't work because the DF_FIXEDCLASS flag wasn't being set on the device, so the ppt driver probe routine would not match when it returned BUS_NOWILDCARD. Fix this by adding a new device function that both sets the devclass and sets the DF_FIXEDCLASS flag, and use that to force the ppt driver to attach to VFs. Differential Revision: https://reviews.freebsd.org/D2041 Reviewed by: jhb MFC after: 3 weeks	2015-03-10 23:27:13 +00:00
Mark Johnston	aa14e9b7c9	Reimplement support for userland core dump compression using a new interface in kern_gzio.c. The old gzio interface was somewhat inflexible and has not worked properly since r272535: currently, the gzio functions are called with a range lock held on the output vnode, but kern_gzio.c does not pass the IO_RANGELOCKED flag to vn_rdwr() calls, resulting in deadlock when vn_rdwr() attempts to reacquire the range lock. Moreover, the new gzio interface can be used to implement kernel core compression. This change also modifies the kernel configuration options needed to enable userland core dump compression support: gzio is now an option rather than a device, and the COMPRESS_USER_CORES option is removed. Core dump compression is enabled using the kern.compress_user_cores sysctl/tunable. Differential Revision: https://reviews.freebsd.org/D1832 Reviewed by: rpaulo Discussed with: kib	2015-03-09 03:50:53 +00:00
Nathan Whitehorn	5c845fde2e	Make 32-bit PowerPC kernels, like 64-bit PowerPC kernels, position-independent executables. The goal here, not yet accomplished, is to let the e500 kernel run under QEMU by setting KERNBASE to something that fits in low memory and then having the kernel relocate itself at runtime.	2015-03-07 20:14:46 +00:00
Hans Petter Selasky	35ee8a4a59	Add mutex support to the pps_ioctl() API in the kernel. Bump kernel version to reflect structure change. PR: 196897 MFC after: 1 week	2015-03-07 18:23:32 +00:00
Ryan Stone	4d6a976e37	Move libnv into the kernel and hook it into the kernel build Differential Revision: https://reviews.freebsd.org/D1883 Reviewed by: jfv MFC after: 1 month Sponsored by: Sandvine Inc.	2015-03-01 00:34:27 +00:00
Ryan Stone	3d59729556	Correct the use of an uninitialized variable in sendfile_getobj(). When sendfile_getobj() is called on a DTYPE_SHM file, it never initializes error, which is eventually returned to the caller. Differential Revision: https://reviews.freebsd.org/D1989 Reviewed by: kib Reported by: Brainy Code Scanner, by Maxime Villard.	2015-02-28 21:49:59 +00:00
Ian Lepore	bd96bd15b2	Format the line properly (wrap before column 80).	2015-02-28 17:44:31 +00:00
Ian Lepore	a1a4c1b0d4	Export the new osreldate and osrelease jail parms in jail_get(2).	2015-02-28 17:32:31 +00:00
Konstantin Belousov	13dad10871	The umtx_lock mutex is used by the top half of the kernel, but is currently a spin lock. Apparently, the only reason for this is that umtx_thread_exit() is called under the process spinlock, which imposed that requirement on umtx_lock. Note that the witness static order list is wrong for umtx_lock: umtx_lock is explicitly ordered before any thread lock, so it is also before the sleepq locks. Change umtx_lock to be a sleepable mutex. For the reason above, the calls to umtx_thread_exit() are moved out of thread_exit() to an earlier point in each caller, where the process spin lock is not yet taken. Discussed with: jhb Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2015-02-28 04:19:02 +00:00
Warner Losh	5837276ce2	Put back Andy's void for gcc happiness. Submitted by: jchandra@	2015-02-27 23:14:08 +00:00
Warner Losh	b250ad3499	Make sched_random() return an unsigned number, and use uint32_t consistently. This also matches the per-cpu pointer declaration. This changes the tweak we give to the load from -32..31 to 0..31, which seems more in line with the rest of the code (- rnd and the -= 64). It should also provide the randomness we need, and may fix a signedness bug in the old code (it isn't clear that the effect was intentional as opposed to sloppy, and the right shift of a negative signed value is implementation-defined to boot). This restores sched_balance() behavior from when it used random(). Differential Revision: https://reviews.freebsd.org/D1981	2015-02-27 21:15:12 +00:00
Konstantin Belousov	08189ed667	The VNASSERT in vflush() FORCECLOSE case is trying to panic early to prevent errors from yanking devices out from under filesystems. Only care about special vnodes on devfs, special nodes on other kinds of filesystems do not have special properties. Sponsored by: EMC / Isilon Storage Division Submitted by: Conrad Meyer MFC after: 1 week	2015-02-27 16:43:50 +00:00
Ian Lepore	b96bd95b85	Allow the kern.osrelease and kern.osreldate sysctl values to be set in a jail's creation parameters. This allows the kernel version to be reliably spoofed within the jail whether examined directly with sysctl or indirectly with the uname -r and -K options. The values can only be set at jail creation time, to eliminate the need for any locking when accessing the values via sysctl. The overridden values are inherited by nested jails (unless the config for the nested jails also overrides the values). There is no sanity or range checking, other than disallowing an empty release string or a zero release date, by design. The system administrator is trusted to set sane values. Setting values that are newer than the actual running kernel will likely cause compatibility problems. Differential Revision: https://reviews.freebsd.org/D1948 Relnotes: yes	2015-02-27 16:28:55 +00:00
Andrew Turner	ccc41f3e66	Fix sched_ule on sparc64: gcc complains that sched_random is not a correct prototype. Sponsored by: The FreeBSD Foundation	2015-02-27 15:05:20 +00:00
Andrew Turner	09d0653552	sched_random is only called for SMP, only define it there. Sponsored by: The FreeBSD Foundation	2015-02-27 12:38:24 +00:00
Warner Losh	0567b6cc16	Create sched_random() and move the LCG code into it. Call this when we need randomness in ULE. This removes the random() call from the rebalance interval code. Submitted by: Harrison Grundy Differential Revision: https://reviews.freebsd.org/D1968	2015-02-27 02:56:58 +00:00
Adrian Chadd	75493a82e0	Remove taskqueue_start_threads_pinned(); there's now a generic cpuset version of this. Sponsored by: Norse Corp, Inc.	2015-02-25 21:59:03 +00:00
Konstantin Belousov	84b736b268	When failing to claim ownership of a umtx_pi, restore the umutex owner to its previous, unowned state. This avoids compounding an existing problem of inconsistent ownership. Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com> Obtained from: Dell Inc. PR: 198914 MFC after: 1 week	2015-02-25 16:17:16 +00:00
Konstantin Belousov	cc876d2c5c	When unlocking a contested PI pthread mutex, if the queue of waiters is empty, look up the umtx_pi and disown it if the current thread owns it. This can happen if a signal or timeout removed the last waiter from the queue, but there is still a thread in do_lock_pi() holding a reference on the umtx_pi. The unlocking thread might not own the umtx_pi in this case, but if it does, it must disown it to keep the ownership consistent between the umtx_pi and the umutex. Submitted by: Eric van Gyzen <eric_van_gyzen@dell.com> with advice from: Elliott Rabe and Jim Muchow, also at Dell Inc. Obtained from: Dell Inc. PR: 198914	2015-02-25 16:12:56 +00:00
Konstantin Belousov	dacbc9dbe7	Keep a reference on the coredump vnode for vn_fullpath() call. Do it by moving vn_close() after the point where notification is sent. Reported by: sbruno Tested by: pho, sbruno Sponsored by: The FreeBSD Foundation	2015-02-24 13:07:31 +00:00
Andrey V. Elsukov	e9b70483d1	soreceive_generic() still has a similar KASSERT(); therefore, instead of removing the KASSERT(), change it to check that the mbuf isn't NULL. Suggested by: kib MFC after: 1 week	2015-02-23 15:24:43 +00:00
Andrey V. Elsukov	f21684bc75	In some cases soreceive_dgram() can return no data but still have a control message. This can happen when an application sends packets too big for the path MTU: recvmsg() will return zero (indicating no data), but there will be a cmsghdr with cmsg_type set to IPV6_PATHMTU. Remove the KASSERT(), which does a NULL pointer dereference in this case. Also call m_freem() only when m isn't NULL. PR: 197882 MFC after: 1 week Sponsored by: Yandex LLC	2015-02-23 13:41:35 +00:00
Nathan Whitehorn	c6014c739c	Make kernel ELF image parsing not crash for kernels running at locations other than their link address.	2015-02-21 23:20:05 +00:00
Mark Johnston	7abb0b0922	Don't specify a resid parameter if we're just going to ignore it. Instead, let vn_rdwr() check for short reads. MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2015-02-20 20:49:00 +00:00
Mark Johnston	ce47682c6c	Remove unnecessary checks for a return value of NULL from M_WAITOK allocations. MFC after: 3 days	2015-02-19 03:32:48 +00:00
Mark Johnston	250246706f	Free the zlib stream after expanding a compressed CTF section. Note that this memory would only be leaked once, since CTF info for a kld file is cached after the first access. MFC after: 3 days	2015-02-19 03:29:46 +00:00
Konstantin Belousov	1395226703	If malloc() sleeps, Giant is dropped. Recheck for another thread doing our work. Remove unneeded check for failed M_WAITOK allocation. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-02-18 18:12:06 +00:00
Mateusz Guzik	8fbda7f00b	filedesc: obtain a stable copy of credentials in fget_unlocked This was broken in r278930. While here tidy up fget_mmap to use fdp from local var instead of obtaining the same pointer from td.	2015-02-18 13:37:28 +00:00
Mateusz Guzik	b7a39e9e07	filedesc: simplify fget_unlocked & friends. Introduce fget_fcntl, which performs the appropriate checks when needed. This removes a branch from fget_unlocked. Introduce fget_mmap, dealing with the cap_rights_to_vmprot conversion. This removes a branch from _fget. Modify fget_unlocked to pass the sequence counter to interested callers so that they can perform their own checks and make sure the result was obtained from stable & current state. Reviewed by: silence on -hackers	2015-02-17 23:54:06 +00:00
Gleb Smirnoff	ee52391ebe	Use anonymous unions and structs to organize shared space in mbuf(9), instead of preprocessor macros. This will make debugger output of 'print *m' exactly match the names we use in code, making a kernel hacker's life much more pleasant. This also allows renaming struct_m_ext back to m_ext.	2015-02-17 20:52:51 +00:00
Gleb Smirnoff	ec9d83dd9b	Use anonymous unions to make it possible to put mbufs into queue(3) STAILQs and SLISTs using the same structure fields that the good old m_next and m_nextpkt linkage occupies. New code is encouraged to use queue(3) macros instead of reinventing the wheel. However, it is better not to mix old-style linkage and queue(3) in one file or subsystem. Reviewed by: rwatson, rrs, rpaulo Differential Revision: D1499	2015-02-17 19:32:11 +00:00
Enji Cooper	c514f051b7	Add the mnt_lockref field to the ddb(4) 'show mount' command MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D1688 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Sponsored by: EMC / Isilon Storage Division	2015-02-17 09:31:58 +00:00
Adrian Chadd	bfa102cae1	Implement taskqueue_start_threads_cpuset(). This is a more generic version of taskqueue_start_threads_pinned(), which only supports a single cpuid. This originally came from John Baldwin <jhb@>, who implemented it as part of a push towards NUMA awareness in drivers. I started implementing something similar for RSS and NUMA, then found he had already done it. I'd like to axe taskqueue_start_threads_pinned() so it doesn't become part of a longer-term API. (Read: hps@ wants to MFC things, and if I don't do this soon, he'll MFC what's here. :-) I have a follow-up commit which converts the Intel drivers over to using the cpuset version of this function, so we can eventually nuke the pinned version. Tested: * igb, ixgbe Obtained from: jhbbsd	2015-02-17 02:35:06 +00:00
Konstantin Belousov	45f1ade79b	Reparenting done by debugger attach can leave a reaper without direct children. Handle the situation instead of asserting that it is impossible. Reported and tested by: emaste Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-02-15 08:44:30 +00:00
Konstantin Belousov	4b685a2862	Return with the process locked; the caller expects p to still be locked after the call. Reported and tested by: bapt Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-02-15 08:43:19 +00:00
Davide Italiano	a76d4388e1	Don't access sockbuf fields directly, use accessor functions instead. It is safe to move the call to socantsendmore_locked() after sbdrop_locked() as long as we hold the sockbuf lock across the two calls. CR: D1805 Reviewed by: adrian, kmacy, julian, rwatson	2015-02-14 20:00:57 +00:00
John Baldwin	bc411bc2d0	Include OBJT_PHYS VM objects in ELF core dumps. In particular this includes the shared page allowing debuggers to use the signal trampoline code to identify signal frames in core dumps. Differential Revision: https://reviews.freebsd.org/D1828 Reviewed by: alc, kib MFC after: 1 week	2015-02-14 17:12:31 +00:00
John Baldwin	1b76e0b732	Add two new counters for vnode life cycle events: - vfs.recycles counts the number of vnodes forcefully recycled to avoid exceeding kern.maxvnodes. - vfs.vnodes_created counts the number of vnodes created by successful calls to getnewvnode(). Differential Revision: https://reviews.freebsd.org/D1671 Reviewed by: kib MFC after: 1 week	2015-02-14 17:02:51 +00:00
Alan Cox	c2d5d3ee0e	Preset the object's color, or alignment, to maximize superpage usage. MFC after: 5 days	2015-02-13 19:58:53 +00:00
Randall Stewart	66525b2d16	This fixes a bug I inadvertently introduced when I updated the callout code in my last commit. cc_exec_next is used to track the next callout when a direct call is being made from the callout code. It is never used in the indirect method. When macro-izing, I made it separate out direct vs. non-direct. This is incorrect and can cause panics, as Peter Holm found for me (thanks so much, Peter, for all your help in this). This change restores that behavior, and also removes cc_next from the array and makes it part of the base callout structure instead. This way no one else will get confused, since we will never use it for non-direct. Reviewed by: Peter Holm and more importantly tested by him ;-) MFC after: 3 days. Sponsored by: Netflix Inc.	2015-02-12 13:31:08 +00:00
Rui Paulo	b5263b26db	Remove check against NULL after M_WAITOK. Submitted by: Oliver Pinter	2015-02-11 19:07:05 +00:00
Rui Paulo	6fbc0f7d98	Restore the data array in coredump(), but use a different style to calculate the length. Requested by: kib	2015-02-11 00:58:15 +00:00
Rui Paulo	624157bb5e	Remove a printf and an strlen() from the coredump code.	2015-02-10 18:35:46 +00:00
Konstantin Belousov	5d6f5b24ca	Mountd iterating over the mount points may race with a parallel unmount, which causes an error from the nmount(2) call when performing MNT_DELEXPORT over a directory that has ceased to be a mount point. The race is legitimate and innocent, but results in a chatty mountd. Silence it by providing a distinguished error code for the situation, and ignoring the error in the mountd loop. Based on the patch by: Andreas Longwitz <longwitz@incore.de> Prodded and tested by: bdrewery Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-02-10 18:00:32 +00:00
Rui Paulo	eb6368d4f8	Sanitise the coredump file names sent to devd. While there, add a sysctl to turn this feature off as requested by kib@.	2015-02-10 04:34:39 +00:00
Rui Paulo	842ab62b05	Notify devd(8) when a process crashed. This change implements a notification (via devctl) to userland when the kernel produces coredumps after a process has crashed. devd can then run a specific command to produce a human readable crash report. The command is most usually a helper that runs gdb/lldb commands on the file/coredump pair. It's possible to use this functionality for implementing automatic generation of crash reports. devd(8) will be notified of the full path of the binary that crashed and the full path of the coredump file.	2015-02-09 23:13:50 +00:00
Randall Stewart	d2854fa488	This fixes two conditions that can occur when migration is being done in the callout code, and harmonizes the macro use: 1) callout_active() will lie. Basically, if a migration is occurring, the callout is about to expire, and the migration has been deferred, callout_active() will no longer return true until after the migration. This confuses and breaks callers that do callout_init(&c, 1), such as TCP. 2) The migration code had a bug where, when migrating, if two calls to callout_reset came in and they both collided with the callout on the wheel about to run, the second call to callout_reset would corrupt the list the callout wheel uses, putting the callout thread into an endless loop. 3) Per imp, I have fixed all the macro occurrences in the code that were for the most part being ignored. Phabricator D1711 and looked at by lstewart and jhb and sbruno. Reviewed by: kostikbel, imp, adrian, hselasky MFC after: 3 days Sponsored by: Netflix Inc.	2015-02-09 19:19:44 +00:00
Alan Cox	f4c6aea395	Preset the object's color, or alignment, to maximize superpage usage. MFC after: 5 days	2015-02-08 21:00:51 +00:00
John Baldwin	64de80195b	Add a new device control utility for new-bus devices called devctl. This allows the user to request administrative changes to individual devices such as attaching or detaching drivers or disabling and re-enabling devices. - Add a new /dev/devctl2 character device which uses ioctls for device requests. The ioctls use a common 'struct devreq' which is somewhat similar to 'struct ifreq'. - The ioctls identify the device to operate on via a string. This string can either be the device's name, or it can be a bus-specific address. (For unattached devices, a bus address is the only way to locate a device.) Bus drivers register an eventhandler to claim unrecognized device names that the driver recognizes as a valid address. Two buses currently support addresses: ACPI recognizes any device in the ACPI namespace via its full path starting with "\", and the PCI bus driver recognizes an address specification of 'pci[<domain>:]<bus>:<slot>:<func>' (identical to the PCI selector strings supported by pciconf). - To make it easier to cut and paste, change the PnP location string in the PCI bus driver to output a full PCI selector string rather than 'slot=<slot> function=<func>'. - Add a devctl(3) interface in libdevctl which provides a wrapper around the ioctls and is the preferred interface for other userland code. - Add a devctl(8) program which is a simple wrapper around the requests supported by devctl(3). - Add a device_is_suspended() function to check DF_SUSPENDED. - Add a resource_unset_value() function that can be used to remove a hint from the kernel environment. This is used to clear a hint.<driver>.<unit>.disabled hint when re-enabling a boot-time disabled device. Reviewed by: imp (parts) Requested by: imp (changing PCI location string) Relnotes: yes	2015-02-06 16:09:01 +00:00
John Baldwin	94f0eafcd2	Expose the constants for internal new-bus device flags to userland. The flag value is already exposed via dv_flags, just not the meaning of the flags themselves. Use these constants to annotate devices that are disabled or suspended in devinfo output.	2015-02-05 22:42:44 +00:00
John Baldwin	a1324315e3	Set and clear the DF_SUSPENDED flag on the child device being manipulated rather than on the parent.	2015-02-05 22:24:22 +00:00
John-Mark Gurney	9541307fb7	Turn GEOM_UNCOMPRESS_DEBUG into a proper option so it can be specified in kernel config files. Put VERBOSE_SYSINIT in its own option header so the one file that needs it, init_main.c, can use it instead of requiring an entire kernel recompile to change one file.	2015-02-05 07:51:38 +00:00
Peter Wemm	0c56c4f1ab	Initialize ticks so that it wraps 10 minutes after boot to increase the chances of finding problems related to wraparound sooner. This comes from P4 change 167856 on 2009/08/26 around when we had problems with the TCP stack with ticks after 24 days of uptime.	2015-02-05 01:43:21 +00:00
Konstantin Belousov	6e3bf5392d	Add ddb command 'show clocksource' to display state of the per-cpu clock events. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-02-04 14:49:47 +00:00
Konstantin Belousov	ff5ba73987	Fix use after free in pipe_dtor(). PIPE_NAMED flag must be tested before pipeclose() is called, since for !PIPE_NAMED case, when peer is already closed, the pipe pair memory is freed. Submitted by: luke.tw@gmail.com PR: 197246 Tested by: pho MFC after: 3 days	2015-02-03 10:29:40 +00:00
Konstantin Belousov	6690381ef1	The dependency chain for priority-inheritance mutexes could be subverted by userspace into a cycle. Both umtx_propagate_priority() and umtx_repropagate_priority() would then loop infinitely, owning the spinlock. Check for the cycle using the standard Floyd's algorithm before doing the pass in the affected functions. Add a simple check for the condition of tricking the thread into waiting for itself, which could easily be simulated by usermode without a race. Found by: Eric van Gyzen <eric@vangyzen.net> In collaboration with: Eric van Gyzen <eric@vangyzen.net> Tested by: pho MFC after: 1 week	2015-01-31 12:27:40 +00:00
Jamie Gritton	464aad1407	Add allow.mount.fdescfs jail flag. PR: 192951 Submitted by: ruben@verweg.com MFC after: 3 days	2015-01-28 21:08:09 +00:00
John Baldwin	4f621933a5	Fix a couple of panics when detaching from a cxgbe/cxl interface that was never brought up: - Allow NULL to be passed to sglist_free(). - Don't try to stop an interface that was never fully initialized. Reviewed by: np	2015-01-26 16:26:28 +00:00
Adrian Chadd	9500dd9f0b	Call WITNESS_WARN() in callout_drain() to check whether any locks are being held before sleeping. This has bitten me (in ath(4)) once before and I'd like to see this not bite anyone else. Differential Revision: D1638 Reviewed by: jhb, hselasky MFC after: 1 week	2015-01-26 04:04:57 +00:00
John Baldwin	a77a12340a	Change the default VFS timestamp precision from seconds to microseconds. Discussed on: arch@ MFC after: 2 weeks	2015-01-25 19:56:45 +00:00
Jilles Tjoelker	2b35e6a9f2	Run make sysent.	2015-01-23 21:08:24 +00:00
Jilles Tjoelker	2205e0d1bd	Add futimens and utimensat system calls. The core kernel part is patch file utimes.2008.4.diff from pluknet@FreeBSD.org. I updated the code for API changes, added the manual page and added compatibility code for old kernels. There is also audit and Capsicum support. A new UTIME_* constant might allow setting birthtimes in future. Differential Revision: https://reviews.freebsd.org/D1426 Submitted by: pluknet (partially) Reviewed by: delphij, pluknet, rwatson Relnotes: yes	2015-01-23 21:07:08 +00:00
Alexey Dokuchaev	c5f282daad	Fix usage example in kvprintf(9) and its copy in libstand(3): trailing '\n' in bitfield argument is wrong, as it will be treated as bit 10, causing any code printing >=10 bits with bit 10 on as having a trailing comma. Newline (intended one) should be part of the format string (already present in the examples). Also fix grammar and kill EOL whitespace in comment while here. PR: 195005 Approved by: bdrewery	2015-01-23 07:30:57 +00:00
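The following sketch shows why a trailing '\n' is misread. In a "%b" description string, the first byte is the output base. Each following byte is a 1-based bit number, followed by that bit's printable name. Because '\n' is byte value 10, it parses as "bit 10 with an empty name". The sketch is a simplified, hypothetical re-implementation of only the bit-name part (`decode_bits()` and `put()` are invented names), not the kvprintf(9) source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one character, always leaving room for the terminating NUL. */
static void
put(char *out, size_t outlen, size_t *pos, char c)
{
	if (*pos + 1 < outlen)
		out[(*pos)++] = c;
	out[*pos] = '\0';
}

/*
 * Decode the "<NAME,NAME>" part of a %b-style description for `value`.
 * Assumes outlen >= 1, ASCII names, and bit numbers in the range 1..32.
 */
static void
decode_bits(char *out, size_t outlen, unsigned int value, const char *bits)
{
	size_t pos = 0;
	int any = 0;
	const char *p = bits + 1;	/* skip the base byte, e.g. '\20' */

	out[0] = '\0';
	while (*p != '\0') {
		int n = (unsigned char)*p++;	/* 1-based bit number */

		if (value & (1u << (n - 1))) {
			put(out, outlen, &pos, any ? ',' : '<');
			for (; *p > ' '; p++)	/* copy this bit's name */
				put(out, outlen, &pos, *p);
			any = 1;
		} else {
			while (*p > ' ')	/* skip this bit's name */
				p++;
		}
	}
	if (any)
		put(out, outlen, &pos, '>');
}
```

Suppose the description is "\20\1ONE\2TWO\n" and the value has bits 1 and 10 set. The sketch then produces "<ONE,>", which is the stray trailing comma the commit describes. Putting the newline in the format string instead avoids this.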
Hans Petter Selasky	a115fb62ed	Revert for r277213: FreeBSD developers need more time to review patches in the surrounding areas like the TCP stack which are using MPSAFE callouts to restore distribution of callouts on multiple CPUs. Bump the __FreeBSD_version instead of reverting it. Suggested by: kmacy, adrian, glebius and kib Differential Revision: https://reviews.freebsd.org/D1438	2015-01-22 11:12:42 +00:00
Mateusz Guzik	5e7cd3ec22	filedesc: avoid spurious copying of capabilities in fget_unlocked We obtain a stable copy and store it in local 'fde' variable. Storing another copy (based on aforementioned variable) does not serve any purpose. No functional changes.	2015-01-21 18:32:53 +00:00
Mateusz Guzik	f9051b0e02	filedesc: return 0 from badfo_close The only potential in-tree consumer (_fdrop) special-cased it and returns 0 on its own instead of calling badfo_close. Remove the special case since it is not needed and very unlikely to be encountered anyway. No objections from: kib	2015-01-21 18:05:42 +00:00
Mateusz Guzik	5751146497	filedesc: fix whitespace nits in fget and fget_read No functional changes.	2015-01-21 18:02:28 +00:00
Konstantin Belousov	fe63170115	Do not assert that the new pipepair mutex is not initialized. The backing memory contains garbage and might trigger the assertion. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-01-21 16:32:54 +00:00
Mateusz Guzik	c31c057957	filedesc: plug a test for impossible condition in _fget	2015-01-21 01:06:14 +00:00
Neel Natu	d1b1b60065	Update the vdso timehands only via tc_windup(). Prior to this change CLOCK_MONOTONIC could go backwards when the timecounter hardware was changed via 'sysctl kern.timecounter.hardware'. This happened because the vdso timehands update was missing the special treatment in tc_windup() when changing timecounters. Reviewed by: kib	2015-01-20 03:54:30 +00:00
Konstantin Belousov	3b50dff506	Stop enforcing additional reference on all cdevs, which was introduced in r277199. Acquire the necessary reference in delist_dev_locked() and inform destroy_devl() about it using CDP_UNREF_DTR flag. Fix some style nits, add asserts. Discussed with: hselasky Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-19 17:36:52 +00:00
Konstantin Belousov	677258f7e7	Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger attachment to the process. Note that the command is not intended to be a security measure, rather it is an obfuscation feature, implemented for parity with other operating systems. Discussed with: jilles, rwatson Man page fixes by: rwatson Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-18 15:13:11 +00:00
Konstantin Belousov	e3612a4c1f	Make SIGSTOP work for sleeps done while waiting for fifo readers or writers in open(2), when the fifo is located on an NFS mount. Reported by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-18 15:03:26 +00:00
Konstantin Belousov	271ab2406f	For sigaction(2), ignore possible garbage in sa_flags for sa_handler == SIG_DFL or SIG_IGN. Sloppy code does not fully initialize struct sigaction for such cases, and being too demanding in the case of default handler does not catch anything. Reported and tested by: Alex Tutubalin <lexa@lexa.ru> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-16 07:06:58 +00:00
Hans Petter Selasky	1a26c3c047	Major callout subsystem cleanup and rewrite: - Close a migration race where callout_reset() failed to set the CALLOUT_ACTIVE flag. - Callout callback functions are now allowed to be protected by spinlocks. - Switching the callout CPU number cannot always be done on a per-callout basis. See the updated timeout(9) manual page for more information. - The timeout(9) manual page has been updated to reflect how all the functions in the callout API work. The manual page has been made function-oriented to make it easier to deduce how each of the functions making up the callout API works without having to first read the whole manual page. All functions are grouped into a handful of sections, which should give a quick top-level overview of when the different functions should be used. - The CALLOUT_SHAREDLOCK flag and its functionality have been removed to reduce the complexity in the callout code and to avoid problems with atomically stopping callouts via callout_stop(). If someone needs it, it can be re-added. From my quick grep there are no CALLOUT_SHAREDLOCK clients in the kernel. - A new callout API function named "callout_drain_async()" has been added. See the updated timeout(9) manual page for a complete description. - Update the callout clients in the "kern/" folder to use the callout API properly, like cv_timedwait(). Previously there was some custom sleepqueue code in the callout subsystem, which has been removed, because we now allow callouts to be protected by spinlocks. This allows us to tear down the callout as is done with regular mutexes, and a "td_slpmutex" has been added to "struct thread" to atomically tear down the "td_slpcallout". Further, the "TDF_TIMOFAIL" and "SWT_SLEEPQTIMO" states can now be completely removed. Currently they are marked as available and will be cleaned up in a follow-up commit. - Bump the __FreeBSD_version to indicate kernel modules need recompilation. - There have been several reports that this patch "seems to squash a serious bug leading to a callout timeout and panic". Kernel build testing: all architectures were built MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D1438 Sponsored by: Mellanox Technologies Reviewed by: jhb, adrian, sbruno and emaste	2015-01-15 15:32:30 +00:00
Robert Watson	3d1a9ed34e	In order to support ongoing work to implement variable-size mbufs, and more generally make it easier to extend 'struct mbuf' in the future, make a number of changes to the data structure: - As we anticipate embedding mbuf headers within variable-size regions of memory in the future, change the definitions of byte arrays embedded in mbufs to be of size [0] rather than [MLEN] and [MHLEN]. In fact, the cxgbe driver already uses 'struct mbuf' on the front of other storage sizes, but we would like the global mbuf allocator to be able to do this as well. - Fold 'struct m_hdr' into 'struct mbuf' itself, eliminating a set of macros that aliased 'mh_foo' field names to 'm_foo' names such as 'm_next'. These present a particular problem as we would like to add new mbuf-header fields -- e.g., 'm_size' -- that, if similarly named via macros, would introduce collisions with many other variable names in the kernel. - Rename 'struct m_ext' to 'struct struct_m_ext' so that we can add compile-time assertions without bumping into the still-extant 'm_ext' macro. - Remove the MSIZE compile-time assertion for 'struct mbuf', but add new assertions for alignment of embedded data arrays (64-bit alignment even on 32-bit platforms), and for the sizes of the mbuf header, packet header, and m_ext structure. - Document that these assertions exist in comments in mbuf.h. This change is not intended to cause (non-trivial) behavioural differences, but is a precursor to further mbuf-allocator work. Differential Revision: https://reviews.freebsd.org/D1483 Reviewed by: bz, gnn, np, glebius ("go ahead, I trust you") Sponsored by: EMC / Isilon Storage Division	2015-01-14 23:44:00 +00:00
Hans Petter Selasky	d2955419cd	Avoid race with "dev_rel()" when using the recently added "delist_dev()" function. Make sure the character device structure doesn't go away until the end of the "destroy_dev()" function due to concurrently running cleanup code inside "devfs_populate()". MFC after: 1 week Reported by: dchagin@	2015-01-14 22:07:13 +00:00
Hans Petter Selasky	07dbde6777	Add a kernel function to delist our kernel character devices, so that the device name can be re-used right away in case we are destroying the character devices in the background. MFC after: 4 days Reported by: dchagin@	2015-01-14 14:04:29 +00:00
Jamie Gritton	6a3f277901	Remove the prison flags PR_IP4_DISABLE and PR_IP6_DISABLE, which have been write-only for as long as they've existed.	2015-01-14 04:50:28 +00:00
Jamie Gritton	0e5e396ede	Don't set prison's pr_ip4s or pr_ip6s to -1. PR: 196474 MFC after: 3 days	2015-01-14 03:52:41 +00:00
Konstantin Belousov	18cc2ff047	Revert r263475: TDP_DEVMEMIO no longer needed, since amd64 /dev/kmem does not access kernel mappings directly. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-12 08:58:07 +00:00
Ian Lepore	f8f02928d3	Fix an off by one in ppsratecheck(). If you asked for N=1 you'd get one, but for any N>1 you'd get N-1 packets/events per second.	2015-01-11 20:48:29 +00:00
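The intended semantics are "at most N events per second", with N=1 and N>1 behaving consistently. The following is a hypothetical per-second limiter that illustrates this. `struct pps_limiter` and `pps_allow()` are invented names, and this sketch is not the ppsratecheck() source. Treating a negative limit as "unlimited" is an assumption made here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: the second being counted and events seen in it. */
struct pps_limiter {
	long	sec;
	int	count;
};

/*
 * Return true if another event may pass in second `now_sec`.
 * Exactly `maxpps` events pass per second; maxpps < 0 means unlimited
 * (an assumption for this sketch).
 */
static bool
pps_allow(struct pps_limiter *l, long now_sec, int maxpps)
{
	if (maxpps < 0)
		return (true);
	if (now_sec != l->sec) {
		l->sec = now_sec;	/* a new second: restart the count */
		l->count = 0;
	}
	if (l->count >= maxpps)
		return (false);		/* this second's budget is used up */
	l->count++;
	return (true);
}
```

The comparison is made before incrementing, and the counter never exceeds `maxpps`. As a result, N=1 admits one event and N=3 admits three, with no special case.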
Robert Watson	a5e9a6d56f	Garbage collect m_copymdata(), an mbuf utility routine introduced in FreeBSD 7 that has not been used since. It contains a number of unresolved bugs including an inverted bcopy() and incorrect handling of read-only mbufs using internal storage. Removing this unused code is substantially easier than fixing it in order to update it to the coming mbuf world order -- but it can always be restored from revision history if it turns out to prove useful for future work. Pointed out by: jmallett Sponsored by: EMC / Isilon Storage Division	2015-01-10 10:41:23 +00:00
Dmitry Chagin	47ce3e5326	Allow clock_getcpuclockid() on the CPU-time clock for zombie process. Posix does not prohibit this. Differential Revision: https://reviews.freebsd.org/D1470 Reviewed by: kib MFC after: 1 week	2015-01-10 07:22:38 +00:00
Xin LI	6e19f0def0	Improve style and fix a possible use-after-free case introduced in r268384 by reinitializing the 'freestate' pointer after freeing the memory. Obtained from: HardenedBSD (71fab80c5dd3034b71a29a61064625018671bbeb) PR: 194525 Submitted by: Oliver Pinter <oliver.pinter@hardenedbsd.org> MFC after: 2 weeks	2015-01-10 06:48:35 +00:00
Robert Watson	3df42f654e	Remove a 'This is dumb' comment that has been incorrect for at least a decade: m_pulldown() is willing to consider ordinary mbufs writable. Retain another, related, and also outdated comment, but with a caveat that it is partially stale. Do not, for now, address the problem that it raises (that only EXT_CLUSTER external storage is considered writable, regardless of the results of M_WRITABLE() on the mbuf). MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2015-01-09 12:08:51 +00:00
John Baldwin	7a635f0d70	Change the default method for device_quiesce() to return 0 instead of EOPNOTSUPP. The current behavior can mask real quiesce errors since devclass_quiesce_driver() stops iterating over drivers as soon as it gets an error (including EOPNOTSUPP), but the caller it returns the error to explicitly ignores EOPNOTSUPP. Reviewed by: imp	2015-01-08 21:46:28 +00:00
John Baldwin	bbf686ed60	Reject attempts to read the cpuset mask of a negative domain ID.	2015-01-08 19:11:14 +00:00
John Baldwin	c0ae66888b	Create a cpuset mask for each NUMA domain that is available in the kernel via the global cpuset_domain[] array. To export these to userland, add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a specific domain. Add a -d flag to cpuset(1) that can be used to fetch the mask for a given domain. Differential Revision: https://reviews.freebsd.org/D1232 Submitted by: jeff (kernel bits) Reviewed by: adrian, jeff	2015-01-08 15:53:13 +00:00
Robert Watson	b66f2a48e6	Replace hand-crafted versions of M_SIZE() and M_START() in uipc_mbuf.c with calls to the centralised macros, reducing direct use of MLEN and MHLEN. Differential Revision: https://reviews.freebsd.org/D1444 Reviewed by: bz Sponsored by: EMC / Isilon Storage Division	2015-01-08 11:16:21 +00:00
Mark Johnston	bdb9ab0dd9	Factor out duplicated code from dumpsys() on each architecture into generic code in sys/kern/kern_dump.c. Most dumpsys() implementations are nearly identical and simply redefine a number of constants and helper subroutines; a generic implementation will make it easier to implement features around kernel core dumps. This change does not alter any minidump code and should have no functional impact. PR: 193873 Differential Revision: https://reviews.freebsd.org/D904 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Reviewed by: jhibbits (earlier version) Sponsored by: EMC / Isilon Storage Division	2015-01-07 01:01:39 +00:00
Mark Johnston	bbd685e3a5	Use crcopysafe(9) to make a copy of a process' credential struct. crcopy(9) may perform a blocking memory allocation, which is unsafe when holding a mutex. Differential Revision: https://reviews.freebsd.org/D1443 Reviewed by: rwatson MFC after: 1 week Sponsored by: EMC / Isilon Storage Division	2015-01-05 23:07:22 +00:00
John Baldwin	531d65e139	Trim trailing whitespace.	2015-01-05 20:50:44 +00:00
John Baldwin	92597e064b	On some Intel CPUs with a P-state but not C-state invariant TSC the TSC may also halt in C2 and not just C3 (it seems that in some cases the BIOS advertises its C3 state as a C2 state in _CST). Just play it safe and disable both C2 and C3 states if a user forces the use of the TSC as the timecounter on such CPUs. PR: 192316 Differential Revision: https://reviews.freebsd.org/D1441 No objection from: jkim MFC after: 1 week	2015-01-05 20:44:44 +00:00
Robert Watson	ed6a66ca6c	To ease changes to underlying mbuf structure and the mbuf allocator, reduce the knowledge of mbuf layout, and in particular constants such as M_EXT, MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN()) in a single M_ALIGN() macro, implemented by a now-inlined m_align() function: - Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline. - Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align(). - Update consumers around the tree to simply use M_ALIGN(). This change eliminates a number of cases where mbuf consumers must be aware of whether or not mbufs returned by the allocator use external storage, and removes assumptions about the size of the returned mbuf. This will make it easier to introduce changes in how we use external storage, as well as features such as variable-size mbufs. Differential Revision: https://reviews.freebsd.org/D1436 Reviewed by: glebius, trasz, gnn, bz Sponsored by: EMC / Isilon Storage Division	2015-01-05 09:58:32 +00:00
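As documented in mbuf(9), M_ALIGN() places an object of a given length at the end of the mbuf's data area, with the object's start aligned to a long word. The following sketch shows only that offset arithmetic. `tail_aligned_offset()` is a hypothetical helper, not the m_align() implementation, and it assumes the data area itself starts on a long-word boundary.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: the offset at which an object of `len` bytes should
 * start so that it ends as close as possible to the end of a `bufsize`-byte
 * data area, with the start rounded down to a multiple of sizeof(long).
 * Requires len <= bufsize.
 */
static size_t
tail_aligned_offset(size_t bufsize, size_t len)
{
	assert(len <= bufsize);
	return ((bufsize - len) & ~(sizeof(long) - 1));
}
```

With a single helper, the caller supplies only the object length and the buffer size. Under this change, the kernel's M_ALIGN() obtains the correct data-area size itself, so consumers no longer need to know whether the mbuf uses MLEN, MHLEN or external storage.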
Justin T. Gibbs	5b326a32ce	Prevent live-lock and access of destroyed data in taskqueue_drain_all(). Phabric: https://reviews.freebsd.org/D1247 Reviewed by: jhb, avg Sponsored by: Spectra Logic Corporation sys/kern/subr_taskqueue.c: Modify taskqueue_drain_all() processing to use a temporary "barrier task", rather than rely on a user task that may be destroyed during taskqueue_drain_all()'s execution. The barrier task is queued behind all previously queued tasks and then has its priority elevated so that future tasks cannot pass it in the queue. Use a similar barrier scheme to drain threads processing current tasks. This requires taskqueue_run_locked() to insert and remove the taskqueue_busy object for the running thread for every task processed. share/man/man9/taskqueue.9: Remove warning about live-lock issues with taskqueue_drain_all() and indicate that it does not wait for tasks queued after it begins processing.	2015-01-04 19:55:44 +00:00
Dmitry Chagin	1beb1a8e13	Regen for r276654 (__getcwd()).	2015-01-04 10:40:23 +00:00
Dmitry Chagin	9f7a06f27e	Indeed, instead of hiding the kern___getcwd() bug by bogus cast in r276564, change path type to char * (pathnames are always char *). And remove bogus casts of malloc(). kern___getcwd() internally doesn't actually use or support u_char paths, except to copy them to a normal char * path. These changes are not visible to libc as libc/gen/getcwd.c misdeclares __getcwd() as taking a plain char * path. While here remove _SYS_SYSPROTO_H_ for __getcwd() syscall as we always have sysproto.h. Pointed out by: bde MFC after: 1 week	2015-01-04 10:34:02 +00:00
Hans Petter Selasky	04a8159ddf	Rework r276532 a bit. Always avoid recursing into the console driver clients, since they might not handle it very well. This change allows debugging mutex problems with kernel console drivers when "debug.witness.skipspin=0" is set in the boot environment. MFC after: 1 week	2015-01-03 17:21:19 +00:00
Hans Petter Selasky	2029b6c9e6	The "cnputs_mtx" mutex must be allowed to recurse. Debug prints and/or witness printouts in the console driver clients can cause this mutex to recurse by calls to "printf()" from witness for example. In particular this can happen if "debug.witness.skipspin=0" is set in the boot environment. MFC after: 1 week	2015-01-02 13:10:33 +00:00
Mateusz Guzik	af77c1a620	Convert vfs hash lock from a mutex to an rwlock.	2014-12-30 21:40:45 +00:00
Warner Losh	12d7eaa009	Turns out, this isn't only called from i386...	2014-12-30 02:39:47 +00:00
Mateusz Guzik	84267cacf4	sysctl: don't modify oid_running for static nodes It is necessary to prevent nodes from being destroyed while used, but static ones cannot be destroyed.	2014-12-28 19:24:01 +00:00
Rick Macklem	46b34e5adc	Fix the comment introduced in r276192 so that it clearly states that the change is needed to avoid a deadlock. Suggested by: kib MFC after: 1 week	2014-12-25 14:44:04 +00:00
Rick Macklem	65cb225c5e	Modify vop_stdadvlock{async}() so that it only locks/unlocks the vnode and does a VOP_GETATTR() for the SEEK_END case. This is safe to do, since lf_advlock{async}() only uses the size argument for the SEEK_END case. The NFSv4 server needs this when vfs.nfsd.enable_locallocks!=0 since locking the vnode results in a LOR that can cause a deadlock for the nfsd threads. Reviewed by: kib MFC after: 1 week	2014-12-24 22:58:08 +00:00
Gleb Smirnoff	53b680caa2	In sbappend*() family of functions clear M_PROTO flags of incoming mbufs. sbappendstream() already does this in m_demote(). PR: 196174 Sponsored by: Nginx, Inc.	2014-12-22 15:39:24 +00:00
Konstantin Belousov	8ee9765a9d	Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that the created file name be cached. Use the flag for core dumps. Requested by: rpaulo Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-21 13:32:07 +00:00
Warner Losh	61f26cae7d	Where appropriate, use the modern terms for the one true time base (UTC) rather than the archaic (GMT) in comments. Except where the comments are making fun of people doing this (and pedants who insist on the new terms).	2014-12-21 05:07:11 +00:00
Gleb Smirnoff	e834a84026	Revert r274494, r274712, r275955 and provide extra comments explaining why there could appear a zero-sized mbufs in socket buffers. A proper fix would be to divorce record socket buffers and stream socket buffers, and divorce pru_send that accepts normal data from pru_send that accepts control data.	2014-12-20 22:12:04 +00:00
Gleb Smirnoff	b7413232f6	Add to sbappendstream_locked() a check against NULL mbuf, like it is done in sbappend_locked() and sbappendrecord_locked(). This is a quick fix to the panic introduced by r274712. A proper solution should be to make sosend_generic() avoid calling pru_send() with NULL mbuf for the protocols that do not understand control messages. Those protocols that understand control messages, should be able to receive NULL mbuf, if control is non-NULL.	2014-12-20 14:19:46 +00:00
Konstantin Belousov	6c21f6edb8	The VOP_LOOKUP() implementations for CREATE op do not put the name into namecache, to avoid cache thrashing when doing large operations. E.g., tar archive extraction is not usually followed by access to many of the files created. Right now, each VOP_LOOKUP() implementation explicitly knows about this quirk and tests for both MAKEENTRY flag presence and op != CREATE to make the call to cache_enter(). Centralize the handling of the quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP. VFS now sets NOCACHE flag for CREATE namei() calls. Note that the change in semantic is backward-compatible and could be merged to the stable branch, and is compatible with non-changed third-party filesystems which correctly handle MAKEENTRY. Suggested by: Chris Torek <torek@pi-coral.com> Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-18 10:01:12 +00:00
Gleb Kurtsou	dde58752db	Adjust printf format specifiers for dev_t and ino_t in kernel. ino_t and dev_t are about to become uint64_t. Reviewed by: kib, mckusick	2014-12-17 07:27:19 +00:00
Konstantin Belousov	5b73811feb	Add missed break. CID: 1258587 Sponsored by: The FreeBSD Foundation MFC after: 20 days	2014-12-16 09:49:07 +00:00
Konstantin Belousov	917dd39084	Add missed break. CID: 1258586 Sponsored by: The FreeBSD Foundation MFC after: 4 days	2014-12-16 09:48:23 +00:00
John Baldwin	5ad25ceb41	Check for SS_NBIO in so->so_state instead of sb->sb_flags in soreceive_stream(). Differential Revision: https://reviews.freebsd.org/D1299 Reviewed by: bz, gnn MFC after: 1 week	2014-12-15 17:52:08 +00:00
Konstantin Belousov	237623b028	Add a facility for non-init process to declare itself the reaper of the orphaned descendants. Base of the API is modelled after the same feature from the DragonFlyBSD. Requested by: bapt Reviewed by: jilles (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-12-15 12:01:42 +00:00
Konstantin Belousov	83d7eb544c	Fix gcc build. Sponsored by: The FreeBSD Foundation MFC after: 13 days	2014-12-14 08:43:13 +00:00
Dmitry Chagin	fd07ddcf6f	Add _NEW flag to mtx(9), sx(9), rmlock(9) and rwlock(9). A _NEW flag passed to _init_flags() to avoid check for double-init. Differential Revision: https://reviews.freebsd.org/D1208 Reviewed by: jhb, wblock MFC after: 1 Month	2014-12-13 21:00:10 +00:00
Konstantin Belousov	6ddcc23386	Add facility to stop all userspace processes. The supposed use of the feature is to quiesce the system before suspend. Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) the caller to operate on another process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume. Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added. In collaboration with: pho Discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-13 16:18:29 +00:00
Konstantin Belousov	0061ddb3ed	Only sleep interruptible while waiting for suspension end when filesystem specified VFCF_SBDRY flag, i.e. for NFS. There are two issues with the sleeps. First, applications may get unexpected EINTR from the disk i/o syscalls. Second, interruptible sleep allows the stop of the process, and since the mount point is referenced while the thread sleeps, unmount cannot free the mount point structure's memory, blocking unmount indefinitely. Even for NFS, it is probably only reasonable to enable PCATCH for intr mounts, but this information is currently not available at VFS level. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-13 16:07:01 +00:00
Konstantin Belousov	ea117d1735	The vinactive() call in vgonel() may start writes for the dirty pages, creating delayed write buffers belonging to the reclaimed vnode. Put the buffer cleanup code after inactivation. Add asserts that ensure that buffer queues are empty and add BO_DEAD flag for bufobj to check that no buffers are added after the cleanup. BO_DEAD is only used by INVARIANTS-enabled kernels. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-13 16:02:37 +00:00
Konstantin Belousov	fe21241ee0	For architectures where time_t is wide enough, in particular, 64bit platforms, avoid overflow after year 2038 in clock_ct_to_ts(). PR: 195868 Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-12 09:37:18 +00:00
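The 2038 limit arises because a signed 32-bit count of seconds since 1970 tops out at 2038-01-19 03:14:07 UTC. The following sketch converts a UTC civil date to epoch seconds entirely in 64-bit arithmetic. It is not the clock_ct_to_ts() code. `days_from_civil()` follows a well-known proleptic Gregorian algorithm published by Howard Hinnant, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Days since 1970-01-01 for a proleptic Gregorian date (m in 1..12). */
static int64_t
days_from_civil(int64_t y, int m, int d)
{
	y -= (m <= 2);				/* count Jan/Feb in the previous year */
	int64_t era = (y >= 0 ? y : y - 399) / 400;
	int64_t yoe = y - era * 400;		/* year of era, [0, 399] */
	int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;	/* [0, 365] */
	int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;		/* [0, 146096] */

	return (era * 146097 + doe - 719468);
}

/* Seconds since the epoch; every intermediate value is 64-bit. */
static int64_t
utc_to_epoch(int64_t y, int mon, int d, int h, int min, int s)
{
	return (days_from_civil(y, mon, d) * 86400 + h * 3600 + min * 60 + s);
}
```

One second after 2038-01-19 03:14:07 the result is INT32_MAX + 1. A 32-bit time_t cannot represent that value, but it poses no problem where time_t is 64 bits wide.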
Konstantin Belousov	b2344ab5ff	Do not call VFS_SYNC() before VFS_UNMOUNT() for forced unmount. Since VFS does not/cannot stop writes, sync might run indefinitely, or be a wrong thing to do at all. E. g. NFS ignores VFS_SYNC() for forced unmounts, since non-responding server does not allow sync to finish. On the other hand, filesystems can and do stop writes using fs-specific facilities, and should already fully flush caches in VFS_UNMOUNT() due to the race. Adjust msdosfs to sync in unmount for forced call, to accommodate the new behaviour. Note that it is still racy, since writes are not stopped. Discussed with: avg, bjk, mckusick Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-12-09 10:00:47 +00:00
Konstantin Belousov	a77c72f5ae	Apply chunk forgotten in r275620. Remove local variable for real. CID: 1257462 Sponsored by: The FreeBSD Foundation	2014-12-09 09:36:28 +00:00
Konstantin Belousov	a25100c539	Add functions syncer_suspend() and syncer_resume(), which are supposed to be called before suspension and after resume, respectively. The syncer_suspend() ensures that all filesystems' dirty data and metadata are saved to permanent storage, and stops kernel threads which might modify filesystems. The syncer_resume() restores stopped threads. For now, only syncer is stopped. This is needed, because each sync loop causes superblock updates for UFS. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:48:57 +00:00
Konstantin Belousov	904ed548bb	When getnewbuf_reuse_bp() is called to reclaim some (clean) buffer, the vnode owning the buffer is not locked. Moreover, it cannot be locked safely, since getnewbuf_reuse_bp() is called from newbuf(), and some other vnode is already locked, for which reused buffer will be reassigned. As a consequence, reclamation of the owning vnode could go in parallel, in particular, the call to vnode_destroy_vobject(), which deallocates the vm object and zeroes the v_bufobj->bo_object. Note that the pages wired by the buffer are left wired and can be safely freed by the vfs_vmio_release() without the need for the vm object lock. Also, seeing stale pointer to the v_object is safe due to vm object type stability. Check for bo_object != NULL and cache the value in local variable to avoid trying to lock NULL vm object. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:42:34 +00:00
Konstantin Belousov	07a9368a48	Do some refactoring and minor cleanups of the thread_single() code in preparation for the global stop commit. Move the code to weed suspended or sleeping threads into the appropriate state, into the helper weed_inhib(). The current code already has deep nesting and is hard to follow [1]. Add currently useless helper remain_for_mode(), which returns the count of threads which are allowed to run, according to the single-threading mode. In thread_single_end(), do not save curthread into local variable, it is unused after, except to find curproc. Remove stray empty line. Requested by: avg [1] Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:27:43 +00:00
Konstantin Belousov	8638fe7bea	A thread waiting for the vfork(2)-ed child to exec or exit must allow for suspension. Currently, the loop performs uninterruptible cv_wait(9) call, which prevents suspension until child allows further execution of parent. If child is stopped, suspension or single-threading is delayed indefinitely. Create a helper thread_suspend_check_needed() to identify the need for a call to thread_suspend_check(). It is required since call to the thread_suspend_check() cannot be safely done while owning the child (p2) process lock. Only when suspension is needed, drop p2 lock and call thread_suspend_check(). Perform wait for cv with timeout, in case suspend is requested after wait started; I do not see a better way to interrupt the wait. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:18:05 +00:00
Konstantin Belousov	aba1ca528e	When process is exiting, check for suspension regardless of multithreaded status of the process. The stopped state must be cleared before P_WEXIT is set. A stop signal delivered just before first PROC_LOCK() block in exit1(9) would put the process into pending stop with P_WEXIT set or assertion triggered. Also recheck for the suspension after failed thread_single(9) call, since process lock could be dropped. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:02:02 +00:00
Andriy Gapon	036a8c5dac	remove opensolaris cyclic code, replace with high-precision callouts In the old days callout(9) had 1 tick precision and that was inadequate for some uses, e.g. DTrace profile module, so we had to emulate cyclic API and behavior. Now we can directly use callout(9) in the very few places where cyclic was used. Differential Revision: https://reviews.freebsd.org/D1161 Reviewed by: gnn, jhb, markj MFC after: 2 weeks	2014-12-07 11:21:41 +00:00
Warner Losh	d0b6da086f	Const poison in a few places to ensure we don't modify things through the module data pointer.	2014-12-03 22:14:13 +00:00
John Baldwin	b10c08a52b	Revert device_getenv_int() for now as it duplicates resource_int_value(). We should perhaps implement a device_getenv_*() and device_setenv_*() API as a convenience wrapper on top of resource_*_value() and resource_set_*().	2014-12-03 15:29:53 +00:00
Konstantin Belousov	6afb32fc67	Disable recursion for the process spinlock. Tested by: pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 month	2014-12-01 17:36:10 +00:00
Justin T. Gibbs	2c6bf3d90b	Remove trailing whitespace.	2014-11-30 19:32:00 +00:00
Gleb Smirnoff	c80ea19b38	Merge from projects/sendfile: Provide pru_ready for AF_LOCAL sockets. Local sockets send data directly to the receive buffer of the peer, thus pru_ready also works on the peer socket. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 13:40:58 +00:00
Gleb Smirnoff	651e4e6a30	Merge from projects/sendfile: extend protocols API to support sending not ready data: o Add new flag to pru_send() flags - PRUS_NOTREADY. o Add new protocol method pru_ready(). Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-30 13:24:21 +00:00
Gleb Smirnoff	0f9d0a73a4	Merge from projects/sendfile: o Introduce a notion of "not ready" mbufs in socket buffers. These mbufs are now being populated by some I/O in background and are referenced outside. This has the following implications: - An mbuf which is "not ready" can't be taken out of the buffer. - Neither can an mbuf that is behind a "not ready" one in the queue. - If a socket buffer is flushed, then "not ready" mbufs shouldn't be freed. o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc. The sb_ccc stands for "claimed character count", or "committed character count". And the sb_acc is "available character count". Consumers of socket buffer API should no longer access them directly, but should use sbused() and sbavail() respectively. o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones with M_BLOCKED. o New field sb_fnrdy points to the first not ready mbuf, to avoid linear search. o New function sbready() is provided to activate certain amount of mbufs in a socket buffer. A special note on SCTP: SCTP has its own sockbufs. Unfortunately, FreeBSD stack doesn't yet allow protocol specific sockbufs. Thus, SCTP does some hacks to make itself compatible with FreeBSD: it manages sockbufs on its own, but keeps sb_cc updated to inform the stack of amount of data in them. The new notion of "not ready" data isn't supported by SCTP. Instead, only a mechanical substitute is done: s/sb_cc/sb_ccc/. A proper solution would be to take away struct sockbuf from struct socket and allow protocols to implement their own socket buffers, like SCTP already does. This was discussed with rrs@. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 12:52:33 +00:00
Gleb Smirnoff	57f43a45a3	- Move sbcheck() declaration under SOCKBUF_DEBUG. - Improve SOCKBUF_DEBUG macros. - Improve sbcheck(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 11:22:39 +00:00
Gleb Smirnoff	8967b220a3	Make sballoc() and sbfree() functions. Ideally, they could be marked as static, but unfortunately Infiniband (ab)uses them. Sponsored by: Nginx, Inc.	2014-11-30 11:02:07 +00:00
Warner Losh	fac92ae126	The current limit of 100k for the linker hints file is getting a bit crowded as we now are at about 70k. Bump the limit to 1MB instead which is still quite a reasonable limit and allows for future growth of this file and possible future expansion to additional data. MFC After: 2 weeks	2014-11-29 17:29:30 +00:00
Konstantin Belousov	6762091ea4	Remove lock recursion for the pipe pair mutex, and disable the recursion on mutex initialization. The only places where the recursive acquire is performed are read and write filters, since knlist, which uses the pipe pair mutex as lock, is locked when filter is called. The recursion was added in r93296, and consistent locking for kn_fop->f_event() introduced in r133741. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 month	2014-11-29 17:18:20 +00:00
Konstantin Belousov	70778bba03	Assert the state of the process lock and sigact mutex in kern_sigprocmask() and reschedule_signals(). Discussed with: rea Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-28 10:20:00 +00:00
Hans Petter Selasky	50ae6690fc	Style changes: - Move two IOCTL related defines to the top of the C-file - Add more comments describing the recently added IOCTL small size and small align macros	2014-11-28 09:32:07 +00:00
Alfred Perlstein	56c14bca7e	Make igb and ixgbe check tunables at probe time. This allows one to make a kernel module to tune the number of queues before the driver loads. This is needed so that a module at SI_SUB_CPU can set tunables for these drivers to take. Otherwise getenv is called too early by the TUNABLE macros. Reviewed by: smh Phabric: https://reviews.freebsd.org/D1149	2014-11-26 20:19:36 +00:00
Konstantin Belousov	5c7bebf961	The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). The split is done mostly for code clarity, and should not affect scalability. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-26 14:10:00 +00:00
Konstantin Belousov	e442f29f08	Fix SA_SIGINFO | SA_RESETHAND handling. The sysent's sv_sendsig() method needs pre-reset state of the ps_siginfo to correctly construct signal frame. Move sigdflt() call after the sv_sendsig() invocation in postsig(). Simultaneously extract common code from trapsignal() and postsig() into new helper postsig_done(). Submitted by: rea MFC after: 1 week	2014-11-26 14:09:04 +00:00
John Baldwin	a2d751936b	Add a bus_get_domain() wrapper around BUS_GET_DOMAIN(). Use this to add a new per-device '%domain' sysctl node that returns the NUMA domain a device is associated with if it is associated with one. Note that this API is still a WIP and might change before 11.0 actually ships. Differential Revision: https://reviews.freebsd.org/D930 Reviewed by: kib, adrian	2014-11-24 19:55:45 +00:00
John Baldwin	20abb66ede	Properly initialize the capability rights for vnodes exported to procstat that aren't for file descriptors (cwd, jdir, tracevp, etc.). Submitted by: Mikhail <mp@lenta.ru>	2014-11-24 18:34:11 +00:00
Gleb Smirnoff	90effb2341	Merge from projects/sendfile: o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but doesn't sleep. It returns immediately, and will execute the I/O done handler function that must be supplied as argument. o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager. o Extend pagertab to support pgo_getpages_async method, and implement this method for vnode_pager. Reviewed by: kib Tested by: pho Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-23 12:01:52 +00:00
Mateusz Guzik	dff9862c0e	ifdef RACCT ui_racct_foreach and struct uidinfo's ui_racct Change racct_ create and destroy to macros evaluating to nothing without RACCT so that their callers passing ui_racct don't have to be ifdefed.	2014-11-23 08:25:44 +00:00
Mateusz Guzik	0c0d16e8ac	filedesc: plug a test for impossible condition in fgetvp_rights	2014-11-23 00:12:27 +00:00
Konstantin Belousov	64779280c9	The size value should be asserted when it is known. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2014-11-22 18:15:02 +00:00
John Baldwin	180e57e5c7	Improve support for XSAVE with debuggers. - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux. - Teach readelf() to recognize NT_X86_XSTATE notes. - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state. - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0). Differential Revision: https://reviews.freebsd.org/D1193 Reviewed by: kib MFC after: 2 weeks	2014-11-21 20:53:17 +00:00
Gleb Smirnoff	67af272bcf	Do not allocate zero-length mbuf in sosend_generic(). Found by: pho Sponsored by: Nginx, Inc.	2014-11-19 14:27:38 +00:00
Zbigniew Bodek	dc61566f95	Stop using early_putc immediately after configuring console with cninit() Early UART should be released right after system console initialization is completed. Otherwise, after cninit() both early and system console coexist, which may lead to various issues (i.a. writing to unmapped early UART address). This cannot be done in cninit_finish() since it can be called late at the end of MI configuration. Obtained from: Semihalf Reviewed by: andrew Sponsored by: The FreeBSD Foundation	2014-11-19 14:23:29 +00:00
Warner Losh	40e6bdaf1e	opt_global.h is included automatically in the build. No need to explicitly include it in these places. Sponsored by: Netflix	2014-11-18 17:06:56 +00:00
John-Mark Gurney	2c30bc1fcf	prevent doing filter ops locking for statically compiled filter ops... This significantly reduces lock contention when adding/removing knotes on busy multi-kq system... Next step is to cache these references per kq.. i.e. kq refs it once and keeps a local ref count so that the same refs don't get accessed by many cpus... only allocate a knote when we might use it... Add a new flag, _FORCEONESHOT.. This allows a thread to force the delivery of another event in a safe manner, say waking up an idle http connection to force it to be reaped... If we are _DISABLE'ing a knote, don't bother to call f_event on it, it's disabled, so won't be delivered anyways.. Tested by: adrian	2014-11-16 01:18:41 +00:00
Gleb Smirnoff	8146bcfea1	- Use NULL to compare a pointer. - Use KASSERT() instead of panic. - Remove useless 'continue', no need to restart cycle here. Sponsored by: Nginx, Inc.	2014-11-14 15:44:19 +00:00
Gleb Smirnoff	6bf6b25e88	Merge from projects/sendfile: Use sbcut_locked() instead of manually editing a sockbuf. Sponsored by: Nginx, Inc.	2014-11-14 15:33:40 +00:00
Konstantin Belousov	5fab60a071	In vfs_write_suspend_umnt(), if suspension cannot be established, do not forget to restore write ops count when returning the error. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-14 11:31:10 +00:00
Gleb Smirnoff	f274e25659	There should not be zero length mbufs in socket buffers. The code comes from r1451, and thus can't be explained. A patch with explicit panic() here survived all tests. Tested by: pho Sponsored by: Nginx, Inc.	2014-11-14 06:02:29 +00:00
Jung-uk Kim	db1ec81edd	Correct a typo to fix chown(2). It was broken since r274476. Pointy hat to: kib X-MFC-With: r274476	2014-11-13 23:51:13 +00:00
Mateusz Guzik	eb48fbd963	filedesc: fixup fdinit to lock fdp and prepare files conditionally Not all consumers providing fdp to copy from want files. Perhaps these functions should be reorganized to better express the outcome. This fixes up panics after r273895. Reported by: markj	2014-11-13 21:15:09 +00:00
Konstantin Belousov	416be7a1c6	Fix assertion, &uc->uc_busy is never zero, the intent is to test the uc_busy value, and not its address [1]. Remove the single use of the macro, write KASSERT() explicitly in the code of umtxq_sleep_pi(). Submitted by: Eric van Gyzen <eric@vangyzen.net> [1] MFC after: 1 week	2014-11-13 18:51:09 +00:00
Konstantin Belousov	6e646651d3	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 18:01:51 +00:00
Konstantin Belousov	e64b4fa858	Do not try to dereference thread pointer when the value is not a pointer. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 17:44:35 +00:00
Konstantin Belousov	f2c1a52afb	Remove fossil. It has been present in 4.4Lite2, but its use was removed for some time. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 17:43:37 +00:00
Dmitry Chagin	c28d9d0f9f	Regen for r274462.	2014-11-13 05:28:06 +00:00
Dmitry Chagin	186d9c3473	Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change. Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month	2014-11-13 05:26:14 +00:00
Konstantin Belousov	389a25c716	For posix_fallocate(2) and posix_fadvise(2), return ESPIPE when underlying file does not have DFLAG_SEEKABLE set [1]. For posix_fallocate(2), simplify error handling logic. Do return when fp is not yet referenced. Noted by: bde [1] Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-12 17:31:38 +00:00
Gleb Smirnoff	2b21d0e883	Merge from projects/sendfile: - Use KASSERT()s instead of panic(). - Use sbavail() instead of sb_cc. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-12 10:17:46 +00:00
Gleb Smirnoff	cfa6009e36	In preparation of merging projects/sendfile, transform bare access to sb_cc member of struct sockbuf to a couple of inline functions: sbavail() and sbused() Right now they are equal, but once the notion of "not ready" socket buffer data is checked in, they are going to be different. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-12 09:57:15 +00:00
Gleb Smirnoff	efe28398f5	Fix build.	2014-11-11 22:08:18 +00:00
Gleb Smirnoff	0e87b36eaa	Remove SF_KQUEUE code. This code was developed at Netflix, but was not ever used. It didn't go into stable/10, nor was it documented. It might be useful, but we collectively decided to remove it, rather than leave it abandoned and unmaintained. It is removed in one single commit, so restoring it should be easy, if anyone wants to reopen this idea. Sponsored by: Netflix	2014-11-11 20:32:46 +00:00
Pawel Jakub Dawidek	5ebb15b942	Add missing privilege check when setting the dump device. Before that change it was possible for a regular user to setup the dump device if he had write access to the given device. In theory it is a security issue as user might get access to kernel's memory after provoking kernel crash, but in practise it is not recommended to give regular users direct access to storage devices. Rework the code so that we do privileges check within the set_dumper() function to avoid similar problems in the future. Discussed with: secteam	2014-11-11 04:48:09 +00:00
Konstantin Belousov	0436fcb809	When sleeping waiting for the profiling stop, always set P_STOPPROF before dropping process lock. Clear P_STOPPROF when doing wakeup. Both issues caused thread to hang in stopprofclock() "stopprof" sleep. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-10 14:11:17 +00:00
Alexander V. Chernikov	5e11eb847e	Finish r274118#2: commit forgotten uipc_debug.c	2014-11-06 15:17:04 +00:00
Bjoern A. Zeeb	763f2e7844	After the changes in r274118 make NOIP kernels compile by hiding an otherwise unused variable declaration behind INET6 || INET. MFC after: 27 days X-MFS with: r274118	2014-11-06 12:19:39 +00:00
Mateusz Guzik	bfda9935bd	Add sysctl kern.proc.cwd It returns only current working directory of given process which saves a lot of overhead over kern.proc.filedesc if given proc has a lot of open fds. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified) X-Additional: JuniorJobs project	2014-11-06 08:12:34 +00:00
Mateusz Guzik	3ae366de58	filedesc: avoid taking fdesc_mtx when not necessary in fddrop No functional changes.	2014-11-06 07:44:10 +00:00
Mateusz Guzik	eb6021fb96	filedesc: just free old tables without altering the list which is freed anyway No functional changes.	2014-11-06 07:37:31 +00:00
Mateusz Guzik	a99500a912	Extend struct ucred with group table. This saves one malloc + free with typical cases and better utilizes memory. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified) X-Additional: JuniorJobs project	2014-11-05 02:08:37 +00:00
Alexander V. Chernikov	9f25cbe45e	Remove old hack abusing domattach from NFS code. According to IANA RPC uaddr registry, there are no AFs except IPv4 and IPv6, so it's not worth being too abstract here. Remove ne_rtable[AF_MAX+1] and use explicit per-AF radix tries. Use own initialization without relying on domattach code. While I admit that this was one of the rare places in kernel networking code which really was capable of doing multi-AF without any AF-depended code, it is not possible anymore to rely on dom* code. While here, change terrifying "Invalid radix node head, rn:" message, to different non-understandable "netcred already exists for given addr/mask", but less terrifying. Since we know that rn_addaddr() returns NULL if the same record already exists, we should provide more friendly error. MFC after: 1 month	2014-11-05 00:58:01 +00:00
Dag-Erling Smørgrav	bccb6d5aa1	[SA-14:25] Fix kernel stack disclosure in setlogin(2) / getlogin(2). [SA-14:26] Fix remote command execution in ftp(1). Approved by: so (des)	2014-11-04 23:29:29 +00:00
John Baldwin	2cba8dd301	Add a new thread state "spinning" to schedgraph and add tracepoints at the start and stop of spinning waits in lock primitives.	2014-11-04 16:35:56 +00:00
Hans Petter Selasky	0ecd606b24	Simplify logic a bit. Ensure data buffer is properly aligned, especially for platforms where unaligned access is not allowed. Make it possible to override the small buffer size. A simple continuous read string test using libusb showed a reduction in CPU usage from roughly 10% to less than 1% using a dual-core GHz CPU, when the malloc() operation was skipped for small buffers. MFC after: 2 weeks	2014-11-04 11:29:49 +00:00
Jean-Sébastien Pédron	2d6f6d6373	Enable vt(4) by default vt(4) is a new console driver which brings features such as: o Support for Unicode and double-width characters o Integration with the KMS kernel video drivers o Support for UEFI You may need to update your console settings in /etc/rc.conf, most probably the keymap. During boot, /etc/rc.d/syscons will indicate what you need to do. vt(4) still has issues and lacks some features compared to syscons(4). See the wiki for up-to-date information: https://wiki.freebsd.org/Newcons If you want to keep using syscons(4), you can do so by adding the following line to /boot/loader.conf: kern.vty=sc Differential Revision: https://reviews.freebsd.org/D1005 Discussed with: emaste@, nwhitehorn@, ray@ Relnotes: yes	2014-11-04 10:18:03 +00:00
Konstantin Belousov	74d5b4af82	Clean up confusing comment. Move it to the place of code which is talked about. Explain where the mentioned trampoline located (usermode), and the fact that attempt to exit last thread is denied in kernel (by delegating the work to usermode). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-03 11:29:08 +00:00
Konstantin Belousov	ab57474c83	When other end of the pipe closed during the write, but some bytes were written, return short write instead of EPIPE. Update comment. Discussed with: bde (long time ago) MFC after: 2 weeks	2014-11-03 10:01:56 +00:00
Mateusz Guzik	5cbf44bf89	Provide an on-stack temporary buffer for small ioctl requests.	2014-11-03 07:46:51 +00:00
Mateusz Guzik	324a7026f1	filedesc: plus sys/kdb.h include which crept in with r274007	2014-11-03 06:24:43 +00:00
Mateusz Guzik	1d29258ac2	filedesc: plug unnecessary fdp NULL checks in fdescfree and fdcopy Anything reaching these functions has fd table.	2014-11-03 05:12:17 +00:00
Mateusz Guzik	32417098f0	filedesc: create a dedicated zone for struct filedesc0 Currently sizeof(struct filedesc0) is 1096 bytes, which means allocations from malloc use 2048 bytes. There is no easy way to shrink the structure to <= 1024 bytes and it is likely to grow in the future.	2014-11-03 04:16:04 +00:00
Konstantin Belousov	cc24666735	Followup to r273966. Fix the build with ADAPTIVE_LOCKMGRS kernel option. Note that the option is currently not used in any in-tree kernel configs, including LINTs. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-11-02 19:51:33 +00:00
Mateusz Guzik	3dca54ab98	filedesc: move freeing old tables to fdescfree They cannot be accessed by anyone and hold count only protects the structure from being freed.	2014-11-02 14:12:03 +00:00
Mateusz Guzik	3dc85312b2	filedesc: factor out some code out of fdescfree Previously it had a huge self-contained chunk dedicated to dealing with shared tables. No functional changes.	2014-11-02 13:43:04 +00:00
Konstantin Belousov	72ba3c0822	Fix two issues with lockmgr(9) LK_CAN_SHARE() test, which determines whether the shared request for already shared-locked lock could be granted. Both problems result in the exclusive locker starvation. The concurrent exclusive request is indicated by either LK_EXCLUSIVE_WAITERS or LK_EXCLUSIVE_SPINNERS flags. The reverse condition, i.e. no exclusive waiters, must check that both flags are cleared. Add a flag LK_NODDLKTREAT for shared lock request to indicate that current thread guarantees that it does not own the lock in shared mode. This turns back the exclusive lock starvation avoidance code; see man page update for detailed description. Use LK_NODDLKTREAT when doing lookup(9). Reported and tested by: pho No objections from: attilio Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-11-02 13:10:31 +00:00
Mateusz Guzik	080fdefc28	filedesc: tidy up fdcheckstd No functional changes.	2014-11-02 02:32:33 +00:00
Mateusz Guzik	d3f3e12a4f	filedesc: lock filedesc lock in fdcloseexec only when needed	2014-11-02 01:13:11 +00:00
Mateusz Guzik	cdcf242896	Fix up module unload for syscall_module_handler consumers. After r273707 it was registering syscalls as static. This fixes hwpmc module unload. Reported by: markj	2014-11-01 22:36:40 +00:00
Jean-Sébastien Pédron	da49f6bcc3	vt(4): Adjust the cursor position after changing the window size A new terminal_set_cursor() is added: it wraps the existing teken_set_cursor() function. In vtbuf_grow(), the cursor position is adjusted at the end of the function. In vt_change_font(), we call terminal_set_cursor() just after terminal_set_winsize_blank(), while the terminal is mute. This fixes a bug where, after loading a kernel video driver which increases the terminal window size, the cursor remains at its old position, in other words, in the middle of the display content. PR: 194421 MFC after: 1 week	2014-11-01 17:05:15 +00:00
Konstantin Belousov	2361c6d135	Add type qualifier volatile to the base (userspace) address argument of fuword(9) and suword(9). This makes the functions type-compatible with volatile objects and does not require devolatile force, e.g. in kern_umtx.c. Requested by: bde Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-10-31 17:43:21 +00:00
Mateusz Guzik	2534d8eeb6	filedesc: drop retval argument from do_dup It was almost always td_retval anyway. For the one case where it is not, preserve the old value across the call.	2014-10-31 10:35:01 +00:00
Mateusz Guzik	8a5177cca3	filedesc: fix missed comments about fdsetugidsafety While here just note that both fdsetugidsafety and fdcheckstd take sleepable locks.	2014-10-31 09:56:00 +00:00
Mateusz Guzik	f652d856ab	filedesc: make fdinit return with source filedesc locked and new one sized appropriately Assert FILEDESC_XLOCK_ASSERT only for already used tables in fdgrowtable. We don't have to call it with the lock held if we are just creating new filedesc. As a side note, strictly speaking processes can have fdtables with fd_lastfile = -1, but then they cannot enter fdgrowtable. Very first file descriptor they get will be 0 and the only syscall allowing to choose fd number requires an active file descriptor. Should this ever change, we can add an 'init' (or similar) parameter to fdgrowtable.	2014-10-31 09:25:28 +00:00
Mateusz Guzik	ffeb890592	filedesc: iterate over fd table only once in fdcopy While here add 'fdused_init' which does not perform unnecessary work. Drop FILEDESC_LOCK_ASSERT from fdisused and rely on callers to hold it when appropriate. This function is only used with INVARIANTS. No functional changes intended.	2014-10-31 09:19:46 +00:00
Mateusz Guzik	1a0c80a3df	filedesc: tidy up fdfree Implement fdefree_last variant and get rid of 'last' parameter. No functional changes.	2014-10-31 09:15:59 +00:00
Mateusz Guzik	b97a758ffc	filedesc: tidy up fdcopy a little bit Test for file availability by fde_file != NULL instead of fdisused, this is consistent with similar checks later. Drop badfileops check. badfileops don't have DFLAG_PASSABLE set, so it was never reached in practice. fdisused is now only used in some KASSERTs, so ifdef it under INVARIANTS. No functional changes.	2014-10-31 05:41:27 +00:00
Mark Murray	10cb24248a	This is the much-discussed major upgrade to the random(4) device, known to you all as /dev/random. This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware source any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources. The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people. The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors, now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway. Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to. My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise. My Nomex pants are on. Let the feedback commence! Reviewed by: trasz,des(partial),imp(partial?),rwatson(partial?) Approved by: so(des)	2014-10-30 21:21:53 +00:00
Mateusz Guzik	f55cf4b0d1	filedesc: make sure to force table reload in fget_unlocked when count == 0 This is a fixup to r273843.	2014-10-30 07:21:38 +00:00
Mateusz Guzik	29c85772bb	filedesc: microoptimize fget_unlocked by retrying obtaining reference count without restarting whole lookup Restart is only needed when fp was closed by current process, which is a much rarer event than ref/deref by some other thread.	2014-10-30 05:21:12 +00:00
Mateusz Guzik	aa77d52800	filedesc: get rid of atomic_load_acq_int from fget_unlocked A read barrier was necessary because fd table pointer and table size were updated separately, opening a window where fget_unlocked could read new size and old pointer. This patch puts both these fields into one dedicated structure, pointer to which is later atomically updated. As such, fget_unlocked only needs a data dependency barrier which is a noop on all supported architectures. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-30 05:10:33 +00:00
John Baldwin	01e1933dcc	Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel	2014-10-28 19:17:44 +00:00
Konstantin Belousov	f7e91c288a	Convert kern_umtx.c to use fueword() and casueword(). Also fix some mishandling of suword(9) errors as errno, which resulted in spurious ERESTART. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:30:33 +00:00
Konstantin Belousov	0a2c94b86e	Replace some calls to fuword() by fueword() with proper error checking. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:28:20 +00:00
Konstantin Belousov	4f3dc90023	Add fueword(9) and casueword(9) functions. They are like fuword(9) and casuword(9), but do not mix value read and indication of fault. I know (or remember) enough assembly to handle x86 and powerpc. For arm, mips and sparc64, implement fueword() and casueword() as wrappers around fuword() and casuword(), which means that the functions cannot distinguish between -1 and fault. On architectures where fueword() and casueword() are native, implement fuword() and casuword() using fueword() and casueword(), to reduce assembly code duplication. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks (ia64 needs treating)	2014-10-28 15:22:13 +00:00
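The fueword(9)/casueword(9) entry above describes an API change: fuword(9) reports a fault as -1 in the same return value that carries the fetched word, so a user word that really contains -1 cannot be told apart from a fault. The following is a minimal user-space sketch of that difference, not the kernel code: fuword_sim() and fueword_sim() are hypothetical stand-ins, and a NULL address is assumed to play the role of a faulting user address. The volatile-qualified address parameter mirrors the change to fuword(9) and suword(9) noted above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-ins for fuword(9) and fueword(9).
 * A NULL address plays the role of an unmapped user address, i.e.
 * the access that would fault in the kernel.
 */

/* fuword-style: one return value carries both the data and the fault flag. */
long
fuword_sim(const volatile long *uaddr)
{
	if (uaddr == NULL)
		return (-1);	/* fault */
	return (*uaddr);	/* the data, which may itself be -1 */
}

/* fueword-style: the status is reported separately from the value. */
int
fueword_sim(const volatile long *uaddr, long *val)
{
	assert(val != NULL);
	if (uaddr == NULL)
		return (-1);	/* fault; *val is left untouched */
	*val = *uaddr;
	return (0);
}
```

With the fueword-style call, a caller checks the status first and only then uses the value, so a stored -1 no longer needs special treatment; with the fuword-style call, both a stored -1 and a fault come back as -1.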
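The fget_unlocked entry (aa77d52800) explains why its read barrier could be dropped: the fd table pointer and its size used to be updated separately, so a lockless reader could pair a new size with an old pointer. Keeping both in one structure that is reached through a single pointer removes that window. Below is a minimal user-space sketch of the idea, assuming C11 `<stdatomic.h>` and a single writer. struct fdtable_sim, table_grow(), table_set() and table_lookup() are hypothetical names and do not reproduce the kernel's filedesc code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Size and slots live in one allocation, published via one pointer. */
struct fdtable_sim {
	int	 nfiles;	/* number of slots in ents[] */
	void	*ents[];	/* flexible array member */
};

static _Atomic(struct fdtable_sim *) cur_table;

/* Writer only: build a larger copy, then publish it with one store. */
int
table_grow(int nfiles)
{
	struct fdtable_sim *ot, *nt;

	assert(nfiles > 0);
	/* calloc() leaves every slot zeroed, i.e. NULL on common platforms. */
	nt = calloc(1, sizeof(*nt) + (size_t)nfiles * sizeof(nt->ents[0]));
	if (nt == NULL)
		return (-1);
	nt->nfiles = nfiles;
	ot = atomic_load_explicit(&cur_table, memory_order_relaxed);
	if (ot != NULL) {
		for (int i = 0; i < ot->nfiles && i < nfiles; i++)
			nt->ents[i] = ot->ents[i];
	}
	/* Release: nfiles and ents[] are visible before the new pointer. */
	atomic_store_explicit(&cur_table, nt, memory_order_release);
	/* The old table is not freed: lockless readers may still use it. */
	return (0);
}

/* Writer only: fill a slot in the current table. */
int
table_set(int fd, void *fp)
{
	struct fdtable_sim *t;

	t = atomic_load_explicit(&cur_table, memory_order_relaxed);
	if (t == NULL || (unsigned int)fd >= (unsigned int)t->nfiles)
		return (-1);
	t->ents[fd] = fp;
	return (0);
}

/* Reader: a single pointer load yields a consistent (size, slots) pair. */
void *
table_lookup(int fd)
{
	struct fdtable_sim *t;

	/*
	 * Only dependency ordering is needed here (C11 memory_order_consume);
	 * acquire is used as the portable, slightly stronger choice.
	 */
	t = atomic_load_explicit(&cur_table, memory_order_acquire);
	/* The unsigned cast turns fd < 0 into a huge value: one comparison. */
	if (t == NULL || (unsigned int)fd >= (unsigned int)t->nfiles)
		return (NULL);
	return (t->ents[fd]);
}
```

Because the writer fills in the new table completely before the release store, any reader that observes the new pointer also observes the matching nfiles. Safe reclamation of superseded tables is deliberately left out of this sketch.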

14366 Commits