freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	a99500a912	Extend struct ucred with group table. This saves one malloc + free with typical cases and better utilizes memory. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified) X-Additional: JuniorJobs project	2014-11-05 02:08:37 +00:00
Alexander V. Chernikov	9f25cbe45e	Remove old hack abusing domattach from NFS code. According to IANA RPC uaddr registry, there are no AFs except IPv4 and IPv6, so it's not worth being too abstract here. Remove ne_rtable[AF_MAX+1] and use explicit per-AF radix tries. Use own initialization without relying on domattach code. While I admit that this was one of the rare places in kernel networking code which really was capable of doing multi-AF without any AF-dependent code, it is not possible anymore to rely on dom* code. While here, change terrifying "Invalid radix node head, rn:" message, to different non-understandable "netcred already exists for given addr/mask", but less terrifying. Since we know that rn_addaddr() returns NULL if the same record already exists, we should provide more friendly error. MFC after: 1 month	2014-11-05 00:58:01 +00:00
Dag-Erling Smørgrav	bccb6d5aa1	[SA-14:25] Fix kernel stack disclosure in setlogin(2) / getlogin(2). [SA-14:26] Fix remote command execution in ftp(1). Approved by: so (des)	2014-11-04 23:29:29 +00:00
John Baldwin	2cba8dd301	Add a new thread state "spinning" to schedgraph and add tracepoints at the start and stop of spinning waits in lock primitives.	2014-11-04 16:35:56 +00:00
Hans Petter Selasky	0ecd606b24	Simplify logic a bit. Ensure data buffer is properly aligned, especially for platforms where unaligned access is not allowed. Make it possible to override the small buffer size. A simple continuous read string test using libusb showed a reduction in CPU usage from roughly 10% to less than 1% using a dual-core GHz CPU, when the malloc() operation was skipped for small buffers. MFC after: 2 weeks	2014-11-04 11:29:49 +00:00
Jean-Sébastien Pédron	2d6f6d6373	Enable vt(4) by default vt(4) is a new console driver which brings features such as: o Support for Unicode and double-width characters o Integration with the KMS kernel video drivers o Support for UEFI You may need to update your console settings in /etc/rc.conf, most probably the keymap. During boot, /etc/rc.d/syscons will indicate what you need to do. vt(4) still has issues and lacks some features compared to syscons(4). See the wiki for up-to-date information: https://wiki.freebsd.org/Newcons If you want to keep using syscons(4), you can do so by adding the following line to /boot/loader.conf: kern.vty=sc Differential Revision: https://reviews.freebsd.org/D1005 Discussed with: emaste@, nwhitehorn@, ray@ Relnotes: yes	2014-11-04 10:18:03 +00:00
Konstantin Belousov	74d5b4af82	Clean up confusing comment. Move it to the place of code which is talked about. Explain where the mentioned trampoline located (usermode), and the fact that attempt to exit last thread is denied in kernel (by delegating the work to usermode). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-03 11:29:08 +00:00
Konstantin Belousov	ab57474c83	When other end of the pipe closed during the write, but some bytes were written, return short write instead of EPIPE. Update comment. Discussed with: bde (long time ago) MFC after: 2 weeks	2014-11-03 10:01:56 +00:00
Mateusz Guzik	5cbf44bf89	Provide an on-stack temporary buffer for small ioctl requests.	2014-11-03 07:46:51 +00:00
Mateusz Guzik	324a7026f1	filedesc: plug sys/kdb.h include which crept in with r274007	2014-11-03 06:24:43 +00:00
Mateusz Guzik	1d29258ac2	filedesc: plug unnecessary fdp NULL checks in fdescfree and fdcopy Anything reaching these functions has fd table.	2014-11-03 05:12:17 +00:00
Mateusz Guzik	32417098f0	filedesc: create a dedicated zone for struct filedesc0 Currently sizeof(struct filedesc0) is 1096 bytes, which means allocations from malloc use 2048 bytes. There is no easy way to shrink the structure <= 1024 and it is likely to grow in the future.	2014-11-03 04:16:04 +00:00
Konstantin Belousov	cc24666735	Followup to r273966. Fix the build with ADAPTIVE_LOCKMGRS kernel option. Note that the option is currently not used in any in-tree kernel configs, including LINTs. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-11-02 19:51:33 +00:00
Mateusz Guzik	3dca54ab98	filedesc: move freeing old tables to fdescfree They cannot be accessed by anyone and hold count only protects the structure from being freed.	2014-11-02 14:12:03 +00:00
Mateusz Guzik	3dc85312b2	filedesc: factor out some code out of fdescfree Previously it had a huge self-contained chunk dedicated to dealing with shared tables. No functional changes.	2014-11-02 13:43:04 +00:00
Konstantin Belousov	72ba3c0822	Fix two issues with lockmgr(9) LK_CAN_SHARE() test, which determines whether the shared request for already shared-locked lock could be granted. Both problems result in the exclusive locker starvation. The concurrent exclusive request is indicated by either LK_EXCLUSIVE_WAITERS or LK_EXCLUSIVE_SPINNERS flags. The reverse condition, i.e. no exclusive waiters, must check that both flags are cleared. Add a flag LK_NODDLKTREAT for shared lock request to indicate that current thread guarantees that it does not own the lock in shared mode. This turns back the exclusive lock starvation avoidance code; see man page update for detailed description. Use LK_NODDLKTREAT when doing lookup(9). Reported and tested by: pho No objections from: attilio Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-11-02 13:10:31 +00:00
Mateusz Guzik	080fdefc28	filedesc: tidy up fdcheckstd No functional changes.	2014-11-02 02:32:33 +00:00
Mateusz Guzik	d3f3e12a4f	filedesc: lock filedesc lock in fdcloseexec only when needed	2014-11-02 01:13:11 +00:00
Mateusz Guzik	cdcf242896	Fix up module unload for syscall_module_handler consumers. After r273707 it was registering syscalls as static. This fixes hwpmc module unload. Reported by: markj	2014-11-01 22:36:40 +00:00
Jean-Sébastien Pédron	da49f6bcc3	vt(4): Adjust the cursor position after changing the window size A new terminal_set_cursor() is added: it wraps the existing teken_set_cursor() function. In vtbuf_grow(), the cursor position is adjusted at the end of the function. In vt_change_font(), we call terminal_set_cursor() just after terminal_set_winsize_blank(), while the terminal is mute. This fixes a bug where, after loading a kernel video driver which increases the terminal window size, the cursor remains at its old position, in other words, in the middle of the display content. PR: 194421 MFC after: 1 week	2014-11-01 17:05:15 +00:00
Konstantin Belousov	2361c6d135	Add type qualifier volatile to the base (userspace) address argument of fuword(9) and suword(9). This makes the functions type-compatible with volatile objects and does not require devolatile force, e.g. in kern_umtx.c. Requested by: bde Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-10-31 17:43:21 +00:00
Mateusz Guzik	2534d8eeb6	filedesc: drop retval argument from do_dup It was almost always td_retval anyway. For the one case where it is not, preserve the old value across the call.	2014-10-31 10:35:01 +00:00
Mateusz Guzik	8a5177cca3	filedesc: fix missed comments about fdsetugidsafety While here just note that both fdsetugidsafety and fdcheckstd take sleepable locks.	2014-10-31 09:56:00 +00:00
Mateusz Guzik	f652d856ab	filedesc: make fdinit return with source filedesc locked and new one sized appropriately Assert FILEDESC_XLOCK_ASSERT only for already used tables in fdgrowtable. We don't have to call it with the lock held if we are just creating new filedesc. As a side note, strictly speaking processes can have fdtables with fd_lastfile = -1, but then they cannot enter fdgrowtable. Very first file descriptor they get will be 0 and the only syscall allowing to choose fd number requires an active file descriptor. Should this ever change, we can add an 'init' (or similar) parameter to fdgrowtable.	2014-10-31 09:25:28 +00:00
Mateusz Guzik	ffeb890592	filedesc: iterate over fd table only once in fdcopy While here add 'fdused_init' which does not perform unnecessary work. Drop FILEDESC_LOCK_ASSERT from fdisused and rely on callers to hold it when appropriate. This function is only used with INVARIANTS. No functional changes intended.	2014-10-31 09:19:46 +00:00
Mateusz Guzik	1a0c80a3df	filedesc: tidy up fdfree Implement fdefree_last variant and get rid of 'last' parameter. No functional changes.	2014-10-31 09:15:59 +00:00
Mateusz Guzik	b97a758ffc	filedesc: tidy up fdcopy a little bit Test for file availability by fde_file != NULL instead of fdisused, this is consistent with similar checks later. Drop badfileops check. badfileops don't have DFLAG_PASSABLE set, so it was never reached in practice. fdisused is now only used in some KASSERTS, so ifdef it under INVARIANTS. No functional changes.	2014-10-31 05:41:27 +00:00
Mark Murray	10cb24248a	This is the much-discussed major upgrade to the random(4) device, known to you all as /dev/random. This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware source any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources. The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people. The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway. Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to. My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise. My Nomex pants are on. Let the feedback commence! Reviewed by: trasz,des(partial),imp(partial?),rwatson(partial?) Approved by: so(des)	2014-10-30 21:21:53 +00:00
Mateusz Guzik	f55cf4b0d1	filedesc: make sure to force table reload in fget_unlocked when count == 0 This is a fixup to r273843.	2014-10-30 07:21:38 +00:00
Mateusz Guzik	29c85772bb	filedesc: microoptimize fget_unlocked by retrying obtaining reference count without restarting whole lookup Restart is only needed when fp was closed by current process, which is a much rarer event than ref/deref by some other thread.	2014-10-30 05:21:12 +00:00
Mateusz Guzik	aa77d52800	filedesc: get rid of atomic_load_acq_int from fget_unlocked A read barrier was necessary because fd table pointer and table size were updated separately, opening a window where fget_unlocked could read new size and old pointer. This patch puts both these fields into one dedicated structure, pointer to which is later atomically updated. As such, fget_unlocked only needs a data dependency barrier which is a noop on all supported architectures. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-30 05:10:33 +00:00
John Baldwin	01e1933dcc	Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel	2014-10-28 19:17:44 +00:00
Konstantin Belousov	f7e91c288a	Convert kern_umtx.c to use fueword() and casueword(). Also fix some mishandling of suword(9) errors as errno, which resulted in spurious ERESTART. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:30:33 +00:00
Konstantin Belousov	0a2c94b86e	Replace some calls to fuword() by fueword() with proper error checking. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:28:20 +00:00
Konstantin Belousov	4f3dc90023	Add fueword(9) and casueword(9) functions. They are like fuword(9) and casuword(9), but do not mix value read and indication of fault. I know (or remember) enough assembly to handle x86 and powerpc. For arm, mips and sparc64, implement fueword() and casueword() as wrappers around fuword() and casuword(), which means that the functions cannot distinguish between -1 and fault. On architectures where fueword() and casueword() are native, implement fuword() and casuword() using fueword() and casuword(), to reduce assembly code duplication. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks (ia64 needs treating)	2014-10-28 15:22:13 +00:00
Hans Petter Selasky	0e1152fcc2	The SYSCTL data pointers can come from userspace and must not be directly accessed. Although this will work on some platforms, it can throw an exception if the pointer is invalid and then panic the kernel. Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-28 12:00:39 +00:00
Mateusz Guzik	a7963b9758	Simplify sys_getloginclass. Just use current thread credentials as they have the same accuracy as the ones obtained from proc..	2014-10-28 04:59:33 +00:00
Mateusz Guzik	b720dc9749	Change loginclass mutex to an rwlock. While here reduce nesting in loginclass_free. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> X-Additional: JuniorJobs project MFC after: 2 weeks	2014-10-28 04:33:57 +00:00
Mateusz Guzik	8d866b10fc	Tidy up functions related to uidinfo management. - reference found uidinfo in uilookup - reduce nesting by handling shorter cases first	2014-10-27 20:20:05 +00:00
Mateusz Guzik	8101958c4f	De-k&r-ify function definitions in kern/kern_resource.c No functional changes.	2014-10-27 20:18:30 +00:00
Mateusz Guzik	e015b1ab0a	Avoid dynamic syscall overhead for statically compiled modules. The kernel tracks syscall users so that modules can safely unregister them. But if the module is not unloadable or was compiled into the kernel, there is no need to do this. Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC during kernel build and 0 otherwise. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-26 19:42:44 +00:00
Mateusz Guzik	b90638866e	Fix up an assertion in kern_setgroups, it should compare with ngroups_max + 1 Bug introduced in r273685. Noted by: Tiwei Bie <btw mail.ustc.edu.cn>	2014-10-26 14:25:42 +00:00
Mateusz Guzik	7e9a456a53	Tidy up sys_setgroups and kern_setgroups. - 'groups' initialization to NULL is always overwritten before use, so plug it - get rid of 'goto out' - kern_setgroups's callers already validate ngrp, so only assert the condition - ngrp is an u_int, so 'ngrp < 1' is more readable as 'ngrp == 0' No functional changes.	2014-10-26 06:04:09 +00:00
Mateusz Guzik	92b064f43d	Use a temporary buffer in sys_setgroups for requests with <= XU_NGROUPS groups. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> X-Additional: JuniorJobs project MFC after: 2 weeks	2014-10-26 05:39:42 +00:00
Mateusz Guzik	f84f8f9468	Now that sysctl_root is only called with sysctl lock in shared mode, update its assertion to require that. Update comment missed in r273400: sysctl_xlock/unlock -> sysctl_xlock/xunlock Noted by: jhb	2014-10-26 01:47:55 +00:00
John Baldwin	1bc9ea1caa	Use correct type in __DEVOLATILE().	2014-10-25 20:42:47 +00:00
Alexander Motin	ccf8a5688a	Revert somewhat hackish geom_disk optimization, committed as part of r256880, and the following r273143 commit, supposed to workaround introduced issue by quite innocent-looking change. While there is no clear understanding why, but r273143 is accused in data corruption in some environments with high I/O load. I personally don't see any problem in that commit, and possibly it is just a trigger to some other bug somewhere, but better safe than sorry for now. Requested by: scottl@ MFC after: 3 days	2014-10-25 15:16:19 +00:00
Mateusz Guzik	675c3507d4	rlimit: plug duplicate assertion counter sanity is already checked by refcount_release.	2014-10-25 05:56:21 +00:00
Xin LI	6362e06b42	Fix build.	2014-10-25 00:16:36 +00:00
John Baldwin	53e1ffbbce	The current POSIX semaphore implementation stores the _has_waiters flag in a separate word from the _count. This does not permit both items to be updated atomically in a portable manner. As a result, sem_post() must always perform a system call to safely clear _has_waiters. This change removes the _has_waiters field and instead uses the high bit of _count as the _has_waiters flag. A new umtx object type (_usem2) and two new umtx operations are added (SEM_WAIT2 and SEM_WAKE2) to implement these semantics. The older operations are still supported under the COMPAT_FREEBSD9/10 options. The POSIX semaphore API in libc has been updated to use the new implementation. Note that the new implementation is not compatible with the previous implementation. However, this only affects static binaries (which cannot be helped by symbol versioning). Binaries using a dynamic libc will continue to work fine. SEM_MAGIC has been bumped so that mismatched binaries will error rather than corrupting a shared semaphore. In addition, a padding field has been added to sem_t so that it remains the same size. Differential Revision: https://reviews.freebsd.org/D961 Reported by: adrian Reviewed by: kib, jilles (earlier version) Sponsored by: Norse	2014-10-24 20:02:44 +00:00
Dag-Erling Smørgrav	b0d69dfad9	In all cases except CTLTYPE_STRING, penv is NULL here, so passing it indiscriminately to printf() and freeenv() is incorrect. Add a NULL check before freeenv(); as for printf(), we could use req.newptr instead, but we'd have to select the correct format string based on the type, and that's too much work for an error message, so just remove it.	2014-10-23 22:42:56 +00:00
Mateusz Guzik	ffc5ce7b75	In selfdfree re-evaluate sf_si after taking the lock. Otherwise we can race with doselwakeup. This is a fixup to r273549 Reviewed by: jhb Reported by: everyone and their dog	2014-10-23 19:06:08 +00:00
Xin LI	2735a91d93	Test if 'env' is NULL before doing memset() and strlen(), the caller may pass NULL to freeenv().	2014-10-23 18:23:50 +00:00
Mateusz Guzik	73f2e5f759	Avoid taking the lock in selfdfree when not needed.	2014-10-23 15:35:47 +00:00
Colin Percival	b9f6af45a5	Avoid leaking data from the kernel environment: When we convert the initial static environment to a dynamic one, zero the static environment buffer, and zero individual values when kern_unsetenv and freeenv are called. Tested by: kmoore (VM memory dump + grep) Tested by: cperciva (kernel panic dump + grep)	2014-10-22 23:35:32 +00:00
Mateusz Guzik	58a3dcb229	filedesc assert that table size is at least 3 in fdsetugidsafety Requested by: kib	2014-10-22 08:56:57 +00:00
Mateusz Guzik	4bc68ed7bc	Plug unnecessary PRS_NEW check in kern_procctl. pfind does not return processes in such state.	2014-10-22 04:16:09 +00:00
Mateusz Guzik	a39d200bb9	Reduce nesting in vn_access. No functional changes.	2014-10-22 01:53:00 +00:00
Mateusz Guzik	eac9678110	Avoid crdup when possible in kern_accessat. While here tidy up a little.	2014-10-22 01:09:07 +00:00
Mateusz Guzik	11888da8d9	filedesc: cleanup setugidsafety a little Rename it to fdsetugidsafety for consistency with other functions. There is no need to take filedesc lock if not closing any files. The loop has to verify each file and we are guaranteed fdtable has space for at least 20 fds. As such there is no need to check fd_lastfile. While here tidy up is_unsafe.	2014-10-22 00:23:43 +00:00
Mateusz Guzik	07b384cbe2	Eliminate unnecessary memory allocation in sys_getgroups and its ibcs2 counterpart.	2014-10-21 23:08:46 +00:00
Mateusz Guzik	2afec8edfc	Take the lock shared in linker_search_symbol_name. This helps sysctl kern.proc.stack.	2014-10-21 21:29:20 +00:00
Mateusz Guzik	fca7732078	Mark some more sysctl stuff shared-locked and MPSAFE.	2014-10-21 21:08:45 +00:00
Mateusz Guzik	b564c5d6aa	Make sysctl name2oid shared-locked as well. This is a follow-up to r273401.	2014-10-21 19:45:08 +00:00
Mateusz Guzik	efe0abddf5	Implement shared locking for sysctl.	2014-10-21 19:05:44 +00:00
Mateusz Guzik	580a011762	Rename sysctl_lock and _unlock to sysctl_xlock and _xunlock.	2014-10-21 19:02:26 +00:00
Hans Petter Selasky	f0188618f2	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
Mateusz Guzik	5c37b305fd	Plug unnecessary binvp NULL initialization and test. Reported by: Coverity CID: 1018889	2014-10-20 22:52:15 +00:00
Mateusz Guzik	966ee9f25f	filedesc: plug 2 write-only variables Reported by: Coverity CID: 1245745, 1245746	2014-10-20 21:57:24 +00:00
Mark Johnston	4fd6ca7275	Fix a typo from r189544, which replaced unp_global_rwlock with unp_list_lock and unp_link_rwlock. MFC after: 3 days	2014-10-20 20:21:40 +00:00
Mateusz Guzik	4fce16e4c9	Provide vfs suspension support only for filesystems which need it, take two. nullfs and unionfs need to request suspension if underlying filesystem(s) use it. Utilize mnt_kern_flag for this purpose. This is a fixup for 273271. No strong objections from: kib Pointy hat to: mjg MFC after: 2 weeks	2014-10-20 18:00:50 +00:00
Marcel Moolenaar	0067051fe7	Fully support constructors for the purpose of code coverage analysis. This involves: 1. Have the loader pass the start and size of the .ctors section to the kernel in 2 new metadata elements. 2. Have the linker backends look for and record the start and size of the .ctors section in dynamically loaded modules. 3. Have the linker backends call the constructors as part of the final work of initializing preloaded or dynamically loaded modules. Note that LLVM appends the priority of the constructors to the name of the .ctors section. Not so when compiling with GCC. The code currently works for GCC and not for LLVM. Submitted by: Dmitry Mikulin <dmitrym@juniper.net> Obtained from: Juniper Networks, Inc.	2014-10-20 17:04:03 +00:00
Mateusz Guzik	020b8f17a0	Provide vfs suspension support only for filesystems which need it. Need is expressed by providing vfs_susp_clean function in vfsops. Differential Revision: D952 Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-19 06:59:33 +00:00
Adrian Chadd	3fe93b946f	Convert a missed u_char cpu -> int cpu. This was caught by a gcc build. Reported by: luigi Sponsored by: Norse Corp, Inc.	2014-10-19 04:38:02 +00:00
Adrian Chadd	e77f9fed15	Update the ULE scheduler + thread and kinfo structs to use int for cpuid rather than u_char. To try and play nice with the ABI, the u_char CPU ID values are clamped at 254. The new fields now contain the full CPU ID, or -1 for no cpu. Differential Revision: D955 Reviewed by: jhb, kib Sponsored by: Norse Corp, Inc.	2014-10-18 19:36:11 +00:00
Davide Italiano	2be111bf7d	Follow up to r225617. In order to maximize the re-usability of kernel code in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe	2014-10-16 18:04:43 +00:00
Alexander Motin	99b9076c21	Remove setting BIO_DONE flag for BIOs that have done() method. This fixes use-after-free, caused by geom_disk, completing same BIO twice to save extra allocation, and getting BIO_DONE set after the first. MFC after: 1 week	2014-10-15 18:36:34 +00:00
Konstantin Belousov	f821fad417	Implement FIODTYPE for master ptys. Requested and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-10-15 12:38:26 +00:00
Mateusz Guzik	32e7f8e4d5	Don't take devmtx unnecessarily in vn_isdisk. MFC after: 1 week	2014-10-15 05:17:36 +00:00
Mateusz Guzik	55056be254	filedesc: plug 2 assignments to M_ZERO-ed pointers in falloc_noinstall No functional changes.	2014-10-15 01:16:11 +00:00
Marcel Moolenaar	ddbe5b951f	Fix nits in previous commit: 1. Remove initializer for badstack_sbuf_size; it gets set unconditionally. 2. Remove meaningless comment. 3. Group witness_count and its sysctl together. 4. Fix spacing in for statements (space after for and within condition). 5. Change all M_NOWAIT usages in witness_initialize() to M_WAITOK; not just those that were newly introduced -- the allocation is assumed to succeed for all allocations. 6. Avoid using uint8_t as the base type in sizeof() expressions; Use the variable name (w_rmatrix) as much as possible. Pointed out by: jhb@ (thanks!)	2014-10-11 16:34:01 +00:00
Marcel Moolenaar	90a5222f14	Turn WITNESS_COUNT into a tunable and sysctl. This allows adjusting the value without recompiling the kernel. This is useful when recompiling is not possible as an immediate solution. When we run out of witness objects, witness is completely disabled. Not having an immediate solution can therefore be problematic. Submitted by: Sreekanth Rupavatharam <rupavath@juniper.net> Obtained from: Juniper Networks, Inc.	2014-10-11 02:02:58 +00:00
Marcel Moolenaar	2e7634503e	Regenerate after r272823: Move the SCTP syscalls to netinet with the rest of the SCTP code. Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:19:35 +00:00
Marcel Moolenaar	80b47aefa1	Move the SCTP syscalls to netinet with the rest of the SCTP code. The syscalls themselves are tightly coupled with the network stack and therefore should not be in the generic socket code. The following four syscalls have been marked as NOSTD so they can be dynamically registered in sctp_syscalls_init() function: sys_sctp_peeloff sys_sctp_generic_sendmsg sys_sctp_generic_sendmsg_iov sys_sctp_generic_recvmsg The syscalls are also set up to be dynamically registered when COMPAT32 option is configured. As a side effect of moving the SCTP syscalls, getsock_cap needs to be made available outside of the uipc_syscalls.c source file. A proper prototype has been added to the sys/socketvar.h header file. API tests from the SCTP reference implementation have been run to ensure compatibility. (http://code.google.com/p/sctp-refimpl/source/checkout) Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: tuexen, rrs Obtained from: Juniper Networks, Inc.	2014-10-09 15:16:52 +00:00
Adrian Chadd	ffcf962dab	Add a bus method to fetch the VM domain for the given device/bus. * Add a bus_if.m method - get_domain() - returning the VM domain or ENOENT if the device isn't in a VM domain; * Add bus methods to print out the domain of the device if appropriate; * Add code in srat.c to save the PXM -> VM domain mapping that's done and expose a function to translate VM domain -> PXM; * Add ACPI and ACPI PCI methods to check if the bus has a _PXM attribute and if so map it to the VM domain; * (.. yes, this works recursively.) * Have the pci bus glue print out the device VM domain if present. Note: this is just the plumbing to start enumerating information - it doesn't at all modify behaviour. Differential Revision: D906 Reviewed by: jhb Sponsored by: Norse Corp	2014-10-09 05:33:25 +00:00
Marcel Moolenaar	383f423be1	Fix draining in ttydev_leave(): 1. ERESTART is not only returned when the revoke count changed. It is also returned when a signal is received. While a change in the revoke count should be ignored, a signal should not. 2. Waiting until the output queue is entirely drained can cause a hang when the underlying device is stuck or broken. Have tty_drain() take care of this by telling it when we're leaving. When leaving, tty_drain() will use a timed wait to address point 2 above and it will check the revoke count to handle point 1 above. The timeout is set to 1 second, which is arbitrary and long enough to expect a change in the output queue. Discussed with: jilles@ Reported by: Yamagi Burmeister <lists@yamagi.org>	2014-10-09 02:30:38 +00:00
Marcel Moolenaar	75c2b79df8	Apply r269126 to tty_timedwait(): Don't return ERESTART when the device is gone.	2014-10-09 01:59:25 +00:00
John Baldwin	232e8b52b0	Add schedgraph traces for callout handlers. Specifically, a callwheel logs a running event each time it executes a callout function. The event includes the function pointer, argument, and whether or not it was run from hardware interrupt context. The callwheel is marked idle when each handler completes. This effectively logs the duration of each callout routine in the graph.	2014-10-08 16:22:59 +00:00
Jung-uk Kim	37417245bf	Make kern.nswbuf tunable from loader. MFC after: 1 week	2014-10-07 20:13:47 +00:00
Mateusz Guzik	dd2390be68	Convert racct stubs to inline functions. This saves some symbols and function calls for kernel without RACCT. MFC after: 1 week	2014-10-06 02:31:33 +00:00
Mateusz Guzik	2b4a2528d7	filedesc: fix up breakage introduced in 272505 Include sequence counter support unconditionally [1]. This fixes reported build problems with e.g. nvidia driver due to missing opt_capsicum.h. Replace fishy looking sizeof with offsetof. Make fde_seq the last member in order to simplify calculations. Suggested by: kib [1] X-MFC: with 272505	2014-10-05 19:40:29 +00:00
Konstantin Belousov	57c2505e65	On error, sbuf_bcat() returns -1. Some callers returned this -1 to the upper layers, which interpret it as errno value, which happens to be ERESTART. The result was spurious restarts of the sysctls in loop, e.g. kern.proc.proc, instead of returning ENOMEM to caller. Convert -1 from sbuf_bcat() to ENOMEM, when returning to the callers expecting errno. In collaboration with: pho Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week	2014-10-05 17:35:59 +00:00
Mateusz Guzik	bad2520a2b	Avoid unnecessary ppeers_lock acquisition in exit1. MFC after: 1 week	2014-10-05 07:21:41 +00:00
Mateusz Guzik	25108069ec	Get rid of crshared.	2014-10-05 02:16:53 +00:00
Konstantin Belousov	4142462eeb	Slightly reword comment. Move code, which is described by the comment, after it. Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-10-04 18:51:55 +00:00
Konstantin Belousov	b76278407d	Add kernel option KSTACK_USAGE_PROF to sample the stack depth on interrupts and report the largest value seen as sysctl debug.max_kstack_used. Useful to estimate how close the kernel stack size is to overflow. In collaboration with: Larry Baird <lab@gta.com> Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week	2014-10-04 18:38:14 +00:00
Konstantin Belousov	539c9eef12	Fixes for i/o during coredumping: - Do not dump into system files. - Do not acquire write reference to the mount point where img.core is written, in the coredump(). The vn_rdwr() calls from ELF imgact request the write ref from vn_rdwr(). Recursive acquisition of the write ref deadlocks with the unmount. - Instead, take the range lock for the whole core file. This prevents parallel dumping from two processes executing the same image, converting the useless interleaved dump into sequential dumping, with second core overwriting the first. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-10-04 18:35:00 +00:00
Konstantin Belousov	e3d6feceb1	Add IO_RANGELOCKED flag for vn_rdwr(9), which specifies that vnode is not locked, but range is. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-10-04 18:28:27 +00:00
Ian Lepore	41e8f7efbe	Make kevent(2) periodic timer events more reliably periodic. The event callout is now scheduled using the C_ABSOLUTE flag, and the absolute time of each event is calculated as the time the previous event was scheduled for plus the interval. This ensures that latency in processing a given event doesn't perturb the arrival time of any subsequent events. Reviewed by: jhb	2014-10-04 15:59:15 +00:00
Mateusz Guzik	ee3fd7bbb1	Plug capability races. fp and appropriate capability lookups were not atomic, which could result in improper capabilities being checked. This could result either in protection bypass or in a spurious ENOTCAPABLE. Make fp + capability check atomic with the help of sequence counters. Reviewed by: kib MFC after: 3 weeks	2014-10-04 08:08:56 +00:00
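
Note on ee3fd7bbb1: the message says the fp and capability lookups are made atomic "with the help of sequence counters". The following is only a rough user-space sketch of that general technique in C11, not the kernel code from the commit. The names struct entry, entry_init, entry_write and entry_read are hypothetical, and the sketch assumes a single writer (or writers serialized by some other lock) and uses the default sequentially consistent atomics for simplicity.

```c
#include <stdatomic.h>

/*
 * Hypothetical table entry: a file id and its capability rights,
 * published together under one sequence counter.
 */
struct entry {
	atomic_uint seq;	/* odd while an update is in progress */
	atomic_int file;	/* stand-in for the file pointer */
	atomic_uint rights;	/* stand-in for the capability rights */
};

static void
entry_init(struct entry *e)
{
	atomic_init(&e->seq, 0);
	atomic_init(&e->file, -1);
	atomic_init(&e->rights, 0);
}

/* Writer side; assumes callers are serialized against each other. */
static void
entry_write(struct entry *e, int file, unsigned rights)
{
	atomic_fetch_add(&e->seq, 1);	/* counter becomes odd */
	atomic_store(&e->file, file);
	atomic_store(&e->rights, rights);
	atomic_fetch_add(&e->seq, 1);	/* counter becomes even again */
}

/*
 * Reader side: retry until the counter is even and unchanged across
 * both loads, so file and rights come from the same update.
 */
static void
entry_read(struct entry *e, int *file, unsigned *rights)
{
	unsigned s1, s2;

	do {
		s1 = atomic_load(&e->seq);
		*file = atomic_load(&e->file);
		*rights = atomic_load(&e->rights);
		s2 = atomic_load(&e->seq);
	} while ((s1 & 1u) != 0 || s1 != s2);
}
```

A reader that overlaps an update sees either an odd counter or a changed one and simply retries, so it never pairs a new file with stale rights or the reverse.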


14001 Commits