freebsd-skq

Author	SHA1	Message	Date
jhb	1e8b1cd510	Revert device_getenv_int() for now as it duplicates resource_int_value(). We should perhaps implement a device_getenv_() and device_setenv_() API as a convenience wrapper on top of resource__value() and resource_set_().	2014-12-03 15:29:53 +00:00
kib	1941346c9d	Disable recursion for the process spinlock. Tested by: pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 month	2014-12-01 17:36:10 +00:00
gibbs	36e4267804	Remove trailing whitespace.	2014-11-30 19:32:00 +00:00
glebius	3dd3d6d9ff	Merge from projects/sendfile: Provide pru_ready for AF_LOCAL sockets. Local sockets sendsdata directly to the receive buffer of the peer, thus pru_ready also works on the peer socket. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 13:40:58 +00:00
glebius	9cadf1b974	Merge from projects/sendfile: extend protocols API to support sending not ready data: o Add new flag to pru_send() flags - PRUS_NOTREADY. o Add new protocol method pru_ready(). Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-30 13:24:21 +00:00
glebius	25da94eb3e	Merge from projects/sendfile: o Introduce a notion of "not ready" mbufs in socket buffers. These mbufs are now being populated by some I/O in background and are referenced outside. This forces following implications: - An mbuf which is "not ready" can't be taken out of the buffer. - An mbuf that is behind a "not ready" in the queue neither. - If sockbet buffer is flushed, then "not ready" mbufs shouln't be freed. o In struct sockbuf the sb_cc field is split into sb_ccc and sb_acc. The sb_ccc stands for ""claimed character count", or "committed character count". And the sb_acc is "available character count". Consumers of socket buffer API shouldn't already access them directly, but use sbused() and sbavail() respectively. o Not ready mbufs are marked with M_NOTREADY, and ready but blocked ones with M_BLOCKED. o New field sb_fnrdy points to the first not ready mbuf, to avoid linear search. o New function sbready() is provided to activate certain amount of mbufs in a socket buffer. A special note on SCTP: SCTP has its own sockbufs. Unfortunately, FreeBSD stack doesn't yet allow protocol specific sockbufs. Thus, SCTP does some hacks to make itself compatible with FreeBSD: it manages sockbufs on its own, but keeps sb_cc updated to inform the stack of amount of data in them. The new notion of "not ready" data isn't supported by SCTP. Instead, only a mechanical substitute is done: s/sb_cc/sb_ccc/. A proper solution would be to take away struct sockbuf from struct socket and allow protocols to implement their own socket buffers, like SCTP already does. This was discussed with rrs@. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 12:52:33 +00:00
glebius	176ae2299c	- Move sbcheck() declaration under SOCKBUF_DEBUG. - Improve SOCKBUF_DEBUG macros. - Improve sbcheck(). Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-30 11:22:39 +00:00
glebius	843bdaf93c	Make sballoc() and sbfree() functions. Ideally, they could be marked as static, but unfortunately Infiniband (ab)uses them. Sponsored by: Nginx, Inc.	2014-11-30 11:02:07 +00:00
imp	d737995628	The current limit of 100k for the linker hints file is getting a bit crowded as we now are at about 70k. Bump the limit to 1MB instead which is still quite a reasonable limit and allows for future growth of this file and possible future expansion to additional data. MFC After: 2 weeks	2014-11-29 17:29:30 +00:00
kib	9127a1f190	Remove lock recursion for the pipe pair mutex, and disable the recursion on mutex initialization. The only places where the recursive acquire is performed are read and write filters, since knlist, which uses the pipe pair mutex as lock, is locked when filter is called. The recursion was added in r93296, and consistent locking for kn_fop->f_event() introduced in r133741. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 month	2014-11-29 17:18:20 +00:00
kib	1642e3809f	Assert the state of the process lock and sigact mutex in kern_sigprocmask() and reschedule_signals(). Discussed with: rea Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-28 10:20:00 +00:00
hselasky	f4e178e921	Style changes: - Move two IOCTL related defines to the top of the C-file - Add more comments describing the recently added IOCTL small size and small align macros	2014-11-28 09:32:07 +00:00
alfred	1413b742e9	Make igb and ixgbe check tunables at probe time. This allows one to make a kernel module to tune the number of queues before the driver loads. This is needed so that a module at SI_SUB_CPU can set tunables for these drivers to take. Otherwise getenv is called too early by the TUNABLE macros. Reviewed by: smh Phabric: https://reviews.freebsd.org/D1149	2014-11-26 20:19:36 +00:00
kib	11cee2ecf7	The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). The split is done mostly for code clarity, and should not affect scalability. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-26 14:10:00 +00:00
kib	4501dadd00	Fix SA_SIGINFO \| SA_RESETHAND handling. The sysent' sv_sendsig() method needs pre-reset state of the ps_siginfo to correctly construct signal frame. Move sigdflt() call after the sv_sendsig() invocation in postsig(). Simultaneously extract common code from trapsignal() and postsig() into new helper postsig_done(). Submitted by: rea MFC after: 1 week	2014-11-26 14:09:04 +00:00
jhb	5ffe4e5562	Add a bus_get_domain() wrapper around BUS_GET_DOMAIN(). Use this to add a new per-device '%domain' sysctl node that returns the NUMA domain a device is associated with if it is associated with one. Note that this API is still a WIP and might change before 11.0 actually ships. Differential Revision: https://reviews.freebsd.org/D930 Reviewed by: kib, adrian	2014-11-24 19:55:45 +00:00
jhb	c5e82d754f	Properly initialize the capability rights for vnodes exported to procstat that aren't for file descriptors (cwd, jdir, tracevp, etc.). Submitted by: Mikhail <mp@lenta.ru>	2014-11-24 18:34:11 +00:00
glebius	b4ef8e602d	Merge from projects/sendfile: o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but doesn't sleep. It returns immediately, and will execute the I/O done handler function that must be supplied as argument. o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager. o Extend pagertab to support pgo_getpages_async method, and implement this method for vnode_pager. Reviewed by: kib Tested by: pho Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-23 12:01:52 +00:00
mjg	a4324a5514	ifdef RACCT ui_racct_foreach and struct uidinfo's ui_racct Change racct_ create and destroy to macros evaluating to nothing without RACCT so that their callers passing ui_racct don't have to be ifdefed.	2014-11-23 08:25:44 +00:00
mjg	e31a493d7e	filedesc: plug a test for impossible condition in fgetvp_rights	2014-11-23 00:12:27 +00:00
kib	0a8ab23540	The size value should be asserted when it is known. Reported and tested by: pho Sponsored by: The FreeBSD Foundation	2014-11-22 18:15:02 +00:00
jhb	1671ac9155	Improve support for XSAVE with debuggers. - Dump an NT_X86_XSTATE note if XSAVE is in use. This note is designed to match what Linux does in that 1) it dumps the entire XSAVE area including the fxsave state, and 2) it stashes a copy of the current xsave mask in the unused padding between the fxsave state and the xstate header at the same location used by Linux. - Teach readelf() to recognize NT_X86_XSTATE notes. - Change PT_GET/SETXSTATE to take the entire XSAVE state instead of only the extra portion. This avoids having to always make two ptrace() calls to get or set the full XSAVE state. - Add a PT_GET_XSTATE_INFO which returns the length of the current XSTATE save area (so the size of the buffer needed for PT_GETXSTATE) and the current XSAVE mask (%xcr0). Differential Revision: https://reviews.freebsd.org/D1193 Reviewed by: kib MFC after: 2 weeks	2014-11-21 20:53:17 +00:00
glebius	79b6f9c70a	Do not allocate zero-length mbuf in sosend_generic(). Found by: pho Sponsored by: Nginx, Inc.	2014-11-19 14:27:38 +00:00
zbb	aee9f2a5a3	Stop using early_putc immediately after configuring console with cninit() Early UART should be released right after system console initialization is completed. Otherwise, after cninit() both early and system console coexist what may lead to various issues (i.a. writing to unmapped early UART address). This cannot be done in cninit_finish() since it can be called late at the end of MI configuration. Obtained from: Semihalf Reviewed by: andrew Sponsored by: The FreeBSD Foundation	2014-11-19 14:23:29 +00:00
imp	e1fec13f7c	opt_global.h is included automatically in the build. No need to explicitly include it in these places. Sponsored by: Netflix	2014-11-18 17:06:56 +00:00
jmg	39fa4746e5	prevent doing filter ops locking for staticly compiled filter ops... This significantly reduces lock contention when adding/removing knotes on busy multi-kq system... Next step is to cache these references per kq.. i.e. kq refs it once and keeps a local ref count so that the same refs don't get accessed by many cpus... only allocate a knote when we might use it... Add a new flag, _FORCEONESHOT.. This allows a thread to force the delivery of another event in a safe manner, say waking up an idle http connection to force it to be reaped... If we are _DISABLE'ing a knote, don't bother to call f_event on it, it's disabled, so won't be delivered anyways.. Tested by: adrian	2014-11-16 01:18:41 +00:00
glebius	d69e7f6f82	- Use NULL to compare a pointer. - Use KASSERT() instead of panic. - Remove useless 'continue', no need to restart cycle here. Sponsored by: Nginx, Inc.	2014-11-14 15:44:19 +00:00
glebius	de0e48fa9a	Merge from projects/sendfile: Use sbcut_locked() instead of manually editing a sockbuf. Sponsored by: Nginx, Inc.	2014-11-14 15:33:40 +00:00
kib	d97b2c5d8d	In vfs_write_suspend_umnt(), if suspension cannot be established, do not forget to restore write ops count when returning the error. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-14 11:31:10 +00:00
glebius	3bc8a689c1	There should not be zero length mbufs in socket buffers. The code comes from r1451, and thus can't be explained. A patch with explicit panic() here survived all tests. Tested by: pho Sponsored by: Nginx, Inc.	2014-11-14 06:02:29 +00:00
jkim	66032933f9	Correct a typo to fix chown(2). It was broken since r274476. Pointy hat to: kib X-MFC-With: r274476	2014-11-13 23:51:13 +00:00
mjg	077a8b14ec	filedesc: fixup fdinit to lock fdp and preapare files conditinally Not all consumers providing fdp to copy from want files. Perhaps these functions should be reorganized to better express the outcome. This fixes up panics after r273895 . Reported by: markj	2014-11-13 21:15:09 +00:00
kib	de34fc931d	Fix assertion, &uc->uc_busy is never zero, the intent is to test the uc_busy value, and not its address [1]. Remove the single use of the macro, write KASSERT() explicitely in the code of umtxq_sleep_pi(). Submitted by: Eric van Gyzen <eric@vangyzen.net> [1] MFC after: 1 week	2014-11-13 18:51:09 +00:00
kib	b4ef709604	Remove the no-at variants of the kern_xx() syscall helpers. E.g., we have both kern_open() and kern_openat(); change the callers to use kern_openat(). This removes one (sometimes two) levels of indirection and consolidates arguments checks. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 18:01:51 +00:00
kib	6cedba80db	Do not try to dereference thread pointer when the value is not a pointer. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 17:44:35 +00:00
kib	e257542e11	Remove fossil. It has been present in 4.4Lite2, but its use was removed for some time. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-13 17:43:37 +00:00
dchagin	c0a51053a4	Regen for r274462.	2014-11-13 05:28:06 +00:00
dchagin	162012051b	Add the ppoll() system call. Export kern_poll() needed by an upcoming Linuxulator change. Differential Revision: https://reviews.freebsd.org/D1133 Reviewed by: kib, wblock MFC after: 1 month	2014-11-13 05:26:14 +00:00
kib	ff19294d91	For posix_fallocate(2) and posix_fadvise(2), return ESPIPE when underlying file does not have DFLAG_SEEKABLE set [1]. For posix_fallocate(2), simplify error handling logic. Do return when fp is not yet referenced. Noted by: bde [1] Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-12 17:31:38 +00:00
glebius	834e6d1d30	Merge from projects/sendfile: - Use KASSERT()s instead of panic(). - Use sbavail() instead of sb_cc. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2014-11-12 10:17:46 +00:00
glebius	c0b38b545a	In preparation of merging projects/sendfile, transform bare access to sb_cc member of struct sockbuf to a couple of inline functions: sbavail() and sbused() Right now they are equal, but once notion of "not ready socket buffer data", will be checked in, they are going to be different. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-12 09:57:15 +00:00
glebius	b8af75c693	Fix build.	2014-11-11 22:08:18 +00:00
glebius	53273c84d0	Remove SF_KQUEUE code. This code was developed at Netflix, but was not ever used. It didn't go into stable/10, neither was documented. It might be useful, but we collectively decided to remove it, rather leave it abandoned and unmaintained. It is removed in one single commit, so restoring it should be easy, if anyone wants to reopen this idea. Sponsored by: Netflix	2014-11-11 20:32:46 +00:00
pjd	cb36b2a5c4	Add missing privilege check when setting the dump device. Before that change it was possible for a regular user to setup the dump device if he had write access to the given device. In theory it is a security issue as user might get access to kernel's memory after provoking kernel crash, but in practise it is not recommended to give regular users direct access to storage devices. Rework the code so that we do privileges check within the set_dumper() function to avoid similar problems in the future. Discussed with: secteam	2014-11-11 04:48:09 +00:00
kib	4c07fb2889	When sleeping waiting for the profiling stop, always set P_STOPPROF before dropping process lock. Clear P_STOPPROF when doing wakeup. Both issues caused thread to hang in stopprofclock() "stopprof" sleep. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-10 14:11:17 +00:00
melifaro	b7d1bcf8b2	Finish r274118#2: commit forgotten uipc_debug.c	2014-11-06 15:17:04 +00:00
bz	b9096df681	After the changes in r274118 make NOIP kernels compile by hiding an otherwise unused variable declaration behind INET6 \|\| INET. MFC after: 27 days X-MFS with: r274118	2014-11-06 12:19:39 +00:00
mjg	7e57127b46	Add sysctl kern.proc.cwd It returns only current working directory of given process which saves a lot of overhead over kern.proc.filedesc if given proc has a lot of open fds. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified) X-Additional: JuniorJobs project	2014-11-06 08:12:34 +00:00
mjg	48a19ff17a	filedesc: avoid taking fdesc_mtx when not necessary in fddrop No functional changes.	2014-11-06 07:44:10 +00:00
mjg	355e7bb005	filedesc: just free old tables without altering the list which is freed anyway No functional changes.	2014-11-06 07:37:31 +00:00
mjg	dd190ce5d4	Extend struct ucred with group table. This saves one malloc + free with typical cases and better utilizes memory. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> (slightly modified) X-Additional: JuniorJobs project	2014-11-05 02:08:37 +00:00
melifaro	c2069a39a4	Remove old hack abusing domattach from NFS code. According to IANA RPC uaddr registry, there are no AFs except IPv4 and IPv6, so it's not worth being too abstract here. Remove ne_rtable[AF_MAX+1] and use explicit per-AF radix tries. Use own initialization without relying on domattach code. While I admit that this was one of the rare places in kernel networking code which really was capable of doing multi-AF without any AF-depended code, it is not possible anymore to rely on dom* code. While here, change terrifying "Invalid radix node head, rn:" message, to different non-understandable "netcred already exists for given addr/mask", but less terrifying. Since we know that rn_addaddr() returns NULL if the same record already exists, we should provide more friendly error. MFC after: 1 month	2014-11-05 00:58:01 +00:00
des	95b02b5b83	[SA-14:25] Fix kernel stack disclosure in setlogin(2) / getlogin(2). [SA-14:26] Fix remote command execution in ftp(1). Approved by: so (des)	2014-11-04 23:29:29 +00:00
jhb	abae099c34	Add a new thread state "spinning" to schedgraph and add tracepoints at the start and stop of spinning waits in lock primitives.	2014-11-04 16:35:56 +00:00
hselasky	862145edac	Simplify logic a bit. Ensure data buffer is properly aligned, especially for platforms where unaligned access is not allowed. Make it possible to override the small buffer size. A simple continuous read string test using libusb showed a reduction in CPU usage from roughly 10% to less than 1% using a dual-core GHz CPU, when the malloc() operation was skipped for small buffers. MFC after: 2 weeks	2014-11-04 11:29:49 +00:00
dumbbell	5f06d19789	Enable vt(4) by default vt(4) is a new console driver which brings features such as: o Support for Unicode and double-width characters o Integration with the KMS kernel video drivers o Support for UEFI You may need to update your console settings in /etc/rc.conf, most probably the keymap. During boot, /etc/rc.d/syscons will indicate what you need to do. vt(4) still has issues and lacks some features compared to syscons(4). See the wiki for up-to-date information: https://wiki.freebsd.org/Newcons If you want to keep using syscons(4), you can do so by adding the following line to /boot/loader.conf: kern.vty=sc Differential Revision: https://reviews.freebsd.org/D1005 Discussed with: emaste@, nwhitehorn@, ray@ Relnotes: yes	2014-11-04 10:18:03 +00:00
kib	649fe8c57c	Clean up confusing comment. Move it to the place of code which is talked about. Explain where the mentioned trampoline located (usermode), and the fact that attempt to exit last thread is denied in kernel (by delegating the work to usermode). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-11-03 11:29:08 +00:00
kib	c852dfee5d	When other end of the pipe closed during the write, but some bytes were written, return short write instead of EPIPE. Update comment. Discussed with: bde (long time ago) MFC after: 2 weeks	2014-11-03 10:01:56 +00:00
mjg	82ce21e1bc	Provide an on-stack temporary buffer for small ioctl requests.	2014-11-03 07:46:51 +00:00
mjg	0983cfdba1	filedesc: plus sys/kdb.h include which crept in with r274007	2014-11-03 06:24:43 +00:00
mjg	04a088dde4	filedesc: plug unnecessary fdp NULL checks in fdescfreee and fdcopy Anything reaching these functions has fd table.	2014-11-03 05:12:17 +00:00
mjg	120816c07f	filedesc: create a dedicated zone for struct filedesc0 Currently sizeof(struct filedesc0) is 1096 bytes, which means allocations from malloc use 2048 bytes. There is no easy way to shrink the structure <= 1024 an it is likely to grow in the future.	2014-11-03 04:16:04 +00:00
kib	d83157092e	Followup to r273966. Fix the build with ADAPTIVE_LOCKMGRS kernel option. Note that the option is currently not used in any in-tree kernel configs, including LINTs. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-11-02 19:51:33 +00:00
mjg	d8d7f263db	filedesc: move freeing old tables to fdescfree They cannot be accessed by anyone and hold count only protects the structure from being freed.	2014-11-02 14:12:03 +00:00
mjg	31183326d5	filedesc: factor out some code out of fdescfree Previously it had a huge self-contained chunk dedicated to dealing with shared tables. No functional changes.	2014-11-02 13:43:04 +00:00
kib	cf11d25e18	Fix two issues with lockmgr(9) LK_CAN_SHARE() test, which determines whether the shared request for already shared-locked lock could be granted. Both problems result in the exclusive locker starvation. The concurrent exclusive request is indicated by either LK_EXCLUSIVE_WAITERS or LK_EXCLUSIVE_SPINNERS flags. The reverse condition, i.e. no exclusive waiters, must check that both flags are cleared. Add a flag LK_NODDLKTREAT for shared lock request to indicate that current thread guarantees that it does not own the lock in shared mode. This turns back the exclusive lock starvation avoidance code; see man page update for detailed description. Use LK_NODDLKTREAT when doing lookup(9). Reported and tested by: pho No objections from: attilio Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-11-02 13:10:31 +00:00
mjg	79f817d7d7	filedesc: tidy up fdcheckstd No functional changes.	2014-11-02 02:32:33 +00:00
mjg	22a53e3b5a	filedesc: lock filedesc lock in fdcloseexec only when needed	2014-11-02 01:13:11 +00:00
mjg	63b330d2cc	Fix up module unload for syscall_module_handler consumers. After r273707 it was registering syscalls as static. This fixes hwpmc module unload. Reported by: markj	2014-11-01 22:36:40 +00:00
dumbbell	035cb01fbb	vt(4): Adjust the cursor position after changing the window size A new terminal_set_cursor() is added: it wraps the existing teken_set_cursor() function. In vtbuf_grow(), the cursor position is adjusted at the end of the function. In vt_change_font(), we call terminal_set_cursor() just after terminal_set_winsize_blank(), while the terminal is mute. This fixes a bug where, after loading a kernel video driver which increases the terminal window size, the cursor remains at its old position, in other words, in the middle of the display content. PR: 194421 MFC after: 1 week	2014-11-01 17:05:15 +00:00
kib	888be1193f	Add type qualifier volatile to the base (userspace) address argument of fuword(9) and suword(9). This makes the functions type-compatible with volatile objects and does not require devolatile force, e.g. in kern_umtx.c. Requested by: bde Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 3 weeks	2014-10-31 17:43:21 +00:00
mjg	5b231323b2	filedesc: drop retval argument from do_dup It was almost always td_retval anyway. For the one case where it is not, preserve the old value across the call.	2014-10-31 10:35:01 +00:00
mjg	6b53d30f11	filedesc: fix missed comments about fdsetugidsafety While here just note that both fdsetugidsafety and fdcheckstd take sleepable locks.	2014-10-31 09:56:00 +00:00
mjg	efbe4d69c8	filedesc: make fdinit return with source filedesc locked and new one sized appropriately Assert FILEDESC_XLOCK_ASSERT only for already used tables in fdgrowtable. We don't have to call it with the lock held if we are just creating new filedesc. As a side note, strictly speaking processes can have fdtables with fd_lastfile = -1, but then they cannot enter fdgrowtable. Very first file descriptor they get will be 0 and the only syscall allowing to choose fd number requires an active file descriptor. Should this ever change, we can add an 'init' (or similar) parameter to fdgrowtable.	2014-10-31 09:25:28 +00:00
mjg	9772964585	filedesc: iterate over fd table only once in fdcopy While here add 'fdused_init' which does not perform unnecessary work. Drop FILEDESC_LOCK_ASSERT from fdisused and rely on callers to hold it when appropriate. This function is only used with INVARIANTS. No functional changes intended.	2014-10-31 09:19:46 +00:00
mjg	94f45340d9	filedesc: tidy up fdfree Implement fdefree_last variant and get rid of 'last' parameter. No functional changes.	2014-10-31 09:15:59 +00:00
mjg	02363563c8	filedesc: tidy up fdcopy a little bit Test for file availability by fde_file != NULL instead of fdisused, this is consistent with similar checks later. Drop badfileops check. badfileops don't have DFLAG_PASSABLE set, so it was never reached in practice. fdiused is now only used in some KASSERTS, so ifdef it under INVARIANTS. No functional changes.	2014-10-31 05:41:27 +00:00
markm	fce6747f55	This is the much-discussed major upgrade to the random(4) device, known to you all as /dev/random. This code has had an extensive rewrite and a good series of reviews, both by the author and other parties. This means a lot of code has been simplified. Pluggable structures for high-rate entropy generators are available, and it is most definitely not the case that /dev/random can be driven by only a hardware souce any more. This has been designed out of the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna) like any other entropy source. Pluggable modules may be written by third parties for additional sources. The harvesting structures and consequently the locking have been simplified. Entropy harvesting is done in a more general way (the documentation for this will follow). There is some GREAT entropy to be had in the UMA allocator, but it is disabled for now as messing with that is likely to annoy many people. The venerable (but effective) Yarrow algorithm, which is no longer supported by its authors now has an alternative, Fortuna. For now, Yarrow is retained as the default algorithm, but this may be changed using a kernel option. It is intended to make Fortuna the default algorithm for 11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2 "Cryptography Engineering" By Ferguson, Schneier and Kohno for Fortuna's gory details. Heck, read it anyway. Many thanks to Arthur Mesh who did early grunt work, and who got caught in the crossfire rather more than he deserved to. My thanks also to folks who helped me thresh this out on whiteboards and in the odd "Hallway track", or otherwise. My Nomex pants are on. Let the feedback commence! Reviewed by: trasz,des(partial),imp(partial?),rwatson(partial?) Approved by: so(des)	2014-10-30 21:21:53 +00:00
mjg	cda1078a58	filedesc: make sure to force table reload in fget_unlocked when count == 0 This is a fixup to r273843.	2014-10-30 07:21:38 +00:00
mjg	569cf8ac16	filedesc: microoptimize fget_unlocked by retrying obtaining reference count without restarting whole lookup Restart is only needed when fp was closed by current process, which is a much rarer event than ref/deref by some other thread.	2014-10-30 05:21:12 +00:00
mjg	5bb6a8bca1	filedesc: get rid of atomic_load_acq_int from fget_unlocked A read barrier was necessary because fd table pointer and table size were updated separately, opening a window where fget_unlocked could read new size and old pointer. This patch puts both these fields into one dedicated structure, pointer to which is later atomically updated. As such, fget_unlocked only needs data a dependency barrier which is a noop on all supported architectures. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-30 05:10:33 +00:00
jhb	d47eb7d2d4	Rework virtual machine hypervisor detection. - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel	2014-10-28 19:17:44 +00:00
kib	95304fc8a8	Convert kern_umtx.c to use fueword() and casueword(). Also fix some mishandling of suword(9) errors as errno, which resulted in spurious ERESTART. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:30:33 +00:00
kib	ad7bf17db7	Replace some calls to fuword() by fueword() with proper error checking. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 3 weeks	2014-10-28 15:28:20 +00:00
kib	29a659ef8e	Add fueword(9) and casueword(9) functions. They are like fuword(9) and casuword(9), but do not mix value read and indication of fault. I know (or remember) enough assembly to handle x86 and powerpc. For arm, mips and sparc64, implement fueword() and casueword() as wrappers around fuword() and casuword(), which means that the functions cannot distinguish between -1 and fault. On architectures where fueword() and casueword() are native, implement fuword() and casuword() using fueword() and casuword(), to reduce assembly code duplication. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks (ia64 needs treating)	2014-10-28 15:22:13 +00:00
hselasky	a0b8ff0c54	The SYSCTL data pointers can come from userspace and must not be directly accessed. Although this will work on some platforms, it can throw an exception if the pointer is invalid and then panic the kernel. Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-28 12:00:39 +00:00
mjg	bf3b8650d6	Simplify sys_getloginclass. Just use current thread credentials as they have the same accuracy as the ones obtained from proc..	2014-10-28 04:59:33 +00:00
mjg	37841a11a2	Change loginclass mutex to an rwlock. While here reduce nesting in loginclass_free. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> X-Additional: JuniorJobs project MFC after: 2 weeks	2014-10-28 04:33:57 +00:00
mjg	8c32132302	Tidy up functions related to uidinfo management. - reference found uidinfo in uilookup - reduce nesting by handling shorter cases first	2014-10-27 20:20:05 +00:00
mjg	26906a3e9c	De-k&r-ify function definitions in kern/kern_resource.c No functional changes.	2014-10-27 20:18:30 +00:00
mjg	a9faac8f4b	Avoid dynamic syscall overhead for statically compiled modules. The kernel tracks syscall users so that modules can safely unregister them. But if the module is not unloadable or was compiled into the kernel, there is no need to do this. Achieve this by adding SY_THR_STATIC_KLD macro which expands to SY_THR_STATIC during kernel build and 0 otherwise. Reviewed by: kib (previous version) MFC after: 2 weeks	2014-10-26 19:42:44 +00:00
mjg	b6584b3e11	Fix up an assertion in kern_setgroups, it should compare with ngroups_max + 1 Bug introdued in r273685. Noted by: Tiwei Bie <btw mail.ustc.edu.cn>	2014-10-26 14:25:42 +00:00
mjg	f0db1caf67	Tidy up sys_setgroups and kern_setgroups. - 'groups' initialization to NULL is always ovewrwriten before use, so plug it - get rid of 'goto out' - kern_setgroups's callers already validate ngrp, so only assert the condition - ngrp is an u_int, so 'ngrp < 1' is more readable as 'ngrp == 0' No functional changes.	2014-10-26 06:04:09 +00:00
mjg	db02fb1250	Use a temporary buffer in sys_setgroups for requests with <= XU_NGROUPS groups. Submitted by: Tiwei Bie <btw mail.ustc.edu.cn> X-Additional: JuniorJobs project MFC after: 2 weeks	2014-10-26 05:39:42 +00:00
mjg	5b3100bae0	Now that sysctl_root is only called with sysctl lock in shared mode, update its assertion to require that. Update comment missed in r273400: sysctl_xlock/unlock -> sysctl_xlock/xunlock Noted by: jhb	2014-10-26 01:47:55 +00:00
jhb	f014f5242a	Use correct type in __DEVOLATILE().	2014-10-25 20:42:47 +00:00
mav	f21b293af4	Revert somewhat hackish geom_disk optimization, committed as part of r256880, and the following r273143 commit, supposed to workaround introduced issue by quite innocent-looking change. While there is no clear understanding why, but r273143 is accused in data corruption in some environments with high I/O load. I personally don't see any problem in that commit, and possibly it is just a trigger to some other bug somewhere, but better safe then sorry for now. Requested by: scottl@ MFC after: 3 days	2014-10-25 15:16:19 +00:00
mjg	909f8c4c4a	rlimit: plug duplicate assertion counter sanity is already checked by refcount_release.	2014-10-25 05:56:21 +00:00
delphij	8cf66ad851	Fix build.	2014-10-25 00:16:36 +00:00
jhb	5dd26e948d	The current POSIX semaphore implementation stores the _has_waiters flag in a separate word from the _count. This does not permit both items to be updated atomically in a portable manner. As a result, sem_post() must always perform a system call to safely clear _has_waiters. This change removes the _has_waiters field and instead uses the high bit of _count as the _has_waiters flag. A new umtx object type (_usem2) and two new umtx operations are added (SEM_WAIT2 and SEM_WAKE2) to implement these semantics. The older operations are still supported under the COMPAT_FREEBSD9/10 options. The POSIX semaphore API in libc has been updated to use the new implementation. Note that the new implementation is not compatible with the previous implementation. However, this only affects static binaries (which cannot be helped by symbol versioning). Binaries using a dynamic libc will continue to work fine. SEM_MAGIC has been bumped so that mismatched binaries will error rather than corrupting a shared semaphore. In addition, a padding field has been added to sem_t so that it remains the same size. Differential Revision: https://reviews.freebsd.org/D961 Reported by: adrian Reviewed by: kib, jilles (earlier version) Sponsored by: Norse	2014-10-24 20:02:44 +00:00

1 2 3 4 5 ...

14078 Commits