freebsd-skq

Author	SHA1	Message	Date
Justin T. Gibbs	5405e7e2ee	Provide high precision conversion from ns,us,ms -> sbintime in kevent In timer2sbintime(), calculate the second and fractional second portions of the sbintime separately. When calculating the fractional second portion, use a 64bit multiply to prevent excess truncation. This avoids the ~7% error in the original conversion for ns, and smaller errors of the same type for us and ms. PR: 198139 Reviewed by: jhb MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D5397	2016-03-12 23:02:53 +00:00
Mark Johnston	88c2beac9c	Ensure that we test the event condition when a disabled kevent is enabled. r274560 modified kqueue_register() to only test the event condition if the corresponding knote is not disabled. However, this check takes place before the EV_ENABLE flag is used to clear the KN_DISABLED flag on the knote, so enabling a previously-disabled kevent would not result in a notification for a triggered event. This change fixes the problem by testing for EV_ENABLE before possibly checking the event condition. This change also updates a kqueue regression test to exercise this case. PR: 206368 Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5307	2016-02-19 01:49:33 +00:00
Mark Johnston	fe169828c3	Return an error if both EV_ENABLE and EV_DISABLE are specified for a kevent. Currently, this combination results in EV_DISABLE being ignored. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D5307	2016-02-19 01:35:01 +00:00
Eric van Gyzen	0e3d6ed44e	kqueue EVFILT_PROC: avoid collision between NOTE_CHILD and NOTE_EXIT NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT. Do not let the two events be combined, since one would overwrite the other's data. PR: 180385 Submitted by: David A. Bright <david_a_bright@dell.com> Reviewed by: jhb MFC after: 1 month Sponsored by: Dell Inc. Differential Revision: https://reviews.freebsd.org/D4900	2016-01-28 20:24:15 +00:00
Mateusz Guzik	3c44a3495f	kqueue: simplify kern_kqueue by not refing/unrefing creds too early No functional changes.	2015-09-23 12:45:08 +00:00
Konstantin Belousov	6ae26d06dc	Exit notification for EVFILT_PROC removes knote from the knlist. In particular, this invalidates the knote kn_link linkage, making the SLIST_FOREACH() loop accessing undefined values (e.g. trashed by QUEUE_MACRO_DEBUG). If the knote is freed by other thread when kq lock is released or when influx is cleared, e.g. by knote_scan() for kqueue owning the knote, the iteration step would access freed memory. Use SLIST_FOREACH_SAFE() to fix iteration. Diagnosed by: avg Tested by: avg, lstewart, pawel Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-09-01 14:05:29 +00:00
Konstantin Belousov	78b9afe121	Clean up the kqueue use of the uma KPI. Explain why it is fine to not check for M_NOWAIT failures in kqueue_register(). Remove unneeded check for NULL result from waitable allocation in kqueue_scan(). uma_free(9) handles NULL argument correctly, remove checks for NULL. Remove useless cast and adjust style in knote_alloc(). Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-09-01 13:21:32 +00:00
Ed Schouten	880e2c6c52	Perform cleanups in response to D3307. - Document the kern_kevent_anonymous() function. - Add assertions to ensure that we don't silently leave the kqueue linked from a file descriptor table. Reviewed by: jmg Differential Revision: https://reviews.freebsd.org/D3364	2015-08-12 17:46:26 +00:00
Ed Schouten	e26f6b5f6b	Add support for anonymous kqueues. CloudABI's polling system calls merge the concept of one-shot polling (poll, select) and stateful polling (kqueue). They share the same data structures. Extend FreeBSD's kqueue to provide support for waiting for events on an anonymous kqueue. Unlike stateful polling, there is no need to support timeouts, as an additional timer event could be used instead. Furthermore, it makes no sense to use a different number of input and output kevents. Merge this into a single argument. Obtained from: https://github.com/NuxiNL/freebsd Differential Revision: https://reviews.freebsd.org/D3307	2015-08-11 13:47:23 +00:00
Ed Schouten	a2034cc98a	Allow the creation of kqueues with a restricted set of Capsicum rights. On CloudABI we want to create file descriptors with just the minimal set of Capsicum rights in place. The reason for this is that it makes it easier to obtain uniform behaviour across different operating systems. By explicitly whitelisting the operations, we can return consistent error codes, but also prevent applications from depending on OS-specific behaviour. Extend kern_kqueue() to take an additional struct filecaps that is passed on to falloc_caps(). Update the existing consumers to pass in NULL. Differential Revision: https://reviews.freebsd.org/D3259	2015-08-05 07:36:50 +00:00
Konstantin Belousov	b4490c6e93	The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains complete status and copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation	2015-07-18 09:02:50 +00:00
Mateusz Guzik	f6f6d24062	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thread's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.	2015-06-10 10:48:12 +00:00
Dmitry Chagin	7236f2c220	For future use in the Linuxulator: 1. Add a kern_kqueue() counterpart for kqueue() with flags parameter. 2. Be a bit secure. To avoid a double fp lookup add a kern_kevent_fp() counterpart for kern_kevent() with file pointer parameter instead of file descriptor and pass the buck to it. Suggested by: mjg [2] Differential Revision: https://reviews.freebsd.org/D1091 Reviewed by: trasz	2015-05-24 16:36:29 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
John-Mark Gurney	2c30bc1fcf	prevent doing filter ops locking for statically compiled filter ops... This significantly reduces lock contention when adding/removing knotes on busy multi-kq system... Next step is to cache these references per kq.. i.e. kq refs it once and keeps a local ref count so that the same refs don't get accessed by many cpus... only allocate a knote when we might use it... Add a new flag, _FORCEONESHOT.. This allows a thread to force the delivery of another event in a safe manner, say waking up an idle http connection to force it to be reaped... If we are _DISABLE'ing a knote, don't bother to call f_event on it, it's disabled, so won't be delivered anyways.. Tested by: adrian	2014-11-16 01:18:41 +00:00
Ian Lepore	41e8f7efbe	Make kevent(2) periodic timer events more reliably periodic. The event callout is now scheduled using the C_ABSOLUTE flag, and the absolute time of each event is calculated as the time the previous event was scheduled for plus the interval. This ensures that latency in processing a given event doesn't perturb the arrival time of any subsequent events. Reviewed by: jhb	2014-10-04 15:59:15 +00:00
John Baldwin	9696feebe2	Add a new fo_fill_kinfo fileops method to add type-specific information to struct kinfo_file. - Move the various fill_*_info() methods out of kern_descrip.c and into the various file type implementations. - Rework the support for kinfo_ofile to generate a suitable kinfo_file object for each file and then convert that to a kinfo_ofile structure rather than keeping a second, different set of code that directly manipulates type-specific file information. - Remove the shm_path() and ksem_info() layering violations. Differential Revision: https://reviews.freebsd.org/D775 Reviewed by: kib, glebius (earlier version)	2014-09-22 16:20:47 +00:00
John Baldwin	2d69d0dcc2	Fix various issues with invalid file operations: - Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(), and invfo_kqfilter() for use by file types that do not support the respective operations. Home-grown versions of invfo_poll() were universally broken (they returned an errno value, invfo_poll() uses poll_no_poll() to return an appropriate event mask). Home-grown ioctl routines also tended to return an incorrect errno (invfo_ioctl returns ENOTTY). - Use the invfo_() functions instead of local versions for unsupported file operations. - Reorder fileops members to match the order in the structure definition to make it easier to spot missing members. - Add several missing methods to linuxfileops used by the OFED shim layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat(). Most of these used invfo_(), but a dummy fo_stat() implementation was added.	2014-09-12 21:29:10 +00:00
Baptiste Daroussin	42e62eca52	Extend kqueue's EVFILT_TIMER by adding precision unit flags support Define the precision macros as bits sets to conform with XNU equivalent. Test fflags passed for EVFILT_TIMER and return EINVAL in case an invalid flag is passed. Phabric: https://phabric.freebsd.org/D421 Reviewed by: kib	2014-07-18 14:27:04 +00:00
Davide Italiano	4bc38a5ab0	Hide internal details of sbintime_t implementation wrapping INT64_MAX into SBT_MAX, to make it more robust in case internal type representation will change in the future. All the consumers were migrated to SBT_MAX and every new consumer (if any) should from now use this interface. Requested by: bapt, jmg, Ryan Lortie (implicitly) Reviewed by: mav, bde	2014-04-12 23:29:29 +00:00
Ed Schouten	38219d6acd	Implement kqueue(2) for procdesc(4). kqueue(2) already supports EVFILT_PROC. Add an EVFILT_PROCDESC that behaves the same, but operates on a procdesc(4) instead. Only implement NOTE_EXIT for now. The nice thing about NOTE_EXIT is that it also returns the exit status of the process, meaning that we can now obtain this value, even if pdwait4(2) is still unimplemented. Notes: - Simply reuse EVFILT_NETDEV for EVFILT_PROCDESC. As both of these will be used on totally different descriptor types, this should not clash. - Let procdesc_kqops_event() reuse the same structure as filt_proc(). The only difference is that procdesc_kqops_event() should also be able to deal with the case where the process was already terminated after registration. Simply test this when hint == 0. - Fix some style(9) issues in filt_proc() to keep it consistent with the newly added procdesc_kqops_event(). - Save the exit status of the process in pd->pd_xstat, as we cannot pick up the proctree_lock from within procdesc_kqops_event(). Discussed on: arch@ Reviewed by: kib@	2014-04-07 18:10:49 +00:00
Konstantin Belousov	1a5edcf8ea	When KN_INFLUX is set on the knote due to kqueue_register() or kqueue_scan() unlocking the kqueue to call f_event, knote() or knote_fork() should not skip the knote. The knote is not going to disappear during the influx time, and the mutual exclusion between scan and knote() is ensured by both code paths taking knlist lock. The race appears since knlist lock is before kq lock, so KN_INFLUX must be set, kq lock must be dropped and only then knlist lock can be taken. The window between kq unlock and knlist lock causes lost events. Add a flag KN_SCAN to indicate that KN_INFLUX is set in a manner safe for the knote(), and check for it to ignore KN_INFLUX in the knote*() as needed. Also, in knote(), remove the lockless check for the KN_INFLUX flag, which could also result in the lost notification. Reported and tested by: Kohji Okuno <okuno.kohji@jp.panasonic.com> Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-04-05 14:09:16 +00:00
Robert Watson	4a14441044	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks	2014-03-16 10:55:57 +00:00
Adrian Chadd	fda21f4d2a	Add in a default initialiser for the EVOPS_SENDFILE kqueue filterops. Sponsored by: Netflix, Inc.	2014-01-17 05:15:44 +00:00
Adrian Chadd	faa9b054a0	Add a compile-time control over the size of KN_HASHSIZE. This is needed for applications that use a lot of non-filedescriptor knotes. MFC after: 1 week Sponsored by: Netflix, Inc.	2014-01-07 01:17:27 +00:00
Stefan Eßer	774e8d906f	Fix compilation on 32 bit architectures and use INT64_MAX instead of LONG_MAX for the upper bound check.	2013-12-19 21:35:33 +00:00
Stefan Eßer	53d5cc255d	Fix overflow for timeout values of more than 68 years, which is the maximum covered by sbintime (LONG_MAX seconds). Some programs use timeout values in excess of 1000 years. The conversion to sbintime caused wrap-around on overflow, which resulted in short or negative timeout values. This caused long delays on sockets opened by affected programs (e.g. OpenSSH). Kernels compiled without -fno-strict-overflow were not affected, apparently because the compiler tested the sign of the timeout value before performing the multiplication that led to overflow. When the -fno-strict-overflow option was added to CFLAGS, this optimization was disabled and the test was performed on the result of the multiplication. Negative products were caught and resulted in EINVAL being returned, but wrap-around to positive values just shortened the timeout value to the residue of the result that could be represented by sbintime. The fix is to cap the timeout values at the maximum that can be represented by sbintime, which is 2^31 - 1 seconds or more than 68 years. After this change, the kernel can be compiled with -fno-strict-overflow with no ill effects. MFC after: 3 days	2013-12-19 09:01:46 +00:00
Pawel Jakub Dawidek	ed5848c835	Replace CAP_POLL_EVENT and CAP_POST_EVENT capability rights (which I had a very hard time to fully understand) with much more intuitive rights: CAP_EVENT - when set on descriptor, the descriptor can be monitored with syscalls like select(2), poll(2), kevent(2). CAP_KQUEUE_EVENT - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue with the eventlist argument set to non-NULL value; in other words the given kqueue descriptor can be used to monitor other descriptors. CAP_KQUEUE_CHANGE - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue with the changelist argument set to non-NULL value; in other words it allows modifying events monitored with the given kqueue descriptor. Add alias CAP_KQUEUE, which allows for both CAP_KQUEUE_EVENT and CAP_KQUEUE_CHANGE. Add backward compatibility define CAP_POLL_EVENT which is equal to CAP_EVENT. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2013-11-15 19:55:35 +00:00
Jilles Tjoelker	1947c8a6d1	kqueue: Change error for kqueues rlimit from EMFILE to ENOMEM and document this error condition in the kqueue(2) manual page. Discussed with: kib	2013-11-03 23:06:24 +00:00
Konstantin Belousov	9110db818a	Add a resource limit for the total number of kqueues available to the user. Kqueue now saves the ucred of the allocating thread, to correctly decrement the counter on close. Under some specific and not real-world use scenario for kqueue, it is possible for the kqueues to consume memory proportional to the square of the number of the filedescriptors available to the process. Limit allows administrator to prevent the abuse. This is kernel-mode side of the change, with the user-mode enabling commit following. Reported and tested by: pho Discussed with: jmg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2013-10-21 16:44:53 +00:00
Konstantin Belousov	9d2abcd01a	Do not allow negative timeouts for kqueue timers, check for the negative timeout both before and after the conversion to sbintime_t. For periodic kqueue timer, convert zero timeout into 1ms, to avoid interrupt storm on fast event timers. Reported and tested by: pho Discussed with: mav Reviewed by: davide Sponsored by: The FreeBSD Foundation Approved by: re (marius)	2013-09-26 13:17:31 +00:00
Konstantin Belousov	19f6a6a1ca	Pre-acquire the filedesc sx when a possibility exists that the later code could need to remove a kqueue from the filedesc list. Global lock is already locked, which causes sleepable after non-sleepable lock acquisition. Reported and tested by: pho Reviewed by: jmg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2013-09-22 19:54:47 +00:00
Roman Divacky	b12698e1a1	Revert r255672, it has some serious flaws, leaking file references etc. Approved by: re (delphij)	2013-09-18 18:48:33 +00:00
Roman Divacky	253c75c0de	Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue to implement epoll subset of functionality. The kqueue user data are 32bit on i386 which is not enough for epoll user data so this patch overrides kqueue fileops to maintain enough space in struct file. Initial patch developed by me in 2007 and then extended and finished by Yuri Victorovich. Approved by: re (delphij) Sponsored by: Google Summer of Code Submitted by: Yuri Victorovich <yuri at rawbw dot com> Tested by: Yuri Victorovich <yuri at rawbw dot com>	2013-09-18 17:56:04 +00:00
Konstantin Belousov	e8de242d3a	Use TAILQ instead of STAILQ for kqueue filedescriptors to ensure constant time removal on kqueue close. Reported and tested by: pho Reviewed by: jmg Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (delphij)	2013-09-13 19:50:50 +00:00
Pawel Jakub Dawidek	7008be5bd7	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. 
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...); bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL); Providing several rights that belong to the same array element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. 
This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation	2013-09-05 00:09:56 +00:00
John-Mark Gurney	57150ff69b	fix up some comments and a white space issue... MFC after: 3 days	2013-08-26 18:53:19 +00:00
Gleb Smirnoff	ca04d21d5f	Make sendfile() a method in the struct fileops. Currently only vnode backed file descriptors have this method implemented. Reviewed by: kib Sponsored by: Nginx, Inc. Sponsored by: Netflix	2013-08-15 07:54:31 +00:00
John Baldwin	e05bf4cf95	Some small cleanups to the fixes in r180340: - Set NOTE_TRACKERR before running filt_proc(). If the knote did not have NOTE_FORK set in fflags when registered, then the TRACKERR event could miss being posted. - Don't pass the pid in to filt_proc() for NOTE_FORK events. The special handling for pids is done knote_fork() directly and no longer in filt_proc(). MFC after: 2 weeks	2013-08-13 18:45:58 +00:00
John Baldwin	5b596f0f5f	Don't emit a spurious EVFILT_PROC event with no fflags set on process exit if NOTE_EXIT is not being monitored. The rationale is that a listener should only get an event for exit() if they registered interest via NOTE_EXIT. This matches the behavior on OS X. - Don't save the exit status on process exit unless NOTE_EXIT is being monitored. - Add an internal EV_DROP flag that requests kqueue_scan() to free the knote without signalling it to userland and use this when a process exits but the fflags in the knote is zero. Reviewed by: jmg MFC after: 1 month	2013-08-07 19:56:35 +00:00
Ed Schouten	2381f6ef8c	Change callout use counter to use C11 atomics. In order to get some coverage of C11 atomics in kernelspace, switch at least one piece of code in kernelspace to use C11 atomics instead of <machine/atomic.h>. While there, slightly improve the code by adding an assertion to prevent the use count from going negative.	2013-06-16 09:30:35 +00:00
Alexander Motin	21a37a7196	Rework overflow checks of r247898 to not let too "intelligent" compiler to optimize it out. Submitted by: bde	2013-03-09 09:07:13 +00:00
Alexander Motin	836972b877	Fix off-by-one error in nanoseconds validation. Submitted by: bde	2013-03-07 16:50:07 +00:00
Alexander Motin	980c545d76	Fix time math overflows and improve zero intervals handling in poll(), select(), nanosleep() and kevent() functions after calloutng changes. Reported by: bde	2013-03-06 19:37:38 +00:00
Davide Italiano	40e794ab19	MFcalloutng: - Rewrite kevent() timeout implementation to allow sub-tick precision. - Make the interval timings for EVFILT_TIMER more accurate. This also removes a hack introduced in r238424. Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo, marius, ian, markj, Fabian Keil	2013-03-04 16:55:16 +00:00
John Baldwin	2919668490	Make the interval timings for EVFILT_TIMER more accurate. tvtohz() always adds an extra tick to account for the current partial clock tick. However, that is not appropriate for a repeating timer when the exact tvtohz() value should be used for subsequent intervals. Fix repeating callouts for EVFILT_TIMER by subtracting 1 tick from the tvtohz() result similar to the fix used in realitexpire() for interval timers. While here, update a few comments to note that if the EVFILT_TIMER code were to move out of kern_event.c, it should move to kern_time.c (where the interval timer code it mimics lives) rather than kern_timeout.c. MFC after: 1 month	2012-07-13 13:24:33 +00:00
Pawel Jakub Dawidek	a79de683f5	Update comment. MFC after: 1 month	2012-06-14 17:32:58 +00:00
Alexander V. Chernikov	b25711e6b0	- Add knlist_init_rw_reader() function to kqueue(9). Function acquired reader lock if needed. Assert check for reader or writer lock (RA_LOCKED / RA_UNLOCKED) - While here, add knlist_init_mtx.9 to MLINKS and fix some style(9) issues Reviewed by: glebius Approved by: ae(mentor) MFC after: 2 weeks	2012-03-26 09:34:17 +00:00
Kip Macy	8451d0dd78	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
Attilio Rao	6aba400a70	Fix a deficiency in the selinfo interface: If a selinfo object is recorded (via selrecord()) and then it is quickly destroyed, with the waiters missing the opportunity to awake, at the next iteration they will find the selinfo object destroyed, causing a PF#. That happens because the selinfo interface has no way to drain the waiters before destroying the registered selinfo object. Also this race is quite rare to get in practice, because it would require a selrecord(), a poll request by another thread and a quick destruction of the selrecord()'ed selinfo object. Fix this by adding the seldrain() routine which should be called before destroying the selinfo objects (in order to avoid such case), and fix the present cases where it might have already been called. Sometimes, the context is safe enough to prevent this type of race, like it happens in device drivers which install selinfo objects on poll callbacks. There, the destruction of the selinfo object happens at driver detach time, when all the filedescriptors should be already closed, thus there cannot be a race. For this case, mfi(4) device driver can be set as an example, as it implements fully correct logic for preventing this from happening. Sponsored by: Sandvine Incorporated Reported by: rstone Tested by: pluknet Reviewed by: jhb, kib Approved by: re (bz) MFC after: 3 weeks	2011-08-25 15:51:54 +00:00
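
As a rough illustration of the timer2sbintime() entry dated 2016-03-12 in the log above, the C sketch below contrasts a single truncated scale factor with converting the whole-second and fractional-second parts separately. It assumes sbintime_t is a signed 64-bit value in 32.32 fixed point; every `sketch_` name is hypothetical, and this is not the kernel's code, only one way an error of about 7% can arise and be avoided.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for sbintime_t: signed 64-bit, 32.32 fixed point (assumption). */
typedef int64_t sketch_sbt_t;

#define SKETCH_SBT_1S   ((sketch_sbt_t)1 << 32)
#define SKETCH_NS_PER_S INT64_C(1000000000)

/*
 * Naive conversion: the per-nanosecond factor 2^32 / 10^9 is about 4.29,
 * but integer division truncates it to 4, so every result is ~7% too small.
 */
static inline sketch_sbt_t
sketch_ns2sbt_naive(int64_t ns)
{
	return (ns * (SKETCH_SBT_1S / SKETCH_NS_PER_S));
}

/*
 * Split conversion: whole seconds are scaled exactly, and the sub-second
 * remainder is multiplied in 64 bits before dividing, so almost no
 * precision is lost. The remainder is below 10^9 < 2^30, so the product
 * stays below 2^62 and cannot overflow.
 */
static inline sketch_sbt_t
sketch_ns2sbt_split(int64_t ns)
{
	int64_t secs, frac;

	assert(ns >= 0);
	secs = ns / SKETCH_NS_PER_S;
	frac = ns % SKETCH_NS_PER_S;
	assert(secs <= INT32_MAX);	/* seconds must fit the integer half */
	return (secs * SKETCH_SBT_1S + (frac * SKETCH_SBT_1S) / SKETCH_NS_PER_S);
}
```

For one second of nanoseconds the naive version yields 4000000000 instead of 2^32 = 4294967296, which is the roughly 7% shortfall the commit message mentions.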

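The cap_rights_t entry dated 2013-09-05 describes rights that carry a one-hot array index in the five bits below the top two bits, which is what lets the API detect rights from different array elements being OR-ed together. A minimal C sketch of that check follows. It assumes index 0 maps to bit 57 (the lowest of the five index bits); the `SKETCH_`/`sketch_` names are hypothetical stand-ins, not the Capsicum API, and only the bit values for LOOKUP, FCHMOD and PDKILL are taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits 57..61 hold a one-hot array index. */
#define SKETCH_IDX_SHIFT 57
#define SKETCH_IDX_MASK  0x1fULL
#define SKETCH_RIGHT(idx, bit) \
	((1ULL << (SKETCH_IDX_SHIFT + (idx))) | (bit))

#define SKETCH_LOOKUP SKETCH_RIGHT(0, 0x0000000000000400ULL)
#define SKETCH_FCHMOD SKETCH_RIGHT(0, 0x0000000000002000ULL)
#define SKETCH_PDKILL SKETCH_RIGHT(1, 0x0000000000000800ULL)

/*
 * Return the array index encoded in a right, or -1 when zero or more
 * than one index bit is set (rights from different elements were mixed).
 */
static inline int
sketch_right_index(uint64_t right)
{
	uint64_t idx_bits = (right >> SKETCH_IDX_SHIFT) & SKETCH_IDX_MASK;
	int idx = 0;

	if (idx_bits == 0 || (idx_bits & (idx_bits - 1)) != 0)
		return (-1);
	while ((idx_bits & 1) == 0) {
		idx_bits >>= 1;
		idx++;
	}
	return (idx);
}

/* Set a right in a two-element array, asserting that it names one element. */
static inline void
sketch_rights_set(uint64_t elems[2], uint64_t right)
{
	int idx = sketch_right_index(right);

	assert(idx >= 0 && idx < 2);
	elems[idx] |= right;
}
```

With these definitions, `SKETCH_LOOKUP | SKETCH_FCHMOD` still resolves to element 0, while `SKETCH_LOOKUP | SKETCH_PDKILL` sets two index bits and is rejected, mirroring the illegal example in the commit message.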

193 Commits