Rename to match the naming of syscalls and allow 32 to be appended
without making an ugly name like kevent_freebsd1132.
While here, make the kevent changelist argument const.
Reviewed by: kib
NOTE_ABSTIME may also have a zero timeout, which indicates that we
should still fire immediately as an absolute time in the past. A test
has been added for this one as well.
Fixes: 9c999a259f00 ("kqueue: don't arbitrarily restrict long-past...")
Point hat: kevans
Reported by: syzbot+1c8d1154f560b3930042@syzkaller.appspotmail.com
NOTE_ABSTIME values are converted to values relative to boottime in
filt_timervalidate(), and negative values are currently rejected. We
don't reject times in the past in general, so clamp this up to 0 as
needed such that the timer fires immediately rather than imposing what
looks like an arbitrary restriction.
Another possible scenario is that the system clock had to be adjusted
by ~minutes or ~hours and we have less than that in terms of uptime,
making a reasonable short-timeout suddenly invalid. Firing it is still
a valid choice in this scenario so that applications can at least
expect a consistent behavior.
Reviewed by: kib, markj
Discussed with: allanjude
Differential Revision: https://reviews.freebsd.org/D32230
When this flag is set, operations that update an existing kevent will
not change the udata field. This can be used to NOTE_TRIGGER or
EV_{EN,DIS}ABLE events without overwriting the stashed pointer.
Reviewed by: Domagoj Stolfa <domagoj.stolfa@gmail.com>
Obtained from: CheriBSD
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D30286
Otherwise return from the syscall and next syscall, which could be
kevent(2) on the kqueue that should be notified, races with the kqueue
taskqueue thread, and potentially misses the wakeup. This is reliably
visible when kevent(2) only peeks into events using zeroed timeout.
PR: 258310
Reported by: arichardson, Jan Kokemüller <jan.kokemueller@gmail.com>
Reviewed by: arichardson, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31858
Previously, a negative change list length would be treated the same as
an empty change list. A negative event list length would result in
bogus copyouts. Make kevent(2) return EINVAL for both cases so that
application bugs are more easily found, and to be more robust against
future changes to kevent internals.
Reviewed by: imp, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30480
There are some scenarios where a timer event may be detached when it is
on the process' kqueue timer stop queue. If kqtimer_proc_continue() is
called after that point, it will iterate over the queue and access freed
timer structures.
It is also possible, at least in a multithreaded program, for a stopped
timer event to be scheduled without removing it from the process' stop
queue. Ensure that we do not doubly enqueue the event structure in this
case.
Reported by: syzbot+cea0931bb4e34cd728bd@syzkaller.appspotmail.com
Reported by: syzbot+9e1a2f3734652015998c@syzkaller.appspotmail.com
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30251
User-supplied data might make this loop too time-consuming. Divide
directly, and handle both the possibility that we were woken up earlier,
and arithmetic overflows/underflows from the calculation.
Reported and tested by: pho (previous version)
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30069
We prefer 'while (0)' to 'while(0)' according to grep and stlye(9)'s
space after keyword rule. Remove a few stragglers of the latter.
Many of these usages were inconsistent within the file.
MFC After: 3 days
Sponsored by: Netflix
Found by: syzkaller
Reported and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29746
This way, even if the process specified very tight reschedule
intervals, it should be stoppable/killable.
Reported and reviewed by: markj
Tested by: markj, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29106
When using NOTE_NSECONDS in the kevent(2) API, US_TO_SBT should be
used instead of NS_TO_SBT, otherwise the timeout results are
misleading.
PR: 252539
Reviewed by: kevans, kib
Approved by: kevans
MFC after: 3 weeks
This unbreaks the i386 kqueue timer tests after a recent change switched
NOTE_ABSTIME over to using microseconds. Notably, the data argument (which
holds useconds) is an int64_t, but we were passing it to timer2sbintime
which takes an intptr_t. Perhaps in a previous incarnation, intptr_t would
have made sense, but now it just leads to the timestamp getting truncated
and subsequently rejected when it no longer fits in an intptr_t.
PR: 245768
Reported by: lwhsu / CI
MFC after: 1 week
kqueue would always relock immediately afterwards.
While here drop the NULL check for list itself. The list is
always allocated.
Sponsored by: The FreeBSD Foundation
Some kevent functions have a boolean "waitok" parameter for use when
calling malloc(9). Replace them with the corresponding malloc() flags:
the desired behaviour is known at compile-time, so this eliminates a
couple of conditional branches, and makes the code easier to read.
No functional change intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18318
The kernel may register for events on behalf of a userspace process,
in which case it must be careful to zero the kevent struct that will be
copied out to userspace.
Reviewed by: kib
MFC after: 3 days
Security: kernel stack memory disclosure
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18317
KQ_CLOSING is set before draining the knotes associated with a kqueue,
so we must ensure that new knotes are not added after that point. In
particular, some kernel facilities may register for events on behalf
of a userspace process and race with a close of the kqueue.
PR: 228858
Reviewed by: kib
Tested by: pho
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18316
Otherwise there is a window, before iteration is resumed, during which
the knote may be freed. The in-flux state ensures that the knote will
not be removed from the knlist while locks are dropped.
PR: 228858
Reviewed by: kib
Tested by: pho
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18316
kn_status is protected by the kqueue's lock, but we were updating it
without the kqueue lock held. For EVFILT_TIMER knotes, there is no
knlist lock, so the knote activation could occur during the kn_status
update and result in KN_QUEUED being lost, in which case we'd enqueue
an already-enqueued knote, corrupting the queue.
Fix the problem by setting or clearing KN_DISABLED before dropping the
kqueue lock to call into the filter. KN_DISABLED is used only by the
core kevent code, so there is no side effect from setting it earlier.
Reported and tested by: Sylvain GALLIANO <sg@efficientip.com>
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18060
It is a write-only flag whose last use was removed in r302235.
No functional change intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18059
If a timer is updated (re-added) with a different time period
(specified in the .data field of the kevent), the new time period has
no effect; the timer will not expire until the original time has
elapsed. This violates the documented behavior as the kqueue(2) man
page says (in part) "Re-adding an existing event will modify the
parameters of the original event, and not result in a duplicate
entry."
This modification, adapted from a patch submitted by cem@ to PR214987,
fixes the kqueue system to allow updating a timer entry. The
kevent timer behavior is changed to:
* When a timer is re-added, update the timer parameters to and
re-start the timer using the new parameters.
* Allow updating both active and already expired timers.
* When the timer has already expired, dequeue any undelivered events
and clear the count of expirations.
All of these changes address the original PR and also bring the
FreeBSD and macOS kevent timer behaviors into agreement.
A few other changes were made along the way:
* Update the kqueue(2) man page to reflect the new timer behavior.
* Fix man page style issues in kqueue(2) diagnosed by igor.
* Update the timer libkqueue system test to test for the updated
timer behavior.
* Fix the (test) libkqueue common.h file so that it includes
config.h which defines various HAVE_* feature defines, before the
#if tests for such variables in common.h. This enables the use of
the actual err(3) family of functions.
* Fix the usages of the err(3) functions in the tests for incorrect
type of variables. Those were formerly undiagnosed due to the
disablement of the err(3) functions (see previous bullet point).
PR: 214987
Reported by: Brian Wellington <bwelling@xbill.org>
Reviewed by: kib
MFC after: 1 week
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D15778
- Add macros to allow preinitialization of cap_rights_t.
- Convert most commonly used code paths to use preinitialized cap_rights_t.
A 3.6% speedup in fstat was measured with this change.
Reported by: mjg
Reviewed by: oshogbo
Approved by: sbruno
MFC after: 1 month
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
structures.
The structure name in the record payload is preceded by a size_t
containing the size of the individual structures. Use this to
replace the previous code that dumped the kevent arrays dumped for
kevent(). kdump is now able to decode the kevent structures rather
than dumping their contents via a hexdump.
One change from before is that the 'changes' and 'events' arrays are
not marked with separate 'read' and 'write' annotations in kdump
output. Instead, the first array is the 'changes' array, and the
second array (only present if kevent doesn't fail with an error) is
the 'events' array. For kevent(), empty arrays are denoted by an
entry with an array containing zero entries rather than no record.
- Move kevent decoding tables from truss to libsysdecode.
This adds three new functions to decode members of struct kevent:
sysdecode_kevent_filter, sysdecode_kevent_flags, and
sysdecode_kevent_fflags.
kdump uses these helper functions to pretty-print kevent fields.
- Move structure definitions for freebsd11 and freebsd32 kevent
structures to <sys/event.h> so that they can be shared with userland.
The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
defined. The 32-bit freebsd11 structure requires both.
- Decode freebsd11 kevent structures in truss for the compat11.kevent()
system call.
- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
system calls.
- While here, constify the 'void *data' argument to ktrstruct().
Reviewed by: kib (earlier version)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12470
struct g_kevent_args.
On some architectures, e.g. PowerPC, there is additional padding in uap.
Reported and tested by: andreast
Sponsored by: The FreeBSD Foundation
This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.
To make this useful, data member of the struct kevent must be extended
to 64bit. Using the opportunity, I also added ext members. This
changes struct kevent almost to Apple struct kevent64, except I did
not changed type of ident and udata, the later would cause serious API
incompatibilities.
The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).
Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2). Compat shims
are provided for both host native and compat32.
Requested by: bapt
Reviewed by: bapt, brooks, ngie (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D11025
overly large allocation requests.
When ktrace-ing io, sys_kevent() allocates memory to copy the
requested changes and reported events. Allocations are sized by the
incoming syscall lengths arguments, which are user-controlled, and
might cause overflow in calculations or too large allocations.
Since io trace chunks are limited by ktr_geniosize, there is no sense
it even trying to satisfy unbounded allocations. Export ktr_geniosize
and clamp the buffers sizes in advance.
PR: 217435
Reported by: Tim Newsham <tim.newsham@nccgroup.trust>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
closed by r310302 for knote().
If KN_INFLUX | KN_SCAN flags are set for the note passed to knote() or
knote_fork(), i.e. the knote is scanned, we might erronously clear
INFLUX when finishing notification. For normal knote() it was fixed
in r310302 simply by remembering the fact that we do not own
KN_INFLUX, since there we own knlist lock and scan thread cannot clear
KN_INFLUX until we drop the lock. For knote_fork(), the situation is
more complicated, e must drop knlist lock AKA the process lock, since
we need to register new knotes.
Change KN_INFLUX into counter and allow shared ownership of the
in-flux state between scan and knote_fork() or knote(). Both in-flux
setters need to ensure that knote is not dropped in parallel. Added
assert about kn_influx == 1 in knote_drop() verifies that in-flux state
is not shared when knote is destroyed.
Since KBI of the struct knote is changed by addition of the int
kn_influx field, reorder kn_hook and kn_hookid to fill pad on LP64
arches [1]. This keeps sizeof(struct knote) to same 128 bytes as it
was before addition of kn_influx, on amd64.
Reviewed by: markj
Suggested by: markj [1]
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D8898
accepting the wrong state and printing warning. Do not obliterate
kl_lock and kl_unlock pointers, they are often useful for post-mortem
analysis.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D8898