Commit Graph

256 Commits

Author SHA1 Message Date
Damjan Jovanovic
8c309d48aa struct kinfo_file changes needed for lsof to work using only usermode APIs`
Add kf_pipe_buffer_[in/out/size] fields to kf_pipe, and populate them.

Add a kf_kqueue struct to the kf_un union, to allow querying kqueue state,
and populate it.

Populate the kf_sock_rcv_sb_state and kf_sock_snd_sb_state fields in
kf_sock for INET/INET6 sockets, and populate all other fields for all
transport layer protocols, not just TCP.

Bump __FreeBSD_version.

Differential revision:	https://reviews.freebsd.org/D34184
Reviewed by:	jhb, kib, se
MFC after:	1 week
2022-06-18 12:34:25 +03:00
Mark Johnston
524dadf7a8 kevent: Fix an off-by-one in filt_timerexpire_l()
Suppose a periodic kevent timer fires close to its deadline, so that
now - kc->next is small.  Then delta ends up being 1, and the next timer
deadline is set to (delta + 1) * kc->to, where kc->to is the timer
period.  This means that the timer fires at half of the requested rate,
and the value returned in kn_data is similarly inaccurate.

PR:		264131
Fixes:		7cb40543e9 ("filt_timerexpire: do not iterate over the interval")
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35313
2022-05-24 20:14:33 -04:00
Dmitry Chagin
2479e381cd kqueue: Trim trailing whitespace
MFC after:		1 week
2022-05-19 19:52:02 +03:00
Dmitry Chagin
91e7bdcdcf Add timespecvalid_interval macro and use it.
Reviewed by:		jhb, imp (early rev)
Differential revision:	https://reviews.freebsd.org/D34848
MFC after:		2 weeks
2022-04-25 10:20:54 +03:00
Gordon Bergling
c3721292e3 kern: Remove a double word in a source code comment
- s/for for/for/

MFC after:	3 days
2022-04-09 10:50:04 +02:00
Brooks Davis
8e4a3add99 struct kevent_freebsd11 -> struct freebsd11_kevent
Rename to match the naming of syscalls and allow 32 to be appended
without making an ugly name like kevent_freebsd1132.

While here, make the kevent changelist argument const.

Reviewed by:	kib
2021-11-15 18:34:27 +00:00
Mateusz Guzik
2b68eb8e1d vfs: remove thread argument from VOP_STAT
and fo_stat.
2021-10-11 13:22:32 +00:00
Kyle Evans
2f4dbe279f kqueue: fix recent assertion
NOTE_ABSTIME may also have a zero timeout, which indicates that we
should still fire immediately as an absolute time in the past.  A test
has been added for this one as well.

Fixes:	9c999a259f ("kqueue: don't arbitrarily restrict long-past...")
Point hat:	kevans
Reported by:	syzbot+1c8d1154f560b3930042@syzkaller.appspotmail.com
2021-10-01 13:17:30 -05:00
Kyle Evans
9c999a259f kqueue: don't arbitrarily restrict long-past values for NOTE_ABSTIME
NOTE_ABSTIME values are converted to values relative to boottime in
filt_timervalidate(), and negative values are currently rejected.  We
don't reject times in the past in general, so clamp this up to 0 as
needed such that the timer fires immediately rather than imposing what
looks like an arbitrary restriction.

Another possible scenario is that the system clock had to be adjusted
by ~minutes or ~hours and we have less than that in terms of uptime,
making a reasonable short-timeout suddenly invalid. Firing it is still
a valid choice in this scenario so that applications can at least
expect a consistent behavior.

Reviewed by:	kib, markj
Discussed with:	allanjude
Differential Revision:	https://reviews.freebsd.org/D32230
2021-09-30 21:31:24 -05:00
Nathaniel Wesley Filardo
0321a7990b kqueue: Add EV_KEEPUDATA flag
When this flag is set, operations that update an existing kevent will
not change the udata field.  This can be used to NOTE_TRIGGER or
EV_{EN,DIS}ABLE events without overwriting the stashed pointer.

Reviewed by:	Domagoj Stolfa <domagoj.stolfa@gmail.com>
Obtained from:	CheriBSD
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D30286
2021-09-23 17:31:39 -07:00
Konstantin Belousov
98168a6e6c kqueue: drain kqueue taskqueue if syscall tickled it
Otherwise return from the syscall and next syscall, which could be
kevent(2) on the kqueue that should be notified, races with the kqueue
taskqueue thread, and potentially misses the wakeup.  This is reliably
visible when kevent(2) only peeks into events using zeroed timeout.

PR:	258310
Reported by:	arichardson, Jan Kokemüller <jan.kokemueller@gmail.com>
Reviewed by:	arichardson, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31858
2021-09-07 02:43:34 +03:00
Mark Johnston
c511383de7 kevent: Fix races between timer detach and kqtimer_proc_continue()
- When detaching a knote, we need to double check the enqueued flag
  after acquiring the process lock, as kqtimer_proc_continue() may have
  toggled it.
- kqtimer_proc_continue() could in principle reschedule a stopped
  callout after filt_timerdetach() drains the callout.  So, we need to
  re-check.

Reported by:	syzbot+4a4cebb3ec07892cb040@syzkaller.appspotmail.com
Reported by:	syzbot+a9c04bc76078a3b7dd8d@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31772
2021-09-01 14:18:58 -04:00
Mateusz Guzik
c9f8dcda85 kqueue: replace kq_ncallouts loop with atomic_fetchadd 2021-06-02 15:14:58 +00:00
Mark Johnston
e00bae5c18 kevent: Prohibit negative change and event list lengths
Previously, a negative change list length would be treated the same as
an empty change list.  A negative event list length would result in
bogus copyouts.  Make kevent(2) return EINVAL for both cases so that
application bugs are more easily found, and to be more robust against
future changes to kevent internals.

Reviewed by:	imp, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30480
2021-05-27 15:52:20 -04:00
Mark Johnston
2cca77ee01 kqueue timer: Remove detached knotes from the process stop queue
There are some scenarios where a timer event may be detached when it is
on the process' kqueue timer stop queue.  If kqtimer_proc_continue() is
called after that point, it will iterate over the queue and access freed
timer structures.

It is also possible, at least in a multithreaded program, for a stopped
timer event to be scheduled without removing it from the process' stop
queue.  Ensure that we do not doubly enqueue the event structure in this
case.

Reported by:	syzbot+cea0931bb4e34cd728bd@syzkaller.appspotmail.com
Reported by:	syzbot+9e1a2f3734652015998c@syzkaller.appspotmail.com
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30251
2021-05-14 10:08:14 -04:00
Konstantin Belousov
7cb40543e9 filt_timerexpire: do not iterate over the interval
User-supplied data might make this loop too time-consuming. Divide
directly, and handle both the possibility that we were woken up earlier,
and arithmetic overflows/underflows from the calculation.

Reported and tested by:	pho (previous version)
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30069
2021-05-03 19:49:54 +03:00
Warner Losh
f1f9870668 Minor style cleanup
We prefer 'while (0)' to 'while(0)' according to grep and stlye(9)'s
space after keyword rule. Remove a few stragglers of the latter.
Many of these usages were inconsistent within the file.

MFC After:		3 days
Sponsored by:		Netflix
2021-04-18 11:14:17 -06:00
Konstantin Belousov
75c5cf7a72 filt_timerexpire: avoid process lock recursion
Found by:	syzkaller
Reported and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29746
2021-04-14 10:53:28 +03:00
Konstantin Belousov
2fd1ffefaa Stop arming kqueue timers on knote owner suspend or terminate
This way, even if the process specified very tight reschedule
intervals, it should be stoppable/killable.

Reported and reviewed by:	markj
Tested by:	markj, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29106
2021-04-09 23:43:51 +03:00
Konstantin Belousov
533e5057ed Add helper for kqueue timers callout scheduling
Reviewed by:	markj
Tested by:	markj, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D29106
2021-04-09 23:42:56 +03:00
Mateusz Guzik
6b3a9a0f3d Convert remaining cap_rights_init users to cap_rights_init_one
semantic patch:

@@

expression rights, r;

@@

- cap_rights_init(&rights, r)
+ cap_rights_init_one(&rights, r)
2021-01-12 13:16:10 +00:00
Jan Kokemüller
4d0c33be63 kevent(2): Bugfix for wrong EVFILT_TIMER timeouts
When using NOTE_NSECONDS in the kevent(2) API, US_TO_SBT should be
used instead of NS_TO_SBT, otherwise the timeout results are
misleading.

PR:		252539
Reviewed by:	kevans, kib
Approved by:	kevans
MFC after:	3 weeks
2021-01-09 20:00:25 +01:00
Mateusz Guzik
e90afaa015 kqueue: save space by using only one func pointer for assertions 2020-11-09 00:04:35 +00:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Kyle Evans
59dafcde62 kqueue: fix conversion of timer data to sbintime
This unbreaks the i386 kqueue timer tests after a recent change switched
NOTE_ABSTIME over to using microseconds. Notably, the data argument (which
holds useconds) is an int64_t, but we were passing it to timer2sbintime
which takes an intptr_t. Perhaps in a previous incarnation, intptr_t would
have made sense, but now it just leads to the timestamp getting truncated
and subsequently rejected when it no longer fits in an intptr_t.

PR:		245768
Reported by:	lwhsu / CI
MFC after:	1 week
2020-04-21 03:57:30 +00:00
Mateusz Guzik
445faddf7f kqueue: use new capsicum helpers 2020-02-15 01:30:13 +00:00
Mark Johnston
918988576c Avoid relying on header pollution from sys/refcount.h.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-07-29 20:26:01 +00:00
Mateusz Guzik
e52327e3c5 proc: postpone proc unlock until after reporting with kqueue
kqueue would always relock immediately afterwards.

While here drop the NULL check for list itself. The list is
always allocated.

Sponsored by:	The FreeBSD Foundation
2018-12-08 06:34:12 +00:00
Mark Johnston
792843c38f Pass malloc flags directly through kevent(2) subroutines.
Some kevent functions have a boolean "waitok" parameter for use when
calling malloc(9).  Replace them with the corresponding malloc() flags:
the desired behaviour is known at compile-time, so this eliminates a
couple of conditional branches, and makes the code easier to read.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18318
2018-11-24 17:06:01 +00:00
Mark Johnston
36c4960ef8 Plug some kernel memory disclosures via kevent(2).
The kernel may register for events on behalf of a userspace process,
in which case it must be careful to zero the kevent struct that will be
copied out to userspace.

Reviewed by:	kib
MFC after:	3 days
Security:	kernel stack memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18317
2018-11-24 17:02:31 +00:00
Mark Johnston
a2afae524a Ensure that knotes do not get registered when KQ_CLOSING is set.
KQ_CLOSING is set before draining the knotes associated with a kqueue,
so we must ensure that new knotes are not added after that point.  In
particular, some kernel facilities may register for events on behalf
of a userspace process and race with a close of the kqueue.

PR:		228858
Reviewed by:	kib
Tested by:	pho
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18316
2018-11-24 16:58:34 +00:00
Mark Johnston
1eeab857a3 Lock the knlist before releasing the in-flux state in knote_fork().
Otherwise there is a window, before iteration is resumed, during which
the knote may be freed.  The in-flux state ensures that the knote will
not be removed from the knlist while locks are dropped.

PR:		228858
Reviewed by:	kib
Tested by:	pho
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18316
2018-11-24 16:41:29 +00:00
Mark Johnston
96fdfb3649 Honour the waitok parameter in kevent_expand().
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18316
2018-11-23 23:10:03 +00:00
Mark Johnston
d5e494fee4 Avoid unsynchronized updates to kn_status.
kn_status is protected by the kqueue's lock, but we were updating it
without the kqueue lock held.  For EVFILT_TIMER knotes, there is no
knlist lock, so the knote activation could occur during the kn_status
update and result in KN_QUEUED being lost, in which case we'd enqueue
an already-enqueued knote, corrupting the queue.

Fix the problem by setting or clearing KN_DISABLED before dropping the
kqueue lock to call into the filter.  KN_DISABLED is used only by the
core kevent code, so there is no side effect from setting it earlier.

Reported and tested by:	Sylvain GALLIANO <sg@efficientip.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18060
2018-11-21 17:32:09 +00:00
Mark Johnston
45aecd0422 Remove KN_HASKQLOCK.
It is a write-only flag whose last use was removed in r302235.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18059
2018-11-21 17:28:10 +00:00
David Bright
95c05062ec Allow a EVFILT_TIMER kevent to be updated.
If a timer is updated (re-added) with a different time period
(specified in the .data field of the kevent), the new time period has
no effect; the timer will not expire until the original time has
elapsed. This violates the documented behavior as the kqueue(2) man
page says (in part) "Re-adding an existing event will modify the
parameters of the original event, and not result in a duplicate
entry."

This modification, adapted from a patch submitted by cem@ to PR214987,
fixes the kqueue system to allow updating a timer entry. The
kevent timer behavior is changed to:

  * When a timer is re-added, update the timer parameters to and
    re-start the timer using the new parameters.
  * Allow updating both active and already expired timers.
  * When the timer has already expired, dequeue any undelivered events
    and clear the count of expirations.

All of these changes address the original PR and also bring the
FreeBSD and macOS kevent timer behaviors into agreement.

A few other changes were made along the way:

  * Update the kqueue(2) man page to reflect the new timer behavior.
  * Fix man page style issues in kqueue(2) diagnosed by igor.
  * Update the timer libkqueue system test to test for the updated
    timer behavior.
  * Fix the (test) libkqueue common.h file so that it includes
    config.h which defines various HAVE_* feature defines, before the
    #if tests for such variables in common.h. This enables the use of
    the actual err(3) family of functions.
  * Fix the usages of the err(3) functions in the tests for incorrect
    type of variables. Those were formerly undiagnosed due to the
    disablement of the err(3) functions (see previous bullet point).

PR:		214987
Reported by:	Brian Wellington <bwelling@xbill.org>
Reviewed by:	kib
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D15778
2018-07-27 13:49:17 +00:00
Matt Macy
1c0336c1c1 kevent: annotate unused stack local 2018-05-19 05:06:18 +00:00
Matt Macy
ec8d23352b filt_timerdetach: only assign to old if we're going to check it in
a KASSERT
2018-05-19 04:07:00 +00:00
Matt Macy
cbd92ce62e Eliminate the overhead of gratuitous repeated reinitialization of cap_rights
- Add macros to allow preinitialization of cap_rights_t.

- Convert most commonly used code paths to use preinitialized cap_rights_t.
  A 3.6% speedup in fstat was measured with this change.

Reported by:	mjg
Reviewed by:	oshogbo
Approved by:	sbruno
MFC after:	1 month
2018-05-09 18:47:24 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
John Baldwin
ffb6607984 Decode kevent structures logged via ktrace(2) in kdump.
- Add a new KTR_STRUCT_ARRAY ktrace record type which dumps an array of
  structures.

  The structure name in the record payload is preceded by a size_t
  containing the size of the individual structures.  Use this to
  replace the previous code that dumped the kevent arrays dumped for
  kevent().  kdump is now able to decode the kevent structures rather
  than dumping their contents via a hexdump.

  One change from before is that the 'changes' and 'events' arrays are
  not marked with separate 'read' and 'write' annotations in kdump
  output.  Instead, the first array is the 'changes' array, and the
  second array (only present if kevent doesn't fail with an error) is
  the 'events' array.  For kevent(), empty arrays are denoted by an
  entry with an array containing zero entries rather than no record.

- Move kevent decoding tables from truss to libsysdecode.

  This adds three new functions to decode members of struct kevent:
  sysdecode_kevent_filter, sysdecode_kevent_flags, and
  sysdecode_kevent_fflags.

  kdump uses these helper functions to pretty-print kevent fields.

- Move structure definitions for freebsd11 and freebsd32 kevent
  structures to <sys/event.h> so that they can be shared with userland.
  The 32-bit structures are only exposed if _WANT_KEVENT32 is defined.
  The freebsd11 structures are only exposed if _WANT_FREEBSD11_KEVENT is
  defined.  The 32-bit freebsd11 structure requires both.

- Decode freebsd11 kevent structures in truss for the compat11.kevent()
  system call.

- Log 32-bit kevent structures via ktrace for 32-bit compat kevent()
  system calls.

- While here, constify the 'void *data' argument to ktrstruct().

Reviewed by:	kib (earlier version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12470
2017-11-25 04:49:12 +00:00
Mateusz Guzik
6e1619dae3 Add pfind_any
It looks for both regular and zombie processes. This avoids allproc relocking
previously seen with pfind -> zpfind calls.
2017-11-11 18:04:39 +00:00
Konstantin Belousov
d137278838 Do not cast struct kevent_args or struct freebsd11_kevent_args to
struct g_kevent_args.

On some architectures, e.g. PowerPC, there is additional padding in uap.

Reported and tested by:	andreast
Sponsored by:	The FreeBSD Foundation
2017-06-29 14:40:33 +00:00
Konstantin Belousov
2b34e84335 Add abstime kqueue(2) timers and expand struct kevent members.
This change implements NOTE_ABSTIME flag for EVFILT_TIMER, which
specifies that the data field contains absolute time to fire the
event.

To make this useful, data member of the struct kevent must be extended
to 64bit.  Using the opportunity, I also added ext members.  This
changes struct kevent almost to Apple struct kevent64, except I did
not changed type of ident and udata, the later would cause serious API
incompatibilities.

The type of ident was kept uintptr_t since EVFILT_AIO returns a
pointer in this field, and e.g. CHERI is sensitive to the type
(discussed with brooks, jhb).

Unlike Apple kevent64, symbol versioning allows us to claim ABI
compatibility and still name the new syscall kevent(2).  Compat shims
are provided for both host native and compat32.

Requested by:	bapt
Reviewed by:	bapt, brooks, ngie (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D11025
2017-06-17 00:57:26 +00:00
Konstantin Belousov
f2eb97b2cd Style.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-Differential revision:	https://reviews.freebsd.org/D11025
2017-06-16 23:41:13 +00:00
Konstantin Belousov
01feb4c3d4 Use designated initializers for kevent_copyops.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-14 09:25:01 +00:00
Konstantin Belousov
67d0b0ea60 Hide kev_iovlen() definition under #ifdef KTRACE, fixing build of
kernel configs without KTRACE.

Reported by:	rpokala
Sponsored by:	The FreeBSD Foundation
MFC after:	4 days
2017-03-14 08:45:52 +00:00
Konstantin Belousov
1e4296c919 Ktracing kevent(2) calls with unusual arguments might leads to an
overly large allocation requests.

When ktrace-ing io, sys_kevent() allocates memory to copy the
requested changes and reported events.  Allocations are sized by the
incoming syscall lengths arguments, which are user-controlled, and
might cause overflow in calculations or too large allocations.

Since io trace chunks are limited by ktr_geniosize, there is no sense
it even trying to satisfy unbounded allocations.  Export ktr_geniosize
and clamp the buffers sizes in advance.

PR:	217435
Reported by:	Tim Newsham <tim.newsham@nccgroup.trust>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-03-12 13:48:24 +00:00
Hiren Panchasara
7d03ff1fe9 Add kevent EVFILT_EMPTY for notification when a client has received all data
i.e. everything outstanding has been acked.

Reviewed by:	bz, gnn (previous version)
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D9150
2017-01-16 08:25:33 +00:00