that returns struct kinfo_file for the given file descriptor. Among
other data, it also returns kf_path, if file op was able to restore file
path.
Reviewed by: jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33277
Specify that the only supported architecture for a.out is ia32 (either
i386 or amd64 host kernel).
Requested by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32960
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
which will report where the epoch was entered and also
mark the tracker, so that exit will also be reported.
Helps to understand epoch entrance/exit scenarios in
complex cases, like network stack. As everything else
under EPOCH_TRACE it is a developer only tool.
With introduction of epoch(9) synchronization to network stack the
inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc). However, inpcb
aren't static in nature, they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.
Fairly new feature of uma(9) - Safe Memory Reclamation allows to
safely free memory in page-sized batches, with virtually zero
overhead compared to uma_zfree(). However, unlike epoch(9), it
puts stricter requirement on the access to the protected memory,
needing the critical(9) section to access it. Details:
- The database is already build on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database SMR section is now required.
Once the desired inpcb is found we need to transition from SMR
section to r/w lock on the inpcb itself, with a check that inpcb
isn't yet freed. This requires some compexity, since SMR section
itself is a critical(9) section. The complexity is hidden from
KPI users in inp_smr_lock().
- For a inpcb list traversal (a pcblist sysctl, or broadcast
notification) also a new KPI is provided, that hides internals of
the database - inp_next(struct inp_iterator *).
Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33022
The vast majority of the busy/unbusy users in the tree don't acquire
Giant before calling device_busy/unbusy. However, if multiple threads
are opening a file, say, that causes the device to busy/unbusy, then we
can race to the root marking things busy. Move to using a reference
count to keep track of how many times a device_t has been made busy. Use
that count to make the same decisions that we'd make with the old device
state.
Note: gpiopps.c uses D_TRACKCLOSE. Others do as well. However, there's a
known race with closes that will be corrected for all the drivers that
do this in a future commit.
Sponsored by: Netflix
Reviewed by: hselasky, jhb
Differential Revision: https://reviews.freebsd.org/D26284
This reverts commit 08e781915363f98f4318a864b3b5a52bd99424c6.
Commit message was for a very old version of the patch. Will re-commit
with the right one since it's so bad. There's no locked versions of
it...that code was reworked to use refcnt APIs.
Noticed by: jhb, jtrc27
Sponsored by: Netflix
The vast majority of the busy/unbusy users in the tree don't acquire Giant
before calling device_busy/unbusy. However, if multiple threads are opening a
file, say, that causes the device to busy/unbusy, then we can race to the root
marking things busy. Create a new device_busy_locked and device_unbusy_locked
that are the current implemntations of device_busy and device_unbusy. Make
device_busy and unbusy acquire Giant before calling the _locked versrions. Since
we never sleep in the busy/unbusy path, Giant's single threaded semantics
suffice to keep this safe.
Sponsored by: Netflix
Reviewed by: hselasky, jhb
Differential Revision: https://reviews.freebsd.org/D26284
There are two places where we convert from a timecounter delta to
a bintime delta: tc_windup and bintime_off.
Both functions use the same calculations when the timecounter delta is
small. But for a large delta (greater than approximately an equivalent
of 1 second) the calculations were different. Both functions use
approximate calculations based on th_scale that avoid division. Both
produce values slightly greater than a true value, calculated with
division by tc_frequency, would be. tc_windup is slightly more
accurate, so its result is closer to the true value and, thus, smaller
than bintime_off result.
As a consequence there can be a jump back in time when time hands are
switched after a long period of time (a large delta). Just before the
switch the time would be calculated with a large delta from
th_offset_count in bintime_off. tc_windup does the switch using its own
calculations of a new th_offset using the large delta. As explained
earlier, the new th_offset may end up being less than the previously
produced binuptime. So, for a period of time new binuptime values may
be "back in time" comparing to values just before the switch.
Such a jump must never happen. All the code assumes that the uptime is
monotonically nondecreasing and some code works incorrectly when that
assumption is broken. For example, we have observed sleepq_timeout()
ignoring a timeout when the sbinuptime value obtained by the callout
code was greater than the expiration value, but the sbinuptime obtained
in sleepq_timeout() was less than it. In that case the target thread
would never get woken up.
The unified calculations should ensure the monotonic property of the
uptime.
The problem is quite rare as normally tc_windup should be called HZ
times per second (typically 1000 or 100). But it may happen in VMs on
very busy hypervisors where a VM's virtual CPU may not get an execution
time slot for a second or more.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Panzura LLC
The mprotect syscall decleration is not const. I added this one
incorrectly in a944d28d0edf7ceb1bef4d789dfa4e8e18331658.
Reported by: kib
Reviewed by: kib, imp
Rather than combining the declearation of nosys with the registration
of SYS_syscall, declare syscall(2) and __syscall(2) with the new
SYSMUX type in syscalls.master and declare nosys directly. This
eliminates the last use of syscall aliases in the tree.
Reviewed by: kib, imp
This type is for system call multiplexers (syscall(2), __syscall(2))
that don't have a normal handler and instead are handled in the
machine-dependent syscall code.
Reviewed by: kib, imp
Declare the exit system call normally. This results in the
implementation being named sys_exit rather than sys_sys_exit and
being decalred as returning an int. Infact it does not return
at all because exit1 does not, so add an __unreachable() to let the
compiler know that.
Reviewed by: kib, imp
Declare o<foo>_args rather than reusing the equivalent <foo>_args
structs. Avoiding the addition of a new type isn't worth the
gratutious differences.
Reviewed by: kib, imp
Stop using <foo>_args structs as part of internal kernel APIs. Add
a kern_recvfrom and adjust getsockname and getpeername's equivalent
functions to take individual arguments rather than a uap pointer.
Adopt a convention from CheriBSD that a function interacting with
userspace pointers and sitting between the sys_<foo> syscall and
kern_<foo> implementation is named user_<foo>.
Reviewed by: kib, imp
Swap on file requires operational underlying mount, otherwise
swapoff_all() is guaranteed to panic due to the default strategy VOP for
reclaimed vnodes.
Reported and tested by: peterj
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33147
This allows to stop maintaing the VI_TEXT_REF flag and consequently
opens up fully lockless v_writecount adjustment.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33127
This avoids needing to inspect the mount point every time.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D33125
This is needed to ensure that resolvers that reference global symbols
return correct results.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33120
This avoids the need to keep a freebsd32-specific syscalls.master
in sync with the default ABI. As evidenced by the number of commits
required to sync the two, it is extremely easy for them to get out
of sync due to misunderstandings and user errors.
Reviewed by: kevans, kib
Add _Contains_ annotations indicating that the data pointed to by a
pointer argument contains types that vary between FreeBSD ABIs. The
supported set is long (including size_t), pointer (including
intptr_t), and time_t. The first two vary between 32- and 64-bit
ABIs. The laste betwen i386 and everything else.
These will be used to detect which syscalls require handling on
particular ABIs.
Reviewed by: kevans, kib
Optionally return errors when truncating dev_t, ino_t, and nlink_t.
In the interest of code reuse, use freebsd11_cvtstat() to perform the
truncation and error handling and then convert the resulting struct
freebsd11_stat to struct nstat.
Add missing freebsd32 compat syscalls. These syscalls require
translation because struct nstat contains four instances of struct
timespec which in turn contains a time_t and a long.
Reviewed by: kib