The constant seems to exists on MacOS X >= 10.8.
Requested by: swills
Reviewed by: allanjude, kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25933
This significantly speeds up path lookup, Cascade Lake doing access(2) on ufs
on /usr/obj/usr/src/amd64.amd64/sys/GENERIC/vnode_if.c, ops/s:
before: 2535298
after: 2797621
Over +10%.
The reversed order of computation here does not seem to matter for hash
distribution.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25921
A free buf's lock may be held (temporarily) due to unlocked lookup, so
buf_alloc() must acquire it without LK_NOWAIT. The unlocked getblk path
should unlock it promptly once it realizes the identity does not match
the buffer it was searching for.
Reported by: gallatin
Reviewed by: kib
Tested by: pho
X-MFC-With: r363482
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D25914
No functional change.
LK_SLEEPFAIL implies a behavior that is only possible if the lock operation can
sleep. LK_NOWAIT prevents the lock operation from sleeping.
Discussed with: kib
If the buffer identity changed during lookup, sleeping could introduce a
lock order reversal. Since we do not know if the identity changed until we
get the lock, we must try-lock (LK_NOWAIT) only. EINTR and ERESTART error
handling becomes irrelevant, as we no longer sleep.
Reported by: kib
Reviewed by: kib
X-MFC-With: r363482
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D25898
Almost all consumers use the NDF_ONLY_PNBUF macro, making them avoidably branch
a lot in the NDFREE routine. Also note most of them should not need to call
any cleanup anyway as they don't request HASBUF.
This makes the realpath syscall operational with the new lookup. Note that the
walk to obtain the full path name still takes locks.
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D23917
When processing the last record in a socket buffer, take care to avoid a
NULL pointer dereference when advancing the record iterator.
Reported by: syzbot+6a689cc9c27bd265237a@syzkaller.appspotmail.com
Fixes: r359778
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
If the remote end closes a TLS socket and the socket buffer still
contains not-yet-decrypted TLS records but no decrypted TLS records,
soreceive needs to block or fail with EWOULDBLOCK. Previously it was
trying to return data and dereferencing a NULL pointer.
Reviewed by: np
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D25838
This has for a while been replaced by makesyscalls.lua in the stock FreeBSD
build. Ensure downstreams get some notice that it'a going away if they're
reliant on it, maybe.
swap size by dividing a value, which was always a multiple of 64, by
64. Remove the code that reduced max swap size down to that cap.
Eliminate the distinction between BLIST_BMAP_RADIX and
BLIST_META_RADIX. Call them both BLIST_RADIX.
Make improvments to the blist self-test code to silence compiler
warnings and to test larger blists.
Reported by: jmallett
Reviewed by: alc
Discussed with: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D25736
For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.
To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs. On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY. To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25754
Provides full scalability as long as all visited filesystems support the
lookup and terminal vnodes are different.
Inner workings are explained in the comment above cache_fplookup.
Capabilities and fd-relative lookups are not supported and will result in
immediate fallback to regular code.
Symlinks, ".." in the path, mount points without support for lockless lookup
and mismatched counters will result in an attempt to get a reference to the
directory vnode and continue in regular lookup. If this fails, the entire
operation is aborted and regular lookup starts from scratch. However, care is
taken that data is not copied again from userspace.
Sample benchmark:
incremental -j 104 bzImage on tmpfs:
before: 142.96s user 1025.63s system 4924% cpu 23.731 total
after: 147.36s user 313.40s system 3216% cpu 14.326 total
Sample microbenchmark: access calls to separate files in /tmpfs, 104 workers, ops/s:
before: 2165816
after: 151216530
Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25578
Modified on each permission change and link/unlink.
Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25573
Convert the bufobj tries to an SMR zone/PCTRIE and add a gbincore_unlocked()
API wrapping this functionality. Use it for a fast path in getblkx(),
falling back to locked lookup if we raced a thread changing the buf's
identity.
Reported by: Attilio
Reviewed by: kib, markj
Testing: pho (in progress)
Sponsored by: Isilon
Differential Revision: https://reviews.freebsd.org/D25782
Adapt r358130, for the almost identical vm_radix, to the pctrie subsystem.
Like that change, the tree is kept correct for readers with store barriers
and careful ordering. Existing locks serialize writers.
Add a PCTRIE_DEFINE_SMR() wrapper that takes an additional smr_t parameter
and instantiates a FOO_PCTRIE_LOOKUP_UNLOCKED() function, in addition to the
usual definitions created by PCTRIE_DEFINE().
Interface consumers will be introduced in later commits.
As future work, it might be nice to add vm_radix algorithms missing from
generic pctrie to the pctrie interface, and then adapt vm_radix to use
pctrie.
Reported by: Attilio
Reviewed by: markj
Sponsored by: Isilon
Differential Revision: https://reviews.freebsd.org/D25781
Allow TLS records to be decrypted in the kernel after being received
by a NIC. At a high level this is somewhat similar to software KTLS
for the transmit path except in reverse. Protocols enqueue mbufs
containing encrypted TLS records (or portions of records) into the
tail of a socket buffer and the KTLS layer decrypts those records
before returning them to userland applications. However, there is an
important difference:
- In the transmit case, the socket buffer is always a single "record"
holding a chain of mbufs. Not-yet-encrypted mbufs are marked not
ready (M_NOTREADY) and released to protocols for transmit by marking
mbufs ready once their data is encrypted.
- In the receive case, incoming (encrypted) data appended to the
socket buffer is still a single stream of data from the protocol,
but decrypted TLS records are stored as separate records in the
socket buffer and read individually via recvmsg().
Initially I tried to make this work by marking incoming mbufs as
M_NOTREADY, but there didn't seemed to be a non-gross way to deal with
picking a portion of the mbuf chain and turning it into a new record
in the socket buffer after decrypting the TLS record it contained
(along with prepending a control message). Also, such mbufs would
also need to be "pinned" in some way while they are being decrypted
such that a concurrent sbcut() wouldn't free them out from under the
thread performing decryption.
As such, I settled on the following solution:
- Socket buffers now contain an additional chain of mbufs (sb_mtls,
sb_mtlstail, and sb_tlscc) containing encrypted mbufs appended by
the protocol layer. These mbufs are still marked M_NOTREADY, but
soreceive*() generally don't know about them (except that they will
block waiting for data to be decrypted for a blocking read).
- Each time a new mbuf is appended to this TLS mbuf chain, the socket
buffer peeks at the TLS record header at the head of the chain to
determine the encrypted record's length. If enough data is queued
for the TLS record, the socket is placed on a per-CPU TLS workqueue
(reusing the existing KTLS workqueues and worker threads).
- The worker thread loops over the TLS mbuf chain decrypting records
until it runs out of data. Each record is detached from the TLS
mbuf chain while it is being decrypted to keep the mbufs "pinned".
However, a new sb_dtlscc field tracks the character count of the
detached record and sbcut()/sbdrop() is updated to account for the
detached record. After the record is decrypted, the worker thread
first checks to see if sbcut() dropped the record. If so, it is
freed (can happen when a socket is closed with pending data).
Otherwise, the header and trailer are stripped from the original
mbufs, a control message is created holding the decrypted TLS
header, and the decrypted TLS record is appended to the "normal"
socket buffer chain.
(Side note: the SBCHECK() infrastucture was very useful as I was
able to add assertions there about the TLS chain that caught several
bugs during development.)
Tested by: rmacklem (various versions)
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24628
In such a case the second argument to lock_delay_arg_init was NULL which was
immediately causing a null pointer deref.
Since the sructure is only used for spin count, provide a dedicate routine
initializing it.
Reported by: andrew
It is very conservative. Only spinning when LK_ADAPTIVE is passed, only on
exclusive lock and never when any waiters are present. buffer cache is remains
not spinning.
This reduces total sleep times during buildworld etc., but it does not shorten
total real time (culprits are contention in the vm subsystem along with slock +
upgrade which is not covered).
For microbenchmarks: open3_processes -t 52 (open/close of the same file for
writing) ops/s:
before: 258845
after: 801638
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D25753
During device attachment, all interrupt sources will bind to the BSP,
as it is the only processor online. This means interrupts must be
redistributed ("shuffled") later, during SI_SUB_SMP.
For the EARLY_AP_STARTUP case, this is no longer true. SI_SUB_SMP will
execute much earlier, meaning APs will be online and available before
devices begin attachment, and there will therefore be nothing to
shuffle.
All PIC-conforming interrupt controllers will handle this early
distribution properly, except for RISC-V's PLIC. Make the necessary
tweak to the PLIC driver.
While here, convert irq_assign_cpu from a boolean_t to a bool.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D25693