Part of i_flag can persist across a drop to hold count of 0, at which
point the vnode is taken off the lazy list. Then whoever locks and unlocks
the vnode can trip on the assert.
This trips over kyua running a test untarring character devices to ufs.
Reported by: lwhsu
This is a follow up to r356481. In locore.S, before virtual memory is
set up, we should avoid using indirect address lookups through the GOT.
Therefore we need to convert uses of the la instruction to lla, which
always generates an auipc/addi pair of instructions. This conversion was
done for the BSP case, but not the AP case, resulting in a fault
somewhere before mpva and a failure to bring APs online.
Reported by: lwhsu
Reviewed by: lwhsu, jrtc27 (accepted in a comment)
Differential Revision: https://reviews.freebsd.org/D23138
Constant requeuing adds significant lock contention in certain
workloads. Lessen the problem by batching it.
Per-cpu areas are locked in order to synchronize against UMA freeing
memory.
vnode's v_mflag is converted to short to prevent the struct from
growing.
Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock: 122.38s user 1780.45s system 6242% cpu 30.480 total
patched: 144.84s user 985.90s system 4856% cpu 23.282 total
Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22998
The current notion of an active vnode is eliminated.
Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.
Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.
Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock: 118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total
Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22997
Quota code is temporarily regressed to do a full vnode scan.
Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22996
This obviates the need to scan the entire active list looking for vnodes
of interest.
msync is handled by adding all vnodes with write count to the lazy list.
deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag.
Vnodes get dequeued from the list when their hold count reaches 0.
Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that
spurious locking is avoided in the common case.
Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22995
This will be used later to add vnodes to the lazy list.
Reviewed by: kib (previous version), jeff
Tested by: pho (in a larger patch)
Differential Revision: https://reviews.freebsd.org/D22994
Treat it as a synonym for GRND_NONBLOCK. The reasoning is this:
We have two choices for handling Linux's GRND_INSECURE API flag.
1. We could ignore it completely (like GRND_RANDOM). However, this might
produce the surprising result of GRND_INSECURE requests blocking, when the
Linux API does not block.
2. Alternatively, we could treat GRND_INSECURE requests as requests for
GRND_NONBLOCk. Here, the surprising result for Linux programs is that
invocations with unseeded random(4) will produce EAGAIN, rather than
garbage.
Honoring the flag in the way Linux does seems fraught. If we actually use
the output of a random(4) implementation prior to seeding, we leak some
entropy (in an information theory and also practical sense) from what will
be the initial seed to attackers (or allow attackers to arbitrary DoS
initial seeding, if we don't leak). This seems unacceptable -- it defeats
the purpose of blocking on initial seeding.
Secondary to that concern, before seeding we may have arbitrarily little
entropy collected; producing output from zero or a handful of entropy bits
does not seem particularly useful to userspace.
If userspace can accept garbage, insecure, non-random bytes, they can create
their own insecure garbage with srandom(time(NULL)) or similar. Any program
which would be satisfied with a 3-bit key CTR stream has no need for CSPRNG
bytes. So asking the kernel to produce such an output from the secure
getrandom(2) API seems inane.
For now, we've elected to emulate GRND_INSECURE as an alternative spelling
of GRND_NONBLOCK (2). Consider this API not-quite stable for now. We
guarantee it will never block. But we will attempt to monitor actual port
uptake of this bizarre API and may revise our plans for the unseeded
behavior (prior stable/13 branching).
Approved by: csprng(markm), manpages(bcr)
See also: https://lwn.net/ml/linux-kernel/cover.1577088521.git.luto@kernel.org/
See also: https://lwn.net/ml/linux-kernel/20200107204400.GH3619@mit.edu/
Differential Revision: https://reviews.freebsd.org/D23130
When expanding a SYN-cache entry to a socket/inp a two step approach was
taken:
1) The local address was filled in, then the inp was added to the hash
table.
2) The remote address was filled in and the inp was relocated in the
hash table.
Before the epoch changes, a write lock was held when this happens and
the code looking up entries was holding a corresponding read lock.
Since the read lock is gone away after the introduction of the
epochs, the half populated inp was found during lookup.
This resulted in processing TCP segments in the context of the wrong
TCP connection.
This patch changes the above procedure in a way that the inp is fully
populated before inserted into the hash table.
Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@
mailing list and for testing the patch!
Reviewed by: rrs@
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D22971
Over time one or two hard coded function names did not match the
actual function anymore. Consistently use __func__ for nd6log() calls
and re-wrap/re-format some messages for consitency.
MFC after: 2 weeks
Highlights:
- Exit early if we're not disabling unused regulators; there's no need to
take the regulator topology lock and re-evaluate this every iteration, as
it's not going to change.
- Don't emit a notice that we're shutting down a regulator if it's not
enabled, to reduce noise.
- Mention the outcome of the shutdown, to aide debugging and easily let
developer/user collect list of regulators we actually shutdown to
determine problematic one.
Reviewed by: manu
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D22213
This creates a dedicated routine (vn_alloc) to allocate vnodes.
As a side effect code duplicationw with getnewvnode_reserve is eleminated.
Add vn_free for symmetry.
Having a reserved vnode count does not guarantee that getnewvnodes wont
block later. Said blocking partially defeats the purpose of reserving in
the first place.
Preallocate instaed. The only consumer was always passing "1" as count
and never nesting reservations.
r302340, as an attempt to fix the localbus child handling post-rman change,
actually broke child resource allocation, due to typos in
fdt_lbc_reg_decode(). This went unnoticed because there aren't any drivers
currently in tree that use localbus.
This overlays can be used on A64 board to use spigen and spi(8)
on the spi0 pins.
Tested On: Pine64-LTS, A64-Olinuxino
Submitted by: Gary Otten <gdotten@gmail.com>
or rename an entry in it, properly reset the link count of the inode
associated with the entry that was to have been changed.
Tested by: Peter Holm
MFC after: 7 days
The patch could be simplier, using only the second chunk to
vtnet_rxq_eof(), that passes full mbufs to pfil(9). Packet
filter would m_free() them in case of returning PFIL_DROPPED.
However, we pretend to be a hardware driver, so we first try
to pass a memory buffer via PFIL_MEMPTR feature. This is mostly
done for debugging purposes, so that one can experiment in bhyve
with packet filters utilizing same features as a true driver.
respectively. The tunable controls how big is the size of per-cpu
vm page cache. Previously the value was split for all CPUs in system,
so configuring same value on machines with different count of CPUs
yielded in different cache size available to a particular CPU.
Reviewed by: markj
Obtained from: Netflix
We use to handle each message separately in i2c_transfer but that cannot
work with message with NOSTOP as it confuses the controller that we disable
the interrupts and start a new message.
Handle every message in the interrupt handler and fire a new start condition
if the previous message have NOSTOP, the controller understand this as a
repeated start.
This fixes booting on Allwinner A10/A20 platform where before the i2c controller
used to write 0 to the PMIC register that control the regulators as it though that
this was the continuation of the write message.
Tested on: A20 BananaPi, Cubieboard 1 (kevans)
Reported by: kevans
MFC after: 1 month
When either makesyscalls.lua or syscalls.master changes, all of the
${GENERATED} targets are now out-of-date. With make jobs > 1, this means we
will run the makesyscalls script in parallel for the same ABI, generating
the same set of output files.
Prior to r356603 , there is a large window for interlacing output for some
of the generated files that we were generating in-place rather than staging
in a temp dir. After that, we still should't need to run the script more
than once per-ABI as the first invocation should update all of them. Add
.ORDER to do so cleanly.
Reviewed by: brooks
Discussed with: sjg
Differential Revision: https://reviews.freebsd.org/D23099
This makes makesyscalls.lua more parallel-friendly, or at least not as
hostile to the idea. We get into situations where we're running parallel if
we end up with MAKE_JOBS>1 entering any of the sysent targets, since each
output file is recognized a distinct build step that needs to be executed.
Another commit will add some .ORDER to further improve the situation.
Reported by: jhb
Reviewed by: brooks
Differential Revision: https://reviews.freebsd.org/D23098
This regulator is marked regulator-boot-on, but it will get shutdown if it's
not actually used/enabled by a driver. This should fix sata on the
cubieboard{1,2}.
Reported by: Ray White @ UWaterloo
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D23112
This avoids getting the XHCI_TRB_ERROR_CONTEXT_STATE error code from the XHCI
controller when the endpoint is disabled or already stopped.
Suggested by: Shichun.Ma@dell.com
MFC after: 1 week
Sponsored by: Mellanox Technologies
hw.floatingpoint and hw.altivec are effectively runtime constants (bits from
the cpu_feature bitfield), so don't need Giant, or any locking for that
matter.
It may be possible to make this completely lock free, but for now it's using
a statically allocated bounce buffer in the softc, so it needs to be
guarded.