kproc_suspend_check. In r329612 bufspacedaemon was turned into a thread
of the bufdaemon process causing both to call kproc_suspend_check with the
same proc argument and that function contains the following while loop:
while (SIGISMEMBER(p->p_siglist, SIGSTOP)) {
wakeup(&p->p_siglist);
msleep(&p->p_siglist, &p->p_mtx, PPAUSE, "kpsusp", 0);
}
So one thread wakes up the other and the other wakes up the first again,
locking up UP machines on shutdown.
Also register the shutdown handlers with SHUTDOWN_PRI_LAST + 100 so they
run after the syncer has shutdown, because the syncer can cause a
situation where bufdaemon help is needed to proceed.
PR: 227404
Reviewed by: kib
Tested by: cy, rmacklem
1. check if P_ADVLOCK is already set and if so, don't lock to set it
(stolen from DragonFly)
2. when trying for fast path unlock, check that we are doing unlock
first instead of taking the interlock for no reason (e.g. if we want
to *lock*). whilere make it more likely that falling fast path will
not take the interlock either by checking for state
Note the code is severely pessimized both single- and multithreaded.
sysentvec::sv_hwcap/sv_hwcap2 are pointers to u_long, so cpu_features* need
to be u_long to use the pointers. This also requires a temporary cast in
printing the bitfields, which is fine because the feature flag fields are
only 32 bits anyway.
these assumptions may not hold true once we've panic'd. Therefore, the
checks hold less value after a panic. Additionally, if one of the checks
fails while we are already panic'd, this creates a double-panic which can
interfere with debugging the original panic.
Therefore, this commit allows an administrator to suppress a response to
KASSERT checks after a panic by setting a tunable/sysctl. The
tunable/sysctl (debug.kassert.suppress_in_panic) defaults to being
enabled.
Reviewed by: kib
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D12920
FreeBSD exports the AT_HWCAP* auxvec items if provided by the ELF sysentvec
structure. Add the CPU features to be exported, so user space can more
easily check for them without using the hw.cpu_features and hw.cpu_features2
sysctls.
Not all feature flags are synced. Those for processors we don't currently
support are ignored currently. Those that are supported are synced best I
can tell. One flag was renamed to match the Linux flag name
(PPC_FEATURE2_VCRYPTO -> PPC_FEATURE2_VEC_CRYPTO).
When disabling regulator when they are unused, check before is they are
enabled.
While here don't check the enable_cnt on the regulator entry as it is
checked by regnode_stop.
This solve the panic on any board using a fixed regulator that is driven
by a gpio when the regulator is unused.
Tested On: OrangePi One
Pointy Hat to: myself
Reported by: kevans, Milan Obuch (freebsd-arm@dino.sk)
-> PROC_PDEATHSIG_STATUS for consistency with other procctl(2)
operations names.
Requested by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
The lock is required to ensure that the switch to the new credentials
and the transfer of the process's accounting data from the old
credentials to the new ones is done atomically. Otherwise, some updates
may be applied to the new credentials and then additionally transferred
from the old credentials if the updates happen after proc_set_cred() and
before racct_proc_ucred_changed().
The problem is especially pronounced for RACCT_RSS because
- there is a strict accounting for this resource (it's reclaimable)
- it's updated asynchronously by the vm daemon
- it's updated by setting an absolute value instead of applying a delta
I had to remove a call to rctl_proc_ucred_changed() from
racct_proc_ucred_changed() and make all callers of latter call the
former as well. The reason is that rctl_proc_ucred_changed, as it is
implemented now, cannot be called while holding the proc lock, so the
lock is dropped after calling racct_proc_ucred_changed. Additionally,
I've added calls to crhold / crfree around the rctl call, because
without the proc lock there is no gurantee that the new credentials,
owned by the process, will stay stable. That does not eliminate a
possibility that the credentials passed to the rctl will get stale.
Ideally, rctl_proc_ucred_changed should be able to work under the proc
lock.
Many thanks to kib for pointing out the above problems.
PR: 222027
Discussed with: kib
No comment: trasz
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D15048
Using a pointer after setting it NULL is probably not a good plan.
Spotted by inspection during changes for Flexible File Layout Ioerr handling.
This code path obviously isn't normally executed.
MFC after: 1 week
during ifnet detach.
Since destroying interface is not atomic operation and due to the
lack of synhronization during destroy, it is possible, that in the
time between bpfdetach() and if_free() some queued on destroying
interface mbuf will be used by ether_input_internal() and
bpf_peers_present() can dereference NULL bpf_if pointer. To protect
from this, assign pointer to empty bpf_if_ext structure instead of
NULL pointer after bpfdetach().
Reviewed by: melifaro, eugen
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D15083
Summary:
POWER9 also contains 32 slbs entries as explained by the POWER9 User Manual:
"For HPT translation, the POWER9 core contains a unified (combined for both
instruction and data), 32-entry, fully-associative SLB per thread"
Submitted by: Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15128
Summary:
Powerpc64 has support for a register called Data Stream Control Register
(DSCR), which basically controls how the hardware controls the caching and
prefetch for stream operations.
Since mfdscr and mtdscr are privileged instructions, we need to emulate them,
and
keep the custom DSCR configuration per thread.
The purpose of this feature is to change DSCR depending on the operation, set
to DSCR Default Prefetch Depth to deepest on string operations, as memcpy.
Submitted by: Breno Leitao
Differential Revision: https://reviews.freebsd.org/D15081
The NFSv4.1 RFC specifies that the OPEN_SHARE_ACCESS_WANT bits can be set
in the OpenDowngrade share_access argument and are basically ignored.
I do not know of a extant NFSv4.1 client that does this, but this little
patch fixes it just in case.
It also changes the error from NFSERR_BADXDR to NFSERR_INVAL since the NFSv4.1
RFC specifies this as the error to be returned if bogus bits are set.
(The NFSv4.0 RFC didn't specify any error for this, so the error reply can
be changed for NFSv4.0 as well.)
Found by inspection while looking at a problem with OpenDowngrade reported
for the ESXi 6.5 NFSv4.1 client.
Reported by: andreas.nagy@frequentis.com
PR: 227214
MFC after: 1 week
region marked "available" by firmware is contained entirely in the kernel.
This had a tendency to happen with FDTs passed by loader, though could for
other reasons as well, and would result in the kernel slowly cannibalizing
itself for other purposes, eventually resulting in a crash.
A similar fix is needed for mmu_oea.c and should probably just be rolled
at that point into some generic code in platform.c for taking a mem_region
list and removing chunks.
PR: 226974
Submitted by: leandro.lupori@gmail.com
Reviewed by: jhibbits
Differential Revision: D15121
Retrieve the tag from the correct ifnet and use the provided tag
(instead of hardcoded 0xffff, implying no tag) in the routines that
process offload policy.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Sponsored by: Chelsio Communications
Remove auxarg_size as it was only used once right after a confusing
assignment in each of the variants of exec_copyout_strings().
Reviewed by: emaste
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D15123
that cross this boundary, they perform better when this isn't the
case. Intel uses the 3rd byte in the vendor specific area for
this. The DC P3500 was previously listed without any explanation. Add
the DC P3520 and DC P4500 to the list.
There won't be any others drives needing this quirk. Intel has
standardized a field in the namespace data in 1.3 (noiob). A future
patch will use that if it exists, with fallback to this method.
Submitted by: Keith Busch
Reviewed by: jimharris@
When a caller passes in a uio or mbuf chain that is longer than crd_len, in
tandem with a transform that supports the multi-block interface,
swcr_encdec() would process the entire mbuf or uio instead of just the
portion indicated by crd_len (+ crd_skip).
De/encryption are performed in-place, so this would trash subsequent uio or
mbuf contents.
This was introduced in r331639 (mea culpa). It only affects the
{de,en}crypt_multi() family of interfaces. That interface only has one
consumer transform in-tree (for now): Chacha20.
PR: 227605
Submitted by: Valentin Vergez <valentin.vergez AT stormshield.eu>
They were previously initialized by the corresponding page daemon
threads, but for vmd_inacthead this may be too late if
vm_page_deactivate_noreuse() is called during boot.
Reported and tested by: cperciva
Reviewed by: alc, kib
MFC after: 1 week
It is the forerunner/foundational work of bringing in both Rack and BBR
which use hpts for pacing out packets. The feature is optional and requires
the TCPHPTS option to be enabled before the feature will be active. TCP
modules that use it must assure that the base component is compile in
the kernel in which they are loaded.
MFC after: Never
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D15020
This will allow to hook a ddb script to "kdb.enter.trap" event.
Previously there was no specific name for this event, so it could only
be handled by either "kdb.enter.unknown" or "kdb.enter.default" hooks.
Both are very unspecific.
Having a specific event is useful because the fatal trap condition is
very similar to panic but it has an additional property that the current
stack frame is the frame where the trap occurred. So, both a register
dump and a stack bottom dump have additional information that can help
analyze the problem.
I have added the event only on architectures that have trap_fatal()
function defined. I haven't looked at other architectures. Their
maintainers can add support for the event later.
Sample script:
kdb.enter.trap=bt; show reg; x/aS $rsp,20; x/agx $rsp,20
Reviewed by: kib, jhb, markj
MFC after: 11 days
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D15093
Allow processes to request the delivery of a signal upon death of
their parent process. Supposed consumer of the feature is PostgreSQL.
Submitted by: Thomas Munro
Reviewed by: jilles, mjg
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D15106
x86 enforces an (arbitray) limit on the number of available MSI and
MSI-X interrupts to simplify code (in particular, interrupt_source[]
is statically sized). This means that an attempt to allocate an MSI
vector needs to fail if it would go beyond the limit, but the checks
for exceeding the limit had an off-by-one error. In the case of MSI-X
which allocates interrupts one at a time this meant that IRQ 768 kept
getting handed out multiple times for msix_alloc() instead of failing
because all MSI IRQs were in use.
Tested by: lidl
MFC after: 1 week
ACPI I/O port descriptors use _MIN and _MAX fields to specify the set
of allowable base (start) addresses for an I/O port resource along with
a _LEN field specifying the length. A fixed resource is supposed to be
encoded with _MIN == _MAX, but some buggy firmwares instead set _MAX to
the end of the fixed range. Relocating I/O ranges only make sense in
_PRS (possible resource settings), not in _CRS (current resource settings),
so if an I/O port range with _MAX set set to the end of the range is
present in _CRS, treat it as a fixed I/O port resource starting at
_MIN.
PR: 224096
Submitted by: Harald Böhm <harald@boehm.codes>
Pointy hat to: jhb (taking so long to actually commit this)
MFC after: 1 week
Previously, if there are no threads, all queues which targeted
cores that share an L2 cache were bound to a single core. The intent is
to distribute them across these cores.
Reported by: olivier
Reviewed by: sbruno
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15120
Pages allocated from a given reservation may belong to different
objects. It is therefore possible for vm_page_ps_test() to be called
with the base page's object unlocked. Check for this case before
asserting that the object lock is held.
Reported by: jhb
Reviewed by: kib
MFC after: 1 week
fget_cap() tries to do a cheaper snapshot of a file descriptor without
holding the file descriptor lock. This snapshot does not do a deep
copy of the ioctls capability array, but instead uses a different
return value to inform the caller to retry the copy with the lock
held. However, filecaps_copy() was returning 1 to indicate that a
retry was required, and fget_cap() was checking for 0 (actually
'!filecaps_copy()'). As a result, fget_cap() did not do a deep copy
of the ioctls array and just reused the original pointer. This cause
multiple file descriptor entries to think they owned the same pointer
and eventually resulted in duplicate frees.
The only code path that I'm aware of that triggers this is to create a
listen socket that has a restricted list of ioctls and then call
accept() which calls fget_cap() with a valid filecaps structure from
getsock_cap().
To fix, change the return value of filecaps_copy() to return true if
it succeeds in copying the caps and false if it fails because the lock
is required. I find this more intuitive than fixing the caller in
this case. While here, change the return type from 'int' to 'bool'.
Finally, make filecaps_copy() more robust in the failure case by not
copying any of the source filecaps structure over. This avoids the
possibility of leaking a pointer into a structure if a similar future
caller doesn't properly handle the return value from filecaps_copy()
at the expense of one more branch.
I also added a test case that panics before this change and now passes.
Reviewed by: kib
Discussed with: mjg (not a fan of the extra branch)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D15047
Half of implementations always failed (returned (-1)) and they were
previously used in only one place.
Reviewed by: kib, andrew
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15102
Also remove the commented out documentation. The documentation arrived
with the import of the copy.9 manpage. I suspect the implementations
came from NetBSD while bootstrapping the Arm and MIPS ports.
Reviewed by: andrew, jmallett
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15108
When ixgbe was converted to iflib, it lost the SIOCGI2C support
that allows ifconfig to print SFP state, optical light levels, etc.
Restore this by plugging in to the ifdi_i2c_req iflib method. Note
that the sanity checking on dev_addr that used to be done in ixgbe is
now done in iflib.
Reviewed by: erj, Matthew Macy <mmacy@mattmacy.io>
Sponsored by: Netflix
Adjust sys/conf/files and sys/modules/puc/Makefile to omit
pucdata.c now tht it's included by puc_pci.c.
Submitted by: Lakhan Shiva Kamireddy (with build fixes by me)
Pull Request: https://github.com/freebsd/freebsd/pull/136
Always take the AST path rather than calling MD functions which are
often implemented as always failing. The is the case on amd64, arm,
i386, and powerpc. This optimization (inherited from 4.4 Lite) is a
pessimization on those architectures and is the sole use of these
functions. They will be removed in a seperate commit.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15101