Remove the KAME custom circular queue for fragments and fragmented packets
and replace them with a standard TAILQ.
This make the code a lot more understandable and maintainable and removes
further hand-rolled code from the the tree using a standard interface instead.
Hide the still public structures under #ifdef _KERNEL as there is no
use for them in user space.
The naming is a bit confusing now as struct ip6q and the ip6q[] buckets
array are not the same anymore; sadly struct ip6q is also used by the
MAC framework and we cannot rename it.
Submitted by: jtl (initally)
MFC after: 3 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16847 (jtl's original)
r353890 introduced a case where we may call release_page() with
fs.m == NULL, since the fault handler may now lock the vnode prior
to allocating a page for a page-in.
Reported by: jhb
Reviewed by: kib
MFC with: r353890
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22120
As with amd64 NUMA is required for reasonable operation on big-iron
arm64 systems and is expected to have no significant impact on small
systems. Enable it now for wider testing in advance of FreeBSD 13.0.
You can use the 'vm.ndomains' sysctl to see if multiple domains are in
use - for example (from Cavium/Marvell ThunderX2):
# sysctl vm.ndomains
vm.ndomains: 2
No objection: manu
Sponsored by: The FreeBSD Foundation
This significantly reduces contention since chunks get created and removed
all the time. See the review for sample results.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21976
No functional change (in program code). Additional DWARF metadata is
generated in the .eh_frame section. Also, it is now a compile-time
requirement that machine/asm.h ENTRY() and END() macros are paired. (This
is subject to ongoing discussion and may change.)
This DWARF metadata allows llvm-libunwind to unwind program stacks when the
program is executing the function. The goal is to collect accurate
userspace stacktraces when programs have entered syscalls.
(The motivation for "Call Frame Information," or CFI for short -- not to be
confused with Control Flow Integrity -- is to sufficiently annotate assembly
functions such that stack unwinders can unwind out of the local frame
without the requirement of a dedicated framepointer register; i.e.,
-fomit-frame-pointer. This is necessary for C++ exception handling or
collecting backtraces.)
For the curious, a more thorough description of the metadata and some
examples may be found at [1] and documentation at [2]. You can also look at
'cc -S -o - foo.c | less' and search for '.cfi_' to see the CFI directives
generated by your C compiler.
[1]: https://www.imperialviolet.org/2017/01/18/cfi.html
[2]: https://sourceware.org/binutils/docs/as/CFI-directives.html
Reviewed by: emaste, kib (with reservations)
Differential Revision: https://reviews.freebsd.org/D22122
We now assert that a page is busy when updating its validity-tracking
state, but bogus_page is not busied during a getpages operation.
Reported by: syzkaller
Reviewed by: alc, kib
Discussed with: jeff
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22124
A caller that does not guarantee that a page's identity won't change
while sleeping for a busy lock must specify either NOWAIT or WAITFAIL.
Reported by: syzkaller
Reviewed by: alc, kib
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22124
The LinuxKPI linux_dma code calls PCTRIE_INSERT with a
mutex held, but does not set M_NOWAIT when allocating
nodes, leading to a potential panic. All of this code
can handle an allocation failure here, so prefer an
allocation failure to sleeping on memory.
Also fix a related case where NOWAIT/WAITOK was not
specified. In this case it's not clear whether sleeping
is allowed so be conservative and assume not. There are
a lot of other paths in this code that can fail due to
a lack of memory anyway.
Differential Revision: https://reviews.freebsd.org/D22127
Reviewed by: imp
Sponsored by: Dell EMC Isilon
MFC After: 1 week
Summary:
Historically, we have built toolchain components such as cc, ld, etc as
statically linked executables. One of the reasons being that you could
sometimes save yourself from botched upgrades, by e.g. recompiling a
"known good" libc and reinstalling it.
In this day and age, we have boot environments, virtual machine
snapshots, cloud backups, and other much more reliable methods to
restore systems to working order. So I think the time is ripe to flip
this default, and link the toolchain components dynamically, just like
almost all other executables on FreeBSD.
Maybe at some point they can even become PIE executables by default! :)
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22061
i686, as per the discussion on the freebsd-arch mailing list. Earlier
in r352030, I had already bumped it to i586, to work around missing
atomic 64 bit functions for the i386 architecture.
Relnotes: yes
The object does not provide anonymous memory.
Reported by: kib
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22123
The lr.w instruction used to read the value from memory sign-extends
the value read from memory. GCC sign-extends the 32-bit comparison
value passed in whereas clang currently does not. As a result, if the
value being compared has the MSB set, the comparison fails for
matching 32-bit values when compiled with clang.
Use a cast to explicitly sign-extend the unsigned comparison value.
This works with both GCC and clang.
There is commentary in the RISC-V spec that suggests that GCC's
approach is more correct, but it is not clear if the commentary in the
RISC-V spec is binding.
Reviewed by: mhorne
Obtained from: Axiado
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22084
Create a sequence point by ending a full expression for call to
vspace() and use of the globals which are modified by vspace().
Reported and reviewed by: imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22126
When we raise a data abort from the kernel we need to enable interrupts,
however we shouldn't be doing this when in the kernel debugger. In this
case interrupts can lead to a further panic as they don't expect to be
run from such a context.
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
This method check that boot_on or always_on is set to 1 and if it
is it will try to enable the regulator.
The binding docs aren't clear on what to do but Linux enable the regulator
if any of those properties is set so we want to do the same.
The function first check the status to see if the regulator is
already enabled it then get the voltage to check if it is in a acceptable
range and then enables it.
This will be either called from the regnode_init method (if it's needed by the platform)
or by a SYSINIT at SI_SUB_LAST
Reviewed by: mmel
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22106
operation & ~limit where limit is a bool is clearly not what was intended,
given the line prior. Correct it to use the calculated mask for validation.
The cap_sysctl tests should now be functional again.
Shuffle headers around to more appropriate #ifdef OPTION blocks (INET vs.
INET6) -- double checked LINT-{NOINET,NOINET6,NOIP}, all seem good.
Reported by: cem
r353489 added minidump support for powerpc64, but it added a dependency on
the dump_avail array. Leaving it uninitialized caused breakage in late
boot. Initialize dump_avail, even though the 64-bit booke pmap doesn't yet
support minidumps, but will in the future.
Instead of superpages use. The current code employs superpage-wide locking
regardless and the better locking granularity is welcome with NUMA enabled
even when superpage support is not used.
Requested by: alc
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21982
Vast majority of uses the cache are just checking if there is an entry
present on process exit (and evicting it if so). Both checking and
eviction process are very expensive and put the lock protecting it high
up on the profile during poudriere -j 104.
Convert the linked list into a hash. This allows to almost always avoid
taking the lock in the first place (and consequently almost removes it
from the profile). Note only one lock is preserved as a split did not
meaningfully impact contention.
Should the cache be used for something it will still run into contention
issues. The code needs a rewrite, but should someone want to tidy it up
further the following can be done:
1) per-chain locks (or at least an array)
2) hashing by something else than just pid
Sponsored by: The FreeBSD Foundation
This appears to be a copy-pasto from previous lines that propagated to v6
over the years. Indeed, nothing references kernelstack beyond
USPACE_SVC_STACK_TOP and it would be odd if anything did.
Noticed by: markj
NIC KTLS will add a new TLS send tag type in cxgbe(4) that is a
distinct tag from a ratelimit tag. To support this, refactor
cxgbe_snd_tag to be a simple send tag with a type and convert the
existing ratelimit tag to a new cxgbe_rate_tag structure.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D22072
Previously the table was allocated on first use by TOE and the
ratelimit code. The forthcoming NIC KTLS code also uses this table.
Allocate it unconditionally during attach to simplify consumers.
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D22028
On POWER8 systems with only one memory domain, the "ibm,associativity"
number that corresponds to it is 0, unlike POWER9 systems with two
or more domains, in which the minimum value is 1.
In POWER8 case, subtracting 1 causes an underflow on the unsigned domain
variable and a subsequent index out-of-bounds access.
Reviewed by: jhibbits
Tested by: bdragon, luporl
- td_kstack_pages was not being initialized.
- td_kstack is supposed to be the base address of the stack region,
not the top.
The arm ports seem to have similar problems and will be fixed next.
Reported by: Jenkins via lwhsu
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
except for filesystems that set the MNTK_VMSETSIZE_BUG, Set the flag for ZFS.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883
Make the nfsclient always call vnode_pager_setsize() with the vnode
exclusively locked. This ensures that page fault always can find the
backing page if the object size check succeeded. Set VV_VMSIZEVNLOCK
flag on NFS nodes.
The main offender breaking the interface in nfsclient is
nfs_loadattrcache(), which is used whenever server responded with
updated attributes, which can happen on non-changing operations as
well. Also, iod threads only have buffers locked (and even that is
LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC
response.
Instead of immediately calling vnode_pager_setsize() if server
response indicated changed file size, but the vnode is not exclusively
locked, set a new node flag NVNSETSZSKIP. When the vnode exclusively
locked, or when we can temporary upgrade the lock to exclusive, call
vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation.
Tested by: pho
Discussed with: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883
The flag specifies that vm_fault() handler should check the vnode'
vm_object size under the vnode lock. It is converted into the object'
OBJ_SIZEVNLOCK flag in vnode_pager_alloc().
Tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883