This permits constructing the entire TLS header in ktls_frame() rather
than ktls_seq(). This also matches the approach used by OpenSSL which
uses an incrementing nonce as the explicit IV rather than the sequence
number.
Reviewed by: gallatin
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D22117
Per sepcification the upper layer header needs to be within the first
fragment. The check was not done so far and there is an open review for
related work, so just leave a note as to where to put it.
Move the extraction of frag offset up to this as it is needed to determine
whether this is a first fragment or not.
MFC after: 3 weeks
Sponsored by: Netflix
Check whether we are accepting more fragments (based on global limits)
before doing expensive operations of calculating the hash and taking the
bucket lock. This slightly increases a "race" between check time and
incrementing counters (which is already there) possibly allowing a few
more fragments than the maximum limits. However, when under attack,
we rather save this CPU time for other packets/work.
MFC after: 3 weeks
Sponsored by: Netflix
Rather than walking the mbuf chain manually use m_last() which doing
exactly that for us.
Defer initializing srcifp for longer as there are multiple exit paths
out of the function which do not need it set. Initialize before taking
the lock though.
Rename the mtx lock to match the type better.
MFC after: 3 weeks
Sponsored by: Netflix
The IP6_REASS_MBUF() macro did some pointer gynmastics to end up with the
same type as it gets in [*(cast **)&]. Spelling it out instead saves all
this and makes the code more readable and less obfuscated directly using
the structure field.
MFC after: 3 weeks
Sponsored by: Netflix
happens is we are more delayed in the pacer calling in so
we remove the stack from the pacer and recalculate how
much time is left after all data has been acknowledged. However
the comparision was backwards so we end up with a negative
value in the last_pacing_delay time which causes us to
add in a huge value to the next pacing time thus stalling
the connection.
Reported by: vm2.finance@gmail.com
Summary:
Due to bugs in the enumeration code, fsl_pcib_init() was not configuring
sub-bridges properly, so devices hanging off a separate bridge would not
be found. Since the generic PCI code already supports probing child
buses, just delete this code and initialize only the device itself,
letting the generic code handle all the additional probing and
initializing.
This also deletes setup for some PCI peripherals found on some MPC85XX
evaluation boards. The code can be resurrected if needed, but overly
complicated this code in the first place.
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D22050
From Jake:
Calling ether_ifdetach after iflib_stop leads to a potential race where
a stale ifp pointer can remain in the route entry list for IPv6 traffic.
This will potentially cause a page fault or other system instability if
the ifp pointer is accessed.
Move both iflib_netmap_detach and ether_ifdetach to be called prior to
iflib_stop. This avoids the race above, and helps ensure that other ifp
references are removed before stopping the interface.
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: erj@, gallatin@, jhb@
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D22071
Add some ASCII relation of how the bits plug together. The terminology
difference of "fragmented packets" and "fragment packets" is subtle.
While here clear up more whitespace and comments.
No functional change.
MFC after: 3 weeks
Sponsored by: Netflix
Remove the KAME custom circular queue for fragments and fragmented packets
and replace them with a standard TAILQ.
This make the code a lot more understandable and maintainable and removes
further hand-rolled code from the the tree using a standard interface instead.
Hide the still public structures under #ifdef _KERNEL as there is no
use for them in user space.
The naming is a bit confusing now as struct ip6q and the ip6q[] buckets
array are not the same anymore; sadly struct ip6q is also used by the
MAC framework and we cannot rename it.
Submitted by: jtl (initally)
MFC after: 3 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16847 (jtl's original)
r353890 introduced a case where we may call release_page() with
fs.m == NULL, since the fault handler may now lock the vnode prior
to allocating a page for a page-in.
Reported by: jhb
Reviewed by: kib
MFC with: r353890
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22120
As with amd64 NUMA is required for reasonable operation on big-iron
arm64 systems and is expected to have no significant impact on small
systems. Enable it now for wider testing in advance of FreeBSD 13.0.
You can use the 'vm.ndomains' sysctl to see if multiple domains are in
use - for example (from Cavium/Marvell ThunderX2):
# sysctl vm.ndomains
vm.ndomains: 2
No objection: manu
Sponsored by: The FreeBSD Foundation
This significantly reduces contention since chunks get created and removed
all the time. See the review for sample results.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21976
No functional change (in program code). Additional DWARF metadata is
generated in the .eh_frame section. Also, it is now a compile-time
requirement that machine/asm.h ENTRY() and END() macros are paired. (This
is subject to ongoing discussion and may change.)
This DWARF metadata allows llvm-libunwind to unwind program stacks when the
program is executing the function. The goal is to collect accurate
userspace stacktraces when programs have entered syscalls.
(The motivation for "Call Frame Information," or CFI for short -- not to be
confused with Control Flow Integrity -- is to sufficiently annotate assembly
functions such that stack unwinders can unwind out of the local frame
without the requirement of a dedicated framepointer register; i.e.,
-fomit-frame-pointer. This is necessary for C++ exception handling or
collecting backtraces.)
For the curious, a more thorough description of the metadata and some
examples may be found at [1] and documentation at [2]. You can also look at
'cc -S -o - foo.c | less' and search for '.cfi_' to see the CFI directives
generated by your C compiler.
[1]: https://www.imperialviolet.org/2017/01/18/cfi.html
[2]: https://sourceware.org/binutils/docs/as/CFI-directives.html
Reviewed by: emaste, kib (with reservations)
Differential Revision: https://reviews.freebsd.org/D22122
We now assert that a page is busy when updating its validity-tracking
state, but bogus_page is not busied during a getpages operation.
Reported by: syzkaller
Reviewed by: alc, kib
Discussed with: jeff
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22124
A caller that does not guarantee that a page's identity won't change
while sleeping for a busy lock must specify either NOWAIT or WAITFAIL.
Reported by: syzkaller
Reviewed by: alc, kib
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22124
The LinuxKPI linux_dma code calls PCTRIE_INSERT with a
mutex held, but does not set M_NOWAIT when allocating
nodes, leading to a potential panic. All of this code
can handle an allocation failure here, so prefer an
allocation failure to sleeping on memory.
Also fix a related case where NOWAIT/WAITOK was not
specified. In this case it's not clear whether sleeping
is allowed so be conservative and assume not. There are
a lot of other paths in this code that can fail due to
a lack of memory anyway.
Differential Revision: https://reviews.freebsd.org/D22127
Reviewed by: imp
Sponsored by: Dell EMC Isilon
MFC After: 1 week
The object does not provide anonymous memory.
Reported by: kib
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22123
The lr.w instruction used to read the value from memory sign-extends
the value read from memory. GCC sign-extends the 32-bit comparison
value passed in whereas clang currently does not. As a result, if the
value being compared has the MSB set, the comparison fails for
matching 32-bit values when compiled with clang.
Use a cast to explicitly sign-extend the unsigned comparison value.
This works with both GCC and clang.
There is commentary in the RISC-V spec that suggests that GCC's
approach is more correct, but it is not clear if the commentary in the
RISC-V spec is binding.
Reviewed by: mhorne
Obtained from: Axiado
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22084
Create a sequence point by ending a full expression for call to
vspace() and use of the globals which are modified by vspace().
Reported and reviewed by: imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22126
When we raise a data abort from the kernel we need to enable interrupts,
however we shouldn't be doing this when in the kernel debugger. In this
case interrupts can lead to a further panic as they don't expect to be
run from such a context.
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
This method check that boot_on or always_on is set to 1 and if it
is it will try to enable the regulator.
The binding docs aren't clear on what to do but Linux enable the regulator
if any of those properties is set so we want to do the same.
The function first check the status to see if the regulator is
already enabled it then get the voltage to check if it is in a acceptable
range and then enables it.
This will be either called from the regnode_init method (if it's needed by the platform)
or by a SYSINIT at SI_SUB_LAST
Reviewed by: mmel
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D22106
Shuffle headers around to more appropriate #ifdef OPTION blocks (INET vs.
INET6) -- double checked LINT-{NOINET,NOINET6,NOIP}, all seem good.
Reported by: cem
r353489 added minidump support for powerpc64, but it added a dependency on
the dump_avail array. Leaving it uninitialized caused breakage in late
boot. Initialize dump_avail, even though the 64-bit booke pmap doesn't yet
support minidumps, but will in the future.
Instead of superpages use. The current code employs superpage-wide locking
regardless and the better locking granularity is welcome with NUMA enabled
even when superpage support is not used.
Requested by: alc
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21982
Vast majority of uses the cache are just checking if there is an entry
present on process exit (and evicting it if so). Both checking and
eviction process are very expensive and put the lock protecting it high
up on the profile during poudriere -j 104.
Convert the linked list into a hash. This allows to almost always avoid
taking the lock in the first place (and consequently almost removes it
from the profile). Note only one lock is preserved as a split did not
meaningfully impact contention.
Should the cache be used for something it will still run into contention
issues. The code needs a rewrite, but should someone want to tidy it up
further the following can be done:
1) per-chain locks (or at least an array)
2) hashing by something else than just pid
Sponsored by: The FreeBSD Foundation
This appears to be a copy-pasto from previous lines that propagated to v6
over the years. Indeed, nothing references kernelstack beyond
USPACE_SVC_STACK_TOP and it would be odd if anything did.
Noticed by: markj
NIC KTLS will add a new TLS send tag type in cxgbe(4) that is a
distinct tag from a ratelimit tag. To support this, refactor
cxgbe_snd_tag to be a simple send tag with a type and convert the
existing ratelimit tag to a new cxgbe_rate_tag structure.
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D22072
Previously the table was allocated on first use by TOE and the
ratelimit code. The forthcoming NIC KTLS code also uses this table.
Allocate it unconditionally during attach to simplify consumers.
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D22028
On POWER8 systems with only one memory domain, the "ibm,associativity"
number that corresponds to it is 0, unlike POWER9 systems with two
or more domains, in which the minimum value is 1.
In POWER8 case, subtracting 1 causes an underflow on the unsigned domain
variable and a subsequent index out-of-bounds access.
Reviewed by: jhibbits
Tested by: bdragon, luporl
- td_kstack_pages was not being initialized.
- td_kstack is supposed to be the base address of the stack region,
not the top.
The arm ports seem to have similar problems and will be fixed next.
Reported by: Jenkins via lwhsu
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
except for filesystems that set the MNTK_VMSETSIZE_BUG, Set the flag for ZFS.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883
Make the nfsclient always call vnode_pager_setsize() with the vnode
exclusively locked. This ensures that page fault always can find the
backing page if the object size check succeeded. Set VV_VMSIZEVNLOCK
flag on NFS nodes.
The main offender breaking the interface in nfsclient is
nfs_loadattrcache(), which is used whenever server responded with
updated attributes, which can happen on non-changing operations as
well. Also, iod threads only have buffers locked (and even that is
LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC
response.
Instead of immediately calling vnode_pager_setsize() if server
response indicated changed file size, but the vnode is not exclusively
locked, set a new node flag NVNSETSZSKIP. When the vnode exclusively
locked, or when we can temporary upgrade the lock to exclusive, call
vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation.
Tested by: pho
Discussed with: rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883
The flag specifies that vm_fault() handler should check the vnode'
vm_object size under the vnode lock. It is converted into the object'
OBJ_SIZEVNLOCK flag in vnode_pager_alloc().
Tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883
This change consists of two parts.
First, nctgpio now supports hardware access via an I/O port window if
it's configured by firmware. For instance, PC Engines firmware
v4.10.0.2 does that. This is faster than going through the Super I/O
configuration registers.
Second, nctgpio now caches values of bits that it controls. For
example, the driver does not need to access the hardware to determine if
a pin is an output or an input, or a state of an output. Also, the
driver makes use of the fact that the hardware preserves an output state
of a pin accross a switch to the input mode and back.
With this change I am able to use the 1-Wire bus over nctgpio whereas
previously the driver introduced too much latency to be compliant with
the relatively strict protocol timings.
superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
gpio1: <Nuvoton GPIO controller> at GPIO ldn 0x07 on superio0
pcib0: allocated type 4 (0x220-0x226) for rid 0 of gpio1
gpiobus1: <GPIO bus> on gpio1
owc0: <GPIO attached one-wire bus> at pin 4 on gpiobus1
ow0: <1 Wire Bus> on owc0
ow0: romid 28:b2:9e:45:92:10:02:34: no driver
ow_temp0: <Advanced One Wire Temperature> romid 28:b2:9e:45:92:10:02:34 on ow0
MFC after: 4 weeks
The correctness of per-CPU cache accounting in that function is
dependent on reading per-CPU pointers exactly once. Ensure that
the compiler does not emit multiple loads of those pointers.
Reported and tested by: pho
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22081
Simply adding MODULE_VERSION does not do the trick, because the modules
haven't been declared. This should actually fix modfind/kldstat, which
r351229 aimed and failed to do.
This should make vm-bhyve do the right thing again when using the ports
version, rather than the latest version not in ports.
MFC after: 3 days