Commit Graph

138616 Commits

Author SHA1 Message Date
jhb
7622bc9ddb Use a counter with a random base for explicit IVs in GCM.
This permits constructing the entire TLS header in ktls_frame() rather
than ktls_seq().  This also matches the approach used by OpenSSL which
uses an incrementing nonce as the explicit IV rather than the sequence
number.

Reviewed by:	gallatin
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22117
2019-10-24 18:13:26 +00:00
bz
27aa98f3aa frag6: leave a note about upper layer header checks TBD
Per sepcification the upper layer header needs to be within the first
fragment.  The check was not done so far and there is an open review for
related work, so just leave a note as to where to put it.
Move the extraction of frag offset up to this as it is needed to determine
whether this is a first fragment or not.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 12:16:15 +00:00
bz
ac29b94465 frag6: check global limits before hash and lock
Check whether we are accepting more fragments (based on global limits)
before doing expensive operations of calculating the hash and taking the
bucket lock.   This slightly increases a "race" between check time and
incrementing counters (which is already there) possibly allowing a few
more fragments than the maximum limits.  However, when under attack,
we rather save this CPU time for other packets/work.

MFC after:		3 weeks
Sponsored by:		Netflix
2019-10-24 11:58:24 +00:00
tuexen
ddaae075ce Store a handle for the event handler. This will be used when unloading the
SCTP as a module.

Obtained from:		markj@
2019-10-24 09:22:23 +00:00
bz
6d29bbbab8 frag6: small improvements
Rather than walking the mbuf chain manually use m_last() which doing
exactly that for us.
Defer initializing srcifp for longer as there are multiple exit paths
out of the function which do not need it set.  Initialize before taking
the lock though.
Rename the mtx lock to match the type better.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 08:15:40 +00:00
bz
8194a6554c frag6: remove IP6_REASS_MBUF macro
The IP6_REASS_MBUF() macro did some pointer gynmastics to end up with the
same type as it gets in [*(cast **)&].  Spelling it out instead saves all
this and makes the code more readable and less obfuscated directly using
the structure field.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 07:53:10 +00:00
rrs
f651cbcfcc Fix a small bug in bbr when running under a VM. Basically what
happens is we are more delayed in the pacer calling in so
we remove the stack from the pacer and recalculate how
much time is left after all data has been acknowledged. However
the comparision was backwards so we end up with a negative
value in the last_pacing_delay time which causes us to
add in a huge value to the next pacing time thus stalling
the connection.

Reported by:	vm2.finance@gmail.com
2019-10-24 05:54:30 +00:00
jhibbits
92b9f222a9 powerpc/booke: Simplify the MPC85XX PCIe root complex driver
Summary:
Due to bugs in the enumeration code, fsl_pcib_init() was not configuring
sub-bridges properly, so devices hanging off a separate bridge would not
be found.  Since the generic PCI code already supports probing child
buses, just delete this code and initialize only the device itself,
letting the generic code handle all the additional probing and
initializing.

This also deletes setup for some PCI peripherals found on some MPC85XX
evaluation boards.  The code can be resurrected if needed, but overly
complicated this code in the first place.

Reviewed by:	bdragon
Differential Revision:	https://reviews.freebsd.org/D22050
2019-10-24 03:51:33 +00:00
erj
a49574e5a2 iflib: call ether_ifdetach and netmap_detach before stop
From Jake:
Calling ether_ifdetach after iflib_stop leads to a potential race where
a stale ifp pointer can remain in the route entry list for IPv6 traffic.
This will potentially cause a page fault or other system instability if
the ifp pointer is accessed.

Move both iflib_netmap_detach and ether_ifdetach to be called prior to
iflib_stop. This avoids the race above, and helps ensure that other ifp
references are removed before stopping the interface.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, jhb@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22071
2019-10-23 23:20:49 +00:00
bz
06d37eb5ee frag6: add "big picture"
Add some ASCII relation of how the bits plug together.  The terminology
difference of "fragmented packets" and "fragment packets" is subtle.
While here clear up more whitespace and comments.

No functional change.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-23 23:10:12 +00:00
bz
0c95efba5c frag6: replace KAME hand-rolled queues with queue(9) TAILQs
Remove the KAME custom circular queue for fragments and fragmented packets
and replace them with a standard TAILQ.
This make the code a lot more understandable and maintainable and removes
further hand-rolled code from the the tree using a standard interface instead.

Hide the still public structures under #ifdef _KERNEL as there is no
use for them in user space.
The naming is a bit confusing now as struct ip6q and the ip6q[] buckets
array are not the same anymore;  sadly struct ip6q is also used by the
MAC framework and we cannot rename it.

Submitted by:	jtl (initally)
MFC after:	3 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16847 (jtl's original)
2019-10-23 23:01:18 +00:00
markj
7670099e0b Modify release_page() to handle a missing fault page.
r353890 introduced a case where we may call release_page() with
fs.m == NULL, since the fault handler may now lock the vnode prior
to allocating a page for a page-in.

Reported by:	jhb
Reviewed by:	kib
MFC with:	r353890
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22120
2019-10-23 20:39:21 +00:00
bz
7b56e1d765 frag6: whitespace changes
Remove trailing white space, add a blank line, and compress a comment.
No functional changes.

MFC after:	10 days
Sponsored by:	Netflix
2019-10-23 20:37:15 +00:00
emaste
d0f324ec6a arm64: enable options NUMA in GENERIC
As with amd64 NUMA is required for reasonable operation on big-iron
arm64 systems and is expected to have no significant impact on small
systems.  Enable it now for wider testing in advance of FreeBSD 13.0.

You can use the 'vm.ndomains' sysctl to see if multiple domains are in
use - for example (from Cavium/Marvell ThunderX2):

# sysctl vm.ndomains
vm.ndomains: 2

No objection:	manu
Sponsored by:	The FreeBSD Foundation
2019-10-23 19:35:26 +00:00
mjg
570f22e680 amd64 pmap: per-domain pv chunk list
This significantly reduces contention since chunks get created and removed
all the time. See the review for sample results.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21976
2019-10-23 19:17:10 +00:00
cem
c6d50eff8a amd64: Add CFI directives for libc syscall stubs
No functional change (in program code).  Additional DWARF metadata is
generated in the .eh_frame section.  Also, it is now a compile-time
requirement that machine/asm.h ENTRY() and END() macros are paired.  (This
is subject to ongoing discussion and may change.)

This DWARF metadata allows llvm-libunwind to unwind program stacks when the
program is executing the function.  The goal is to collect accurate
userspace stacktraces when programs have entered syscalls.

(The motivation for "Call Frame Information," or CFI for short -- not to be
confused with Control Flow Integrity -- is to sufficiently annotate assembly
functions such that stack unwinders can unwind out of the local frame
without the requirement of a dedicated framepointer register; i.e.,
-fomit-frame-pointer.  This is necessary for C++ exception handling or
collecting backtraces.)

For the curious, a more thorough description of the metadata and some
examples may be found at [1] and documentation at [2].  You can also look at
'cc -S -o - foo.c | less' and search for '.cfi_' to see the CFI directives
generated by your C compiler.

[1]: https://www.imperialviolet.org/2017/01/18/cfi.html
[2]: https://sourceware.org/binutils/docs/as/CFI-directives.html

Reviewed by:	emaste, kib (with reservations)
Differential Revision:	https://reviews.freebsd.org/D22122
2019-10-23 19:03:03 +00:00
markj
3a3f11c990 Check for bogus_page in vnode_pager_generic_getpages_done().
We now assert that a page is busy when updating its validity-tracking
state, but bogus_page is not busied during a getpages operation.

Reported by:	syzkaller
Reviewed by:	alc, kib
Discussed with:	jeff
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22124
2019-10-23 18:00:22 +00:00
markj
a789d36dbe Verify identity after checking for WAITFAIL in vm_page_busy_acquire().
A caller that does not guarantee that a page's identity won't change
while sleeping for a busy lock must specify either NOWAIT or WAITFAIL.

Reported by:	syzkaller
Reviewed by:	alc, kib
Discussed with:	jeff
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22124
2019-10-23 17:58:19 +00:00
rstone
3878ab63de Add missing M_NOWAIT flag
The LinuxKPI linux_dma code calls PCTRIE_INSERT with a
mutex held, but does not set M_NOWAIT when allocating
nodes, leading to a potential panic.  All of this code
can handle an allocation failure here, so prefer an
allocation failure to sleeping on memory.

Also fix a related case where NOWAIT/WAITOK was not
specified.  In this case it's not clear whether sleeping
is allowed so be conservative and assume not.  There are
a lot of other paths in this code that can fail due to
a lack of memory anyway.

Differential Revision: https://reviews.freebsd.org/D22127
Reviewed by: imp
Sponsored by: Dell EMC Isilon
MFC After: 1 week
2019-10-23 17:20:20 +00:00
markj
4f6e3684c3 Set OBJ_NOSPLIT on the ksyms(4) VM object.
The object does not provide anonymous memory.

Reported by:	kib
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22123
2019-10-23 16:53:37 +00:00
cem
cae3e20483 Prevent a panic when a driver provides bogus debugnet parameters
This is just a bandaid; we should fix the driver(s) too.  Introduced in
r353685.

PR:		241403
X-MFC-With:	r353685
Reported by:	np and others
2019-10-23 16:48:22 +00:00
jhb
8eea4337db Fix atomic_*cmpset32 on riscv64 with clang.
The lr.w instruction used to read the value from memory sign-extends
the value read from memory.  GCC sign-extends the 32-bit comparison
value passed in whereas clang currently does not.  As a result, if the
value being compared has the MSB set, the comparison fails for
matching 32-bit values when compiled with clang.

Use a cast to explicitly sign-extend the unsigned comparison value.
This works with both GCC and clang.

There is commentary in the RISC-V spec that suggests that GCC's
approach is more correct, but it is not clear if the commentary in the
RISC-V spec is binding.

Reviewed by:	mhorne
Obtained from:	Axiado
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22084
2019-10-23 16:41:31 +00:00
kib
a6dbd93798 Fix undefined behavior.
Create a sequence point by ending a full expression for call to
vspace() and use of the globals which are modified by vspace().

Reported and reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22126
2019-10-23 16:06:47 +00:00
kib
2393dd146c vn_printf(): Decode VI_TEXT_REF.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-10-23 15:51:26 +00:00
andrew
a0c18fcde6 Stop enabling interrupts when reentering kdb on arm64
When we raise a data abort from the kernel we need to enable interrupts,
however we shouldn't be doing this when in the kernel debugger. In this
case interrupts can lead to a further panic as they don't expect to be
run from such a context.

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-10-23 13:21:15 +00:00
manu
62a17e539a regulator: Add a regnode_set_constraint function
This method check that boot_on or always_on is set to 1 and if it
is it will try to enable the regulator.
The binding docs aren't clear on what to do but Linux enable the regulator
if any of those properties is set so we want to do the same.
The function first check the status to see if the regulator is
already enabled it then get the voltage to check if it is in a acceptable
range and then enables it.
This will be either called from the regnode_init method (if it's needed by the platform)
or by a SYSINIT at SI_SUB_LAST

Reviewed by:	mmel
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22106
2019-10-23 09:56:53 +00:00
manu
e09708bc24 axp81x: Use the default regnode_init method
MFC after:	1 week
2019-10-23 09:54:50 +00:00
manu
5fdced0195 regulator: Add a regnode_method_init
This is a default init method for regulator that don't really
need one.

MFC after:	1 week
2019-10-23 09:54:12 +00:00
kib
b382fb1fa5 Assert that vm_fault_lock_vnode() returns locked saved vnode.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D22113
2019-10-23 07:36:26 +00:00
kevans
83fa8806ca tuntap(4): Fix NOINET build after r353741
Shuffle headers around to more appropriate #ifdef OPTION blocks (INET vs.
INET6) -- double checked LINT-{NOINET,NOINET6,NOIP}, all seem good.

Reported by:	cem
2019-10-23 02:15:15 +00:00
jhibbits
75b58f4b94 powerpc/booke: Fix Book-E boot post-minidump
r353489 added minidump support for powerpc64, but it added a dependency on
the dump_avail array.  Leaving it uninitialized caused breakage in late
boot.  Initialize dump_avail, even though the 64-bit booke pmap doesn't yet
support minidumps, but will in the future.
2019-10-23 00:31:19 +00:00
mjg
c1cc9fe9b4 amd64 pmap: conditionalize per-superpage locks on NUMA
Instead of superpages use. The current code employs superpage-wide locking
regardless and the better locking granularity is welcome with NUMA enabled
even when superpage support is not used.

Requested by:	alc
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21982
2019-10-22 22:55:46 +00:00
mjg
d15bdaeeaa amd64 pmap: fixup invlgen lookup for fictitious mappings
Similarly to r353438, use dummy entry.

Reported and tested by:	Neel Chauhan
Sponsored by:	The FreeBSD Foundation
2019-10-22 22:54:41 +00:00
mjg
68e6247c20 pseudofs: hashed vncache
Vast majority of uses the cache are just checking if there is an entry
present on process exit (and evicting it if so). Both checking and
eviction process are very expensive and put the lock protecting it high
up on the profile during poudriere -j 104.

Convert the linked list into a hash. This allows to almost always avoid
taking the lock in the first place (and consequently almost removes it
from the profile). Note only one lock is preserved as a split did not
meaningfully impact contention.

Should the cache be used for something it will still run into contention
issues. The code needs a rewrite, but should someone want to tidy it up
further the following can be done:

1) per-chain locks (or at least an array)
2) hashing by something else than just pid

Sponsored by:	The FreeBSD Foundation
2019-10-22 22:52:53 +00:00
kevans
5e4c0469f8 arm: correct kernelstack allocation size
This appears to be a copy-pasto from previous lines that propagated to v6
over the years. Indeed, nothing references kernelstack beyond
USPACE_SVC_STACK_TOP and it would be odd if anything did.

Noticed by:	markj
2019-10-22 21:46:03 +00:00
jhb
b3b627518b Split Chelsio send tags into a generic base tag and a ratelimit tag.
NIC KTLS will add a new TLS send tag type in cxgbe(4) that is a
distinct tag from a ratelimit tag.  To support this, refactor
cxgbe_snd_tag to be a simple send tag with a type and convert the
existing ratelimit tag to a new cxgbe_rate_tag structure.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D22072
2019-10-22 20:41:54 +00:00
jhb
d05dfca956 Always allocate the atid table during attach.
Previously the table was allocated on first use by TOE and the
ratelimit code.  The forthcoming NIC KTLS code also uses this table.
Allocate it unconditionally during attach to simplify consumers.

Reviewed by:	np
Differential Revision:	https://reviews.freebsd.org/D22028
2019-10-22 20:01:47 +00:00
luporl
97ed980e58 [PPC] Avoid underflows in NUMA domains
On POWER8 systems with only one memory domain, the "ibm,associativity"
number that corresponds to it is 0, unlike POWER9 systems with two
or more domains, in which the minimum value is 1.

In POWER8 case, subtracting 1 causes an underflow on the unsigned domain
variable and a subsequent index out-of-bounds access.

Reviewed by:	jhibbits
Tested by:	bdragon, luporl
2019-10-22 18:28:58 +00:00
glebius
10ee156a04 Allow epoch tracker to use the very last byte of the stack. Not sure
this will help to avoid panic in this function, since it will also use
some stack, but makes code more strict.

Submitted by:	hselasky
2019-10-22 18:05:15 +00:00
markj
9582ed1083 Apply r353893 to arm64.
Reported by:	Jenkins (hardware CI lab)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-10-22 17:22:10 +00:00
markj
0bf520b0ec Initialize thread0.td_kstack_pages on arm.
Fix style on the line below.

Reported by:	Jenkins (hardware CI lab)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-10-22 17:21:23 +00:00
markj
599c125fa1 Fix thread0 kernel stack initialization on riscv.
- td_kstack_pages was not being initialized.
- td_kstack is supposed to be the base address of the stack region,
  not the top.

The arm ports seem to have similar problems and will be fixed next.

Reported by:	Jenkins via lwhsu
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-10-22 16:52:56 +00:00
kib
6d8044f142 Assert that vnode_pager_setsize() is called with the vnode exclusively locked
except for filesystems that set the MNTK_VMSETSIZE_BUG,  Set the flag for ZFS.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:21:24 +00:00
kib
ae2561e593 Fix interface between nfsclient and vnode pager.
Make the nfsclient always call vnode_pager_setsize() with the vnode
exclusively locked.  This ensures that page fault always can find the
backing page if the object size check succeeded.  Set VV_VMSIZEVNLOCK
flag on NFS nodes.

The main offender breaking the interface in nfsclient is
nfs_loadattrcache(), which is used whenever server responded with
updated attributes, which can happen on non-changing operations as
well.  Also, iod threads only have buffers locked (and even that is
LK_KERNPROC), but they still may call nfs_loadattrcache() on RPC
response.

Instead of immediately calling vnode_pager_setsize() if server
response indicated changed file size, but the vnode is not exclusively
locked, set a new node flag NVNSETSZSKIP.  When the vnode exclusively
locked, or when we can temporary upgrade the lock to exclusive, call
vnode_pager_setsize(), by providing the nfsclient VOP_LOCK() implementation.

Tested by:	pho
Discussed with:	rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:17:38 +00:00
kib
e970ed6a1e Add VV_VMSIZEVNLOCK flag.
The flag specifies that vm_fault() handler should check the vnode'
vm_object size under the vnode lock.  It is converted into the object'
OBJ_SIZEVNLOCK flag in vnode_pager_alloc().

Tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 16:09:25 +00:00
glebius
fd813c4f09 Execute nd6_dad_timer() in the network epoch, since nd6_dad_duplicated()
requires it.
Make nd6_dad_starttimer() require network epoch.  Two calls out of three
happen from nd6_dad_timer().  Enter epoch in the remaining one.
2019-10-22 16:06:33 +00:00
kib
5c066fbc3a vm_fault(): extract code to lock the vnode into a helper vn_fault_lock_vnode().
Tested by:	pho
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D21883
2019-10-22 15:59:16 +00:00
avg
39a9e50d87 nctgpio: improve performance (latency) of operation
This change consists of two parts.

First, nctgpio now supports hardware access via an I/O port window if
it's configured by firmware.  For instance, PC Engines firmware
v4.10.0.2 does that.  This is faster than going through the Super I/O
configuration registers.

Second, nctgpio now caches values of bits that it controls.  For
example, the driver does not need to access the hardware to determine if
a pin is an output or an input, or a state of an output.  Also, the
driver makes use of the fact that the hardware preserves an output state
of a pin accross a switch to the input mode and back.

With this change I am able to use the 1-Wire bus over nctgpio whereas
previously the driver introduced too much latency to be compliant with
the relatively strict protocol timings.

superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
gpio1: <Nuvoton GPIO controller> at GPIO ldn 0x07 on superio0
pcib0: allocated type 4 (0x220-0x226) for rid 0 of gpio1
gpiobus1: <GPIO bus> on gpio1
owc0: <GPIO attached one-wire bus> at pin 4 on gpiobus1
ow0: <1 Wire Bus> on owc0
ow0: romid 28:b2:9e:45:92:10:02:34: no driver
ow_temp0: <Advanced One Wire Temperature> romid 28:b2:9e:45:92:10:02:34 on ow0

MFC after:	4 weeks
2019-10-22 14:20:35 +00:00
markj
93c4e46434 Avoid reloading bucket pointers in uma_vm_zone_stats().
The correctness of per-CPU cache accounting in that function is
dependent on reading per-CPU pointers exactly once.  Ensure that
the compiler does not emit multiple loads of those pointers.

Reported and tested by:	pho
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22081
2019-10-22 14:20:06 +00:00
kevans
eb99e62aa8 tuntap(4): properly declare if_tun and if_tap modules
Simply adding MODULE_VERSION does not do the trick, because the modules
haven't been declared. This should actually fix modfind/kldstat, which
r351229 aimed and failed to do.

This should make vm-bhyve do the right thing again when using the ports
version, rather than the latest version not in ports.

MFC after:	3 days
2019-10-22 00:18:16 +00:00