Commit Graph

264166 Commits

Author SHA1 Message Date
tuexen
56626fc5ad Ensure that the flags indicating IPv4/IPv6 are not changed by failing
bind() calls. This would lead to inconsistent state resulting in a panic.
A fix for stable/11 was committed in
https://svnweb.freebsd.org/base?view=revision&revision=338986
An accelerated MFC is planned as discussed with emaste@.

Reported by:		syzbot+2609a378d89264ff5a42@syzkaller.appspotmail.com
Obtained from:		jtl@
MFC after:		1 day
Sponsored by:		Netflix, Inc.
2019-10-24 20:05:10 +00:00
sjg
e7dfb8eab4 Add support for hypervisor check on x86
Add ficl words for isvirtualized
and move ficl inb and outb words to ficl/x86/sysdep.c
so can be shared by i386 and amd64

Reviewed by:	imp bdrewery
MFC after:	1 week
Sponsored by:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D22069
2019-10-24 20:02:48 +00:00
bz
a41c96bda5 frag6: export another counter read-only by sysctl
Similar to the system global counter also export the per-VNET counter
"frag6_nfragpackets" detailing the current number of fragment packets
in this VNET's reassembly queues.
The read-only counter is helpful for in-VNET statistical monitoring and
for test-cases.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 20:00:37 +00:00
bz
6b64e9286e frag6: fix counter leak in error case and optimise code
In case the first fragmented part (off=0) arrives we check for the
maximum packet size for each fragmented part we already queued with the
addition of the unfragmentable part from the first one.

For one we do not have to enter the loop at all if this is the first
fragmented part to arrive, and we can skip the check.

Should we encounter an error case we send an ICMPv6 message for any
fragment exceeding the maximum length limit.  While dequeueing the
original packet and freeing it, statistics were not updated and leaked
both the reassembly queue count for the fragment and the global
fragment count.  Found by code inspection and confirmed by tightening
test cases checking more statistical and system counters.

While here properly wrap a line.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 19:57:18 +00:00
sjg
66842d2121 Allow loader.efi to identify non-standard boot setup
PATH_BOOTABLE_TOKEN can be set to a non-standard
path that identifies a device as bootable.

Reviewed by: kevans, bcran
Differential Revision:  https://reviews.freebsd.org/D22062
2019-10-24 19:52:41 +00:00
sjg
76aa572ff6 Initialize verbosity and debug level from env
For EFI at least, we can seed the environment
with VE_VERBOSE etc.

Reviewed by:	stevek imp
Sponsored by:	Juniper Networks
MFC after:	1 week
Differential Revision:  https://reviews.freebsd.org/D22135
2019-10-24 19:50:18 +00:00
bz
d4a9dd6f75 frag6.c: do not leak packet queue entry in error case
When we are checking for the maximum reassembled packet size of the
fragmentable part and run into the error case (packet too big),
we are leaking the packet queue enntry if this was a first fragment
to arrive.
Properly cleanup, removing the queue entry from the bucket, decrementing
counters, and freeing the memory.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 19:47:32 +00:00
mckusick
97c391a761 Soft updates needs to keep an on-disk linked list of inodes that
have been unlinked, but are still referenced by open file descriptors.
These inodes cannot be freed until the final file descriptor reference
has been closed. If the system crashes while they are still being
referenced, these inodes and their referenced blocks need to be
freed by fsck. By having them on a linked list with the head pointer
in the superblock, fsck can quickly find and process them rather
than having to check every inode in the filesystem to see if it is
unreferenced.

When updating the head pointer of this list of unlinked inodes in
the superblock, the superblock check-hash was not getting updated.
If the system crashed with the incorrect superblock check-hash, the
superblock would appear to be corrupted. This patch ensures that
the superblock check-hash is updated when updating the head pointer
of the unlinked inodes list.

There is no need to MFC as superblock check hashes first appeared in
13.0.

Tested by:    Peter Holm
Sponsored by: Netflix
2019-10-24 19:47:18 +00:00
gallatin
fb7539ef5c Add a tunable to set the pgcache zone's maxcache
When it is set to 0 (the default), a heavy Netflix-style web workload
suffers from heavy lock contention on the vm page free queue called from
vm_page_zone_{import,release}() as the buckets are frequently drained.
When setting the maxcache, this contention goes away.

We should eventually try to autotune this, as well as make this
zone eligable for uma_reclaim().

Reviewed by:	alc, markj
Not Objected to by: jeff
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22112
2019-10-24 18:39:05 +00:00
jhb
7622bc9ddb Use a counter with a random base for explicit IVs in GCM.
This permits constructing the entire TLS header in ktls_frame() rather
than ktls_seq().  This also matches the approach used by OpenSSL which
uses an incrementing nonce as the explicit IV rather than the sequence
number.

Reviewed by:	gallatin
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22117
2019-10-24 18:13:26 +00:00
bz
27aa98f3aa frag6: leave a note about upper layer header checks TBD
Per sepcification the upper layer header needs to be within the first
fragment.  The check was not done so far and there is an open review for
related work, so just leave a note as to where to put it.
Move the extraction of frag offset up to this as it is needed to determine
whether this is a first fragment or not.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 12:16:15 +00:00
bz
ac29b94465 frag6: check global limits before hash and lock
Check whether we are accepting more fragments (based on global limits)
before doing expensive operations of calculating the hash and taking the
bucket lock.   This slightly increases a "race" between check time and
incrementing counters (which is already there) possibly allowing a few
more fragments than the maximum limits.  However, when under attack,
we rather save this CPU time for other packets/work.

MFC after:		3 weeks
Sponsored by:		Netflix
2019-10-24 11:58:24 +00:00
tuexen
ddaae075ce Store a handle for the event handler. This will be used when unloading the
SCTP as a module.

Obtained from:		markj@
2019-10-24 09:22:23 +00:00
bz
6d29bbbab8 frag6: small improvements
Rather than walking the mbuf chain manually use m_last() which doing
exactly that for us.
Defer initializing srcifp for longer as there are multiple exit paths
out of the function which do not need it set.  Initialize before taking
the lock though.
Rename the mtx lock to match the type better.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 08:15:40 +00:00
bz
8194a6554c frag6: remove IP6_REASS_MBUF macro
The IP6_REASS_MBUF() macro did some pointer gynmastics to end up with the
same type as it gets in [*(cast **)&].  Spelling it out instead saves all
this and makes the code more readable and less obfuscated directly using
the structure field.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-24 07:53:10 +00:00
tsoome
72f4bed476 userboot/test should use PRIx64 as one would expect from prefix 0x
Test is printing decimal value after prefix 0x.
2019-10-24 07:49:33 +00:00
rrs
f651cbcfcc Fix a small bug in bbr when running under a VM. Basically what
happens is we are more delayed in the pacer calling in so
we remove the stack from the pacer and recalculate how
much time is left after all data has been acknowledged. However
the comparision was backwards so we end up with a negative
value in the last_pacing_delay time which causes us to
add in a huge value to the next pacing time thus stalling
the connection.

Reported by:	vm2.finance@gmail.com
2019-10-24 05:54:30 +00:00
jhibbits
92b9f222a9 powerpc/booke: Simplify the MPC85XX PCIe root complex driver
Summary:
Due to bugs in the enumeration code, fsl_pcib_init() was not configuring
sub-bridges properly, so devices hanging off a separate bridge would not
be found.  Since the generic PCI code already supports probing child
buses, just delete this code and initialize only the device itself,
letting the generic code handle all the additional probing and
initializing.

This also deletes setup for some PCI peripherals found on some MPC85XX
evaluation boards.  The code can be resurrected if needed, but overly
complicated this code in the first place.

Reviewed by:	bdragon
Differential Revision:	https://reviews.freebsd.org/D22050
2019-10-24 03:51:33 +00:00
erj
a49574e5a2 iflib: call ether_ifdetach and netmap_detach before stop
From Jake:
Calling ether_ifdetach after iflib_stop leads to a potential race where
a stale ifp pointer can remain in the route entry list for IPv6 traffic.
This will potentially cause a page fault or other system instability if
the ifp pointer is accessed.

Move both iflib_netmap_detach and ether_ifdetach to be called prior to
iflib_stop. This avoids the race above, and helps ensure that other ifp
references are removed before stopping the interface.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, jhb@
MFC after:	3 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D22071
2019-10-23 23:20:49 +00:00
bz
06d37eb5ee frag6: add "big picture"
Add some ASCII relation of how the bits plug together.  The terminology
difference of "fragmented packets" and "fragment packets" is subtle.
While here clear up more whitespace and comments.

No functional change.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-10-23 23:10:12 +00:00
bz
0c95efba5c frag6: replace KAME hand-rolled queues with queue(9) TAILQs
Remove the KAME custom circular queue for fragments and fragmented packets
and replace them with a standard TAILQ.
This make the code a lot more understandable and maintainable and removes
further hand-rolled code from the the tree using a standard interface instead.

Hide the still public structures under #ifdef _KERNEL as there is no
use for them in user space.
The naming is a bit confusing now as struct ip6q and the ip6q[] buckets
array are not the same anymore;  sadly struct ip6q is also used by the
MAC framework and we cannot rename it.

Submitted by:	jtl (initally)
MFC after:	3 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D16847 (jtl's original)
2019-10-23 23:01:18 +00:00
markj
7670099e0b Modify release_page() to handle a missing fault page.
r353890 introduced a case where we may call release_page() with
fs.m == NULL, since the fault handler may now lock the vnode prior
to allocating a page for a page-in.

Reported by:	jhb
Reviewed by:	kib
MFC with:	r353890
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22120
2019-10-23 20:39:21 +00:00
bz
7b56e1d765 frag6: whitespace changes
Remove trailing white space, add a blank line, and compress a comment.
No functional changes.

MFC after:	10 days
Sponsored by:	Netflix
2019-10-23 20:37:15 +00:00
emaste
d0f324ec6a arm64: enable options NUMA in GENERIC
As with amd64 NUMA is required for reasonable operation on big-iron
arm64 systems and is expected to have no significant impact on small
systems.  Enable it now for wider testing in advance of FreeBSD 13.0.

You can use the 'vm.ndomains' sysctl to see if multiple domains are in
use - for example (from Cavium/Marvell ThunderX2):

# sysctl vm.ndomains
vm.ndomains: 2

No objection:	manu
Sponsored by:	The FreeBSD Foundation
2019-10-23 19:35:26 +00:00
imp
2e590dfca4 exit requires stdlib.h to be included to use.
FreeBSD 10.3 requires this, and dtc is a bootstrap tool so it needs to compile
there.
2019-10-23 19:23:31 +00:00
mjg
570f22e680 amd64 pmap: per-domain pv chunk list
This significantly reduces contention since chunks get created and removed
all the time. See the review for sample results.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21976
2019-10-23 19:17:10 +00:00
cem
c6d50eff8a amd64: Add CFI directives for libc syscall stubs
No functional change (in program code).  Additional DWARF metadata is
generated in the .eh_frame section.  Also, it is now a compile-time
requirement that machine/asm.h ENTRY() and END() macros are paired.  (This
is subject to ongoing discussion and may change.)

This DWARF metadata allows llvm-libunwind to unwind program stacks when the
program is executing the function.  The goal is to collect accurate
userspace stacktraces when programs have entered syscalls.

(The motivation for "Call Frame Information," or CFI for short -- not to be
confused with Control Flow Integrity -- is to sufficiently annotate assembly
functions such that stack unwinders can unwind out of the local frame
without the requirement of a dedicated framepointer register; i.e.,
-fomit-frame-pointer.  This is necessary for C++ exception handling or
collecting backtraces.)

For the curious, a more thorough description of the metadata and some
examples may be found at [1] and documentation at [2].  You can also look at
'cc -S -o - foo.c | less' and search for '.cfi_' to see the CFI directives
generated by your C compiler.

[1]: https://www.imperialviolet.org/2017/01/18/cfi.html
[2]: https://sourceware.org/binutils/docs/as/CFI-directives.html

Reviewed by:	emaste, kib (with reservations)
Differential Revision:	https://reviews.freebsd.org/D22122
2019-10-23 19:03:03 +00:00
cem
3866b27a9c libthr: Add missing END() directive for umtx_op_err (amd64)
Like r353929, related to D22122.  No functional change.

Reviewed by:	emaste, kib (earlier version both)
2019-10-23 18:27:30 +00:00
markj
3a3f11c990 Check for bogus_page in vnode_pager_generic_getpages_done().
We now assert that a page is busy when updating its validity-tracking
state, but bogus_page is not busied during a getpages operation.

Reported by:	syzkaller
Reviewed by:	alc, kib
Discussed with:	jeff
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22124
2019-10-23 18:00:22 +00:00
markj
a789d36dbe Verify identity after checking for WAITFAIL in vm_page_busy_acquire().
A caller that does not guarantee that a page's identity won't change
while sleeping for a busy lock must specify either NOWAIT or WAITFAIL.

Reported by:	syzkaller
Reviewed by:	alc, kib
Discussed with:	jeff
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22124
2019-10-23 17:58:19 +00:00
rstone
3878ab63de Add missing M_NOWAIT flag
The LinuxKPI linux_dma code calls PCTRIE_INSERT with a
mutex held, but does not set M_NOWAIT when allocating
nodes, leading to a potential panic.  All of this code
can handle an allocation failure here, so prefer an
allocation failure to sleeping on memory.

Also fix a related case where NOWAIT/WAITOK was not
specified.  In this case it's not clear whether sleeping
is allowed so be conservative and assume not.  There are
a lot of other paths in this code that can fail due to
a lack of memory anyway.

Differential Revision: https://reviews.freebsd.org/D22127
Reviewed by: imp
Sponsored by: Dell EMC Isilon
MFC After: 1 week
2019-10-23 17:20:20 +00:00
dim
1cdfabf5f2 Build toolchain components as dynamically linked executables by default
Summary:
Historically, we have built toolchain components such as cc, ld, etc as
statically linked executables.  One of the reasons being that you could
sometimes save yourself from botched upgrades, by e.g. recompiling a
"known good" libc and reinstalling it.

In this day and age, we have boot environments, virtual machine
snapshots, cloud backups, and other much more reliable methods to
restore systems to working order.  So I think the time is ripe to flip
this default, and link the toolchain components dynamically, just like
almost all other executables on FreeBSD.

Maybe at some point they can even become PIE executables by default! :)

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D22061
2019-10-23 17:02:45 +00:00
dim
60bfac69c1 Bump clang's default target CPU for the i386 architecture (aka "x86") to
i686, as per the discussion on the freebsd-arch mailing list.  Earlier
in r352030, I had already bumped it to i586, to work around missing
atomic 64 bit functions for the i386 architecture.

Relnotes:	yes
2019-10-23 16:57:11 +00:00
markj
4f6e3684c3 Set OBJ_NOSPLIT on the ksyms(4) VM object.
The object does not provide anonymous memory.

Reported by:	kib
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22123
2019-10-23 16:53:37 +00:00
cem
cae3e20483 Prevent a panic when a driver provides bogus debugnet parameters
This is just a bandaid; we should fix the driver(s) too.  Introduced in
r353685.

PR:		241403
X-MFC-With:	r353685
Reported by:	np and others
2019-10-23 16:48:22 +00:00
dim
4eb815fa6a Slightly expand description of WITH_SHARED_TOOLCHAIN, add a
corresponding WITHOUT_SHARED_TOOLCHAIN description, and regenerate
src.conf(5).

MFC after:	 3 days
2019-10-23 16:48:17 +00:00
jhb
e08f239a9e Strip "sf" suffix when generating a target triple.
This fixes the target triple used when compiling riscv64sf with clang.

Discussed with:	mhorne
MFC after:	2 weeks
Sponsored by:	DARPA
2019-10-23 16:43:51 +00:00
jhb
8eea4337db Fix atomic_*cmpset32 on riscv64 with clang.
The lr.w instruction used to read the value from memory sign-extends
the value read from memory.  GCC sign-extends the 32-bit comparison
value passed in whereas clang currently does not.  As a result, if the
value being compared has the MSB set, the comparison fails for
matching 32-bit values when compiled with clang.

Use a cast to explicitly sign-extend the unsigned comparison value.
This works with both GCC and clang.

There is commentary in the RISC-V spec that suggests that GCC's
approach is more correct, but it is not clear if the commentary in the
RISC-V spec is binding.

Reviewed by:	mhorne
Obtained from:	Axiado
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D22084
2019-10-23 16:41:31 +00:00
kib
a6dbd93798 Fix undefined behavior.
Create a sequence point by ending a full expression for call to
vspace() and use of the globals which are modified by vspace().

Reported and reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D22126
2019-10-23 16:06:47 +00:00
cem
b06191153e libm: Add missing END() directives for amd64 routines
No functional change.  Related to D22122.

Reviewed by:	emaste, kib (earlier version both)
2019-10-23 16:05:52 +00:00
kib
2393dd146c vn_printf(): Decode VI_TEXT_REF.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-10-23 15:51:26 +00:00
andrew
a0c18fcde6 Stop enabling interrupts when reentering kdb on arm64
When we raise a data abort from the kernel we need to enable interrupts,
however we shouldn't be doing this when in the kernel debugger. In this
case interrupts can lead to a further panic as they don't expect to be
run from such a context.

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-10-23 13:21:15 +00:00
manu
62a17e539a regulator: Add a regnode_set_constraint function
This method check that boot_on or always_on is set to 1 and if it
is it will try to enable the regulator.
The binding docs aren't clear on what to do but Linux enable the regulator
if any of those properties is set so we want to do the same.
The function first check the status to see if the regulator is
already enabled it then get the voltage to check if it is in a acceptable
range and then enables it.
This will be either called from the regnode_init method (if it's needed by the platform)
or by a SYSINIT at SI_SUB_LAST

Reviewed by:	mmel
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22106
2019-10-23 09:56:53 +00:00
manu
e09708bc24 axp81x: Use the default regnode_init method
MFC after:	1 week
2019-10-23 09:54:50 +00:00
manu
5fdced0195 regulator: Add a regnode_method_init
This is a default init method for regulator that don't really
need one.

MFC after:	1 week
2019-10-23 09:54:12 +00:00
kib
b382fb1fa5 Assert that vm_fault_lock_vnode() returns locked saved vnode.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D22113
2019-10-23 07:36:26 +00:00
kevans
6b5b767237 cap_sysctl: correct typo from r347534-ish
operation & ~limit where limit is a bool is clearly not what was intended,
given the line prior. Correct it to use the calculated mask for validation.

The cap_sysctl tests should now be functional again.
2019-10-23 03:23:14 +00:00
kevans
83fa8806ca tuntap(4): Fix NOINET build after r353741
Shuffle headers around to more appropriate #ifdef OPTION blocks (INET vs.
INET6) -- double checked LINT-{NOINET,NOINET6,NOIP}, all seem good.

Reported by:	cem
2019-10-23 02:15:15 +00:00
kevans
febc320724 libcasper/services: include <src.opts.mk> to hook tests to build
Note that the cap_sysctl tests are currently failing and need some
attention.
2019-10-23 01:50:41 +00:00
grog
6bc988a89b Correct spelling, apply appropriate respect. 2019-10-23 01:11:25 +00:00