124661 Commits

Author SHA1 Message Date
Conrad Meyer
0d1467b199 netdump: Fix netdumping with INVARIANTS kernels
Correct boneheaded assertion I added in r339501.  Mea culpa.

The intent is to notice when an M_WAITOK zone allocation would fail during
netdump, not to prevent all use of mbufs during netdump.

Reviewed by:	markj
X-MFC-With:	r339501
Differential Revision:	https://reviews.freebsd.org/D17957
2018-11-12 05:24:20 +00:00
Konstantin Belousov
8782eef46f Remove one-use variable.
This also removes a lot of #ifdefs and cleans up a warning when the
AUDIT kernel option is defined, but neither KDTRACE_HOOKS nor MAC are.

Reported and tested by:	danger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-11-11 00:21:28 +00:00
Konstantin Belousov
ade85c5eec Allow absolute paths for O_BENEATH.
The path must have a tail which does not escape starting/topping
directory.  The documentation will come shortly, see the man pages
commit message for the reason for the separate commit.

Reviewed by:	jilles (previous version)
Discussed with:	emaste
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17714
2018-11-11 00:04:36 +00:00
Vladimir Kondratyev
236e308af1 wmt(4): Add PNP record so it could be picked by devd/devmatch.
Fix uhid(4) conflict with blacklisting of multitouch HID-usages
in uhid(4) probe handler.

Reviewed by:		imp
No objections from:	hps
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D17689
2018-11-10 22:14:09 +00:00
Emmanuel Vadot
5cc57c208a Update our devicetree to 4.19 for arm and arm64
MFC after:	2 months
2018-11-10 21:02:32 +00:00
Mark Johnston
0e48e06807 Re-apply r336984, reverting r339934.
r336984 exposed the bug fixed in r340241, leading to the initial revert
while the bug was being hunted down.  Now that the bug is fixed, we
can revert the revert.

Discussed with:	alc
MFC after:	3 days
2018-11-10 20:33:08 +00:00
Mark Johnston
86af1d0241 Ensure that IP fragments do not extend beyond IP_MAXPACKET.
Such fragments are obviously invalid, and when processed may end up
violating the sort order (by offset) of fragments of a given packet.
This doesn't appear to be exploitable, however.
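
A minimal sketch of the bound described above, assuming byte offsets and lengths. The helper name `frag_within_maxpacket` and its parameters are hypothetical and are not taken from the commit; only the 65535-byte `IP_MAXPACKET` limit is a known constant.

```c
#include <stdbool.h>
#include <stdint.h>

#define	IP_MAXPACKET	65535	/* largest IP datagram, in bytes */

/*
 * Hypothetical helper: return true if a fragment that starts at byte
 * offset `off` and carries `len` payload bytes ends at or before
 * IP_MAXPACKET.  The sum is computed in 32 bits so it cannot wrap.
 */
static bool
frag_within_maxpacket(uint16_t off, uint16_t len)
{
	return ((uint32_t)off + (uint32_t)len <= IP_MAXPACKET);
}
```

A fragment at offset 65528 may carry at most 7 bytes under this check; one more byte would extend past the limit.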

Reviewed by:	emaste
Discussed with:	jtl
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17914
2018-11-10 03:00:36 +00:00
Justin Hibbits
266b2aa146 powerpc: Use MAX() macro instead of max() inline function to calculate Maxmem
Maxmem is the highest address for physical memory in the system.  It's
measured in pages which, since max() returns a u_int, should allow for up to
2^44 bytes of memory addressable by the system.  However, on POWER9 systems
at least, memory addressed by additional socketed CPUs begins at addresses
far above the 2^44 mark, causing issues with memory accesses and DMA, when
memory is addressed on the auxiliary CPUs.  Use the MAX() macro instead,
which doesn't convert arguments, so retains Maxmem and all calculations as
its defined long type (64-bit on powerpc64), keeping the maximum address
correct.
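
The truncation can be illustrated with a small sketch. It assumes a 32-bit `unsigned int`; `max_inline`, `MAX_MACRO`, `maxmem_inline` and `maxmem_macro` are stand-ins written for this example, not the kernel's definitions.

```c
#include <stdint.h>

/* Stand-in for an inline max() whose arguments and result are u_int. */
static inline unsigned int
max_inline(unsigned int a, unsigned int b)
{
	return (a > b ? a : b);
}

/* Stand-in for a MAX() macro: operands keep their declared type. */
#define	MAX_MACRO(a, b)	(((a) > (b)) ? (a) : (b))

/* Hypothetical: track the highest page number via the inline function. */
static uint64_t
maxmem_inline(uint64_t cur, uint64_t candidate)
{
	/* Both values are truncated to unsigned int before comparing. */
	return (max_inline((unsigned int)cur, (unsigned int)candidate));
}

/* Hypothetical: the same calculation with the macro, kept at 64 bits. */
static uint64_t
maxmem_macro(uint64_t cur, uint64_t candidate)
{
	return (MAX_MACRO(cur, candidate));
}
```

With a candidate of 2^33 pages, the inline variant sees the truncated value 0 and keeps the old maximum, while the macro variant returns 2^33.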

Submitted by:	mmacy
2018-11-10 02:37:56 +00:00
Alexander Motin
1fcdb58634 Do not ignore arc_adjust() return value.
This covers scenario when ARC may not shrink as fast as it could:
1. arc_size < arc_c and arc_adjust() does not evict anything, returning
   zero to arc_reclaim_thread();
2. arc_available_memory() reports memory pressure, which can not be
   satisfied by arc_kmem_reap_now();
3. arc_shrink() reduces arc_c and calls arc_adjust(), return of which is
   ignored;
4. even if the last arc_adjust() could not satisfy arc_size < arc_c,
   arc_reclaim_thread() will still go to sleep, since the first one
   returned zero.

Reviewed by:	allanjude, markj, sef
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D17927
2018-11-10 01:58:37 +00:00
Stephen Hurd
cf49cdd5a3 Fix first-packet completion
The first packet after the ring is initialized was never
completed as isc_txd_credits_update() would not include it in the
count of completed packets. This caused netmap to never complete
a batch. See PR 233022 for more details.

PR:		233022
Reported by:	lev
Reviewed by:	lev
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17931
2018-11-09 22:18:43 +00:00
John Baldwin
fe03ca08a6 Use tcp_state_change() in the cxgbe(4) TOE module.
r254889 added tcp_state_change() as a centralized place to log state
changes in TCP connections for DTrace.  r294869 and r296881 took
advantage of this central location to manage per-state counters.
However, TOE sockets were still performing some (but not all) state
change updates via direct assignments to t_state.  This resulted in
state counters underflowing when TOE was in use.  Fix by using
tcp_state_change() when changing a TOE connection's state.
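
The underflow can be sketched with a toy model. All names here (`conn`, `state_count`, `conn_state_change`) are hypothetical and only mimic the idea of a central transition function that maintains per-state counters.

```c
enum conn_state { ST_CLOSED, ST_ESTABLISHED, ST_NSTATES };

/* Hypothetical per-state counters, one slot per connection state. */
static unsigned long state_count[ST_NSTATES];

struct conn {
	enum conn_state state;
};

/* A new connection starts in ST_CLOSED and is counted there. */
static void
conn_init(struct conn *c)
{
	c->state = ST_CLOSED;
	state_count[ST_CLOSED]++;
}

/* Central transition: the old state's counter drops, the new one rises. */
static void
conn_state_change(struct conn *c, enum conn_state newstate)
{
	state_count[c->state]--;
	c->state = newstate;
	state_count[newstate]++;
}
```

If one transition is made by assigning `c->state` directly and a later one goes through `conn_state_change()`, the later call decrements a counter that was never incremented, and the unsigned counter wraps around.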

Reviewed by:	np, markj
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D17915
2018-11-09 21:16:45 +00:00
Brooks Davis
4b499c75f9 Regen after r340302: Fix freebsd32 mknod(at).
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17928
2018-11-09 21:02:07 +00:00
Brooks Davis
9a38df59e9 Fix freebsd32 mknod(at).
As dev_t is now a 64-bit integer, it requires special handling as a
system call argument.  64-bit arguments are split between two 64-bit
integers due to the way arguments are promoted to allow reuse of most
system call implementations.  They must be reassembled before use.
Further, 64-bit arguments at an odd offset (counting from zero) are
padded and slid to the next slot on powerpc and mips.  Fix the
non-COMPAT11 system call by adding a freebsd32_mknodat() and
appropriately padded declarations.
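
As a rough sketch of the reassembly and padding rules described above: `join_halves` and `first_slot_for_arg64` are hypothetical helpers, and which slot holds the low half depends on the architecture's byte order, which this example does not model.

```c
#include <stdint.h>

/*
 * Hypothetical helper: each 32-bit argument arrives promoted into its
 * own 64-bit slot, so a 64-bit value spans two slots.  Only the low
 * 32 bits of each slot are meaningful.
 */
static uint64_t
join_halves(uint64_t lo_slot, uint64_t hi_slot)
{
	return ((hi_slot & 0xffffffffULL) << 32 | (lo_slot & 0xffffffffULL));
}

/*
 * Hypothetical helper: on ABIs that pad 64-bit arguments, a value that
 * would start at an odd slot (counting from zero) is moved to the next
 * even slot, leaving one slot of padding.
 */
static int
first_slot_for_arg64(int next_free_slot)
{
	return ((next_free_slot + 1) & ~1);
}
```

For a call shaped like mknodat(fd, path, mode, dev), the 64-bit `dev` would otherwise start at slot 3, so under this padding rule it starts at slot 4.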

The COMPAT11 system calls are fully compatible with the 64-bit
implementations so remove the freebsd32_ versions.

Use uint32_t consistently as the type of the old dev_t.  This matches
the old definition.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17928
2018-11-09 21:01:16 +00:00
Ed Maste
c4698dec73 Add comment to explain kernel ldscript 0x200000 constant
Reported by:	linimon
2018-11-09 20:33:38 +00:00
Ed Maste
1d3ffc719e Octeon SDK: avoid use of uninitialized variable
Reported by:	Clang
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-11-09 19:17:25 +00:00
Ed Maste
8573e2c388 use -m ${LD_EMULATION} for binary->elf link invocation
r306041 changed ld invocations for converting binary files to kernel
ELF objects to pass -m, but missed bespoke ld invocations in a pair of
arm file configs (one of which has since been removed).

This is needed to support some external toolchains and lld.

Sponsored by:	The FreeBSD Foundation
2018-11-09 19:16:01 +00:00
Kyle Evans
13cf5074d0 Use ${ECHO} in dtb/dtbo build, pass in from dtb.mk for -s
Reported by:	sbruno
MFC after:	3 days
2018-11-09 18:56:40 +00:00
Brooks Davis
1632f36305 Regen after r340294: Fix a number of bugs in freebsd32's capabilities.conf.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17925
2018-11-09 18:06:25 +00:00
Brooks Davis
d457c0b61b Fix a number of bugs in freebsd32's capabilities.conf.
Bugs range from failure to update after changing syscall implementation
names to using the wrong name.  Somewhat confusingly, the name in
capabilities.conf is exactly the string that appears in syscalls.master,
not the name with a COMPAT* prefix which is the actual function name.

Found while making a change to use the default capabilities.conf.

Fixes:	r335177, r336980, r340272, r340274, others
Reviewed by:	kib, emaste
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17925
2018-11-09 18:03:01 +00:00
Brooks Davis
e1e300b671 Regen after r340274: Make freebsd32_umtx_op follow the freebsd32_foo
convention.
2018-11-09 00:46:50 +00:00
Brooks Davis
b34f4419fb Make freebsd32_umtx_op follow the freebsd32_foo convention.
Sponsored by:	DARPA, AFRL
2018-11-09 00:46:10 +00:00
Brooks Davis
86e06fa55b Regen after r340272: Make __sysctl follow the freebsd32_foo convention
Sponsored by:	DARPA, AFRL
2018-11-09 00:22:45 +00:00
Brooks Davis
4074751718 Make __sysctl follow the freebsd32_foo convention.
Sponsored by:	DARPA, AFRL
2018-11-09 00:21:58 +00:00
Kristof Provost
87e4ca37d5 pf: Prevent tables referenced by rules in anchors from getting disabled.
PR:		183198
Obtained from:	OpenBSD
MFC after:	2 weeks
2018-11-08 21:54:40 +00:00
Justin Hibbits
8f04c0c06b powerpc64: Fix "show spr" command on ELFv2 kernels
Summary: When compiling for ELFv2, it is necessary to adjust the offset to
get_spr and factor in the function prologue to ensure the correct instruction is
being edited.

Test Plan:
Before:
```
db> show spr 110
KDB: reentering
KDB: stack backtrace:
0xc008000020fb96e0: at 0xc000000002bb2e34 = kdb_backtrace+0x68
0xc008000020fb97f0: at 0xc000000002bb3798 = kdb_reenter+0x54
0xc008000020fb9860: at 0xc000000002f87090 = trap+0x4e4
0xc008000020fb9990: at 0xc000000002f78a60 = powerpc_interrupt+0x110
0xc008000020fb9a20: kernel trap 0xe40 by 0xc000000002401978 = get_spr+0x8: srr1=0x9000000000001032
            r1=0xc008000020fb9cd0 cr=0x80009438 xer=0x20040000 ctr=0xc000000002f7b40c r2=0xc0000000037fd000
saved LR(0xfffffffffffffffb) is invalid.
```

After:

```
db> show spr 110
SPR 272(110): c000000003cae900
```

Submitted by:	git_bdragon.rtk0.net
Differential Revision: https://reviews.freebsd.org/D17813
2018-11-08 20:48:44 +00:00
Justin Hibbits
ad39591ad2 powerpc/powernv: Restrict the busdma tag to only POWER8
It seems this tag is causing problems on POWER9 systems.  Since no POWER9 user
has encountered the problem fixed by r339589 just restrict it to POWER8 for now.
A better fix will likely be to update powerpc/busdma_machdep.c to handle the
window correctly.

Reported by:	mmacy, others
2018-11-08 20:31:12 +00:00
Ed Maste
2bfaf585ca Avoid buffer underwrite in icmp_error
icmp_error allocates either an mbuf (with pkthdr) or a cluster depending
on the size of data to be quoted in the ICMP reply, but the calculation
failed to account for the additional padding that m_align may apply.

Include the ip header in the size passed to m_align.  On 64-bit archs
this will have the net effect of moving everything 4 bytes later in the
mbuf or cluster.  This will result in slightly pessimal alignment for
the ICMP data copy.

Also add an assertion that we do not move m_data before the beginning of
the mbuf or cluster.
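
The arithmetic behind the underwrite can be sketched as follows. This is an illustration under stated assumptions, not the code from icmp_error: the buffer size, the 8-byte alignment and all function names are hypothetical.

```c
#define	SKETCH_ALIGN	8	/* assumed alignment, as on a 64-bit arch */

/*
 * Place `len` bytes so that they end near the end of a `buflen`-byte
 * buffer, rounding the start offset down to the alignment.
 */
static long
align_offset(long buflen, long len)
{
	return ((buflen - len) & ~(long)(SKETCH_ALIGN - 1));
}

/* Align for the payload only, then step back over the header. */
static long
header_offset_old(long buflen, long hdrlen, long payload)
{
	return (align_offset(buflen, payload) - hdrlen);
}

/* Align for header plus payload, so the header starts inside the buffer. */
static long
header_offset_new(long buflen, long hdrlen, long payload)
{
	return (align_offset(buflen, hdrlen + payload));
}
```

With a 100-byte buffer, a 20-byte header and a 78-byte payload, the old calculation puts the header at offset -4, four bytes before the buffer, while the new one puts it at offset 0.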

Reported by:	A reddit user
Reviewed by:	bz, jtl
MFC after:	3 days
Security:	CVE-2018-17156
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17909
2018-11-08 20:17:36 +00:00
Eric van Gyzen
68b840878c in6_ifattach_linklocal: handle immediate removal of the new LLA
If another thread immediately removes the link-local address
added by in6_update_ifa(), in6ifa_ifpforlinklocal() can return NULL,
so the following assertion (or dereference) is wrong.
Remove the assertion, and handle NULL somewhat better than panicking.
This matches all of the other callers of in6_update_ifa().

PR:		219250
Reviewed by:	bz, dab (both an earlier version)
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17898
2018-11-08 19:50:23 +00:00
Eric Joyner
37761e2eda ixl/iavf(4): Fix TSO offloads when TXCSUM is disabled
From Jake:
The iflib stack does not disable TSO automatically when TXCSUM is
disabled, instead assuming that the driver will correctly handle TSOs
even when CSUM_IP is not set.

This results in iflib calling ixl_isc_txd_encap with packets which have
CSUM_IP_TSO, but do not have CSUM_IP or CSUM_IP_TCP set. Because of
this, ixl_tx_setup_offload will not setup the IPv4 checksum offloading.

This results in bad TSO packets being sent if a user disables TXCSUM
without disabling TSO.

Fix this by updating the ixl_tx_setup_offload function to check both
CSUM_IP and CSUM_IP_TSO when deciding whether to enable IPv4 checksums.
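
A sketch of the changed condition, using made-up flag values. The `SK_` constants and both function names are hypothetical; the real CSUM_* values and the body of ixl_tx_setup_offload are not shown in this log.

```c
/* Illustrative flag values only; the real CSUM_* constants differ. */
#define	SK_CSUM_IP	0x1u
#define	SK_CSUM_IP_TCP	0x2u
#define	SK_CSUM_IP_TSO	0x4u

/* Before: IPv4 checksum offload is set up only when CSUM_IP is present. */
static int
need_ipv4_csum_old(unsigned int csum_flags)
{
	return ((csum_flags & SK_CSUM_IP) != 0);
}

/* After: a TSO request alone is enough to set up IPv4 checksumming. */
static int
need_ipv4_csum_new(unsigned int csum_flags)
{
	return ((csum_flags & (SK_CSUM_IP | SK_CSUM_IP_TSO)) != 0);
}
```

A packet flagged only with the TSO bit is skipped by the old check and handled by the new one.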

Once this is corrected, another issue for TSO packets is revealed. The
driver sets IFLIB_NEED_ZERO_CSUM in order to enable a work around that
causes the ip->sum field to be zero'd. This is necessary for ixl
hardware to correctly perform TSOs.

However, if TXCSUM is disabled, then the work around is not enabled, as
CSUM_IP will not be set when the iflib stack checks to see if it should
clear the sum field.

Fix this by adding IFLIB_TSO_INIT_IP to the iflib flags for the iavf and
ixl interface files.

It is uncertain if the hardware needs IFLIB_NEED_ZERO_CSUM for any other
case besides TSO, so leave that flag assigned. It may be worth
investigating to see if this work around flag could be disabled in
a future change.

Once both of these changes are made, the ixl driver should correctly
offload TSO packets when TSO4 offload is enabled, regardless of whether
TXCSUM is enabled or disabled.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, shurd@
MFC after:	0 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D17900
2018-11-08 19:10:43 +00:00
Mark Johnston
5186028dc4 Use --work-tree instead of specifying an absolute path.
Otherwise the diff command being run from outside the checkout resulted
in warnings.

Discussed with:	emaste
X-MFC with:	r340083
2018-11-08 17:20:00 +00:00
Mateusz Guzik
f1161465f4 amd64: align memset buffers to 16 bytes before using rep stos
Both Intel manual and Agner Fog's docs suggest aligning to 16.
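
The idea can be sketched in portable C. The commit itself changes amd64 assembly; here `memset()` merely stands in for the `rep stos` bulk store, and `head_to_align16` and `memset_align16` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes from `p` up to the next 16-byte boundary (0 if aligned). */
static size_t
head_to_align16(const void *p)
{
	return ((16 - ((uintptr_t)p & 15)) & 15);
}

/* Fill the unaligned head byte by byte, then bulk-fill the aligned rest. */
static void *
memset_align16(void *dst, int c, size_t len)
{
	unsigned char *p = dst;
	size_t head = head_to_align16(p);
	size_t i;

	if (head > len)
		head = len;
	for (i = 0; i < head; i++)
		p[i] = (unsigned char)c;
	memset(p + head, c, len - head);	/* stand-in for rep stos */
	return (dst);
}
```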

See the review for benchmark results.

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17661
2018-11-08 15:12:36 +00:00
Hans Petter Selasky
fb8a716d28 Don't read the USB audio sync endpoint when we don't use it to save
isochronous bandwidth.

MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-11-08 12:46:47 +00:00
Mark Johnston
150d384e5c Fix a use-after-free in swp_pager_meta_free().
This was introduced in r326329 and explains the crashes mentioned in
the commit log message for r339934.  In particular, on INVARIANTS
kernels, UMA trashing causes the loop to exit early, leaving swap
blocks behind when they should have been freed.  After r336984 this
became more problematic since new anonymous mappings were more
likely to reuse swapped-out subranges of existing VM objects, so faults
would trigger pageins of freed memory rather than returning zeroed
pages.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17897
2018-11-07 23:28:11 +00:00
Ed Maste
179460e148 newvers.sh: avoid regenerating vers.c if content unchanged
When reproducible build mode is enabled vers.c may be unchanged between
successive builds.  In this case avoid changing the file's metadata so
that it does not cause dependent targets to be rebuilt.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D17892
2018-11-07 20:36:57 +00:00
Stephen Hurd
a42546df88 Fix rxcsum issue introduced in r338838
r338838 attempted to fix issues with rxcsum and rxcsum6.
However, the rxcsum bits were set as though if_setcapenablebit() was
being called, not if_togglecapenable() which is in use. As a result,
it was not possible to disable rxcsum when rxcsum6 was supported.

PR:		233004
Reported by:	lev
Reviewed by:	lev
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17881
2018-11-07 19:31:48 +00:00
John Baldwin
4bf4b0f139 Enable non-executable stacks by default on RISC-V.
Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17878
2018-11-07 18:32:02 +00:00
John Baldwin
c5e797a836 Drop the legacy ELF brandinfo for the old rtld from arm64 and riscv.
These architectures never shipped binaries with an rtld path of
/usr/libexec/ld-elf.so.1.

Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17876
2018-11-07 18:28:55 +00:00
John Baldwin
274c0a806a Enable use of a global shared page for RISC-V.
machine/vmparam.h already defines the SHAREDPAGE constant.  This
change just enables it for ELF executables.  The only use of the
shared page currently is to hold the signal trampoline.

Reviewed by:	markj, kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17875
2018-11-07 18:27:43 +00:00
Brooks Davis
5577e44bf4 Regen after r340221: allow pointer return types.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17873
2018-11-07 16:56:07 +00:00
Brooks Davis
e56ec0e519 makesyscalls.sh: allow pointer return types.
The previous code required that the return type be a single word.  This
allows it to be a pointer without using a typedef.

Update the return types of break, mmap, and shmat to be void * as
declared.  This only affects systrace output in-tree, but can aid in
generating system call wrappers from syscalls.master.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17873
2018-11-07 16:55:04 +00:00
Maxim Sobolev
de66da7374 Revert r340187, it breaks EOD (end-of-device) detection logic. Turns out,
i/o into last_sector+N is handled differently for N==1 and N>1 cases to
accommodate that, so some other approach would be needed to fix DIOCGDELETE
ioctl(2).
2018-11-07 16:28:09 +00:00
Alex Richardson
57fe7128b7 Handle the DT_MIPS_RLD_MAP_REL dynamic tag in RTLD
This dynamic tag contains the location of the .rld_map section relative to
the location of the dynamic tag. For PIE MIPS binaries DT_MIPS_RLD_MAP can
not be used since it contains an absolute address. Without this change
GDB can not find the function program counters in other libraries and once
I apply this change I can successfully run info sharedlibraries again.
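
A sketch of how a tag-relative value resolves to an address. `struct dyn_entry` is a simplified stand-in for an ELF dynamic entry and `rld_map_from_rel` is a hypothetical helper, not the RTLD code.

```c
#include <stdint.h>

/* Simplified stand-in for an ELF dynamic section entry. */
struct dyn_entry {
	int64_t		tag;
	uint64_t	val;
};

/*
 * Relative form: the value is an offset from the entry's own address,
 * so the result is correct wherever the object was loaded.
 */
static uintptr_t
rld_map_from_rel(const struct dyn_entry *d)
{
	return ((uintptr_t)d + (uintptr_t)d->val);
}
```

An absolute address stored in the tag would only be valid at the link-time base, which is why it cannot work for a position-independent executable.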

Reviewed By:	kib
Differential Revision: https://reviews.freebsd.org/D17867
2018-11-07 15:04:41 +00:00
Hans Petter Selasky
36d2d637dd Sometimes the complete split packet may be queued too early and the
transaction translator will return a NAK. Ignore this message and
retry the complete split instead.

MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-11-07 08:25:44 +00:00
Justin Hibbits
6a0fd1a51b powerpc/atomic: Loosen the memory barrier on atomic_load_acq_*()
'sync' is pretty heavy-handed, and is unnecessary for this use case.  It's a
full barrier, which is applicable for all storage types.  However,
atomic_load_acq_*() is only expected to operate on physical memory, not
device memory, so lwsync is sufficient (lwsync provides access ordering on
memory that is marked as Coherency Required and is not Write Through nor
Cache Inhibited).  On 32-bit systems, this is a nop, since powerpc_lwsync()
is defined to use sync, as a workaround for a silicon bug in the Freescale
e500 core.
2018-11-07 01:42:00 +00:00
Mark Johnston
f8a222010f Avoid fixing the tty_info() buffer size in tty.h.
Different compilation units may otherwise get a different view of the
layout of struct tty depending on whether they include opt_printf.h.
This caused a blowup in the number of types defined in the kernel's
CTF file after r339468; thanks to dim@ for bisecting down to that
revision.

PR:		232675
Reported by:	dim
Reviewed by:	cem (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17877
2018-11-06 23:41:44 +00:00
Rick Macklem
6ad8a6eaa4 Change nfs_advlock() so that the NFSVOPUNLOCK() is mostly done at the end.
Prior to this patch, nfs_advlock() did NFSVOPUNLOCK(); return (error);
in many places. This patch replaces these code sequences with a "goto out;"
and does the NFSVOPUNLOCK(); return (error); at the end of the function
in order to make the vnode locking simpler.
This patch does not change the semantics of nfs_advlock().
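
The single-exit pattern looks roughly like this. The lock helpers and `do_locked_work` are hypothetical stand-ins; they only show the shape of the change, not nfs_advlock() itself.

```c
#include <errno.h>

static int lock_depth;	/* hypothetical stand-in for the vnode lock state */

static void
obj_lock(void)
{
	lock_depth++;
}

static void
obj_unlock(void)
{
	lock_depth--;
}

/* Several failure points, but the unlock and return happen in one place. */
static int
do_locked_work(int fail_step)
{
	int error = 0;

	obj_lock();
	if (fail_step == 1) {
		error = EINVAL;
		goto out;
	}
	if (fail_step == 2) {
		error = EIO;
		goto out;
	}
	/* ... the actual work would go here ... */
out:
	obj_unlock();
	return (error);
}
```

Every path leaves the lock balanced, which is easier to audit than an unlock before each early return.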

Suggested by:	kib
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D17853
2018-11-06 22:50:50 +00:00
John Baldwin
2f3736eb65 Treat the memory lengths for CHELSIO_T4_GET_MEM as unsigned.
Previously attempts to read the MC region were failing since the
length was greater than 2^31.
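
A sketch of why a signed length rejects such reads. Both checks are hypothetical and not the driver's code; the conversion of a large unsigned value to `int32_t` is implementation-defined and wraps to a negative number on common compilers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed length: values of 2^31 and above appear negative and fail. */
static bool
len_ok_signed(int32_t len, uint32_t region_size)
{
	return (len > 0 && (uint32_t)len <= region_size);
}

/* Unsigned length: the full 32-bit range is usable. */
static bool
len_ok_unsigned(uint32_t len, uint32_t region_size)
{
	return (len > 0 && len <= region_size);
}
```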

Reviewed by:	np
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D17857
2018-11-06 22:33:36 +00:00
Mark Johnston
07702f72e5 Avoid specifying VM_PROT_EXECUTE in mappings from pipe_map and exec_map.
These submaps are used for mapping pipe buffers and execv() argument
strings respectively, so there's no need for such mappings to have
execute permissions.

Reported by:	jhb
Reviewed by:	alc, jhb, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17827
2018-11-06 21:57:03 +00:00
Mark Johnston
6741ea083f We need opt_stack.h after r339605.
Reviewed by:	cem
Sponsored by:	The FreeBSD Foundation
2018-11-06 21:47:22 +00:00
Brooks Davis
dd4d2f216f Update some comments made obsolete by recent commits.
2018-11-06 20:45:15 +00:00
Brooks Davis
938e8dcf60 Regen after r340199: Use declared types for caddr_t arguments.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:47:29 +00:00
Brooks Davis
318f0d7720 Use declared types for caddr_t arguments.
Leave ptrace(2) alone for the moment as it's defined to take a caddr_t.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:46:38 +00:00
Mariusz Zaborski
0b39d7e377 Remove ppoll. freebsd32 doesn't define a ppoll syscall.
Reported by:	jhb
2018-11-06 18:26:40 +00:00
Mariusz Zaborski
279e464dd5 Regenerate after r340195.
2018-11-06 18:06:52 +00:00
Mariusz Zaborski
4a1f3ed354 capsicum: Add ppoll and freebsd32_ppoll to compat32.
PR:		232495
Pointed out by: brooks
MFC after:	2 weeks
2018-11-06 18:05:46 +00:00
Mariusz Zaborski
f4a035b8df Regenerate after r340129.
Pointed out by:	brooks
2018-11-06 18:03:04 +00:00
Andrew Turner
3869df5d71 Add the KUBSAN options to the arm64 and amd64 GENERIC kernel config files.
As the kernel file size may be too large to run with a stock loader, comment
them out for now.

Sponsored by:	DARPA, AFRL
2018-11-06 17:47:58 +00:00
Mark Johnston
f71ef9b686 Use plain atomic_{add,subtract} when that's sufficient.
CID:		1386920
MFC after:	2 weeks
2018-11-06 17:32:25 +00:00
Andrew Turner
4ea56599e8 Port the NetBSD ubsan runtime to the FreeBSD kernel.
This allows us to build the ubsan code added in r340189 into the kernel
with the KUBSAN option. This will report when undefined behaviour is
detected in the currently running kernel.

As the kernel can be large (65MB on arm64), the loader may not be able to
load it on all architectures, so the option is disabled by default for now.

Sponsored by:	DARPA, AFRL
2018-11-06 17:32:07 +00:00
Andrew Turner
0645126fae Import the NetBSD micro ubsan code for the kernel.
This imports revision 1.3 of common/lib/libc/misc/ubsan.c from NetBSD, the
micro-ubsan code. It is an implementation of the Undefined Behavior
Sanitizer runtime for use with recent clang and gcc.

The uubsan code will be used in a later commit to implement kubsan to help
find undefined behavior in the kernel.

Sponsored by:	DARPA, AFRL
2018-11-06 16:56:49 +00:00
Maxim Sobolev
8948179aba Don't allow BIO_READ, BIO_WRITE or BIO_DELETE requests that are
fully beyond the end of provider's media. The only exception is made
for the zero length transfers which are allowed to be just on the
boundary. Previously, any requests starting on the boundary (i.e. next
byte after the last one) have been allowed to go through.
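
The boundary rule can be sketched as a predicate. `request_starts_in_bounds` is a hypothetical helper that assumes byte offsets and sizes; it is not the GEOM code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a request is accepted if it starts inside the
 * media, or if it is a zero-length transfer exactly on the boundary.
 */
static bool
request_starts_in_bounds(uint64_t offset, uint64_t length, uint64_t mediasize)
{
	if (offset < mediasize)
		return (true);		/* starts inside the media */
	if (offset == mediasize && length == 0)
		return (true);		/* zero-length, on the boundary */
	return (false);			/* fully beyond the end */
}
```

Whether a request that starts inside the media may run past its end is a separate check that this sketch leaves out.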

No response from:	freebsd-geom@, phk
MFC after:		1 month
2018-11-06 15:55:41 +00:00
Tijl Coosemans
02bf7e5e40 Fix builds with COMPAT_LINUX32 in the kernel config.
MFC after:	3 days
2018-11-06 15:29:44 +00:00
Tijl Coosemans
8fc08087a1 On amd64 both Linux compat modules, linux.ko and linux64.ko, provide
linux_ioctl_(un)register_handler that allows other driver modules to
register ioctl handlers.  The ioctl syscall implementation in each Linux
compat module iterates over the list of handlers and forwards the call to
the appropriate driver.  Because the registration functions have the same
name in each module it is not possible for a driver to support both 32 and
64 bit linux compatibility.

Move the list of ioctl handlers to linux_common.ko so it is shared by
both Linux modules and all drivers receive both 32 and 64 bit ioctl calls
with one registration.  These ioctl handlers normally forward the call
to the FreeBSD ioctl handler which can handle both 32 and 64 bit.

Keep the special COMPAT_LINUX32 ioctl handlers in linux.ko in a separate
list for now and let the ioctl syscall iterate over that list first.
Later, COMPAT_LINUX32 support can be added to the 64 bit ioctl handlers
via a runtime check for ILP32 like is done for COMPAT_FREEBSD32 and then
this separate list would disappear again.  That is a much bigger effort
however and this commit is meant to be MFCable.

This enables linux64 support in x11/nvidia-driver*.

PR:		206711
Reviewed by:	kib
MFC after:	3 days
2018-11-06 13:51:08 +00:00
Michael Tuexen
8553b984a5 Don't use a function when neither INET nor INET6 are defined.
This is a valid case for the userland stack, where this fixes
two set-but-not-used warnings in this case.

Thanks to Christian Wright for reporting the issue.
2018-11-06 12:55:03 +00:00
Mark Johnston
8002c3a495 Initialize last_target in the laundry thread control loop.
In practice it is always initialized because nfreed must be positive
in order to trigger background laundering, but this isn't obvious.

CID:		1387997
MFC after:	1 week
2018-11-06 02:52:54 +00:00
John Baldwin
5cdaef71a9 Add a facility for transmitting "raw" work requests on regular NIC queues.
- Use PH_loc.eight[1] as a general 'cflags' (Chelsio flags) field to
  describe properties of a queued packet.  The MC_RAW_WR flag
  indicates an mbuf holding a raw work request.  mbuf_cflags() returns
  the current flags.
- Raw work request mbufs are allocated via alloc_wr_mbuf() which will
  allocate a single contiguous range to hold the mbuf data.  The
  consumer can use mtod() to obtain the start of the work request and
  write the required work request in the buffer.  The mbuf can then be
  enqueued directly to the txq via mp_ring_enqueue().
- Since raw work requests might potentially send arbitrary work
  requests, only set the EQUIQ and EQUEQ bits on work requests that
  support them such as the normal tunneled Ethernet packet work
  requests.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D17811
2018-11-06 00:11:36 +00:00
Brooks Davis
44cbc1c2b7 Fix a couple indentation errors in r339958.
2018-11-06 00:09:43 +00:00
Ed Maste
35dee42b5d capability.h: add comment about planned removal timeline
PR:		233007
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-11-06 00:05:17 +00:00
John Baldwin
7f7f6f85a1 Add a custom implementation of cpu_lock_delay() for x86.
Avoid using DELAY() since it can try to use spin locks on CPUs without
a P-state invariant TSC.  For cpu_lock_delay(), always use the TSC if
it exists (even if it is not P-state invariant) to delay for a
microsecond.  If the TSC does not exist, read from I/O port 0x84 to
delay instead.

PR:		228768
Reported by:	Roger Hammerstein <cheeky.m@live.com>
Reviewed by:	kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17851
2018-11-05 22:54:03 +00:00
John Baldwin
3c03efc4ab Add a delay_tsc() static function for when DELAY() uses the TSC.
This uses slightly simpler logic than the existing code by using the
full 64-bit counter and thus not having to worry about counter
overflow.
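
A sketch of a delay loop over a 64-bit counter. The fake counter below replaces the TSC so the example runs anywhere, and `read_counter` and `delay_ticks` are hypothetical names rather than the functions added by this commit.

```c
#include <stdint.h>

/* Fake counter for illustration; a real version would read the TSC. */
static uint64_t fake_now;

static uint64_t
read_counter(void)
{
	return (fake_now += 7);
}

/*
 * Spin until at least `ticks` counter units have elapsed.  Unsigned
 * 64-bit subtraction gives the elapsed count directly, so the loop
 * needs no separate wraparound handling.  The poll count is returned
 * for illustration only.
 */
static unsigned int
delay_ticks(uint64_t ticks)
{
	uint64_t start = read_counter();
	unsigned int polls = 0;

	while (read_counter() - start < ticks)
		polls++;
	return (polls);
}
```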

Reviewed by:	kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17850
2018-11-05 22:51:45 +00:00
Ed Maste
24ac7c3b27 revert r340156, restoring sys/sys/capability.h
More time is still needed for ports to accommodate the migration to
capsicum.h.

The header was renamed in 2014 due to concerns about conflicts with
a draft POSIX.1e capability.h header on other systems and there is (now)
no need for complex autoconf tests for both capability.h and capsicum.h.
Any supported Capsicum-capable system has capsicum.h.

Reported by:	antoine
Sponsored by:	The FreeBSD Foundation
2018-11-05 22:36:45 +00:00
John Baldwin
4cbbb74888 Add a KPI for the delay while spinning on a spin lock.
Replace a call to DELAY(1) with a new cpu_lock_delay() KPI.  Currently
cpu_lock_delay() is defined to DELAY(1) on all platforms.  However,
platforms with a DELAY() implementation that uses spin locks should
implement a custom cpu_lock_delay() that doesn't use locks.

Reviewed by:	kib
MFC after:	3 days
2018-11-05 21:34:17 +00:00
John Baldwin
ff9738d954 Rework setting PTE_D for kernel mappings.
Rather than unconditionally setting PTE_D for all writeable kernel
mappings, set PTE_D for writable mappings of unmanaged pages (whether
user or kernel).  This matches what amd64 does and also matches what
the RISC-V spec suggests (preset the A and D bits on mappings where
the OS doesn't care about the state).
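
The new rule can be sketched as a small flag computation. The `SK_PTE_` values and `pte_flags` are illustrative assumptions for this example and do not reproduce the pmap code.

```c
/* Illustrative bit values for this sketch only. */
#define	SK_PTE_W	0x04u	/* writable */
#define	SK_PTE_D	0x80u	/* dirty */

/*
 * Preset the dirty bit only for writable mappings of unmanaged pages,
 * where the OS does not track the page's modified state.
 */
static unsigned int
pte_flags(int writable, int managed)
{
	unsigned int bits = 0;

	if (writable) {
		bits |= SK_PTE_W;
		if (!managed)
			bits |= SK_PTE_D;
	}
	return (bits);
}
```

Under this rule a writable mapping of a managed page starts clean, whether it is a user or a kernel mapping.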

Suggested by:	alc
Reviewed by:	alc, markj
Sponsored by:	DARPA
2018-11-05 20:00:36 +00:00
Ed Maste
335a736a20 Remove backwards-compatibility sys/capability.h
In r263232 sys/capability.h was renamed to sys/capsicum.h, to avoid
conflicts with a capability.h header found on other operating systems.

Sufficient time has now passed, so remove the old header at the
beginning of FreeBSD 13.

Discussed with:	oshogbo
Sponsored by:	The FreeBSD Foundation
2018-11-05 19:25:57 +00:00
Warner Losh
74c0112fef Only assert locked for many async events.
Many async events that we see are called for this specific path. When
calling an async callback for a targeted device, XTP will lock that
specific device's path lock (same as what cam_periph_lock does). For
those AC_ events, assert we have the lock rather than trying to
recursively take it (which causes panics since it's not recursive).

Add annotations about this and about the fact that AC_SCSI_AEN events
are generated now only in the ata stack (which cannot have a scsi_da
attachment). Leave it in place in case I've overlooked something as
the code is harmless.

This is fallout from my attempts to "fix" locking for softc->flags in
r330796 that's not been triggered often enough to get my attention
until now.

Sponsored by: Netflix
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D17837
2018-11-05 18:47:29 +00:00
Matt Macy
acf50a7f68 hwpmc: limit wait for user callchain collection to 1 tick
The hwpmc pcpu sample buffer is prone to head of line blocking
when waiting for user process to return to user space and
collect a pending callchain. If more than one tick has elapsed
between the time the sample entry was marked for collection and
the time that the hardclock pmc handler runs to copy the records
to a larger temporary buffer, mark the sample entry as not in
use.

This change reduces the number of samples marked as not valid
when collecting under load from ~99.5% to 5-20%.

Reported by:	mjg@
MFC after:	3 days
2018-11-05 08:11:16 +00:00
Justin Hibbits
b465e0bb56 powerpc/SMP: Don't spam the console with AP bringup messages
Especially on new POWER9 systems, the console can be filled with

  SMP: AP CPU #XX launched

messages.  This can also slow down the console printing.  Instead, do what
x86 now does, as of r333335, and print it all on one line, unless
bootverbose is set.
2018-11-05 01:53:20 +00:00
Konstantin Belousov
f6bb885ff6 Move the fixed base for PIE loading on arm.
Existing base causes conflicts for direct execution of ld-elf.so.1
because default linking base for non-PIE binaries is 0x10000.

Reported and tested by:	Mark Millard <marklmi26-fbsd@yahoo.com>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-11-04 19:11:32 +00:00
Eugene Grosbein
a594f9453b Make ng_pptpgre(8) netgraph node be able to restore order for packets
reordered in transit instead of dropping them altogether.
It uses sequence numbers of PPtPGRE packets.

A set of new sysctl(8) variables was added to control this ability or disable it:

net.graph.pptpgre.reorder_max (1) defines maximum length of node's
private reorder queue used to keep data waiting for late packets.
Zero value disables reordering. Default value 1 allows the node to restore
the order for two packets swapped in transit. Greater values allow the node
to deliver packets being late after more packets in sequence
at cost of increased kernel memory usage.

net.graph.pptpgre.reorder_timeout (1) defines the time value in milliseconds
used to wait for late packets. It may be useful to increase this
if reordering spot is distant.

MFC after:	1 month
2018-11-04 19:10:44 +00:00
Mariusz Zaborski
82560231d3 capsicum: allow ppoll(2) in capability mode
We already allow poll(2). There is no reason to disallow ppoll(2).

PR:		232495
Submitted by:	Stefan Grundmann <sg2342@googlemail.com>
Reviewed by:	cem, oshogbo
MFC after:	2 weeks
2018-11-04 17:12:53 +00:00
Matt Macy
dacc43df34 Add additional counter descriptions to AMD 0x17
Submitted by:	Somalapuram Amaranath
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17401
2018-11-04 06:24:27 +00:00
Matt Macy
10f42d244b Convert epoch to read / write records per cpu
In discussing D17503 "Run epoch calls sooner and more reliably" with
sbahra@ we came to the conclusion that epoch is currently misusing the
ck_epoch API. It isn't safe to do a "write side" operation (ck_epoch_call
or ck_epoch_poll) in the middle of a "read side" section. Since, by definition,
it's possible to be preempted during the middle of an EPOCH_PREEMPT
epoch the GC task might call ck_epoch_poll or another thread might call
ck_epoch_call on the same section. The right solution is ultimately to change
the way that ck_epoch works for this use case. However, as a stopgap for
12 we agreed to simply have separate records for each use case.

Tested by: pho@

MFC after:	3 days
2018-11-03 03:43:32 +00:00
Alexander Motin
b4d66a1739 9952 Block size change during zfs receive drops spill block
Replication code in receive_object() falsely assumes that if received
object block size is different from local, then it must be a new object
and calls dmu_object_reclaim() to wipe it out. In most cases it is not a
problem, since all dnode, bonus buffer and data block(s) are immediately
rewritten anyway, but the problem is that the spill block (if used) is not.
This means loss of ACLs, extended attributes, etc.

This issue can be triggered in a very simple way:
1. create 4KB file with 10+ ACL entries;
2. take snapshot and send it to different dataset;
3. append another 4KB to the file;
4. take another snapshot and send incrementally;
5. witness ACL loss on receive side.

PR:		198457
Discussed with:	mahrens
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2018-11-03 03:10:06 +00:00
Warner Losh
4668666fe4 Implement ability to turn on/off PHYs for AHCI devices.
As part of Chuck's work on fixing kernel crashes caused by disk I/O
errors, it is useful to be able to trigger various kinds of
errors. This patch allows causing an AHCI-attached disk to disappear,
by having the driver keep the PHY disabled when the driver would
otherwise enable the PHY. It also allows making the disk reappear by
having the driver go back to setting the PHY enable/disable state as
it normally would and simulating the hardware event that causes a bus
rescan.

Submitted by: Chuck Silvers
Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D16043
2018-11-03 00:37:51 +00:00
Jung-uk Kim
4a38ee6de7 MFV: r339981
Merge ACPICA 20181031.
2018-11-02 22:50:13 +00:00
Ed Maste
50b53a8dc3 newvers.sh: fix git false positive -dirty tag
Assuming that any output from `git diff-index --name-only` implies
changes in the working tree results in false positives: files with
metadata, but not content, changes are also listed.

Check that content differences exist before adding the -dirty tag to
the git hash.

PR:		229230
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D15968
2018-11-02 21:20:46 +00:00
Ed Maste
97a7bf3070 embed_mfs.sh: replace some compound statements with conventional ifs
Use the more readable form - there's no need to try being clever.
2018-11-02 21:07:06 +00:00
Brooks Davis
4e8c73eb20 Regen after r340080: Add const to input-only char * arguments.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17812
2018-11-02 20:56:19 +00:00
Brooks Davis
12e69f96a2 Add const to input-only char * arguments.
These arguments are mostly paths handled by NAMEI*() macros which already
take const char * arguments.

This change improves the match between syscalls.master and the public
declarations of system calls.

Reviewed by:	kib (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17812
2018-11-02 20:50:22 +00:00
Ed Maste
09f4e462fb sys/types.h: avoid using terse macro _M
Although _M is reserved for use by the implementation it is rather
non-descriptive and conflicted with a libc++ test.  Just rename to _Major
and _Minor to avoid conflicts.

Reviewed by:	dim
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D16734
2018-11-02 20:48:29 +00:00
Kristof Provost
58ef854f8b pf: Fix build if INVARIANTS is not set
r340061 included a number of assertions in pf_frent_remove(), but these assertions
were the only use of the 'prev' variable. As a result builds without
INVARIANTS had an unused variable, and failed.

Reported by:	vangyzen@
2018-11-02 19:23:50 +00:00
Jonathan T. Looney
54e675342b m_pulldown() may reallocate n. Update the oip pointer after the
m_pulldown() call.

MFC after:	2 weeks
Sponsored by:	Netflix
2018-11-02 19:14:15 +00:00
Ed Maste
e74f411d47 Define NT_FREEBSD_FEATURE_CTL ELF note type
This ELF note will be used to allow binaries to opt out of, or in to,
upcoming vulnerability mitigation and other features.

Committing the definition and readelf change separately to allow
independent MFC.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-02 19:02:03 +00:00
Warner Losh
003ffd57fe Add sysctl_usec_to_sbintime and sysctl_msec_to_sbintime.
These functions are used to present a sbintime_t as either a number of
microseconds or a number of milliseconds respectively.
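
For readers unfamiliar with the representation: sbintime_t is a signed
64-bit value in 32.32 fixed point. The sketch below shows only the unit
conversion that such handlers have to perform, in userland C. It is not
the committed sysctl code, and sbt_t, usec_to_sbt() and sbt_to_usec() are
names invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Userland stand-in for the kernel's sbintime_t (signed 32.32 fixed point). */
typedef int64_t sbt_t;
#define SBT_ONE_SECOND	((sbt_t)1 << 32)

/* Microseconds -> 32.32 fixed point.  The product overflows above 2^31 us. */
sbt_t
usec_to_sbt(int64_t usec)
{
	assert(usec >= 0 && usec < ((int64_t)1 << 31));
	return ((usec * SBT_ONE_SECOND) / 1000000);
}

/* 32.32 fixed point -> whole microseconds, truncating toward zero. */
int64_t
sbt_to_usec(sbt_t sbt)
{
	assert(sbt >= 0 && sbt < ((sbt_t)1 << 43));
	return ((sbt * 1000000) / SBT_ONE_SECOND);
}
```

The millisecond variant would follow the same pattern with 1000 in place
of 1000000.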

Sponsored by: Netflix
2018-11-02 17:50:57 +00:00
Kristof Provost
14624ab582 pf: Keep a reference to struct ifnets we're using
Ensure that the struct ifnet we use can't go away until we're done with
it.
2018-11-02 17:05:40 +00:00
Kristof Provost
dde6e1fecb pfsync: Add missing unlock
If we fail to set up the multicast entry for pfsync and return an error
we must release the pfsync lock first.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17506
2018-11-02 17:03:53 +00:00
Alexander Motin
af6a86eb9a Adjust SiS 966/968 HDA controller naming.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
2018-11-02 17:02:10 +00:00
Kristof Provost
04fe85f068 pfsync: Allow module to be unloaded
MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17505
2018-11-02 17:01:18 +00:00
Kristof Provost
fbbf436d56 pfsync: Handle syncdev going away
If the syncdev is removed we no longer need to clean up the multicast
entry we've got set up for that device.

Pass the ifnet detach event through pf to pfsync, and remove our
multicast handle, and mark us as no longer having a syncdev.

Note that this callback is always installed, even if the pfsync
interface is disabled (and thus it's not a per-vnet callback pointer).

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17502
2018-11-02 16:57:23 +00:00
Kristof Provost
26549dfcad pfsync: Ensure uninit is done before pf
pfsync touches pf memory (for pf_state and the pfsync callback
pointers), not the other way around. We need to ensure that pfsync is
torn down before pf.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17501
2018-11-02 16:53:15 +00:00
Kristof Provost
25c6ab1b78 Notify that the ifnet will go away, even on vnet shutdown
pf subscribes to ifnet_departure_event events, so it can clean up the
ifg_pf_kif and if_pf_kif pointers in the ifnet.
During vnet shutdown interfaces could go away without sending the event,
so pf ends up cleaning these up as part of its shutdown sequence, which
happens after the ifnet has already been freed.

Send the ifnet_departure_event during vnet shutdown, allowing pf to
clean up correctly.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17500
2018-11-02 16:50:17 +00:00
Kristof Provost
5f6cf24e2d pfsync: Make pfsync callbacks per-vnet
The callbacks are installed and removed depending on the state of the
pfsync device, which is per-vnet. The callbacks must also be per-vnet.

MFC after:	2 weeks
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D17499
2018-11-02 16:47:07 +00:00
Mark Johnston
2203c46d87 Initialize the eflags field of vm_map headers.
Initializing the eflags field of the map->header entry to a value with a
unique new bit set makes a few comparisons to &map->header unnecessary.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	alc, kib
Tested by:	pho
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14005
2018-11-02 16:26:44 +00:00
Navdeep Parhar
5c239d80c0 cxgbe/iw_cxgbe: Suppress spurious "Unexpected streaming data ..."
messages.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-11-02 16:21:44 +00:00
Kristof Provost
790194cd47 pf: Limit the fragment entry queue length to 64 per bucket.
So we have a global limit of 1024 fragments, but it is fine grained to
the region of the packet.  Smaller packets may have less fragments.
This costs another 16 bytes of memory per reassembly and divides the
worst case for searching by 8.

Obtained from:	OpenBSD
Differential Revision:	https://reviews.freebsd.org/D17734
2018-11-02 15:32:04 +00:00
Kristof Provost
fd2ea405e6 pf: Split the fragment reassembly queue into smaller parts
Remember 16 entry points based on the fragment offset.  Instead of
a worst case of 8196 list traversals we now check a maximum of 512
list entries or 16 array elements.
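
A minimal sketch of how an offset could be mapped to one of the 16 entry
points, assuming the 64 KiB offset space is split into equal ranges. The
macro and function names are illustrative and not taken from the pf
source.

```c
#include <assert.h>

#define FRAG_ENTRY_POINTS	16	/* from the commit message */
#define IP_SPACE		0x10000	/* fragment offsets lie in [0, 65535] */
#define FRAG_RANGE		(IP_SPACE / FRAG_ENTRY_POINTS)

/* Map a fragment's byte offset to the index of its entry point. */
unsigned
frag_entry_index(unsigned off)
{
	assert(off < IP_SPACE);
	return (off / FRAG_RANGE);
}
```

Each range spans 4096 bytes. Since fragment offsets are expressed in
8-byte units, a range holds at most 4096 / 8 = 512 fragments, which is
where the 512-entry bound comes from.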

Obtained from:	OpenBSD
Differential Revision:	https://reviews.freebsd.org/D17733
2018-11-02 15:26:51 +00:00
Kristof Provost
2b1c354ee6 pf: Count holes rather than fragments for reassembly
Avoid traversing the list of fragment entries to check whether the
pf(4) reassembly is complete.  Instead count the holes that are
created when inserting a fragment.  If there are no holes left, the
fragments are continuous.
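
The idea can be modelled in a few lines of userland C. This is a
simplified sketch, not the pf implementation: it assumes fragments never
overlap, keeps them in a small sorted array, and uses invented names
(struct frag_queue, fq_init(), fq_insert()).

```c
#include <assert.h>
#include <string.h>

#define MAX_FRAGS	64	/* arbitrary capacity for this sketch */

struct frag {
	unsigned off;	/* byte offset within the packet */
	unsigned len;	/* payload length */
	int	 more;	/* "more fragments" flag */
};

struct frag_queue {
	struct frag v[MAX_FRAGS];	/* kept sorted by offset */
	int n;
	int holes;			/* 0 means reassembly is complete */
};

void
fq_init(struct frag_queue *q)
{
	q->n = 0;
	q->holes = 1;	/* an empty queue is one big hole */
}

/*
 * Insert a non-overlapping fragment.  The hole count is adjusted by
 * looking only at the two neighbours of the new entry.
 */
void
fq_insert(struct frag_queue *q, struct frag f)
{
	int i, delta = 1;	/* a lone fragment splits a hole in two */

	assert(q->n < MAX_FRAGS);
	for (i = 0; i < q->n && q->v[i].off < f.off; i++)
		;
	/* Left edge touches the packet start or the previous fragment. */
	if (i == 0 ? f.off == 0 : q->v[i - 1].off + q->v[i - 1].len == f.off)
		delta--;
	/* Right edge is the packet end or touches the next fragment. */
	if (i == q->n ? !f.more : f.off + f.len == q->v[i].off)
		delta--;
	memmove(&q->v[i + 1], &q->v[i], (q->n - i) * sizeof(q->v[0]));
	q->v[i] = f;
	q->n++;
	q->holes += delta;
}
```

Inserting the middle fragment of a three-fragment packet first leaves two
holes; each later fragment that closes a gap brings the count down until
it reaches zero.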

Obtained from:	OpenBSD
Differential Revision:	https://reviews.freebsd.org/D17732
2018-11-02 15:23:57 +00:00
Hans Petter Selasky
46b05e1923 Add new USB v2.0 PCI ID.
Submitted by:		Dmitry Luhtionov <dmitryluhtionov@gmail.com>
Sponsored by:		Mellanox Technologies
2018-11-02 15:03:52 +00:00
Kristof Provost
19a22ae313 Revert "pf: Limit the maximum number of fragments per packet"
This reverts commit r337969.
We'll handle this the OpenBSD way, in upcoming commits.
2018-11-02 15:01:59 +00:00
Brooks Davis
1493c2ee62 Make vop_symlink take a const target path.
This will enable callers to take const paths as part of syscall
declaration improvements.

Where doing so is easy and non-disruptive carry the const through
implementations. In UFS the value is passed to an interface that must
take non-const values. In ZFS, const poisoning would touch code shared
with upstream and it's not worth adding diffs.

Bump __FreeBSD_version for external API consumers.

Reviewed by:	kib (prior version)
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17805
2018-11-02 14:42:36 +00:00
Martin Wilke
f5a7a8cd67 - Add quirk for Samsung on Mac Mini 7,1
PR:		201676
Submitted by:	Ruben Kerkhof
Approved by:	araujo (mentor)
Obtained from:	TrueOS
Sponsored by:	iXsystems Inc.
Differential Revision:	https://review.freebsd.org/D17815
2018-11-02 07:48:23 +00:00
Conrad Meyer
78c2a9806e kern_poll: Restore explanatory comment removed in r177374
The comment isn't stale.  The check is bogus in the sense that poll(2)
does not require pollfd entries to be unique in fd space, so there is no
reason there cannot be more pollfd entries than open or even allowed
fds.  The check is mostly a seatbelt against accidental misuse or
abuse.  FD_SETSIZE, while usually unrelated to poll, is used as an
arbitrary floor for systems with very low kern.maxfilesperproc.
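
Read that way, the check amounts to something like the sketch below. It
is written from the description above rather than copied from
kern_poll; poll_check_nfds() is a hypothetical helper and the
maxfilesperproc parameter stands in for the kern.maxfilesperproc value.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>	/* FD_SETSIZE */

/*
 * Returns 0 if nfds passes the sanity check, EINVAL otherwise.
 * The limit is the larger of maxfilesperproc and FD_SETSIZE.
 */
int
poll_check_nfds(unsigned int nfds, int maxfilesperproc)
{
	unsigned int limit;

	assert(maxfilesperproc > 0);
	limit = (unsigned int)maxfilesperproc;
	/* FD_SETSIZE acts as an arbitrary floor for very small limits. */
	if (limit < (unsigned int)FD_SETSIZE)
		limit = (unsigned int)FD_SETSIZE;
	return (nfds > limit ? EINVAL : 0);
}
```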

Additionally, document this possible EINVAL condition in the poll.2
manual.

No functional change.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17671
2018-11-01 23:46:23 +00:00
Ed Maste
ea96b3de2b Retire CLANG_NO_IAS34
CLANG_NO_IAS34 was introduced in r276696 to allow then-HEAD kernels to
be built with clang 3.4 in FreeBSD 10.  As FreeBSD 11 and later includes
a version of Clang with a sufficiently capable integrated assembler we
do not need the workaround any longer.

Sponsored by:	The FreeBSD Foundation
2018-11-01 23:11:47 +00:00
Brooks Davis
f7e5ce325f Regen after r340034: Use mode_t when the documented signature does.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17784
2018-11-01 23:10:53 +00:00
Brooks Davis
2105ac07d7 Use mode_t when the documented signature does.
This is more clear and produces better results when generating function
stubs from syscalls.master.

Reviewed by:	kib, emaste
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17784
2018-11-01 23:06:50 +00:00
John Baldwin
d198cb6d83 Restrict setting PTE execute permissions on RISC-V.
Previously, RISC-V was enabling execute permissions in PTEs for any
readable page.  Now, execute permissions are only enabled if they were
explicitly specified (e.g. via PROT_EXEC to mmap).  The one exception
is that the initial kernel mapping in locore still maps all of the
kernel RWX.
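
The difference can be shown with a small userland model. The PTE bit
positions follow the RISC-V privileged specification and the MPROT_*
values mirror VM_PROT_READ/WRITE/EXECUTE, but the function names are
invented and this is not the pmap code itself.

```c
#include <assert.h>
#include <stdint.h>

/* RISC-V PTE bits (valid, read, write, execute). */
#define PTE_V	(1u << 0)
#define PTE_R	(1u << 1)
#define PTE_W	(1u << 2)
#define PTE_X	(1u << 3)

/* Model protection flags, same values as VM_PROT_READ/WRITE/EXECUTE. */
#define MPROT_READ	0x1
#define MPROT_WRITE	0x2
#define MPROT_EXEC	0x4
#define MPROT_ALL	(MPROT_READ | MPROT_WRITE | MPROT_EXEC)

/* Old behaviour as described: every readable page is also executable. */
uint64_t
pte_bits_old(int prot)
{
	uint64_t pte = PTE_V;

	assert((prot & ~MPROT_ALL) == 0);
	if (prot & MPROT_READ)
		pte |= PTE_R | PTE_X;
	if (prot & MPROT_WRITE)
		pte |= PTE_W;
	return (pte);
}

/* New behaviour: execute permission only when explicitly requested. */
uint64_t
pte_bits_new(int prot)
{
	uint64_t pte = PTE_V;

	assert((prot & ~MPROT_ALL) == 0);
	if (prot & MPROT_READ)
		pte |= PTE_R;
	if (prot & MPROT_WRITE)
		pte |= PTE_W;
	if (prot & MPROT_EXEC)
		pte |= PTE_X;
	return (pte);
}
```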

While here, change the fault type passed to vm_fault and
pmap_fault_fixup to only include a single VM_PROT_* value representing
the faulting access to match other architectures rather than passing a
bitmask.

Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17783
2018-11-01 22:23:15 +00:00
John Baldwin
6f888020df Set PTE_A and PTE_D for user mappings in pmap_enter().
This assumes that an access according to the prot in 'flags' triggered
a fault and is going to be retried after the fault returns, so the two
flags are set preemptively to avoid refaulting on the retry.

While here, only bother setting PTE_D for kernel mappings in pmap_enter
for writable mappings.

Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17782
2018-11-01 22:17:51 +00:00
John Baldwin
a751b25546 SBI calls expect a pointer to a u_long rather than a pointer.
This is just cosmetic.

A weirder issue is that the SBI doc claims the hart mask pointer should
be a physical address, not a virtual address.  However, the implementation
in bbl seems to just dereference the address directly.

Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17781
2018-11-01 22:15:25 +00:00
John Baldwin
344adeab18 Don't allow debuggers to modify SSTATUS, only to read it.
Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17771
2018-11-01 22:13:22 +00:00
John Baldwin
ada1ceef0b Implement ptrace_set_pc() and fail PT_*STEP requests explicitly.
Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17769
2018-11-01 22:11:26 +00:00
Warner Losh
9385e92b25 Add comments explaining what hold/unhold do
They act as a simple one-deep semaphore to keep open/close/probe from
running at the same time, avoiding the races that would otherwise create.
2018-11-01 21:51:41 +00:00
John Baldwin
7890b5c14e Check cannot_use_txpkts() rather than needs_tso() in add_to_txpkts().
Currently this is a no-op, but will matter in the future when
cannot_use_txpkts() starts checking other conditions than just
needs_tso().

Sponsored by:	Chelsio Communications
2018-11-01 21:49:49 +00:00
John Baldwin
1d8d91db74 Add support for port unit wiring to cxgbe(4).
- Add a bus_child_location_str method to the nexus drivers that prints
  out 'port=N' as the location string exported via devinfo and the
  '%location' sysctl node.

- We can't use a bus_hint_device_unit to wire the unit numbers of
  devices with a fixed devclass as the device gets assigned a unit in
  make_device() before the device creator can set softc, etc.
  Instead, when adding a child device, use a helper function much like
  a bus_hint_device_unit method to look for wiring hints or to return
  -1 to let the system choose a unit number.  This function requires
  an "at" hint for the port pointing to the nexus device and a "port"
  hint listing the port number.  For example:

hint.cxl.4.at="t5nex0"
hint.cxl.4.port="0"

  wires cxl4 to the first port on the t5nex0 adapter.

Requested by:	gallatin
MFC after:	2 months
2018-11-01 21:46:37 +00:00
John Baldwin
dcd50a20b7 Assert that reclaim_tx_descs() is always making forward progress.
MFC after:	2 months
Sponsored by:	Chelsio Communications
2018-11-01 21:39:33 +00:00
John Baldwin
b317cfd4c0 Don't enter DDB for fatal traps before panic by default.
Add a new 'debugger_on_trap' knob separate from 'debugger_on_panic'
and make the calls to kdb_trap() in MD fatal trap handlers prior to
calling panic() conditional on this new knob instead of
'debugger_on_panic'.  Disable the new knob by default.  Developers who
wish to recover from a fatal fault by adjusting saved register state
and retrying the faulting instruction can still do so by enabling the
new knob.  However, for the more common case this makes the user
experience for panics due to a fatal fault match the user experience
for other panics, e.g. 'c' in DDB will generate a crash dump and
reboot the system rather than being stuck in an infinite loop of fatal
fault messages and DDB prompts.

Reviewed by:	kib, avg
MFC after:	2 months
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D17768
2018-11-01 21:34:17 +00:00
Andrew Turner
a9725b6332 Add the ARMv8.3 SCTLR_EL1 fields.
While here tag which architecture release fields were added and remove a
field that only existed in very early releases of the ARMv8 spec.

Sponsored by:	DARPA, AFRL
2018-11-01 17:43:28 +00:00
Eric Joyner
5e6652144d ixl/iavf(4): Update remaining references of "num_queues" to "num_rx_queues"
This should fix a build issue when "options RSS" is set.

Reported by:	bz@
Sponsored by:	Intel Corporation
2018-11-01 17:29:14 +00:00
Bjoern A. Zeeb
e2c532f156 carpstats are the last virtualised variable in the file and end up at the
end of the vnet_set.  The generated code uses an absolute relocation at
one byte beyond the end of the carpstats array.  This means the relocation
for the vnet does not happen for carpstats initialisation and as a result
the kernel panics on module load.

This problem has only been observed with carp and only on i386.
We considered various possible solutions including using linker scripts
to add padding to all kernel modules for pcpu and vnet sections.

While the symbols (by chance) stay in the order of appearance in the file
adding an unused non-file-local variable at the end of the file will extend
the size of set_vnet and hence make the absolute relocation for carpstats
work (think of this as a single-module set_vnet padding).

This is a (temporary) hack.  It is the least intrusive one as we need a
timely solution for the upcoming release.  We will revisit the problem in
HEAD.  For a lot more information and the possible alternate solutions
please see the PR and the references therein.

PR:			230857
MFC after:		3 days
2018-11-01 17:26:18 +00:00
Andrew Turner
b4b90c1f4c Add the ARMv8.3 HCR_EL2 register fields.
MFC after:	1 month
Sponsored by:	DARPA, AFRL
2018-11-01 17:05:10 +00:00
Mark Johnston
d9ff5789be Remove redundant checks for a NULL lbgroup table.
No functional change intended.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17108
2018-11-01 15:52:49 +00:00
Mark Johnston
79ee680b65 Improve style in in_pcbinslbgrouphash() and related subroutines.
No functional change intended.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17107
2018-11-01 15:51:49 +00:00
Ben Widawsky
3d40cdf014 linuxkpi: Add GFP flags needed for ttm drivers
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Requested by:	bwidawsk
MFC after:	3 days
Approved by:	emaste (mentor)
2018-11-01 15:30:01 +00:00
Rick Macklem
881a9516a2 Fix NFS client vnode locking to avoid a crash during forced dismount.
A crash was reported where the crash occurred in nfs_advlock() when the
NFS_ISV4(vp) macro was being executed. This was caused by the vnode
being VI_DOOMED due to a forced dismount in progress.
This patch fixes the problem by locking the vnode before executing the
NFS_ISV4() macro.

Tested by:	rlibby
PR:		232673
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17757
2018-11-01 15:27:22 +00:00
Michael Tuexen
6999f6975c Remove debug code which slipped in accidentally.
MFC after:		4 weeks
X-MFC with:		r339989
Sponsored by:		Netflix, Inc.
2018-11-01 11:41:40 +00:00
Michael Tuexen
099ab39f44 Improve a comment to refer to the actual sections in the TCP
specification for the comparisons made.
Thanks to lstewart@ for the suggestion.

MFC after:		4 weeks
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D17595
2018-11-01 11:35:28 +00:00
Andrew Turner
e403f9865d Use the correct offsets for the trap frame in fork_trampoline.
Sponsored by:	DARPA, AFRL
2018-11-01 10:25:22 +00:00
Konstantin Belousov
6bc6a54280 Add pci_early function to detect Intel stolen memory.
On some Intel devices BIOS does not properly reserve memory (called
"stolen memory") for the GPU.  If the stolen memory is claimed by the
OS, functions that depend on stolen memory (like frame buffer
compression) can't be used.

A function called pci_early_quirks that is called before the virtual
memory system is started was added. In Linux, this PCI early quirks
function iterates through all PCI slots to check for any device that
require quirks.  While this more generic solution is preferable I only
ported the Intel graphics specific parts because I think my
implementation would be too similar to Linux GPL'd solution after
looking at the Linux code too much.

The code regarding Intel graphics stolen memory was ported from
Linux. In the case of Intel graphics stolen memory this
pci_early_quirks will read the stolen memory base and size from north
bridge registers.  The values are stored in global variables that are
later read by linuxkpi_gplv2. Linuxkpi stores these values in a
Linux-specific structure that is read by the drm driver.

Relevant linuxkpi code is here:
https://github.com/FreeBSDDesktop/kms-drm/blob/drm-v4.16/linuxkpi/gplv2/src/linux_compat.c#L37

For now, only amd64 arch is supported since that is the only arch
supported by the new drm drivers. I was told that Intel GPUs are
always located on 0:2:0 so these values are hard coded for now.

Note that the structure and early execution of the detection code is
not required in its current form, but we expect that the code will be
added shortly which fixes the potential BIOS bugs by reserving the
stolen range in phys_avail[].  This must be done as early as possible
to avoid conflicts with the potential usage of the memory in kernel.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Reviewed by:	bwidawsk, imp
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16719
Differential revision:	https://reviews.freebsd.org/D17775
2018-10-31 23:17:00 +00:00
Kyle Evans
90ba2725c1 i386/MINIMAL: VERBOSE_SYSINIT=0 for consistency
MFC after:	never
2018-10-31 22:55:43 +00:00
Kyle Evans
be352d20d5 Compile in VERBOSE_SYSINIT support by default, remain silent by default
The loader tunable 'debug.verbose_sysinit' may be used to toggle verbosity.
This is added to the debugging section of these kernconfs to be turned off
in stable branches for clarity of intent.

MFC after:	never
2018-10-31 22:38:19 +00:00
Gleb Smirnoff
ee7e6f676e Define QMD_SAVELINK() only for QUEUE_MACRO_DEBUG_TRASH case. Otherwise
with QUEUE_MACRO_DEBUG_TRACE compilation fails due to unused variable.
2018-10-31 19:37:11 +00:00
Navdeep Parhar
6933902df4 cxgbe(4): Add rate limiting support for UDP.
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-10-31 19:19:13 +00:00
Navdeep Parhar
1272051e82 cxgbe(4): Report a reasonable non-zero if_hw_tsomaxsegsize to the
kernel.

This reverts an accidental change that snuck in with r339628.

Sponsored by:	Chelsio Communications
2018-10-31 18:30:17 +00:00
Andrew Turner
8c0e047668 Always set the MP_QUIRK_CPULIST quirk under ACPI. This needs a run time
check to only set it for emulators as the CPU list may be changed when
the emulator starts. Until this is working just always set it.

Sponsored by:	DARPA, AFRL
2018-10-31 17:41:53 +00:00
Brooks Davis
e3e5481326 Reformat syscalls.master for better readability.
This takes advantage of two recent changes to makesyscalls.sh:
r328598: Permit a range of syscall numbers for UNIMPL
r339624: Remove the need for backslashes in syscalls.master

Syscall declarations are now split across multiple lines with the
syscall name and variables each on separate lines (with an exception for
syscalls taking no arguments.)

Reviewed by:	imp
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17706
2018-10-31 16:17:45 +00:00
Andrew Turner
d9961ef478 Use pmap_invalidate_all rather than invalidating 512 level 2 entries in
the early pmap_mapbios/unmapbios code. It is even worse when there are
multiple L2 entries to handle as we would need to iterate over all pages.

Sponsored by:	DARPA, AFRL
2018-10-31 12:00:35 +00:00
Andrew Turner
be7018ffd4 Remove function prototypes for functions removed in r339943.
Sponsored by:	DARPA, AFRL
2018-10-31 10:30:19 +00:00
Andrew Turner
447cfc23bd Fix some style(9) issues in the arm64 pmap_mapbios/unmapbios. Split lines
when they are too long.

Sponsored by:	DARPA, AFRL
2018-10-31 09:39:38 +00:00
Andrew Turner
63bf2d735c Remove the unused arm64_cpu driver.
This was previously used for CPU initialisation, however this hasn't been
the case in a long time.

Sponsored by:	DARPA, AFRL
2018-10-31 09:25:17 +00:00
Marcelo Araujo
ec9e3fb095 Merge cases with upper block.
This is a cosmetic change only to simplify code.

Reported by:	anish
Sponsored by:	iXsystems Inc.
2018-10-31 01:27:44 +00:00
Mark Johnston
5d277a85ad Revert r336984.
It appears to be responsible for random segfaults observed when lots
of paging activity is taking place, but the root cause is not yet
understood.

Requested by:	alc
MFC after:	now
2018-10-30 22:40:40 +00:00
Bjoern A. Zeeb
9afc56849a Fix mips build after r339931.
I erroneously thought that it was two 64bit platforms which use link_elf_obj.c.

PR:		228854
Reported by:	ci.f.o.
MFC after:	3 days
X-MFC with:	r339931
Pointyhat to:	bz
2018-10-30 21:35:56 +00:00
Bjoern A. Zeeb
0f823b6497 As a follow-up to r339930 and various reports implement logging in case
we fail during module load because the pcpu or vnet module sections are
full.  We did return a proper error but did not leave any indication to
the user as to what the actual problem was.

Even worse, on 12/13 currently we are seeing an unrelated error (ENOSYS
instead of ENOSPC, which gets skipped over in kern_linker.c) to be
printed which made problem diagnostics even harder.

PR:		228854
MFC after:	3 days
2018-10-30 20:51:03 +00:00
Bjoern A. Zeeb
c955c6cd08 With more excessive use of modules, more kernel parts working with
VIMAGE, and feature richness and global state increasing the 8k of
vnet module space are no longer sufficient for people and loading
multiple modules, e.g., pf(4) and ipl(4) or ipsec(4) will fail on
the second module.

Increase the module space to 8 * PAGE_SIZE which should be enough
to hold multiple firewalls, ipsec, multicast (as in the old days was
a problem), epair, carp, and any kind of other vnet enabled modules.

Sadly this is a global byte array part of the vnet_set, so we cannot
dynamically change its size;  otherwise a TUNABLE would have been
a better solution.

PR:			228854
Reported by:		Ernie Luzar, Marek Zarychta
Discussed with:		rgrimes on current
MFC after:		3 days
2018-10-30 20:45:15 +00:00
Bjoern A. Zeeb
201100c58b Initial implementation of draft-ietf-6man-ipv6only-flag.
This change defines the RA "6" (IPv6-Only) flag which routers
may advertise, kernel logic to check if all routers on a link
have the flag set and accordingly update a per-interface flag.

If all routers agree that it is an IPv6-only link, ether_output_frame(),
based on the interface flag, will filter out all ETHERTYPE_IP/ARP
frames, drop them, and return EAFNOSUPPORT to upper layers.
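
The two decisions described above (agreeing on the link state, then
filtering on output) can be sketched as follows. This is a userland
model with invented names; treating a link with no routers as not
IPv6-only is an assumption of the sketch, not something stated here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* EtherType values for IPv4, ARP and IPv6. */
#define ET_IP	0x0800
#define ET_ARP	0x0806
#define ET_IPV6	0x86dd

/* A link counts as IPv6-only only if every known router sets the flag. */
int
link_is_ipv6_only(const int *router_flag, size_t nrouters)
{
	size_t i;

	assert(router_flag != NULL || nrouters == 0);
	if (nrouters == 0)
		return (0);	/* assumption: no routers, no restriction */
	for (i = 0; i < nrouters; i++)
		if (!router_flag[i])
			return (0);
	return (1);
}

/* Returns 0 to transmit the frame, EAFNOSUPPORT to drop it. */
int
output_filter(int ipv6_only, uint16_t ethertype)
{
	if (ipv6_only && (ethertype == ET_IP || ethertype == ET_ARP))
		return (EAFNOSUPPORT);
	return (0);
}
```

A single router without the flag is enough to keep IPv4 and ARP frames
flowing on the link.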

The change also updates ndp to show the "6" flag, ifconfig to
display the IPV6_ONLY nd6 flag if set, and rtadvd to allow
announcing the flag.

Further changes to tcpdump (contrib code) are available and will
be upstreamed.

Tested the code (slightly earlier version) with 2 FreeBSD
IPv6 routers, a FreeBSD laptop on ethernet as well as wifi,
and with Win10 and OSX clients (which did not fall over with
the "6" flag set but not understood).

We may also want to (a) implement an RX filter, and (b) over
time enhance user space to, say, stop dhclient from running
when the interface flag is set.  Also we might want to start
IPv6 before IPv4 in the future.

All the code is hidden under the EXPERIMENTAL option and not
compiled by default as the draft is a work-in-progress and
we cannot rely on the fact that IANA will assign the bits
as requested by the draft and hence they may change.

Dear 6man, you have running code.

Discussed with:	Bob Hinden, Brian E Carpenter
2018-10-30 20:08:48 +00:00
Mark Johnston
9978bd996b Add malloc_domainset(9) and _domainset variants to other allocator KPIs.
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain.  The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted.  Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.
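
To make the distinction concrete, here is a toy userland model of the two
policy families. It does not use the kernel KPI: NDOMAINS, the policy
enum, alloc_page() and the round-robin fallback order are all assumptions
made for the illustration.

```c
#include <assert.h>

#define NDOMAINS	4	/* hypothetical number of NUMA domains */

enum policy { POLICY_PREF, POLICY_FIXED };

/*
 * free_pages[d] is the number of pages left in domain d.
 * Returns the domain that satisfied the request, or -1 on failure.
 */
int
alloc_page(int free_pages[NDOMAINS], enum policy p, int domain)
{
	int d, i;

	assert(domain >= 0 && domain < NDOMAINS);
	if (free_pages[domain] > 0) {
		free_pages[domain]--;
		return (domain);
	}
	/* A fixed policy never falls back; a sleeping caller would block. */
	if (p == POLICY_FIXED)
		return (-1);
	/* A preferred policy tries the remaining domains in turn. */
	for (i = 1; i < NDOMAINS; i++) {
		d = (domain + i) % NDOMAINS;
		if (free_pages[d] > 0) {
			free_pages[d]--;
			return (d);
		}
	}
	return (-1);
}
```

With domain 0 depleted, the fixed policy fails where the preferred policy
is served from a neighbouring domain, which is the M_WAITOK problem the
message describes.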

Discussed with:	gallatin, jeff
Reported and tested by:	pho (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17418
2018-10-30 18:26:34 +00:00
John Baldwin
58b6812de1 Only invoke 'ls' if the local modules directory exists.
This avoids a spurious make warning if /usr/local/sys/modules doesn't
exist.

Submitted by:	rgrimes
Reported by:	markj
2018-10-30 18:20:34 +00:00
Mark Johnston
920239efde Fix some problems that manifest when NUMA domain 0 is empty.
- In uma_prealloc(), we need to check for an empty domain before the
  first allocation attempt, not after.  Fix this by switching
  uma_prealloc() to use a vm_domainset iterator, which addresses the
  secondary issue of using a signed domain identifier in round-robin
  iteration.
- Don't automatically create a page daemon for domain 0.
- In domainset_empty_vm(), recompute ds_cnt and ds_order after
  excluding empty domains; otherwise we may frequently specify an empty
  domain when calling in to the page allocator, wasting CPU time.
  Convert DOMAINSET_PREF() policies for empty domains to round-robin.
- When freeing bootstrap pages, don't count them towards the per-domain
  total page counts for now: some vm_phys segments are created before
  the SRAT is parsed and are thus always identified as being in domain 0
  even when they are not.  Then, when bootstrap pages are freed, they
  are added to a domain that we had previously thought was empty.  Until
  this is corrected, we simply exclude them from the per-domain page
  count.

Reported and tested by:	Rajesh Kumar <rajfbsd@gmail.com>
Reviewed by:	gallatin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17704
2018-10-30 17:57:40 +00:00
Hans Petter Selasky
e35079db73 Implement the dump_stack() function in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-30 16:42:56 +00:00
Hans Petter Selasky
bf05cd05ac Implement __KERNEL_DIV_ROUND_UP() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-30 16:32:52 +00:00
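For readers unfamiliar with the idiom behind bf05cd05ac, a round-up division macro is usually written as below. This is a minimal user-space sketch; SKETCH_DIV_ROUND_UP and blocks_needed() are hypothetical names, and the definition is not copied from the LinuxKPI headers.

```c
#include <assert.h>

/* Hypothetical stand-in for a round-up division macro. */
#define	SKETCH_DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Number of fixed-size blocks needed to hold `bytes` bytes. */
static unsigned long
blocks_needed(unsigned long bytes, unsigned long block_size)
{
	assert(block_size != 0);
	return (SKETCH_DIV_ROUND_UP(bytes, block_size));
}
```

Adding d - 1 before dividing makes any non-zero remainder count as one extra block. The sketch assumes n + d - 1 does not overflow.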
Bjoern A. Zeeb
43f75d57a2 Introduce an EXPERIMENTAL option for both src.conf(5) and the kernel.
In the last decade(s) we have seen both short-term and long-term projects
committed to the tree which were considered or even marked "experimental".
While out-of-tree development has become easier than it used to be in
CVS times, there still is a need to have the code shipping with HEAD but
not enabled by default.

While people may think about VIMAGE as one of the recent larger, long term
projects, early protocol implementations (before they are standardised)
are others.  (Free)BSD historically was one of the operating systems
which would have running code at early stages and help develop and
influence standardisation and the industry.

Give developers an opportunity to be more proactive about early adoption
or about running large-scale code changes, stumbling over each other's
feet but not the users'.  I have not added the option to NOTES in order to avoid
breaking supported option builds, which require constant compile testing.

Discussed with:	people in the corridor
2018-10-30 15:46:30 +00:00
Eric van Gyzen
fcbb889fdb Always stop the scheduler when entering kdb
Set curthread->td_stopsched when entering kdb via any vector.
Previously, it was only set when entering via panic, so when
entering kdb another way, mutexes and such were still "live",
and an attempt to lock an already locked mutex would panic.

Reviewed by:	kib, cem
Discussed with:	jhb
Tested by:	pho
MFC after:	2 months
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17687
2018-10-30 14:54:15 +00:00
Michael Tuexen
d59a162c11 Bump the number of fans supported from 8 to 12.
The number of fans on a PowerMac7,3 with liquid cooling is 9.

Reviewed by:		andreast@
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D17754
2018-10-30 11:51:09 +00:00
Marcelo Araujo
5bae7542d4 Emulate machine check related MSR_EXTFEATURES to allow guest OSes to
boot on AMD FX Series.

PR:		224476
Submitted by:	Keita Uchida <m@jgz.jp>
Reviewed by:	rgrimes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D17713
2018-10-30 10:02:23 +00:00
Marcelo Araujo
fbd8c33022 Allow changing lagg(4) MTU.
Previously, changing the MTU would require destroying the lagg and
creating a new one. Now it is allowed to change the MTU of
the lagg interface and the MTU of the ports will be set to match.

If any port cannot set the new MTU, all ports are reverted to the original
MTU of the lagg. Additionally, when adding ports, the MTU of a port will be
automatically set to the MTU of the lagg. As always, the MTU of the lagg is
initially determined by the MTU of the first port added. If adding an
interface as a port for some reason fails, that interface is reverted to its
original MTU.

Submitted by:	Ryan Moeller <ryan@freqlabs.com>
Reviewed by:	mav
Relnotes:	Yes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D17576
2018-10-30 09:53:57 +00:00
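The all-or-nothing behaviour described in fbd8c33022 can be sketched in user space as follows. The struct sketch_port type, its max_mtu field, and both functions are hypothetical and only model the rollback logic; they are not the lagg(4) driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port model: current MTU and the largest MTU it accepts. */
struct sketch_port {
	int mtu;
	int max_mtu;
};

/* Try to set one port's MTU; fails if the port cannot support it. */
static int
sketch_port_set_mtu(struct sketch_port *p, int mtu)
{
	if (mtu > p->max_mtu)
		return (-1);
	p->mtu = mtu;
	return (0);
}

/*
 * Set the MTU on every port.  If any port refuses, revert the ports that
 * were already changed to the lagg's original MTU and report failure.
 */
static int
sketch_lagg_set_mtu(struct sketch_port *ports, size_t n, int *lagg_mtu,
    int new_mtu)
{
	size_t i;
	int old_mtu;

	assert(ports != NULL && lagg_mtu != NULL);
	old_mtu = *lagg_mtu;
	for (i = 0; i < n; i++) {
		if (sketch_port_set_mtu(&ports[i], new_mtu) != 0) {
			/* Roll back ports 0..i-1. */
			while (i-- > 0)
				ports[i].mtu = old_mtu;
			return (-1);
		}
	}
	*lagg_mtu = new_mtu;
	return (0);
}
```

The lagg's own MTU is updated only after every port has accepted the new value, so a failure leaves the whole group unchanged.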
Justin Hibbits
a37c714a0f powerpc/mpc85xx: Reset the PCIe bus on attach
It seems if a Radeon card is already initialized by u-boot, it won't be
reinitialized by the kernel, and the DRM module will fail to attach.  This
steals the reset code from mips/octopci.c to blindly reset the bus on attach.
This was tested on an AmigaOne X5000/20, such that it can be booted from the
local video console, and get a video console in FreeBSD.
2018-10-30 00:47:40 +00:00
John Baldwin
cd785c1b34 Permit local kernel modules to be built as part of a kernel build.
Add support for "local" modules.  By default, these modules are
located in LOCALBASE/sys/modules (where LOCALBASE defaults to
/usr/local).  Individual modules can be built along with a kernel by
defining LOCAL_MODULES to the list of modules.  Each is assumed to be
a subdirectory containing a valid Makefile.  If LOCAL_MODULES is not
specified, all of the modules present in LOCALBASE/sys/modules are
built and installed along with the kernel.

This means that a port that installs a kernel module can choose to
install its source along with a suitable Makefile to
/usr/local/sys/modules/<foo>.  Future kernel builds will then include
that kernel module using the kernel configuration's opt_*.h headers
and install it into /boot/kernel along with other kernel-specific
modules.

This is not trying to solve the issue of folks running GENERIC release
kernels, but is instead aimed at folks who build their own kernels.
For those folks this ensures that kernel modules from ports will
always be using the right KBI, etc.  This includes folks running any
KBI-breaking kernel configs (such as PAE).

There are still some kinks to be worked out with cross-building (we
probably shouldn't include local modules in cross-built kernels by
default), but this is a sufficient starting point.

Reviewed by:	imp
MFC after:	3 months
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D16966
2018-10-30 00:23:37 +00:00
Mark Johnston
25c9cca757 Have gconcat advertise delete support if one of its disks does.
This follows the example set by other multi-disk GEOM classes.

PR:		232676
Tested by:	noah.bergbauer@tum.de
MFC after:	1 month
2018-10-30 00:22:14 +00:00
John Baldwin
68d0cda661 Make battery emptying rate available as sysctl variable.
Curiously, the in-kernel routines always use the design voltage to
convert from mA to mW, but acpiconf in userland uses the current
voltage.  As a result, this can report a different mW rate than
acpiconf.

Submitted by:	Manuel Stühn <freebsdnewbie@freenet.de>
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D17077
2018-10-30 00:19:44 +00:00
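The discrepancy noted in 68d0cda661 comes down to which voltage is used in the mA to mW conversion. A minimal sketch of the arithmetic, assuming the usual relation mW = mA * mV / 1000; the function name and the sample voltages are hypothetical.

```c
#include <assert.h>

/* Convert a discharge rate in mA to mW at the given voltage in mV. */
static long
sketch_rate_mw(long rate_ma, long millivolts)
{
	assert(rate_ma >= 0 && millivolts >= 0);
	return ((rate_ma * millivolts) / 1000);
}
```

For example, a 1000 mA rate gives 11100 mW at a design voltage of 11100 mV but 12300 mW at a present voltage of 12300 mV, so the two reports differ whenever the voltages do.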
Konstantin Belousov
9775a6ebd2 amd64: Use ifuncs to select suitable implementation of set_pcb_flags().
There is no reason to check for PCB_FULL_IRET if FSGSBASE instructions
are not supported.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-10-29 23:52:31 +00:00
Konstantin Belousov
93177620ee Style.
Wrap long lines, use +4 spaces for continuation indent.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-10-29 23:45:17 +00:00
Konstantin Belousov
6acf1b203f Clarify explanation of VFCF_SBDRY.
Requested by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-10-29 23:43:17 +00:00
Navdeep Parhar
f01fc2d0e8 cxgbe/iw_cxgbe: Install the socket upcall before calling soconnect to
ensure that it always runs when soisconnected does.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-10-29 22:35:46 +00:00
John Baldwin
567a3784c2 Add support for "plain" (non-HMAC) SHA digests.
MFC after:	2 months
Sponsored by:	Chelsio Communications
2018-10-29 22:24:31 +00:00
Mark Johnston
da7d7778b0 Expose some netdump configuration parameters through sysctl.
Reviewed by:	cem
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17755
2018-10-29 21:16:26 +00:00
Hans Petter Selasky
8ad9551d36 Implement dma_pool_zalloc() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-29 19:02:36 +00:00
Stephen Hurd
5201e0f110 Drain grouptaskqueue of the gtask before detaching it.
taskqgroup_detach() would remove the task even if it was running or
enqueued, which could lead to panics (see D17404). With this change,
taskqgroup_detach() drains the task and sets a new flag which prevents the
task from being scheduled again.

I've added grouptask_block() and grouptask_unblock() to allow control
over the flag from other locations as well.

Reviewed by:	Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17674
2018-10-29 14:36:03 +00:00
Kristof Provost
99eb00558a pf: Make ':0' ignore link-local v6 addresses too
When users mark an interface to not use aliases they likely also don't
want to use the link-local v6 address there.

PR:		201695
Submitted by:	Russell Yount <Russell.Yount AT gmail.com>
Differential Revision:	https://reviews.freebsd.org/D17633
2018-10-28 05:32:50 +00:00
Yuri Pankov
8d56c80545 Provide basic descriptions for VMX exit reason (from "Intel 64 and IA-32
Architectures Software Developer’s Manual Volume 3").  Add the document
to SEE ALSO in bhyve.8 (and pet manlint here a bit).

Reviewed by:	jhb, rgrimes, 0mp
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D17531
2018-10-27 21:24:28 +00:00
Vladimir Kondratyev
5ef2488947 evdev: disable evdev if it is invoked from KDB or panic context
This prevents a deadlock on entering KDB if one of the evdev locks is
already taken by userspace process.

Also this change discards all but LED console events produced by KDB as
unrelated to userspace.

Tested by:	dumbbell (as part of D15070)
Objected by:	bde (as 'KDB lock an already locked mutex' problem solution)
MFC after:	1 month
2018-10-27 21:04:34 +00:00
Vladimir Kondratyev
f86e7267f5 evdev: Use console lock as evdev lock for all supported keyboard drivers.
Now the evdev part of keyboard drivers does not take any locks if the corresponding
input/eventN device node is not opened by userland consumers.

Do not assert console lock inside evdev to handle the cases when keyboard
driver is called from some special single-threaded context like shutdown
thread.
2018-10-27 20:22:41 +00:00
Mark Johnston
4aed5937db Use M_WAITOK in init_hwpmc().
No functional change intended.

MFC after:	2 weeks
2018-10-27 18:48:49 +00:00
Alan Cox
9f1abe3df4 Eliminate typically pointless calls to vm_fault_prefault() on soft, copy-
on-write faults.  On a page fault, when we call vm_fault_prefault(), it
probes the pmap and the shadow chain of vm objects to see if there are
opportunities to create read and/or execute-only mappings to neighboring
pages.  For example, in the case of hard faults, such effort typically pays
off, that is, mappings are created that eliminate future soft page faults.
However, in the case of soft, copy-on-write faults, the effort very
rarely pays off.  (See the review for some specific data.)

Reviewed by:	kib, markj
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D17367
2018-10-27 17:49:46 +00:00
Eugene Grosbein
6d305ab0b2 Extend stripeoffset and stripesize of GEOMs from u_int to off_t
GEOM's stripeoffset overflows at the 4 gigabyte boundary (2^32)
because of its u_int type. This leads to incorrect data in the output
generated by "sysctl kern.geom.confxml" command, "graid list" etc.
when GEOM array has volumes larger than 4G, for example.

This change does not affect ABI but changes KBI. No MFC planned.

Differential Revision:	https://reviews.freebsd.org/D13426
2018-10-27 16:14:42 +00:00
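The overflow fixed by 6d305ab0b2 can be reproduced in user space. This sketch uses uint32_t as a stand-in for the old u_int field and int64_t as a stand-in for off_t; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store an offset in a 32-bit field, as the old u_int member did. */
static uint32_t
sketch_offset_as_u_int(int64_t off)
{
	assert(off >= 0);
	return ((uint32_t)off);	/* keeps only the low 32 bits */
}

/* Store the same offset in a 64-bit field, as an off_t member does. */
static int64_t
sketch_offset_as_off_t(int64_t off)
{
	assert(off >= 0);
	return (off);
}
```

A 5 GiB offset stored in the 32-bit field wraps modulo 2^32 and reads back as 1 GiB, which is the kind of wrong value that would then appear in the confxml output.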
Conrad Meyer
37136b849f random(4): Match enabled sources mask to build options
r287023 and r334450 added build option mechanisms to permanently disable
spammy and/or low quality entropy sources.

Follow up those changes by updating the 'enabled' sources mask to match.
When sources are compile-time disabled, represent them as disabled in the
source mask, and prevent users from modifying that, like pure sources.
(Modifying the mask bit would have no effect, but users might think it did
if it was not prevented.)

Mostly a cosmetic change.

Reviewed by:	markm
Approved by:	secteam (gordon)
X-MFC-With:	334450
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17252
2018-10-27 15:09:35 +00:00
Eugene Grosbein
5310c19174 ipfw: implement ngtee/netgraph actions for layer-2 frames.
Kernel part of ipfw does not support and ignores rules other than
"pass", "deny" and dummynet-related for layer-2 (ethernet frames).
Others are processed as "pass".

Make it support ngtee/netgraph rules just like they are supported
for IP packets. For example, this allows us to mirror some frames
selectively to another interface for delivery to remote network analyzer
over RSPAN vlan. Assuming ng_ipfw(4) netgraph node has a hook named "900"
attached to "lower" hook of vlan900's ng_ether(4) node, that would be
as simple as:

ipfw add ngtee 900 ip from any to 8.8.8.8 layer2 out xmit igb0

PR:		213452
MFC after:	1 month
Tested-by:	Fyodor Ustinov <ufm@ufm.su>
2018-10-27 07:32:26 +00:00
Eugene Grosbein
1a5995cc88 Prevent ip_input() from panicking due to unprotected access to INADDR_HASH.
PR:			220078
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D12457
Tested-by:		Cassiano Peixoto and others
2018-10-27 04:59:35 +00:00
Eugene Grosbein
4f1e3122ac Prevent multicast code from panicking due to unprotected access to INADDR_HASH.
PR:			220078
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D12457
Tested-by:		Cassiano Peixoto and others
2018-10-27 04:53:25 +00:00
Eugene Grosbein
232485a17e Prevent stf(4) from panicking due to unprotected access to INADDR_HASH.
PR:			220078
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D12457
Tested-by:		Cassiano Peixoto and others
2018-10-27 04:45:28 +00:00
Xin LI
0db665bb98 Restore backward compatibility for "attach" verb.
In r332361 and r333439, two new parameters were added to geli attach
verb using gctl_get_paraml, which requires the value to be present.
This would prevent old geli(8) binary from attaching geli(4) device
as they have no knowledge about the new parameters.

Restore backward compatibility by treating the absence of these two
values as seeing the default value supplied by userland.

PR:		232595
Reviewed by:	oshogbo
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17680
2018-10-27 03:37:14 +00:00
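The compatibility approach in 0db665bb98, treating a missing parameter as its default value instead of failing, can be sketched as below. The struct sketch_param type, sketch_get_param(), and the parameter names in the usage note are hypothetical; this does not reproduce the gctl API or the actual geli parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request parameter: a name and an integer value. */
struct sketch_param {
	const char *name;
	int value;
};

/*
 * Look up a parameter by name.  If the request does not carry it (for
 * example because it came from an older binary), return the default.
 */
static int
sketch_get_param(const struct sketch_param *params, size_t n,
    const char *name, int defval)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(params[i].name, name) == 0)
			return (params[i].value);
	return (defval);
}
```

A request built by an old binary that only sends, say, "detach" still succeeds: looking up a newer "new_option" simply yields the default rather than an error.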
Michael Tuexen
de00ad05e6 Add initial descriptions for SCTP related MIB variables.
This work was mostly done by Marie-Helene Kvello-Aune.

MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D3583
2018-10-26 21:04:17 +00:00
Conrad Meyer
9b8d0fe462 Fortuna: Add failpoints to simulate initial seeding conditions
Set debug.fail_point.random_fortuna_pre_read=return(1) and
debug.fail_point.random_fortuna_seeded=return(1) to return to unseeded
status (sort of).  See the Differential URL for more detail.

The goal is to reproduce e.g. Lev's recent CURRENT report[1] about failing
newfs arc4random(3) usage (fixed in r338542).

No functional change when failpoints are not set.

[1]: https://lists.freebsd.org/pipermail/freebsd-current/2018-September/071067.html

Reported by:	lev
Reviewed by:	delphij, markm
Approved by:	secteam (delphij)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17047
2018-10-26 21:03:57 +00:00
Conrad Meyer
7be4093a84 fortuna: Drop global lock to zero stack variables
Also drop explicit zeroing of hash context -- hash finish() operation is
expected to do this.

PR:		230877
Suggested by:	delphij@
Reviewed by:	delphij, markm
Approved by:	secteam (delphij)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D16986
2018-10-26 21:00:26 +00:00
Conrad Meyer
9a88479843 Fortuna: fix a correctness issue in reseed (fortuna_pre_read)
'i' counts the number of pools included in the array 's'.  Passing 'i+1' to
reseed_internal() as the number of blocks in 's' is a bogus overrun of the
initialized portion of 's' -- technically UB.

I found this via code inspection, referencing §9.5.2 "Pools" of the Fortuna
chapter, but I would expect Coverity to notice the same issue.
Unfortunately, it doesn't appear to.

Reviewed by:	markm
Approved by:	secteam (gordon)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D16985
2018-10-26 20:55:01 +00:00
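To see why the count is 'i' and not 'i+1' in 9a88479843, recall that in the published Fortuna design pool k takes part in a reseed only when 2^k divides the reseed counter. The sketch below counts the participating pools under that rule. SKETCH_NPOOLS and sketch_pools_in_reseed() are hypothetical names, and this is not the FreeBSD fortuna_pre_read() code.

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_NPOOLS	32	/* Fortuna uses 32 entropy pools */

/*
 * Count how many pools contribute to reseed number `reseed_count`
 * (counted from 1).  Pool k contributes when 2^k divides the counter,
 * so the loop stops at the first k that does not.
 */
static unsigned
sketch_pools_in_reseed(uint32_t reseed_count)
{
	unsigned i;

	assert(reseed_count != 0);
	for (i = 0; i < SKETCH_NPOOLS; i++) {
		if (reseed_count % ((uint32_t)1 << i) != 0)
			break;
	}
	/* i is already the number of pools used; i + 1 would overcount. */
	return (i);
}
```

On loop exit the index equals the number of pools whose digests were gathered, so a consumer reading i + 1 entries would touch one uninitialized slot.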
Conrad Meyer
070249043e rijndael (AES): Avoid leaking sensitive data on kernel stack
Noticed this investigating Fortuna.  Remove useless duplicate stack copies
of sensitive contents when possible, or if not possible, be sure to zero
them out when we're finished.

Approved by:	secteam (gordon)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D16935
2018-10-26 20:53:01 +00:00
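The "zero them out when we're finished" part of 070249043e is commonly done with a wipe that the compiler may not optimize away. A portable sketch using a volatile pointer follows; sketch_wipe() is a hypothetical helper, not the routine the commit actually uses.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Overwrite a buffer with zeros.  Writing through a volatile pointer
 * discourages the compiler from eliding stores to memory that is about
 * to go out of scope.
 */
static void
sketch_wipe(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len-- > 0)
		*p++ = 0;
}
```

A plain memset() on a local array just before return can be removed as a dead store, which is why sensitive stack copies need a wipe of this kind or should be avoided altogether.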
Conrad Meyer
2384981b61 poll: Unify userspace pollfd pointer name
Some of the poll code used 'fds' and some used 'ufds' to refer to the
uap->fds userspace pointer that was passed around to subroutines.  Some of
the poll code used 'fds' to refer to the kernel memory pollfd arrays, which
seemed unnecessarily confusing.

Unify on 'ufds' to refer to the userspace pollfd array.

Additionally, 'bits' is not an accurate description of the kernel pollfd
array in kern_poll, so rename that to 'kfds'.  Finally, clean up some logic
with mallocarray() and nitems().

No functional change.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D17670
2018-10-26 20:07:46 +00:00
Brooks Davis
ed34a7fcf2 Move 32-bit compat support for FIODGNAME to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler uses an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Unlike r339174 this change supports both places FIODGNAME is handled.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17475
2018-10-26 17:59:25 +00:00
Warner Losh
ea657f2c76 Add statistics for TRIM commands
Add a counter for the LBAs, Ranges and hardware commands so that we
can provide additional color to the statistics we provide to vendors.

Sponsored by: Netflix, Inc
2018-10-26 16:23:51 +00:00
Warner Losh
24b6d87155 Redo r339563: Remove joy(4) driver.
This driver was marked as gone in 12. We're at 13 now. Remove it.
Data from nycbug's dmesg cache shows only one potential user,
suggesting it never was used much. However, even though this device
has been obsolete for 15 years at least, sys/joystick.h is included in
a number of graphics packages still, so that remains. A full exprun
is needed before that can be removed.

RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D17629
2018-10-26 16:03:30 +00:00
Warner Losh
09efa3dfb2 Put a workaround in for command timeout malfunctioning
At least one NVMe drive has a bug that makes the Command Time Out
PCIe feature unreliable. The workaround is to disable this
feature. The driver wouldn't deal correctly with a timeout anyway.
Only do this for drives that are known bad.

Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D17708
2018-10-26 14:27:37 +00:00
Ruslan Bukin
b7b391934d o Add pmap lock around pmap_fault_fixup() to ensure another thread will not
modify the l3 pte after we loaded the old value and before we stored the new value.
o Preset A(accessed), D(dirty) bits for kernel mappings.

Reported by:	kib
Reviewed by:	markj
Discussed with:	jhb
Sponsored by:	DARPA, AFRL
2018-10-26 12:27:07 +00:00