Commit Graph

124484 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
e2c532f156 carpstats are the last virtualised variable in the file and end up at the
end of the vnet_set.  The generated code uses an absolute relocation at
one byte beyond the end of the carpstats array.  This means the relocation
for the vnet does not happen for carpstats initialisation and as a result
the kernel panics on module load.

This problem has only been observed with carp and only on i386.
We considered various possible solutions including using linker scripts
to add padding to all kernel modules for pcpu and vnet sections.

While the symbols (by chance) stay in the order of appearance in the file
adding an unused non-file-local variable at the end of the file will extend
the size of set_vnet and hence make the absolute relocation for carpstats
work (think of this as a single-module set_vnet padding).

This is a (tmporary) hack.  It is the least intrusive one as we need a
timely solution for the upcoming release.  We will revisit the problem in
HEAD.  For a lot more information and the possible alternate solutions
please see the PR and the references therein.

PR:			230857
MFC after:		3 days
2018-11-01 17:26:18 +00:00
Andrew Turner
b4b90c1f4c Add the ARMv8.3 HCR_EL2 register fields.
MFC after:	1 month
Sponsored by:	DARPA, AFRL
2018-11-01 17:05:10 +00:00
Mark Johnston
d9ff5789be Remove redundant checks for a NULL lbgroup table.
No functional change intended.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17108
2018-11-01 15:52:49 +00:00
Mark Johnston
79ee680b65 Improve style in in_pcbinslbgrouphash() and related subroutines.
No functional change intended.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17107
2018-11-01 15:51:49 +00:00
Ben Widawsky
3d40cdf014 linuxkpi: Add GFP flags needed for ttm drivers
Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Requested by:	bwidawsk
MFC after:	3 days
Approved by:	emaste (mentor)
2018-11-01 15:30:01 +00:00
Rick Macklem
881a9516a2 Fix NFS client vnode locking to avoid a crash during forced dismount.
A crash was reported where the crash occurred in nfs_advlock() when the
NFS_ISV4(vp) macro was being executed. This was caused by the vnode
being VI_DOOMED due to a forced dismount in progress.
This patch fixes the problem by locking the vnode before executing the
NFS_ISV4() macro.

Tested by:	rlibby
PR:		232673
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17757
2018-11-01 15:27:22 +00:00
Michael Tuexen
6999f6975c Remove debug code which slipped in accidently.
MFC after:		4 weeks
X-MFC with:		r339989
Sponsored by:		Netflix, Inc.
2018-11-01 11:41:40 +00:00
Michael Tuexen
099ab39f44 Improve a comment to refer to the actual sections in the TCP
specification for the comparisons made.
Thanks to lstewart@ for the suggestion.

MFC after:		4 weeks
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D17595
2018-11-01 11:35:28 +00:00
Andrew Turner
e403f9865d Use the correct offsets for the trap frame in fork_trampoline.
Sponsored by:	DARPA, AFRL
2018-11-01 10:25:22 +00:00
Konstantin Belousov
6bc6a54280 Add pci_early function to detect Intel stolen memory.
On some Intel devices BIOS does not properly reserve memory (called
"stolen memory") for the GPU.  If the stolen memory is claimed by the
OS, functions that depend on stolen memory (like frame buffer
compression) can't be used.

A function called pci_early_quirks that is called before the virtual
memory system is started was added. In Linux, this PCI early quirks
function iterates through all PCI slots to check for any device that
require quirks.  While this more generic solution is preferable I only
ported the Intel graphics specific parts because I think my
implementation would be too similar to Linux GPL'd solution after
looking at the Linux code too much.

The code regarding Intel graphics stolen memory was ported from
Linux. In the case of Intel graphics stolen memory this
pci_early_quirks will read the stolen memory base and size from north
bridge registers.  The values are stored in global variables that is
later read by linuxkpi_gplv2. Linuxkpi stores these values in a
Linux-specific structure that is read by the drm driver.

Relevant linuxkpi code is here:
https://github.com/FreeBSDDesktop/kms-drm/blob/drm-v4.16/linuxkpi/gplv2/src/linux_compat.c#L37

For now, only amd64 arch is suppor ted since that is the only arch
supported by the new drm drivers. I was told that Intel GPUs are
always located on 0:2:0 so these values are hard coded for now.

Note that the structure and early execution of the detection code is
not required in its current form, but we expect that the code will be
added shortly which fixes the potential BIOS bugs by reserving the
stolen range in phys_avail[].  This must be done as early as possible
to avoid conflicts with the potential usage of the memory in kernel.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Reviewed by:	bwidawsk, imp
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16719
Differential revision:	https://reviews.freebsd.org/D17775
2018-10-31 23:17:00 +00:00
Kyle Evans
90ba2725c1 i386/MINIMAL: VERBOSE_SYSINIT=0 for consistency
MFC after:	never
2018-10-31 22:55:43 +00:00
Kyle Evans
be352d20d5 Compile in VERBOSE_SYSINIT support by default, remain silent by default
The loader tunable 'debug.verbose_sysinit' may be used to toggle verbosity.
This is added to the debugging section of these kernconfs to be turned off
in stable branches for clarity of intent.

MFC after:	never
2018-10-31 22:38:19 +00:00
Gleb Smirnoff
ee7e6f676e Define QMD_SAVELINK() only for QUEUE_MACRO_DEBUG_TRASH case. Otherwise
with QUEUE_MACRO_DEBUG_TRACE compilation fails due to unused variable.
2018-10-31 19:37:11 +00:00
Navdeep Parhar
6933902df4 cxgbe(4): Add rate limiting support for UDP.
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-10-31 19:19:13 +00:00
Navdeep Parhar
1272051e82 cxgbe(4): Report a reasonable non-zero if_hw_tsomaxsegsize to the
kernel.

This reverts an accidental change that snuck in with r339628.

Sponsored by:	Chelsio Communications
2018-10-31 18:30:17 +00:00
Andrew Turner
8c0e047668 Always set the MP_QUIRK_CPULIST quirk under ACPI. This needs a run time
check to only set it for emulators as the CPU list may be changed when
the emulator starts. Until this is working just always set it.

Sponsored by:	DARPA, AFRL
2018-10-31 17:41:53 +00:00
Brooks Davis
e3e5481326 Reformat syscalls.master for better readability.
This takes advantage of two recents changes to makesyscalls.sh:
r328598: Permit a range of syscall numbers for UNIMPL
r339624: Remove the need for backslashes in syscalls.master

Syscall declerations are now split across multiple lines with the
syscall name and variables each on seperate lines (with an exception for
syscalls taking no arguments.)

Reviewed by:	imp
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17706
2018-10-31 16:17:45 +00:00
Andrew Turner
d9961ef478 Use pmap_invalidate_all rather than invalidating 512 level 2 entries in
the early pmap_mapbios/unmapbios code. It is even worse when there are
multiple L2 entries to handle as we would need to iterate over all pages.

Sponsored by:	DARPA, AFRL
2018-10-31 12:00:35 +00:00
Andrew Turner
be7018ffd4 Remove function prototypes for functions removed in r339943.
Sponsored by:	DARPA, AFRL
2018-10-31 10:30:19 +00:00
Andrew Turner
447cfc23bd Fix some style(9) issues in the arm64 pmap_mapbios/unmapbios. Split lines
when they are too long.

Sponsored by:	DARPA, AFRL
2018-10-31 09:39:38 +00:00
Andrew Turner
63bf2d735c Remove the unused arm64_cpu driver.
This was previously used for CPU initilisation, however this hasn't been
the case in a long time.

Sponsored by:	DARPA, AFRL
2018-10-31 09:25:17 +00:00
Marcelo Araujo
ec9e3fb095 Merge cases with upper block.
This is a cosmetic change only to simplify code.

Reported by:	anish
Sponsored by:	iXsystems Inc.
2018-10-31 01:27:44 +00:00
Mark Johnston
5d277a85ad Revert r336984.
It appears to be responsible for random segfaults observed when lots
of paging activity is taking place, but the root cause is not yet
understood.

Requested by:	alc
MFC after:	now
2018-10-30 22:40:40 +00:00
Bjoern A. Zeeb
9afc56849a Fix mips build after r339931.
I erroneously thought that it was two 64bit platforms which use link_elf_obj.c.

PR:		228854
Reported by:	ci.f.o.
MFC after:	3 days
X-MFC with:	r339931
Pointyhat to:	bz
2018-10-30 21:35:56 +00:00
Bjoern A. Zeeb
0f823b6497 As a follow-up to r339930 and various reports implement logging in case
we fail during module load because the pcpu or vnet module sections are
full.  We did return a proper error but not leaving any indication to
the user as to what the actual problem was.

Even worse, on 12/13 currently we are seeing an unrelated error (ENOSYS
instead of ENOSPC, which gets skipped over in kern_linker.c) to be
printed which made problem diagnostics even harder.

PR:		228854
MFC after:	3 days
2018-10-30 20:51:03 +00:00
Bjoern A. Zeeb
c955c6cd08 With more excessive use of modules, more kernel parts working with
VIMAGE, and feature richness and global state increasing the 8k of
vnet module space are no longer sufficient for people and loading
multiple modules, e.g., pf(4) and ipl(4) or ipsec(4) will fail on
the second module.

Increase the module space to 8 * PAGE_SIZE which should be enough
to hold multiple firewalls, ipsec, multicast (as in the old days was
a problem), epair, carp, and any kind of other vnet enabled modules.

Sadly this is a global byte array part of the vnet_set, so we cannot
dynamically change its size;  otherwise a TUNABLE would have been
a better solution.

PR:			228854
Reported by:		Ernie Luzar, Marek Zarychta
Discussed with:		rgrimes on current
MFC after:		3 days
2018-10-30 20:45:15 +00:00
Bjoern A. Zeeb
201100c58b Initial implementation of draft-ietf-6man-ipv6only-flag.
This change defines the RA "6" (IPv6-Only) flag which routers
may advertise, kernel logic to check if all routers on a link
have the flag set and accordingly update a per-interface flag.

If all routers agree that it is an IPv6-only link, ether_output_frame(),
based on the interface flag, will filter out all ETHERTYPE_IP/ARP
frames, drop them, and return EAFNOSUPPORT to upper layers.

The change also updates ndp to show the "6" flag, ifconfig to
display the IPV6_ONLY nd6 flag if set, and rtadvd to allow
announcing the flag.

Further changes to tcpdump (contrib code) are availble and will
be upstreamed.

Tested the code (slightly earlier version) with 2 FreeBSD
IPv6 routers, a FreeBSD laptop on ethernet as well as wifi,
and with Win10 and OSX clients (which did not fall over with
the "6" flag set but not understood).

We may also want to (a) implement and RX filter, and (b) over
time enahnce user space to, say, stop dhclient from running
when the interface flag is set.  Also we might want to start
IPv6 before IPv4 in the future.

All the code is hidden under the EXPERIMENTAL option and not
compiled by default as the draft is a work-in-progress and
we cannot rely on the fact that IANA will assign the bits
as requested by the draft and hence they may change.

Dear 6man, you have running code.

Discussed with:	Bob Hinden, Brian E Carpenter
2018-10-30 20:08:48 +00:00
Mark Johnston
9978bd996b Add malloc_domainset(9) and _domainset variants to other allocator KPIs.
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain.  The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted.  Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.

Discussed with:	gallatin, jeff
Reported and tested by:	pho (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17418
2018-10-30 18:26:34 +00:00
John Baldwin
58b6812de1 Only invoke 'ls' if the local modules directory exists.
This avoids a spurious make warning if /usr/local/sys/modules doesn't
exist.

Submitted by:	rgrimes
Reported by:	markj
2018-10-30 18:20:34 +00:00
Mark Johnston
920239efde Fix some problems that manifest when NUMA domain 0 is empty.
- In uma_prealloc(), we need to check for an empty domain before the
  first allocation attempt, not after.  Fix this by switching
  uma_prealloc() to use a vm_domainset iterator, which addresses the
  secondary issue of using a signed domain identifier in round-robin
  iteration.
- Don't automatically create a page daemon for domain 0.
- In domainset_empty_vm(), recompute ds_cnt and ds_order after
  excluding empty domains; otherwise we may frequently specify an empty
  domain when calling in to the page allocator, wasting CPU time.
  Convert DOMAINSET_PREF() policies for empty domains to round-robin.
- When freeing bootstrap pages, don't count them towards the per-domain
  total page counts for now: some vm_phys segments are created before
  the SRAT is parsed and are thus always identified as being in domain 0
  even when they are not.  Then, when bootstrap pages are freed, they
  are added to a domain that we had previously thought was empty.  Until
  this is corrected, we simply exclude them from the per-domain page
  count.

Reported and tested by:	Rajesh Kumar <rajfbsd@gmail.com>
Reviewed by:	gallatin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17704
2018-10-30 17:57:40 +00:00
Hans Petter Selasky
e35079db73 Implement the dump_stack() function in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-30 16:42:56 +00:00
Hans Petter Selasky
bf05cd05ac Implement __KERNEL_DIV_ROUND_UP() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-30 16:32:52 +00:00
Bjoern A. Zeeb
43f75d57a2 Introduce an EXPERIMENTAL option for both src.conf(5) and the kernel.
In the last decade(s) we have seen both short term or long term projects
committed to the tree which were considered or even marked "experimental".
While out-of-tree development has become easier than it used to be in
CVS times, there still is a need to have the code shipping with HEAD but
not enabled by default.

While people may think about VIMAGE as one of the recent larger, long term
projects, early protocol implementations (before they are standardised)
are others.  (Free)BSD historically was one of the operating systems
which would have running code at early stages and help develop and
influence standardisation and the industry.

Give developers an opportunity to be more pro-active for early adoption
or running large scale code changes stumbling over each others but not
the user's feet.  I have not added the option to NOTES in order to avoid
breaking supported option builds, which require constant compile testing.

Discussed with:	people in the corridor
2018-10-30 15:46:30 +00:00
Eric van Gyzen
fcbb889fdb Always stop the scheduler when entering kdb
Set curthread->td_stopsched when entering kdb via any vector.
Previously, it was only set when entering via panic, so when
entering kdb another way, mutexes and such were still "live",
and an attempt to lock an already locked mutex would panic.

Reviewed by:	kib, cem
Discussed with:	jhb
Tested by:	pho
MFC after:	2 months
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17687
2018-10-30 14:54:15 +00:00
Michael Tuexen
d59a162c11 Bump the number of fans supported from 8 to 12.
The number of fans on a PowerMac7,3 with liquid cooling is 9.

Reviewed by:		andreast@
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D17754
2018-10-30 11:51:09 +00:00
Marcelo Araujo
5bae7542d4 Emulate machine check related MSR_EXTFEATURES to allow guest OSes to
boot on AMD FX Series.

PR:		224476
Submitted by:	Keita Uchida <m@jgz.jp>
Reviewed by:	rgrimes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D17713
2018-10-30 10:02:23 +00:00
Marcelo Araujo
fbd8c33022 Allow changing lagg(4) MTU.
Previously, changing the MTU would require destroying the lagg and
creating a new one. Now it is allowed to change the MTU of
the lagg interface and the MTU of the ports will be set to match.

If any port cannot set the new MTU, all ports are reverted to the original
MTU of the lagg. Additionally, when adding ports, the MTU of a port will be
automatically set to the MTU of the lagg. As always, the MTU of the lagg is
initially determined by the MTU of the first port added. If adding an
interface as a port for some reason fails, that interface is reverted to its
original MTU.

Submitted by:	Ryan Moeller <ryan@freqlabs.com>
Reviewed by:	mav
Relnotes:	Yes
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D17576
2018-10-30 09:53:57 +00:00
Justin Hibbits
a37c714a0f powerpc/mpc85xx: Reset the PCIe bus on attach
It seems if a Radeon card is already initialized by u-boot, it won't be
reinitialized by the kernel, and the DRM module will fail to attach.  This
steals the reset code from mips/octopci.c to blindly reset the bus on attach.
This was tested on a AmigaOne X5000/20, such that it can be booted from the
local video console, and get a video console in FreeBSD.
2018-10-30 00:47:40 +00:00
John Baldwin
cd785c1b34 Permit local kernel modules to be built as part of a kernel build.
Add support for "local" modules.  By default, these modules are
located in LOCALBASE/sys/modules (where LOCALBASE defaults to
/usr/local).  Individual modules can be built along with a kernel by
defining LOCAL_MODULES to the list of modules.  Each is assumed to be
a subdirectory containing a valid Makefile.  If LOCAL_MODULES is not
specified, all of the modules present in LOCALBASE/sys/modules are
built and installed along with the kernel.

This means that a port that installs a kernel module can choose to
install its source along with a suitable Makefile to
/usr/local/sys/modules/<foo>.  Future kernel builds will then include
that kernel module using the kernel configuration's opt_*.h headers
and install it into /boot/kernel along with other kernel-specific
modules.

This is not trying to solve the issue of folks running GENERIC release
kernels, but is instead aimed at folks who build their own kernels.
For those folks this ensures that kernel modules from ports will
always be using the right KBI, etc.  This includes folks running any
KBI-breaking kernel configs (such as PAE).

There are still some kinks to be worked out with cross-building (we
probably shouldn't include local modules in cross-built kernels by
default), but this is a sufficient starting point.

Reviewed by:	imp
MFC after:	3 months
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D16966
2018-10-30 00:23:37 +00:00
Mark Johnston
25c9cca757 Have gconcat advertise delete support if one of its disks does.
This follows the example set by other multi-disk GEOM classes.

PR:		232676
Tested by:	noah.bergbauer@tum.de
MFC after:	1 month
2018-10-30 00:22:14 +00:00
John Baldwin
68d0cda661 Make battery emptying rate available as sysctl variable.
Curiously, the in-kernel routines always use the design voltage to
convert from mA to mW, but acpiconf in userland uses the current
voltage.  As a result, this can report a different mW rate than
acpiconf.

Submitted by:	Manuel Stühn <freebsdnewbie@freenet.de>
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D17077
2018-10-30 00:19:44 +00:00
Konstantin Belousov
9775a6ebd2 amd64: Use ifuncs to select suitable implementation of set_pcb_flags().
There is no reason to check for PCB_FULL_IRET if FSGSBASE instructions
are not supported.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-10-29 23:52:31 +00:00
Konstantin Belousov
93177620ee Style.
Wrap long lines, use +4 spaces for continuation indent.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-10-29 23:45:17 +00:00
Konstantin Belousov
6acf1b203f Clarify explanation of VFCF_SBDRY.
Requested by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2018-10-29 23:43:17 +00:00
Navdeep Parhar
f01fc2d0e8 cxgbe/iw_cxgbe: Install the socket upcall before calling soconnect to
ensure that it always runs when soisconnected does.

Submitted by:	Krishnamraju Eraparaju @ Chelsio
MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-10-29 22:35:46 +00:00
John Baldwin
567a3784c2 Add support for "plain" (non-HMAC) SHA digests.
MFC after:	2 months
Sponsored by:	Chelsio Communications
2018-10-29 22:24:31 +00:00
Mark Johnston
da7d7778b0 Expose some netdump configuration parameters through sysctl.
Reviewed by:	cem
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17755
2018-10-29 21:16:26 +00:00
Hans Petter Selasky
8ad9551d36 Implement dma_pool_zalloc() in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-29 19:02:36 +00:00
Stephen Hurd
5201e0f110 Drain grouptaskqueue of the gtask before detaching it.
taskqgroup_detach() would remove the task even if it was running or
enqueued, which could lead to panics (see D17404). With this change,
taskqgroup_detach() drains the task and sets a new flag which prevents the
task from being scheduled again.

I've added grouptask_block() and grouptask_unblock() to allow control
over the flag from other locations as well.

Reviewed by:	Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17674
2018-10-29 14:36:03 +00:00
Kristof Provost
99eb00558a pf: Make ':0' ignore link-local v6 addresses too
When users mark an interface to not use aliases they likely also don't
want to use the link-local v6 address there.

PR:		201695
Submitted by:	Russell Yount <Russell.Yount AT gmail.com>
Differential Revision:	https://reviews.freebsd.org/D17633
2018-10-28 05:32:50 +00:00
Yuri Pankov
8d56c80545 Provide basic descriptions for VMX exit reason (from "Intel 64 and IA-32
Architectures Software Developer’s Manual Volume 3").  Add the document
to SEE ALSO in bhyve.8 (and pet manlint here a bit).

Reviewed by:	jhb, rgrimes, 0mp
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D17531
2018-10-27 21:24:28 +00:00
Vladimir Kondratyev
5ef2488947 evdev: disable evdev if it is invoked from KDB or panic context
This allow to prevent deadlock on entering KDB if one of evdev locks is
already taken by userspace process.

Also this change discards all but LED console events produced by KDB as
unrelated to userspace.

Tested by:	dumbbell (as part of D15070)
Objected by:	bde (as 'KDB lock an already locked mutex' problem solution)
MFC after:	1 month
2018-10-27 21:04:34 +00:00
Vladimir Kondratyev
f86e7267f5 evdev: Use console lock as evdev lock for all supported keyboard drivers.
Now evdev part of keyboard drivers does not take any locks if corresponding
input/eventN device node is not opened by userland consumers.

Do not assert console lock inside evdev to handle the cases when keyboard
driver is called from some special single-threaded context like shutdown
thread.
2018-10-27 20:22:41 +00:00
Mark Johnston
4aed5937db Use M_WAITOK in init_hwpmc().
No functional change intended.

MFC after:	2 weeks
2018-10-27 18:48:49 +00:00
Alan Cox
9f1abe3df4 Eliminate typically pointless calls to vm_fault_prefault() on soft, copy-
on-write faults.  On a page fault, when we call vm_fault_prefault(), it
probes the pmap and the shadow chain of vm objects to see if there are
opportunities to create read and/or execute-only mappings to neighoring
pages.  For example, in the case of hard faults, such effort typically pays
off, that is, mappings are created that eliminate future soft page faults.
However, in the the case of soft, copy-on-write faults, the effort very
rarely pays off.  (See the review for some specific data.)

Reviewed by:	kib, markj
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D17367
2018-10-27 17:49:46 +00:00
Eugene Grosbein
6d305ab0b2 Extend stripeoffset and stripesize of GEOMs from u_int to off_t
GEOM's stripeoffset overflows at 4 gigabyte margin (2^32)
because of its u_int type. This leads to incorrect data in the output
generated by "sysctl kern.geom.confxml" command, "graid list" etc.
when GEOM array has volumes larger than 4G, for example.

This change does not affect ABI but changes KBI. No MFC planned.

Differential Revision:	https://reviews.freebsd.org/D13426
2018-10-27 16:14:42 +00:00
Conrad Meyer
37136b849f random(4): Match enabled sources mask to build options
r287023 and r334450 added build option mechanisms to permanently disable
spammy and/or low quality entropy sources.

Follow-up those changes by updating the 'enabled' sources mask to match.
When sources are compile-time disabled, represent them as disabled in the
source mask, and prevent users from modifying that, like pure sources.
(Modifying the mask bit would have no effect, but users might think it did
if it was not prevented.)

Mostly a cosmetic change.

Reviewed by:	markm
Approved by:	secteam (gordon)
X-MFC-With:	334450
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17252
2018-10-27 15:09:35 +00:00
Eugene Grosbein
5310c19174 ipfw: implement ngtee/netgraph actions for layer-2 frames.
Kernel part of ipfw does not support and ignores rules other than
"pass", "deny" and dummynet-related for layer-2 (ethernet frames).
Others are processed as "pass".

Make it support ngtee/netgraph rules just like they are supported
for IP packets. For example, this allows us to mirror some frames
selectively to another interface for delivery to remote network analyzer
over RSPAN vlan. Assuming ng_ipfw(4) netgraph node has a hook named "900"
attached to "lower" hook of vlan900's ng_ether(4) node, that would be
as simple as:

ipfw add ngtee 900 ip from any to 8.8.8.8 layer2 out xmit igb0

PR:		213452
MFC after:	1 month
Tested-by:	Fyodor Ustinov <ufm@ufm.su>
2018-10-27 07:32:26 +00:00
Eugene Grosbein
1a5995cc88 Prevent ip_input() from panicing due to unprotected access to INADDR_HASH.
PR:			220078
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D12457
Tested-by:		Cassiano Peixoto and others
2018-10-27 04:59:35 +00:00
Eugene Grosbein
4f1e3122ac Prevent multicast code from panicing due to unprotected access to INADDR_HASH.
PR:			220078
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D12457
Tested-by:		Cassiano Peixoto and others
2018-10-27 04:53:25 +00:00
Eugene Grosbein
232485a17e Prevent stf(4) from panicing due to unprotected access to INADDR_HASH.
PR:			220078
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D12457
Tested-by:		Cassiano Peixoto and others
2018-10-27 04:45:28 +00:00
Xin LI
0db665bb98 Restore backward compatibility for "attach" verb.
In r332361 and r333439, two new parameters were added to geli attach
verb using gctl_get_paraml, which requires the value to be present.
This would prevent old geli(8) binary from attaching geli(4) device
as they have no knowledge about the new parameters.

Restore backward compatibility by treating the absense of these two
values as seeing the default value supplied by userland.

PR:		232595
Reviewed by:	oshogbo
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17680
2018-10-27 03:37:14 +00:00
Michael Tuexen
de00ad05e6 Add initial descriptions for SCTP related MIB variable.
This work was mostly done by Marie-Helene Kvello-Aune.

MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D3583
2018-10-26 21:04:17 +00:00
Conrad Meyer
9b8d0fe462 Fortuna: Add failpoints to simulate initial seeding conditions
Set debug.fail_point.random_fortuna_pre_read=return(1) and
debug.fail_point.random_fortuna_seeded=return(1) to return to unseeded
status (sort of).  See the Differential URL for more detail.

The goal is to reproduce e.g. Lev's recent CURRENT report[1] about failing
newfs arc4random(3) usage (fixed in r338542).

No functional change when failpoints are not set.

[1]: https://lists.freebsd.org/pipermail/freebsd-current/2018-September/071067.html

Reported by:	lev
Reviewed by:	delphij, markm
Approved by:	secteam (delphij)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17047
2018-10-26 21:03:57 +00:00
Conrad Meyer
7be4093a84 fortuna: Drop global lock to zero stack variables
Also drop explicit zeroing of hash context -- hash finish() operation is
expected to do this.

PR:		230877
Suggested by:	delphij@
Reviewed by:	delphij, markm
Approved by:	secteam (delphij)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D16986
2018-10-26 21:00:26 +00:00
Conrad Meyer
9a88479843 Fortuna: fix a correctness issue in reseed (fortuna_pre_read)
'i' counts the number of pools included in the array 's'.  Passing 'i+1' to
reseed_internal() as the number of blocks in 's' is a bogus overrun of the
initialized portion of 's' -- technically UB.

I found this via code inspection, referencing §9.5.2 "Pools" of the Fortuna
chapter, but I would expect Coverity to notice the same issue.
Unfortunately, it doesn't appear to.

Reviewed by:	markm
Approved by:	secteam (gordon)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D16985
2018-10-26 20:55:01 +00:00
Conrad Meyer
070249043e rijndael (AES): Avoid leaking sensitive data on kernel stack
Noticed this investigating Fortuna.  Remove useless duplicate stack copies
of sensitive contents when possible, or if not possible, be sure to zero
them out when we're finished.

Approved by:	secteam (gordon)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D16935
2018-10-26 20:53:01 +00:00
Conrad Meyer
2384981b61 poll: Unify userspace pollfd pointer name
Some of the poll code used 'fds' and some used 'ufds' to refer to the
uap->fds userspace pointer that was passed around to subroutines.  Some of
the poll code used 'fds' to refer to the kernel memory pollfd arrays, which
seemed unnecessarily confusing.

Unify on 'ufds' to refer to the userspace pollfd array.

Additionally, 'bits' is not an accurate description of the kernel pollfd
array in kern_poll, so rename that to 'kfds'.  Finally, clean up some logic
with mallocarray() and nitems().

No functional change.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D17670
2018-10-26 20:07:46 +00:00
Brooks Davis
ed34a7fcf2 Move 32-bit compat support for FIODGNAME to the right place.
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.

The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.

Unlike r339174 this change supports both places FIODGNAME is handled.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17475
2018-10-26 17:59:25 +00:00
Warner Losh
ea657f2c76 Add statistics for TRIM comands
Add a counter for the LBAs, Ranges and hardware commands so that we
can provide additional color to the statistics we provide to vendors.

Sponsored by: Netflix, Inc
2018-10-26 16:23:51 +00:00
Warner Losh
24b6d87155 Redo r339563: Remove joy(4) driver.
This driver was marked as gone in 12. We're at 13 now. Remove it.
Data from nycbug's dmesg cache shows only one potential user,
suggesting it never was used much. However, even though this device
has been obsolete for 15 years at least, sys/joystick.h is included in
a number of graphics packages still, so that remains. A full exprun
is needed before that can be removed.

RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D17629
2018-10-26 16:03:30 +00:00
Warner Losh
09efa3dfb2 Put a workaround in for command timeout malfunctioning
At least one NVMe drive has a bug that makeing the Command Time Out
PCIe feature unreliable. The workaround is to disable this
feature. The driver wouldn't deal correctly with a timeout anyway.
Only do this for drives that are known bad.

Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D17708
2018-10-26 14:27:37 +00:00
Ruslan Bukin
b7b391934d o Add pmap lock around pmap_fault_fixup() to ensure other thread will not
modify l3 pte after we loaded old value and before we stored new value.
o Preset A(accessed), D(dirty) bits for kernel mappings.

Reported by:	kib
Reviewed by:	markj
Discussed with:	jhb
Sponsored by:	DARPA, AFRL
2018-10-26 12:27:07 +00:00
Warner Losh
cdf0293e18 Remove #warning since it breaks libsysdecode 2018-10-26 04:53:29 +00:00
Warner Losh
9984cc3d25 Bump to 1300002 for sys/joystick.h removal reversion. 2018-10-26 04:13:56 +00:00
Warner Losh
f263471a3a Add warning to sys/joystick.h announcing its planned demise. 2018-10-26 04:11:58 +00:00
Warner Losh
7c320a22df Revert r339563.
I held the mistaken belief this was completely unused. While the
driver is unused and likely not relevant for a long time,
sys/joystick.h lives on in maybe half a dozen ports, even though
hardware to use it hasn't been widely used in maybe 15 years.
2018-10-26 04:10:32 +00:00
Takanori Watanabe
5efca36fbd Distinguish _CID match and _HID match and make lower priority probe
when _CID match.

Reviewed by: jhb, imp
Differential Revision:https://reviews.freebsd.org/D16468
2018-10-26 00:05:46 +00:00
Navdeep Parhar
f02c9e69cb cxgbe(4): Add a knob to split the rx queues for a netmap enabled
interface into two groups.  Filters can be used to match traffic
and distribute it across a group.

hw.cxgbe.nm_split_rss

Sponsored by:	Chelsio Communications
2018-10-25 22:55:18 +00:00
Konstantin Belousov
4f77f48884 Implement O_BENEATH and AT_BENEATH.
Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory.  Supposedly the interface is similar
to the same proposed Linux flags.

Reviewed by:	jilles (code, previous version of manpages), 0mp (manpages)
Discussed with:	allanjude, emaste, jonathan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17547
2018-10-25 22:16:34 +00:00
Mark Johnston
ad054101eb Remove a dead store.
CID:		1304878
MFC after:	1 week
2018-10-25 17:36:28 +00:00
Warner Losh
bf15eaf244 Update comment for AMI00[12]0 override.
The AML is even stupider than always returning 0. It will only return
non-zero for an OS that reports itself as "Windows 2015", at least
on the Threadripper board's AML that I've examined.

Those AMLs also suggest we may need this quirk for AMI0030 as well.
There may be other cases where we need to override the _STA in a
generic way, so we should consider writing code to do that.
2018-10-25 17:17:11 +00:00
Mark Johnston
970a174f3b Add FALLTHROUGH comments to appease Coverity.
CID:		1017862-1017864, 1017866-1017868
MFC after:	2 weeks
2018-10-25 15:43:21 +00:00
Glen Barber
638e0274e4 Bump __FreeBSD_version following the OpenSSL shared library version
number bump.

Submitted by:	antoine
MFC with:	r339709
Sponsored by:	The FreeBSD Foundation
2018-10-25 15:41:26 +00:00
Mark Johnston
a0a18fd46b Remove a redundant check.
CID:		1042100
MFC after:	2 weeks
2018-10-25 15:40:59 +00:00
Navdeep Parhar
d54dafc600 cxgbe(4): Allow "pass" filters to distribute matching traffic using a
subset of a VI's RSS indirection table.

This makes it possible to make groups out of rx queues and steer
different kinds of traffic to different groups.  For example, an
interface with 8 rx queues could have all non-TCP traffic delivered to
queues 0-3 and all TCP traffic to queues 4-7.

Note that it is already possible for filters to steer traffic to a
particular queue or to distribute it using the full indirection table
(much like normal rx does).

Sponsored by:	Chelsio Communications
2018-10-25 14:37:26 +00:00
Navdeep Parhar
b77aaff9bc cxgbe(4): Update the VI's default queue when netmap is enabled/disabled.
Sponsored by:	Chelsio Communications
2018-10-25 06:24:42 +00:00
Brooks Davis
9083b6057b Deprecate a number of less used 10 and 10/100 Ethernet devices.
The current deprecated list is: ae, bm, cs, de, dme, ed, ep, ex, fe,
pcn, sf, sn, tl, tx, txp, vx, wb, xe

The list as refined as part of FCP-0101. Per the FCP, devices may be
removed from the deprecation list if enough users are found or they are
converted to iflib.

FCP:	https://github.com/freebsd/fcp/blob/master/fcp-0101.md
2018-10-25 04:10:41 +00:00
Navdeep Parhar
b5da13f72d cxgbe(4): new sysctl to display the start of the RSS region for a VI.
dev.<ifname>.<inst>.rss_base

For example:
dev.cc.0.rss_base: 0
dev.cc.1.rss_base: 128
dev.vcc.0.rss_base: 256
dev.vcc.1.rss_base: 384

Sponsored by:	Chelsio Communications
2018-10-25 01:20:32 +00:00
Konstantin Belousov
4fceda6206 Correct condition to detect mount(2) support by a filesystem.
Reported and tested by:	cy
Sponsored by:	The FreeBSD Foundation
Approved by:	re (rgrimes)
2018-10-24 19:40:09 +00:00
Mark Johnston
17fbf3cf34 Add a !NUMA definition for vm_domainset_iter_policy_ref_init().
Pointy hat:	markj
X-MFC with:	r339661
Sponsored by:	The FreeBSD Foundation
2018-10-24 17:09:20 +00:00
Mark Johnston
7571e24901 Add an #include required after r339686.
X-MFC with:	r339686
Sponsored by:	The FreeBSD Foundation
2018-10-24 16:49:16 +00:00
Mark Johnston
194a979ee9 Use a vm_domainset iterator in keg_fetch_slab().
Previously, it used a hand-rolled round-robin iterator.  This meant that
the minskip logic in r338507 didn't apply to UMA allocations, and also
meant that we would call vm_wait() for individual domains rather than
permitting an allocation from any domain with sufficient free pages.

Discussed with:	jeff
Tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17420
2018-10-24 16:41:47 +00:00
Bjoern A. Zeeb
be01db051f Remove redundant redeclaration of netmap_vp_reg().
This should unbreak sparc64 and powerpc LINT builds.
2018-10-24 14:14:49 +00:00
Bjoern A. Zeeb
1ff6e7a8a8 rip6_input() inp validation after epoch(9)
After r335924 rip6_input() needs inp validation to avoid
working on FREED inps.

Apply the relevant bits from r335497,r335501 (rip_input() change)
to the IPv6 counterpart.

PR:			232194
Reviewed by:		rgrimes, ae (,hps)
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D17594
2018-10-24 10:42:35 +00:00
Kristof Provost
13d640d376 pf: Fix copy/paste error in IPv6 address rewriting
We checked the destination address, but replaced the source address. This was
fixed in OpenBSD as part of their NAT rework, which we don't want to import
right now.

CID:		1009561
MFC after:	3 weeks
2018-10-24 00:19:44 +00:00
Kristof Provost
73c9014569 pf: ifp can never be NULL in pfi_ifaddr_event()
There's no point in the NULL check for ifp, because we'll already have
dereferenced it by then. Moreover, the event will always have a valid ifp.

Replace the late check with an early assertion.

CID:		1357338
2018-10-23 23:15:44 +00:00
Konstantin Belousov
8ff7fad1d7 Only call sigdeferstop() for NFS.
Use bypass to catch any NFS VOP dispatch and route it through the
wrapper which does sigdeferstop() and then dispatches original
VOP. NFS does not need a bypass below it, which is not supported.

The vop offset in the vop_vector is added since otherwise it is
impossible to get vop_op_t from the internal table, and I did not
wanted to create the layered fs only to wrap NFS VOPs.

VFS_OP()s wrap is straightforward.

Requested and reviewed by:	mjg (previous version)
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17658
2018-10-23 21:43:41 +00:00
Kirk McKusick
ec888383cf Continuing efforts to provide hardening of FFS, this change adds a
check hash to the superblock. If a check hash fails when an attempt
is made to mount a filesystem, the mount fails with EINVAL (Invalid
argument). This avoids a class of filesystem panics related to
corrupted superblocks. The hash is done using crc32c.

Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily
used in embedded systems with small memories and low-powered processors
which need as light-weight a filesystem as possible.

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2018-10-23 21:10:06 +00:00
Warner Losh
8c6d203923 For the moment, put back the MOUSE_PROTO_{BUS,INPORT} #defines until
some ports can be updated.
2018-10-23 20:45:46 +00:00
Navdeep Parhar
980ab1baa6 cxgbe/iw_cxgbe: save the ep in the driver-private provider_data field.
Submitted By: Lily Wang @ Netapp

MFC after:	1 week
2018-10-23 18:32:55 +00:00
John Baldwin
1146377b4b Support the SHA224 HMAC algorithm in ccr(4).
MFC after:	2 months
Sponsored by:	Chelsio Communications
2018-10-23 18:31:39 +00:00
John Baldwin
174a501466 Add sha224 to the authctx union.
MFC after:	2 months
Sponsored by:	Chelsio Communications
2018-10-23 18:07:37 +00:00
Mark Johnston
87ab1a10b1 Initialize static domainsets regardless of whether an SRAT is present.
Reported by:	yuripv
X-MFC with:	r339452
Sponsored by:	The FreeBSD Foundation
2018-10-23 18:07:16 +00:00
Konstantin Belousov
90a38351c8 Initializer error variable in nvdimm_spa_uio().
Several code paths might result in returning uninitialized value.

Reported by:	coverity through cem
CID:	1396315
Sponsored by:	The FreeBSD Foundation
2018-10-23 17:53:35 +00:00
Eric Joyner
46fa0c2552 Revert r339634.
That commit is causing kernel panics in em(4), so this will be reverted
until those are fixed.

Reported by:	ae@, pho@, et al
Sponsored by:	Intel Corporation
2018-10-23 17:06:36 +00:00
Mark Johnston
4c29d2de67 Refactor domainset iterators for use by malloc(9) and UMA.
Before this change we had two flavours of vm_domainset iterators: "page"
and "malloc".  The latter was only used for kmem_*() and hard-coded its
behaviour based on kernel_object's policy.  Moreover, its use contained
a race similar to that fixed by r338755 since the kernel_object's
iterator was being run without the object lock.

In some cases it is useful to be able to explicitly specify a policy
(domainset) or policy+iterator (domainset_ref) when performing memory
allocations.  To that end, refactor the vm_dominset_* KPI to permit
this, and get rid of the "malloc" domainset_iter KPI in the process.

Reviewed by:	jeff (previous version)
Tested by:	pho (part of a larger patch)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17417
2018-10-23 16:35:58 +00:00
Andrey V. Elsukov
8796e291f8 Add the check that current VNET is ready and access to srchash is allowed.
This change is similar to r339646. The callback that checks for appearing
and disappearing of tunnel ingress address can be called during VNET
teardown. To prevent access to already freed memory, add check to the
callback and epoch_wait() call to be sure that callback has finished its
work.

MFC after:	20 days
2018-10-23 13:11:45 +00:00
Andrey V. Elsukov
221022e190 Add the check that current VNET is ready and access to srchash is
allowed.

ipsec_srcaddr() callback can be called during VNET teardown, since
ingress address checking subsystem isn't VNET specific. And thus
callback can make access to already freed memory. To prevent this,
use V_ipsec_idhtbl pointer as indicator of VNET readiness. And make
epoch_wait() after resetting it to NULL in vnet_ipsec_uninit() to
be sure that ipsec_srcaddr() is finished its work.

Reported by:	kp
MFC after:	20 days
2018-10-23 13:03:03 +00:00
Gleb Smirnoff
daa70021ac Fix ipw_start(), where logic was reverted in r287197.
PR:		232554
Submitted by:	gl00my@mail.ru
2018-10-23 12:53:09 +00:00
Andrey V. Elsukov
e6b383b2f5 Remove softc from idhash when interface is destroyed.
MFC after:	20 days
2018-10-23 12:50:28 +00:00
Vincenzo Maffione
2a7db7a63d netmap: align codebase to the current upstream (sha 8374e1a7e6941)
Changelist:
    - Move large parts of VALE code to a new file and header netmap_bdg.[ch].
      This is useful to reuse the code within upcoming projects.
    - Improvements and bug fixes to pipes and monitors.
    - Introduce nm_os_onattach(), nm_os_onenter() and nm_os_onexit() to
      handle differences between FreeBSD and Linux.
    - Introduce some new helper functions to handle more host rings and fake
      rings (netmap_all_rings(), netmap_real_rings(), ...)
    - Added new sysctl to enable/disable hw checksum in emulated netmap mode.
    - nm_inject: add support for NS_MOREFRAG

Approved by:	gnn (mentor)
Differential Revision:	https://reviews.freebsd.org/D17364
2018-10-23 08:55:16 +00:00
Eric Joyner
940f62d616 iflib: drain enqueued tasks before detaching from taskqgroup
The taskqgroup_detach function does not check if task is already enqueued when
detaching it. This may lead to kernel panic if enqueued task starts after
context state lock is destroyed. Ensure that the already enqueued admin tasks
are executed before detaching them.

The issue was discovered during validation of D16429. Unloading of if_ixlv
followed by immediate removal of VFs with iovctl -D may lead to panic on
NODEBUG kernel.

As well, check if iflib is in detach before enqueueing new admin or iov
tasks, to prevent new tasks from executing while the taskqgroup tasks
are being drained.

Submitted by:	Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by:	shurd@, erj@
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D17404
2018-10-23 04:37:29 +00:00
Justin Hibbits
3c22d78997 dpaa: Mark BMan and QMan as earlier driver modules
The BMan softc must exist when dtsec devices are created, else a NULL
pointer is dereferenced.  QMan likely as well.  Until now, we have relied on
order within the fdt parsing to attach correctly, but this obviously is not
foolproof.  Mark these as BUS_PASS_SUPPORTDEV so they're probed and attached
explicitly before dtsec devices.
2018-10-23 01:56:52 +00:00
Navdeep Parhar
17e81b7863 cxgbe(4): improve the accuracy of various TSO limits reported to the kernel.
Sponsored by:	Chelsio Communications
2018-10-22 23:57:59 +00:00
Navdeep Parhar
ddf09ad64b cxgbe(4): Use automatic cidx updates with ofld and ctrl queues.
The bits that explicitly request cidx updates do not work reliably with
all possible WRs that can be sent over the queue.  The F_FW_WR_EQUIQ
requests that still remain may also have to be replaced with explicit
credit flush WRs in the future.

MFC after:	2 days
Sponsored by:	Chelsio Communications
2018-10-22 23:06:23 +00:00
Brooks Davis
c3adaa3305 Consolidate identical ELF auxargs type defintions.
All platforms except powerpc use the same values and powerpc shares a
majority of them.

Go ahead and declare AT_NOTELF, AT_UID, and AT_EUID in favor of the
unused AT_DCACHEBSIZE, AT_ICACHEBSIZE, and AT_UCACHEBSIZE for powerpc.

Reviewed by:	jhb, imp
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17397
2018-10-22 22:24:32 +00:00
Brooks Davis
9d7051d920 Remove the need for backslashes in syscalls.master.
Join non-special lines together until we hit a line containing a '}'
character. This allows the function declaration body to be split
across multiple lines without backslash continuation characters.

Continue to join lines ending with backslashes to allow gradual
migration and to support out-of-tree syscall vectors

Reviewed by:	emaste, kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17488
2018-10-22 22:13:00 +00:00
Brooks Davis
c7864b97d3 Regen after r339622.
Note: changes to freebsd32 syscalls.master impacted no generated files.
2018-10-22 21:51:59 +00:00
Brooks Davis
22c0c9a481 Remove __restrict qualifiers from syscalls.master.
The restruct qualifier is intended to aid code generation in the
compiler, but the only access to storage through these pointers is via
structs using copyin/copyout and the like which can not be written in C
or C++ and thus the compiler gains nothing from the qualifiers.

As such, the qualifiers add no value in current usage.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17574
2018-10-22 21:50:43 +00:00
John Baldwin
74e10fb613 A couple of style fixes in recent TCP changes.
- Add a blank line before a block comment to match other block comments
  in the same function.
- Sort the prototype for sbsndptr_adv and fix whitespace between return
  type and function name.

Reviewed by:	gallatin, bz
Differential Revision:	https://reviews.freebsd.org/D17474
2018-10-22 21:17:36 +00:00
Tijl Coosemans
642909fdfb Define linuxkpi readq for 64-bit architectures. It is used by drm-kmod.
Currently the compiler picks up the definition in machine/cpufunc.h.

Add compiler memory barriers to read* and write*.  The Linux x86
implementation of these functions uses inline asm with "memory" clobber.
The Linux x86 implementation of read_relaxed* and write_relaxed* uses the
same inline asm without "memory" clobber.

Implement ioread* and iowrite* in terms of read* and write* so they also
have memory barriers.

Qualify the addr parameter in write* as volatile.

Like Linux, define macros with the same name as the inline functions.

Only define 64-bit versions on 64-bit architectures because generally
32-bit architectures can't do atomic 64-bit loads and stores.

Regroup the functions a bit and add brief comments explaining what they do:
- __raw_read*, __raw_write*: atomic, no barriers, no byte swapping
- read_relaxed*, write_relaxed*: atomic, no barriers, little-endian
- read*, write*: atomic, with barriers, little-endian

Add a comment that says our implementation of ioread* and iowrite*
only handles MMIO and does not support port IO.

Reviewed by:	hselasky
MFC after:	3 days
2018-10-22 20:55:35 +00:00
Mark Johnston
b61f314290 Make it possible to disable NUMA support with a tunable.
This provides a chicken switch for anyone negatively impacted by
enabling NUMA in the amd64 GENERIC kernel configuration.  With
NUMA disabled at boot-time, information about the NUMA topology
is not exposed to the rest of the kernel, and all of physical
memory is viewed as coming from a single domain.

This method still has some performance overhead relative to disabling
NUMA support at compile time.

PR:		231460
Reviewed by:	alc, gallatin, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17439
2018-10-22 20:13:51 +00:00
Conrad Meyer
0f743729ab Update to Zstandard 1.3.7
Relnotes:	yes
Sponsored by:	Dell EMC Isilon
2018-10-22 18:29:12 +00:00
Conrad Meyer
93940996fd Conditionalize kern.tty_info_kstacks feature on STACKS option
Fix tinderbox (mips XLPN32) after r339471.

Reported by:	tinderbox
X-MFC-With:	r339471
Sponsored by:	Dell EMC Isilon
2018-10-22 17:42:57 +00:00
Mark Johnston
2801dd08d7 Fix the build after r339601.
I committed some patches out of order and didn't build-test one of them.

Reported by:	Jenkins, O. Hartmann <ohartmann@walstatt.org>
X-MFC with:	r339601
2018-10-22 17:19:48 +00:00
Mark Johnston
2a843ae7d9 Avoid a redundancy in a comment updated by r339601.
Reported by:	alc
X-MFC with:	r339601
2018-10-22 17:17:30 +00:00
Mark Johnston
b00581965d Swap in processes unless there's a global memory shortage.
On NUMA systems, we would not swap in processes unless all domains
had some free pages.  This is too conservative in general.  Instead,
permit swapins so long as at least one domain has free pages, and add
a kernel stack NUMA policy which ensures that we will try to allocate
kernel stack pages from any domain.

Reported and tested by:	pho, Jan Bramkamp <crest@bultmann.eu>
Reviewed by:	alc, kib
Discussed with:	jeff
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17304
2018-10-22 17:04:04 +00:00
Hans Petter Selasky
127a9d7381 Make sure returned value is checked and assert a valid refcount.
While at it fix a print: Unsigned types cannot be negative.

Reviewed by:		kib, mjg
Differential revision:	https://reviews.freebsd.org/D17616
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-10-22 16:21:50 +00:00
Mark Johnston
21744c825f Don't import 0 into vmem quantum caches.
vmem uses UMA cache zones to implement the quantum cache.  Since
uma_zalloc() returns 0 (NULL) to signal an allocation failure, UMA
should not be used to cache resource 0.  Fix this by ensuring that 0 is
never cached in UMA in the first place, and by modifying vmem_alloc()
to fall back to a search of the free lists if the cache is depleted,
rather than blocking in qc_import().

Reported by and discussed with:	Brett Gutstein <bgutstein@rice.edu>
Reviewed by:	alc
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17483
2018-10-22 16:16:42 +00:00
Mark Johnston
d3a4b0dabc Fix style bugs in in6_pcblookup_lbgroup().
This should have been a part of r338470.  No functional changes
intended.

Reported by:	gallatin
Reviewed by:	gallatin, Johannes Lundberg <johalun0@gmail.com>
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17109
2018-10-22 16:09:01 +00:00
Gleb Smirnoff
81c0d72c60 If we lost race or were migrated during bucket allocation for the per-CPU
cache, then we put new bucket on generic bucket cache. However, code didn't
honor UMA_ZONE_NOBUCKETCACHE flag, so potentially we could start a cache
on a zone that clearly forbids that. Fix this.

Reviewed by:	markj
2018-10-22 15:48:07 +00:00
Andriy Gapon
ca8f3d1ca2 nfsrvd_readdirplus: for some errors, do not fail the entire request
Instead, a failing entry is skipped.
This change consist of two logical changes.

A failure to vget or lookup an entry is considered to be a result of a
concurrent removal, which is the only reasonable explanation given that
the filesystem is busied.  So, the entry would be silently skipped.

In the case of a failure to get attributes of an entry for an NFSv3
request, the entry would be silently skipped.  There can be legitimate
reasons for the failure, but NFSv3 does not provide any means to report
the error, so we have two options: either fail the whole request or
ignore the failed entry.  Traditionally, the old NFS server used the
latter option, so the code is reverted to it.  Making the whole
directory unreadable because of a single entry seems to be unpractical.

Additionally, some bits of code are slightly re-arranged to account for
the new control flow and to honor style(9).

Reviewed by:	rmacklem
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D15424
2018-10-22 15:33:05 +00:00
Andrew Turner
43e08d07c5 Stop advertising ARMv8.3 Pointer Authentication
This needs firmware and kernel support before userspace can use it. Until
then don't advertise it's available.

MFC after:	3 days
2018-10-22 15:18:49 +00:00
Andrew Turner
5bb9cd6123 Fix the ID_AA64ISAR0_EL1 dot product field shift.
It's 44 in the documentation, use this correct value.

MFC after:	3 days
2018-10-22 15:06:14 +00:00
Andrew Turner
71374d5d99 Correctly set the DAIF bits in new threads
We should only unmask interrupts when creating a new thread and leave the
other exceptions in teh same state as before creating the thread.

Reported by:	jhibbits
Reviewed by:	jhibbits
MFC after:	1 month
Sponsored by:	https://reviews.freebsd.org/D17497
2018-10-22 14:58:59 +00:00
Andriy Gapon
da4e7cad13 ichwd: add support for TCO watchdog timer in Lewisburg PCH (C620)
The change is based on public documents listed below as well as Linux
changes and the code developed by Kostik.

The documents:
- Intel® C620 Series Chipset Platform Controller Hub Datasheet
- Intel® 100 Series and Intel® C230 Series Chipset Family Platform
  Controller Hub (PCH) Datasheet - Volume 2 of 2

Interesting Linux commits:
- 9424693035
- 2a7a0e9bf7

The peculiarity of the new chipsets is that the watchdog resources are
configured in PCI registers of SMBus controller and Power Management
function as opposed to the LPC bridge.  I took a simplistic approach of
querying the resources from the respective PCI devices.  ichwd is still
a device on isa bus.  The PCI devices are found by their slot and
function defined in the datasheets as siblings of the upstream LPC
bridge.

There are some shortcuts and missing features.

First of all, I have not implemented the functionality required to clear
the no-reboot bit.  That would require writing to a special PCI
configuration register of a hidden / invisible PCI device after which
the device would start responding to accesses to other registers.  The
no-reboot bit was not set on my test hardware, so I decided to leave its
handling for the later time.

Also, I did not try to handle the case where the watchdog resources are
not configured by the hardware as well as the case where ACPI defined
operational region conflicts with the watchdog resources.  My test
system did not have either of those problem, so, again, I decided to
leave those cases until later.
See this Linux commit for some details of the ACPI problem:
a7ae81952c

Finally, I have added only the PCI ID found on my test system.  I think
that more IDs can be added as the change gets tested.

Tested on Dell PowerEdge R740.

PR:		222079
Reviewed by:	mav, kib
MFC after:	3 weeks
Relnotes:	maybe
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D17585
2018-10-22 14:44:44 +00:00
Leandro Lupori
d93e635a81 ppc64: limited 32-bit DMA address range
Further investigation of issues with 32-bit DMA on PowerNV revealed that
its window is hardcoded by OPAL (at least in skiboot version 5.4.9) and
cannot be changed by the OS.
Thus, now jhb suggestion of limiting the range in PCI DMA tag seems
the best way to deal with it.

Reviewed by:	jhibbits, nwhitehorn, sbruno
Approved by:	jhibbits(mentor)
Differential Revision:	https://reviews.freebsd.org/D17601
2018-10-22 13:40:50 +00:00
Hans Petter Selasky
88d57e9eac Resolve deadlock between epoch(9) and various network interface
SX-locks, during if_purgeaddrs(), by not allowing to hold the epoch
read lock over typical network IOCTL code paths. This is a regression
issue after r334305.

Reviewed by:		ae (network)
Differential revision:	https://reviews.freebsd.org/D17647
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-10-22 13:25:26 +00:00
Hans Petter Selasky
27e4bd81f7 Added support for formula-based arbitrary baud rates, in contrast to
the current fixed values, which enables use of rates above 1 Mbps.
Improved the detection of HXD chips, and the status flag handling as
well.

Submitted by:		Gabor Simon <gabor.simon75@gmail.com>
PR:			225932
Differential revision:	https://reviews.freebsd.org/D16639
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-10-22 11:58:30 +00:00
Wei Hu
f0319886ca Do not trop UDP traffic when TXCSUM_IPV6 flag is on
PR:		231797
Submitted by:	whu
Reviewed by:	dexuan
Obtained from:	Kevin Morse
MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://bugs.freebsd.org/bugzilla/attachment.cgi?id=198333&action=diff
2018-10-22 11:23:51 +00:00
Slava Shwartsman
0c79f82cf0 mlx5: Notify user that the ConnectX-6 shutdown its port due to power limitation
If power exceed the slot limit, or slot limit is unknown the ConnectX-6
firmware will shutdown its port.
Inform the user via debug message.

MFC after:      3 days
Approved by:    hselasky (mentor), kib (mentor)
Sponsored by:   Mellanox Technologies
2018-10-22 10:38:38 +00:00
Hans Petter Selasky
cb8bfb4830 The event bytes should be unsigned char.
Found by:		Peter Holm <peter@holm.cc>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-22 08:59:20 +00:00
Hans Petter Selasky
f46bdfe14f Drop sequencer mutex around uiomove() and make sure we don't move more bytes
than is available, else a panic might happen.

Found by:		Peter Holm <peter@holm.cc>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-22 08:58:27 +00:00
Hans Petter Selasky
bb415b2a5f Fix off-by-one which can lead to panics.
Found by:		Peter Holm <peter@holm.cc>
MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-10-22 08:55:58 +00:00
Mateusz Guzik
099c6f6d45 amd64: finish the tail in memset with an overlapping store
Instead of finding the exact size to fit in we can just shift the target
by -8 + tail. Doing a blind write to a previously rep stosq'ed area comes
with a penalty so do it conditionally.

Sample win on EPYC when zeroing a 257 sized buffer (tail = 1) aligned to
16 bytes:
before: 44782846 ops/s
after:  46118614 ops/s

Idea stolen from NetBSD.

Sponsored by:	The FreeBSD Foundation
2018-10-22 06:44:20 +00:00
Ben Widawsky
8fd1088042 acpi: Add an interface to obtain DSM information
The Device Specific Method (_DSM) is on optional object that defines
device specific controls. This will be useful for our power management
controller in upcoming patches. More information can be found in ACPI
spec 6.2 section 9.1.1

https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf

This patch had a minor modification changing ENOMEM to AE_NO_MEMORY
after it got review and approval but before committing.

Test Plan: Tested in my s0ix branch

Reviewed by:	kib
Approved by:	emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D17121
2018-10-22 03:29:54 +00:00
Warner Losh
221ac8f4cd Remove the long obsolete SYM_SETUP_LP_PROBE_MAP option. It's not been
needed for almost 20 years, and is totally useless now that ncr(4) has
been removed.

Relnotes: yes
2018-10-22 02:36:31 +00:00
Warner Losh
6a18678249 Remove the ncr(4) drive.
This driver has been obsolete since the FreeBSD 4.x. It should have
been removed then since the sym(4) driver had subsumed it. The driver
was commented out of GENERIC in 2000.

RelNotes: Yes
2018-10-22 02:36:18 +00:00
Warner Losh
51a2f83991 Retire scsi_low
scsi_low was a common set of routines to do the SCSI bus sequencing
for the ncv, nsp and stg drivers. Those have been removed, so it's no
longer needed since nothing else in the tree uses it and nothing
likely ever will (it's for super-low-end 8-bit parallel SCSI cards).
2018-10-22 02:36:07 +00:00