130496 Commits

Author SHA1 Message Date
jtl
ee029a5d0b If the INP lock is uncontested, avoid taking a reference and jumping
through the lock-switching hoops.

A few of the INP lookup operations that lock INPs after the lookup do
so using this mechanism (to maintain lock ordering):

1. Lock lookup structure.
2. Find INP.
3. Acquire reference on INP.
4. Drop lock on lookup structure.
5. Acquire INP lock.
6. Drop reference on INP.

This change provides a slightly shorter path for cases where the INP
lock is uncontested:

1. Lock lookup structure.
2. Find INP.
3. Try to acquire the INP lock.
4. If successful, drop lock on lookup structure.

Of course, if the INP lock is contested, the functions will need to
revert to the previous way of switching locks safely.

This saves a few atomic operations when the INP lock is uncontested.

Discussed with:	gallatin, rrs, rwatson
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D12911
2018-03-21 15:54:46 +00:00
andrew
77f4d6e932 Use a table to find the endpoint configuration
On the Allwinner SoCs we need to set a custom endpoint configuration. To
allow for this use a table to store the configuration so the attachment
can override it.

Reviewed by:	hselasky
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14783
2018-03-21 15:17:54 +00:00
imp
89c879aab5 Mark psycho interrupts as MPSAFE. It's safe to do so now that we don't
need Giant to call shutdown_nice().
2018-03-21 14:47:17 +00:00
imp
17b7d8a1eb Unlock giant when calling shutdown_nice() 2018-03-21 14:47:12 +00:00
imp
95bc67def2 This is MPSAFE on this platform, so don't take Giant out while running
the callback.
2018-03-21 14:47:08 +00:00
imp
55d52e5cbf These interrupts call shutdown_nice() which should be called Giant
unlocked. Rather than dropping it in the interrupt handler, mark these
handlers as MPSAFE.
2018-03-21 14:47:03 +00:00
imp
16e006edb9 bufshutdown is no longer called with Giant held, so there's no need to
drop or pickup Giant anymore. Remove that code and adjust comments.
2018-03-21 14:46:59 +00:00
imp
16e8d96dda Remove Giant from init creation and vfs_mountroot.
Sponsored by: Netflix
Discussed with: kib@, mckusick@
Differential Review: https://reviews.freebsd.org/D14712
2018-03-21 14:46:54 +00:00
imp
20eb8298f5 Revert r331273: "Release the "TUR" reference when clearing the TUR work flag. We mostly"
It exposes other issues, so revert to the pervious state of known issues.
2018-03-21 12:55:59 +00:00
kib
3225db018a Move sysinit and sysuninit linker sets in the data (writeable) section.
Both sets are sorted in place, and with the introduction of read-only
permissions on the amd64 kernel text, the sorting override depended on
CR0.WP turned off.  Make it correct by moving the sets into writeable
part of the KVA, also fixing boot on machines where hand-off from BIOS
to OS occurs with CR0.WP set.

Based on submission by:	Peter Lei <peter.lei@ieee.org>
MFC after:	1 week
2018-03-21 10:26:39 +00:00
cem
02207c7d89 Add missed sys/limits.h include
Apparently header pollution on x86 hid its absense.  Sorry, other arch
users.

Fix the missed header introduced in r331279.

Reported by:	tinderbox
2018-03-21 03:43:40 +00:00
cem
06362ad468 Regenerate sysent files after r331279. 2018-03-21 01:17:01 +00:00
cem
82710b55b6 Implement getrandom(2) and getentropy(3)
The general idea here is to provide userspace programs with well-defined
sources of entropy, in a fashion that doesn't require opening a new file
descriptor (ulimits) or accessing paths (/dev/urandom may be restricted
by chroot or capsicum).

getrandom(2) is the more general API, and comes from the Linux world.
Since our urandom and random devices are identical, the GRND_RANDOM flag
is ignored.

getentropy(3) is added as a compatibility shim for the OpenBSD API.

truss(1) support is included.

Tests for both system calls are provided.  Coverage is believed to be at
least as comprehensive as LTP getrandom(2) test coverage.  Additionally,
instructions for running the LTP tests directly against FreeBSD are provided
in the "Test Plan" section of the Differential revision linked below.  (They
pass, of course.)

PR:		194204
Reported by:	David CARLIER <david.carlier AT hardenedbsd.org>
Discussed with:	cperciva, delphij, jhb, markj
Relnotes:	maybe
Differential Revision:	https://reviews.freebsd.org/D14500
2018-03-21 01:15:45 +00:00
jamie
783e904fc9 Represent boolean jail options as an array of structures containing the
flag and both the regular and "no" names, instead of two different string
arrays whose indices need to match the flag's bit position.  This makes
them similar to the say "jailsys" options are represented.

Loop through either kind of option array with a structure pointer rather
then an integer index.
2018-03-20 23:08:42 +00:00
melifaro
75159f749d Use count(9) api for the bpf(4) statistics.
Currently each bfp descriptor uses u64 variables to maintain its counters.
On interfaces with high packet rate this leads to unnecessary contention
and inaccurate reporting.

PR:		kern/205320
Reported by:	elofu17 at hotmail.com
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14726
2018-03-20 22:57:06 +00:00
imp
c2ed5522d0 Release the "TUR" reference when clearing the TUR work flag. We mostly
do this right, except when there's no BP and we do a TUR by request.
In that case, we clear the flag, but don't release the reference,
leaking the reference on rare occasion.

PR: 226510
Sponsored by: Netflix
2018-03-20 22:07:45 +00:00
glebius
c720980782 At this point iwmesg isn't initialized yet, so print pointer to lock
rather than panic before panicing.
2018-03-20 22:05:21 +00:00
imp
0d11728f30 Push down Giant one layer. In the days of yore, back when Penitums
were the new kids on the block and F00F hacks were all the rage, one
needed to take out Giant to do anything moderately complicated with
the VM, mappings and such. So the pccard / cardbus code held Giant for
the entire insertion or removal process.

Today, the VM is MP safe. The lock is only needed for dealing with
newbus things. Move locking and unlocking Giant to be only around
adding and probing devices in pccard and cardbus.
2018-03-20 22:01:18 +00:00
markj
5f265d26d1 Revert part of r331264: disable interrupts before disabling WP.
We might otherwise be preempted, leaving WP disabled while another
thread runs on the CPU.

Reported by:	kib
X-MFC with:	r331264
2018-03-20 21:36:35 +00:00
imp
740a13f2cc Drop support for lint for cdefs.h. 2018-03-20 21:18:40 +00:00
imp
7d19b2c4b8 Remove obsolete lint support. 2018-03-20 21:17:48 +00:00
markj
5ea8c6620a Make use of the KPI added in r331252.
MFC after:	2 weeks
2018-03-20 21:16:26 +00:00
emaste
6d7d087d6c Restore close quote lost in r331254 2018-03-20 21:04:47 +00:00
jhb
959557b416 Use <stdarg.h> instead of <machine/stdarg.h> in userland.
<machine/stdarg.h> is a kernel-only header.  The standard header for
userland is <stdarg.h>.  Using the standard header in userland avoids
weird build errors when building with external compilers that include
their own stdarg.h header.

Reviewed by:	arichardson, brooks, imp
Sponsored by:	DARPA / AFRL
Differential Revision:	https://reviews.freebsd.org/D14776
2018-03-20 21:00:45 +00:00
kib
0a1d8bb0a4 Move the CR0.WP manipulation KPI to x86.
This should allow to avoid some #ifdefs in the common x86/ code.

Requested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-03-20 20:20:49 +00:00
emaste
bd06dc7104 Make linuxulator fn declaration match definition
I accidentally swapped 'linux_fixup_elf' to 'linux_elf_fixup' in amd64's
declaration (only),  while bringing this change over from git and
encountering a conflict.
2018-03-20 19:28:52 +00:00
emaste
6fe54a5343 Rename assym.s to assym.inc
assym is only to be included by other .s files, and should never
actually be assembled by itself.

Reviewed by:	imp, bdrewery (earlier)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D14180
2018-03-20 17:58:51 +00:00
kib
ec36014ed1 Disable write protection around patching of XSAVE instruction in the
context switch code.

Some BIOSes give control to the OS with CR0.WP already set, making the
kernel text read-only before cpu_startup().

Reported by:	Peter Lei <peter.lei@ieee.org>
Reviewed by:	jtl
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14768
2018-03-20 17:47:29 +00:00
kib
60e489a73d Provide KPI for handling of rw/ro kernel text.
This is a pure syntax patch to create an interface to enable and later
restore write access to the kernel text and other read-only mapped
regions.  It is in line with e.g. vm_fault_disable_pagefaults() by
allowing the nesting.

Discussed with:	Peter Lei <peter.lei@ieee.org>
Reviewed by:	jtl
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14768
2018-03-20 17:43:50 +00:00
jhb
eef279cefd Set the proper vnet in IPsec callback functions.
When using hardware crypto engines, the callback functions used to handle
an IPsec packet after it has been encrypted or decrypted can be invoked
asynchronously from a worker thread that is not associated with a vnet.
Extend 'struct xform_data' to include a vnet pointer and save the current
vnet in this new member when queueing crypto requests in IPsec.  In the
IPsec callback routines, use the new member to set the current vnet while
processing the modified packet.

This fixes a panic when using hardware offload such as ccr(4) with IPsec
after VIMAGE was enabled in GENERIC.

Reported by:	Sony Arpita Das and Harsh Jain @ Chelsio
Reviewed by:	bz
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D14763
2018-03-20 17:05:23 +00:00
kib
b5cd5f8b75 Check for wrap-around in vm_phys_alloc_seg_contig().
It is possible to provide insane values for size in contigmalloc(9)
request, which usually not reaches the phys allocator due to failing
KVA allocation.  But with the forthcoming 4/4 i386, where 32bit
architecture has almost 4G KVA, contigmalloc(1G) is not unreasonable
outright and KVA might be available sometimes.

Then, the calculation of pa_end could wrap around, depending on the
physical address, and the checks in vm_phys_alloc_seg_contig() would
pass while the iteration in the loop after the 'done' label goes out
of the vm_page_array bounds.

Fix it by detecting the wrap.

Reported and tested by:	pho
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D14767
2018-03-20 16:17:55 +00:00
markj
b6855b6d9d Drop KTR_CONTENTION.
It is incomplete, has not been adopted in the other locking primitives,
and we have other means of measuring lock contention (lock_profiling,
lockstat, KTR_LOCK). Drop it to slightly de-clutter the mutex code and
free up a precious KTR class index.

Reviewed by:	jhb, mjg
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14771
2018-03-20 15:51:05 +00:00
andrew
a9a23ff679 Check if the gettime runtime service is valid.
The U-Boot efi runtime service expects us to set the address map before
calling any runtime services. It will then remap a few functions to their
runtime version. One of these is the gettime function. If we call into
this without having set a runtime map we get a page fault.

Add a check to see if this is valid in efi_init() so we don't try to use
the possibly invalid pointer.

Reviewed by:	imp, kevans (both previous version)
X-MFC-With:	r330868
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14759
2018-03-20 13:35:20 +00:00
imp
0cb80c6c90 Kill assert I shouldn't have committed 2018-03-20 13:14:10 +00:00
imp
8d536b9f2e Starting LBA is a 64bit number, so use htole64 instead of htole32. The
latter casts the LBA to a 32-bit number before assigning it to the 64
bit structure entity. This works fine on the first 2TB of TRIMs, but
terrible beyond that due to trucation.

Also, add an assert to make sure we don't end too many DSM TRIM
entries in one request.

Sponsored by: Netflix
2018-03-20 03:37:14 +00:00
imp
fa46a47c04 Make kern.cam.nda.num_trim tunable to limit the number of BIO_DELETE
requests that we'll collapse into one DSM_TRIM. By default it is a
256, which is the max that will fit into a 4k page.

Sponsored by: Netflix
2018-03-20 03:37:09 +00:00
imp
7fb7fa1954 Remove some redundant MPSAFE flags.
This was pointed out in a code review I'm having trouble finding right
now, but go ahead and eliminate these.

Sponsored by: Netfix
2018-03-20 03:37:04 +00:00
emaste
3df5d233f9 Rationalize license text on Linuxolator files
i386 linux.h missed in r330239.

Approved by:	sos
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-03-20 02:50:11 +00:00
jhibbits
68f6ae368c Cast through uintptr_t to narrow the buf domain pointer on 32-bit archs
arg2 is an intmax_t, which on 32-bit architectures is 64 bits, wider than a
pointer.  When &bdomain[i] is added to arg2 it widens from uintptr_t to
intmax_t, then gcc whines when it gets cast to a pointer.  Casting through
uintptr_t silences this warning.
2018-03-20 02:01:30 +00:00
jhibbits
431986ec3a Fix powerpc Book-E build post-331018/331048.
pagedaemon_wakeup() was moved from vm_pageout.h to vm_pagequeue.h.
2018-03-20 01:07:22 +00:00
gonzo
ee59b6a5e7 [ofw] fix errneous checks for OF_finddevice(9) return value
OF_finddevices returns ((phandle_t)-1) in case of failure. Some code
in existing drivers checked return value to be equal to 0 or
less/equal to 0 which is also wrong because phandle_t is unsigned
type. Most of these checks were for negative cases that were never
triggered so trhere was no impact on functionality.

Reviewed by:	nwhitehorn
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D14645
2018-03-20 00:03:49 +00:00
mav
b5f2c5035d Update mpr(4) driver from v15 to v18 from Broadcom site.
Version 16 is just a number bump, since we already had those changes.

Version 17 introduces new AdapterType value, that allows new user-space
tools from Broadcom to differentiate adapter generations 3 and 3.5.

Version 18 updates headers and adds SAS_DEVICE_DISCOVERY_ERROR reporting.

MFC after:	2 weeks
2018-03-19 23:21:45 +00:00
mjoras
7854cb198f Fix initialization of eventhandler mutex.
mtx_init does not do a copy of the name string it is passed. The
eventhandler code incorrectly passed the parameter string directly to
mtx_init instead of using the copy it makes. This was an existing
problem with the code that I dutifully copied over in my changes in r325621.

Reported by:	Anton Rang <rang AT acm.org>
Reviewed by:	rstone, markj
Approved by:	rstone (mentor)
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14764
2018-03-19 22:43:27 +00:00
emaste
58e1b5421f Rename linuxulator functions with linux_ prefix
It's preferable to have a consistent prefix.  This also reduces
differences between the three linux*_sysvec.c files.

Sponsored by:	Turing Robotic Industries Inc.
2018-03-19 21:26:32 +00:00
kp
af55608e2a pf: Fix memory leak in DIOCRADDTABLES
If a user attempts to add two tables with the same name the duplicate table
will not be added, but we forgot to free the duplicate table, leaking memory.
Ensure we free the duplicate table in the error path.

Reported by:	Coverity
CID:		1382111
MFC after:	3 weeks
2018-03-19 21:13:25 +00:00
erj
6cfbddd770 ixgbe(4): Update shared code, add support for X552 1G, fix bug
This patch will:

- Update ixgbe shared code
- Add support for Intel(R) Ethernet Connection X552 1000BASE-T
- Add error handling for link state check preventing VF from stopping traffic
  after changing PF's MTU value

Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: Intel Networking
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D13885
2018-03-19 20:55:05 +00:00
ken
fddccc44e2 cam_periph_acquire() now returns an errno.
The ch(4) driver was missed in change 328918, which changed
cam_periph_acquire() to return an errno instead of cam_status.

As a result, ch(4) failed to attach.

Sponsored by:	Spectra Logic
2018-03-19 20:19:00 +00:00
jhb
50dd7456bb Fix a typo.
Reviewed by:	kib
2018-03-19 17:14:56 +00:00
lstewart
22e6359728 Add support for the experimental Internet-Draft "TCP Alternative Backoff with
ECN (ABE)" proposal to the New Reno congestion control algorithm module.
ABE reduces the amount of congestion window reduction in response to
ECN-signalled congestion relative to the loss-inferred congestion response.

More details about ABE can be found in the Internet-Draft:
https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn

The implementation introduces four new sysctls:

- net.inet.tcp.cc.abe defaults to 0 (disabled) and can be set to non-zero to
  enable ABE for ECN-enabled TCP connections.

- net.inet.tcp.cc.newreno.beta and net.inet.tcp.cc.newreno.beta_ecn set the
  multiplicative window decrease factor, specified as a percentage, applied to
  the congestion window in response to a loss-based or ECN-based congestion
  signal respectively. They default to the values specified in the draft i.e.
  beta=50 and beta_ecn=80.

- net.inet.tcp.cc.abe_frlossreduce defaults to 0 (disabled) and can be set to
  non-zero to enable the use of standard beta (50% by default) when repairing
  loss during an ECN-signalled congestion recovery episode. It enables a more
  conservative congestion response and is provided for the purposes of
  experimentation as a result of some discussion at IETF 100 in Singapore.

The values of beta and beta_ecn can also be set per-connection by way of the
TCP_CCALGOOPT TCP-level socket option and the new CC_NEWRENO_BETA or
CC_NEWRENO_BETA_ECN CC algo sub-options.

Submitted by:	Tom Jones <tj@enoti.me>
Tested by:	Tom Jones <tj@enoti.me>, Grenville Armitage <garmitage@swin.edu.au>
Relnotes:	Yes
Differential Revision:	https://reviews.freebsd.org/D11616
2018-03-19 16:37:47 +00:00
manu
d46d9cadf4 sys/dts: Remove arm64 from subdir as it no longer exists.
r325987 removed the arm64 directory, remove it from SUBDIR too.
2018-03-19 15:35:26 +00:00