136108 Commits

Author SHA1 Message Date
jhibbits
b33106f398 powerpc/powernv: Make erasing before writes optional
If the OPAL flash driver supports writing without erase, it adds a
'no-erase' property to the flash device node.  Honor that property and don't
bother erasing if it exists.
2019-04-19 02:28:04 +00:00
jhb
b4bd29249c Push down INP_WLOCK slightly in tcp_ctloutput.
The inp lock is not needed for testing the V6 flag as that flag is set
once when the inp is created and never changes.  For non-TCP socket
options the lock is immediately dropped after checking that flag.
This just pushes the lock down to only be acquired for TCP socket
options.

This isn't a hot-path, more a cosmetic cleanup I noticed while reading
the code.

Reviewed by:	bz
MFC after:	1 month
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19740
2019-04-18 23:21:26 +00:00
imp
7b5943709e When parsing command line stuff, treat tabs and spaces the same.
When creating complex config files, people like to use tabs to offset
sections. Treat them the same as spaces for delimiters.
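
As a rough illustration of the idea (not the code from this commit), a word splitter that treats tabs and spaces identically could look like the sketch below. The names is_delim() and split_words() are hypothetical and chosen only for this example.

```c
#include <stddef.h>

/* Hypothetical helper: a tab counts as a delimiter exactly like a space. */
static int
is_delim(char c)
{
	return (c == ' ' || c == '\t');
}

/*
 * Split 'line' in place into at most 'max' words, storing a pointer to
 * each word in 'argv'.  Returns the number of words found.
 */
static size_t
split_words(char *line, char **argv, size_t max)
{
	size_t n = 0;
	char *p = line;

	while (*p != '\0' && n < max) {
		/* Skip any run of spaces and tabs before the next word. */
		while (is_delim(*p))
			p++;
		if (*p == '\0')
			break;
		argv[n++] = p;
		/* Advance to the end of the word and terminate it. */
		while (*p != '\0' && !is_delim(*p))
			p++;
		if (*p != '\0')
			*p++ = '\0';
	}
	return (n);
}
```

With this sketch, a line indented or separated with tabs yields the same words as one using spaces.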
2019-04-18 22:52:12 +00:00
cem
4c79b5f69c random(4): Restore availability tradeoff prior to r346250
As discussed in that commit message, it is a dangerous default.  But the
safe default causes enough pain on a variety of platforms that for now,
restore the prior default.

Some of this is self-induced pain we should/could do better about; for
example, programmatic CI systems and VM managers should introduce entropy
from the host for individual VM instances.  This is considered a future work
item.

On modern x86 and Power9 systems, this may be wholly unnecessary after
D19928 lands (even in the non-ideal case where early /boot/entropy is
unavailable), because they have fast hardware random sources available early
in boot.  But D19928 is not yet landed and we have a host of architectures
which do not provide fast random sources.

This change adds several tunables and diagnostic sysctls, documented
thoroughly in UPDATING and sys/dev/random/random_infra.c.

PR:		230875 (reopens)
Reported by:	adrian, jhb, imp, and probably others
Reviewed by:	delphij, imp (earlier version), markm (earlier version)
Discussed with:	adrian
Approved by:	secteam(delphij)
Relnotes:	yeah
Security:	related
Differential Revision:	https://reviews.freebsd.org/D19944
2019-04-18 20:48:54 +00:00
hselasky
d9340a46aa Implement flag for telling cuse(3) clients if the peer is running in 32-bit
compat mode or not. This is useful when implementing compatibility ioctl(2)
handlers in userspace.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-04-18 19:04:07 +00:00
kib
870e4adb34 Use correct type name.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2019-04-18 15:31:03 +00:00
kib
9da7c1a266 Correct handling of RMRR during early enumeration stages.
On some machines, DMAR contexts must be created before all devices
under the scope of the corresponding DMAR unit are enumerated.
Current code has two problems with that:
- scope lookup returns a NULL device_t, which causes creation of the
  context with RMRR to be skipped, which is fatal for the affected device.
- calculation of the final pci dbsf address fails if any bridge in the
  scope is not yet enumerated, because code relies on pcib_get_bus().

Make creation of contexts work either with device_t, or with DMAR PCI
scope paths.  Scope provides enough information to infer context
address, and it is directly matched against DMAR tables scopes.

When calculating bus addresses for the scope or device, use direct
pci_cfgregread(PCIR_SECBUS_1) to get the secondary bus number, instead
of pcib_get_bus().

The issue was observed on HP Gen servers, where iLO PCI devices are
located behind south bridge switch.  Turning on translation without
satisfying RMRR requests caused iLO to mostly hang, up to the level of
being unusable to control the server.

While there, remove hw.dmar.dmar_match_verbose tunable, and make the
normal logging under bootverbose useful and sufficient to diagnose
DRHD and RMRR parsing and matching.

Sponsored by:	Mellanox Technologies
MFC after:	1 week
2019-04-18 14:18:06 +00:00
kib
1995e4dec9 Remove witness warning. dmar_bus_dmamap_create() does not sleep.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2019-04-18 14:03:59 +00:00
kib
f9ebd5245d Reduce verbosity, do not announce details of irte programming by default.
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2019-04-18 14:02:33 +00:00
kp
d88fa777c9 pf: No need to M_NOWAIT in DIOCRSETTFLAGS
Now that we don't hold a lock during DIOCRSETTFLAGS memory allocation we can
use M_WAITOK.

MFC after:	1 week
Event:		Aberdeen hackathon 2019
Pointed out by:	glebius@
2019-04-18 11:37:44 +00:00
manu
bf879e13c9 arm: allwinner: Fix audio for Allwinner H3/H5
Due to three conditions the codec driver for Allwinner A10/A20 and H3/H5 did not work properly here:

    Wrong bit position for the analog audio reset
    Hardware Reset of codec was not de-asserted correctly
    Linux DTS file did not contain the address of the analog register in the form the driver was expecting.

This patch proposes fixes for those three parts.

Submitted by:	freebsdnewbie@freenet.de (Manuel Stühn)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19910
2019-04-17 21:45:19 +00:00
manu
c3ff664282 ofw_graph: Add functions for graph bindings
Those functions are helpers to work on graph bindings.
Graphs are mostly used with video-related devices.
See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/graph.txt?id=4436a3711e3249840e0679e92d3c951bcaf25515

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19877
2019-04-17 20:09:01 +00:00
kevans
377f5a8be5 Compile sha1.c when ether support is included
sha1 is used by ether_gen_addr after r346324. Perhaps in an ideal world we
could detect that the kernel's been compiled without sha1_* bits included
and silently fall back to arc4random instead because these platforms/kernel
configs are few and far between. It's fairly lightweight, though, so just
include it for now.
2019-04-17 18:08:28 +00:00
kevans
84233445c4 iflib: Use new ether_gen_addr, restricting addresses to that subset
Differential Revision:	https://reviews.freebsd.org/D19587
2019-04-17 17:19:54 +00:00
kevans
9902f46a44 net: adjust randomized address bits
Give devices that need a MAC a 16-bit allocation out of the FreeBSD
Foundation OUI range. Change the name ether_fakeaddr to ether_gen_addr now
that we're dealing with real MAC addresses with a real OUI rather than random
locally-administered addresses.

Reviewed by:	bz, rgrimes
Differential Revision:	https://reviews.freebsd.org/D19587
2019-04-17 17:18:43 +00:00
kp
9fe8ed111f pf: Fix panic on invalid DIOCRSETTFLAGS
If during DIOCRSETTFLAGS pfrio_buffer is NULL copyin() will fault, which we're
not allowed to do with a lock held.
We must count the number of entries in the table and release the lock during
copyin(). Only then can we re-acquire the lock. Note that this is safe, because
pfr_set_tflags() will check if the table and entries exist.

This was discovered by a local syzkaller instance.

MFC after:	1 week
Event:		Aberdeen hackathon 2019
2019-04-17 16:42:54 +00:00
ian
ff419c455c Only set up the interrupts that will actually be used in arm generic_timer.
The code previously set up interrupt handlers for all the interrupt
resources available, including for timers that are not in use.  That could
lead to interrupt storms.  For example, if boot firmware enabled the virtual
timer but the kernel is using the physical timer, it could get flooded with
interrupts on the virtual timer which it cannot shut off.  By only setting
up an interrupt handler for the hardware that will actually be used, any
interrupts from other timer units will remain masked in the interrupt
controller.

Differential Revision:	https://reviews.freebsd.org/D19871
2019-04-17 15:27:11 +00:00
kevans
b658113064 fdt: further consolidate DTB building and revise manpage
FDT_DTS_FILE was built separately with a rule in sys/conf/files and
recreated the rules we used in dtb.mk. Now that we have other infrastructure
to build a DTB along with the kernel, fold FDT_DTS_FILE into that since it
doesn't have any special requirements.

fdt(4) never got revised to mention the DTS/DTSO make options, so do that
now.

Reviewed by:	imp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19736
2019-04-17 03:29:16 +00:00
manu
788090ecde arm: allwinner: Make more devices optional
MFC after:	2 weeks
2019-04-16 22:42:50 +00:00
manu
e5bc73bddd arm: Order files.arm to have cloudabi and annapurna sections
MFC after:	2 weeks
2019-04-16 20:06:39 +00:00
manu
94008dd35f arm: Add kern_clocksource.c directly in files.arm
This file is needed and included in all our configs, so move it to a common
location.

MFC after:	2 weeks
2019-04-16 20:04:22 +00:00
kib
7fc477c2c0 Fix initial x87 state after r345562.
After the referenced commit, we did not set x87 and sse valid bits in
the xstate_bv bitmask for initial fpu state (stored in memory), when
using XSAVE.

The state is loaded into FPU register file to initialize the process
FPU state, and since both bits were clear, the default x87 and SSE
states were loaded.  By chance, FreeBSD ABI SSE2 state is the same as the FPU
initial state, so the bug is not visible for 64bit processes.  But on
i386, the precision control should be set to double (53bit mantissa),
instead of the default double extended (64bit mantissa). For 32bit
processes on amd64, kernel reloads control word with the right mask,
which leaves only native i386 processes, and native amd64 processes
that use x87, affected.

Fix it by setting minimal required xstate_bv mask.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-16 19:46:02 +00:00
manu
d6c1d3bf63 allwinner: clk: Garbage collect old clock implementation
The old clocks are disconnected from the build since r337344.
Remove all those pseudo drivers. The only one remaining is for gmac
(the ethernet controller) so move it to sys/arm/allwinner.
While here remove a83t support from gmacclk as it is unneeded since r326114.

MFC after:	1 month
2019-04-16 19:38:16 +00:00
cem
fbf32f5f72 stack_protector: Add tunable to bypass random cookies
This is a stopgap measure to unbreak installer/VM/embedded boot issues
introduced in (or at least exposed by) r346250.

Add the new tunable, "security.stack_protect.permit_nonrandom_cookies," in
order to continue boot with insecure non-random stack cookies if the random
device is unavailable.

For now, enable it by default.  This is NOT safe.  It will be disabled by
default in a future revision.

There is follow-on work planned to use fast random sources (e.g., RDRAND on
x86 and DARN on Power) to seed when the early entropy file cannot be
provided, for whatever reason.  Please see D19928.

Some better hacks may be used to make the non-random __stack_chk_guard
slightly less predictable (from delphij@ and mjg@); those suggestions are
left for a future revision.  I think it may also be plausible to move stack
guard initialization far later in the boot process; potentially it could be
moved all the way to just before userspace is started.

Reported by:	many
Reviewed by:	delphij, emaste, imp (all w/ caveat: this is a stopgap fix)
Security:	yes
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19927
2019-04-16 18:47:20 +00:00
cem
dec0f8b592 random(4): Add is_random_seeded(9) KPI
The imagined use is for early boot consumers of random to be able to make
decisions based on whether random is available yet or not.  One such
consumer seems to be __stack_chk_init(), which runs immediately after random
is initialized.  A follow-up patch will attempt to address that.

Reported by:	many
Reviewed by:	delphij (except man page)
Approved by:	secteam(delphij)
Differential Revision:	https://reviews.freebsd.org/D19926
2019-04-16 17:12:17 +00:00
gallatin
952f3ca34e Replace cosqos with numa_domain in mbuf pkthdr
The cosqos field was added nearly 6 years ago in r254804, and it is
still unused by any in-tree consumers.  I have a patchset that I'm
working on which aligns many network resources by NUMA domain,
including inps, inpcb lb group, tcp pacing, lagg output link
selection, backing pages for sendfile, and more.  It reduces
cross-domain traffic by roughly 50% for a real web workload.

This patchset relies on being able to store the numa domain in the
mbuf, and grabbing the unused cosqos field for this purpose is the
first step in starting to upstream it.

Reviewed by:	kib, markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19862
2019-04-16 16:49:34 +00:00
emaste
0ae08aa89a correct readlinkat(2) return type
r176215 corrected readlink(2)'s return type and the type of the last
argument.  readlinkat(2) was introduced in r177788 after being developed
as part of Google Summer of Code 2007; it appears to have inherited the
wrong return type.

Man pages and header files were already ssize_t; update syscalls.master
to match.

PR:		197915
Submitted by:	Henning Petersen <henning.petersen@t-online.de>
MFC after:	2 weeks
2019-04-16 13:26:31 +00:00
manu
b6f38bc718 aw_syscon: Add a new compatible
Since the 5.0 DTS, the syscon controller has a new compatible as it
exports new subnodes. We currently only use it as a syscon provider,
so just add the new compatible.

Tested On:  H3
MFC after:	1 month
2019-04-16 12:40:49 +00:00
manu
5c94073f5b aw_rtc: Register the clocks
Since the latest DTS update, the RTC is supposed to register two clocks:

- osc32k (the 32k oscillator on the board that the RTC uses directly and
that other peripherals can use)
- iosc (the internal oscillator of the RTC, when available, whose frequency
depends on the SoC revision)

Since we need the RTC before the proper clock control unit (because it uses
those clocks), attach it at BUS_PASS_BUS + MIDDLE and attach the clock control
unit at BUS_PASS_BUS + LAST for the SoCs that require it.

Tested On:	     A20, H3, A64

MFC after:	1 month
2019-04-16 12:39:31 +00:00
fsu
147d386c7c ext2fs: Initial version of DTrace support.
Commit forgotten file.

Reviewed by:    pfg, gnn
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19848
2019-04-16 11:37:15 +00:00
fsu
c3ee715fe1 ext2fs: Initial version of DTrace support.
Reviewed by:    pfg, gnn
MFC after:      1 week

Differential Revision:    https://reviews.freebsd.org/D19848
2019-04-16 11:20:10 +00:00
peterj
3a30b2e1eb Specify correct Ethernet phy for RPI-B
Correct a typo in the RPI-B ethernet config - the RPi-B includes a
SMC LAN9512 USB bridge and Ethernet 10/100 NIC/phy.  The phy part of
this is supported by smscphy.

Tested On: RPi1 Model B

Approved by:	grog, jhb (mentors)
MFC after:	3 days
2019-04-16 09:44:46 +00:00
peterj
c485d101c5 Fix cpufreq(4) on RPI-B
Since r324184 the root node compatible for the original Raspberry Pi
is "brcm,bcm2835", add it to the compatible list of bcm2835_cpufreq.

Tested On: RPi1 Model B

Note that the default Das U-Boot FDT does not include a cpus clause
so actually adding a bcm2835_cpufreq device requires adding a FDT
overlay defining the cpu.

Approved by:	grog, jhb (mentors)
MFC after:	3 days
2019-04-16 09:42:42 +00:00
mw
faf2f0725c Improve tpm20 style
No functional changes to the code are applied.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
2019-04-16 02:46:21 +00:00
mw
79a9626e09 tpm: Prevent session hijack
Check caller thread id before allowing to read the buffer
to make sure that it can only be accessed by the thread that
did the associated write to the TPM.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: delphij
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19713
2019-04-16 02:28:35 +00:00
cem
654aeb58dd random(4): Block read_random(9) on initial seeding
read_random() is/was used, mostly without error checking, in a lot of
very sensitive places in the kernel -- including seeding the widely used
arc4random(9).

Most uses, especially arc4random(9), should block until the device is seeded
rather than proceeding with a bogus or empty seed.  I did not spy any
obvious kernel consumers where blocking would be inappropriate (in the
sense that lack of entropy would be ok -- I did not investigate locking
angle thoroughly).  In many instances, arc4random_buf(9) or that family
of APIs would be more appropriate anyway; that work was done in r345865.

A minor cleanup was made to the implementation of the READ_RANDOM function:
instead of using a variable-length array on the stack to temporarily store
all full random blocks sufficient to satisfy the requested 'len', only store
a single block on the stack.  This has some benefit in terms of reducing
stack usage, reducing memcpy overhead and reducing devrandom output leakage
via the stack.  Additionally, the stack block is now safely zeroed if it was
used.

One caveat of this change is that the kern.arandom sysctl no longer returns
zero bytes immediately if the random device is not seeded.  This means that
FreeBSD-specific userspace applications which attempted to handle an
unseeded random device may be broken by this change.  If such behavior is
needed, it can be replaced by the more portable getrandom(2) GRND_NONBLOCK
option.

On any typical FreeBSD system, entropy is persisted on read/write media and
used to seed the random device very early in boot, and blocking is never a
problem.

This change primarily impacts the behavior of /dev/random on embedded
systems with read-only media that do not configure "nodevice random".  We
toggle the default from 'charge on blindly with no entropy' to 'block
indefinitely.'  This default is safer, but may cause frustration.  Embedded
system designers using FreeBSD have several options.  The most obvious is to
plan to have a small writable NVRAM or NAND to persist entropy, like larger
systems.  Early entropy can be fed from any loader, or by writing directly
to /dev/random during boot.  Some embedded SoCs now provide a fast hardware
entropy source; this would also work for quickly seeding Fortuna.  A 3rd
option would be creating an embedded-specific, more simplistic random
module, like that designed by DJB in [1] (this design still requires a small
rewritable media for forward secrecy).  Finally, the least preferred option
might be "nodevice random", although I plan to remove this in a subsequent
revision.

To help developers emulate the behavior of these embedded systems on
ordinary workstations, the tunable kern.random.block_seeded_status was
added.  When set to 1, it blocks the random device.

I attempted to document this change in random.4 and random.9 and ran into a
bunch of out-of-date or irrelevant or inaccurate content and ended up
rototilling those documents more than I intended to.  Sorry.  I think
they're in a better state now.

PR:		230875
Reviewed by:	delphij, markm (earlier version)
Approved by:	secteam(delphij), devrandom(markm)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D19744
2019-04-15 18:40:36 +00:00
hselasky
cc2d24c16f Remove superfluous USB keyword.
Discussed with:		danfe@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-04-15 17:32:38 +00:00
gallatin
5e9e5fdf6f mlx5en: Enable new pfil(9) KPI ethernet filtering hooks
This allows efficient filtering at packet ingress on mlx5en.

Note that the packets are filtered (and potentially dropped) *before*
the driver has committed to (re)allocating an mbuf for the
packet. Dropped packets are treated essentially the same as an
error. Nothing is allocated, and the existing buffer is recycled. This
allows us to drop malicious packets at close to line rate with very
little CPU use.

Reviewed by:	hselasky, slavash, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19063
2019-04-15 17:14:50 +00:00
hselasky
ef58b1f602 Fix spelling.
Submitted by:		Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-04-15 14:32:19 +00:00
emaste
b17bf6cec4 Add quirk for ignoring SPCR AccessWidth values on the PL011 UART
The SPCR table on the Lenovo HR330A Ampere eMAG server indicates 8-bit
access, but 32-bit access is required for the PL011 to work.

PL011 on SBSA platforms always supports 32-bit access (and that was
hardcoded here before my EC2 fix), let's use 32-bit access for PL011
and 32BIT interface types.

Tested by emaste on Ampere eMAG and Cavium/Marvell ThunderX2.

Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	andrew, imp (earlier)
Differential Revision:	https://reviews.freebsd.org/D19507
2019-04-15 13:41:53 +00:00
rmacklem
eddbdfff07 Fix the NFSv4 client to safely find processes.
r340744 broke the NFSv4 client, because it replaced pfind_locked() with a
call to pfind(), and pfind() acquires the sx lock for the pid hash while
the NFSv4 client already holds a mutex when it makes the call.
The patch fixes the problem by recreating a pfind_any_locked() and adding the
functions pidhash_slockall() and pidhash_sunlockall() to acquire/release
all of the pid hash locks.
These functions are then used by the NFSv4 client instead of acquiring
the allproc_lock and calling pfind().

Reviewed by:	kib, mjg
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19887
2019-04-15 01:27:15 +00:00
tuexen
18c75290c7 When sending a routing message, don't allow the user to set the
RTF_RNH_LOCKED flag in rtm_flags, since this flag is used only
internally.

Reported by:		syzbot+65c676f5248a13753ea0@syzkaller.appspotmail.com
Reviewed by:		ae@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D19898
2019-04-14 10:18:14 +00:00
rmacklem
c5cfdafb1f Add support for INET6 addresses to the kernel code that dumps open/lock state.
PR#223036 reported that INET6 callback addresses were not printed by
nfsdumpstate(8). This kernel patch adds INET6 addresses to the dump structure,
so that nfsdumpstate(8) can print them out, post-r346190.
The patch also includes the addition of #ifdef INET, INET6 as requested
by bz@.

PR:		223036
Reviewed by:	bz, rgrimes
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19839
2019-04-13 22:00:09 +00:00
tuexen
485db168ce When sending IPv4 packets on a SOCK_RAW socket using the IP_HDRINCL option,
ensure that the ip_hl field is valid. Furthermore, ensure that the complete
IPv4 header is contained in the first mbuf. Finally, move the length checks
before relying on them when accessing fields of the IPv4 header.
Reported by:		jtl@
Reviewed by:		jtl@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D19181
2019-04-13 10:47:47 +00:00
imp
3dcee6772d Move mpr/mps drivers from per-arch NOTES files into the MI notes
file. They are in more arches than they aren't. Add appropriate
nodevice directives in powerpc and arm.
2019-04-13 06:30:45 +00:00
imp
96ac98747a Fix sbttons for values > 2s
Add test against negative times. Add code to cope with larger values
properly.
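
For context, a minimal sketch of why large values need care, assuming a 64-bit fixed-point time with 32 fractional bits: multiplying the whole value by 10^9 before shifting overflows 64 bits once the time is a little over 2 seconds. Converting the whole seconds and the fraction separately avoids that. This is an illustration only, not the committed code; sbt_t and sketch_sbttons() are hypothetical names, and the sketch assumes non-negative input rather than reproducing the negative-time check.

```c
#include <stdint.h>

typedef int64_t sbt_t;		/* stand-in for a 32.32 fixed-point time */

/* Convert a non-negative 32.32 fixed-point time to nanoseconds. */
static int64_t
sketch_sbttons(sbt_t sbt)
{
	uint64_t whole = (uint64_t)sbt >> 32;		/* integer seconds */
	uint64_t frac = (uint64_t)sbt & 0xffffffffu;	/* fractional part */

	/* frac < 2^32, so frac * 10^9 < 2^62 and cannot overflow. */
	return ((int64_t)(whole * 1000000000u +
	    ((frac * 1000000000u) >> 32)));
}
```

The result still overflows for times beyond what fits in 63 bits of nanoseconds (roughly 292 years), which the sketch does not guard against.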

Discussed with: bde@ (quite some time ago, for an earlier version)
2019-04-13 04:46:35 +00:00
jhibbits
7061ad58c2 Add NUMA support to powerpc
Summary:
Initial NUMA support:
    - associate CPU with domain
    - associate memory ranges with domain
    - identify domain for devices
    - limit device interrupt binding to appropriate domain

- Additionally fixes a bug in the setting of Maxmem which led to
  only memory attached to the first socket being enabled for DMA

A pmap variant can opt in to numa support by calling `numa_mem_regions`
at the end of pmap_bootstrap - registering the corresponding ranges with the
VM.

This yields a ~20% improvement in build times of llvm on dual socket POWER9
over non-NUMA.

Original patch by mmacy.

Differential Revision: https://reviews.freebsd.org/D17933
2019-04-13 04:03:18 +00:00
jhibbits
81013cfae7 powerpc/dtrace: Fix dtrace powerpc asm, and simplify stack walking
Fix some execution bugs in the dtrace powerpc asm.  addme pulls in the carry
flag which we don't want, and the result wasn't recorded anyways, so the
following beq to check for exit condition wasn't checking the right
condition.

Simplify the stack walking in dtrace_isa.c, so there's only a single walker
that handles both pc and sp.  This should make it easier to follow, and any
bugfix that may be needed for walking only needs to be made in one place
instead of two now.

MFC after:	2 weeks
2019-04-13 03:32:21 +00:00
jhibbits
ef80efbacd powerpc: Add file forgotten in r346144
Forgot to add the changes for DELAY(), which lowers priority during the
delay period.  Also, mark the timebase read as volatile so newer GCC does
not optimize it away, as it reportedly does currently.

MFC after:	2 weeks
MFC with:	r346144
2019-04-13 02:29:30 +00:00
mav
44190ed378 Fix SCSI sense data pass through.
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-04-12 18:54:09 +00:00
kib
333f08f7aa Ignore doomed vnodes in tmpfs_update_mtime().
Otherwise we might dereference NULL vp->v_data after
VP_TO_TMPFS_NODE().

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-12 17:11:50 +00:00
trasz
15e5c29cae Remove unneeded conditionals for sv_ functions - all the ABIs
(apart from null_sysvec) define them, so the 'else' branch is
never taken.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19889
2019-04-12 14:18:16 +00:00
tychon
e660248c13 for a cache-only zone the destructor tries to destroy a non-existent keg
Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19835
2019-04-12 12:46:25 +00:00
jhibbits
831bf1a1aa powerpc: Adjust priority NOPs, and make them functions
PowerISA 2.07 and PowerISA 3.0 both specify special NOPs for priority
adjustments, with "medium" priority being normal.  We had been setting
medium-low as our normal priority.  Rather than guess each time as to what
we want and the right NOP, wrap them in inline functions, and replace the
occurrences of the NOPs with the functions.  Also, make DELAY() drop to very
low priority while waiting, so we don't burn CPU.

Coupled with r346143, this shaves off a modest 5-8% on buildworld times with
-j72.  There may be more room for improvement with judicious use of these
NOPs.

MFC after:	2 weeks
2019-04-12 00:53:30 +00:00
jhibbits
c5657f49cc powerpc64: Increase the nap level on power9 idling
The POWER9 documentation specifies that levels 0-3 are the 'lightest' sleep
level, meaning lowest latency and with no state loss.  However, state 3 is
not implemented, and is instead reserved for future chips.  This now
properly configures the PSSCR, specifying state 2 as the lowest level to
enter, but requesting level 0 for quickest sleep level.  If the OCC determines
that the CPU can enter states 1 or 2 it will trigger the transition to those
states on demand.

MFC after:	1 week
2019-04-12 00:44:33 +00:00
tuexen
7186df98c8 Fix an SCTP related locking issue. Don't report that the TCB_SEND_LOCK
is owned, when it is not.

This issue was found by running syzkaller.
MFC after:		1 week
2019-04-11 20:39:12 +00:00
trasz
9e141477c1 Use shared vnode locks for the ELF interpreter.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19874
2019-04-11 11:21:45 +00:00
markj
9abf4945e6 Reinitialize multicast source filter structures after invalidation.
When leaving a multicast group, a hole may be created in the inpcb's
source filter and group membership arrays.  To remove the hole, the
succeeding array elements are copied over by one entry.  The multicast
code expects that a newly allocated array element is initialized, but
the code which shifts a tail of the array was leaving stale data
in the final entry.  Fix this by explicitly reinitializing the last
entry following such a copy.
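
The pattern described above can be sketched as follows. This is a simplified userland illustration, not the kernel code: struct filter_slot and remove_slot() are hypothetical stand-ins, and zeroing is assumed to be an adequate "initialized" state for the example.

```c
#include <stddef.h>
#include <string.h>

struct filter_slot {		/* hypothetical stand-in for one array element */
	int	mode;
	int	nsources;
};

/*
 * Remove entry 'idx' from an array holding 'n' used slots by shifting the
 * tail down one position, then reset the slot vacated at the end.
 * Returns the new number of used slots.
 */
static size_t
remove_slot(struct filter_slot *slots, size_t n, size_t idx)
{
	if (idx >= n)
		return (n);
	memmove(&slots[idx], &slots[idx + 1],
	    (n - idx - 1) * sizeof(slots[0]));
	/* Without this, the last slot keeps a stale copy of its old data. */
	memset(&slots[n - 1], 0, sizeof(slots[0]));
	return (n - 1);
}
```

A later allocation that reuses the final slot then starts from a known state instead of leftover contents.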

Reported by:	syzbot+f8c3c564ee21d650475e@syzkaller.appspotmail.com
Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19872
2019-04-11 08:00:59 +00:00
oshogbo
06483b0326 The nvlist_report_missing is also used by the cnvlist.
It can't be a static one.

Reported by:	jenkins
MFC after:	2 weeks
2019-04-11 04:24:41 +00:00
cy
145eb83ae0 Catch up to r343631: Avoid "pfil: duplicate hook" due to
ipf_check_wrapper and ipf_check_wrapper6 being registered
under the same pa_rulename.

MFC after:	3 days
2019-04-11 04:22:06 +00:00
oshogbo
c45b7353f3 libnv: fix compilation warnings
When building libnv without debug, those arguments are no longer used
because assertions will be changed to NOP.

Submitted by: Mindaugas Rasiukevicius <rmind@netbsd.org>
MFC after:    2 weeks
2019-04-11 04:21:58 +00:00
oshogbo
13c654428c libnv: fix compilation warnings
When building libnv without debug, those arguments are no longer used
because assertions will be changed to NOP.

Submitted by:	Mindaugas Rasiukevicius <rmind@netbsd.org>
MFC after:	2 weeks
2019-04-11 03:47:53 +00:00
kibab
af9a3c8f19 Add some CMD53-related definitions
In preparation for adding block mode functions, add the necessary definitions.

Reviewed by:	bz
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D19832
2019-04-10 20:44:54 +00:00
manu
818002a905 arm: dtb: Compile the Linux DTS for pandaboards
Reported by:	ci.freebsd.org
2019-04-10 20:11:28 +00:00
kibab
49c5ad42bb Implement CMD53 block mode support for SDHCI and AllWinner-based boards
If a custom block size is requested, use it; otherwise revert to the previous logic
of using just a data size if it's less than MMC_BLOCK_SIZE, and MMC_BLOCK_SIZE otherwise.
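
The selection rule above might be sketched like this. It is an illustration only: pick_block_size() is a hypothetical name, a requested size of 0 is assumed to mean "no custom block size", and the value 512 for the block-size constant is an assumption made for the example.

```c
#include <stddef.h>

#define SKETCH_MMC_BLOCK_SIZE	512	/* assumed value, for illustration only */

/*
 * Choose the block size for a transfer of 'data_len' bytes.
 * 'requested' == 0 is taken to mean that no custom size was asked for.
 */
static size_t
pick_block_size(size_t requested, size_t data_len)
{
	if (requested != 0)
		return (requested);
	/* Previous logic: small transfers use their own length. */
	return (data_len < SKETCH_MMC_BLOCK_SIZE ?
	    data_len : SKETCH_MMC_BLOCK_SIZE);
}
```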

Reviewed by:	bz
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D19783
2019-04-10 19:53:36 +00:00
kibab
674b0be51d Add new fields to mmc_data in preparation to SDIO CMD53 block mode support
SDIO command CMD53 (IO_RW_EXTENDED) allows data transfers using blocks of 1-2048 bytes,
with a maximum of 511 blocks per request.
Extend mmc_data structure to properly describe such requests,
and initialize the new fields in kernel and userland consumers.

No actual driver changes happen yet; these will follow in separate changes.

Reviewed by:	bz
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D19779
2019-04-10 19:49:35 +00:00
manu
ac7512a382 arm: kernel: Remove old kernel configs
Follow up to r346095
All those kernels are either not working or the releases have switched
to GENERIC
2019-04-10 19:27:14 +00:00
manu
ea57e476f1 arm: dts: Remove some old DTS
RPI is using the firmware provided DTS since 12.0
Pandaboard works with the Linux DTS
RK* Exynos* and Meson*/Odroid* don't even work with current
source code, if someone wants to make them work again they
better use the Linux DTS.
2019-04-10 19:18:05 +00:00
rrs
5883516e75 Fix a small bug in the tcp_log_id where the bucket
was unlocked and yet the bucket-unlock flag was not
changed to false. This can cause a panic if INVARIANTS
is on and we go through the right path (though rare).
This fixes the correct bug :)

Reported by:	syzbot+179a1ad49f3c4c215fa2@syzkaller.appspotmail.com
Reviewed by:	tuexen@
2019-04-10 18:58:11 +00:00
manu
332b830883 Import DTS files from Linux 5.0
MFC after:	2 months
2019-04-10 18:15:36 +00:00
lwhsu
2bc49e10c6 Fix build in sys/modules/nfscommon
Sponsored by:	The FreeBSD Foundation
2019-04-10 16:48:45 +00:00
asomers
d0c51c9a66 fix cache_lookup's documentation
cache_lookup's documentation got dislocated by r324378. Relocate and expand
it.

Reviewed by:	jhb, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-04-10 13:02:33 +00:00
trasz
8fedfd26a7 Improve vnode lock assertions.
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-04-10 10:21:14 +00:00
avos
80646d8a6d urtw(4), otus(4), iwi(4): allow to set non-default MAC address via ifconfig(8)
Tested with Netgear WG111 v3 (RTL8187B, urtw(4)), STA mode.

MFC after:	1 week
2019-04-10 08:17:56 +00:00
glebius
3078edc62c Obvious comment correction.
2019-04-09 22:15:39 +00:00
jhb
2b1f6ab13c Refine r330113 to honor the ProducerConsumer flag most of the time.
While it is true that the ACPI spec says that the flag is only valid
on Extended Address Space Descriptors, examples of other descriptors
in the spec use the ProducerConsumer flag explicitly, and real
hardware uses it as well.  In fact, even in the ASL of the Thunder X2
for which r330113 was a workaround, some devices use this flag on
non-Extended Address Space Descriptors correctly.  Instead, only
ignore the flag for resources associated with the UART devices on the
Thunder X2 using the "ARMH0011" HID to identify these devices.

This should fix regressions from ignoring this flag in other contexts
such as Hyper-V.

PR:		235876
Reported by:	Wei Hu <weh@microsoft.com>
Tested by:	emaste (Thunder X2)
MFC after:	2 weeks
2019-04-09 21:18:02 +00:00
kib
5c087ad1bb Add vn_fsync_buf().
Provide a convenience function to avoid the hack with filling fake
struct vop_fsync_args and then calling vop_stdfsync().

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-09 20:20:04 +00:00
kib
dea9380e34 Fix dirty buf exhaustion easily triggered with msdosfs.
If truncate(2) is performed on an msdosfs file and extends the file by a
system-dependent large amount, the fs creates a corresponding amount of dirty
delayed-write buffers, which can consume all buffers.  Such buffers
cannot be flushed by the bufdaemon because the ftruncate() thread owns
the vnode lock.  So the system runs out of free buffers, and even
truncate() thread starves, which means deadlock because it owns the
vnode lock.

Fix this by doing a vnode fsync in extendfile() when a low memory or low
buffers condition is detected, which flushes all dirty buffers belonging
to the file being extended.

Note that the more usual fallback to bawrite() does not work
acceptably in this situation, because it would only allow one buffer
to be recycled.  Other filesystems, most importantly UFS, do not allow
userspace to create an arbitrary amount of dirty delayed-write buffers
without feedback, so bawrite() is good enough for them.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-09 19:55:02 +00:00
jhb
8350e99074 Don't pre-reserve resources for CPU devices when they are set.
CPUs can use shared (RF_SHAREABLE) resources for the I/O port used for
entering and exiting C states.  If this I/O port is included in an ACPI
system resource device, then this happens to still work, but if the port
wasn't part of a system resource device, only the first CPU could allocate
the I/O port and use C states since resource_list_reserve() was always
allocating the resource from nexus0 without RF_SHAREABLE.  By avoiding
the reservation, the flags from the bus_alloc_resource() in the CPU driver
(which include RF_SHAREABLE) are honored.

PR:		236513
Reported by:	stockhausen@collogia.de
Sleuthing by:	avg
Reviewed by:	avg
MFC after:	2 weeks
2019-04-09 19:22:08 +00:00
kib
fcf1189407 pci_cfgreg.c: Use io port config access for early boot time.
Some early PCIe chipsets are explicitly listed in the white-list to
enable use of the MMIO config space accesses, perhaps because ACPI
tables were not a reliable source of the base MCFG address at that time.
For those chipsets, MCFG base was read from the known chipset MCFGbase
config register.

During very early stage of boot, when access to the PCI config space
is performed (see e.g. pci_early_quirks.c), we cannot map 255MB of
registers because the method used with pre-boot pmap overflows initial
kernel page tables.

Move fallback to read MCFGbase to the attachment method of the
x86/legacy device, which removes code duplication, and results in the
use of io accesses until MCFG is parsed or legacy attach called.

For amd64, pre-initialize cfgmech with CFGMECH_1, right now we
dynamically assign CFGMECH_1 to it anyway, and remove checks for
CFGMECH_NONE.

There is a mention in the Intel documentation for corresponding
chipsets that OS must use either io port or MMIO access method, but we
already break this rule by reading MCFGbase register, so one more
access seems to be innocent.

Reported by:	longwitz@incore.de
PR:	236838
Reviewed by:	avg (other version), jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19833
2019-04-09 18:07:17 +00:00
trasz
ca6bb3d6ec Factor out section loading into a separate function.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19846
2019-04-09 15:24:38 +00:00
ganbold
8aba5de150 In some cases like NanoPI R1, its second USB ethernet
RTL8152 (chip version URE_CHIP_VER_4C10) doesn't
have hardwired MAC address, in other words, it is all zeros.
This commit fixes it by setting random MAC address
when MAC address is all zeros.
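
A minimal userland sketch of the fallback described above. The commit message does not say how the random address is generated, so `rand()` is a stand-in here, and clearing the multicast bit while setting the locally administered bit is an assumption of this sketch. `mac_is_zero()` and `mac_fixup()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define ETHER_ADDR_LEN	6

/* Return true if every octet of the address is zero. */
static bool
mac_is_zero(const uint8_t mac[ETHER_ADDR_LEN])
{
	for (int i = 0; i < ETHER_ADDR_LEN; i++)
		if (mac[i] != 0)
			return (false);
	return (true);
}

/* Replace an all-zero address with a random unicast one; leave others alone. */
static void
mac_fixup(uint8_t mac[ETHER_ADDR_LEN])
{
	if (!mac_is_zero(mac))
		return;
	for (int i = 0; i < ETHER_ADDR_LEN; i++)
		mac[i] = (uint8_t)(rand() & 0xff);
	/* Assumption: clear the multicast bit, set the locally administered bit. */
	mac[0] = (uint8_t)((mac[0] & 0xfe) | 0x02);
}
```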

Reviewed by:	kevlo
Differential Revision:	https://reviews.freebsd.org/D19856
2019-04-09 13:54:08 +00:00
imp
5d511432d6 Style only change: Prefer $() to ``
$() is more modern and also nests. Convert the mix of styles to using
only the former (although the latter was more common). It's the more
dominant style in other shell scripts these days as well.

Differential Revision:  https://reviews.freebsd.org/D19840
2019-04-08 18:25:14 +00:00
kib
aeecbe07e1 Handle races when remounting UFS volume from ro to rw.
In particular, ensure that writers are not unleashed before SU
structures are initialized.  Also, correctly handle MNT_ASYNC before
this.

Reported and tested by:	pho
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-08 15:20:05 +00:00
trasz
de8cb8b511 Refactor ELF interpreter loading into a separate function.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19741
2019-04-08 14:31:07 +00:00
oshogbo
e5fad78020 In the unlinkat syscall, the operation is performed on the directory
descriptor, not the file descriptor. The file descriptor is used only for
verification so do not expect any additional capabilities on it.

Reported by:	antoine
Tested by:	antoine
Discussed with:	kib, emaste, bapt
Sponsored by:	Fudo Security
2019-04-08 14:23:52 +00:00
ganbold
7285d802f5 Fix URE_WDT6_SET_MODE value in the register definition.
Both the Linux and u-boot sources for the RTL8152 driver have this value.
RTL8152 USB ethernet is used in the NanoPI R1 board as the second ethernet.
For me, this fixes the RTL8152 USB ethernet not being detected after a
reboot on the NanoPI R1 board.

Both NetBSD and OpenBSD have a wrong value so far.
2019-04-08 13:40:46 +00:00
imp
f7689276a8 Make RELDATE be on a single line.
All variable assignments that start in column 1 have to be on a single
line for amd to build due to a weird dependency there (most likely it
can be fixed to use the new VARS_ONLY feature, but it isn't
today). usr.sbin/amd/include/Makefile calls
usr.sbin/amd/include/newvers.sh which does:
	eval `LC_ALL=C egrep '^[A-Z]+=' $1 | grep -v COPYRIGHT`
which is where that requirement comes from. It handles COPYRIGHT since
that's an exception. Rather than add additional exceptions, cope with
the long line in newvers.sh instead. Note: it no longer needs to
filter COPYRIGHT because the assignment doesn't start in column 1
anymore.

I had done a universe when I had an earlier version of r346018 that
had it as one line. When I changed it to multi-line as suggested in
the review, I only built kernels on a couple of architectures to make
sure it didn't break anything.

Add comment to newvers.sh noting this.

Obviously, this unbreaks the amd build.
2019-04-07 21:01:02 +00:00
mhorne
6de23af4cf RISC-V: initialize pcpu slightly earlier
In certain scenarios, it is possible for PCPU data to be
accessed before it has been initialized (e.g. during printf
if the kernel was built with the TSLOG option).

Initialize the PCPU pointer for hart 0 at the beginning of
initriscv() rather than near the end.

Reviewed by:		markj
Approved by:		markj (mentor)
Differential Revision:	https://reviews.freebsd.org/D19726
2019-04-07 20:12:24 +00:00
imp
7d87d74539 Use default shell assignment rather than a more complicated if then
construct.

Discussed with: emaste@, allanjude@ (changes (or not) based on their feedback)
Differential Revision: https://reviews.freebsd.org/D19797
2019-04-07 18:39:55 +00:00
ian
fa027ee465 Add g_label_flashmap.c to the module, should have been part of r345480.
Reported by:	Jia-Shiun Li <jiashiun@gmail.com>
2019-04-07 16:33:22 +00:00
oshogbo
99638c7483 Bump FreeBSD version after r345982.
Reported by:	Shawn Webb <shawn.webb@hardenedbsd.org>
Discussed with: imp, cy, rgrimes
2019-04-07 16:07:41 +00:00
markj
6e75ac3373 Set the p_oppid field of orphans when exiting.
Such processes will be reparented to the reaper when the current
parent is done with them (i.e., ptrace detached), so p_oppid must be
updated accordingly.

Add a regression test to exercise this code path.  Previously it
would not be possible to reap an orphan with a stale oppid.

Reviewed by:	kib, mjg
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19825
2019-04-07 14:26:14 +00:00
kib
587458f1cd Give new home to the comment from ppt_pci_reset(), explaining a nuance
of power reset.

Noted by:	soralx@cydem.org
Sponsored by:	Mellanox Technologies
MFC after:	12 days
2019-04-07 08:58:09 +00:00
cem
16397b2576 kern/subr_pctrie: Fix mismatched signedness in assertion comparison
'tos' is an index into an array and never holds a negative value.  Correct
its signedness to match PCTRIE_LIMIT, which it is compared to in assertions.

No functional change (kills a warning).
2019-04-06 21:56:24 +00:00
rmacklem
4bb25ea3ad Add INET6 support for the upcalls to the nfsuserd daemon.
The kernel code uses UDP to do upcalls to the nfsuserd(8) daemon to get
updates to the username<->uid and groupname<->gid mappings.
A change to AF_LOCAL last year had to be reverted, since it could result
in vnode locking issues on the AF_LOCAL socket.
This patch adds INET6 support and the required #ifdef INET and INET6
to the code.

Requested by:	bz
PR:		205193
Reviewed by:	bz, rgrimes
MFC after:	2 weeks
Differential Revision:	http://reviews.freebsd.org/D19218
2019-04-06 21:53:46 +00:00
cem
ecbc847507 kern/subr_pctrie: Convert old-style boolean_t to plain "bool"
No functional change.
2019-04-06 20:38:44 +00:00
asomers
5a6d620e2b fusefs: fix a panic on mount
Don't page fault if the file descriptor provided with "-o fd" is invalid.
This is a merge of r345419 from the projects/fuse2 branch.

Reviewed by:	ngie
Tested by:	Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
MFC after:	2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19836
2019-04-06 18:04:04 +00:00
oshogbo
def45a363e Regen after r345982.
2019-04-06 09:37:10 +00:00
oshogbo
20d273b44b Introduce funlinkat syscall that allows us to check if we are removing
the file associated with the given file descriptor.

Reviewed by:	kib, asomers
Reviewed by:	cem, jilles, brooks (they reviewed previous version)
Discussed with:	pjd, and many others
Differential Revision:	https://reviews.freebsd.org/D14567
2019-04-06 09:34:26 +00:00
jkim
f5b6bd46f2 MFV: r345969
Import ACPICA 20190405.
2019-04-06 06:02:42 +00:00
jhibbits
90186d78c6 powerpc/powernv: Fix major bugs in opal_flash
* The BIO bio_data may not be page aligned.  Only the base address of each
  page worth of data is extracted to pass to OPAL.  Without page alignment
  it can scribble over random memory when finishing the page read.  Fix this
  by short-reading the first page to properly align for full page reads.
* Fix the definition of OPAL_FLASH_ERASE.
* Properly handle the async message result, as now returned from r345974.
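
The short first read from the first bullet can be sketched as follows. The 4096-byte page size and the names `first_chunk_len()` and `chunk_count()` are assumptions for illustration; the real driver works on BIO data and OPAL calls that are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumed page size for this sketch */

/*
 * Length of the next chunk starting at addr: up to the end of the current
 * page, capped by the bytes remaining.  An unaligned start therefore yields
 * a short first chunk, after which every chunk begins on a page boundary.
 */
static size_t
first_chunk_len(uintptr_t addr, size_t resid)
{
	size_t off = (size_t)(addr & (SKETCH_PAGE_SIZE - 1));
	size_t len = SKETCH_PAGE_SIZE - off;

	return (len < resid ? len : resid);
}

/* Count the chunks needed to cover resid bytes starting at addr. */
static size_t
chunk_count(uintptr_t addr, size_t resid)
{
	size_t n = 0;

	while (resid > 0) {
		size_t len = first_chunk_len(addr, resid);

		addr += len;
		resid -= len;
		n++;
	}
	return (n);
}
```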
2019-04-06 02:39:56 +00:00
jhibbits
3ef32d3557 powerpc/powernv: Fix issues in opal_async
* Properly return the full opal_msg from an async completion.
* Don't keep bugging OPAL, wait 100us or so.  With some minor changes to
  DELAY() to drop to very low priority, the thread won't hog the CPU while
  polling for the async completion.
2019-04-06 02:31:01 +00:00
kib
810f55efd4 Add DEV_RESET /dev/devctl2 ioctl.
It performs BUS_RESET_CHILD() on the parental bus and the specified
device.

Reviewed by:	imp (previous version), jhb (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19646
2019-04-05 19:31:26 +00:00
kib
6eb7556345 Remove single-use DEV_RESET() macro.
It conflicts with the sys/bus.h DEV_XXX namespace.

Reviewed by:	imp (previous version), jhb (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19646
2019-04-05 19:27:51 +00:00
kib
1f83f13cf3 Implement resets for PCI buses and PCIe bridges.
For a PCI device (i.e. a child of a PCI bus), reset tries FLR if it is
implemented, and falls back to power reset if FLR is unavailable or fails.

For PCIe bus (child of a PCIe bridge or root port), reset
disables PCIe link and then re-trains it, performing what is known as
link-level reset.

Reviewed by:	imp (previous version), jhb (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19646
2019-04-05 19:25:26 +00:00
kib
3298fbd5b2 Provide newbus infrastructure for initiating device reset.
The methods BUS_RESET_PREPARE(), BUS_RESET(), and BUS_RESET_POST()
should be implemented by bus which can provide reset to a device.  The
methods are described in inline doxygen comments.

Code only provides BUS_RESET_PREPARE() and BUS_RESET_POST() helpers
instead of default implementation, because actual bus needs to handle
device state around reset, while helpers provide the other half of
typical prepare/post code.

Reviewed by:	imp (previous version), jhb (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19646
2019-04-05 18:09:22 +00:00
kib
2a6eaf86a2 vn_vmap_seekhole(): align running offset to the block boundary.
Otherwise we might miss the last iteration where EOF appears below
unaligned noff.

Reported and reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19811
2019-04-05 16:14:16 +00:00
kib
3a48424552 Fix mis-merge.
Amusingly, it is nop.

Noted by:	trasz
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-MFC-rev:	r345702
2019-04-05 16:12:35 +00:00
manu
99cbdc45aa twsi: Use config_intrhook_oneshot instead of config_intrhook_establish
Suggested by:	ian
MFC after:	1 month
X-MFC-With:	345948
2019-04-05 15:53:27 +00:00
manu
3f5e78d74d twsi: Add interrupt mode
Add the ability to use interrupts for i2c message.
We still use polling for early boot i2c transfer (for PMIC
for example) but as soon as interrupts are available use them.
On Allwinner SoC >A20 it seems that polling mode is broken for some
reason; this is now fixed by using interrupt mode.
For Allwinner also fix the frequency calculation: the one in the code
assumed an APB frequency of 48MHz, while it is 24MHz on most
(all?) Allwinner SoCs. We now support both cases.
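
A rough sketch of a divider search that handles both input clocks. It assumes the Allwinner-style relation Fscl = Fin / (2^N * (M + 1) * 10) with a 4-bit M and a 3-bit N; the commit message does not spell out the formula, so treat it, the field widths, and the names `twsi_freq()` and `twsi_find_div()` as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical divider pair: m is assumed 4 bits wide, n 3 bits wide. */
struct twsi_div {
	uint32_t m;
	uint32_t n;
};

/* Bus frequency produced by a divider pair under the assumed formula. */
static uint32_t
twsi_freq(uint32_t fin, struct twsi_div d)
{
	return (fin / ((UINT32_C(1) << d.n) * (d.m + 1) * 10));
}

/* Pick the divider pair giving the highest frequency not above target. */
static bool
twsi_find_div(uint32_t fin, uint32_t target, struct twsi_div *out)
{
	bool found = false;
	uint32_t best = 0;

	for (uint32_t n = 0; n < 8; n++) {
		for (uint32_t m = 0; m < 16; m++) {
			struct twsi_div d = { m, n };
			uint32_t f = twsi_freq(fin, d);

			if (f > target)
				continue;
			if (!found || f > best) {
				best = f;
				*out = d;
				found = true;
			}
		}
	}
	return (found);
}
```

Searching at run time from the actual input clock avoids baking in a table that is only right for one APB frequency.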

While here add more debug info when it's compiled in.

Tested On: A20, H3, A64
MFC after:	1 month
2019-04-05 14:44:23 +00:00
imp
b8f858c079 Remove another instance of All Rights Reserved.
Remove the phrase from boilerplate copyright we stick on vers.c when
we can't find the template file. In practice, this won't change a
thing, except for the case of compiling the kernel standalone w/o the
rest of a tree on a system that doesn't have
/usr/share/examples/etc/bsd-copyright installed.
2019-04-05 14:27:48 +00:00
imp
e37799a8d3 Add mpr, mps, mpt to NOTES file
Add these for all the architectures whose GENERIC kernel
includes them.
2019-04-05 02:54:02 +00:00
rmacklem
185da79fc0 Revert r320698, since the related userland changes were reverted by r338192.
r338192 reverted the changes to nfsuserd so that it could use an AF_LOCAL
socket, since it resulted in a vnode locking panic().
Post r338192 nfsuserd daemons use the old AF_INET socket for upcalls and
do not use these kernel changes.
I left them in for a while, so that nfsuserd daemons built from head sources
between r320757 (Jul. 6, 2017) and r338192 (Aug. 22, 2018) would need them
by default.
This only affects head, since the changes were never MFC'd.
I will add an UPDATING entry, since an nfsuserd daemon built from head
sources between r320757 and r338192 will not run unless the "-use-udpsock"
option is specified. (This command line option is only in the affected
revisions of the nfsuserd daemon.)

I suspect few will be affected by this, since most who run systems built
from head sources (not stable or releases) will have rebuilt their nfsuserd
daemon from sources post r338192 (Aug. 22, 2018)

This is being reverted in preparation for an update to include AF_INET6
support to the code.
2019-04-04 23:30:27 +00:00
emaste
8087e3c8fb if_muge: use NULL not 0 for DRIVER_MODULE pointer args
Sponsored by:	The FreeBSD Foundation
2019-04-04 19:59:31 +00:00
rgrimes
cda8035706 Use IN_foo() macros from sys/netinet/in.h in place of handcrafted code
There are a few places that use hand crafted versions of the macros
from sys/netinet/in.h making it difficult to actually alter the
values in use by these macros.  Correct that by replacing handcrafted
code with proper macro usage.

Reviewed by:		karels, kristof
Approved by:		bde (mentor)
MFC after:		3 weeks
Sponsored by:		John Gilmore
Differential Revision:	https://reviews.freebsd.org/D19317
2019-04-04 19:01:13 +00:00
rmacklem
e349a3602c Fix malloc stats for the RPCSEC_GSS server code when DEBUG is enabled.
The code enabled when "DEBUG" is defined uses mem_alloc(), which is a
malloc(.., M_RPC, M_WAITOK | M_ZERO), but then calls gss_release_buffer()
which does a free(.., M_GSSAPI) to free the memory.
This patch fixes the problem by replacing mem_alloc() with a
malloc(.., M_GSSAPI, M_WAITOK | M_ZERO).
This bug affects almost no one, since the sources are not normally built
with "DEBUG" defined.

Submitted by:	peter@ifm.liu.se
MFC after:	2 weeks
2019-04-04 01:23:06 +00:00
cem
33133b3b41 Replace read_random(9) with more appropriate arc4rand(9) KPIs
Reviewed by:	ae, delphij
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19760
2019-04-04 01:02:50 +00:00
pjd
c118ff293a Implement automatic online expansion of GELI providers - if the underlying
provider grows, GELI will expand automatically and will move the metadata
to the new location of the last sector.

This functionality is turned on by default. It can be turned off with the
-R flag, but it is not recommended - if the underlying provider grows and
automatic expansion is turned off, it won't be possible to attach this
provider again, as the metadata is no longer located in the last sector.

If the automatic expansion is turned off and the underlying provider grows,
GELI will only log a message with the previous size of the provider, so
recovery can be easier.

Obtained from:	Fudo Security
2019-04-03 23:57:37 +00:00
emaste
2ef79738b1 cpsw: use phy-handle in FDT to find PHY address
In r337703 DTS files were updated to Linux 4.18, including Linux commit
4d8b032d3c03f4e9788a18bbb51b10e6c9e8a56b which removed the `phy_id`
property from am335x-bone-common (as the property was deprecated).

Use `phy-handle` via fdt_get_phyaddr, keeping the existing code as a
fallback for old DTBs.

PR:		236624
Submitted by:	manu, Gerald Aryeetey <aryeeteygerald_rogers.com>
Reported by:	Gerald Aryeetey
Reviewed by:	manu
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19814
2019-04-03 21:01:53 +00:00
rrs
50c7932ba8 Undo my previous erroneous commit changing the tcp_output kassert.
Hmm now the question is where did the tcp_log_id change go :o
2019-04-03 19:35:07 +00:00
mav
6c0c4d730c Fix typos in r345849.
MFC after:	1 week
2019-04-03 18:35:13 +00:00
mav
295e1572d3 List few more ATA commands.
MFC after:	1 week
2019-04-03 18:27:54 +00:00
kib
36b7295410 msdosfs: zero tail of the last block on truncation for VREG vnodes as well.
Despite the call to vtruncbuf() from detrunc(), which results in
zeroing part of the partial page after EOF, there still is a
possibility to retain the stale data which is revived on file
enlargement.  If the filesystem block size is greater than the page
size, partial block might keep other after-EOF pages wired and they
get reused then.  Fix it by zeroing whole part of the partial buffer
after EOF, not relying on vnode_pager_setsize().

PR:	236977
Reported by:	asomers
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-03 17:02:18 +00:00
mw
77b8255fbc Add a cv_wait to the TPM2.0 harvesting function
Harvesting has to compete for the TPM chip with userspace.
Before this change the callout could hijack an unread buffer
causing a userspace call to the TPM to fail.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: delphij
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19712
2019-04-03 08:22:58 +00:00
jhibbits
ca52774d85 powerpc: Allow emulating optional FPU instructions on CPUs with an FPU
The e5500 has an FPU, but lacks the optional fsqrt instruction.  This
instruction gets emulated in the kernel, but the emulation uses stale data,
from the last switch out, and does not return the result of the operation
immediately.  Fix both of these conditions by saving and restoring the FPRs
around the emulation point.

MFC after:	1 week
MFC with:	r345829
2019-04-03 04:01:08 +00:00
mw
7c5d4b81ab Create kernel module to parse Veriexec manifest based on envs
The current approach of injecting manifest into mac_veriexec is to
verify the integrity of it in userspace (veriexec (8)) and pass its
entries into kernel using a char device (/dev/veriexec).
This requires verifying root partition integrity in loader,
for example by using memory disk and checking its hash.
Otherwise if rootfs is compromised an attacker could inject their own data.

This patch introduces an option to parse manifest in kernel based on envs.
The loader sets manifest path and digest.
EVENTHANDLER is used to launch the module right after the rootfs is mounted.
It has to be done this way, since one might want to verify integrity of the init file.
This means that manifest is required to be present on the root partition.
Note that the envs have to be set right before boot to make sure that no one can spoof them.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: sjg
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19281
2019-04-03 03:57:37 +00:00
jhibbits
acf1cd99b3 powerpc: Apply r178139 from sparc64 to powerpc's fpu_sqrt
This fix was committed less than 2 months after the code was forked into the
powerpc kernel.  Though powerpc doesn't use quad-precision floating point,
or need it for emulation, the changes do look like correctness fixes
overall.

This was found while trying to get fsqrt emulation working on e5500, which
does have a real FPU, but lacks the fsqrt instruction.  This is not the
complete fix, the rest is to be committed separately.

MFC after:	1 week
2019-04-03 03:54:30 +00:00
rmacklem
c5f8d6c34f Add a comment to the r345818 patch to explain why cl_refs is initialized to 2.
PR:		235582
MFC after:	2 weeks
2019-04-03 03:50:16 +00:00
rmacklem
bdb31b3b51 Fix a race in the RPCSEC_GSS server code that caused crashes.
When a new client structure was allocated, it was added to the list
so that it was visible to other threads before the expiry time was
initialized, with only a single reference count.
The caller would increment the reference count, but it was possible
for another thread to decrement the reference count to zero and free
the structure before the caller incremented the reference count.
This could occur because the expiry time was still set to zero when
the new client structure was inserted in the list and the list was
unlocked.

This patch fixes the race by initializing the reference count to two
and initializing all fields, including the expiry time, before inserting
it in the list.
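
The ordering described above can be sketched as follows. This is a single-threaded illustration with hypothetical names (`struct client`, `client_create()`, `client_release()`); the real code holds a lock around the list operations and uses its own structures.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical client record; not the real RPCSEC_GSS structure. */
struct client {
	struct client	*next;
	int		 refs;
	time_t		 expiry;
};

struct client_list {
	struct client	*head;	/* protected by a lock in real code */
};

/*
 * Allocate a client, initialize every field, and only then publish it.
 * The count starts at two: one reference for the list and one for the
 * caller, so no other thread can drop it to zero before the caller
 * holds its own reference.
 */
static struct client *
client_create(struct client_list *list, time_t now, time_t lifetime)
{
	struct client *c = calloc(1, sizeof(*c));

	if (c == NULL)
		return (NULL);
	c->refs = 2;
	c->expiry = now + lifetime;
	/* Real code would take the list lock here. */
	c->next = list->head;
	list->head = c;
	/* ... and release it here. */
	return (c);
}

/* Drop one reference; whoever drops the list's reference unlinks first. */
static void
client_release(struct client *c)
{
	if (--c->refs == 0)
		free(c);
}
```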

Tested by:	peter@ifm.liu.se
PR:		235582
MFC after:	2 weeks
2019-04-02 23:51:08 +00:00
mav
dc10c69c7f Build NVMe CAM transport unrelated to NVMe SIM.
Before this I suppose it was impossible to load CAM-based NVMe as a module.
Plus this appeared to be needed to build r345815 without NVMe driver.

MFC after:	2 weeks
2019-04-02 20:27:56 +00:00
mav
66e1cda3f6 Make cam_error_print() decode NVMe commands.
MFC after:	2 weeks
2019-04-02 19:37:52 +00:00
tychon
6fcd6bd4ce ioat(4) should use bus_dma(9) for the operation source and destination
addresses

Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19725
2019-04-02 19:08:06 +00:00
tychon
87228e148f ioatcontrol(8) could exercise 8k-aligned copy with page-break, crc and
crc-copy modes.

Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19780
2019-04-02 19:06:25 +00:00
tychon
8b2669726f DMAR driver assumes all physical addresses are backed by a fully
initialized struct vm_page.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19753
2019-04-02 18:50:49 +00:00
np
b0756a4416 cxgbe(4): Add a flag to indicate that bits in interrupt cause but not in
interrupt enable are not fatal.

The firmware sets up all the interrupt enables based on run time
configuration, which means the information in the enables is more
accurate than what's compiled into the driver.  This change also allows
the fatal bits to be updated without any changes in the driver in some
cases.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-04-02 18:50:33 +00:00
mav
8d4daa603c Unify SCSI_STATUS_BUSY retry handling with other cases.
- Do not retry if periph was invalidated.
- Do not decrement retry_count if already zero.
- Report action_string when applicable.

MFC after:	2 weeks
2019-04-02 14:46:10 +00:00
kib
4853d54461 tmpfs: plug holes on rw->ro mount update.
In particular:
- suspend the mount around vflush() so that no new writes arrive after the
  vnode is processed;
- flush pending metadata updates (mostly node times);
- remap all rw mappings of files from the mount into ro.

It is not clear to me how to handle writeable mappings on rw->ro for
tmpfs best.  Other filesystems, which use vnode vm object, call
vgone() on vnodes with writers, which sets the vm object type to
OBJT_DEAD, and keep the resident pages and installed ptes as is.  In
particular, the existing mappings continue to work as far as
application only accesses resident pages, but changes are not flushed
to file.

For tmpfs the vm object of VREG vnodes also serves as the data pages
container, giving single copy of the mapped pages, so it cannot be set
to OBJT_DEAD.  Alternatives for making rw mappings ro could be either
invalidating them altogether, or marking as CoW.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:59:04 +00:00
kib
01adc0720f tmpfs: ignore tmpfs_set_status() if mount point is read-only.
In particular, this fixes atimes still changing for ro tmpfs.
tmpfs_set_status() gains tmpfs_mount * argument.

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:49:32 +00:00
kib
3b825b8983 Block creation of the new nodes for read-only tmpfs mounts.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D19737
2019-04-02 13:41:26 +00:00
br
9e0faec84c o Grab the number of devices supported by PLIC from FDT.
o Fix bug in PLIC_ENABLE macro when irq >= 32.
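
The commit message does not describe the bug itself. As a hedged illustration of enable-bit indexing that stays correct for irq >= 32, assuming the enable bits are stored as an array of 32-bit words, a sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Index of the 32-bit enable word that holds the bit for this irq. */
static inline uint32_t
plic_enable_word(uint32_t irq)
{
	return (irq / 32);
}

/* Bit mask for this irq within its enable word. */
static inline uint32_t
plic_enable_mask(uint32_t irq)
{
	return (UINT32_C(1) << (irq % 32));
}

/* Set the enable bit for irq in an in-memory copy of the enable words. */
static void
plic_enable(uint32_t *enable_words, uint32_t irq)
{
	enable_words[plic_enable_word(irq)] |= plic_enable_mask(irq);
}
```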

Tested on real hardware, a HiFive Unleashed board.

Thanks to SiFive, Inc. for the board provided.

Reviewed by:	markj
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19775
2019-04-02 12:02:35 +00:00
jhibbits
3fd37b6c9a ipmi: Fixes for ipmi_opal(powernv)
* Crank the OPAL state machine during the receive loop, to make sure the
  pollers are executed
* Add a proper detach function, so the module can be unloaded and reloaded
  at runtime.

It still doesn't reliably work 100% of the time on POWER9, and it appears
timing and/or cache related.  It may work on POWER8 now.

MFC after:	2 weeks
2019-04-02 04:12:06 +00:00
jhibbits
4fcf68bc12 powernv: Port OPAL asynchronous framework to use the new message framework
Since OPAL_GET_MSG does not discriminate between message types, asynchronous
completion events may be received in the OPAL_GET_MSG call, which dequeues
them from the list, thus preventing OPAL_CHECK_ASYNC_COMPLETION from
succeeding.  Handle this case by integrating with the messaging framework.
2019-04-02 04:02:57 +00:00
jhibbits
edd7da6900 powerpc/powernv: Add OPAL heartbeat thread
Summary:
OPAL needs to be kicked periodically in order for the firmware to make
progress on its tasks.  To do so, create a heartbeat thread to perform this task
every N milliseconds, defined by the device tree.  This task is also a central
location to handle all messages received from OPAL.

Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D19743
2019-04-02 04:00:01 +00:00
tychon
8f32b48958 Devices behind downstream bridges should still get DMAR protection.
Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19717
2019-04-01 19:08:05 +00:00
kibab
a38810c906 Refactor error handling
There is some code duplication in error handling paths in a few functions.
Create a function for printing such errors in human-readable way and get rid
of duplicates.

Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15912
2019-04-01 18:54:15 +00:00
kibab
10ac8498c5 Use information about max data size that the controller is able to operate
Using DFLTPHYS/MAXPHYS is not always OK, instead make it possible for the
controller driver to provide maximum data size to MMCCAM, and use it there.

The old stack already does this.

Reviewed by:	manu
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D15892
2019-04-01 18:49:39 +00:00
mckusick
06b0191176 When using the force option to shut down a memory-disk device,
I/O operations already in its queue were not being properly drained.
The GEOM framework does the queue draining, but the device driver
needs to wait for the draining to happen. The waiting is done by
adding a g_md_providergone() function to wait for the I/O operations
to finish up.

It is likely that every GEOM provider that implements orphaning
attached GEOM consumers needs to use the "providergone" mechanism
for this same reason, but some of them do not do so. Apparently
Kenneth Merry (ken@) added the drain for just such races, but he
missed adding it to some of the device drivers that needed it.

Submitted by: Chuck Silvers
Reviewed by:  imp
Tested by:    Chuck Silvers
MFC after:    1 week
Sponsored by: Netflix
2019-03-31 21:34:58 +00:00
bz
a12312b811 Improve debugging options in bcm2835_sdhci.c
Similar to bcm2835_sdhost.c add a TUNABLE and SYSCTL to selectively
turn on debugging printfs if debugging is turned on at compile time.

MFC after:		2 weeks
Sponsored by:		The FreeBSD Foundation
Reviewed by:		gonzo, andrew
Differential Revision:	https://reviews.freebsd.org/D19745
2019-03-31 19:27:44 +00:00
avos
acc892a759 run(4): properly set F_DATAPAD radiotap flag if frame has padding between
frame header and data.

This will fix 'Mysterious OLPC stuff' for received frames and wrong
CCMP / TKIP / data decoding for transmitted frames in net/wireshark
dissector.

While here, drop unneeded comment - net80211 handles padding requirements
for Tx & Rx without driver adjustment.

Tested with D-Link DWA-140 rev B3, STA mode.

MFC after:	1 week
2019-03-31 14:18:02 +00:00
avos
740fa0b1c9 run(4): do not clear PROTECTED bit if frame was not decrypted by NIC.
Tested with D-Link DWA-140 rev B3, STA / MONITOR modes.

MFC after:	1 week
2019-03-31 13:41:20 +00:00
avos
e1756702c0 uath(4), urtw(4): restart driver if device does not respond after Tx request
MFC after:	1 week
2019-03-31 09:52:36 +00:00
jah
05de3ee169 freebsd32: fix padding of computed control message length for recvmsg()
Each control message region must be aligned on a 4-byte boundary on 32-bit
architectures. The 32-bit compat shim for recvmsg() gets the actual layout
right, but doesn't pad the payload length when computing msg_controllen for
the output message header. If a control message contains an unaligned
payload, such as the 1-byte TTL field in the example attached to PR 236737,
this can produce control message payload boundaries that extend beyond
the boundary reported by msg_controllen.

PR:	236737
Reported by:	Yuval Pavel Zholkover <paulzhol@gmail.com>
Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19768
2019-03-30 23:43:58 +00:00
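The length miscount described in 05de3ee169 can be sketched roughly as follows. This is an illustration only, not the freebsd32 shim itself: the 12-byte header size, the `ALIGN32` macro, and the three function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4-byte alignment and a 12-byte 32-bit control message header. */
#define ALIGN32(n)	(((n) + (size_t)3) & ~(size_t)3)
#define HDR32_LEN	((size_t)12)

/* Bytes one control message occupies in the buffer, payload padding included. */
static size_t
cmsg_space32(size_t payload)
{
	return (ALIGN32(HDR32_LEN) + ALIGN32(payload));
}

/* Length of one control message when the payload is NOT padded. */
static size_t
cmsg_len32(size_t payload)
{
	return (ALIGN32(HDR32_LEN) + payload);
}

/*
 * Hypothetical msg_controllen for nmsgs equally sized messages.
 * pad_payload == 0 models the miscount, pad_payload != 0 the padded sum.
 */
static size_t
controllen32(size_t nmsgs, size_t payload, int pad_payload)
{
	assert(nmsgs > 0);
	return (nmsgs * (pad_payload ? cmsg_space32(payload) :
	    cmsg_len32(payload)));
}
```

With two 1-byte payloads, the second message starts at offset 16 and its payload ends at offset 29, which lies beyond an unpadded total of 26 but inside a padded total of 32.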
markj
f0215c1068 Do not perform DAD on stf(4) interfaces.
stf(4) interfaces are not multicast-capable so they can't perform DAD.
They also did not set IFF_DRV_RUNNING when an address was assigned, so
the logic in nd6_timer() would periodically flag such an address as
tentative, resulting in interface flapping.

Fix the problem by setting IFF_DRV_RUNNING when an address is assigned,
and do some related cleanup:
- In in6if_do_dad(), remove a redundant check for !UP || !RUNNING.
  There is only one caller in the tree, and it only looks at whether
  the return value is non-zero.
- Have in6if_do_dad() return false if the interface is not
  multicast-capable.
- Set ND6_IFF_NO_DAD when an address is assigned to an stf(4) interface
  and the interface goes UP as a result. Note that this is not
  sufficient to fix the problem because the new address is marked as
  tentative and DAD is started before in6_ifattach() is called.
  However, setting no_dad is formally correct.
- Change nd6_timer() to not flag addresses as tentative if no_dad is
  set.

This is based on a patch from Viktor Dukhovni.

Reported by:	Viktor Dukhovni <ietf-dane@dukhovni.org>
Reviewed by:	ae
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19751
2019-03-30 18:00:44 +00:00
kib
e644a7809f Fix branding after r345661.
In particular, elf32 FreeBSD binaries were not executed on LP64 hosts.
The interp_name_len value should account for the nul terminator.  This
is needed for strncmp()s in brand checking code to work.

Reported by:	andreast
Sponsored by:	The FreeBSD Foundation
MFC after:	12 days (together with r345661)
2019-03-30 16:58:51 +00:00
avos
47521614bb urtw(4): export TSF timestamp for received frames via radiotap
Tested with Netgear WG111 v3 (RTL8187B), STA mode.

MFC after:	1 week
2019-03-30 09:24:06 +00:00
pjd
98d73aafdb If the autoexpand pool property is turned on and vdev is healthy try to
expand the pool automatically when we detect underlying GEOM provider
size change.

Obtained from:	Fudo Security
Tested in:	AWS
2019-03-30 07:29:20 +00:00
pjd
a4706c168b Introduce new event SIZECHANGE within GEOM system to inform about GEOM
providers mediasize changes.

While here, use GEOM nomenclature to describe providers instead of calling
them device nodes.

Obtained from:	Fudo Security
Tested in:	AWS
2019-03-30 07:24:34 +00:00
pjd
cbdc1ae9e2 Implement support for online disk capacity changes.
Obtained from:	Fudo Security
Tested in:	AWS
2019-03-30 07:20:28 +00:00
np
3ee5c69403 tcp_autorcvbuf_inc was removed in r344433.
Discussed with:	tuexen@
Sponsored by:	Chelsio Communications
2019-03-29 21:39:47 +00:00
jkim
c30ced85d1 Merge ACPICA 20190329.
2019-03-29 20:21:28 +00:00
jhb
24a54e79c9 Don't check the inp socket pointer in in_pcboutput_eagain.
Reviewed by:	hps (by saying it was ok to be removed)
MFC after:	1 month
Sponsored by:	Netflix
2019-03-29 19:47:42 +00:00
manu
8a98eb767b arm: allwinner: clk: Fix nm_recalc
When comparing best frequencies, use the absolute value of the difference.
Otherwise we always end up choosing a value lower than the best one
when the exact frequency cannot be met.

MFC after:	2 weeks
2019-03-29 19:40:04 +00:00
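The idea in 8a98eb767b, comparing candidates by absolute distance from the target, can be sketched as below. This is not the allwinner nm_recalc code. `abs_diff` and `pick_closest` are hypothetical helpers, and the candidate list stands in for whatever the N/M factors can actually produce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Absolute difference of two unsigned values, with no signed wraparound. */
static uint64_t
abs_diff(uint64_t a, uint64_t b)
{
	return (a > b ? a - b : b - a);
}

/*
 * Hypothetical helper: return the candidate frequency closest to target,
 * whether it lies above or below the target.
 */
static uint64_t
pick_closest(const uint64_t *cands, size_t n, uint64_t target)
{
	uint64_t best;
	size_t i;

	assert(cands != NULL && n > 0);
	best = cands[0];
	for (i = 1; i < n; i++) {
		if (abs_diff(cands[i], target) < abs_diff(best, target))
			best = cands[i];
	}
	return (best);
}
```

For candidates 900, 1100 and 1020 with a target of 1000, the closest value is 1020 even though it is above the target.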
kib
9df1f56292 Eliminate adj_free field from vm_map_entry.
Drop the adj_free field from vm_map_entry_t. Refine the max_free field
so that p->max_free is the size of the largest gap with one endpoint
in the subtree rooted at p. Change vm_map_findspace so that, first,
the address-based splay is restricted to tree nodes with large-enough
max_free value, to avoid searching for the right starting point in a
subtree where all the gaps are too small. Second, when the address
search leads to a tree search for the first large-enough gap, that gap
is the subject of a splay-search that brings the gap to the top of the
tree, so that an immediate insertion will take constant time.

Break up the splay code into separate components, one for searching
and breaking up the tree and another for reassembling it. Use these
components, and not splay itself, for linking and unlinking. Drop the
after-where parameter to link, as it is computed as a side-effect of
the splay search.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	markj
Tested by:	pho
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D17794
2019-03-29 16:53:46 +00:00
np
5c7bb5ff38 cxgbe/t4_tom: Catch up with r344433, which removed tcp_autorcvbuf_inc.
The declaration in tcp_var.h is still around so t4_tom continued to
compile but wouldn't load.  A separate commit will fix tcp_var.h.

Reported By: Dustin Marquess (dmarquess at gmail)

Sponsored by:	Chelsio Communications
2019-03-29 16:43:24 +00:00
asomers
ad20b6ee10 fix the GENERIC-NODEBUG build after r345675
Submitted by:	cy
Reported by:	cy, Michael Butler <imb@protected-networks.net>
MFC after:	2 weeks
X-MFC-With:	345675
2019-03-29 14:07:30 +00:00
kevans
1af8373c08 NOTES: Use non-default value for BOOT_TAG
Reported by:	jhb
MFC after:	1 week (except non-empty value in stable/11)
2019-03-29 04:00:46 +00:00
jhibbits
fb7a2f4237 powerpc64: Fix kernel ldscript to only emit one PT_LOAD segment
Summary:
kexec-lite cannot currently handle multiple PT_LOAD segments.  In some
cases the compiler generates multiple PT_LOAD segments for an unknown
reason, causing boot to fail from kexec-lite.

Submitted by:	Brandon Bergren (older version)
Differential Revision: https://reviews.freebsd.org/D19574
2019-03-29 03:01:21 +00:00
jhibbits
e205beed0a powerpc64: Use medium code model in asm files for TOC references
Summary:
With a sufficiently large TOC, it's possible to index out of range, as
the immediate load instructions only permit 16-bit indices, allowing up
to 64kB range (signed) from the base pointer.  Allow +/- 2GB range, with
the medium code model TOC accesses in asm.

Patch originally by Brandon Bergren.  The issue appears to impact ELFv2
more than ELFv1.

Reviewed by:	luporl
Differential Revision: https://reviews.freebsd.org/D19708
2019-03-29 02:38:30 +00:00
asomers
618fee7479 fusefs: convert debug printfs into dtrace probes
fuse(4) was heavily instrumented with debug printf statements that could
only be enabled with compile-time flags. They fell into three basic groups:

1. Totally redundant with dtrace FBT probes. These I deleted.
2. Print textual information, usually error messages. These I converted to
   SDT probes of the form fuse:fuse:FILE:trace. They work just like the old
   printf statements except they can be enabled at runtime with dtrace. They
   can be filtered by FILE and/or by priority.
3. More complicated probes that print detailed information. These I
   converted into ad-hoc SDT probes.

Also, de-inline fuse_internal_cache_attrs.  It's big enough to be a regular
function, and this way it gets a dtrace FBT probe.

This commit is a merge of r345304, r344914, r344703, and r344664 from
projects/fuse2.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19667
2019-03-29 02:13:06 +00:00
jhibbits
f602246999 powerpc: Remove now-obsolete P9H MMU name
2019-03-29 02:11:48 +00:00
trasz
de48cc4214 Factor out retrieving the interpreter path from the main ELF
loader routine.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19715
2019-03-28 21:43:01 +00:00
np
19966acdb5 cxgbe(4): Count and clear interrupts generated at the software's request.
An interrupt can be requested by setting the F_SWINT bit in PL_PF_CTL.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-03-28 21:22:28 +00:00
jhb
521163f5f3 Use a dedicated malloc type for lagg(4)'s structures.
Reviewed by:	gallatin
MFC after:	1 month
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19719
2019-03-28 21:00:54 +00:00
erj
f501e29bc9 iflib: return ENETDOWN when the network device is down
From Jake:
iflib_if_transmit returns ENOBUFS when the device is down, or when the
link isn't active.

This was changed in r308792 from return (0), so that the function
correctly reports an error that it was unable to transmit.

However, using ENOBUFS can cause some network applications to produce
the following or similar errors:

"ping: sendto: No buffer space available"

This is a bit confusing as the real cause of the issue is that the
network device is down.

Replace the ENOBUFS return with ENETDOWN to indicate more clearly that
the failure to send is because the network device is offline.

This will cause the error message to be reported as

"ping: sendto: Network is down"

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, sbruno@, bz@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19652
2019-03-28 20:46:45 +00:00
erj
3290e27ba6 iflib: hold the CTX lock in iflib_pseudo_register
From Jake:
The iflib_device_register function takes the CTX lock before calling
IFDI_ATTACH_PRE, and releases it upon finishing the registration.

Mirror this process in iflib_pseudo_register, so that we always hold the
CTX lock during the attach process when registering a pseudo interface
or a regular interface.

This was caught by code inspection while attempting to analyze where the
CTX lock was held.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, erj@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19604
2019-03-28 20:43:47 +00:00
mav
4ef45ba40d Do not map small IOCTL buffers to KVA, but copy.
CAM IOCTL interfaces traditionally mapped user-space data buffers to KVA.
It was nice originally, but now it takes too much to handle respective
TLB shootdowns, while small kernel memory allocations up to 64KB backed
by UMA and accompanied by copyin()/copyout() can be much cheaper.

For large buffers mapping may still make sense, and unmapped I/O would
be even better, but the latter is unfortunately more tricky, since the
unmapped I/O API is too specific to struct bio now.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-03-28 20:41:02 +00:00
jhb
3c0065a48f Remove nested epochs from lagg(4).
lagg_bcast_start appeared to have a bug in that it was using the last
lagg port structure after exiting the epoch that was keeping that
structure alive.  However, upon further inspection, the epoch was
already entered by the caller (lagg_transmit), so the epoch enter/exit
in lagg_bcast_start was actually unnecessary.

This commit generally removes uses of the net epoch via LAGG_RLOCK to
protect the list of ports when the list of ports was already protected
by an existing LAGG_RLOCK in a caller, or the LAGG_XLOCK.

It also adds a missing epoch enter/exit in lagg_snd_tag_alloc while
accessing the lagg port structures.  An ifp is still accessed via an
unsafe reference after the epoch is exited, but that is true in the
current code and will be fixed in a future change.

Reviewed by:	gallatin
MFC after:	1 month
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19718
2019-03-28 20:25:36 +00:00
emaste
790f15c6a4 Revert change accidentally committed along with r345625
Reported by:	Oliver Pinter <oliver.pinter@hardenedbsd.org>
2019-03-28 10:56:27 +00:00
hselasky
f5682a5ff5 Add new USB PCI ID.
Submitted by:		Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-03-28 09:00:56 +00:00
lwhsu
5541b064cf Fix make in sys/modules
Sponsored by:	The FreeBSD Foundation
2019-03-28 08:59:11 +00:00
lwhsu
b1e9d4fe2c Add dependent header files
Reported by:	https://ci.freebsd.org/job/FreeBSD-head-mips-build/6702/console
2019-03-28 08:30:45 +00:00
kevans
5db726ceb0 if_bridge(4): ensure all traffic passing over the bridge is accounted for
Consider a bridge0 with em0 and em1 members. Traffic rx'd by em0 and
transmitted by bridge0 through em1 gets accounted for in IPACKETS/IBYTES
and bridge0 bpf -- assuming it's not unicast traffic destined for em1.
Unicast traffic destined for em1 is not accounted for by any
mechanism, and isn't pushed through bridge0's bpf machinery as any other
packets that pass over the bridge are.

Fix this and simplify GRAB_OUR_PACKETS by bailing out early if it was rx'd
by the interface that it was addressed for. Everything else there is
relevant for any traffic that came in from one member that's being directed
at another member of the bridge.

Reviewed by:	kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19614
2019-03-28 03:31:51 +00:00
emaste
2648698edf revert r341429 "disable BIND_NOW in libc, libthr, and rtld"
r345620 by kib@ fixed the rtld issue that caused a crash at startup
during resolution of libc's ifuncs with BIND_NOW.

PR:		233333
Sponsored by:	The FreeBSD Foundation
2019-03-28 02:12:32 +00:00
rpokala
c63d1e2792 Teach jedec_dimm(4) to be more forgiving of non-fatal errors.
It looks like some DIMMs claim to have a TSOD, but actually don't. Some
claim they weren't able to change the SPD page, but they did. Neither of
those should be fatal errors.

PR:		235944
Submitted by:	Greg V <greg@unrelenting.technology>
Reported by:	Greg V <greg@unrelenting.technology>
Reviewed by:	cem
MFC after:	1 week
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D19681
2019-03-27 21:50:01 +00:00
tychon
b637576aff Use the BUS_DMA_NOWRITE flag to expose and create the read-only VT-d
IOMMU mappings.

Reviewed by:	kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19729
2019-03-27 20:15:51 +00:00
markj
d6bbcaaa29 Stop using -fdebug-prefix-map to map the object directory.
We were doing so as a workaround for the problem addressed by r345593, so
it's no longer necessary.

Reviewed by:	jhb
Discussed with:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19705
2019-03-27 19:34:19 +00:00
br
15f0d92c9c Grab timer frequency from FDT.
RISC-V timer has no dedicated DTS node and we have to get timer
frequency from cpus node.

Tested on Government Furnished Equipment (GFE) cores synthesized
on Xilinx VCU118.

Reviewed by:	markj
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19727
2019-03-27 16:26:03 +00:00
scottl
5379a255fe Add missing break statements. Coverity CID 1400446.
Reported by:	mav
2019-03-27 12:25:46 +00:00
cem
ba6127cb43 x86: Use XSAVEOPT for fpusave(), when available
Remove redundant npxsave_core definition while here.

Suggested by:	Anton Rang
Reviewed by:	kib, Anton Rang <rang AT acm.org>
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19665
2019-03-26 22:45:41 +00:00
markj
6cc01e9149 Add CTLFLAG_VNET to the net.inet.icmp.tstamprepl definition.
Reported by:	Hans Fiedler <hans@hfconsulting.com>
MFC after:	3 days
2019-03-26 22:14:50 +00:00
emaste
ceab2111d4 pf: use UID_ROOT and GID_WHEEL named constants in make_dev
No functional change but improves consistency and greppability of
make_dev calls.

Discussed with: kp
2019-03-26 21:20:42 +00:00
trasz
75a305f873 Improve error reporting when the swap pager runs out of memory.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D19699
2019-03-26 19:11:15 +00:00
gonzo
daeb6dacb7 Change default value of kern.bootfile to reflect reality
In most cases kern.bootfile is populated from the information
provided by loader(8). There are certain scenarios when loader
is not available, for instance when kernel is loaded by u-boot
or some other BootROM directly. In this case the default value
"/kernel" points to an invalid location and breaks some functionality,
like using installkernel on self-hosted system or dtrace's CTF
lookup. This can be fixed by setting the value manually but the
default that reflects correct location is better than default that
points to invalid one.

Current default was set around FreeBSD 1, when "/kernel" was the
actual path. Transition to /boot/kernel/kernel happened circa FreeBSD 3.

PR:		221550
Reviewed by:	ian, imp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D18902
2019-03-26 18:03:18 +00:00
trasz
e27ffc6060 Make smartpqi(4) behave better when running out of memory, by returning
CAM_RESRC_UNAVAIL instead of CAM_REQUEUE_REQ.  This makes CAM delay a bit
before retrying, so that the retries actually get a chance to succeed.

Reviewed by:	sbruno
MFC after:	2 weeks
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D19696
2019-03-26 15:47:13 +00:00
trasz
390bc6cf0d Factor out resource limit enforcement code in the ELF loader.
It makes the code slightly easier to follow, and might make
it easier to fix the resource accounting to also account for
the interpreter.

The PROC_UNLOCK() is moved earlier - I don't see anything
it should protect; the lim_max() is a wrapper around lim_rlimit(),
and that, differently from lim_rlimit_proc(), doesn't require
the proc lock to be held.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19689
2019-03-26 15:35:49 +00:00
rrs
b6ca75d739 Fix a small bug in the tcp_log_id where the bucket
was unlocked and yet the bucket-unlock flag was not
changed to false. This can cause a panic if INVARIANTS
is on and we go through the right path (though rare).

Reported by:	syzbot+179a1ad49f3c4c215fa2@syzkaller.appspotmail.com
Reviewed by:	tuexen@
MFC after:	1 week
2019-03-26 10:41:27 +00:00
tuexen
92665ddcf3 Fix a double free of an SCTP association in an error path.
This is joint work with rrs@. The issue was found by running
syzkaller.

MFC after:		1 week
2019-03-26 08:27:00 +00:00
jhibbits
35c0b5f4dd powerpc64: Micro-optimize moea64 native pmap tlbie
* Cache moea64_need_lock in a local variable; gcc generates slightly better
  code this way, it doesn't need to reload the value from memory each read.
* VPN cropping is only needed on PowerPC ISA 2.02 and older cores, a subset
  of those that need serialization, so move this under the need_lock check,
  so those that don't need the lock don't even need to check this.
2019-03-26 02:53:35 +00:00
kevans
d36c815c78 Allow kernel config to specify DTS/DTSO to build, and out-of-tree support
This allows for directives such as

makeoptions DTS+=/out/of/tree/myboard.dts
# in tree! Same rules applied as if this were in a dtb/ module
makeoptions DTS+=otherboard.dts

to be specified in config(5) and have these built/installed alongside the
kernel. The assumption that overlays live in an overlays/ directory is only
made for in-tree DTSO, but we still make the assumption that out-of-tree
arm64 DTS will be in vendored directories (for now).

This lowers the cost to hacking on an overlay or dts by being able to
quickly throw it in a custom config, especially if it doesn't fit one of the
current dtb/modules quite appropriately or it's not intended for commit
there.

The build/install targets were split out of dtb.mk to centralize the build
logic and leave out the all/realinstall/CLEANFILES additions... it was
believed that we didn't want to pollute the kernel build with these.

The build rules were converted to suffix rules at the suggestion of Ian to
clean things up a little bit in a world where we can have mixed
in-tree/out-of-tree DTS/DTSO specified.

Reviewed by:	ian
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19351
2019-03-26 02:45:23 +00:00
sobomax
7be74560f5 Refine r345425: get rid of superfluous helper macro that I have added.
MFC after:	2 weeks
2019-03-26 01:28:10 +00:00
markj
fb4ce630e0 Reject F_SETLK_REMOTE commands when sysid == 0.
A sysid of 0 denotes the local system, and some handlers for remote
locking commands do not attempt to deal with local locks.  Note that
F_SETLK_REMOTE is only available to privileged users as it is intended
to be used as a testing interface.

Reviewed by:	kib
Reported by:	syzbot+9c457a6ae014a3281eb8@syzkaller.appspotmail.com
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19702
2019-03-25 21:38:58 +00:00
andrew
0c946c6523 Sort printing of the ID registers on arm64 to be identical to the
documentation. This will simplify checking new fields when they are added.

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-03-25 18:02:04 +00:00
tuexen
873fcf8446 Initialize scheduler specific data for the FCFS scheduler.
This is joint work with rrs@. The issue was reported by using
syzkaller.

MFC after:		1 week
2019-03-25 16:40:54 +00:00
tuexen
a150bffcbf Improve locking when tearing down an SCTP association.
This is joint work with rrs@ and the issue was found by
syzkaller.

MFC after:		1 week
2019-03-25 15:23:20 +00:00
hselasky
b37bde59c8 Change all kernel C-type macros into static inline functions.
The current kernel C-type macros might obscurely hide the fact that
the input argument might be used multiple times.

This breaks code like:
isalpha(*ptr++)

Use static inline functions instead of macros to fix this.

Reviewed by:		kib @
Differential Revision:	https://reviews.freebsd.org/D19694
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-03-25 13:50:38 +00:00
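The multiple-evaluation hazard that b37bde59c8 addresses can be sketched as follows. `MACRO_ISALPHA` and `inline_isalpha` are hypothetical stand-ins written for this example, not the kernel's actual definitions. The two `advance_with_*` helpers only measure how far the pointer moves during a single call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical old-style macro: the argument appears several times. */
#define MACRO_ISALPHA(c) \
	(((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z'))

/* Hypothetical replacement: the argument is evaluated exactly once. */
static inline int
inline_isalpha(int c)
{
	return ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));
}

/* Number of characters consumed by one macro call on *p++. */
static int
advance_with_macro(const char *s)
{
	const char *p = s;

	assert(s != NULL);
	(void)MACRO_ISALPHA(*p++);
	return ((int)(p - s));
}

/* Number of characters consumed by one inline-function call on *p++. */
static int
advance_with_inline(const char *s)
{
	const char *p = s;

	assert(s != NULL);
	(void)inline_isalpha(*p++);
	return ((int)(p - s));
}
```

With the macro, a single call on "abcd" advances the pointer twice because both comparisons in the first clause evaluate `*p++`. The inline function advances it once.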
tuexen
e89e1927c7 Fix the handling of fragmented unordered messages when using DATA chunks
and FORWARD-TSN.

This bug was reported in https://github.com/sctplab/usrsctp/issues/286
for the userland stack.

This is joint work with rrs@.

MFC after:		1 week
2019-03-25 09:47:22 +00:00
avos
69cab2e458 run(4): merge some common TSF-related code into run_disable_tsf()
No functional change intended.

MFC after:	5 days
2019-03-25 09:10:07 +00:00
allanjude
17b9e44c40 The Atheros AR7241 has 20 GPIO pins
AR724X_GPIO_PINS, used for this family, is defined as 18.
The datasheet for the AR7241 describes 20 pins; allow all to be used.

Submitted by:	Hiroki Mori <yamori813@yahoo.co.jp>
Reviewed by:	mizhka
Differential Revision:	https://reviews.freebsd.org/D17580
2019-03-25 07:48:52 +00:00
allanjude
771a7591dc Make TMPFS_PAGES_MINRESERVED a kernel option
TMPFS_PAGES_MINRESERVED controls how much memory is reserved for the system
and not used by tmpfs.

On very small memory systems, the default value may be too high and this
prevents these small memory systems from using reroot, which is required
for them to install firmware updates.

Submitted by:	Hiroki Mori <yamori813@yahoo.co.jp>
Reviewed by:	mizhka
Differential Revision:	https://reviews.freebsd.org/D13583
2019-03-25 07:46:20 +00:00
scottl
c57accd7b2 Add event table decoding for SAS Broadcast Primitive events.
2019-03-24 20:37:37 +00:00
scottl
15a481e038 Fix a transposition error from the previous commit
2019-03-24 19:29:30 +00:00
ian
2b092124bf Support device-independent labels for geom_flashmap slices.
While geom_flashmap has always supported label names for its slices, it does
so by appending "s.labelname" to the provider device name, meaning you still
have to know the name and unit of the hardware device to use the labels.

These changes add support for device-independent geom_flashmap labels, using
the standard geom_label infrastructure. geom_flashmap now creates a softc
struct attached to its geom, and as it creates slices it stores the label
into an array in the softc. The new geom_label_flashmap uses those labels
when tasting a geom_flashmap provider.

Differential Revision:	https://reviews.freebsd.org/D19535
2019-03-24 19:11:45 +00:00
scottl
136e1d1535 r329522 created problems with commands that enter the TIMEDOUT state but
are successfully returned by the card (usually due to an abort being issued
as part of timeout recovery). Remove what amounts to an insufficient
KASSERT, and don't overwrite the state value. State should probably be
re-designed, and that will be done with a future commit.

Reported by:	phk, bei.io
Reviewed by:	imp, mav
Differential Revision:	D19677
2019-03-24 19:09:50 +00:00
ian
8d5378d733 Revert accidental change that should not have been included in r345475.
I had changed this value as part of a local experiment, and neglected to
change it back before committing the other changes.
2019-03-24 18:02:27 +00:00
ian
3ca1299470 Truncate a too-long interrupt handler name when there is only one handler.
There are only 19 bytes available for the name of an interrupt plus the
name(s) of handlers/drivers using it. There is a mechanism from the days of
shared interrupts that replaces some of the handler names with '+' when they
don't all fit into 19 bytes.

In modern times there is typically only one device on an interrupt, but long
device names are the norm, especially with embedded systems. Also, in systems
with multiple interrupt controllers, the names of the interrupts themselves
can be long. For example, 'gic0,s54: imx6_anatop0' doesn't fit, and
replacing the device driver name with a '+' provides no useful info at all.

When there is only one handler but its name was too long to fit, this
change truncates enough leading chars of the handler name (replacing them
with a '-' char to indicate that some chars are missing) to use all 19
bytes, preserving the unit number typically on the end of the name. Using
the prior example, this results in: 'gic0,s54:-6_anatop0' which provides
plenty of info to figure out which device is involved.

PR:		211946
Reviewed by:	gonzo@ (prior version without the '-' char)
Differential Revision:	https://reviews.freebsd.org/D19675
2019-03-24 17:53:26 +00:00
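The truncation rule described in 3ca1299470 can be sketched as below. This is a userland illustration, not the kernel's interrupt naming code. `format_intr_name` and `INTR_NAME_CHARS` are names invented for the example, and the interrupt name is assumed to already carry its trailing colon.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define INTR_NAME_CHARS	19	/* visible characters, excluding the NUL */

/*
 * Hypothetical helper: build "irq handler" if it fits in 19 characters.
 * Otherwise keep the irq name, add '-', and keep as many trailing
 * characters of the handler name as still fit.
 */
static void
format_intr_name(char out[INTR_NAME_CHARS + 1], const char *irq,
    const char *handler)
{
	size_t ilen, hlen, room;

	assert(out != NULL && irq != NULL && handler != NULL);
	ilen = strlen(irq);
	hlen = strlen(handler);

	if (ilen + 1 + hlen <= INTR_NAME_CHARS) {
		snprintf(out, INTR_NAME_CHARS + 1, "%s %s", irq, handler);
		return;
	}
	if (ilen + 1 >= INTR_NAME_CHARS) {
		/* No room for any handler characters: irq name only. */
		snprintf(out, INTR_NAME_CHARS + 1, "%s", irq);
		return;
	}
	/* Characters left for the handler after the irq name and '-'. */
	room = INTR_NAME_CHARS - ilen - 1;
	snprintf(out, INTR_NAME_CHARS + 1, "%s-%s", irq,
	    handler + (hlen - room));
}
```

Called with "gic0,s54:" and "imx6_anatop0", the sketch yields the same string as the example in the commit message, 'gic0,s54:-6_anatop0'.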
dchagin
7dbe184dfc Whitespace cleanup (annoying).
MFC after:	1 month
2019-03-24 15:08:30 +00:00
dchagin
502aa03a0e Regen from r345471.
MFC after:	1 month
2019-03-24 14:51:17 +00:00
dchagin
82329819bc Update syscall.master to 5.0.
For 32-bit Linuxulator, ipc() syscall was historically
the entry point for the IPC API. Starting in Linux 4.18, direct
syscalls are provided for the IPC. Enable it.

MFC after:	1 month
2019-03-24 14:50:02 +00:00
dchagin
ed0d39917e Regen for r345469 (shmat()).
MFC after:	1 month
2019-03-24 14:46:07 +00:00
dchagin
70fa6829e2 Linux between 4.18 and 5.0 split IPC system calls.
In preparation for doing this in the Linuxulator modify our linux_shmat()
to match actual Linux shmat() system call.

MFC after:	1 month
2019-03-24 14:44:35 +00:00
dchagin
12f579b5ef Revert r313993.
AMD64_SET_**BASE expects a pointer to a pointer; we were just passing in the pointer value itself.

Set PCB_FULL_IRET for doreti to restore %fs, %gs and their corresponding bases.

PR:		225105
Reported by:	trasz@
MFC after:	1 month
2019-03-24 14:02:57 +00:00
tuexen
1ff39c37aa Fix build issue for the userland stack.
Joint work with rrs@.

MFC after:		1 week
2019-03-24 12:13:05 +00:00
tuexen
aa72882b7f Fix more signed/unsigned issues. This time on the send path.
This is joint work with rrs@ and was found by running syzkaller.

MFC after:		1 week
2019-03-24 10:40:20 +00:00
tuexen
ff6cd9e93e Fix a signed/unsigned bug when receiving SCTP messages.
This is joint work with rrs@.

Reported by:		syzbot+6b8a4bc8cc828e9d9790@syzkaller.appspotmail.com
MFC after:		1 week
2019-03-24 09:46:16 +00:00
allanjude
480a8566d1 Fix AMD type flash write operations, and display chip information at boot
Applies to MX flash chips on AR9132 and RT3050

Submitted by:	Hiroki Mori <yamori813@yahoo.co.jp>
Reviewed by:	imp, sbruno
Differential Revision:	https://reviews.freebsd.org/D14279
2019-03-24 06:28:25 +00:00
tuexen
46b4806255 Limit the size of messages sent on 1-to-many style SCTP sockets with the
SCTP_SENDALL flag. Also allow only one operation per SCTP endpoint.

This fixes an issue found by running syzkaller and is joint work with rrs@.

MFC after:		1 week
2019-03-23 22:56:03 +00:00
tuexen
5e3a245f1b Limit the number of bytes which can be queued for SCTP sockets.
This is joint work with rrs@.
Reported by:		syzbot+307f167f9bc214f095bc@syzkaller.appspotmail.com
MFC after:		1 week
2019-03-23 22:46:29 +00:00
tuexen
f674536274 Add sysctl variable net.inet.tcp.rexmit_initial for setting RTO.Initial
used by TCP.

Reviewed by:		rrs@, 0mp@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19355
2019-03-23 21:36:59 +00:00
rpokala
f817f49efa Add descriptions for sysctls in kern_mib.c and sysctl.3 which lack them.
r343532 noted the difference between "hw.realmem" and "hw.physmem", which I
was previously unaware of. I discovered that neither sysctl had a
description visible via `sysctl -d', so I found where they were defined and
added suitable descriptions. While in the file, I went ahead and added
descriptions for all the others which lacked them. I also updated sysctl.3
accordingly

Reviewed by:	kib, bcr
MFC after:	1 week
Sponsored by:	Panasas
Differential Revision:	https://reviews.freebsd.org/D19007
2019-03-23 19:53:15 +00:00
imp
beefa83d5c Remove duplicate options.
2019-03-23 18:32:28 +00:00
imp
b8109b4194 Add device xz. This was somehow missed in the last round.
Submitted by: Brandon Bergren
2019-03-23 18:32:24 +00:00
kib
b37c7d4a72 ASLR: check for max_addr after applying randomization, not before.
Otherwise the resulting address from vm_map_find() might not satisfy the
upper limit.  For instance, it could affect MAP_32BIT flag from 64bit
processes.

Found by:	Doug Moore <dougm@rice.edu>
Reviewed by:	alc, Doug Moore <dougm@rice.edu>
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19688
2019-03-23 16:36:18 +00:00
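The ordering issue fixed in b37c7d4a72 can be illustrated with a small sketch. It does not reproduce vm_map_find(). The helper names, the fixed random offset, and the 2 GB limit standing in for a MAP_32BIT-style bound are all assumptions for the example, and address overflow is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if [addr, addr + len) ends at or below max_addr. */
static bool
fits_below(uint64_t addr, uint64_t len, uint64_t max_addr)
{
	return (len <= max_addr && addr <= max_addr - len);
}

/*
 * Check first, randomize later: the offset is never validated, so the
 * final address may exceed max_addr even though the check passed.
 */
static bool
check_before_randomization(uint64_t base, uint64_t len, uint64_t max_addr,
    uint64_t rnd_off)
{
	(void)rnd_off;
	return (fits_below(base, len, max_addr));
}

/* Randomize first, then check the address that will actually be used. */
static bool
check_after_randomization(uint64_t base, uint64_t len, uint64_t max_addr,
    uint64_t rnd_off)
{
	assert(base <= UINT64_MAX - rnd_off);	/* sketch ignores wraparound */
	return (fits_below(base + rnd_off, len, max_addr));
}
```

With a base just under the limit, the early check accepts a mapping that the randomized address pushes past the limit, while the late check rejects it.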
trasz
0e6bf0c478 Remove trunc_page_ps() and round_page_ps() macros. This completes
the undoing of r100384.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19680
2019-03-23 13:41:14 +00:00
tuexen
202ab2ae5b Fix a KASSERT() in tcp_output().
When checking the length of the headers at this point, the IP level
options have not been added to the mbuf chain.
So don't take them into account.

Reported by:		syzbot+16025fff7ee5f7c5957b@syzkaller.appspotmail.com
Reported by:		syzbot+adb5836b8a9ff621b2aa@syzkaller.appspotmail.com
Reported by:		syzbot+d25a5352bcdf40acdbb8@syzkaller.appspotmail.com
Reviewed by:		rrs@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
2019-03-23 09:56:41 +00:00
mw
6305041503 Allow using TPM as entropy source.
TPM has a built-in RNG, with its own entropy source.
The driver was extended to harvest 16 random bytes from TPM every 10 seconds.
A new build option "TPM_HARVEST" was introduced - for now, however, it
is not enabled by default in the GENERIC config.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: markm, delphij
Approved by: secteam
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19620
2019-03-23 05:13:51 +00:00
jhibbits
85812893b0 powernv: Add Hypervisor Maintenance Interrupt handler
Attempting to build www/firefox on POWER9 resulted in a HMI exception being
thrown, a fatal trap currently.  This is typically caused by timer facility
errors, but examination of the Hypervisor Maintenance Exception Register
(HMER) yielded only that an exception had recovered, with no information of
the actual exception cause.

When an HMI occurs, OPAL_HANDLE_HMI or OPAL_HANDLE_HMI2 must be called to
handle the exception at the firmware level.  If the exception is handled, we
can continue.

This adds only the preliminary handler, enough to prevent package building
from panicking.  An enhancement in the future is to use the flags returned
by OPAL_HANDLE_HMI2 to print more useful error messages, and log maintenance
events.

Reviewed by:	luporl
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19634
2019-03-23 03:23:20 +00:00
mw
d8bc0028e7 Enable etherswitchcfg and e6000sw driver in arm64 build
After latest binding update, this patch enables usage of
the switch on Armada 3720 EspressoBin, so compile it
by default with arm64 GENERIC.

A patch was extracted from https://reviews.freebsd.org/D19036

Submitted by: Bert JW Regeer <xistence@0x58.com>
Reviewed by: manu
2019-03-23 02:53:47 +00:00
mw
0e7675604a Update mvneta/e6000sw for new DSA Device Tree Bindings
In the latest Linux kernel revisions the DSA (Distributed
Switch Architecture) device tree binding was changed.
Instead of the top level dsa@ node, the switch and its
ports are represented as a child node of the mdio bus.
With that other modifications were added, such as
relation with the ethernet port of the SoC. Adjust
e6000sw etherswitch and mvneta drivers to that.

Tested on Armada 3720 EspressoBin and Armada 388 Clearfog Pro boards.

Submitted by: Bert JW Regeer <xistence@0x58.com>
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D19036
2019-03-23 02:48:47 +00:00
jhibbits
dd454ce0a7 powerpc: Re-merge isa3 HPT with moea64 native HPT
r345402 fixed the bug that led to the split of the ISA 3.0 HPT handling from
the existing manager.  The cause of the bug was gcc moving the register
holding VPN to a different register (not r0), which triggered bizarre
behaviors.  With the fix, things work, so they can be re-merged.  No
performance lost with the merge.
2019-03-22 22:14:14 +00:00
sobomax
a400eb5102 Make it possible to update TMPFS mount point from read-only to read-write
and vice versa.

Reviewed by:	delphij
Approved by:	delphij
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19682
2019-03-22 21:31:21 +00:00
avg
c9511b5db6 Revert r345410, VOP_FSYNC change in ZFS vdev_file
I overlooked the fact that that VOP_FSYNC() call is not a FreeBSD VFS
call, but a macro that provides an illumos-compatible wrapper for the
FreeBSD operation.

PR:		236475
Reported by:	lwhsu
Pointyhat to:	avg
2019-03-22 17:44:47 +00:00
avg
75ee4f08d3 intpm: change translation of HBA error status to smbus(4) errors
PIIX4_SMBHSTSTAT_ERR can be set for several reasons that, unfortunately,
cannot be distinguished, but the most typical case is a missing or hung
slave (SMB_ENOACK).

PIIX4_SMBHSTSTAT_FAIL means failed or killed / aborted transaction, so
its previous mapping to SMB_ENOACK was not ideal.

After this change an smb(4) access to a missing slave results in ENXIO
rather than EIO.  To me, that seems to be more appropriate.

MFC after:	3 weeks
2019-03-22 10:38:22 +00:00
avg
f359ad5c3a ZFS vdev_file: use correct value for waitfor parameter of VOP_FSYNC
PR:		236475
Reported by:	asomers
MFC after:	2 weeks
2019-03-22 09:11:45 +00:00
cperciva
be4e05cbf3 Add nvme support to the arm64 GENERIC kernel.
Submitted by:	Greg V
Differential Revision:	https://reviews.freebsd.org/D19657
2019-03-22 06:36:40 +00:00
cperciva
11a7cd1158 Build if_ena.ko on arm64.
This module provides support for the Amazon Elastic Network Adapter; it
was previously only built on x86 architectures, but Amazon EC2 now also
has ARM64 instances with this hardware.

Submitted by:	Greg V
2019-03-22 06:33:26 +00:00
cperciva
3fdef584c0 Initialize uart_bus_space_mem.
This value was being used uninitialized, resulting in predictable issues
on systems with memory-mapped UART registers.

A case could be made that memmap_bus should be declared in a header
rather than being declared in each .c file which needs to refer to it,
but that's a broader style question.

This commit unbreaks hw.uart.console="mm:..." on ARM64.

Submitted by:	Greg V
2019-03-22 06:28:37 +00:00
cperciva
b0d688847a Obey SPCR AccessWidth parameter.
The "access width" value was hard-coded as 2, indicating 32-bit accesses;
instead, use the value specified in the SPCR table.

This unbreaks the console on EC2 "A1" family instances.

Submitted by:	Greg V
2019-03-22 06:21:03 +00:00
jhibbits
e0266e5c8c powerpc64: Handle the modern (2.05+) implementation of tlbie
By happenstance gcc4 puts 'vpn' into r0 in all uses of TLBIE(), but modern
gcc does not.  Also, the single-argument form of tlbie zeros all unused
arguments, making the modern tlbie instruction use r0 as the RS field
(LPID).

The vpn argument has the bottom 12 bits cleared (the input having been
left-shifted by 12 bits), which just so happens, on the POWER9 and previous
incarnations, to be the number of LPID bits supported.  With those bits
being zero, the instruction:

	tlbie r0, r0

will invalidate the VPN in r0, in LPAR 0 (ignoring the upper bits of r0 for
the RS field).  One build with gcc8 yields:

	tlbie r9, r0

with r0 having arbitrary contents, not equal to r9.  This leads to strange
crashes, behaviors, and panics, due to the requested TLB entry not actually
being invalidated.

As the moea64_native must work on both old and new, we explicitly zero out
r0 so that it can work with only the single argument, built with base gcc
and modern gcc.  isa3_hashtb takes a different approach, encoding the
two-argument form, so as not to explicitly clobber r0, and instead let the
compiler decide.

Reported by:	Brandon Bergren
Tested by:	Brandon Bergren
MFC after:	1 week
2019-03-22 01:43:31 +00:00
trasz
502b34b987 Fix smartpqi(4) malloc tag and description to match the driver name.
No functional changes.

Reviewed by:	sbruno
MFC after:	2 weeks
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D19625
2019-03-21 21:14:25 +00:00
markj
1cd6073f58 Use an explicit comparison with VM_GUEST_NO.
Reported by:	jhb
MFC with:	r345359
Sponsored by:	The FreeBSD Foundation
2019-03-21 20:07:50 +00:00
markj
1ab80ddad8 Disallow preemptive creation of wired superpage mappings.
There are some unusual cases where a process may cause an mlock()ed
range of memory to be unmapped.  If the application subsequently
faults on that region, the handler may attempt to create a superpage
mapping backed by the resident, wired pages.  However, the pmap code
responsible for creating such a mapping (pmap_enter_pde() on i386
and amd64) does not ensure that a leaf page table page is available
if the superpage is later demoted; the demotion operation must therefore
perform a non-blocking page allocation and must unmap the entire
superpage if the allocation fails.  The pmap layer ensures that this
can never happen for wired mappings, and so the case described above
breaks that invariant.

For now, simply ensure that the MI fault handler never attempts to
create a wired superpage except via promotion.

Reviewed by:	kib
Reported by:	syzbot+292d3b0416c27c131505@syzkaller.appspotmail.com
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19670
2019-03-21 19:52:50 +00:00
glebius
41a70f9371 Always create ipfw(4) hooks as long as module is loaded.
Now enabling ipfw(4) with sysctls controls only linkage of hooks to default
heads. When module is loaded fetch sysctls as tunables, to make it possible
to boot with ipfw(4) in kernel, but not linked to any pfil(9) hooks.
2019-03-21 16:15:29 +00:00
kib
e9037b6394 nullfs: fix unmounts when filesystem is active.
If vflush() did not completely flush the mount vnodes queue, either
retry for forced unmounts, or give up for non-forced.  This situation
can occur when new vnodes are instantiated while vflush() worked.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-03-21 13:30:48 +00:00
mw
6bc222605a Add bus_release_resource() method to nexus on arm64
The nexus module was missing method for releasing bus resources. As a
result, it couldn't be released and the bus_release_resource() call would
return ENXIO.

Next call to bus_alloc_resource() for the same resource was returning
error, because it wasn't released previously and it was still busy.

The implementation of the nexus_release_resource() is the same as for
arm architecture.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Reported-by:   Greg V <greg@unrelenting.technology>
Tested-by:     cperciva, Greg V <greg@unrelenting.technology>
Obtained from: Semihalf
MFC after:     2 weeks
Sponsored by:  Amazon, Inc.
Differential revision: https://reviews.freebsd.org/D19641
2019-03-21 10:51:36 +00:00
bz
1ca6c95b99 Whitespace cleanup in sdhci.c
No functional changes.  Replace whitespace by tabs, indent with 4 spaces,
and coalesce multi-line statements shorter than 80 characters.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-03-21 10:50:36 +00:00
mw
6118447bb2 Prevent double activation of admin interrupt in ENA
The resource is already being activated in the bus_alloc_resource(),
because the flag RF_ACTIVE is being passed.

Double activation on arm64 is causing kernel panic.

Version of the driver was upgraded to 0.8.4.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Reported-by:   Greg V <greg@unrelenting.technology>
Tested-by:     cperciva, Greg V <greg@unrelenting.technology>
Obtained from: Semihalf
MFC after:     2 weeks
Sponsored by:  Amazon, Inc.
Differential revision: https://reviews.freebsd.org/D19655
2019-03-21 10:46:10 +00:00
bz
b5e8e61ac3 Align struct sdhci_slot MMCCAM members.
Whitespace only, no functional change.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-03-21 10:23:02 +00:00
cperciva
d06aee4336 Recognize the Amazon PCI serial device found in a1.* EC2 instances
as an NS8250 UART.

This is the same as the UART found in EC2 "bare metal" instances,
except that the card vendor shows up as 0x0000 rather than 0x1d0f.
This seems like a bug in the EC2 firmware; but we might as well support
it anyway.

Reported by:	Greg V
2019-03-21 08:54:34 +00:00
kp
629163522d pf: Ensure that IP addresses match in ICMP error packets
States in pf(4) let ICMP and ICMP6 packets pass if they have a
packet in their payload that matches an existing connection.  It was
not checked whether the outer ICMP packet has the same destination
IP as the source IP of the inner protocol packet.  Enforce that
these addresses match, to prevent ICMP packets that do not make
sense.

Reported by:	Nicolas Collignon, Corentin Bayet, Eloi Vanderbeken, Luca Moro at Synacktiv
Obtained from:	OpenBSD
Security:	CVE-2019-5598
2019-03-21 08:09:52 +00:00
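The rule enforced by 629163522d above can be written down as a one-line predicate. The sketch below is a hypothetical model (struct toy_icmp_err and toy_icmp_err_plausible() are invented names, and addresses are plain IPv4 values); it is not the pf(4) state-matching code. It relies on the fact that an ICMP error is sent back to the source of the packet it quotes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of an ICMP error and the packet quoted inside it. */
struct toy_icmp_err {
	uint32_t outer_src;	/* sender of the ICMP error */
	uint32_t outer_dst;	/* receiver of the ICMP error */
	uint32_t inner_src;	/* source of the quoted packet */
	uint32_t inner_dst;	/* destination of the quoted packet */
};

/*
 * An ICMP error goes back to whoever sent the offending packet, so the
 * outer destination must equal the inner source.
 */
bool
toy_icmp_err_plausible(const struct toy_icmp_err *e)
{
	return (e->outer_dst == e->inner_src);
}
```

A packet that quotes a valid connection but is addressed to some other host fails this check even though its payload would match a state entry.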
markj
52ae896ad7 Don't attempt to measure TSC skew when running as a VM guest.
It simply doesn't work in general since VCPUs may migrate between
physical cores.  The approach used to measure skew also doesn't
make much sense in a VM.

PR:		218452
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2019-03-21 02:52:22 +00:00
mckusick
1e2cc9b200 This is an additional and hopefully final fix for bug report 230962.
This bug was introduced with the change to use softdep_bp_to_mp()
in January 2018 changes -r327723 and -r327821. The softdep_bp_to_mp()
function failed to include VSOCK as one of the valid cases.

Although local-domain sockets do not allocate blocks in the filesystem,
they will allocate blocks if they use extended attributes (such as
ACLs). Thus, softdep_bp_to_mp() needs to return a non-NULL mount
pointer when presented with a socket vnode so that the soft updates
write complete will properly process the soft updates structures
associated with the extended attribute blocks. It was the failure
to process these soft updates structures, thus leaving them hanging
off the buffer, which led to the "panic: softdep_deallocate_dependencies:
dangling deps" when trying to clean up the buffer after it was written.

PR:           230962
Reported by:  2t8mr7kx9f@protonmail.com
Reviewed by:  kib
Tested by:    Peter Holm
MFC after:    1 week
Sponsored by: Netflix
2019-03-20 23:11:05 +00:00
bdrewery
fb47f2a648 Build common kernel dependencies before modules.
This ensures files like genassym.o and awk/mfiles are generated before
descending into the modules build.  It may also allow some module builds
to not recreate files that are already present in the KERNBUILDDIR.

This fixes a rare build race where genassym.o is missing and assym.inc
is empty.

More work is planned around this to reduce some redundant dependency
generation in modules.

PR:		233339
MFC after:	2 weeks
Reported by:	markj
2019-03-20 22:49:41 +00:00
asomers
1215d8a08b Rename fuse(4) to fusefs(4)
This makes it more consistent with other filesystems, which all end in "fs",
and more consistent with its mount helper, which is already named
"mount_fusefs".

Reviewed by:	cem, rgrimes
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19649
2019-03-20 21:48:43 +00:00
markj
71b2141ae6 Use -fdebug-prefix-map to map auto-generated kernel build paths.
The kernel build uses symlinks to make MD #includes like <machine/pcpu.h>
work.  Debug info ends up referencing these symlinks in a relative path,
so debuggers generally don't know how to find the corresponding headers.
Address this by using -fdebug-prefix-map to map relative paths through
the symlinks to their absolute paths in the source tree.  This is
consistent with how regular source file paths are defined in the
kernel's debug info.

Also map the current directory to an absolute path to the object
directory.  This gives debuggers a chance to find auto-generated files
like vnode_if.c if the object directory is available.

Reviewed by:	emaste, jhb (previous version)
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19633
2019-03-20 20:42:44 +00:00
np
7154ff277a cxgbe(4): Treat the viid as an opaque identifier.
Recent firmwares prefer to use a different format for viid internally
and this change allows them to do so.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2019-03-20 17:27:11 +00:00
mav
da08500f86 Add some Cannon Lake chipset IDs.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2019-03-20 17:27:00 +00:00
mav
2cf541e0f7 Tune chipset naming.
Submitted by:	Dmitry Luhtionov <dmitryluhtionov@gmail.com>
MFC after:	2 weeks
2019-03-20 17:21:17 +00:00
kib
609c32a75e vm_fault_copy_entry: accept invalid source pages.
Either msync(MS_INVALIDATE) or the object unlock during vnode
truncation can expose invalid pages backing wired entries.  Accept
them, but do not install them into destination pmap.  We must create
copied pages in the copy case, because e.g. vm_object_unwire() expects
that the entry is fully backed.

Reported by:	syzkaller, via emaste
Reported by:	syzbot+514d40ce757a3f8b15bc@syzkaller.appspotmail.com
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19615
2019-03-20 13:07:57 +00:00
ae
0e59726896 Do not enter epoch section recursively.
A pfil hook is already invoked in NET_EPOCH section.
2019-03-20 10:11:21 +00:00
ae
19a685d26f Use NET_EPOCH instead of allocating separate one.
MFC after:	1 month
2019-03-20 10:06:44 +00:00
erj
4f4c322dc5 iflib: mark isc_driver_version as constant
From Jake:
The iflib core never modifies the isc_driver_version string. Allow
drivers to safely assign pointers to constant buffers by marking this
parameter const.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, jhb@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19577
2019-03-19 23:44:26 +00:00
imp
90d8cba860 Fix two typos: an -> and; the the -> the
And justify the paragraph after the change (and set fill column to 80
instead of 70).

Noticed by: rpokala@, vangyzen@
2019-03-19 21:46:21 +00:00
np
236daeb39c iw_cxgbe: Remove unused smac_idx from the ep structure.
Submitted by:	Krishnamraju Eraparaju @ Chelsio
2019-03-19 19:11:44 +00:00
erj
fbab47379a ixv(4): Add missing IFLIB_IS_VF flag in iflib shared ctx
From Krzysztof:
The driver built as KLD cannot be unloaded, if this flag is not set.

Submitted by:	Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by:	shurd@, erj@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19402
2019-03-19 18:07:44 +00:00
erj
66cfcbc300 iflib: expose the Rx mbuf buffer size to drivers
From Jake:
iflib_fl_setup calculates a suitable buffer size for the Rx mbufs based
on the isc_max_frame_size value that drivers setup. This calculation is
repeated by drivers when programming their hardware with the size of
each Rx buffer.

This can lead to a mismatch where the iflib mbuf size is different from
the expected size of the buffer as programmed by the hardware. This can
lead to unexpected results.

If iflib ever wants to support mbuf sizes larger than one page, every
driver must be updated to account for the new possible buffer sizes.

Fix this by calculating the mbuf size prior to calling IFDI_INIT, and
adding the iflib_get_rx_mbuf_sz function which will expose this value to
drivers, so that they do not repeat the same calculation.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, erj@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19489
2019-03-19 17:59:56 +00:00
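The idea behind 66cfcbc300 above (compute the Rx buffer size once and let drivers read it back) might look roughly like the following. Everything here is an assumption made for illustration: struct toy_ctx, the toy_* function names, and the two cluster sizes (2048 and 4096 bytes) are hypothetical stand-ins, and the selection rule is a guess at the kind of calculation involved rather than a copy of iflib_fl_setup.

```c
#include <stdint.h>

/* Assumed cluster sizes for this model only. */
#define TOY_MCLBYTES		2048u
#define TOY_MJUMPAGESIZE	4096u

/* Hypothetical per-interface context. */
struct toy_ctx {
	uint32_t max_frame_size;	/* set by the driver */
	uint32_t rx_mbuf_sz;		/* computed once by the framework */
};

/* Framework side: choose the Rx buffer size before the driver's init runs. */
void
toy_calc_rx_mbuf_sz(struct toy_ctx *ctx)
{
	if (ctx->max_frame_size <= TOY_MCLBYTES)
		ctx->rx_mbuf_sz = TOY_MCLBYTES;
	else
		ctx->rx_mbuf_sz = TOY_MJUMPAGESIZE;
}

/* Driver side: read the value back instead of repeating the calculation. */
uint32_t
toy_get_rx_mbuf_sz(const struct toy_ctx *ctx)
{
	return (ctx->rx_mbuf_sz);
}
```

Because the driver programs its hardware from the getter, a later change to the selection rule cannot leave the two sides disagreeing about the buffer size.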
erj
ef226b1dc6 iflib: prevent possible infinite loop in iflib_encap
From Jake:
iflib_encap calls bus_dmamap_load_mbuf_sg. Upon it returning EFBIG, an
m_collapse and an m_defrag are attempted to shrink the mbuf cluster to
fit within the DMA segment limitations.

However, if we call m_defrag, and then bus_dmamap_load_mbuf_sg returns
EFBIG on the now defragmented mbuf, we will continuously re-call
bus_dmamap_load_mbuf_sg over and over.

This happens because m_head isn't NULL, and remap is >1, so we don't try
to m_collapse or m_defrag again. The only way we exit the loop is if
m_head is NULL. However, m_head can't be modified by the call to
bus_dmamap_load_mbuf_sg, because we don't pass it as a double pointer.

I believe this will be an incredibly rare occurrence, because it is
unlikely that bus_dmamap_load_mbuf_sg will actually fail on the second
defragment with an EFBIG error. However, it still seems like
a possibility that we should account for.

Fix the exit check to ensure that if remap is >1, we will also exit,
even if m_head is not NULL.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19468
2019-03-19 17:49:03 +00:00
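The loop described in ef226b1dc6 above can be modeled in userland to show why the extra exit check matters. This is a simplified sketch, not the iflib_encap code: toy_encap(), toy_load_fn and toy_always_efbig() are hypothetical, the collapse and defrag steps are reduced to a counter, and the loader stands in for bus_dmamap_load_mbuf_sg.

```c
#include <errno.h>

/* Hypothetical stand-in for the DMA load call. */
typedef int (*toy_load_fn)(void *arg);

/* A loader that never fits, counting how often it is called. */
int
toy_always_efbig(void *arg)
{
	(*(int *)arg)++;
	return (EFBIG);
}

/*
 * Toy retry loop: on EFBIG try one "collapse" (remap 0 -> 1) and one
 * "defrag" (remap 1 -> 2).  Once both have been tried, give up instead
 * of calling the loader again forever.
 */
int
toy_encap(toy_load_fn load, void *arg)
{
	int err, remap = 0;

	for (;;) {
		err = load(arg);
		if (err != EFBIG)
			return (err);
		if (remap > 1)
			return (EFBIG);	/* both strategies exhausted */
		remap++;		/* collapse or defrag would happen here */
	}
}
```

With a loader that always reports EFBIG, the model makes exactly three load attempts (initial, after collapse, after defrag) and then returns the error. Without the remap > 1 check the loop would never terminate.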
mmel
2758d1a1b9 PSCI: Don't take missing implementation of psci get_version() as fatal.
Minimalistic PSCI implementation in U-Boot doesn't implement get_version()
method for some SoC. In this case, use PSCI version declared by 'psci' node
in DT as fallback.

MFC after:	2 weeks
2019-03-19 15:42:11 +00:00
imp
148c0d4211 Add comment about why we bother to use endian macros here, and why we
must use bitfields.
2019-03-19 15:03:20 +00:00
mmel
9485b14635 Improve cpufreq_dt.
 - older DT can use 'cpu0-supply' property for power supply binding.
 - don't expect that actual CPU frequency is contained in CPU
   operational point table, but read current CPU voltage directly from
   regulator. Typically, u-boot can set starting CPU frequency to any
   value.

MFC after:	2 weeks
2019-03-19 14:34:53 +00:00
mmel
5fb90cdf7d Use named field's initializer when constructing <foo>_platform structure.
In the current code, the delay argument in FDT_PLATFORM_DEF(2) improperly
initializes the refs field of the kobj_class structure instead of the
delay_count field.
This leaves DELAY() non-functional (delay_count is never initialized) in
early boot stages, until the first timer is attached.

MFC after:	2 weeks
2019-03-19 14:32:54 +00:00
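The class of bug fixed by 5fb90cdf7d above is easy to reproduce in plain C. The structure below is a hypothetical simplification (struct toy_platform is not the real <foo>_platform layout); it only shows how a positional initializer silently lands in whichever field comes next, while a named initializer does not.

```c
/* Hypothetical layout: a bookkeeping field sits before the one we want. */
struct toy_platform {
	const char	*name;
	unsigned int	 refs;		/* internal reference count */
	int		 delay_count;	/* loop count used by a delay routine */
};

/* Positional: the second value initializes refs, not delay_count. */
const struct toy_platform toy_positional = { "toy", 100 };

/* Named: the value goes where it was meant to go; refs stays zero. */
const struct toy_platform toy_named = {
	.name = "toy",
	.delay_count = 100,
};
```

In the positional form delay_count is left at zero, which matches the symptom in the commit message: a delay routine that depends on it does nothing useful until something else sets the field.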
mmel
c9cb8e3f9b extres: Unify error codes for <foo>_get_by_ofw_property() methods.
Return:
 - ENOENT if requested property doesn't exist
 - ENODEV if producer device is not (yet) attached
 - ENXIO otherwise

MFC after:	2 weeks
2019-03-19 14:30:54 +00:00
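The error convention listed in c9cb8e3f9b above can be sketched as a small decision function. The function name, its boolean parameters and the order in which the conditions are tested are assumptions for illustration; only the three error codes and their meanings come from the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of a <foo>_get_by_ofw_property() style lookup:
 *   ENOENT  the requested property does not exist
 *   ENODEV  the producer device is not (yet) attached
 *   ENXIO   any other failure
 */
int
toy_get_by_ofw_property(bool prop_exists, bool producer_attached,
    bool lookup_ok)
{
	if (!prop_exists)
		return (ENOENT);
	if (!producer_attached)
		return (ENODEV);
	if (!lookup_ok)
		return (ENXIO);
	return (0);
}
```

A caller can then treat ENOENT as "optional resource absent" and ENODEV as "try again later", without guessing what a generic error means.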
ae
d763427450 Reapply r345274 with build fixes for 32-bit architectures.
Update NAT64LSN implementation:

  o most of data structures and relations were modified to be able to support
    large number of translation states. Now each supported protocol can
    use full ports range. Ports groups now belong to IPv4 alias
    addresses, not hosts. Each ports group can keep several states chunks.
    This is controlled with new `states_chunks` config option. States
    chunks allow to have several translation states for single alias address
    and port, but for different destination addresses.
  o by default all hash tables now use jenkins hash.
  o ConcurrencyKit and epoch(9) are used to make NAT64LSN lockless on fast path.
  o one NAT64LSN instance now can be used to handle several IPv6 prefixes,
    special prefix "::" value should be used for this purpose when instance
    is created.
  o due to modified internal data structures relations, the socket opcode
    that does states listing was changed.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2019-03-19 10:57:03 +00:00
ae
670da7d638 Convert allocation of bpf_if in bpfattach2 from M_NOWAIT to M_WAITOK
and remove possible panic condition.

It is already allowed to sleep in bpfattach[2], since BPF_LOCK was
converted to SX lock in r332388. Also move KASSERT() to the top of
function and make full initialization before bpf_if will be linked
to BPF's list of interfaces.

MFC after:	2 weeks
2019-03-19 10:29:32 +00:00
mw
a61dc2f9d7 Prevent loading SGX with incorrect EPC data
It may happen on some machines, that even if SGX is disabled
in firmware, the driver would still attach despite EPC base and
size equal zero. Such behaviour causes a kernel panic when the
module is unloaded. Add a simple check to make sure we
only attach when these values are correctly set.

Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: br
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D19595
2019-03-19 02:33:58 +00:00
adrian
7200c5b15d [ath_hal_ar9300] Add some comments around the AR9300 ANI code.
I'm refamiliarising myself with the behaviour of the ANI code and I thought
I'd drop some comments to remind myself.
2019-03-19 00:07:12 +00:00
emaste
c6b6ed2e54 sys/stat.h: Improve timespec compatibility with other BSDs
OpenBSD and NetBSD provide macros to directly reference the underlying
struct timespec's tv_nsec member.  While FreeBSD has such macros for
tv_sec, the others are missing.  Add the following macros:

st->st_atimensec
st->st_mtimensec
st->st_ctimensec
st->st_birthtimensec

Adding these fields will provide programs which reference them better
portability to FreeBSD.  An example of such a program is makefs(8),
which has unused support for subseconds that it has inherited from
NetBSD.

Submitted by:	Mitchell Horne <mhorne063@gmail.com>
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D19626
2019-03-18 19:23:19 +00:00
ae
e171491f01 Revert r345274. It appears that not all 32-bit architectures have
necessary CK primitives.
2019-03-18 14:00:19 +00:00
ae
f13ac20eb6 Update NAT64LSN implementation:
o most of data structures and relations were modified to be able to support
  large number of translation states. Now each supported protocol can
  use full ports range. Ports groups now belong to IPv4 alias
  addresses, not hosts. Each ports group can keep several states chunks.
  This is controlled with new `states_chunks` config option. States
  chunks allow to have several translation states for single alias address
  and port, but for different destination addresses.
o by default all hash tables now use jenkins hash.
o ConcurrencyKit and epoch(9) are used to make NAT64LSN lockless on fast path.
o one NAT64LSN instance now can be used to handle several IPv6 prefixes,
  special prefix "::" value should be used for this purpose when instance
  is created.
o due to modified internal data structures relations, the socket opcode
  that does states listing was changed.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2019-03-18 12:59:08 +00:00
gallatin
b9125f9ffc Fix a typo introduced in r344133
The line was misedited to change tt to st instead of
changing ut to st.

The use of st as the denominator in mul64_by_fraction() will lead
to an integer divide fault in the intr proc (the process holding
ithreads) where st will be 0.  This divide by 0 happens after
the total runtime for all ithreads exceeds 76 hours.

Submitted by: bde
2019-03-18 12:41:42 +00:00
vmaffione
c056f1d6c1 netmap: add support for multiple host rings
Some applications forward from/to host rings most or all the
traffic received or sent on a physical interface. In these
cases it is desirable to have more than one pair of RX/TX host
rings, and use multiple threads to speed up forwarding.
This change adds support for multiple host rings. On registering
a netmap port, the user can specify the number of desired receive
and transmit host rings in the nr_host_tx_rings and nr_host_rx_rings
fields of the nmreq_register structure.

MFC after:	2 weeks
2019-03-18 12:22:23 +00:00
ae
93a7173b74 Add NAT64 CLAT implementation as defined in RFC6877.
CLAT is customer-side translator that algorithmically translates 1:1
private IPv4 addresses to global IPv6 addresses, and vice versa.
It is implemented as part of ipfw_nat64 kernel module. When module
is loaded or compiled into the kernel, it registers "nat64clat" external
action. External action named instance can be created using `create`
command and then used in ipfw rules. The create command accepts two
IPv6 prefixes `plat_prefix` and `clat_prefix`. If plat_prefix is omitted,
IPv6 NAT64 Well-Known prefix 64:ff9b::/96 will be used.

  # ipfw nat64clat CLAT create clat_prefix SRC_PFX plat_prefix DST_PFX
  # ipfw add nat64clat CLAT ip4 from IPv4_PFX to any out
  # ipfw add nat64clat CLAT ip6 from DST_PFX to SRC_PFX in

Obtained from:	Yandex LLC
Submitted by:	Boris N. Lytochkin
MFC after:	1 month
Relnotes:	yes
Sponsored by:	Yandex LLC
2019-03-18 11:44:53 +00:00
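The algorithmic 1:1 translation mentioned in 93a7173b74 above can be illustrated for the simplest case, a /96 prefix such as the Well-Known Prefix 64:ff9b::/96, where the IPv4 address occupies the last 32 bits of the IPv6 address. The helpers below are hypothetical (toy_embed_ip4_96() and toy_extract_ip4_96() are not the kernel's nat64 functions) and handle only the /96 layout; other prefix lengths place the IPv4 bits differently.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build an IPv6 address from a /96 prefix and an
 * IPv4 address given in host byte order.
 */
void
toy_embed_ip4_96(const uint8_t prefix[16], uint32_t ip4, uint8_t out[16])
{
	memcpy(out, prefix, 12);		/* first 96 bits: the prefix */
	out[12] = (uint8_t)(ip4 >> 24);		/* last 32 bits: the IPv4 address */
	out[13] = (uint8_t)(ip4 >> 16);
	out[14] = (uint8_t)(ip4 >> 8);
	out[15] = (uint8_t)ip4;
}

/* Hypothetical inverse: recover the IPv4 address from the last 32 bits. */
uint32_t
toy_extract_ip4_96(const uint8_t addr[16])
{
	return (((uint32_t)addr[12] << 24) | ((uint32_t)addr[13] << 16) |
	    ((uint32_t)addr[14] << 8) | (uint32_t)addr[15]);
}
```

For example, embedding 192.0.2.33 under 64:ff9b::/96 yields 64:ff9b::c000:221, and extracting from that address returns the original IPv4 value.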
ae
2770fa04e1 Add SPDX-License-Identifier and update year in copyright.
MFC after:	1 month
2019-03-18 10:50:32 +00:00
ae
6b7a62da46 Modify struct nat64_config.
Add second IPv6 prefix to generic config structure and rename other
fields to conform to RFC6877. Now it contains two prefixes and length:
PLAT is provider-side translator that translates N:1 global IPv6 addresses
to global IPv4 addresses. CLAT is customer-side translator (XLAT) that
algorithmically translates 1:1 IPv4 addresses to global IPv6 addresses.
Use PLAT prefix in stateless (nat64stl) and stateful (nat64lsn)
translators.

Modify nat64_extract_ip4() and nat64_embed_ip4() functions to accept
prefix length and use plat_plen to specify prefix length.

Retire net.inet.ip.fw.nat64_allow_private sysctl variable.
Add NAT64_ALLOW_PRIVATE flag and use "allow_private" config option to
configure this ability separately for each NAT64 instance.

Obtained from:	Yandex LLC
MFC after:	1 month
Sponsored by:	Yandex LLC
2019-03-18 10:39:14 +00:00
markj
b24a98cb7e Revert r345244 for now.
The code which advances the block number is simplistic and is not
correct when the starting offset is non-zero.  Revert the change until
this is fixed.
2019-03-18 05:03:55 +00:00
avos
c67db9c243 net80211: correct check for SMPS node flags updates
Update node flags when driver supports SMPS, not when it is disabled or
in dynamic mode ((iv_htcaps & HTCAP_SMPS) != 0).

Checked with RTL8188EE (1T1R), STA mode - 'smps' word should disappear
from 'ifconfig wlan0' output.

MFC after:	2 weeks
2019-03-18 02:40:22 +00:00
kib
ea004a70f9 i386: improve detection of the fast page fault assist.
In particular, check that we are assisting the page fault from kernel mode.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-03-17 18:31:48 +00:00
markj
72039f2089 Fix the gcc build (-Wstrict-prototypes) after r345244.
Reported by:	jenkins
MFC with:	r345244
2019-03-17 18:06:13 +00:00
markj
3139ccc80b Optimize lseek(SEEK_DATA) on UFS.
The old implementation, at the VFS layer, would map the entire range of
logical blocks between the starting offset and the first data block
following that offset.  With large sparse files this is very
inefficient.  The VFS currently doesn't provide an interface to improve
upon the current implementation in a generic way.

Add ufs_bmap_seekdata(), which uses the obvious algorithm of scanning
indirect blocks to look for data blocks.  Use it instead of
vn_bmap_seekhole() to implement SEEK_DATA.

Reviewed by:	kib, mckusick
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19598
2019-03-17 17:34:06 +00:00
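The approach described in 3139ccc80b above, scanning indirect blocks instead of mapping every logical block, can be shown on a toy block map. This is a hypothetical model and not ufs_bmap_seekdata(): it has a single level of indirection, tiny fan-out constants (TOY_NROOT, TOY_NINDIR), and treats block number 0 as a hole. It does accept a non-zero starting block, since a sketch that only works from offset zero would miss the point.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NROOT	4	/* indirect pointers in the toy inode */
#define TOY_NINDIR	4	/* block pointers per indirect block */

struct toy_indir {
	uint32_t blk[TOY_NINDIR];		/* 0 means "hole" */
};

struct toy_inode {
	const struct toy_indir *indir[TOY_NROOT]; /* NULL: whole range is a hole */
};

/*
 * Return the first logical block >= start that holds data, or -1 if the
 * rest of the file is a hole.  A missing indirect block lets the scan
 * skip TOY_NINDIR logical blocks in one step.
 */
long
toy_seek_data(const struct toy_inode *ip, long start)
{
	long lbn = start < 0 ? 0 : start;

	while (lbn < (long)TOY_NROOT * TOY_NINDIR) {
		const struct toy_indir *ind = ip->indir[lbn / TOY_NINDIR];

		if (ind == NULL) {
			/* Jump to the first block covered by the next pointer. */
			lbn = (lbn / TOY_NINDIR + 1) * TOY_NINDIR;
			continue;
		}
		if (ind->blk[lbn % TOY_NINDIR] != 0)
			return (lbn);
		lbn++;
	}
	return (-1);
}
```

On a large sparse file most indirect pointers are absent, so the scan cost follows the number of allocated indirect blocks rather than the size of the hole.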
jhibbits
985f00a61c fdt: Explicitly mark fdt_slicer as dependent on geom_flashmap
Without this dependency relationship, the linker doesn't find the
flash_register_slicer() function, so kldload fails to load fdt_slicer.ko.

Discussed with:	ian@
2019-03-17 04:33:17 +00:00