Commit Graph

130633 Commits

Author SHA1 Message Date
Alexander Motin
ace409ce9c Restore loop break in vm_pageout_lowmem().
r355004 removed return statement from this loop with intention to also
call uma_reclaim_wakeup().  But in case of vm.lowmem_period=0 it causes
infinite loop.

Reviewed by:	markj
Sponsored by:	iXsystems, Inc.
2020-01-14 03:27:57 +00:00
Ryan Libby
9b8db4d0a0 uma: split slabzone into two sizes
By allowing more items per slab, we can improve memory efficiency for
small allocs.  If we were just to increase the bitmap size of the
slabzone, we would then waste slabzone memory.  So, split slabzone into
two zones, one especially for 8-byte allocs (512 per slab).  The
practical effect should be reduced memory usage for counter(9).

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23149
2020-01-14 02:14:15 +00:00
Ryan Libby
51871224c0 malloc: remove assumptions about MINALLOCSIZE
Remove assumptions about the minimum MINALLOCSIZE, in order to allow
testing of smaller MINALLOCSIZE.  A following patch will lower the
MINALLOCSIZE, but not so much that the present patch is required for
correctness at these sites.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
2020-01-14 02:14:02 +00:00
Ryan Libby
e63a1c2f52 uma: fixup some ktr messages
Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23148
2020-01-14 02:13:46 +00:00
Jeff Roberson
815b74866a Fix a long standing bug in journaled soft-updates. The dirrem structure
needs to handle file removal, directory removal, file move, directory move,
etc.  The code in handle_workitem_remove() needs to propagate any completed
journal entries to the write that will render the change stable.  In the
case of a moved directory this means the new parent.  However, for an
overwrite that frees a directory (DIRCHG) we must move the jsegdep to the
removed inode to be released when it is stable in the cg bitmap or the
unlinked inode list.  This case was previously unhandled and caused a
panic.

Reported by:	mckusick, pho
Reviewed by:	mckusick
Tested by:	pho
2020-01-14 02:00:24 +00:00
Navdeep Parhar
46d29cab25 cxgbe/iw_cxgbe: Do not allow memory registrations with page size greater
than 128MB, which is the maximum supported by the hardware in RDMA mode.

Obtained from:	Chelsio Communications
MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-01-14 01:43:04 +00:00
Justin Hibbits
a80e3de39b powerpc/mpc85xx: Partially revert r356640
The count block was correct before.  r356640 caused a read past the end of
the tuple.
2020-01-13 23:09:00 +00:00
Alexander Motin
da19f62dfa Map ECKSUM and EFRAGS from ZFS onto real errnos.
Make it less confusing when, for example, stat sets errno to 122 because a
checksum failed in ZFS:

Before: getfacl: /foo/bar: stat() failed: Unknown error: 122
After: getfacl: /foo/bar: stat() failed: Integrity check failed

Submitted by:	Ryan Moeller <ryan@ixsystems.com>
Reviewed by:	mckusick, mav
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D22973
2020-01-13 22:06:16 +00:00
Eric van Gyzen
cf64777f50 Add missing comma in nfsv4_errstr
Reported by:	Coverity
CID:		1412243
Sponsored by:	Dell EMC Isilon
2020-01-13 21:49:27 +00:00
Vincenzo Maffione
2ec213aba4 netmap: disable passthrough with no hypervisor support
The netmap passthrough subsystem requires proper support in the
hypervisor. In particular, two PCI device ids (from the Red Hat
PCI vendor id 0x1b36) need to be assigned to the two netmap
virtual devices. We then disable these devices until the ids have
not been assigned, in order to avoid conflicts with other
virtual devices emulated by upstream QEMU.

PR:	241774
MFC after:	3 days
2020-01-13 21:47:23 +00:00
Vincenzo Maffione
f55f37d9c5 vmx: fix initialization of TSO related descriptor fields
Fix a mistake introduced by r343291, which ported the vmx(4)
driver to iflib.
In case of TSO, the hlen field of the (first) tx descriptor must
be initialized to the cumulative length of Ethernet, IP and TCP
headers. The length of the TCP header was missing.

PR:		236999
Reported by:	pkelsey
Reviewed by:	avg
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22967
2020-01-13 21:26:17 +00:00
Gleb Smirnoff
4c69f60a8e Fix yet another regression from r354484. Error code from cr_cansee()
aliases with hard error from other operations.

Reported by:	flo
2020-01-13 21:12:10 +00:00
Mateusz Guzik
f1fcaffd8e ufs: relax an overzealous assert added in r356671
Part of i_flag can persist across a drop to hold count of 0, at which
point the vnode is taken off the lazy list. Then whoever locks and unlocks
the vnode can trip on the assert.

This trips over kyua running a test untarring character devices to ufs.

Reported by:	lwhsu
2020-01-13 14:33:51 +00:00
Konstantin Belousov
fedab1b499 Code must not unlock a mutex while owning the thread lock.
Reviewed by:	hselasky, markj
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23150
2020-01-13 14:30:19 +00:00
Mitchell Horne
e12bf34c05 RISC-V: fix global symbol lookups for mpentry with lld
This is a follow up to r356481. In locore.S, before virtual memory is
set up, we should avoid using indirect address lookups through the GOT.
Therefore we need to convert uses of the la instruction to lla, which
always generates an auipc/addi pair of instructions. This conversion was
done for the BSP case, but not the AP case, resulting in a fault
somewhere before mpva and a failure to bring APs online.

Reported by:	lwhsu
Reviewed by:	lwhsu, jrtc27 (accepted in a comment)
Differential Revision:	https://reviews.freebsd.org/D23138
2020-01-13 03:39:02 +00:00
Mateusz Guzik
0c236d3d52 vfs: per-cpu batched requeuing of free vnodes
Constant requeuing adds significant lock contention in certain
workloads. Lessen the problem by batching it.

Per-cpu areas are locked in order to synchronize against UMA freeing
memory.

vnode's v_mflag is converted to short to prevent the struct from
growing.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock:   122.38s user 1780.45s system 6242% cpu 30.480 total
patched: 144.84s user 985.90s system 4856% cpu 23.282 total

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22998
2020-01-13 02:39:41 +00:00
Mateusz Guzik
cc3593fbd9 vfs: rework vnode list management
The current notion of an active vnode is eliminated.

Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.

Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock:   118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22997
2020-01-13 02:37:25 +00:00
Mateusz Guzik
80663cadb8 ufs: use lazy list instead of active list for syncer
Quota code is temporarily regressed to do a full vnode scan.

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22996
2020-01-13 02:35:15 +00:00
Mateusz Guzik
57083d2576 vfs: add per-mount vnode lazy list and use it for deferred inactive + msync
This obviates the need to scan the entire active list looking for vnodes
of interest.

msync is handled by adding all vnodes with write count to the lazy list.

deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag.

Vnodes get dequeued from the list when their hold count reaches 0.

Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that
spurious locking is avoided in the common case.

Reviewed by:	jeff
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D22995
2020-01-13 02:34:02 +00:00
Mateusz Guzik
ac4ec14188 ufs: add a setter for inode i_flag field
This will be used later to add vnodes to the lazy list.

Reviewed by:	kib (previous version), jeff
Tested by:	pho (in a larger patch)
Differential Revision:	https://reviews.freebsd.org/D22994
2020-01-13 02:31:51 +00:00
Conrad Meyer
365cd52245 Fix a typo in r356667 comment
No functional change.

Reported by:	bdragon
Approved by:	csprng(markm), earlier version
X-MFC-With:	r356667
2020-01-12 23:52:16 +00:00
Conrad Meyer
86def3dcd6 getrandom(2): Add Linux GRND_INSECURE API flag
Treat it as a synonym for GRND_NONBLOCK.  The reasoning is this:

We have two choices for handling Linux's GRND_INSECURE API flag.

1. We could ignore it completely (like GRND_RANDOM).  However, this might
produce the surprising result of GRND_INSECURE requests blocking, when the
Linux API does not block.

2. Alternatively, we could treat GRND_INSECURE requests as requests for
GRND_NONBLOCk.  Here, the surprising result for Linux programs is that
invocations with unseeded random(4) will produce EAGAIN, rather than
garbage.

Honoring the flag in the way Linux does seems fraught.  If we actually use
the output of a random(4) implementation prior to seeding, we leak some
entropy (in an information theory and also practical sense) from what will
be the initial seed to attackers (or allow attackers to arbitrary DoS
initial seeding, if we don't leak).  This seems unacceptable -- it defeats
the purpose of blocking on initial seeding.

Secondary to that concern, before seeding we may have arbitrarily little
entropy collected; producing output from zero or a handful of entropy bits
does not seem particularly useful to userspace.

If userspace can accept garbage, insecure, non-random bytes, they can create
their own insecure garbage with srandom(time(NULL)) or similar.  Any program
which would be satisfied with a 3-bit key CTR stream has no need for CSPRNG
bytes.  So asking the kernel to produce such an output from the secure
getrandom(2) API seems inane.

For now, we've elected to emulate GRND_INSECURE as an alternative spelling
of GRND_NONBLOCK (2).  Consider this API not-quite stable for now.  We
guarantee it will never block.  But we will attempt to monitor actual port
uptake of this bizarre API and may revise our plans for the unseeded
behavior (prior stable/13 branching).

Approved by:	csprng(markm), manpages(bcr)
See also:	https://lwn.net/ml/linux-kernel/cover.1577088521.git.luto@kernel.org/
See also:	https://lwn.net/ml/linux-kernel/20200107204400.GH3619@mit.edu/
Differential Revision:	https://reviews.freebsd.org/D23130
2020-01-12 20:47:38 +00:00
Michael Tuexen
fe1274ee39 Fix race when accepting TCP connections.
When expanding a SYN-cache entry to a socket/inp a two step approach was
taken:
1) The local address was filled in, then the inp was added to the hash
   table.
2) The remote address was filled in and the inp was relocated in the
   hash table.
Before the epoch changes, a write lock was held when this happens and
the code looking up entries was holding a corresponding read lock.
Since the read lock is gone away after the introduction of the
epochs, the half populated inp was found during lookup.
This resulted in processing TCP segments in the context of the wrong
TCP connection.
This patch changes the above procedure in a way that the inp is fully
populated before inserted into the hash table.

Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@
mailing list and for testing the patch!

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D22971
2020-01-12 17:52:32 +00:00
Bjoern A. Zeeb
c6feea3b89 nd6_rtr: constantly use __func__ for nd6log()
Over time one or two hard coded function names did not match the
actual function anymore.  Consistently use __func__ for nd6log() calls
and re-wrap/re-format some messages for consitency.

MFC after:	2 weeks
2020-01-12 17:41:09 +00:00
Bjoern A. Zeeb
25ebfe3350 nd6_rtr: make nd6_prefix_onlink() static
nd6_prefix_onlink() is not used anywhere outside nd6_rtr.c.  Stop
exporting it and make it file local static.
2020-01-12 16:58:21 +00:00
Michael Tuexen
fc0eb7637c Fix division by zero issue.
Thanks to Stas Denisov for reporting the issue for the userland stack
and providing a fix.

MFC after:		3 days
2020-01-12 15:45:27 +00:00
Edward Tomasz Napierala
ca603bb1ee dd kern_getpriority(), make Linuxulator use it.
Reviewed by:	kib, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22842
2020-01-12 14:25:44 +00:00
Edward Tomasz Napierala
7a0ef283e6 Add kern_setpriority(), use it in Linuxulator.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22841
2020-01-12 13:38:51 +00:00
Mateusz Guzik
d199ad3b44 Add "panicked" boolean which can be tested instead of panicstr
The test is performed all the time and reading entire panicstr to do it
wastes space.
2020-01-12 06:09:10 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests 2020-01-12 06:07:54 +00:00
Mateusz Guzik
76a49ebaa6 sysctl: add missing CLTFLAG_MPSAFE annotation to CONST_STRING 2020-01-12 05:25:06 +00:00
Mateusz Guzik
a314aba874 vm: add missing CLTFLAG_MPSAFE annotations
This covers all vm/* files.
2020-01-12 05:08:57 +00:00
Mateusz Guzik
638af813d9 dtrace: add missing CLTFLAG_MPSAFE annotations 2020-01-12 04:53:22 +00:00
Mateusz Guzik
20fa645666 zfs: add missing CLTFLAG_MPSAFE annotations 2020-01-12 04:53:01 +00:00
Kyle Evans
89476f9c99 regulator: small enhancements to regulator_shutdown
Highlights:

- Exit early if we're not disabling unused regulators; there's no need to
  take the regulator topology lock and re-evaluate this every iteration, as
  it's not going to change.
- Don't emit a notice that we're shutting down a regulator if it's not
  enabled, to reduce noise.
- Mention the outcome of the shutdown, to aide debugging and easily let
  developer/user collect list of regulators we actually shutdown to
  determine problematic one.

Reviewed by:	manu
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D22213
2020-01-12 04:07:03 +00:00
Mateusz Guzik
91de98e6d4 vfs: only recalculate watermarks when limits are changing
Previously they would get recalculated all the time, in particular in:
getnewvnode -> vcheckspace -> vspace
2020-01-11 23:00:57 +00:00
Mateusz Guzik
e6ae744e0e vfs: deduplicate vnode allocation logic
This creates a dedicated routine (vn_alloc) to allocate vnodes.

As a side effect code duplicationw with getnewvnode_reserve is eleminated.

Add vn_free for symmetry.
2020-01-11 22:59:44 +00:00
Mateusz Guzik
b52d50cf69 vfs: prealloc vnodes in getnewvnode_reserve
Having a reserved vnode count does not guarantee that getnewvnodes wont
block later. Said blocking partially defeats the purpose of reserving in
the first place.

Preallocate instaed. The only consumer was always passing "1" as count
and never nesting reservations.
2020-01-11 22:58:14 +00:00
Mateusz Guzik
6928306764 vfs: incomplete pass at converting more ints to u_long
Most notably numvnodes and freevnodes were u_long, but parameters used to
govern them remained as ints.
2020-01-11 22:56:20 +00:00
Mateusz Guzik
bf62296f35 vfs: add missing CLTFLA_MPSAFE annotations
This covers all kern/vfs_*.c files.
2020-01-11 22:55:12 +00:00
Justin Hibbits
7d7671db00 powerpc/mpc85xx: Fix localbus child reg property decoding
r302340, as an attempt to fix the localbus child handling post-rman change,
actually broke child resource allocation, due to typos in
fdt_lbc_reg_decode().  This went unnoticed because there aren't any drivers
currently in tree that use localbus.
2020-01-11 22:29:44 +00:00
Gleb Smirnoff
629667a148 Pacify gcc.
Reported by:	rlibby
2020-01-11 20:07:30 +00:00
Bjoern A. Zeeb
e1891232fc in6_mcast: make in6_joingroup_locked() static
in6_joingroup_locked() is only used file-local. No need to export it
hance make it static.
2020-01-11 18:55:12 +00:00
Emmanuel Vadot
c9f3a1ac17 arm64: allwinner: dtso: Add spi0 spigen DTSO
This overlays can be used on A64 board to use spigen and spi(8)
on the spi0 pins.

Tested On:  Pine64-LTS, A64-Olinuxino

Submitted by:	Gary Otten <gdotten@gmail.com>
2020-01-11 18:36:10 +00:00
Hans Petter Selasky
ae5b45c86e Make sure the VNET is properly set when reaping mbufs in ipoib.
Else the following panic may happen:

panic()
icmp_error()
ipoib_cm_mb_reap()
linux_work_fn()
taskqueue_run_locked()
taskqueue_thread_loop()
fork_exit()
fork_trampoline()

Submitted by:	Andreas Kempe <kempe@lysator.liu.se>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-11 12:02:16 +00:00
Hans Petter Selasky
5bc41c932f Revert r356598 for now because it breaks some AMD based XHCI controllers.
Reported by:	jkim @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-11 11:38:02 +00:00
Kirk McKusick
27a6257130 When a read error occurs while fetching a directory block to delete
or rename an entry in it, properly reset the link count of the inode
associated with the entry that was to have been changed.

Tested by: Peter Holm
MFC after: 7 days
2020-01-11 03:18:47 +00:00
Pedro F. Giffuni
7e4c9d4893 Update ELFOSABI_* constants with OpenVOS.
Reference:
	https://www.sco.com/developers/gabi/latest/ch4.eheader.html
2020-01-11 01:44:55 +00:00
Jung-uk Kim
f425b8be7e MFV: r356607
Import ACPICA 20200110.
2020-01-10 22:49:14 +00:00
Gleb Smirnoff
ed6cbf4805 Add pfil(9) hook to vtnet(4).
The patch could be simplier, using only the second chunk to
vtnet_rxq_eof(), that passes full mbufs to pfil(9). Packet
filter would m_free() them in case of returning PFIL_DROPPED.

However, we pretend to be a hardware driver, so we first try
to pass a memory buffer via PFIL_MEMPTR feature. This is mostly
done for debugging purposes, so that one can experiment in bhyve
with packet filters utilizing same features as a true driver.
2020-01-10 21:22:03 +00:00
Gleb Smirnoff
9328cbc047 Always multiple vm.pgcache_zone_max to number of CPUs, and rename it
respectively.  The tunable controls how big is the size of per-cpu
vm page cache.  Previously the value was split for all CPUs in system,
so configuring same value on machines with different count of CPUs
yielded in different cache size available to a particular CPU.

Reviewed by:	markj
Obtained from:	Netflix
2020-01-10 19:32:08 +00:00
Emmanuel Vadot
ca4387843e arm: allwinner: axp209: Add regnode_status method
This allow consumers to check if the regulator is enable or not.

MFC after:	1 week
2020-01-10 18:53:14 +00:00
Emmanuel Vadot
b74b94d2a1 twsi: Rework how we handle the i2c messages
We use to handle each message separately in i2c_transfer but that cannot
work with message with NOSTOP as it confuses the controller that we disable
the interrupts and start a new message.
Handle every message in the interrupt handler and fire a new start condition
if the previous message have NOSTOP, the controller understand this as a
repeated start.
This fixes booting on Allwinner A10/A20 platform where before the i2c controller
used to write 0 to the PMIC register that control the regulators as it though that
this was the continuation of the write message.

Tested on:   A20 BananaPi, Cubieboard 1 (kevans)
Reported by:	kevans
MFC after:	1 month
2020-01-10 18:52:14 +00:00
Kyle Evans
1171c633fb Set .ORDER for makesyscalls generated files
When either makesyscalls.lua or syscalls.master changes, all of the
${GENERATED} targets are now out-of-date. With make jobs > 1, this means we
will run the makesyscalls script in parallel for the same ABI, generating
the same set of output files.

Prior to r356603 , there is a large window for interlacing output for some
of the generated files that we were generating in-place rather than staging
in a temp dir. After that, we still should't need to run the script more
than once per-ABI as the first invocation should update all of them. Add
.ORDER to do so cleanly.

Reviewed by:	brooks
Discussed with:	sjg
Differential Revision:	https://reviews.freebsd.org/D23099
2020-01-10 18:24:17 +00:00
Kyle Evans
554f71e2b3 makesyscalls.lua: generate all files in /tmp, write into place at the end
This makes makesyscalls.lua more parallel-friendly, or at least not as
hostile to the idea. We get into situations where we're running parallel if
we end up with MAKE_JOBS>1 entering any of the sysent targets, since each
output file is recognized a distinct build step that needs to be executed.

Another commit will add some .ORDER to further improve the situation.

Reported by:	jhb
Reviewed by:	brooks
Differential Revision:	https://reviews.freebsd.org/D23098
2020-01-10 18:22:14 +00:00
Kyle Evans
3898f9bdb3 a10_ahci: grab the target-supply regulator and enable it
This regulator is marked regulator-boot-on, but it will get shutdown if it's
not actually used/enabled by a driver. This should fix sata on the
cubieboard{1,2}.

Reported by:	Ray White @ UWaterloo
Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D23112
2020-01-10 14:09:59 +00:00
Hans Petter Selasky
92dfc0fc1d Check the XHCI endpoint state before stopping any endpoint.
This avoids getting the XHCI_TRB_ERROR_CONTEXT_STATE error code from the XHCI
controller when the endpoint is disabled or already stopped.

Suggested by:	Shichun.Ma@dell.com
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-10 09:32:44 +00:00
Hans Petter Selasky
b8ffd2d5d6 Define the XHCI endpoint states.
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-10 09:07:43 +00:00
Justin Hibbits
a11dc32ebc powerpc: Prevent infinite loop in moea_sync_icache()
This applies r344049 to the 32-bit pmap.

Reported by:	Mark Millard <marklmi_yahoo.com>
2020-01-10 04:13:16 +00:00
Mitchell Horne
d2b6a2ff1e Replace inline assembly with rdtime macro
This macro is used elsewhere and is slightly cleaner. NFC.
2020-01-10 03:17:28 +00:00
Justin Hibbits
4dc25d4452 powerpc: Mark cpu_feature-based sysctls as MP_SAFE
hw.floatingpoint and hw.altivec are effectively runtime constants (bits from
the cpu_feature bitfield), so don't need Giant, or any locking for that
matter.
2020-01-10 03:16:40 +00:00
Justin Hibbits
03b6e7a627 powerpc/powernv: Un-Giant-ify opal_nvram driver
It may be possible to make this completely lock free, but for now it's using
a statically allocated bounce buffer in the softc, so it needs to be
guarded.
2020-01-10 01:24:49 +00:00
Ian Lepore
8bfc473c0e Remove scary-looking printf output that happens when you kldload dtrace on
arm.  Replace it with a comment block explaining why the function is empty
on 32-bit arm.
2020-01-09 22:51:37 +00:00
Kyle Evans
acf411aafe dwc_otg: fix fdt attachment for newer bcm2708-usb nodes
The newer versions of RPi FDT flipped the order of the interrupts
specification and added an 'interrupt-names' property for driver aide in
finding the correct interrupt, rather than assuming the positions. Use it if
it's available, or fallback to the old method if there is no interrupt-names
property with a usb value.

This has been tested with both old RPi3B FDT and new RPi3B FDT, USB again
works on the latter.

Reported by:	Tom Yan <tom.ty89 gmail com>
MFC after:	3 days
2020-01-09 19:22:11 +00:00
Mark Johnston
860bb7a04c UMA: Don't destroy zones after the system shutdown process starts.
Some kernel subsystems, notably ZFS, will destroy UMA zones from a
shutdown eventhandler.  This causes the zone to be drained.  For slabs
that are mapped into KVA this can be very expensive and so it needlessly
delays the shutdown process.

Add a new state to the "booted" variable, BOOT_SHUTDOWN.  Once
kern_reboot() starts invoking shutdown handlers, turn uma_zdestroy()
into a no-op, provided that the zone does not have a custom finalization
routine.

PR:		242427
Reviewed by:	jeff, kib, rlibby
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23066
2020-01-09 19:17:42 +00:00
Kyle Evans
d2ccf385fd bcm2835_vcbus: hide 'checking root' messages under bootverbose 2020-01-09 19:13:09 +00:00
John Baldwin
5ac518b51f Add stricter checking on mac key lengths.
Negative lengths are always invalid.  The key length should also
be zero for hash algorithms that do not accept a key.

admbugs:	949
Reported by:	Yuval Kanarenstein <yuvalk@ssd-disclosure.com>
Reviewed by:	cem
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23094
2020-01-09 18:29:59 +00:00
Alexander V. Chernikov
ead85fe415 Add fibnum, family and vnet pointer to each rib head.
Having metadata such as fibnum or vnet in the struct rib_head
 is handy as it eases building functionality in the routing space.
This change is required to properly bring back route redirect support.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23047
2020-01-09 17:21:00 +00:00
Warner Losh
0b4da9c8e4 Const-poison the cam_sim_* convenience accessor functions.
These don't modify the cam_sim, so make that parameter const.
2020-01-09 16:34:54 +00:00
Gleb Popov
dfead4180e Fix typo: MANGAEMENT_PROTOCOL_OUT -> MANAGEMENT_PROTOCOL_OUT.
Approved by:	allanjude
2020-01-09 15:21:42 +00:00
Mark Johnston
dc727127f1 Change malloc_domain() to return the allocation size to the caller.
Otherwise the malloc type accounting in malloc_domainset(9) is wrong
after r355203.

Reviewed by:	rlibby
Reported by:	kaktus
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23095
2020-01-09 15:02:48 +00:00
Mark Johnston
c23df8eafa lagg: Further cleanup of the rr_limit option.
Add an option flag so that arbitrary updates to a lagg's configuration
do not clear sc_stride.  Preseve compatibility for old ifconfig
binaries.  Update ifconfig to use the new flag and improve the casting
used when parsing the option parameter.

Modify the RR transmit function to avoid locklessly reading sc_stride
twice.  Ensure that sc_stride is always 1 or greater.

Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23092
2020-01-09 14:58:41 +00:00
Andrew Turner
f2792e5e55 Add atomic_testandset/clear on arm64.
These will reportedly be used in future uma changes.

MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D23019
2020-01-09 10:26:36 +00:00
Hans Petter Selasky
7ba6c62fa0 Fix a XHCI driver issue with Intel's Gemini Lake SOC.
Do not configure any endpoint twice, but instead keep track of which
endpoints are configured on a per device basis, and use an evaluate
endpoint context command instead. When changing the configuration make
sure all endpoints get deconfigured and the configured endpoint mask
is reset.

This fixes an issue where an endpoint might stop working if there is
an error and the endpoint needs to be reconfigured as a part of the
error recovery mechanism in the FreeBSD USB stack.

Tested by:	Shichun.Ma@dell.com
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-09 09:29:24 +00:00
Kyle Evans
6a38cd3a54 kern/Makefile: systrace_args.c is also generated 2020-01-09 06:10:25 +00:00
Kyle Evans
39eae263cd shmfd: posix_fallocate(2): only take rangelock for section we need
Other mechanisms that resize the shmfd grab a write lock from 0 to OFF_MAX
for safety, so we still get proper synchronization of shmfd->shm_size in
effect. There's no need to block readers/writers of earlier segments when
we're just reserving more space, so narrow the scope -- it would likely be
safe to narrow it completely to just the section of the range that extends
beyond our current size, but this likely isn't worth it since the size isn't
stable until the writelock is granted the first time.

Suggested by:	cem (passing comment)
2020-01-09 04:03:17 +00:00
Kyle Evans
c7bab2a7ca if_vmove: return proper error status
if_vmove can fail if it lost a race and the vnet's already been moved. The
callers (and their callers) can generally cope with this, but right now
success is assumed. Plumb out the ENOENT from if_detach_internal if it
happens so that the error's properly reported to userland.

Reviewed by:	bz, kp
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D22780
2020-01-09 03:52:50 +00:00
Ryan Libby
4a8b575c6b uma: unify layout paths and improve efficiency
Unify the keg layout selection paths (keg_small_init, keg_large_init,
keg_cachespread_init), and slightly improve memory efficiecy by:
 - using the padding of the final item to store the slab header,
 - not going OFFPAGE if we have a choice unless it improves efficiency.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23048
2020-01-09 02:03:17 +00:00
Ryan Libby
54c5ae804f uma: reorganize flags
- Garbage collect UMA_ZONE_PAGEABLE & UMA_ZONE_STATIC.
 - Move flag VTOSLAB from public to private.
 - Introduce public NOTPAGE flag and make HASH private.
 - Introduce public NOTOUCH flag and make OFFPAGE private.
 - Update man page.

The net effect of this should be to make the contract with clients more
clear.  Clients should choose constraints, UMA will figure out how to
implement them.  This also breaks the confusing double meaning of
OFFPAGE.

Reviewed by:	jeff, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D23016
2020-01-09 02:03:03 +00:00
Bjoern A. Zeeb
334fc5822b vnet: virtualise more network stack sysctls.
Virtualise tcp_always_keepalive, TCP and UDP log_in_vain.  All three are
set in the netoptions startup script, which we would love to run for VNETs
as well [1].

While virtualising the log_in_vain sysctls seems pointles at first for as
long as the kernel message buffer is not virtualised, it at least allows
an administrator to debug the base system or an individual jail if needed
without turning the logging on for all jails running on a system.

PR:		243193 [1]
MFC after:	2 weeks
2020-01-08 23:30:26 +00:00
Ian Lepore
e371a9e6cc Remove some trailing whitespace; no functional changes. 2020-01-08 23:06:13 +00:00
Ian Lepore
ca16c2b8de Split the code to find and add iicbus children out to its own function.
Move the decision to take an early exit from that function after adding
children based on FDT data into the #ifdef FDT block, so that it doesn't
offend coverity's notion of how the code should be written.  (What's the
point of compilers optimizing away dead code if static analyzers won't
let you use the feature in conjuction with an #ifdef block?)

Reported by:	coverity via vangyzen@
2020-01-08 23:03:47 +00:00
Ian Lepore
a5910414d4 Change some KASSERT to device_printf + return EINVAL. There's no need to
bring the whole kernel down due to a configuration error detected when a
module is loaded, it suffices to just not attach the device.
2020-01-08 22:48:14 +00:00
Ian Lepore
b29fdcf8b3 Init sc->maxbus to -1, not 0. It represents the highest array index that
has a non-NULL child bus stored in it, so the "none" value can't be zero
since that's a valid array index.  Also, when adding all possible buses
because there is no specific per-bus config, there's no need to reset
sc->maxbus on each loop iteration, it can be set once after the loop.
2020-01-08 22:45:32 +00:00
John Baldwin
ec212149ad Remove no-longer-used function prototype.
Reported by:	amd64-gcc
2020-01-08 22:16:26 +00:00
Ian Lepore
d9b35549d4 Ensure any reserved gpio pins get released if an early exit is taken
from the attach function.
2020-01-08 22:06:31 +00:00
Kyle Evans
f10405323a posixshm: implement posix_fallocate(2)
Linux expects to be able to use posix_fallocate(2) on a memfd. Other places
would use this with shm_open(2) to act as a smarter ftruncate(2).

Test has been added to go along with this.

Reviewed by:	kib (earlier version)
Differential Revision:	https://reviews.freebsd.org/D23042
2020-01-08 19:08:44 +00:00
Kyle Evans
7ce4072267 Bump __FreeBSD_version after r356510
linuxkpi kmod would need to rebuilt at a minimum; fileops layout has
changed.
2020-01-08 19:06:22 +00:00
Kyle Evans
2856d85ecb posix_fallocate: push vnop implementation into the fileop layer
This opens the door for other descriptor types to implement
posix_fallocate(2) as needed.

Reviewed by:	kib, bcr (manpages)
Differential Revision:	https://reviews.freebsd.org/D23042
2020-01-08 19:05:32 +00:00
John Baldwin
f57d4d4641 Remove unneeded cdevsw methods and D_NEEDGIANT.
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23079
2020-01-08 19:05:23 +00:00
John Baldwin
efb7929173 Use falloc_noinstall + finstall for crypto file descriptors.
Reviewed by:	cem, kib
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23078
2020-01-08 19:03:24 +00:00
John Baldwin
d2cdaed130 Add a reference count to cryptodev sessions.
This prevents use-after-free races with crypto requests (which may
sleep) and CIOCFSESSION as well as races from current CIOCFSESSION
requests.

admbugs:	949
Reported by:	Yuval Kanarenstein <yuvalk@ssd-disclosure.com>
Reviewed by:	cem
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23077
2020-01-08 18:59:23 +00:00
Alexander Motin
b2cdfb72f4 Fix copy-paste bug in HMB free code.
MFC after:	2 weeks
X-MFC-with:	r356474
2020-01-08 18:26:23 +00:00
Mark Johnston
f8091e2c8f linprocfs: Fix some bugs in the maps file implementation.
- Export the offset into the backing object, not the object size.
- Fix a bug where we would print the previous entry's "offset" when a
  map_entry has no object.
- Try to identify shared mappings.  Linux prints "s" when the mapping
  "may be shared".  This attempt is not perfect, for example, we print
  "p" for anonymous memory that may be shared via
  minherit(INHERIT_SHARE).

PR:		240992
Reviewed by:	kib
MFC after:	1 week
MFC note:	no OBJ_ANON in stable/12
Differential Revision:	https://reviews.freebsd.org/D23062
2020-01-08 16:57:08 +00:00
Emmanuel Vadot
650a41c6d0 regulator: fix regnode_method_get_voltage
This method is supposed to write the voltage into uvolt
and return an errno compatible value.

Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D23006
2020-01-08 11:30:42 +00:00
Emmanuel Vadot
546652fbe3 rk805: Add regnode_status method
This allow consumers to check if the regulator is enable or not.

Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D23005
2020-01-08 11:30:03 +00:00
Emmanuel Vadot
d194b63134 rk808: Add min/max for the switch regulators
The two switch regulator are always 3.0V.
Add a special case in get_voltage that if min=max we directly
return the value without calculating it.

Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D23004
2020-01-08 11:29:22 +00:00
Kristof Provost
29bfe2102d vtnet: Pre-allocate debugnet data immediately
Don't wait until the vtnet_debugnet_init() call happens, because at that
point we might already have allocated something from
vtnet_tx_header_zone.

Some systems showed this panic:

        vtnet0: link state changed to UP
        panic: keg vtnet_tx_hdr initialization after use.
        cpuid = 5
        time = 1578427700
        KDB: stack backtrace:
        db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004db427f0
        vpanic() at vpanic+0x17e/frame 0xfffffe004db42850
        panic() at panic+0x43/frame 0xfffffe004db428b0
        uma_zone_reserve() at uma_zone_reserve+0xf6/frame 0xfffffe004db428f0
        vtnet_debugnet_init() at vtnet_debugnet_init+0x77/frame 0xfffffe004db42930
        debugnet_any_ifnet_update() at debugnet_any_ifnet_update+0x42/frame 0xfffffe004db42980
        do_link_state_change() at do_link_state_change+0x1b3/frame 0xfffffe004db429d0
        taskqueue_run_locked() at taskqueue_run_locked+0x178/frame 0xfffffe004db42a30
        taskqueue_run() at taskqueue_run+0x4d/frame 0xfffffe004db42a50
        ithread_loop() at ithread_loop+0x1d6/frame 0xfffffe004db42ab0
        fork_exit() at fork_exit+0x80/frame 0xfffffe004db42af0
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004db42af0
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
        KDB: enter: panic
        [ thread pid 12 tid 100011 ]
        Stopped at      kdb_enter+0x37: movq    $0,0x1084eb6(%rip)
        db>

Reviewed by:	cem, markj
Differential Revision:	https://reviews.freebsd.org/D23073
2020-01-08 10:06:32 +00:00
Alexander Motin
6de4e458fa Minor adjustments to r356474 and r356480.
Reported by:	jkim, imp
MFC after:	2 weeks
X-MFC-with:	r356474
2020-01-07 23:29:54 +00:00
John Baldwin
f39b4f8899 Work around lld's inability to handle undefined weak symbols on risc-v.
lld on RISC-V is not yet able to handle undefined weak symbols for
non-PIC code in the code model (medany/medium) used by the RISC-V
kernel.

Both GCC and clang emit an auipc / addi pair of instructions to
generate an address relative to the current PC with a 31-bit offset.
Undefined weak symbols need to have an address of 0, but the kernel
runs with PC values much greater than 2^31, so there is no way to
construct a NULL pointer as a PC-relative value.  The bfd linker
rewrites the instruction pair to use lui / addi with values of 0 to
force a NULL pointer address.  (There are similar cases for 'ld'
becoming auipc / ld that bfd rewrites to lui / ld with an address of
0.)

To work around this, compile the kernel with -fPIE when using lld.
This does not make the kernel position-independent, but it does
force the compiler to indirect address lookups through GOT entries
(so auipc / ld against a GOT entry to fetch the address).  This
adds extra memory indirections for global symbols, so should be
disabled once lld is finally fixed.

A few 'la' instructions in locore that depend on PC-relative
addressing to load physical addresses before paging is enabled have to
use auipc / addi and not indirect via GOT entries, so change those to
use 'lla' which always uses auipc / addi for both PIC and non-PIC.

Submitted by:	jrtc27
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D23064
2020-01-07 23:18:31 +00:00