Commit Graph

139790 Commits

Author SHA1 Message Date
jhb
dee7ebd77d Use falloc_noinstall + finstall for crypto file descriptors.
Reviewed by:	cem, kib
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23078
2020-01-08 19:03:24 +00:00
jhb
f03bc830e7 Add a reference count to cryptodev sessions.
This prevents use-after-free races with crypto requests (which may
sleep) and CIOCFSESSION as well as races from current CIOCFSESSION
requests.

admbugs:	949
Reported by:	Yuval Kanarenstein <yuvalk@ssd-disclosure.com>
Reviewed by:	cem
MFC after:	1 week
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23077
2020-01-08 18:59:23 +00:00
mav
ef8d51daa1 Fix copy-paste bug in HMB free code.
MFC after:	2 weeks
X-MFC-with:	r356474
2020-01-08 18:26:23 +00:00
markj
ff34767453 linprocfs: Fix some bugs in the maps file implementation.
- Export the offset into the backing object, not the object size.
- Fix a bug where we would print the previous entry's "offset" when a
  map_entry has no object.
- Try to identify shared mappings.  Linux prints "s" when the mapping
  "may be shared".  This attempt is not perfect, for example, we print
  "p" for anonymous memory that may be shared via
  minherit(INHERIT_SHARE).

PR:		240992
Reviewed by:	kib
MFC after:	1 week
MFC note:	no OBJ_ANON in stable/12
Differential Revision:	https://reviews.freebsd.org/D23062
2020-01-08 16:57:08 +00:00
manu
f9baadb811 regulator: fix regnode_method_get_voltage
This method is supposed to write the voltage into uvolt
and return an errno compatible value.

Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D23006
2020-01-08 11:30:42 +00:00
manu
b63b91aad5 rk805: Add regnode_status method
This allow consumers to check if the regulator is enable or not.

Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D23005
2020-01-08 11:30:03 +00:00
manu
4bbb0710ba rk808: Add min/max for the switch regulators
The two switch regulator are always 3.0V.
Add a special case in get_voltage that if min=max we directly
return the value without calculating it.

Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D23004
2020-01-08 11:29:22 +00:00
kp
0a0869f580 vtnet: Pre-allocate debugnet data immediately
Don't wait until the vtnet_debugnet_init() call happens, because at that
point we might already have allocated something from
vtnet_tx_header_zone.

Some systems showed this panic:

        vtnet0: link state changed to UP
        panic: keg vtnet_tx_hdr initialization after use.
        cpuid = 5
        time = 1578427700
        KDB: stack backtrace:
        db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004db427f0
        vpanic() at vpanic+0x17e/frame 0xfffffe004db42850
        panic() at panic+0x43/frame 0xfffffe004db428b0
        uma_zone_reserve() at uma_zone_reserve+0xf6/frame 0xfffffe004db428f0
        vtnet_debugnet_init() at vtnet_debugnet_init+0x77/frame 0xfffffe004db42930
        debugnet_any_ifnet_update() at debugnet_any_ifnet_update+0x42/frame 0xfffffe004db42980
        do_link_state_change() at do_link_state_change+0x1b3/frame 0xfffffe004db429d0
        taskqueue_run_locked() at taskqueue_run_locked+0x178/frame 0xfffffe004db42a30
        taskqueue_run() at taskqueue_run+0x4d/frame 0xfffffe004db42a50
        ithread_loop() at ithread_loop+0x1d6/frame 0xfffffe004db42ab0
        fork_exit() at fork_exit+0x80/frame 0xfffffe004db42af0
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004db42af0
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
        KDB: enter: panic
        [ thread pid 12 tid 100011 ]
        Stopped at      kdb_enter+0x37: movq    $0,0x1084eb6(%rip)
        db>

Reviewed by:	cem, markj
Differential Revision:	https://reviews.freebsd.org/D23073
2020-01-08 10:06:32 +00:00
mav
8f7704790f Minor adjustments to r356474 and r356480.
Reported by:	jkim, imp
MFC after:	2 weeks
X-MFC-with:	r356474
2020-01-07 23:29:54 +00:00
jhb
f3446f6a74 Work around lld's inability to handle undefined weak symbols on risc-v.
lld on RISC-V is not yet able to handle undefined weak symbols for
non-PIC code in the code model (medany/medium) used by the RISC-V
kernel.

Both GCC and clang emit an auipc / addi pair of instructions to
generate an address relative to the current PC with a 31-bit offset.
Undefined weak symbols need to have an address of 0, but the kernel
runs with PC values much greater than 2^31, so there is no way to
construct a NULL pointer as a PC-relative value.  The bfd linker
rewrites the instruction pair to use lui / addi with values of 0 to
force a NULL pointer address.  (There are similar cases for 'ld'
becoming auipc / ld that bfd rewrites to lui / ld with an address of
0.)

To work around this, compile the kernel with -fPIE when using lld.
This does not make the kernel position-independent, but it does
force the compiler to indirect address lookups through GOT entries
(so auipc / ld against a GOT entry to fetch the address).  This
adds extra memory indirections for global symbols, so should be
disabled once lld is finally fixed.

A few 'la' instructions in locore that depend on PC-relative
addressing to load physical addresses before paging is enabled have to
use auipc / addi and not indirect via GOT entries, so change those to
use 'lla' which always uses auipc / addi for both PIC and non-PIC.

Submitted by:	jrtc27
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D23064
2020-01-07 23:18:31 +00:00
mav
0c75e47646 Increate HMB limit from 1% to 5%.
SSD capacity in laptops is growing faster then RAM size, so my original
guess seems too low on second thought.  Hopefully nobody will build large
array of those crappy SSDs.

MFC after:	2 weeks
X-MFC-with:	356474
2020-01-07 23:10:38 +00:00
glebius
33073a3d8c Fix a typo - passing wrong mbuf pointer to needs_udp_csum(). Will
trigger panic only on a kernel with RATELIMIT.

Submitted by:	rrs
2020-01-07 21:29:42 +00:00
mav
a91b9f9262 Add Host Memory Buffer support to nvme(4).
This allows cheapest DRAM-less NVMe SSDs to use some of host RAM (about
1MB per 1GB on the devices I have) for its metadata cache, significantly
improving random I/O performance.  Device reports minimal and preferable
size of the buffer.  The code limits it to 1% of physical RAM by default.
If the buffer can not be allocated or below minimal size, the device will
just have to work without it.

MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
2020-01-07 21:17:11 +00:00
melifaro
1c6615a32e Fix rtsock route message generation for interface addresses.
Reviewed by:	olivier
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D22974
2020-01-07 21:16:30 +00:00
ian
dfd4484205 Add #ifdef option-test wrappers around another call to an arm/unwind.c
function which is only compiled-in with certain options.

Why is it always the most trivial part of a big commit that takes 3 tries
to get right?
2020-01-07 21:13:34 +00:00
mjg
7bf1b9c7de vfs: handle doomed vnodes in vdefer_inactive
vgone dooms the vnode while keeping VI_OWEINACT set and then drops the
interlock.

vputx can pick up the interlock and pass it to vdefer_inactive since the
flag is set.

The race is harmless, just don't defer anything as vgone will take care of it.

Reported by:	pho
2020-01-07 20:24:21 +00:00
emaste
201b368c14 Do not define TCPOUTFLAGS in rack_bbr_common
tcp_outflags isn't used in this source file and compilation failed with
external GCC on sparc64.  I'm not sure why only that case failed (perhaps
inconsistent -Werror config) but it is a legitimate issue to fix.

Reviewed by:	tuexen
Differential Revision:	https://reviews.freebsd.org/D23068
2020-01-07 17:57:08 +00:00
markj
88041c36b1 Decrease logging severity when adding a device or reading config table.
In PR 243056 a user reports some spam from smartpqi(4).  In particular,
the driver warns about an unrecognized PQI_CONF_TABLE_SECTION_SOFT_RESET
section (not yet defined in the driver, but handled in Linux), but this
doesn't cause any problems.  The Linux driver also does not warn about
unrecognized sections.

Also do not log a warning when a device is added, since this is routine.
Lower severity to DISC, to match pqisrc_remove_device().

PR:		243056
Reviewed by:	sbruno
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D23023
2020-01-07 16:07:30 +00:00
markj
c80353a5cc Define a unified pmap structure for i386.
The overloading of struct pmap for PAE and non-PAE pmaps results in
three distinct layouts for the structure, which is embedded in
struct vmspace.  This causes a large number of duplicate structure
definitions in the i386 kernel's CTF type graph.

Since most pmap fields are the same in the two pmaps, simply provide
side-by-side variants of the fields that are distinct, using fixed-size
types.

PR:		242689
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D22896
2020-01-07 15:59:31 +00:00
markj
db5f34cc37 Consistently use pmap_t instead of struct pmap *.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2020-01-07 15:59:02 +00:00
mjg
1204b9c882 vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036
2020-01-07 15:56:24 +00:00
mjg
5569680139 vfs: trylock in vfs_msync and refactor the func
- use LK_NOWAIT instead of calling VOP_ISLOCKED before deciding to lock
- evaluate flags before looping over vnodes

Reviewed by:	kib
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23035
2020-01-07 15:44:19 +00:00
mjg
037b78139b vfs: use a dedicated counter for free vnode recycling
Otherwise vlrureclaim activitity is mixed in and it is hard to tell which
vnodes got reclaimed.
2020-01-07 15:42:01 +00:00
kp
22ef484d6c sifive: Fix incorrect tx/rx ctrl defines
Happily these were never used, but they should be correct anyway.

Reported by:	Nicholas O'Brien <nickisobrien_gmail.com>
Sponsored by:	Axiado
2020-01-07 09:02:14 +00:00
mjg
5c232f8fed zfs: plug a vnode reserve leak in zfs_make_xattrdir 2020-01-07 04:34:29 +00:00
mjg
b1939ef450 vfs: prevent numvnodes and freevnodes re-reads when appropriate
Otherwise in code like this:
if (numvnodes > desiredvnodes)
	vnlru_free_locked(numvnodes - desiredvnodes, NULL);

numvnodes can drop below desiredvnodes prior to the call and if the
compiler generated another read the subtraction would get a negative
value.
2020-01-07 04:34:03 +00:00
mjg
6facf4df8b vfs: annotate numvnodes and vnode_free_list_mtx with __exclusive_cache_line 2020-01-07 04:30:49 +00:00
mjg
d5666b82d2 vfs: eliminate v_tag from struct vnode
There was only one consumer and it was using it incorrectly.

It is given an equivalent hack.

Reviewed by:	jeff
Differential Revision:	https://reviews.freebsd.org/D23037
2020-01-07 04:29:34 +00:00
mjg
bb33b4e736 vfs: add a helper for allocating marker vnodes 2020-01-07 04:27:40 +00:00
andrew
338a93d2b7 Add more Arm arm64 CPU identification values
- Add all the Cortex-A CPU ID register values I can find.
 - Add the Neoverse-N1 ID regiser value [1]
 - Sort macros by register value.

PR:		243065
Submitted by:	Ali Saidi <alisaidi AT amazon.com> [1]
Sponsored by:	DARPA, AFRL (other than [1])
2020-01-06 20:57:59 +00:00
kaktus
1d97ce15ed kern_sysctl: make sysctl.debug work as intended
r136999 introduced SYSTCL_DEBUG but apparently "opt_sysctl.h" was never
included making the option ignored.

r322954 introduced sysctl.reuse_test with OID number equal to 0, effectively
shadowing the very special sysctl.debug one. Use OID_AUTO as it doesn't need
any special treatment.

Reviewed by:	kib (mentor)
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23056
2020-01-06 19:47:59 +00:00
jhb
0b722f1c6f Simplify arguments to signal handlers on mips.
- Use ksi_addr directly as si_addr in the siginfo instead of the
  'badvaddr' register.
- Remove a duplicate assignment of si_code.
- Use ksi_addr as the 4th argument to the old-style handler instead of
  'badvaddr'.

Reviewed by:	brooks, kevans
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D23013
2020-01-06 18:02:02 +00:00
rrs
c2ceba81e5 This catches rack up in the recent changes to ECN and
also commonizes the functions that both the freebsd and
rack stack uses.

Sponsored by:Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D23052
2020-01-06 15:29:14 +00:00
rrs
583431074c This change adds a small feature to the tcp logging code. Basically
a connection can now have a separate tag added to the id.

Obtained from:	Lawrence Stewart
Sponsored by:	Netflix Inc
Differential Revision:	https://reviews.freebsd.org/D22866
2020-01-06 12:48:06 +00:00
kaktus
b05c0a53bd sysctl: mark more nodes as MPSAFE
vm.kvm_size and vm.kvm_free are read only and marked as MPSAFE on i386
already. Mark them as that on amd64 and arm64 too to avoid locking Giant.

Reviewed by:	kib (mentor)
Approved by:	kib (mentor)
Differential Revision:	https://reviews.freebsd.org/D23039
2020-01-06 10:52:13 +00:00
hselasky
6a61e40c31 Add own counter for cancelled USB transfers.
Do not count these as errors.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-01-06 09:49:20 +00:00
jeff
2d206baae6 Fix uma boot pages calculations on NUMA machines that also don't have
MD_UMA_SMALL_ALLOC.  This is unusual but not impossible.  Fix the alignemnt
of zones while here.  This was already correct because uz_cpu strongly
aligned the zone structure but the specified alignment did not match
reality and involved redundant defines.

Reviewed by:	markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D23046
2020-01-06 02:51:19 +00:00
jeff
4281552371 The fix in r356353 was insufficient. Not every architecture returns 0 for
EARLY_COUNTER.  Only amd64 seems to.

Suggested by:	markj
Reported by:	lwhsu
Reviewed by:	markj
PR:		243117
2020-01-05 22:54:25 +00:00
bz
11061156a3 netgraph/ng_bridge: Reestablish old ABI
In order to be able to merge r353026 bring back support for the old
cookie API for a transition period in 12.x releases (and possibly 13)
before the old API can be removed again entirely.

Suggested by:	julian
Submitted by:	Lutz Donnerhacke (lutz donnerhacke.de)
PR:		240787
Reviewed by:	julian
MFC after:	2 weeks
X-MFC with:	r353026
Differential Revision:	https://reviews.freebsd.org/D21961
2020-01-05 19:14:16 +00:00
tuexen
740267e0fe Don't make the sendall iterator as being up if it could not be started.
MFC after:		1 week
2020-01-05 14:08:01 +00:00
tuexen
c2e0570386 Return -1 consistently if an error occurs.
MFC after:	1 week
2020-01-05 14:06:40 +00:00
tuexen
02e5dff4f0 Ensure that we don't miss a trigger for kicking off the SCTP iterator.
Reported by:		nwhitehorn@
MFC after:		1 week
2020-01-05 13:56:32 +00:00
mjg
06b12c0bed locks: add default delay struct
Use it for all primitives. This makes everything fit in 8 bytes.
2020-01-05 12:48:19 +00:00
mjg
cd5aa4442c locks: convert delay times to u_short
int is just a waste of space for this purpose.
2020-01-05 12:47:29 +00:00
mjg
2405dcd4f0 Mark mtxpool_sleep as read mostly, not frequently.
The latter is not justified.
2020-01-05 12:46:35 +00:00
kevans
24ac2fb6ec shm: correct KPI mistake introduced around memfd_create
When file sealing and shm_open2 were introduced, we should have grown a new
kern_shm_open2 helper that did the brunt of the work with the new interface
while kern_shm_open remains the same. Instead, more complexity was
introduced to kern_shm_open to handle the additional features and consumers
had to keep changing in somewhat awkward ways, and a kern_shm_open2 was
added to wrap kern_shm_open.

Backpedal on this and correct the situation- kern_shm_open returns to the
interface it had prior to file sealing being introduced, and neither
function needs an initial_seals argument anymore as it's handled in
kern_shm_open2 based on the shmflags.
2020-01-05 04:06:40 +00:00
kevans
dcd4cbf26b shmfd/mmap: restrict maxprot with MAP_SHARED + F_SEAL_WRITE
If a write seal is set on a shared mapping, we must exclude VM_PROT_WRITE as
the fd is effectively read-only. This was discovered by running
devel/linux-ltp, which mmap's with acceptable protections specified then
attempts to raise to PROT_READ|PROT_WRITE with mprotect(2), which we
allowed.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22978
2020-01-05 03:15:16 +00:00
mjg
eaef9d7970 vfs: factor out avoidable branches in _vn_lock 2020-01-05 01:00:11 +00:00
mjg
22dbb8bf5e vfs: drop thread argument from vinactive 2020-01-05 00:59:47 +00:00
mjg
12c96f7830 vfs: patch up vnode count assertions to report found value 2020-01-05 00:59:16 +00:00