133411 Commits

Author SHA1 Message Date
Mitchell Horne
9ead45af7b RISC-V: copy kernelname from the environment
This allows kern.bootfile to report the correct value.
2020-08-15 16:15:34 +00:00
Mitchell Horne
3d89a9759f arm64: parse HWCAP values using user_cpu_desc
The hard work of parsing fields per-CPU, handling heterogeneous
features, and excluding features from userspace is already done by
update_special_regs. We can build our set of HWCAPs from the result.

This exposed a small bug in update_special_regs, in which the
generated bitmask was not wide enough, and as a result some bits
weren't being exposed in user_cpu_desc. Fix this.

While here, adjust some formatting.

Reviewed by:	andrew
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26069
2020-08-15 15:06:39 +00:00
Mitchell Horne
6194973636 arm64: update instruction set attribute register definitions
This adds definitions for the latest additions to the AA64ISAR[01] ID
registers. This brings these registers in sync with ARMv8.6 initial spec
release.

A future change will parse many of these fields for HWCAP features.

Reviewed by:	andrew, manu, markj (all previous versions)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26029
2020-08-15 14:57:53 +00:00
Alexander V. Chernikov
bec053ffe0 Make net.inet6.ip6.deembed_scopeid behaviour default & remove sysctl.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D25637
2020-08-15 11:37:44 +00:00
Michael Tuexen
04996cb74b Enter epoch earlier. This is needed because we are exiting it also
in error cases.

MFC after:	1 week
2020-08-15 11:22:07 +00:00
Li-Wen Hsu
fc4c42c9e3 Remove redeclaration found by gcc build
Reviewed by:	Jacob Keller <jacob.e.keller@intel.com>
Suggested editing from:	Krzysztof Galazka <krzysztof.galazka@intel.com>
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25954
2020-08-15 03:26:00 +00:00
Li-Wen Hsu
60f3863fa2 Remove redeclaration found by gcc build
Reviewed by:	erj
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25953
2020-08-15 03:20:39 +00:00
Jason A. Harmening
f3ba85ccc8 kenv: avoid sleepable alloc for integer tunables
Avoid performing a potentially-blocking malloc for kenv lookups that will only
perform non-destructive integer conversions on the returned buffer. Instead,
perform the strtoq() in-place with the kenv lock held.

While here, factor the logic around kenv_lock acquire and release into
kenv_acquire() and kenv_release(), and use these functions for some light
cleanup. Collapse getenv_string_buffer() into kern_getenv(), as the former
no longer has any other callers and the only additional task performed by
the latter is a WITNESS check that hasn't been useful since r362231.

PR:		248250
Reported by:	gbe
Reviewed by:	mjg
Tested by:	gbe
Differential Revision:	https://reviews.freebsd.org/D26010
2020-08-14 21:37:38 +00:00
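
The idea behind f3ba85ccc8 (convert the integer in place while the lock is held, instead of copying the string into a freshly allocated buffer first) can be sketched in userland C. This is a minimal sketch, not the kernel code: the pthread mutex stands in for the kenv lock, strtoll() stands in for strtoq(), and `env_table`, `env_find_locked()` and `env_get_int()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel environment: "name=value" strings. */
static const char *env_table[] = {
	"hw.ncpu_hint=4", "debug.level=0x10", "bad.value=abc", NULL
};
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold env_lock. Returns a pointer into the table, or NULL. */
static const char *
env_find_locked(const char *name)
{
	size_t len = strlen(name);

	for (int i = 0; env_table[i] != NULL; i++) {
		if (strncmp(env_table[i], name, len) == 0 &&
		    env_table[i][len] == '=')
			return (env_table[i] + len + 1);
	}
	return (NULL);
}

/*
 * Parse an integer tunable without copying the value out: the
 * conversion only reads the buffer, so it can run under the lock
 * and no allocation is needed. Returns 1 on success, 0 otherwise.
 */
static int
env_get_int(const char *name, long long *out)
{
	const char *val;
	char *end;
	long long v;
	int ok = 0;

	assert(name != NULL && out != NULL);
	pthread_mutex_lock(&env_lock);
	val = env_find_locked(name);
	if (val != NULL) {
		v = strtoll(val, &end, 0);
		if (end != val && *end == '\0') {
			*out = v;
			ok = 1;
		}
	}
	pthread_mutex_unlock(&env_lock);
	return (ok);
}
```

The trade-off is a slightly longer lock hold time in exchange for never sleeping in an allocator on this path.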
Alexander V. Chernikov
2f23f45b20 Simplify dom_<rtattach|rtdetach>.
Remove unused arguments from dom_rtattach/dom_rtdetach functions and make
  them return/accept 'struct rib_head' instead of 'void **'.
Declare inet/inet6 implementations in the relevant _var.h headers similar
  to domifattach / domifdetach.
Add rib_subscribe_internal() function to accept subscriptions to the rnh
  directly.

Differential Revision:	https://reviews.freebsd.org/D26053
2020-08-14 21:29:56 +00:00
Conrad Meyer
ea7b737a6f vm_pageout: Correct threshold calculation on single-CPU systems
Reported by:	Michael Butler
X-MFC-With:	r364129
2020-08-14 18:48:48 +00:00
Mark Johnston
85232c2ff1 Rename the pipe_map field of struct pipe.
This is to avoid conflicts with an upcoming macro.  pipe_pages is a
more accurate name since the field tracks pages wired into the kernel as
part of a process-to-process copy operation.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-08-14 14:50:41 +00:00
Emmanuel Vadot
9b9210015d Bump __FreeBSD_version after r364232
We now have everything needed for DRM from Linux v5.4, let the
ports tree know that.
2020-08-14 08:49:40 +00:00
Emmanuel Vadot
0e123c13fe linuxkpi: Add a few wait_bit functions
The Linux function does a lot more than that, as multiple waitqueues could be fetched
from a static table based on the hash of the argument, but since in DRM it's only used
in one place, just add a single variable.
We will probably need to change that in the future, but it's OK with DRM even with current
Linux.

Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26054
2020-08-14 08:48:17 +00:00
Bryan Drewery
3869d41465 lagg: Avoid adding a port to a lagg device being destroyed.
The lagg_clone_destroy() handles detach and waiting for ifconfig callers
to drain already.

This narrows the race for 2 panics that the tests triggered. Both were a
consequence of adding a port to the lagg device after it had already detached
from all of its ports. The link state task would run after lagg_clone_destroy()
free'd the lagg softc.

    kernel:trap_fatal+0xa4
    kernel:trap_pfault+0x61
    kernel:trap+0x316
    kernel:witness_checkorder+0x6d
    kernel:_sx_xlock+0x72
    if_lagg.ko:lagg_port_state+0x3b
    kernel:if_down+0x144
    kernel:if_detach+0x659
    if_tap.ko:tap_destroy+0x46
    kernel:if_clone_destroyif+0x1b7
    kernel:if_clone_destroy+0x8d
    kernel:ifioctl+0x29c
    kernel:kern_ioctl+0x2bd
    kernel:sys_ioctl+0x16d
    kernel:amd64_syscall+0x337

    kernel:trap_fatal+0xa4
    kernel:trap_pfault+0x61
    kernel:trap+0x316
    kernel:witness_checkorder+0x6d
    kernel:_sx_xlock+0x72
    if_lagg.ko:lagg_port_state+0x3b
    kernel:do_link_state_change+0x9b
    kernel:taskqueue_run_locked+0x10b
    kernel:taskqueue_run+0x49
    kernel:ithread_loop+0x19c
    kernel:fork_exit+0x83

PR:		244168
Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D25284
2020-08-13 22:06:27 +00:00
Conrad Meyer
8a0edc914f Add prng(9) API
Add prng(9) as a replacement for random(9) in the kernel.

There are two major differences from random(9) and random(3):

- General prng(9) APIs (prng32(9), etc) do not guarantee an
  implementation or particular sequence; they should not be used for
  repeatable simulations.

- However, specific named API families are also exposed (for now: PCG),
  and those are expected to be repeatable (when so-guaranteed by the named
  algorithm).

Some minor differences from random(3) and earlier random(9):

- PRNG state for the general prng(9) APIs is per-CPU; this eliminates
  contention on PRNG state in SMP workloads.  Each PCPU generator in an
  SMP system produces a unique sequence.

- Better statistical properties than the Park-Miller ("minstd") PRNG
  (longer period, uniform distribution in all bits, passes
  BigCrush/PractRand analysis).

- Faster than Park-Miller ("minstd") PRNG -- no division is required to
  step PCG-family PRNGs.

For now, random(9) becomes a thin shim around prng32().  Eventually I
would like to mechanically switch consumers over to the explicit API.

Reviewed by:	kib, markj (previous version both)
Discussed with:	markm
Differential Revision:	https://reviews.freebsd.org/D25916
2020-08-13 20:48:14 +00:00
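
To illustrate the "no division is required" point in 8a0edc914f, the sketch below puts a Park-Miller ("minstd") step next to a PCG step. It uses the widely published PCG32 (XSH-RR) variant; whether prng(9) uses exactly this variant and these constants is an assumption, and the names `minstd_next()`, `pcg32_next()` and `pcg32_seed()` are illustrative, not the kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Park-Miller "minstd": every step needs a modulo by 2^31 - 1. */
static uint32_t
minstd_next(uint32_t *state)
{
	assert(*state != 0);	/* zero is a fixed point of the generator */
	*state = (uint32_t)(((uint64_t)*state * 16807u) % 2147483647u);
	return (*state);
}

/* PCG32 (XSH-RR): multiply, add, shifts and a rotate; no division. */
struct pcg32 {
	uint64_t state;
	uint64_t inc;	/* stream selector, always odd */
};

static uint32_t
pcg32_next(struct pcg32 *g)
{
	uint64_t old = g->state;
	uint32_t xorshifted, rot;

	g->state = old * 6364136223846793005ULL + g->inc;
	xorshifted = (uint32_t)(((old >> 18) ^ old) >> 27);
	rot = (uint32_t)(old >> 59);
	/* Rotate right by rot; the mask keeps the left shift below 32. */
	return ((xorshifted >> rot) | (xorshifted << ((32 - rot) & 31)));
}

static void
pcg32_seed(struct pcg32 *g, uint64_t seed, uint64_t stream)
{
	g->state = 0;
	g->inc = (stream << 1) | 1;
	(void)pcg32_next(g);
	g->state += seed;
	(void)pcg32_next(g);
}
```

Giving each CPU its own `struct pcg32` with a distinct stream value is one way to get the per-CPU, contention-free state the commit message describes.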
Alexander V. Chernikov
6cbadc4234 Move rtzone handling code to net/route_ctl.c
After moving the route control plane code from net/route.c,
 all rtzone users ended up being in net/route_ctl.c.
Move uma(9) rtzone setup/teardown code to net/route_ctl.c as well
 to have everything in a single place.

While here, remove custom initializers from the zone.
They were added originally to avoid setup/teardown of costly per-cpu counters.
With these counters removed, the only remaining job was avoiding rte mutex
 setup/teardown. Mutex setup is relatively cheap. Additionally, this mutex
 will soon be removed. With that in mind, there is no sense in keeping
 custom zone callbacks.

Differential Revision:	https://reviews.freebsd.org/D26051
2020-08-13 18:35:29 +00:00
Richard Scheffenegger
a459638fc4 TCP Cubic: Have Fast Convergence Heuristic work for ECN, and align concave region
The Cubic concave region was not aligned nicely for the very first exit from
slow start, where a 50% cwnd reduction is done instead of the normal 30%.

This addresses an issue, where a short line-rate burst could result from that
sudden jump of cwnd.

In addition, the Fast Convergence Heuristic has been expanded to work also
with ECN induced congestion response.

Submitted by:	chengc_netapp.com
Reported by:	chengc_netapp.com
Reviewed by:	tuexen (mentor), rgrimes (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25976
2020-08-13 16:45:55 +00:00
Richard Scheffenegger
2bb6dfabbe TCP Cubic: After leaving slowstart fix unintended cwnd jump.
Initializing K to zero in D23655 introduced a miscalculation,
where cwnd would suddenly jump to cwnd_max instead of gradually
increasing, after leaving slow-start.

Properly calculating K instead of resetting it to zero resolves
this issue. Also make sure that cwnd is recalculated at the
earliest opportunity once slow-start is over.

Reported by:	chengc_netapp.com
Reviewed by:	chengc_netapp.com, tuexen (mentor), rgrimes (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25746
2020-08-13 16:38:51 +00:00
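
The effect of K described in 2bb6dfabbe can be checked with the window function from RFC 8312, W(t) = C*(t - K)^3 + W_max with K = cbrt(W_max*(1 - beta)/C). The sketch below uses the RFC's floating-point constants (C = 0.4, beta = 0.7); the kernel's fixed-point arithmetic and exact constants are not reproduced here, and the function names are hypothetical. A small Newton iteration stands in for cbrt() so the sketch does not depend on libm.

```c
#include <assert.h>

#define CUBIC_C		0.4	/* RFC 8312 scaling constant */
#define CUBIC_BETA	0.7	/* RFC 8312 multiplicative decrease factor */

/* Cube root of v >= 0 by Newton iteration (stand-in for cbrt()). */
static double
cube_root(double v)
{
	double x = v > 1.0 ? v : 1.0;	/* start at or above the root */

	assert(v >= 0.0);
	for (int i = 0; i < 200; i++)
		x -= (x * x * x - v) / (3.0 * x * x);
	return (x);
}

/* Time (seconds) for the window to grow back to w_max after a reduction. */
static double
cubic_k(double w_max, double beta)
{
	return (cube_root(w_max * (1.0 - beta) / CUBIC_C));
}

/* Congestion window (segments) t seconds after the reduction. */
static double
cubic_window(double t, double k, double w_max)
{
	double d = t - k;

	return (CUBIC_C * d * d * d + w_max);
}
```

With W_max = 100 segments, a correctly computed K gives W(0) = beta * W_max = 70, whereas K = 0 gives W(0) = W_max = 100, which is the sudden jump the commit fixes. Passing beta = 0.5 models the 50% reduction on the first exit from slow start.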
Richard Scheffenegger
f359d6ebbc Improve SACK support code for RFC6675 and PRR
Adding proper accounting of sacked_bytes and (per-ACK)
delivered data to the SACK scoreboard. This will
allow more aspects of RFC6675 to be implemented as well
as Proportional Rate Reduction (RFC6937).

Prior to this change, the pipe calculation controlled with
net.inet.tcp.rfc6675_pipe was also susceptible to incorrect
results when more than 3 (or 4) holes in the sequence space
were present, which can no longer all fit into a single
ACK's SACK option.

Reviewed by:	kbowling, rgrimes (mentor)
Approved by:	rgrimes (mentor, blanket)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D18624
2020-08-13 16:30:09 +00:00
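
The "3 (or 4) holes" limit mentioned in f359d6ebbc follows from TCP option space: 40 bytes of options at most, 2 bytes of SACK option header, and 8 bytes per SACK block. The sketch below works through that arithmetic; the helper name is hypothetical, and the 12-byte figure assumes the usual NOP-padded timestamp option.

```c
#include <assert.h>

#define TCP_MAX_OPTION_BYTES	40	/* 60-byte max header - 20 fixed */
#define SACK_HEADER_BYTES	2	/* kind + length */
#define SACK_BLOCK_BYTES	8	/* left edge + right edge, 4 bytes each */
#define TIMESTAMP_PADDED_BYTES	12	/* 10-byte option + 2 NOPs (assumed) */

/* Number of SACK blocks that fit next to other_option_bytes of options. */
static int
max_sack_blocks(int other_option_bytes)
{
	int avail;

	assert(other_option_bytes >= 0);
	avail = TCP_MAX_OPTION_BYTES - other_option_bytes - SACK_HEADER_BYTES;
	return (avail < 0 ? 0 : avail / SACK_BLOCK_BYTES);
}
```

Without timestamps this yields 4 blocks, with timestamps 3, so any scoreboard with more holes than that cannot be described by a single ACK.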
Mitchell Horne
958a094323 Enable interrupts while handling traps
I observed hangs post-r362977 in QEMU with -smp 2, in which one thread
would acquire write access to an rm_lock (sysctllock) and get stuck
waiting in smp_rendezvous_cpus while the other CPU was servicing a trap.
The other thread was waiting for read access to the same lock, thus
causing deadlock.

It's clear that this is just one symptom of a larger problem. The
general expectation of MI kernel code is that interrupts are enabled.
Violating this assumption will at best create some additional latency,
but otherwise might cause locking or other unforeseen issues. All other
architectures do so for some subset of trap values, but this somehow got
missed in the RISC-V port. Enable interrupts now during kernel page
faults and for all user trap types.

The code in exception.S already knows to disable interrupts while
handling the return from exception, so there are no changes required
there.

Reviewed by:	jhb, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26017
2020-08-13 14:21:05 +00:00
Mitchell Horne
99c9fdd09a Small fixes in locore.S
- Properly set up the frame pointer
- Hang if we return from mi_startup
- Whitespace

Clearing the frame pointer marks the end of the backtrace. This fixes
"bt 0" in ddb, which previously would unwind one frame too far.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D26016
2020-08-13 14:17:36 +00:00
Mateusz Guzik
b38ad2683a vfs: add missing pwd_drop on error in namei_setup
Reported by:	pho
2020-08-13 10:24:45 +00:00
Alexander Motin
654f53ab6a Fill device serial_num and device_id in NVMe XPT.
This allows reporting GEOM::lunid for nda(4), the same as for nvd(4).  Since
NVMe now allows multiple LUs (namespaces) with multiple paths, unique
LU identification is important.  The serial_num field is filled the same
as before with the controller serial number, while device_id is based
on namespace GUID and/or EUI64 fields as recommended by "NVM Express:
SCSI Translation Reference" and matching nvd(4) at the end.

MFC after:	1 week
2020-08-13 02:32:46 +00:00
John Baldwin
40db51b42f Check that the frame pointer is within the current stack.
This same check is used on other architectures.  Previously this would
permit a stack frame to unwind into any arbitrary kernel address
(including unmapped addresses).

Reviewed by:	mhorne
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D25996
2020-08-12 20:33:29 +00:00
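
The kind of check 40db51b42f describes can be sketched as a frame-pointer walk that refuses to follow a pointer outside the current stack. The frame layout, the bounds convention and the names `struct frame`, `fp_in_stack()` and `unwind()` below are assumptions for illustration, not the kernel's definitions. A zero frame pointer ends the walk, which matches the convention noted in 99c9fdd09a.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record; real layouts are architecture-specific. */
struct frame {
	uintptr_t f_fp;	/* caller's frame pointer, 0 for the outermost frame */
	uintptr_t f_pc;	/* return address */
};

/* True if a whole, aligned frame record at fp lies inside [lo, hi). */
static int
fp_in_stack(uintptr_t fp, uintptr_t lo, uintptr_t hi)
{
	assert(hi - lo >= sizeof(struct frame));
	return (fp >= lo && fp <= hi - sizeof(struct frame) &&
	    fp % _Alignof(struct frame) == 0);
}

/* Collect up to max return addresses; stop at fp == 0 or a bad pointer. */
static size_t
unwind(uintptr_t fp, uintptr_t lo, uintptr_t hi, uintptr_t *pcs, size_t max)
{
	size_t n = 0;

	while (fp != 0 && n < max && fp_in_stack(fp, lo, hi)) {
		const struct frame *f = (const struct frame *)fp;

		pcs[n++] = f->f_pc;
		fp = f->f_fp;
	}
	return (n);
}
```

Without the bounds check, a corrupted `f_fp` would be dereferenced as is, which is how an unwinder ends up reading arbitrary or unmapped addresses.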
John Baldwin
367de39efa Use uintptr_t instead of uint64_t for pointers in stack frames.
Reviewed by:	mhorne
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D25995
2020-08-12 20:29:49 +00:00
Alexander Motin
97dc595da2 Report cpi->hba_* for nda(4) because why not.
MFC after:	1 week
2020-08-12 20:05:43 +00:00
Alexander Motin
1c7decd4de Report proper stripesize for nda(4).
Same as for nvd(4) report NPWG if present, otherwise NOIOB.

MFC after:	1 week
2020-08-12 19:37:57 +00:00
Alexander Motin
e8cc9e1d84 Report attachment for nvd same as reported for nda.
MFC after:	1 week
2020-08-12 19:11:53 +00:00
John Baldwin
90699f2a76 Correct padding length for RISC-V PCPU data.
There was an additional 7 bytes of compiler-inserted padding at the
end of the structure visible via 'ptype /o' in gdb.

Reviewed by:	mhorne
Obtained from:	CheriBSD
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D25867
2020-08-12 18:45:36 +00:00
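
Compiler-inserted tail padding of the kind 90699f2a76 mentions is easy to reproduce outside the kernel. The structure below is a hypothetical two-member example, not the RISC-V `struct pcpu`; it only shows how sizeof can exceed the end of the last member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure: an 8-byte member followed by a 1-byte member. */
struct demo_pcpu {
	uint64_t dp_counter;
	uint8_t  dp_flag;
	/* The compiler pads here so sizeof is a multiple of the alignment. */
};

/* Bytes between the end of the last member and the end of the struct. */
static size_t
demo_tail_padding(void)
{
	return (sizeof(struct demo_pcpu) -
	    (offsetof(struct demo_pcpu, dp_flag) + sizeof(uint8_t)));
}
```

On ABIs where the structure is 8-byte aligned this returns 7. Any hand-written padding array sized from the members alone would miss those bytes, which is the sort of mismatch 'ptype /o' makes visible.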
Mitchell Horne
f7d79f6c6d Correctly set error in rt_mpath_unlink
It is possible for rn_delete() to return NULL. If this happens, then set
*perror to ESRCH, as is done in the rest of the function.

Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D25871
2020-08-12 16:43:20 +00:00
Mark Johnston
9eb0cd08ae linprocfs: Fix some inaccuracies in meminfo.
- Fill out MemFree correctly.  Delete an ancient comment suggesting that
  we don't want to advertise the true quantity of free memory.
- Populate the Buffers field by reading vfs.bufspace.
- The page cache consists of all pages in page queues, not just the
  inactive queue.

PR:		248463
Reported and tested by:	danfe
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-08-12 16:08:44 +00:00
Vincenzo Maffione
6d84e76a25 iflib: netmap: improve rxsync to support IFLIB_HAS_RXCQ
For drivers with IFLIB_HAS_RXCQ set, there is a separate completion
queue. In this case, the netmap rxsync routine needs to update
rxq->ifr_cq_cidx in the same way it is updated by iflib_rxeof().
This improves the situation for vmx(4) and bnxt(4) drivers, which
use iflib and have the IFLIB_HAS_RXCQ bit set.

PR:	248494
MFC after:	3 weeks
2020-08-12 14:45:31 +00:00
Vincenzo Maffione
530960be8d iflib: refactor netmap_fl_refill and fix off-by-one issue
First, fix the initialization of the fl->ifl_rxd_idxs array,
which was affected by an off-by-one bug.
While there, refactor the function to use better names for
local variables, optimize the variable assignments, and
merge the bus_dmamap_sync() inner loop with the outer one.

PR:	248494
MFC after:	3 weeks
2020-08-12 14:17:38 +00:00
Andrew Turner
da11e1f9ee Add support for Cortex-A76/Neoverse-N1 to hwpmc
This adds support for the Cortex-A76 and Neoverse-N1 PMU counters to pmc.

While here, add more PMCR_IDCODE values and check that the implementer code is
correct before setting the PMU type.

Reviewed by:	bz, emaste (looks reasonable to me)
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D25959
2020-08-12 10:17:17 +00:00
Andriy Gapon
d9fe3aed75 aw_cir: in the pulse encoding the actual length is one greater than value
While here change type of some variables from long to int, it's sufficient.
Also, add length reporting to a couple of debug printfs.

MFC after:	3 weeks
2020-08-12 09:57:28 +00:00
Andriy Gapon
852d135791 aw_cir: lower activation threshold to support NECx protocol
In NECx the leading mark has length of 8T as opposed to 16T in NEC,
where T is  562.5 us.  So, 4.5 ms.
Our threshold was set to 128 * 42.7 us (derived from the sampling
frequency of 3/128 MHz).  So, ~5.5 ms.

The new threshold is set to AW_IR_L1_MIN.  I think that's a good enough
lower bound for detecting the leading pulse.

Also, calculations of active_delay (which is activation delay) are fixed.
Previously they would be wrong if AW_IR_ACTIVE_T was anything but zero,
because the value was already bit-shifted.

Finally, I am not sure why the activation delay was divided by two when
calculating the initial pulse length.  I have not found anything that
would explain or justify it.  So, I removed that division.

MFC after:	3 weeks
2020-08-12 09:56:21 +00:00
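
The timing argument in 852d135791 can be worked through numerically. The sketch below only restates the figures from the commit message (T = 562.5 us, a 3/128 MHz sampling clock, an old threshold of 128 sample periods); the function names are hypothetical, and it assumes a pulse must last at least as long as the threshold to be treated as a leading mark.

```c
#include <assert.h>

/* All durations are in microseconds. */
#define NEC_T_US	562.5

static double
nec_leading_mark_us(void)
{
	return (16 * NEC_T_US);		/* NEC: 16T */
}

static double
necx_leading_mark_us(void)
{
	return (8 * NEC_T_US);		/* NECx: 8T */
}

/* One sample period of a 3/128 MHz clock is 128/3 us (about 42.7 us). */
static double
sample_period_us(void)
{
	return (128.0 / 3.0);
}

/* Old activation threshold: 128 sample periods (about 5.5 ms). */
static double
old_threshold_us(void)
{
	return (128 * sample_period_us());
}

/* Assumed rule: a mark is detected only if it reaches the threshold. */
static int
mark_detected(double mark_us, double threshold_us)
{
	assert(threshold_us > 0.0);
	return (mark_us >= threshold_us);
}
```

The 9 ms NEC mark clears the old threshold, while the 4.5 ms NECx mark does not, which is why the threshold had to be lowered.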
Andriy Gapon
8b616b263d aw_cir: minor cleanups
MFC after:	1 week
2020-08-12 09:52:39 +00:00
Andriy Gapon
012fba460a aw_cir: add support for allwinner,sun6i-a31-ir (found, e.g., on h3)
MFC after:	1 week
2020-08-12 09:52:12 +00:00
Andriy Gapon
2ad1660ae4 gpiokeys: add evdev support
Only linux,code is supported as it maps 1:1 to evdev key codes.
No reverse mapping for freebsd,code yet.

Reviewed by:	wulf
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D25940
2020-08-12 09:49:25 +00:00
Andriy Gapon
ef32901b25 cp2112: a number of cleanups and improvements
- hoist all request / response structures from function level to top level
- replace magic numeric literals with constants
- regroup types, data and functions
- remove setting of the id field in responses as they are completely
  overwritten with data from the device
- centralize setting of the id field as it is always set to the value of
  request type
- fix setting and querying of open-drain vs push-pull configuration of
  an output pin -- it's always in one of those configurations
- detect special pin configurations: a pin in a special configuration is
  neither general purpose input nor output
- there is still no support for setting special configurations

MFC after:	2 weeks
2020-08-12 09:07:07 +00:00
Mateusz Guzik
36f47512d9 vfs: inline vrefcnt
2020-08-12 04:53:20 +00:00
Mateusz Guzik
4c2d103a02 vfs: garbage collect vrefactn
2020-08-12 04:53:02 +00:00
Mateusz Guzik
6883f07e97 vfs: reimplement vref on top of vget
No change in generated assembly.
2020-08-12 04:52:35 +00:00
Rick Macklem
90cf38f22e Fix a bug introduced by r363001 for the ext_pgs case.
r363001 added support for ext_pgs mbufs to nfsm_uiombuf().
By inspection, I noticed that "mlen" was not set non-zero and, as such, there
would be an iteration of the loop that did nothing.
This patch sets it.
This bug would have no effect on the system, since the ext_pgs mbuf code
is not yet enabled.
2020-08-12 04:35:49 +00:00
Conrad Meyer
0ac9e27ba9 devfs: Abstract locking assertions
The conversion was largely mechanical: sed(1) with:

  -e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g'
  -e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g'

The definitions of these abstractions in fs/devfs/devfs_int.h are the
only non-mechanical change.

No functional change.
2020-08-12 00:32:31 +00:00
Conrad Meyer
b7883452d4 Back out unrelated change
Reported by:	kib, markj
X-MFC-With:	r364129
2020-08-12 00:21:30 +00:00
Conrad Meyer
6eecc07f65 smp.h: Reconcile definition and declaration of smp_ncpus
The variable is defined unconditionally; declare it unconditionally as well.

It is already initialized to the correct value (1) for !SMP builds.

No functional change.
2020-08-11 20:42:21 +00:00
Conrad Meyer
0292c54bdb Add support for multithreading the inactive queue pageout within a domain.
In very high throughput workloads, the inactive scan can become overwhelmed
as you have many cores producing pages and a single core freeing.  Since
Mark's introduction of batched pagequeue operations, we can now run multiple
inactive threads working on independent batches.

To avoid confusing the pid and other control algorithms, I (Jeff) do this in
a mpi-like fan out and collect model that is driven from the primary page
daemon.  It decides whether the shortfall can be overcome with a single
thread and if not dispatches multiple threads and waits for their results.

The heuristic is based on timing the pageout activity and averaging a
pages-per-second variable which is exponentially decayed. This is visible in
sysctl and may be interesting for other purposes.

I (Jeff) have verified that this does indeed double our paging throughput
when used with two threads. With four we tend to run into other contention
problems.  For now I would like to commit this infrastructure with only a
single thread enabled.

The number of worker threads per domain can be controlled with the
'vm.pageout_threads_per_domain' tunable.

Submitted by:	jeff (earlier version)
Discussed with:	markj
Tested by:	pho
Sponsored by:	probably Netflix (based on contemporary commits)
Differential Revision:	https://reviews.freebsd.org/D21629
2020-08-11 20:37:45 +00:00
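
The heuristic in 0292c54bdb (an exponentially decayed pages-per-second average that drives the decision to fan out) can be sketched as follows. This is only an illustration of the idea: the decay shift, the interval and the names `pps_decay()` and `threads_needed()` are assumptions, and the actual logic in the page daemon may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Exponentially decayed average: move the old value toward the new
 * sample by 1/2^shift of the difference.
 */
static uint64_t
pps_decay(uint64_t avg, uint64_t sample, unsigned shift)
{
	assert(shift < 64);
	if (sample >= avg)
		return (avg + ((sample - avg) >> shift));
	return (avg - ((avg - sample) >> shift));
}

/*
 * Estimate how many threads are needed to free shortfall_pages within
 * interval_ms, given the measured per-thread rate pps. One thread is
 * used unless the estimate says it cannot keep up.
 */
static unsigned
threads_needed(uint64_t shortfall_pages, uint64_t pps, uint64_t interval_ms,
    unsigned max_threads)
{
	uint64_t per_thread, n;

	per_thread = pps * interval_ms / 1000;
	if (per_thread == 0 || max_threads <= 1)
		return (1);
	n = (shortfall_pages + per_thread - 1) / per_thread;	/* round up */
	if (n < 1)
		n = 1;
	if (n > max_threads)
		n = max_threads;
	return ((unsigned)n);
}
```

Capping the result at `max_threads` mirrors the role of the 'vm.pageout_threads_per_domain' tunable, which the commit leaves at a single thread for now.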
Alex Richardson
91b31c100b Allow linking the kernel with a linker that doesn't support -z ifunc-noplt
This can happen when linking with upstream LLD < 9.0.

Reviewed By:	markj
Differential Revision: https://reviews.freebsd.org/D25985
2020-08-11 16:47:00 +00:00
Alex Richardson
1a18ab420b Allow overriding the tool used for stripping binaries
Since the make variable STRIP is already used for other purposes, this
uses STRIPBIN (which is also used for the same purpose by install(1)).
This allows using LLVM objcopy to strip binaries instead of the in-tree
elftoolchain objcopy. We make use of this in CheriBSD since passing
binaries generated by our toolchain to elftoolchain strip sometimes results
in assertion failures.

This allows working around https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248516
by specifying STRIPBIN=/path/to/llvm-strip

Obtained from:	CheriBSD
Reviewed By:	emaste, brooks
Differential Revision: https://reviews.freebsd.org/D25988
2020-08-11 16:46:27 +00:00