The hard work of parsing fields per-CPU, handling heterogeneous
features, and excluding features from userspace is already done by
update_special_regs. We can build our set of HWCAPs from the result.
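A rough illustration of the idea (macro and field names assumed; not the
committed code):
    /* Map one field of the user-visible ID register value produced by
     * update_special_regs() to a HWCAP bit. */
    if (ID_AA64ISAR0_AES_VAL(user_cpu_desc.id_aa64isar0) >=
        ID_AA64ISAR0_AES_BASE)
            hwcap |= HWCAP_AES;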
This work exposed a small bug in update_special_regs, in which the
generated bitmask was not wide enough, and as a result some bits
weren't being exposed in user_cpu_desc. Fix this.
While here, adjust some formatting.
Reviewed by: andrew
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26069
This adds definitions for the latest additions to the AA64ISAR[01] ID
registers. This brings these registers in sync with the initial ARMv8.6
specification release.
A future change will parse many of these fields for HWCAP features.
Reviewed by: andrew, manu, markj (all previous versions)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26029
Reviewed by: Jacob Keller <jacob.e.keller@intel.com>
Suggested editing from: Krzysztof Galazka <krzysztof.galazka@intel.com>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25954
Avoid performing a potentially-blocking malloc for kenv lookups that will only
perform non-destructive integer conversions on the returned buffer. Instead,
perform the strtoq() in-place with the kenv lock held.
While here, factor the logic around kenv_lock acquire and release into
kenv_acquire() and kenv_release(), and use these functions for some light
cleanup. Collapse getenv_string_buffer() into kern_getenv(), as the former
no longer has any other callers and the only additional task performed by
the latter is a WITNESS check that hasn't been useful since r362231.
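A rough sketch of the resulting lookup pattern (helper name and exact
signatures assumed for illustration; not the committed code):
    static int
    getenv_quad_inplace(const char *name, quad_t *data)
    {
            const char *value;
            char *end;
            quad_t v;

            value = kenv_acquire(name);   /* look up value; kenv lock held on return */
            if (value == NULL) {
                    kenv_release(value);  /* assumed to drop the lock even on a miss */
                    return (0);
            }
            v = strtoq(value, &end, 0);   /* convert in place, no malloc needed */
            kenv_release(value);          /* drop the kenv lock */
            if (end == value || *end != '\0')
                    return (0);
            *data = v;
            return (1);
    }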
PR: 248250
Reported by: gbe
Reviewed by: mjg
Tested by: gbe
Differential Revision: https://reviews.freebsd.org/D26010
Remove unused arguments from dom_rtattach/dom_rtdetach functions and make
them return/accept 'struct rib_head' instead of 'void **'.
Declare inet/inet6 implementations in the relevant _var.h headers similar
to domifattach / domifdetach.
Add rib_subscribe_internal() function to accept subscriptions to the rnh
directly.
Differential Revision: https://reviews.freebsd.org/D26053
This is to avoid conflicts with an upcoming macro. pipe_pages is a
more accurate name since the field tracks pages wired into the kernel as
part of a process-to-process copy operation.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The Linux function does a lot more than that: multiple waitqueues can be
fetched from a static table based on the hash of the argument. But since DRM
only uses it in one place, just add a single variable.
We will probably need to change that in the future, but this is fine for DRM,
even with current Linux.
Reviewed by: hselasky
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26054
lagg_clone_destroy() already handles detaching and waiting for ifconfig
callers to drain.
This narrows the race behind two panics that the tests triggered. Both were a
consequence of adding a port to the lagg device after it had already detached
from all of its ports. The link state task would run after lagg_clone_destroy()
freed the lagg softc.
kernel:trap_fatal+0xa4
kernel:trap_pfault+0x61
kernel:trap+0x316
kernel:witness_checkorder+0x6d
kernel:_sx_xlock+0x72
if_lagg.ko:lagg_port_state+0x3b
kernel:if_down+0x144
kernel:if_detach+0x659
if_tap.ko:tap_destroy+0x46
kernel:if_clone_destroyif+0x1b7
kernel:if_clone_destroy+0x8d
kernel:ifioctl+0x29c
kernel:kern_ioctl+0x2bd
kernel:sys_ioctl+0x16d
kernel:amd64_syscall+0x337
kernel:trap_fatal+0xa4
kernel:trap_pfault+0x61
kernel:trap+0x316
kernel:witness_checkorder+0x6d
kernel:_sx_xlock+0x72
if_lagg.ko:lagg_port_state+0x3b
kernel:do_link_state_change+0x9b
kernel:taskqueue_run_locked+0x10b
kernel:taskqueue_run+0x49
kernel:ithread_loop+0x19c
kernel:fork_exit+0x83
PR: 244168
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D25284
Add prng(9) as a replacement for random(9) in the kernel.
There are two major differences from random(9) and random(3):
- General prng(9) APIs (prng32(9), etc) do not guarantee an
implementation or particular sequence; they should not be used for
repeatable simulations.
- However, specific named API families are also exposed (for now: PCG),
and those are expected to be repeatable (when so-guaranteed by the named
algorithm).
Some minor differences from random(3) and earlier random(9):
- PRNG state for the general prng(9) APIs is per-CPU; this eliminates
contention on PRNG state in SMP workloads. Each PCPU generator in an
SMP system produces a unique sequence.
- Better statistical properties than the Park-Miller ("minstd") PRNG
(longer period, uniform distribution in all bits, passes
BigCrush/PractRand analysis).
- Faster than Park-Miller ("minstd") PRNG -- no division is required to
step PCG-family PRNGs.
For now, random(9) becomes a thin shim around prng32(). Eventually I
would like to mechanically switch consumers over to the explicit API.
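A minimal usage sketch of the general API (prng32() returns a raw 32-bit
value; prng32_bounded() is assumed to return a uniform value in [0, bound)):
    #include <sys/types.h>
    #include <sys/prng.h>

    /* Roll a die using the general, non-repeatable API. */
    static uint32_t
    roll_d6(void)
    {
            uint32_t r;

            r = prng32_bounded(6);    /* assumed uniform in [0, 6) */
            return (r + 1);           /* map to [1, 6] */
    }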
Reviewed by: kib, markj (previous version both)
Discussed with: markm
Differential Revision: https://reviews.freebsd.org/D25916
After moving the route control plane code from net/route.c,
all rtzone users ended up in net/route_ctl.c.
Move uma(9) rtzone setup/teardown code to net/route_ctl.c as well
to have everything in a single place.
While here, remove custom initializers from the zone.
They were originally added to avoid setup/teardown of costly per-CPU counters.
With these counters removed, the only remaining job was avoiding rte mutex
setup/teardown. Mutex setup is relatively cheap. Additionally, this mutex
will soon be removed. With that in mind, there is no sense in keeping
custom zone callbacks.
Differential Revision: https://reviews.freebsd.org/D26051
The Cubic concave region was not aligned nicely for the very first exit from
slow start, where a 50% cwnd reduction is done instead of the normal 30%.
This addresses an issue where a short line-rate burst could result from that
sudden jump of cwnd.
In addition, the Fast Convergence Heuristic has been expanded to also work
with ECN-induced congestion response.
Submitted by: chengc_netapp.com
Reported by: chengc_netapp.com
Reviewed by: tuexen (mentor), rgrimes (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25976
Initializing K to zero in D23655 introduced a miscalculation,
where cwnd would suddenly jump to cwnd_max instead of gradually
increasing, after leaving slow-start.
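For reference, RFC 8312 defines the cubic window function and its inflection
point K as:
    W_cubic(t) = C * (t - K)^3 + W_max
    K = cbrt(W_max * (1 - beta_cubic) / C)
With K forced to zero, the inflection point coincides with the congestion
event itself, so W_cubic(t) starts at W_max immediately instead of
approaching it gradually.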
Properly calculating K instead of resetting it to zero resolves
this issue. Also make sure that cwnd is recalculated at the
earliest opportunity once slow-start is over.
Reported by: chengc_netapp.com
Reviewed by: chengc_netapp.com, tuexen (mentor), rgrimes (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25746
Adding proper accounting of sacked_bytes and (per-ACK)
delivered data to the SACK scoreboard. This will
allow more aspects of RFC6675 to be implemented as well
as Proportional Rate Reduction (RFC6937).
Prior to this change, the pipe calculation controlled with
net.inet.tcp.rfc6675_pipe was also susceptible to incorrect
results when more than 3 (or 4) holes were present in the
sequence space, since those can no longer all fit into a single
ACK's SACK option.
Reviewed by: kbowling, rgrimes (mentor)
Approved by: rgrimes (mentor, blanket)
MFC after: 3 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D18624
I observed hangs post-r362977 in QEMU with -smp 2, in which one thread
would acquire write access to an rm_lock (sysctllock) and get stuck
waiting in smp_rendezvous_cpus while the other CPU was servicing a trap.
The other thread was waiting for read access to the same lock, thus
causing deadlock.
It's clear that this is just one symptom of a larger problem. The
general expectation of MI kernel code is that interrupts are enabled.
Violating this assumption will at best create some additional latency,
but otherwise might cause locking or other unforeseen issues. All other
architectures enable interrupts for some subset of trap types, but this
somehow got missed in the RISC-V port. Enable interrupts now during kernel page
faults and for all user trap types.
The code in exception.S already knows to disable interrupts while
handling the return from exception, so there are no changes required
there.
Reviewed by: jhb, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26017
- Properly set up the frame pointer
- Hang if we return from mi_startup
- Whitespace
Clearing the frame pointer marks the end of the backtrace. This fixes
"bt 0" in ddb, which previously would unwind one frame too far.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D26016
This allows reporting GEOM::lunid for nda(4) the same as for nvd(4). Since
NVMe now allows multiple LUs (namespaces) with multiple paths, unique
LU identification is important. The serial_num field is filled, as before,
with the controller serial number, while device_id is based
on namespace GUID and/or EUI64 fields as recommended by "NVM Express:
SCSI Translation Reference" and matching nvd(4) at the end.
MFC after: 1 week
This same check is used on other architectures. Previously this would
permit a stack frame to unwind into any arbitrary kernel address
(including unmapped addresses).
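An illustrative form of the check (mirroring other architectures; not
necessarily the exact committed code):
    /* Only unwind if the saved frame pointer lies within the current
     * thread's kernel stack. */
    if (fp < td->td_kstack ||
        fp + sizeof(uintptr_t) * 2 >
        td->td_kstack + td->td_kstack_pages * PAGE_SIZE)
            return (false);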
Reviewed by: mhorne
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D25996
There was an additional 7 bytes of compiler-inserted padding at the
end of the structure visible via 'ptype /o' in gdb.
Reviewed by: mhorne
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D25867
It is possible for rn_delete() to return NULL. If this happens, then set
*perror to ESRCH, as is done in the rest of the function.
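A minimal sketch of the handling (variable names illustrative):
    rn = rn_delete(dst, netmask, head);
    if (rn == NULL) {
            *perror = ESRCH;
            return (NULL);
    }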
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D25871
- Fill out MemFree correctly. Delete an ancient comment suggesting that
we don't want to advertise the true quantity of free memory.
- Populate the Buffers field by reading vfs.bufspace.
- The page cache consists of all pages in page queues, not just the
inactive queue.
PR: 248463
Reported and tested by: danfe
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
For drivers with IFLIB_HAS_RXCQ set, there is a separate completion
queue. In this case, the netmap rxsync routine needs to update
rxq->ifr_cq_cidx in the same way it is updated by iflib_rxeof().
This improves the situation for vmx(4) and bnxt(4) drivers, which
use iflib and have the IFLIB_HAS_RXCQ bit set.
PR: 248494
MFC after: 3 weeks
First, fix the initialization of the fl->ifl_rxd_idxs array,
which was affected by an off-by-one bug.
Once there, refactor the function to use better names for
local variables, optimize the variable assignments, and
merge the bus_dmamap_sync() inner loop with the outer one.
PR: 248494
MFC after: 3 weeks
This adds support for the Cortex-A76 and Neoverse-N1 PMU counters to pmc.
While here, add more PMCR_IDCODE values and check that the implementer code is
correct before setting the PMU type.
Reviewed by: bz, emaste (looks reasonable to me)
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D25959
While here, change the type of some variables from long to int; it is sufficient.
Also, add length reporting to a couple of debug printfs.
MFC after: 3 weeks
In NECx the leading mark has a length of 8T, as opposed to 16T in NEC,
where T is 562.5 us. So, 4.5 ms.
Our threshold was set to 128 * 42.7 us (derived from the sampling
frequency of 3/128 MHz). So, ~5.5 ms.
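Spelled out, the arithmetic behind those two figures is:
    NECx leading mark: 8 * 562.5 us = 4500 us, i.e. 4.5 ms
    sampling period:   1 / (3/128 MHz) = 128/3 us, i.e. ~42.7 us
    old threshold:     128 * 42.7 us ~= 5461 us, i.e. ~5.5 ms
so the old lower bound was longer than the NECx leading mark.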
The new threshold is set to AW_IR_L1_MIN. I think that's a good enough
lower bound for detecting the leading pulse.
Also, the calculation of active_delay (the activation delay) is fixed.
Previously it would be wrong if AW_IR_ACTIVE_T was anything but zero,
because the value was already bit-shifted.
Finally, I am not sure why the activation delay was divided by two when
calculating the initial pulse length. I have not found anything that
would explain or justify it. So, I removed that division.
MFC after: 3 weeks
Only linux,code is supported as it maps 1:1 to evdev key codes.
No reverse mapping for freebsd,code yet.
Reviewed by: wulf
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D25940
- hoist all request / response structures from function level to top level
- replace magic numeric literals with constants
- regroup types, data and functions
- remove setting of the id field in responses as they are completely
overwritten with data from the device
- centralize setting of the id field as it is always set to the value of
the request type
- fix setting and querying of open-drain vs push-pull configuration of
an output pin -- it's always in one of those configurations
- detect special pin configurations: a pin in a special configuration is
neither a general-purpose input nor a general-purpose output
- there is still no support for setting special configurations
MFC after: 2 weeks
r363001 added support for ext_pgs mbufs to nfsm_uiombuf().
By inspection, I noticed that "mlen" was never set to a non-zero value and, as
such, there would be an iteration of the loop that did nothing.
This patch sets it.
This bug would have no effect on the system, since the ext_pgs mbuf code
is not yet enabled.
The conversion was largely mechanical: sed(1) with:
-e 's|mtx_assert(&devmtx, MA_OWNED)|dev_lock_assert_locked()|g'
-e 's|mtx_assert(&devmtx, MA_NOTOWNED)|dev_lock_assert_unlocked()|g'
The definitions of these abstractions in fs/devfs/devfs_int.h are the
only non-mechanical change.
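Presumably the new asserts are thin wrappers matching the sed mapping above,
along these lines (sketch, not necessarily the exact committed definitions):
    #define dev_lock_assert_locked()    mtx_assert(&devmtx, MA_OWNED)
    #define dev_lock_assert_unlocked()  mtx_assert(&devmtx, MA_NOTOWNED)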
No functional change.
The variable is defined unconditionally; declare it unconditionally as well.
It is already initialized to the correct value (1) for !SMP builds.
No functional change.
In very high throughput workloads, the inactive scan can become overwhelmed
as you have many cores producing pages and a single core freeing. Since
Mark's introduction of batched pagequeue operations, we can now run multiple
inactive threads working on independent batches.
To avoid confusing the PID and other control algorithms, I (Jeff) do this in
an MPI-like fan-out and collect model that is driven from the primary page
daemon. It decides whether the shortfall can be overcome with a single
thread and, if not, dispatches multiple threads and waits for their results.
The heuristic is based on timing the pageout activity and averaging a
pages-per-second variable which is exponentially decayed. This is visible in
sysctl and may be interesting for other purposes.
I (Jeff) have verified that this does indeed double our paging throughput
when used with two threads. With four we tend to run into other contention
problems. For now I would like to commit this infrastructure with only a
single thread enabled.
The number of worker threads per domain can be controlled with the
'vm.pageout_threads_per_domain' tunable.
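For example, to try two worker threads per domain, one might set the tunable
at boot (illustrative; the default remains a single thread):
    # /boot/loader.conf
    vm.pageout_threads_per_domain=2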
Submitted by: jeff (earlier version)
Discussed with: markj
Tested by: pho
Sponsored by: probably Netflix (based on contemporary commits)
Differential Revision: https://reviews.freebsd.org/D21629
Since the make variable STRIP is already used for other purposes, this
uses STRIPBIN (which is also used for the same purpose by install(1)).
This allows using LLVM objcopy to strip binaries instead of the in-tree
elftoolchain objcopy. We make use of this in CheriBSD since passing
binaries generated by our toolchain to elftoolchain strip sometimes results
in assertion failures.
This allows working around https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248516
by specifying STRIPBIN=/path/to/llvm-strip
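For instance (illustrative invocation, keeping the placeholder path above):
    make installworld STRIPBIN=/path/to/llvm-strip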
Obtained from: CheriBSD
Reviewed By: emaste, brooks
Differential Revision: https://reviews.freebsd.org/D25988