In particular on amd64 this eliminates an atomic op in the common case,
trading it for IPIs in the uncommon case of catching CPUs executing the
code while the filesystem is getting suspended or unmounted.
This is a wrapper around smp_rendezvous_cpus which enables use of IPI
handlers which can fail and require retrying.
wait_func argument is added to to provide a routine which can be used to
poll CPU of interest for when the IPI can be retried.
Handlers which succeed must call smp_rendezvous_cpus_done to denote that
fact.
Discussed with: jeff
Differential Revision: https://reviews.freebsd.org/D23582
When the receive ring cannot be filled with mbufs, due to lack of memory,
no more interrupts may be generated to fill the receive ring later on.
Make sure to have a watchdog, to try refilling the receive ring from time
to time, hopefully when more mbufs are available.
Differential Revision: https://reviews.freebsd.org/D23315
MFC after: 1 week
Reviewed by: gallatin@
Sponsored by: Mellanox Technologies
It was saved from the initial purge of drivers in fcp-101 due to being
the onboard Ethernet device on a number of sparc64 machines. Now that
sparc64 is gone, it serves little purpose (PCI cards exist, but are rare
and are unlikely to have been deployed outside Sun systems).
MFC after: 3 days
From the beginning, ng_nat safely assumed cleansed traffic
because of limited ways it could be attached to NETGRAPH:
ng_ipfw or ng_ppp only.
Now as it may be attached with ng_ether too, the assumption proven wrong.
Add needed check to the ng_nat. Thanks for markj for debugging this.
PR: 243096
Submitted by: Lutz Donnerhacke <lutz@donnerhacke.de>
Reported by: Robert James Hernandez <rob@sarcasticadmin.com>
Reviewed by: markj and others
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23091
sys/taskqueue.h may not have includes that define MPASS. It
was useful during testing of r357771, but can be omitted now.
An invalid value of priority will yield only in potential
priority inversion, not a crash.
This fixes compilation of ports/x11/nvidia-driver.
Maintain a count of free slabs in the per-domain keg structure and use
that to clear the free slab list in constant time for most cases. This
helps minimize lock contention induced by reclamation, in preparation
for proactive trimming of excesses of free memory.
Reviewed by: jeff, rlibby
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D23532
When processing a taskqueue and a task has associated epoch, then
enter for duration of the task. If consecutive tasks belong to the
same epoch, batch them. Now we are talking about the network epoch
only.
Shrink the ta_priority size to 8-bits. No current consumers use
a priority that won't fit into 8 bits. Also complexity of
taskqueue_enqueue() is a square of maximum value of priority, so
we unlikely ever want to go over UCHAR_MAX here.
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D23518
vdrop can set the hold count to 0 and wait for the ->mnt_listmtx held by
mnt_vnode_next_lazy_relock caller. The routine incorrectly asserted the
count has to be > 0.
Reported by: pho
Tested by: pho
The reason for this change is to make it clear the scope of the in-kernel usage
of IFM_TYPE_DESCRIPTIONS and IFM_SUBTYPE_ETHERNET_DESCRIPTIONS macros. Also it
is somewhat better C.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D23620
Platform (N1SDP).
Neoverse N1 is a high-performance ARM microarchitecture designed
by the ARM Holdings for the server market.
The PCI part on N1SDP was shipped untested and suffers from some
integration issues.
For instance accessing to not existing BDFs causes System Error
(SError) exception. To mitigate this, the firmware scans the bus,
catches SErrors and creates a table with valid BDFs. That allows
us to filter-out accesses to invalid BDFs in this driver.
Also the root complex config space (BDF == 0) has an unusual
location in memory map, so remapping accesses to it is required.
Finally, the config space is restricted to 32-bit accesses only.
This was tested on the ARM boxes kindly provided by the ARM Ltd
to the DARPA CHERI Project.
In collaboration with: andrew
Reviewed by: andrew
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D23349
The race is:
CPU1 CPU2
devfs_reclaim_vchr
make v_usecount 0
VI_LOCK
sees v_usecount == 0, no updates
vp->v_rdev = NULL;
...
VI_UNLOCK
VI_LOCK
v_decr_devcount
sees v_rdev == NULL, no updates
In this scenario si_devcount decrement is not performed.
Note this can only happen if the vnode lock is not held.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D23529
handler.
Interrupt handlers are removed via intr_event_execute_handlers() when
IH_DEAD is set. The thread removing the interrupt is woken up, and
calls intr_event_update(). When this happens, the ie_hflags are
cleared and re-built from all the remaining handlers sharing the
event. When the last IH_NET handler is removed, the IH_NET flag will
be cleared from ih_hflags (or ie_hflags may still be being rebuilt in
a different context), and the ithread_execute_handlers() may return
with ie_hflags missing IH_NET. This can lead to a scenario where
IH_NET was present before calling ithread_execute_handlers, and is not
present at its return, meaning the need for epoch must be cached
locally.
This can happen when loading and unloading network drivers. Also make
sure the ie_hflags is not cleared before being updated.
This is a regression issue after r357004.
Backtrace:
panic()
# trying to access epoch tracker on stack of dead thread
_epoch_enter_preempt()
ifunit_ref()
ifioctl()
fo_ioctl()
kern_ioctl()
sys_ioctl()
syscallenter()
amd64_syscall()
Differential Revision: https://reviews.freebsd.org/D23483
Reviewed by: glebius@, gallatin@, mav@, jeff@ and kib@
Sponsored by: Mellanox Technologies
Currently, the vm.panic_on_oom sysctl is a boolean which controls the
behavior of the VM system when it encounters an out-of-memory situation.
If set to 0, the VM system kills the largest process. If set to any other
value, the VM system will initiate a panic.
This change makes the sysctl a count of events. If set to 0, the VM system
kills the largest process. If set to any other value, the VM system will
kill the largest process until it has seen the specified number of
out-of-memory events. Once it reaches the specified number of events, it
will initiate a panic.
This change is helpful in capturing cores when the system is in a perpetual
cycle of out-of-memory events (as opposed to just hitting one or two
sporadic out-of-memory events).
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D23601
vrele is supposed to be called with an unlocked vnode, but this was never
asserted for if v_usecount was > 0. For such counts the lock is never touched
by the routine. As a result the kernel has several consumers which expect
vunref semantics and get away with calling vrele since they happen to never do
it when this is the last reference (and for some of them this may happen to be
a guarantee).
Work around the problem by changing vrele semantics to tolerate being called
with a lock. This eliminates a possible bug where the lock is already held and
vputx takes it anyway.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D23528
- only compute the target address once
- remove spurious type casting, zpcpu_get already return the correct type
While here add missing newlines to other routines.
The intent is to provide bsd-specific flags relevant to interpreter
and C runtime. I did not want to reuse AT_FLAGS which is common ELF
auxv entry.
Use bsdflags to report kernel support for sigfastblock(2). This
allows rtld and libthr to safely infer the syscall presence without
SIGSYS. The tunable kern.elf{32,64}.sigfastblock blocks reporting.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
A new syscall sigfastblock(2) is added which registers a uint32_t
variable as containing the count of blocks for signal delivery. Its
content is read by kernel on each syscall entry and on AST processing,
non-zero count of blocks is interpreted same as the signal mask
blocking all signals.
The biggest downside of the feature that I see is that memory
corruption that affects the registered fast sigblock location, would
cause quite strange application misbehavior. For instance, the process
would be immune to ^C (but killable by SIGKILL).
With consumers (rtld and libthr added), benchmarks do not show a
slow-down of the syscalls in micro-measurements, and macro benchmarks
like buildworld do not demonstrate a difference. Part of the reason is
that buildworld time is dominated by compiler, and clang already links
to libthr. On the other hand, small utilities typically used by shell
scripts have the total number of syscalls cut by half.
The syscall is not exported from the stable libc version namespace on
purpose. It is intended to be used only by our C runtime
implementation internals.
Tested by: pho
Disscussed with: cem, emaste, jilles
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D12773
This patch introduces processing of the frames
up to 9kB by the mvneta driver. Some versions of
this NIC limit TX checksum offloading, depending
on the frame size, so add appropriate handling
of this feature.
Submitted by: Kornel Duleba
Obtained from: Semihalf
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D23225
To make the PMC tool pmcstat working properly on Hygon platform, add
support for Hygon Dhyana family 18h by using the PMC initialization
code path of AMD family 17h.
Submitted by: Pu Wen <puwen@hygon.cn>
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23562
Currently the installer will tag geliboot partitions with both BOOT and
GELIBOOT; the former allows the kernel to taste it at boot, while the latter
is what loaders keys off of.
However, it seems reasonable to assume that if a provider's been tagged with
GELIBOOT that the kernel should also take that as a hint to taste/attach at
boot. This would allow us to stop tagging GELIBOOT partitions with BOOT in
bsdinstall, but I'm not sure that there's a compelling reason to do so any
time soon.
Reviewed by: oshogbo
Differential Revision: https://reviews.freebsd.org/D23387
For the moment, supress the operation not supported messages at this level. In
the fullness of time, we will have better error tracking so we can diagnose
issues in the future.
Reviewed by: scottl@
Since AVL already has embedded element counter, use dn_dbufs_count
only for dbufs not counted there (bonus buffers) and just add them.
This removes two atomics per dbuf life cycle.
According to profiler it reduces time spent by dbuf_destroy() inside
bottlenecked dbuf_evict_thread() from 13.36% to 9.20% of the core.
This counter is used only on illumos, so for FreeBSD it was just a
waste of time.
MFC after: 2 weeks
riscv cores.
GFE cores come with standard DTS file that lacks standard 'dmas ='
property, which means xae(4) could not find a DMA controller to use.
The 'dmas' property could not be added to the DTS file because the
ethernet controller and DMA engine parts in Linux are implemented
in a single driver.
Instead of 'dmas' property the standard Xilinx 'axistream-connected'
property is provided, so fallback to use it instead.
Suggested by: James Clarke <jrtc27@jrtc27.com>
Reviewed by: James Clarke <jrtc27@jrtc27.com>
Sponsored by: DARPA, AFRL
in the sysctl block for the driver. mpsutil/mprutil needs this so it can
know how big of a buffer to allocate when requesting the IOCFacts from the
controller. This eliminates the kernel console messages about wrong
allocation sizes.
Reported by: imp
BIO_READ and BIO_WRITE, we've handled this expanded syntax poorly in
drivers when the driver doesn't support a particular command. Do a
sweep and fix that.
Reported by: imp
This was relatively harmless but surprising to see in counters. The
race occurred when rd_seq was read after the goal was updated and we
incorrectly calculated the delta between them.
Reviewed by: rlibby
Differential Revision: https://reviews.freebsd.org/D23464
Recent network epoch changes have left some drivers unexpectedly broken
and there is not yet a consensus on the correct fix. This is patch is
a minor performance impact until we can agree on the correct path
forward.
Reviewed by: core, network, imp, glebius, hselasky
Differential Revision: https://reviews.freebsd.org/D23515
Previous code used 4 atomics to do aggsum_flush_bucket() and 2 more to
re-borrow after the flush. But since asc_borrowed and asc_delta are
accessed only while holding asc_lock, it makes no any sense to modify
as_lower_bound and as_upper_bound in multiple steps. Instead of that
the new code uses only 2 atomics in all the cases, one per as_*_bound
variable. I think even that is overkill, simple atomic store and
load could be used here, since all modifications are done under the
as_lock, but there are no such primitives in ZFS code now.
While there, make borrow code consider previous borrow value, so that
on mixed request patterns reduce chance of needing to borrow again if
much larger request follows tiny one that needed borrow.
Also reduce as_numbuckets from uint64_t to u_int. It makes no sense
to use so large division operation on every aggsum_add().
Reviewed by: Brian Behlendorf, Paul Dagnelie
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
counters. In my stress test there is only one poll for every 15,000
frees. This means we are effectively amortizing the cache coherency
overhead even with very high write rates (3M/s/core).
Reviewed by: markj, rlibby
Differential Revision: https://reviews.freebsd.org/D23463
Always use the kdb_thr_ctx() for db_trace_thread() as on other
architectures. Initialize pcb_ra to be the sepc from the saved
trapframe rather than the saved ra to avoid skipping a frame.
Reviewed by: mhorne, br
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23513
While here, add an extra line of information for exceptions and
interrupts and compress the per-frame line down to one line to match
other architectures.
Reviewed by: mhorne, br
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23508
Currently the Xen console is always attached with priority CN_REMOTE
(highest), which means that when booting with a single console the Xen
console will take preference over the VGA for example, and that's not
intended unless the user has also selected to use a serial console.
Fix this by lowering the priority of the Xen console to NORMAL unless
the user has selected to use a serial console. This keeps the usual
FreeBSD behavior of outputting to the internal consoles (ie: VGA) when
booted as a Xen dom0.
MFC after: 3 days
Sponsored by: Citrix Systems R&D
This change adds a new libkvm function, kvm_kerndisp(), that can be used to
retrieve the kernel displacement, that is the difference between the kernel's
base virtual address at run time and the kernel base virtual address specified
in the kernel image file.
This will be used by kgdb, to properly relocate kernel symbols, when needed.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D23285
Add CTLFLAG_NEEDGIANT flag (modelled after D_NEEDGIANT) that will be used to
mark sysctls that still require locking Giant.
Rewrite sysctl_handle_string() to use internal locking instead of locking
Giant.
Mark SYSCTL_STRING, SYSCTL_OPAQUE and their variants as MPSAFE.
Add infrastructure support for enforcing proper use of CTLFLAG_NEEDGIANT
and CTLFLAG_MPSAFE flags with SYSCTL_PROC and SYSCTL_NODE, not enabled yet.
Reviewed by: kib (mentor)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D23378
UMA_ZFLAG_CACHEONLY was essentially the same thing as UMA_ZONE_VM, but
with a more confusing name. Remove the flag, make UMA_ZONE_VM an
inherit flag, and replace all references.
Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23516
We somewhat blindly copy the srr1 from the new context to the trap frame,
but disable FPU and VSX unconditionally, relying on the trap to re-enable
them. This works because the FPU manages the VSX extended FP registers,
which is governed by the PCB_FPFREGS flag. However, with altivec, we
would blindly disable PSL_VEC, without touching PCB_VEC. Handle this case
by disabling altivec in both srr1 and pcb_flags, if the mcontext doesn't
have _MC_AV_VALID set.
Reported by: pkubaj
This means that extra virtual interfaces (VIs) created with
hw.cxgbe.num_vis are no longer required to use netmap. Use this
tunable to enable native netmap support on the main interface:
hw.cxgbe.native_netmap="3"
There is no change in default behavior.
Suggested by: jch@
MFC after: 2 weeks
Sponsored by: Chelsio Communications
In legacy VirtIO drivers, the header must be PCI endianness (little) and the
device-specific region is encoded in the native endian of the guest.
This patch makes the access (read/write) to VirtIO header using the little
endian order. Other read and write access are native endianness. This also
sets the device's IO region as big endian if on big endian machine.
PR: 205178
Submitted by: Andre Silva <afscoelho@gmail.com>
Reported by: Kenneth Salerno <kennethsalerno@yahoo.com>
Reviewed by: bryanv, bdragon, luporl, alfredo
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D23401
While cause codes higher than 16 are reserved, the exception code
field of the register is defined to be all bits but the upper-most
bit.
Reviewed by: mhorne
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23510
While here, remove a local variable to avoid the CSR read in non-debug
kernels.
Reviewed by: mhorne
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23511
In practice this discarded all characters entered at the DDB prompt.
Reviewed by: br
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23509
This fixes continuing from debug.kdb.enter=1 after enabling the use of
compressed instructions since the compiler can emit the two byte
c.ebreak instead of the 4 byte ebreak.
Reviewed by: br
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D23507
I believe this is left over from when dtrace was being ported and
developed out-of-tree. Now it just ensures that dtrace.ko and a non-SMP
kernel have incompatible KBIs.
PR: 243711
Sponsored by: The FreeBSD Foundation
This reverts r177661. The change is no longer very useful since
out-of-tree KLDs will be built to target SMP kernels anyway. Moveover
it breaks the KBI in !SMP builds since cpuset_t's layout depends on the
value of MAXCPU, and several kernel interfaces, notably
smp_rendezvous_cpus(), take a cpuset_t as a parameter.
PR: 243711
Reviewed by: jhb, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23512
Submitted by: Bora Özarslan <borako.ozarslan@gmail.com>
Submitted by: Yang Wang <2333@outlook.jp>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19917
sendfile(2) optionally takes a set of headers that get prepended to the
file data. If the request length is less than that of the headers,
sendfile may not allocate an sfio structure, in which case its pointer
is null and we should be careful not to dereference. This was
introduced in r356902.
Reported by: syzkaller
Sponsored by: The FreeBSD Foundation
This change adds 2 new SYSCTLs, to retrieve the original and relocated KERNBASE
values. This provides an easy, architecture independent way to calculate the
running kernel displacement (current/load address minus original base address).
The initial goal for this change is to add a new libkvm function that returns
the kernel displacement, both for live kernels and crashdumps. This would in
turn be used by kgdb to find out how to relocate kernel symbols (if needed).
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D23284
The cong_drop setting will apply to queues created after the setting is
changed and not to existing queues.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Add a switch to allow disabling multipage slabs, in order to facilitate
measuring memory usage and performance effects. The tunable
vm.debug.uma_multipage_slabs defaults to 1 and can be set to 0 to
disable. The name may change soon.
Reviewed by: markj (previous version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23487
Memory efficiency can be poor with awkward item sizes (e.g. 1/2 or 1
page size + epsilon). In order to achieve a minimum memory efficiency,
select a slab size with a potentially larger number of pages if it
yields a lower portion of waste.
This may mean using page_alloc instead of uma_small_alloc, which could
be more costly.
Discussed with: jeff, mckusick
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23239
Remove mbuf_jumbo_alloc and let large mbuf zones use the new uma default
contig allocator (a copy of mbuf_jumbo_alloc). Tag other zones which
require contiguous objects, even if they don't use the new default
contig allocator, so that uma knows about their constraints.
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23238
After r357392, it is apparent that we do have some early-boot PCPU
zones. Make it so we can safely free pages from them if they are
actually used during early boot.
Reviewed by: jeff, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D23496
In r356767, memcpy/memmove/bcopy optimizations were added to libc to
improve performance.
This exposed an existing kernel issue in VSX handling. The PSL_VSX flag was
not being excluded from the psl_userstatic set, which meant that any thread
that used these and then called swapcontext(3) would get an EINVAL error.
Fixing this exposed a second issue - in r344123, the FPU was being forced
off in set_mcontext(). However, this was neglecting to ensure VSX was turned
off at the same time.
While here, add some code comments to explain what's going on.
Reviewed by: jhibbits, luporl (earlier rev), pkubaj (earlier rev)
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D23497
potential bugs that access freed pages as well as providing a path
towards lockless page lookup.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23444
system. Small bucket sizes already pack well even if they are an odd
number of words. This prevents any potential new instances of the
problem fixed in r357463 as well as making the system easier to
understand.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D23494
which disables tracking mtime updates due to writes through the shared
mapped areas backed by tmpfs files. This removes periodic scans which
downgrades rw mapped pages to ro to note the writes.
Suggested by: mjg
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23432
objects backing tmpfs vnodes data.
The clean scan is limited to only remove write permissions from the
mapped pages of the objects. This fixes the issue that tmpfs vnode
mtime is not updated from writes to the mmaped area after the initial
page-in.
Noted by: mjg
Reviewed by: markj
Discussed with: jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23432
This most importantly reduces duplication, but it also removes any potential
race with usage of dev->si_drv1 since it's now set prior to the device being
constructed enough to be accessible.
messages for some of the unimplemented syscalls, in particular
the AIO-related ones.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23231
Move db_link into the same cache line as db_blkid and db_level.
It allows significantly reduce avl_add() time in dbuf_create() on
systems with large RAM and huge number of dbufs per dnode.
Avoid few accesses to dbuf_caches[].size, which is highly congested
under high IOPS and never stays in cache for a long time. Use local
value we are receiving from zfs_refcount_add_many() any way.
Remove cache_size_bytes_max bump from dbuf_evict_one(). I don't see
a point to do it on dbuf eviction after we done it on insertion in
dbuf_rele_and_unlock().
Reviewed by: mahrens, Brian Behlendorf
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
When panicing because of an unhandled data abort from the kernel it is
useful to know the register state and faulting address to aid debugging.
Print these registers before calling panic.
Sponsored by: DARPA, AFRL
- handle the CLOCK_{PROCESS,THREAD}_CPUTIME_ID specified directly;
- fix thread id calculation as in the Linuxulator we should
convert the user supplied thread id to struct thread * by linux_tdfind();
- fix CPUCLOCK_SCHED case by using kern_{process,thread}_cputime()
directly as native get_cputime() used by kern_clock_gettime() uses
native tdfind()/pfind() to find proccess/thread.
PR: 240990
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23341
MFC after: 2 weeks
and get_thread_cputime() and add prototypes for it to <sys/syscallsubr.h>.
As both functions become a public interface add process lock assert
to ensure that the process is not exiting under it.
Fix whitespace nit while here.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23340
MFC after 2 weeks
so don't initialize nwhich in declaration and remove stale comment from r161304.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D23339
MFC after: 2 weeks
and this is more space efficient.
Stop queueing recently used buckets to the head of the list. If the bucket
goes to a different processor the cache coherency will be more expensive.
We already try to encourage cache-hot behavior in the per-cpu layer.
Reviewed by: rlibby
Differential Revision: https://reviews.freebsd.org/D23493
This simplifies the driver's rx fast path as well as the bookkeeping
code that tracks various rx buffer sizes and layouts.
MFC after: 1 week
Sponsored by: Chelsio Communications
This allows us to boot FreeBSD RISCV on QEMU using the -kernel command line
options. When using that option, QEMU maps the kernel ELF file to the
addresses specified in the LMAs in the program headers.
Since version 4.2 QEMU ships with OpenSBI fw_jump by default so this allows
booting FreeBSD using the following command line:
qemu-system-riscv64 -bios default -kernel /.../boot/kernel/kernel -nographic -M virt
Without this change the -kernel option cannot be used since the LMAs start
at address zero and QEMU already maps a ROM to these low physical addresses.
For targets that require a different kernel LMA the make variable
KERNEL_LMA can be overwritten in the config file. For example, adding
`makeoptions KERNEL_LMA=0xc0200000` will create an ELF file that will be
loaded at 0xc0200000.
Before:
There are 4 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0xffffffc000000000 0x0000000000000000 0x75e598 0x8be318 RWE 0x1000
DYNAMIC 0x71fb20 0xffffffc00071eb20 0x000000000071eb20 0x000100 0x000100 RW 0x8
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x0
NOTE 0x693400 0xffffffc000692400 0x0000000000692400 0x000024 0x000024 R 0x4
After:
There are 4 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0xffffffc000000000 0x0000000080200000 0x734198 0x893e18 RWE 0x1000
DYNAMIC 0x6f7810 0xffffffc0006f6810 0x00000000808f6810 0x000100 0x000100 RW 0x8
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x0
NOTE 0x66ca70 0xffffffc00066ba70 0x000000008086ba70 0x000024 0x000024 R 0x4
Reviewed By: br, mhorne (earlier version)
Differential Revision: https://reviews.freebsd.org/D23436
ext_arg2 is the only item in the third cacheline in an mbuf and could be
cold by the time rxb_free runs. Put the information needed by rxb_free
in the same line as the refcount, which is very likely to be hot given
that rxb_free runs when the refcount is decremented and reaches 0.
MFC after: 1 week
Sponsored by: Chelsio Communications
requested.
This is a tradeoff between PCIe efficiency during large packet rx and
packing efficiency during small packet rx.
MFC after: 1 week
Sponsored by: Chelsio Communications
If the thread's lock is already that of the runqueue, don't recurse on
the queue lock.
Reviewed by: jeff, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23492
clang has the unfortunate property of paying little attention to prediction
hints when faced with a loop spanning the majority of the rotuine.
In particular fget_unlocked has an unlikely corner case where it starts almost
from scratch. Faced with this clang generates a maze of taken jumps, whereas
gcc produces jump-free code (in the expected case).
Work around the problem by providing a variant which only tries once and
resorts to calling the original code if anything goes wrong.
While here note that the 'seq' parameter is almost never passed, thus the
seldom users are redirected to call it directly.
This eliminates a branch from its consumers trading it for an extra call
if ktrace is enabled for curthread. Given that this is almost never true,
the tradeoff is worth it.
With r357314, sizeof(struct uma_bucket) grew to 16 bytes on 32-bit
platforms, so BUCKET_SIZE(4) is 0. This resulted in the creation of a
bucket zone for buckets with zero capacity. A more general fix is
planned, but for now this bandaid allows 32-bit platforms to boot again.
PR: 243837
Discussed with: jeff
Reported by: pho, Jenkins via lwhsu
Tested by: pho
Sponsored by: The FreeBSD Foundation
Most notably, we want to make sure we don't clobber any capabilities-related
errors. This is a regression from r357412 (O_SEARCH) that was picked up by
the capsicum tests.
PR: 243839
Reviewed by: kib (committed form recommended by)
Tested by: lwhsu
Differential Revision: https://reviews.freebsd.org/D23479
Once all CPUs are online, determine if they all support LSE atomics and
set lse_supported to indicate this. For now the atomic(9)
implementations are still always inlined, though it would be preferable
to create out-of-line functions to avoid text bloat. This was not done
here since big.little systems exist in which some CPUs implement LSE
while others do not, and ifunc resolution must occur well before this
scenario can be detected. It does seem unlikely that FreeBSD will
ever run on such platforms, however, so converting atomic(9) to use
ifuncs is probably a good next step.
Add a LSE_ATOMICS arm64 kernel configuration option to unconditionally
select LSE-based atomic(9) implementations when the target system is
known.
Reviewed by: andrew, kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation, Amazon (hardware)
Differential Revision: https://reviews.freebsd.org/D23325
These make use of the cas*, ld* and swp instructions added in ARMv8.1.
Testing shows them to be significantly more performant than LL/SC-based
implementations.
No functional change here since the wrappers still unconditionally
select the _llsc variants.
Reviewed by: andrew, kib
MFC after: 1 month
Submitted by: Ali Saidi <alisaidi@amazon.com> (original version)
Differential Revision: https://reviews.freebsd.org/D23324
Add a _llsc suffix for the existing LL/SC-based implementations and add
trivial wrappers. This is in preparation for supporting LSE-based
atomic(9) implementations.
No functional change intended.
Reviewed by: andrew, kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation, Amazon (hardware)
Differential Revision: https://reviews.freebsd.org/D23323
Parameterize the macros by type width as well as acq/rel semantics.
This makes modifying the implementations much less tedious and
error-prone and makes it easier to support alternate LSE-based
implementations. No functional change intended.
Reviewed by: andrew, kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation, Amazon (hardware)
Differential Revision: https://reviews.freebsd.org/D23322
Instead of doing a 2 iteration loop (determined at runeimt), take advantage
of the fact that the size is already known.
While here provdie cap_check_inline so that fget_unlocked does not have to
do a function call.
Verified with the capsicum suite /usr/tests.
It was generated to be just a jumping off point to tmpfs_itimes.
While here provide a dedicated variant for getattr since we normally don't
expect to need to the update from that caller.
In r357324 most of the use of gi_irq was moved to gi_lpi. Complete this
with the last few places we need the IRQ value and create gi_id for the
per-device value we need.
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
The code was using a hand-rolled fcmpset loop, while in other places the same
count is manipulated with the refcount API.
This transferred from a stylistic issue into a bug after the API got extended
to support flags. As a result the hand-rolled loop could bump the count high
enough to set the bit flag. Another bump + refcount_release would then free
the file prematurely.
The bug is only present in -CURRENT.
When there are multiple GICv3 ITS devices we don't know which vmem is for
which device. Use device_get_nameunit to get a per-device name.
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
This uses UMA to allocate space. It causes issues when there are multiple
ITS devices in the system where interrupts are not allocated from a low
address on some interrupt controllers. Disabling the quantum cache fixes
this on the Neoverse N1 SDP.
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
fonts. As a workaround, remove the static. vt is default on powerpc, but there's
a few old macs that still fail with vt. sc is used as a work arouond for those
machines, and the kernel fails to build w/o it.
O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping
permissions checks on the directory itself after the initial open(). This is
close to the semantics we've historically applied for O_EXEC on a directory,
which is UB according to POSIX. Conveniently, O_SEARCH on a file is also
explicitly undefined behavior according to POSIX, so O_EXEC would be a fine
choice. The spec goes on to state that O_SEARCH and O_EXEC need not be
distinct values, but they're not defined to be the same value.
This was pointed out as an incompatibility with other systems that had made
its way into libarchive, which had assumed that O_EXEC was an alias for
O_SEARCH.
This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC
respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a
directory is checked in vn_open_vnode already, so for completeness we add a
NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not
re-check that when descending in namei.
[0] https://pubs.opengroup.org/onlinepubs/9699919799/
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23247
If we come from VOP_CACHEDLOOKUP, we must skip the VEXEC check as it will
have been done in the caller (vfs_cache_lookup). This is a part of D23247,
which may skip the earlier VEXEC check as well if the root fd was opened
with O_SEARCH.
This one required slightly more work as zfs_lookup may also be called
indirectly as VOP_LOOKUP or a couple of other places where we must do the
check.
clang inlines fget -> _fget into kern_fstat and eliminates several checkes,
but prior to this change it would assume fget_unlocked was likely to fail
and consequently avoidable jumps got generated.
Use ${SRCTOP} instead of /usr/share.
Prefer to depend on option sc_dflt_fnt instead of sc.
gc the 4 otherwise identical instances in the tree.
Platforms that don't need this won't included it.
Fix the old-style build by using ${SRCTOP} instead of a weird
construct that only works for new-style build.
Simplify the building of keymap files by using macros
Move atkbdmap.h in files.x86
This has been broken since r296899 which removed the implicit
dependency on /usr/share.
Now that armv5 is gone, we no longer need multiple LINT files. Kill
the odd-ball support here. From now on, we just have LINT built from
notes like all the other platforms. Keep the removal of LINT-V5/7
to remove stale files for a while still..
The Parallel Port SCSI adapter was interesting for 100MB ZIP drives, but is no
longer used or maintained. Remove it from the tree.
The Parallel Port microsequencer (microseq.9) is now mostly unused in the tree,
but remains. PPI still refrences it, but doesn't use its full functionality.
Relnotes: Yes
Reviewed by: rgrimes@, Ihor Antonov
Discussed on: arch@
Differential Revision: https://reviews.freebsd.org/D23389