detour - it writes ktrace entries to the filesystem - so the overhead
of ast() won't make any difference.
Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26404
This change is based on the nexthop objects landed in D24232.
The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
relative weights and a dataplane-optimized structure to enable
efficient nexthop selection.
Simular to the nexthops, nexthop groups are immutable. Dataplane part
gets compiled during group creation and is basically an array of
nexthop pointers, compiled w.r.t their weights.
With this change, `rt_nhop` field of `struct rtentry` contains either
nexthop or nexthop group. They are distinguished by the presense of
NHF_MULTIPATH flag.
All dataplane lookup functions returns pointer to the nexthop object,
leaving nexhop groups details inside routing subsystem.
User-visible changes:
The change is intended to be backward-compatible: all non-mpath operations
should work as before with ROUTE_MPATH and net.route.multipath=1.
All routes now comes with weight, default weight is 1, maximum is 2^24-1.
Current maximum multipath group width is statically set to 64.
This will become sysctl-tunable in the followup changes.
Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1
route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20
netstat -6On
Nexthop groups data
Internet6:
GrpIdx NhIdx Weight Slots Gateway Netif Refcnt
1 ------- ------- ------- --------------------------------------- --------- 1
13 10 1 2001:db8::2 vlan2
14 20 2 2001:db8::3 vlan2
Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default
Tested by: olivier
Reviewed by: glebius
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D26449
With helper page daemon threads, enabled by default in r364786, we
divide the inactive target by the number of threads, rounding down, and
sum the total number of pages freed by the threads. This sum is
compared with the original target, but by rounding down we might lose
pages, causing the page daemon control loop to conclude that inactive
queue scanning isn't keeping up with demand for free pages. Typically
this results in excessive swapping.
Fix the problem by accounting for the error in the main pagedaemon
thread's target. Note that by default the problem will manifest only in
systems with >16 CPUs in a NUMA domain.
Reviewed by: cem
Discussed with: dougm
Reported and tested by: dhw, glebius
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26610
uma_zalloc_domain() allocates from the requested domain instead of
following a first-touch policy (the default for most zones). Currently
it is only used by malloc_domainset(), and consumers free returned items
with free(9) since r363834.
Previously uma_zalloc_domain() worked by always going to the keg for an
item. As a result, the use of UMA zone caches was unbalanced: we free
items to the caches, but always allocate from the keg, skipping the
caches.
Make some effort to allocate from the UMA caches when performing a
cross-domain allocation. This avoids blowing up the caches when
something is performing many transient allocations with
malloc_domainset().
Reported and tested by: dhw, glebius
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26427
When SMR was introduced, zone_put_bucket() was changed to always place
full buckets at the end of the queue. However, it is generally
preferable to use recently used buckets since their items are more
likely to be resident in cache. So, for buckets that have no constraint
on item reuse, use a last-in-first-out ordering as we did before.
Reviewed by: rlibby
Tested by: dhw, glebius
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26426
dmi function are used to get smbios values.
The DRM subsystem and drivers use it to enabled (or not) quirks.
Reviewed by: hselasky
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26046
Add backlight function to linuxkpi.
Graphics drivers expose the backlight of the panel directly so allow them to use the backlight subsystem so
user can use backlight(8) to configure them.
Reviewed by: hselasky
Relnotes: yes
Differential Revision: The FreeBSD Foundation
Driver for pwm-backlight compatible device.
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26252
This is a simple subsystem that allow drivers to register as a backlight.
Each backlight creates a device node under /dev/backlight/backlightX and
an alias based on the name provided.
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26250
Currently we allocate and map zero-filled anonymous pages when dumping
core. This can result in lots of needless disk I/O and page
allocations. This change tries to make the core dumper more clever and
represent unbacked ranges of virtual memory by holes in the core dump
file.
Add a new page fault type, VM_FAULT_NOFILL, which causes vm_fault() to
clean up and return an error when it would otherwise map a zero-filled
page. Then, in the core dumper code, prefault all user pages and handle
errors by simply extending the size of the core file. This also fixes a
bug related to the fact that vn_io_fault1() does not attempt partial I/O
in the face of errors from vm_fault_quick_hold_pages(): if a truncated
file is mapped into a user process, an attempt to dump beyond the end of
the file results in an error, but this means that valid pages
immediately preceding the end of the file might not have been dumped
either.
The change reduces the core dump size of trivial programs by a factor of
ten simply by excluding unaccessed libc.so pages.
PR: 249067
Reviewed by: kib
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26590
OBJT_DEFAULT, _SWAP, _VNODE and _PHYS is exactly the set of
non-fictitious object types, so just check for OBJ_FICTITIOUS. The
check no longer excludes dead objects, but such objects have to be
handled regardless.
No functional change intended.
Reviewed by: alc, dougm, kib
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26589
Access faults in user mode are treated like TLB misses, which leads to an
endless loop of faults. It's less serious than the same fault in kernel mode,
because we can just terminate the process, but that's not ideal.
Treat user mode access faults as a bus error.
Suggested by: jrtc27
Reviewed by: br, jhb
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D26621
These tunables can only be set to a valid cluster size (2K, 4K, 9K, or
16K) as documented in the man page. Anything else could lead to a
panic on interface up.
Reported by: mav@
MFC after: 1 week
Sponsored by: Chelsio Communications
- Annotate FreeBSD sysctls with CTLFLAG_MPSAFE
- Reduce stack usage of Lua
- Don't save user FPU context in kernel threads
- Add support for procfs_list
- Code cleanup in zio_crypt
- Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.c
- Drop references when skipping dmu_send due to EXDEV
- Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data
- Fix legacy compat for platform IOCs
The assembly implementation incorrectly used logical AND instead of
bitwise AND. Fix, and re-enable in libmd.
Submitted by: Yang Zhong <yzhong@freebsdfoundation.org>
Reviewed by: cem (earlier)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26614
Per the Intel manuals, CPUID is supposed to unconditionally zero the
upper 32 bits of the involved (rax/rbx/rcx/rdx) registers.
Previously, the emulation would cast pointers to the 64-bit register
values down to `uint32_t`, which while properly manipulating the lower
bits, would leave any garbage in the upper bits uncleared. While no
existing guest OSes seem to stumble over this in practice, the bhyve
emulation should match x86 expectations.
This was discovered through alignment warnings emitted by gcc9, while
testing it against SmartOS/bhyve.
SmartOS bug: https://smartos.org/bugview/OS-8168
Submitted by: Patrick Mooney
Reviewed by: rgrimes
Differential Revision: https://reviews.freebsd.org/D24727
Repeating the default WARNS here makes it slightly more difficult to
experiment with default WARNS changes, e.g. if we did something absolutely
bananas and introduced a WARNS=7 and wanted to try lifting the default to
that.
Drop most of them; there is one in the blake2 kernel module, but I suspect
it should be dropped -- the default WARNS in the rest of the build doesn't
currently apply to kernel modules, and I haven't put too much thought into
whether it makes sense to make it so.
successful RPC.
Without this patch, the NFSv4.2 VOP_COPY_FILE_RANGE() client call would
loop until the copy "len" was completed. The problem with doing this is
that it might take a considerable time to complete for a large "len".
By returning after a single successful Copy RPC that copied some of the
data, the application that did the copy_file_range(2) syscall will be
more responsive to signal delivery for large "len" copies.
hole size boundary.
By clipping the len argument of vn_generic_copy_file_range() to end at
an exact multiple of hole size, holes are more likely to be maintained
during the copy.
A hole can still straddle the boundary at the end of the
copy range, resulting in a block being allocated in the
output file as it is being grown in size, but this will reduce the
likelyhood of this happening.
While here, also modify setting of blksize to better handle the
case where _PC_MIN_HOLE_SIZE is returned as 1.
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D26570
A user pointer is not a suitable value for bio_data and the next block
of code always overwrites bio_data anyway. Just use cb->aio_buf
directly in the call to vm_fault_quick_hold_pages().
Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26595
Some DSDT entries have multiple interrupts for one device.
Add support for it.
This fixes ahci on NXP LS2160 and genet on RPi4
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25145
In r351368, we introduced this XML- and GDB-encoded data. The protocol
'offset' should reflex the logical XML data offset, but unfortunately we
counted the GDB escapes as well.
In fact, we cannot safely do GDB character escaping at this layer at
all, because we don't know what will be flushed in a packet. It is
bogus to send only the first character of a two-character escape
sequence.
This patch "corrects" the problem by squashing these characters in the
transmitted XML document. It would be nice to transmit the characters
faithfully, but that is a more complicated change. Thread names are a
nice convenience feature for the GDB client, but one can always inspect
td_name or p_comm directly to find the true name.
Reported by: Ka Ho Ng <khng300 AT gmail.com>
Tested by: Ka Ho Ng
Reviewed by: emaste, markj, rlibby
Differential Revision: https://reviews.freebsd.org/D26599
Load/store/fetch access exceptions always indicate a violation of a PMP
rule. We can't treat those as page faults, because updating the page
table and trying again will only result in exactly the same access
exception recurring. This leaves us in an endless exception loop.
We cannot recover from these exceptions, so panic instead.
Reviewed by: jhb
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D26544