Teach the pci host generic ACPI attachment about PCI_ID_OFW_IOMMU. This
will be used by the arm64 smmu IOMMU driver to read the xref and ID
this interface provides in a bus-agnostic way.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D39182
Use the xref from OF_xref_from_node for the smmu xref. We already have
a valid xref ID, there is no need to convert this to a memory address.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D39181
Add a n attachment to the pci_host_generic driver for the Arm DEN0115
PCI Configuration Space Access Firmware Interface [1]. This can be used
when PCI controllers need to implement quirks in the PCI root bus.
To handle this the firmware implements a SMCCC interface the driver can
use to read and write the configuration register.
This has been tested on a Raspberry Pi 4 booting with EDK2.
[1] https://developer.arm.com/documentation/den0115/latest
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D39228
To allow for attachments that don't use memory mapped registers add
a flag they can set when the base driver shouldn't map them.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D39227
When mapping the arm64 KASAN shadow map we use Ln_TABLE_MASK to align
physical addresses, however these should already be aligned either
by rounding to a greater alignment, or the VM subsystem is giving us
a correctly aligned page.
Remove these extra alignment masks.
Reviewed by: kevans
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D39752
hid_input is equal to 0. It is leftover from NetBSD code.
Reviewed by: hselasky, wulf
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D28149
Also adds fixups and cleanups:
- apply the child's mode/speed
- implement suspend/resume support
- use RF_SHAREABLE interrupts
- use bus_delayed_attach_children since the transfer can use interrupts
- add support for newly added spibus features (cs_delay and flags)
Operation tested on Broadwell (Wildcat Point) MacBookPro12,1.
Attachment also tested on Kaby Lake (Sunrise Point) Pixelbook.
Reviewed by: wulf
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D29249
These feature are required for an upcoming Apple MacBook topcase
(HID over SPI) driver:
A delay after toggling CS is required to avoid anomalies like an extra
junk byte in front of the message. Keeping CS asserted is required to
be able to read a status report after writing a command. (The device
won't return the status if CS was deasserted.)
Sleep is not allowed in the interrupt context where the Apple input
driver runs its transactions. Use a flag to tell the SPI driver to
avoid mtx_sleep.
Reviewed by: manu (ok to SPI part of larger patch)
MFC afret: 1 month
Differential revision: https://reviews.freebsd.org/D29534
Import ISC-licensed ath10k driver assumed to be
based on Linux kvalo/ath.git master at
6bae9de622d3ef4805aba40e763eb4b0975c4f6d.
Import support to redirect fwlogs to kernel messages
from https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/389075
Complement the driver to make compile on FreeBSD
using LinuxKPI with changes covered by #ifdef (__FreeBSD__).
Further select updates were applied since the initial import
in order to keep compiling along with other LinuxKPI based
drivers.
Any other native driver using BUS_PROBE_DEFAULT will attach
ignoring this one by default given bsd_probe_return is set
to a lower priority.
Add the module build framework.
We only support PCI parts.
The firmware is provided by port net/wifi-firmware-ath10k-kmod.
Given the lack of full license texts on most files this is
imported under the draft policy for handling SPDX files (D29226). [1]
Approved by: core (emaste, 2022-04-08) [1]
MFC after: 2 months
Import common ISC-licensed athk parts assumed to be
based on Linux kvalo/ath.git master at
6bae9de622d3ef4805aba40e763eb4b0975c4f6d.
The only modification should be for FreeBSD module
handling in main.c.
Add the module build framework unconnected to the
build for now.
These files will be shared by ath1?k drivers.
MFC after: 2 months
Add files needed by ath1?k drivers to linuxkpi/linuxkpi_wlan.
This contain (skeleton) implementations of what is needed to
compile but specifically mhi/qmi/qrtr will need more work for
ath11k.
MFC after: 2 months
Import ISC-licensed driver parts of mediatek/mt76
assumed to be based on Linux wireless-testing at
a02411a5b98612c12be99349836d99f07db12a77 (tag: wt-2022-11-23).
Complement the driver and LinuxKPI with our own (dummy)
implementations of missing parts (util.h and soc/mediatek/)
as well as changes to make compile on FreeBSD with changes
covered by #ifdef (__FreeBSD__) conditions.
Further select updates were applied since the initial import
in order to keep compiling along with other LinuxKPI based
drivers.
For the moment we only target the mt7915 and mt7921 PCI parts.
More may follow in the future.
Firmware is provided by port net/wifi-firmware-mt76-kmod.
Given the lack of full license texts on non-local files this is
imported under the draft policy for handling SPDX files (D29226). [1]
Approved by: core (emaste, 2022-04-08) [1]
MFC after: 2 months
Summary:
After https://github.com/llvm/llvm-project/commit/b4257d3bf58c ("[tsan]
Replace mem intrinsics with calls to interceptors") intrinsic calls to
memcpy, memmove or memset will directly call sanitizer interceptors,
e.g. __tsan_memcpy, __tsan_memmove or __tsan_memset.
Building GENERIC-KCSAN with clang >= 16 would thus result in link errors
similar to:
ld: error: undefined symbol: __tsan_memcpy
>>> referenced by cam_compat.c:150 (/usr/src/sys/cam/cam_compat.c:150)
>>> cam_compat.o:(cam_compat_handle_0x17)
>>> referenced by cam_compat.c:151 (/usr/src/sys/cam/cam_compat.c:151)
>>> cam_compat.o:(cam_compat_handle_0x17)
>>> referenced by cam_compat.c:152 (/usr/src/sys/cam/cam_compat.c:152)
>>> cam_compat.o:(cam_compat_handle_0x17)
>>> referenced 1692 more times
Similar to subr_msan.c, add aliases from the existing kcsan_* versions
of these functions to __tsan_* names.
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39772
It appears that PAC registers are configured to trap upon access, but
since the kernel starts in EL1 on this platform it has no ability to
inspect or modify this configuration. Simply disable PAC on this
platform for now, since the kernel otherwise hangs during boot.
PR: 270472
Reviewed by: andrew, emaste
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39748
- In _in_pcbinshash_wild(), we should avoid returning v6 sockets unless
no other matches are available. This preserves pre-existing
semantics.
- Fix an inverted test: when inserting a non-jailed PCB, we want to
search for the first non-jailed PCB in the hash chain.
- Test the right PCB when searching for a non-jailed PCB.
While here, add a required locking assertion.
Fixes: 7b92493ab1 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
In pcie_capability_read_*() always initialize the return value to
avoid warnings of uninitialized values in callers.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39721
While experimenting with changing boolean_t to another type, I noticed
that several powerpc pmap related functions returned the wrong type:
boolean_t instead of int.
Fix several declarations and definitions to match the actual pmap
function types: pmap_dev_direct_mapped_t and pmap_ts_referenced_t.
MFC after: 3 days
As the flag M_WAITOK is passed to ip_encap_attach(), then the function
will never return NULL, and the following code within NULL check branch
will be unreachable.
No functional change intended.
Reviewed by: kp
Fixes: 6d8fdfa9d5 Rework IP encapsulation handling code
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39746
As the flag M_WAITOK is passed to ip_encap_attach(), then the function
will never return NULL, and the following code within NULL check branch
will be unreachable.
No functional change intended.
Reviewed by: kp
Fixes: 6d8fdfa9d5 Rework IP encapsulation handling code
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39746
Linux kernel version 5.15 named Trick or Treat is a 22nd LTS release.
Reviewed by: trasz, emaste
Differential Revision: https://reviews.freebsd.org/D39649
MFC after: 1 month
AT_EXECFN has appeared in the 2.6.26 Linux kernel first time.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D39647
MFC after: 1 month
AT_RANDOM has appeared in the 2.6.30 Linux kernel first time.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D39646
MFC after: 1 month
Export default MINSIGSTKSZ value for the x86 until we do not preserve AVX
registers in the signal context.
Differential Revision: https://reviews.freebsd.org/D39644
MFC after: 1 month
The powerpc asm from openzfs assumes that big-endian is always ELFv1 and
ELFv2 is always little-endian, while FreeBSD uses ELFv2 everywhere. Add
the necessary bits to the checksum asm to work on big-endian ELFv2.
This was also submitted upstream as PR#14779.
Tested by: dbaio
It is identical to noinline and used for documentation reasons.
Required by: drm-kmod 5.15-lts
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D39553
bitmap_to_arr32() copies contents of bitmap to a uint32_t array of bits
Required by: drm-kmod 5.15-lts
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D39552
There's no need to quote the # here. Inside of regexp, it's not treated
like a comment from an awk perspective. And inside if '' it's not
treated as special by the shell. gawk also warns.
Sponsored by: Netflix
The other stacks it turns out actually expect the output to be called and can become stuck if it is
not. This is because they run there timer code from there and the input routine does not always
assure a timer is running. The real longterm fix here might be to go into the other stacks (rack and bbr)
and make sure that a timer is running after input if you don't do output.. as well as call the timer functions.
This would cut down on calls from hpts. But I think its too dramatic of a change for the immediate time.
Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39738
Before the commit 6cc44223cb the
field event_mask was fully copied to the EventMasks field.
After this commit the event_mask (uint8_t) is 4 times casted to
EventMask (uint32_t). Because of that 24 bits of each event_mask array
is lost.
This commits brings back simple copying of field, and after words
converting 32 bits field to the requested endian.
I don't think we need more sophisticated method,
as the array is of size 4 (for 32 bits version).
Reviewed by: imp
MFC after: 1 week
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D39562
Commit 3e0856b63f updated
__sg_alloc_table_from_pages to use the same API as linux, but modified
the loop condition when going over the pages in a sg list. Part of the
change included moving the sg_next call out of the for loop and into the
body, which causes an off by one error when traversing the list. Since
sg_next is called before the loop body it will skip the first element
and read one past the last element.
This caused panics when running PRIME with nvidia-drm as the off-by-one
issue causes a NULL dereference.
Reviewed by: bz, hselasky
Differential Revision: https://reviews.freebsd.org/D39628
Fixes: 3e0856b63f ("linuxkpi: Fix `sg_alloc_table_from_pages()` to have the same API as Linux")
The 4.2 sigreturn was a bit of a enima so the 4.2 was remove. Regenerate
to cope the very minor changes in comments and one string.
Sponsored by: Netflix
Back in 4.3BSD, the system call table wasn't generated, and there was an
entry:
"4.2 sigreturn", /* 139 = old 4.2 sigreturn */
This got converted to
139 OBSOL 0 4.2 sigreturn
in 4.3 RENO. Since it was obsolete, nothing bad happened. In fact,
there was code in makeyscalls.sh to cope:
{ comment = $4
for (i = 5; i <= NF; i++)
comment = comment " " $i
if (NF < 5)
$5 = $4
}
so the generated comment in syscalls.c was almost correct:
"obs_4.2", /* 139 = obsolete 4.2 sigreturn */
a bug that we have to this very day, despite makesyscalls.sh being
rewritten in lua.
However, this historical wart is the only place in our current
syscalls.master file where we have an extra field for the 'not
generated' class of system calls. Remove the historical wart so that the
re-write of makesyscalls.lua can be simpler (so, I hope, qemu's bsd-user
can large swathes of code automatically generated too). This should help
make things more understandable (changes to simplify makesyscalls.lue
aren't quite debugged, so have to wait for another day).
There's 3 different obsolete sigreturns (but only 1 that was ever in
FreeBSD 2.x and newer).
Sponsored by: Netflix
These are the changes since the last update (copy-pasted from the
release notes for Chelsio Unified Wire v3.18.0.0):
====================
Version : 1.27.3.0
Date : 04/07/2023
Fixes
-----
BASE:
- Fixed a hang if module eeprom reads gives invalid data.
- KR backlplane no-fec link problem fixed.
OFLD:
- iscsi ddp errors fixed.
- iwarp connection abort in rare cases causing NIC traffic hang fixed.
ENHANCEMENTS
------------
BASE:
- Cisco GLC-TE 1G modules support added.
====================
Version : 1.27.1.0
Date : 12/02/2022
Fixes
-----
BASE:
- memwrite dsgl cannot be used for T5.
OFLD:
- Enabled FCoE in SO adapters.
- TOE-TLS crash fixed.
- iscsi hang fixed.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Have more accruate comments. While #if, #else, etc are copied to the
header files, lines that don't start with # are not. And #include files
are only output to sysinc (which winds up at the front of init_sysent.c
which seems a bit odd). This is all radically undocumented, and likely
has drifted somewhat from 4.4BSD and what other systems do (they've
drifted too, fwiw).
Sponsored by: Netflix
luacheck pointed out two minor issues: line isn't declared as a global,
so declare it local. Also remove an unused parameter.
Suggested by: kevans
Sponsored by: Netflix
x["y"] can be written as x.y, which looks better and is a more typical
lua idiom.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D39709
This change touches both kernel and netstat(1), but either of the changes
will fix printing pcb addresses with -A.
The thing is that historically netstat(1) treated TCP differently, and
printed tcpcb address instead of inpcb address. This is not documented
anywhere! With e68b379244 these two addresses became the same. It is
highly likely they will be the same for a long time, but it might be they
will start to differ again in a far future. My proposal is to stop
treating TCP differently with netstat(1) and right now is a good opportunity
to do that, since there will be no behavior change at all. The kernel
change to tcp_inptoxtp() will go into stable/14 to make it compatible with
netstat(1) binary from stable/13. We can drop it later, probably together
with in_ppcb pointer from inpcb. The in_ppcb in xinpcb will stay for size
compatibility.
Reviewed by: tuexen, rrs
Differential Revision: https://reviews.freebsd.org/D39736
Add DPAA2 console support for MC and AIOP (latter untested) for FDT
systems. ACPI systems are prepared but need some proper bus function
in order to get the address from MC (and likely a file splitup then).
This will come at a later stage once other ACPI/FDT bus parts are
cleared up.
The work was originally done in July 2022 and finally switched to
bus_space[1] lately to be ready for main.
Suggested by: andrew [1]
Reviewed by: dsl
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D38592
dtrace_instr_size() is needed by the forthcoming RISC-V port of kinst,
as well as by libdtrace in D38825 for both amd64 and RISC-V.
Reviewed by: markj, mhorne
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39489
Callers are specifying uint8_t anyway and this slightly reduces
dependencies on compatibility typedefs. No functional change intended.
Reviewed by: markj, mhorne
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39490
Now that the inp_cred pointer is accessed only while the inpcb lock is
held, we can avoid deferring a crfree() call when freeing an inpcb.
This fixes a problem introduced when inpcb hash tables started being
synchronized with SMR: the credential reference previously could not be
released until all lockless readers have drained, and there is no
mechanism to explicitly purge cached, freed UMA items. Thus, ucred
references could linger indefinitely, and since ucreds hold a jail
reference, the jail would linger indefinitely as well. This manifests
as jails getting stuck in the DYING state.
Discussed with: glebius
Tested by: glebius
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38573
The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
bound sockets. To do this it has to maintain a ucred reference, and for
this to be safe, the reference can't be released until the UMA
destructor is called, and this will not happen within any bounded time
period.
Changing SMR to periodically recycle garbage is not trivial. Instead,
let's implement SMR-synchronized lookup without needing to dereference
inp_cred. This will allow the inpcb code to free the inp_cred reference
immediately when a PCB is freed, ensuring that ucred (and thus jail)
references are released promptly.
Commit 220d892129 ("inpcb: immediately return matching pcb on lookup")
gets us part of the way there. This patch goes further to handle
lookups of unconnected sockets. Here, the strategy is to maintain a
well-defined order of items within a hash chain so that a wild lookup
can simply return the first match and preserve existing semantics. This
makes insertion of listening sockets more complicated in order to make
lookup simpler, which seems like the right tradeoff anyway given that
bind() is already a fairly expensive operation and lookups are more
common.
In particular, when inserting an unconnected socket, in_pcbinhash() now
keeps the following ordering:
- jailed sockets before non-jailed sockets,
- specified local addresses before unspecified local addresses.
Most of the change adds a separate SMR-based lookup path for inpcb hash
lookups. When a match is found, we try to lock the inpcb and
re-validate its connection info. In the common case, this works well
and we can simply return the inpcb. If this fails, typically because
something is concurrently modifying the inpcb, we go to the slow path,
which performs a serialized lookup.
Note, I did not touch lbgroup lookup, since there the credential
reference is formally synchronized by net_epoch, not SMR. In
particular, lbgroups are rarely allocated or freed.
I think it is possible to simplify in_pcblookup_hash_wild_locked() now,
but I didn't do it in this patch.
Discussed with: glebius
Tested by: glebius
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38572
These functions will get some additional callers in future revisions.
No functional change intended.
Discussed with: glebius
Tested by: glebius
Sponsored by: Modirum MDPay
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38571
Currently we use a single hash table per PCB database for connected and
bound PCBs. Since we started using net_epoch to synchronize hash table
lookups, there's been a bug, noted in a comment above in_pcbrehash():
connecting a socket can cause an inpcb to move between hash chains, and
this can cause a concurrent lookup to follow the wrong linkage pointers.
I believe this could cause rare, spurious ECONNREFUSED errors in the
worse case.
Address the problem by introducing a second hash table and adding more
linkage pointers to struct inpcb. Now the database has one table each
for connected and unconnected sockets.
When inserting an inpcb into the hash table, in_pcbinhash() now looks at
the foreign address of the inpcb to figure out which table to use. This
ensures that queue linkage pointers are stable until the socket is
disconnected, so the problem described above goes away. There is also a
small benefit in that in_pcblookup_*() can now search just one of the
two possible hash buckets.
I also made the "rehash" parameter of in(6)_pcbconnect() unused. This
parameter seems confusing and it is simpler to let the inpcb code figure
out what to do using the existing INP_INHASHLIST flag.
UDP sockets pose a special problem since they can be connected and
disconnected multiple times during their lifecycle. To handle this, the
patch plugs a hole in the inpcb structure and uses it to store an SMR
sequence number. When an inpcb is disconnected - an operation which
requires the global PCB database hash lock - the write sequence number
is advanced, and in order to reconnect, the connecting thread must wait
for readers to drain before reusing the inpcb's hash chain linkage
pointers.
raw_ip (ab)uses the hash table without using the corresponding
accessors. Since there are now two hash tables, it arbitrarily uses the
"connected" table for all of its PCBs. This will be addressed in some
way in the future.
inp interators which specify a hash bucket will only visit connected
PCBs. This is not really correct, but nothing in the tree uses that
functionality except raw_ip, which as mentioned above places all of its
PCBs in the "connected" table and so is unaffected.
Discussed with: glebius
Tested by: glebius
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38569
Move a KASSERT out of a function and make it a CTASSERT with
appropriate comments.
Skeleton implement two tkip functions, still left TODO, initializing
variables with dummy values to quiten compiler warnings. It is
unclear to me if we should still ever properly implement TKIP
compat code at this point. If so the current code gives a good
idea what needs to be done in addition to allocating references
to real state along with keyconf.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Quieten some more (valid) gcc warnings and disable dead code.
There are more warnings, some probably a compiler problem, the
other related to firmware structs which I do not want to adjust
just locally. Leave a comment to revisit after a next driver
update.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Rather than using ACCESS_ONCE() in READ_ONCE() add a missing cast
to const in order to satisfy -Wcast-equal by gcc.
Sadly we cannot do the same to WRITE_ONCE() which still is very
noisy.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D39706
This change is required to support interface renaming via Netlink.
No functional changes intended.
Reviewed by: zlei
Differential Revision: https://reviews.freebsd.org/D39692
MFC after: 2 weeks
We are asserting that two values from different enums are the same.
gcc warns about these. Cast the values to (int) to avoid the warning.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Harmonize sk_buff_head and sk_buff further and fix -Warray-bounds
warnings reports by gcc. At the same time simplify some code by
re-using other functions or factoring some code out.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
A one-bit wide bit-field can take only the values 0 and -1. Clang 16
introduced a warning that "implicit truncation from 'int' to a one-bit
wide bit-field changes value from 1 to -1". Fix by using c99 bool.
Reported by: Clang
Reviewed by: emaste, wulf
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39665
Two vfs.cache.stats names are fixed:
- s/.dotdothis/.dotdothits/
- s/.posszaps/.poszaps/
Signed-off-by: Igor Ostapenko <pm@igoro.pro>
[mjg: massaged the header a little bit]
When doing request level BB logging the hybrid_bw_log() does not have proper screening to minimize logging
when point level logging is in use. Lets fix it properly so you have to have the proper knobs set to get the
more noisy logging.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39699
Turns out the location of the check to see if we can do output is in the wrong place. We need
to jump off to the compressed acks before handling that case since th is NULL in the
compressed ack case which is handled differently anyway.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39690
holds some nice stats about why/how the connection ended. Though with the current code it does not
come out without accounting due to the placement of the ifdefs. Also we need to make sure the stacks
fini has ran before calling in from tcp_subr so we get all logs the stack may make at its ending.
Reviewed by: rscheff
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39693
t4_dump_stag to dump hw state for a known STAG.
t4_dump_all_stag to dump hw state for all valid STAGs. This routine
walks the entire STAG region looking for valid entries and this can take
a while for some configurations.
MFC after: 1 week
Sponsored by: Chelsio Communications
struct dpaa2_cmd is no longer malloc'ed, but can be allocated on stack
and initialized with DPAA2_CMD_INIT() on demand. Drivers stopped caching
their DPAA2 command objects (and associated tokens) in the software
contexts in order to avoid using them concurrently.
Reviewed by: bz
Approved by: bz (mentor)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D39509
We need to make the syncache aware of the flag and not do ECN if its set. Note that this
is not 100% full proof but the best we can do (i.e. its still possible that you can get in a
situation where the peer try's to do ecn).
Reviewed by: tuexen, glebius, rscheff
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39672
Both pf_rules_lock and pf_ioctl_lock only ever affect one vnet, so
there's no point in having these locks affect other vnets.
(In fact, the only lock in pf that can affect multiple vnets is
pf_end_lock.)
That's especially important for the rules lock, because taking the write
lock suspends all network traffic until it's released. This will reduce
the impact a vnet running pf can have on other vnets, and improve
concurrency on machines running multiple pf-enabled vnets.
Reviewed by: zlei
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39658
The explanation from https://reviews.freebsd.org/D39637 by stevek:
The "use_xsave" variable is a global and that is only supposed to be
initialized early before scheduling gets started. However, with the way
the ifuncs for "fpusave" and "fpurestore" are implemented, the value
could be changed at runtime when scheduling is active if "use_xsave"
was set to 0 by the tunable. This leaves a window of opportunity where
"use_xsave" gets re-initialized to 1 and a context switch could occur
with a thread that was not set up to be able to use xsave functionality.
This can lead to an "privileged instruction fault".
The fix is to protect "use_xsave" from being initialized more than once.
Reported and reviewed by: stevek
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39660
Since syncer vnode vector does not provide a fallback to the default
one, its VOP_GETWRITEMOUNT() implementation implicitly returned
EOPNOTSUPP, which means that syncer ignored suspension.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Ensure MAC modules are inserted in order that they are registered.
Reviewed by: markj
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39589
These bits are obsolete since 58aa35d429.
This change reverts part of 9ba2b298df as
well as effectively bd3d9826d7, i. e. the
SBus-related modifications. This also gets rid of a nasty hack required
as bus_{read,write}_N(9) doesn't really fit bus_space_subregion(9).
The original idea behind calling into the bridge driver was to have the
logic deciding whether tuning is actually required for a particular bus
timing in a given slot as well as doing the sanity checking only on the
controller layer which also generally is better suited for these due to
say SDHCI_SDR50_NEEDS_TUNING. On another thought, not every such driver
should need to check whether tuning is required at all, though, and not
everything is SDHCI in the first place.
Adjust sdhci{,_fsl_fdt}(4) accordingly, but keep sdhci_generic_tune() a
bit cautious still.
Gleb has noticed there were some inconsistency's in the way the inp_hpts_calls flag was being used. One
such inconsistency results in a bug when we can't allocate enough sendmap entries to entertain a call to
rack_output().. basically a timer won't get started like it should. Also in cleaning this up I find that the
"no_output" side of input needs to be adjusted to make sure we don't try to re-pace too quickly outside
the hpts assurance of 250useconds.
Another thing here is we end up with duplicate calls to tcp_output() which we should not. If packets go
from hpts for processing the input side of tcp will call the output side of tcp on the last packet if it is needed.
This means that when that occurs a second call to tcp_output would be made that is not needed and if pacing
is going on may be harmful.
Lets fix all this and explicitly state the contract that hpts is making with transports that care about the
flag.
Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39653
Using .ALLSRC may get additional arguments that we may not want
and could cause the objcopy to fail.
Reviewed by: emaste
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D39639
PF_TABLE_STATS_ASSERT() should be checking pf_table_stats_lock not
pf_rules_lock.
Fortunately the define is not yet used anywhere so this was harmless.
Fix it anyway, in case it does get used.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Failure to get mbufs may be transient.
Don't permanently fail to open the channels due to lack of mbufs.
This also makes modifying channel parameters faster.
MFC after: 1 week
Sponsored by: NVIDIA Networking
Implement one CQ modify function supporting all firmware versions,
instead of having more variants of CQ modify.
MFC after: 1 week
Sponsored by: NVIDIA Networking
When using hardware pacing, this value can be increased, because more SQ's
means more EQ events aswell. Make it tunable, hw.mlx5.comp_eq_size .
MFC after: 1 week
Sponsored by: NVIDIA Networking
If you currently turn BB logging on and in combination have TCP Accounting on we can get a
crash where we have no NULL check and we run out of memory. Also lets make sure we
don't do a divide by 0 in calculating any BB ratios.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39622
The VV_CROSSLOCK handling logic may need to upgrade the covered
vnode lock depending upon the requirements of the filesystem into
which vfs_lookup() is walking. This may involve transiently
dropping the lock, which can allow the target mount to be unmounted.
Tested by: pho
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D39272
The underlying VOP_MKDIR() implementation may temporarily drop the
parent directory vnode's lock. If the vnode is reclaimed during that
window, the unionfs vnode will effectively become unlocked because
the its v_vnlock field will be reset. To uphold the locking
requirements of VOP_MKDIR() and to avoid triggering various VFS
assertions, explicitly re-lock the unionfs vnode before returning
in this case.
Note that there are almost certainly other cases in which we'll
similarly need to handle vnode relocking by the underlying FS; this
is the only one that's caused problems in stress testing so far.
A more general solution, such as that employed for nullfs in
null_bypass(), will likely need to be implemented.
Tested by: pho
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D39272
The implementation is racy; if the unionfs vnode is not in fact
locked, vnode private data may be concurrently altered or freed.
Instead, simply rely upon the standard implementation to query the
v_vnlock field, which is type-stable and will reflect the correct
lower/upper vnode configuration for the unionfs node.
Tested by: pho
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D39272
We hold the vnode interlock, so vnode private data cannot suddenly
become NULL.
Tested by: pho
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D39272
The LK_UPGRADE operation may have temporarily dropped the upper or
lower vnode's lock. If the unionfs vnode was reclaimed during that
window, its lock field will be reset to no longer point at the
upper/lower vnode lock, so the lock operation will use the standard
lock stored in v_lock. Remove LK_UPGRADE from the flags in this case
to avoid a lockmgr assertion, as this lock has not been previously
owned by the calling thread.
Reported by: pho
Tested by: pho
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D39272
A one-bit wide bit-field can take only the values 0 and -1. Clang 16
introduced a warning that "implicit truncation from 'int' to a one-bit
wide bit-field changes value from 1 to -1". Fix by using c99 bool.
Reported by: Clang, via dim
Reviewed by: dim
Sponsored by: The FreeBSD Foundation
The recent 25685b7537 came in conflict with a540cdca31. Remove the
code that cleans up the old style input queue. Note that two lines
below we assert that the new style input queue is empty. The TCP
stacks that use the queue are supposed to flush it in their
tfb_tcp_fb_fini method.
Expect that drivers call into the network stack with the net epoch
entered. This has already been the fact since early 2020. The net
interrupts, that are marked with INTR_TYPE_NET, were entering epoch
since 511d1afb6b. For the taskqueues there is NET_TASK_INIT() and
all drivers that were known back in 2020 we marked with it in
6c3e93cb5a. However in e87c494015 we took conservative approach
and preferred to opt-in rather than opt-out for the epoch.
This change not only reverts e87c494015 but adds a safety belt to
avoid panicing with INVARIANTS if there is a missed driver. With
INVARIANTS we will run in_epoch() check, print a warning and enter
the net epoch. A driver that prints can be quickly fixed with the
IFF_NEEDSEPOCH flag, but better be augmented to properly enter the
epoch itself.
Note on TCP LRO: it is a backdoor to enter the TCP stack bypassing
some layers of net stack, ignoring either old IFF_KNOWSEPOCH or the
new IFF_NEEDSEPOCH. But the tcp_lro_flush_all() asserts the presence
of network epoch. Indeed, all NIC drivers that support LRO already
provide the epoch, either with help of INTR_TYPE_NET or just running
NET_EPOCH_ENTER() in their code.
Reviewed by: zlei, gallatin, erj
Differential Revision: https://reviews.freebsd.org/D39510
Some 32bit apps may need to be able to use
MAC_VERIEXEC_GET_PARAMS_PID_SYSCALL
MAC_VERIEXEC_GET_PARAMS_PATH_SYSCALL
Therefore compat32 support is required.
Obtained from: Juniper Networks, Inc.
Ensure veriexec opens the file before doing any read operations.
When the MAC_VERIEXEC_CHECK_PATH_SYSCALL syscall is requested, veriexec
needs to open the file before calling mac_veriexec_check_vp. This is to
ensure any set up is done by the file system. Most file systems do not
explicitly need an open, but some (e.g. virtfs) require initialization
of access tokens (file identifiers, etc.) before doing any read or write
operations.
The evaluate_fingerprint() function needs to ensure it has an open file
for reading in order to evaluate the fingerprint. The ideal solution is
to have a hook after the VOP_OPEN call in vn_open. For now, we open the
file for reading, envaluate the fingerprint, and close the file. While
this leaves a potential hole that could possibly be taken advantage of
by a dedicated aversary, this code path is not typically visited often
in our use cases, as we primarily encounter verified mounts and not
individual files. This should be considered a temporary workaround until
discussions about the post-open hook have concluded and the hook becomes
available.
Add MAC_VERIEXEC_GET_PARAMS_PATH_SYSCALL and
MAC_VERIEXEC_GET_PARAMS_PID_SYSCALL to mac_veriexec_syscall so we can
fetch and check label contents in an unconstrained manner.
Add a check for PRIV_VERIEXEC_CONTROL to do ioctl on /dev/veriexec
Make it clear that trusted process cannot be debugged. Attempts to debug
a trusted process already fail, but the failure path is very obscure.
Add an explicit check for VERIEXEC_TRUSTED in
mac_veriexec_proc_check_debug.
We need mac_veriexec_priv_check to not block PRIV_KMEM_WRITE if
mac_priv_gant() says it is ok.
Reviewed by: sjg
Obtained from: Juniper Networks, Inc.
The contents of frame->tf_tp are uninitialized if accessed by DTrace (in
probe context), resulting in a panic when trying to access the memory
pointed to by tp. This saves the thread pointer to the trap frame when
handling both userland and kernel exceptions.
Reviewed by: markj, mhorne
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39582
Keep block cloning disabled by default for now, but allow to enable and
use it after setting vfs.zfs.bclone_enabled to 1, so people can easily
try it.
Approved by: oshogbo
Reviewed by: mm, oshogbo
Differential Revision: https://reviews.freebsd.org/D39613
939a050ad9 virtualized lagg(4), but the corresponding sysctl of some
virtualized global variables are not marked with CTLFLAG_VNET. A try to
operate on those variables via sysctl will effectively go to the 'master'
copies and the virtualized ones are not read or set accordingly. As a
side effect, on updating the 'master' copy, the virtualized global
variables of newly created vnets will have correct values.
PR: 270705
Reviewed by: kp
Fixes: 939a050ad9 Virtualize lagg(4) cloner
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39467
The zfs_log_clone_range() function is never called from the
zfs_clone_range_replay() function, so I assumed it is safe to assert
that zil_replaying() is never TRUE here. It turns out zil_replaying()
also returns TRUE when the sync property is set to disabled.
Fix the problem by just returning if zil_replaying() returns TRUE.
Reported by: Florian Smeets
Signed-off-by: Pawel Jakub Dawidek pawel@dawidek.net
Approved by: oshogbo, mm
Fix data corruption when cloning embedded blocks
Don't overwrite blk_phys_birth, as for embedded blocks it is part of
the payload.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Issue #13392Closes#14739
Approved by: oshogbo, mm
Only build MAC/veriexec modules when MK_VERIEXEC is yes or we
are building all modules.
Add VERIEXEC knob to kernel __DEFAULT_NO_OPTIONS
Reviewed by: sjg
Obtained from: Juniper Networks, Inc.
Allow other MAC modules to override some veriexec checks.
We need two new privileges:
PRIV_VERIEXEC_DIRECT process wants to override 'indirect' flag
on interpreter
PRIV_VERIEXEC_NOVERIFY typically associated with PRIV_VERIEXEC_DIRECT
allow override of O_VERIFY
We also need to check for PRIV_VERIEXEC_NOVERIFY override
for FINGERPRINT_NODEV and FINGERPRINT_NOENTRY.
This will only happen if parent had PRIV_VERIEXEC_DIRECT override.
This allows for MAC modules to selectively allow some applications to
run without verification.
Needless to say, this is extremely dangerous and should only be used
sparingly and carefully.
Obtained from: Juniper Networks, Inc.
Reviewers: sjg
Subscribers: imp, dab
Differential Revision: https://reviews.freebsd.org/D39537
Summary:
Check for PRIV_KDB_SET_BACKEND before allowing a thread to change
the KDB backend.
Obtained from: Juniper Networks, Inc.
Reviewers: sjg, emaste
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D39538
For a process supervisor using the reaper API to track process subtrees,
it is very useful to know the state of the processes on the list.
Sponsored by: https://www.patreon.com/valpackett
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39585
Summary:
Typecasting both parts of the comparison to u_int quiets compiler
warnings about signed/unsigned comparison and takes care of positive
and negative numbers for the file descriptor in a single comparison.
Obtained from: Juniper Netwowrks, Inc.
Reviewers: mjg
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D39593
ccb_h.status has two parts: the actual status and some addition bits to
indicate additional information. It must be masked before comparing
against completion codes. Add new inline function cam_ccb_success to
simplify this to test whether or not the request succeeded. Most of the
code already does this, but a few places don't (the rest likely should
be converted to use cam_ccb_status and/or cam_ccb_success, but that's
for another day). This caused at least one bug in recognizing devices
behind a SATA port multiplexer, though some of these checks were
fine with the special knowledge of the code paths involved.
PR: 270459
Sponsored by: Netflix
MFC After: 1 week (and maybe a EN requst)
Reviewed by: ken, mav
Differential Revision: https://reviews.freebsd.org/D39572
There is one data corruption problem reported and fixed upstream, not
cherry-picked here yet.
On top of it the following fires under load:
VERIFY(zil_replaying(zfsvfs->z_log, tx));
The patch which introduced the entire machinery is a revert candidate,
but as the machinery came with a dedicated feature flag, doing so would
render affected pools read-only at best. To be figured out.
As a temporary bandaid at least stop the active usage.
Note this patch does not make the feature disappear from zpool upgrade.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Add the remaining bus_space*read*_8 functions conditionally for
only arm64 in order to not break KASAN builds with new code using
one of them.
Suggested by: markj
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39581
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact. Provide a variant of pmap_remove(), pmap_map_delete(), to
allow pmap to distinguish between real removes of the UVA mappings
and any other internal removes, e.g. swapout.
For non-amd64, pmap_map_delete() is stubbed by define to pmap_remove().
Reported by: andrew
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39556
Its possible to induce a crash in either rack or bbr. This would be done
if the rack stack were say the default and bbr was being used by a connection.
If the bbr stack is then unloaded and it was active, we will trigger a MPASS assert
in tcp_hpts since the new stack (default rack) would start a timer, and the old stack
(bbr) would have the inp already in hpts.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39576
Current Netlink message writer code relies on executing callbacks
with arbitrary data (pointer or integer) to flush the completed
messages.
This arbitrary data is stored as a union of { void *, uint64_t }.
At some stage, the message flushing code copied this data, using
direct uint64_t assignment instead of copying the union. It lead
to failure on CHERI, as sizeof(pointer) == 16 there.
Fix the code by making union non-anonymous and copying it entirely.
Reviewed by: br, jhb, jrtc27
Differential Revision: https://reviews.freebsd.org/D39557
MFC after: 2 weeks
This changes intends to reduce the bar to the kernel unit-testing by
introducing a new kernel-testing framework ("ktest") based on Netlink,
loadable test modules and python test suite integration.
This framework provides the following features:
* Integration to the FreeBSD test suite
* Automatic test discovery
* Automatic test module loading
* Minimal boiler-plate code in both kernel and userland
* Passing any metadata to the test
* Convenient environment pre-setup using python testing framework
* Streaming messages from the kernel to the userland
* Running tests in the dedicated taskqueues
* Skipping or parametrizing tests
Differential Revision: https://reviews.freebsd.org/D39385
MFC after: 2 weeks
RX MCS set defines which MCSs are supported for RX, bits 0-31 are for equal
modulation of the streams, bits 33-76 are for unequal case. Current code checks
txstreams variable instead of rxstreams to set bits from 53 to 76 for 4 spatial
streams case.
The modulations are defined in tables 19-38 and 19-41 of the IEEE Std
802.11-2020.
Spotted by bz in https://reviews.freebsd.org/D39476
Reviewed by: bz
Approved by: bz
Sponsored by: Serenity Cybersecurity, LLC
Differential Revision: https://reviews.freebsd.org/D39568
Current code checks whether or not txstreams are equal to rxstreams and if it
isn't - sets needed bits in "Transmit MCS Set". But if they are equal it sets
whole set to zero, which contradicts the standard, if tx and rx streams are
equal 'Tx MCS Set Defined' (table 9-186, IEEE Std 802.11-2020) must be set to
one.
Reviewed by: bz
Approved by: bz
Sponsored by: Serenity Cybersecurity, LLC
Differential Revision: https://reviews.freebsd.org/D39476
The xen_domain_type and HYPERVISOR_shared_info variables are shared by
all Xen architectures, so they should be in common rather than
reimplemented by each architecture.
hvm_start_flags is used by xen_initial_domain() and so needs to be in
common.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D28982
The event channel source code or equivalent is needed on all
architectures. Since much of this is viable to share, get this moved out
of x86-land. Each interrupt interface then needs a distinct back-end
implementation.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-01-13 17:41:04
Differential Revision: https://reviews.freebsd.org/D30236
Simply moving the interrupt allocation and release functions into files
which belong to the architecture. Since x86 interrupt handling is quite
distinct from other architectures, this is a crucial necessary step.
Identifying the border between x86 and architecture-independent is
actually quite tricky. Similarly, getting the prototypes for the
border right is also quite tricky.
Inspired by the work of Julien Grall <julien@xen.org>,
2015-10-20 09:14:56, but heavily adjusted.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30936
The x86 PIC interface is very much x86-specific and not used by other
architectures. Since most of xen_intr.c can be shared with other
architectures, the PIC interface needs to be broken off.
Introduce wrappers for calls into the architecture-dependent interrupt
layer. All architectures need roughly the same functionality, but the
interface is slightly different between architectures. Due to the
wrappers being so thin, all of them are implemented as inline in
arch-intr.h.
The original implementation was done by Julien Grall in 2015, but this
has required major updating.
Removal of PVHv1 meant substantial portions disappeared. The original
implementation took care of moving interrupt allocation to
xen_arch_intr.c, but this has required massive rework and was broken
off.
In the original implementation the wrappers were normal functions. Some
had empty stubs in xen_intr.c and were removed.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30909
This value doesn't need to be set in xen_intr_alloc_isrc(). What is
needed is simply to ensure the allocated xenisrc won't appear as free,
even if xi_type is written non-atomically. Since the type is no longer
used to indicate free or not, the calling function should take care of
all non-architecture initialization.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31188
Scanning the list of interrupts to find an unused entry is rather
inefficient. Instead overlay a free list structure and use a list
instead.
This also has the useful effect of removing the last use of evtchn_type
values outside of xen_intr.c.
Reviewed by: royger
[royger]
- Make avail_list static.
The evtchn_type enum is only touched by the Xen interrupt code. Other
event channel uses no longer need the value, so that has been moved to
restrict its use.
Copyright note. The current evtchn_type was introduced at 76acc41fb7
by Justin T. Gibbs. This in turn appears to have been heavily inspired
by 30d1eefe39 done by Kip Macy.
Reviewed by: royger
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header. A
similar situation exists for the NR_EVENT_CHANNELS constant.
Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.
This was originally implemented by Julien Grall, but has been heavily
modified. The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around. This allows the
architecture to add function definitions which use struct xenisrc.
The original version only moved xi_intsrc into xen_arch_isrc_t. Moving
xi_vector was done by the submitter.
The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t. Those disappeared with the removal of PVHv1 support.
Copyright note. The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs. Traces remain, but the strength of
Copyright claims from before 2013 seem pretty weak.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
- Adjust some line lengths
- Fix comment about NR_EVENT_CHANNELS after movement.
- Use #include instead of symlinks.
xen_intr_handle_upcall() has two interfaces. It needs to be called by
the x86 assembly code invoked by the APIC. Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.
Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface. Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().
When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled. This removes the need for
critical_enter()/critical_exit() when called this way.
The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.
Additionally driver_filter_t functions have no need to handle interrupt
counters. The intrcnt_add() calling function was reworked to match the
current situation. intrcnt_add() is now only called via one path.
The increment/decrement of curthread->td_intr_nesting_level had
previously been left out. Appears this was mostly harmless, but this
was noticed during implementation and has been added.
CONFIG_X86 is a leftover from use with Linux. While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.
Copyright note. xen_intr_intrcnt_add() was introduced at 76acc41fb7
by Justin T. Gibbs. xen_intrcnt_init() was introduced at fd036deac1
by John Baldwin.
sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface. Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later. As such the filename and header list
belong to Julien Grall, but what those were created for is later.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006
Keeping released xenisrcs in a known state simplifies allocation, but
forces the allocation function to maintain that state. This turns into
a problem when trying to allow for interchangeable allocation functions.
Fix this issue by ensuring xenisrcs are always *fully* initialized
during binding.
Reviewed by: royger
There are actually several distinct locking domains in xen_intr.c, but
all were sharing the same lock. Both xen_intr_port_to_isrc[] and the
x86 interrupt structures needed protection. Split these two apart as a
precursor to splitting the architecture portions off the file.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
Locking for allocation was being done in xen_intr_bind_isrc(), but the
unlock was inside xen_intr_alloc_isrc(). While the lock acquisition at
the end of xen_intr_alloc_isrc() was to modify xen_intr_port_to_isrc[],
NOT allocation. Fix this garbled (though working) locking scheme.
Now locking for allocation is strictly in xen_intr_alloc_isrc(), while
locking to modify xen_intr_port_to_isrc[] is in xen_intr_bind_isrc().
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
The call structure around xen_intr_alloc_isrc() was rather awful.
Notably finding a structure for reuse is part of allocation, but this
was done outside xen_intr_alloc_isrc(). Move this into
xen_intr_alloc_isrc() so the function handles all allocation steps.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30726
As "CPUs", IRQs (vector) and virtual IRQs are always positive integers,
adjust the Xen code to use unsigned integers. Several format strings
need adjustment to match. Additionally single-bit bitfields are
boolean.
No functional change expected.
Reviewed by: royger
This overlaps the purpose of __XEN_INTERFACE_VERSION__. Remove Xen 3.0.2
compatibility. __XEN_INTERFACE_VERSION__ has compatibility to Xen 3.2.8
enabled. As Xen 3.3 was released almost 15 years ago, it seems unlikely
anyone hasn't updated.
Reviewed by: royger
HYPERVISOR_poll(), HYPERVISOR_block(), and HYPERVISOR_crash() appear no
longer used. Further get_system_time() appears to have disappeared at
some point in the past, so HYPERVISOR_poll() was broken anyway.
No functional change intended.
Reviewed by: royger
The present implementation is only for x86. Other architectures need
adjustments for querying presence of EFI.
Xen's EFI support is also quite troublesome on non-x86. This is being
slowly remedied, but until in better shape the EFI clock functionality
should be disabled.
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31065
Part of the series for allowing FreeBSD/ARM to run on Xen. On ARM the
function is a trivial pass-through, other architectures need distinct
implementations.
While implementing XEN_VCPUID() as a call to XEN_CPUID_TO_VCPUID()
works, that involves multiple accesses to the PCPU region. As such make
this a distinct macro. Only callers in machine independent code have
been switched.
Add a wrapper for the x86 PIC interface to use matching the old
prototype.
Partially inspired by the work of Julien Grall <julien@xen.org>,
2015-08-01 09:45:06, but XEN_VCPUID() was redone by Elliott Mitchell on
2022-06-13 12:51:57.
Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 08:57:40
Original implementation: Julien Grall <julien@xen.org>, 2014-04-19 14:32:01
Differential Revision: https://reviews.freebsd.org/D29404
Previously the upper layer handle was being set before the last
potential error condition. The reasoning appears to have been it was
assumed invalid in case of an error being returned. Now ensure it is
invalid until just before a successful return.
Fixes: 76acc41fb7 ("Implement vector callback for PVHVM and unify event channel implementations")
Fixes: 6d54cab1fe ("xen: allow to register event channels without handlers")
Reviewed by: royger
n further non-LRO testing I found a case where rack is supposed to be waking up but
it is not now. In this special case it sets the flag rc_ack_can_sendout_data. When that is
set we should not prohibit output.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39565
The bridge treated no vlan tag as being equivalent to vlan ID 1, which
causes confusion if the bridge sees both untagged and vlan 1 tagged
traffic.
Use DOT1Q_VID_NULL when there's no tag, and fix up the lookup code by
using 'DOT1Q_VID_RSVD_IMPL' to mean 'any vlan', rather than vlan 0. Note
that we have to account for userspace expecting to use 0 as meaning 'any
vlan'.
PR: 270559
Suggested by: Zhenlei Huang <zlei@FreeBSD.org>
Reviewed by: philip, zlei
Differential Revision: https://reviews.freebsd.org/D39478
It is shorter and more readable.
No functional change intended.
Reviewed by: kp
Fixes: 2d3614fb13 bridge: Log MAC address port flapping
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39542
Introduce the OpenBSD syntax of "scrub" option for "match" and "pass"
rules and the "set reassemble" flag. The patch is backward-compatible,
pf.conf can be still written in FreeBSD-style.
Obtained from: OpenBSD
MFC after: never
Sponsored by: InnoGames GmbH
Differential Revision: https://reviews.freebsd.org/D38025
Add MODULE_PNP_INFO() to the driver to make it autoload if not linked
statically into the kernel. Remove the device from amd64/i386 GENERIC.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D35074
In going through all the TCP stacks I have found we have a few little bugs and niggles that need to
be cleaned up in tcp_subr.c including the following:
a) Set tcp_restoral_thresh to 450 (45%) not 550. This is a better proven value in my testing.
b) Lets track when we try to do pacing but fail via a counter for connections that do pace.
c) If a switch away from the default stack occurs and it fails we need to make sure the time
scale is in the right mode (just in case the other stack changed it but then failed).
d) Use the TP_RXTCUR() macro when starting the TT_REXMT timer.
e) When we end a default flow lets log that in BBlogs as well as cleanup any t_acktime (disable).
f) When we respond with a RST lets make sure to update the log_end_status properly.
g) When starting a new pcb lets assure that all LRO features are off.
h) When discarding a connection lets make sure that any t_in_pkt's that might be there are freed properly.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39501
Quoting from https://maskray.me/blog/2023-04-12-elf-hash-function:
The System V Application Binary Interface (generic ABI) specifies the
ELF object file format. When producing an output executable or shared
object needing a dynamic symbol table (.dynsym), a linker generates a
.hash section with type SHT_HASH to hold a symbol hash table. A DT_HASH
tag is produced to hold the address of .hash.
The function is supposed to return a value no larger than 0x0fffffff.
Unfortunately, there is a bug. When unsigned long consists of more than
32 bits, the return value may be larger than UINT32_MAX. For instance,
elf_hash((const unsigned char *)"\xff\x0f\x0f\x0f\x0f\x0f\x12") returns
0x100000002, which is clearly unintended, as the function should behave
the same way regardless of whether long represents a 32-bit integer or
a 64-bit integer.
Reviewed by: kib, Fangrui Song
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39517
During compressed ack and mbuf queuing we determine if we need to wake up. A
new function was added that is optional to the tfb so that the stack itself can also
be asked if a wakeup should happen. This helps compensate for late hpts calls.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39502
This fixes a bug where stacks could not be deregistered when
end points in the non-default vnet are using it.
Reviewed by: glebius, zlei
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D39514
Use it when possible, instead of separated flags.
No functional change intended.
Reviewed by: hselasky, erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39466
Use it when possible, instead of separated flags.
No functional change intended.
Reviewed by: hselasky, erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39466
sysctl variables rx_budget and max_aggregation_size are read-only loader
tunable. Mark them with CTLFLAG_RD flag.
No functional change intended.
Reviewed by: hselasky, erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39466
Use it when possible, instead of separated flags.
No functional change intended.
Reviewed by: hselasky, erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39466
Use them when possible, instead of separated flags.
No functional change intended.
Reviewed by: hselasky, erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39466
Use them when possible, instead of separated flags.
No functional change intended.
Reviewed by: hselasky, erj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D39466
Avoid magic numbers when handling the IPv6 flow ID for
DSCP and ECN fields and use the named variable instead.
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D39503
which returns an indicator if the current thread owns the specified
lock.
Reviewed by: jah, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39477
This ensures that *mpp != NULL iff vn_finished_write() should be
called, regardless of the returned error, except for V_NOWAIT.
The only exception that must be maintained is the case where
vn_start_write(V_NOWAIT) is called with the intent of later dropping
other locks and then doing vn_start_write(V_XSLEEP), which needs the mp
value calculated from the non-waitable call above it.
Also note that V_XSLEEP is not supported by vn_start_secondary_write().
Reviewed by: markj, mjg (previous version), rmacklem (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39441
Falling back to the multicast key may cause unicast traffic to leak.
Instead fail when no key is found.
For more information see the 'Framing Frames: Bypassing Wi-Fi Encryption
by Manipulating Transmit Queues' paper.
[ I updated the commit message to reference the paper and the code
comment to record historic behaviour as discussed in private email. ]
Security: CVE-2022-47522
Rack is capable of fixed rate or dynamic rate pacing. Both of these can get mixed up when
LRO is not available. This is because LRO will hold off waking up the tcp connection to
processing the inbound packets until the pacing timer is up. Without LRO the pacing only
sort-of works. Sometimes we pace correctly, other times not so much.
This set of changes will make it so pacing works properly in the absence of LRO.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision:https://reviews.freebsd.org/D39494
This function is clearly a stub, but it seems better to leave the stub
bits in place than to remove the function entirely.
Differential Revision: https://reviews.freebsd.org/D39355
If all of the mirror's children have the same rotation rate, report
that. But if they have mixed rotation rates, or if any child has an
unknown rotation rate, report "Unknown".
MFC after: 2 weeks
Sponsored by: Axcient
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D39458
This will be used by a forthcoming port of the kinst provider.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39481
if_bridge receives packets via a special interface, if_bridge_input,
rather than by if_input. Thus, netmap's usual hooking of ifnet routines
does not work as expected. Instead, modify bridge_input() to pass
packets directly to netmap when it is enabled. This applies to both
locally delivered packets and forwarded packets.
When a netmap application transmits a packet by writing it to the host
TX ring, the mbuf chain is passed to if_input, which ordinarily points
to ether_input(). However, when transmitting via if_bridge,
bridge_input() needs to see the packet again in order to decide whether
to deliver or forward. Thus, introduce a new protocol flag,
M_BRIDGE_INJECT, which 1) causes the packet to be passed to
bridge_input() again after Ethernet processing, and 2) avoids passing
the packet back to netmap. The source MAC address of the packet is used
to determine the original "receiving" interface.
Reviewed by: vmaffione
MFC after: 2 months
Sponsored by: Zenarmor
Sponsored by: OPNsense
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38066
We already remove mbuf tags from packets transitting an if_epair, but we
didn't remove vlan metadata.
In certain configurations this could lead to unexpected vlan tags
turning up on the rx side.
PR: 270736
Reviewed by: markj
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39482
The assert_vop_locked messages are ignored, and file/line information
is not too useful. Fixing this without changing both witness and VFS
asserts KPIs is not possible.
Reviewed by: markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39464
Use route destination sockaddr when the gateway is eiter AF_LINK or
has the different family (IPv4 over IPv6). This change ensures
the nexthop IFA has the same family as the destination.
Reported by: Dmitriy Smirnov <fox@sage.su>
Tested by: Dmitriy Smirnov <fox@sage.su>
MFC after: 3 days
Bad behaving user-space USB applicatoins may crash the kernel by issuing
USB FS related ioctl(2)'s out of their expected order. By default
the USB FS ioctl(2) interface is only available to the
administrator, root, and driver applications like webcamd(8) needs
to be hijacked in order for this to happen.
The issue is the fast-path code does not always see updates made
by the slow-path code, and may then work on freed memory.
This is easily fixed by using an EPOCH(9) type of synchronization
mechanism. A SX(9) lock will be used as a substitute for EPOCH(9),
due to the need for sleepability. In addition most calls going into
the fast-path originate from a single user-space process and the
need for multi-thread performance is not present.
Differential Revision: https://reviews.freebsd.org/D39373
Reviewed by: markj@
Reported by: C Turt <ecturt@gmail.com>
admbugs: 994
MFC after: 1 week
Sponsored by: NVIDIA Networking
Move code in switch cases into own functions to make later changes easier to track.
No functional change, except for removing a superfluous break statement when
range checking USB_FS_MAX_FRAMES, in the USB_FS_OPEN case.
It should not have been there at all.
Suggested by: emaste@
MFC after: 1 week
Sponsored by: NVIDIA Networking
If either of vnodes is shared locked, lock must not be recursed.
Requested by: rmacklem
Reviewed by: markj, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39444
It is not implemented and causes panics on boot.
This is a temporary measure until someone(tm) sorts it out.
Reported by: many
Sponsored by: Rubicon Communications, LLC ("Netgate")
Although the NFS client does not currently perform Null RPCs,
this fix is needed if/when it might do so.
Found during testing of experimental code that uses Null RPCs
to maintain/monitor TCP connections for "nconnect" mounts.
MFC after: 3 months
Commit f4179ad46f added support for operation bitmaps for
NFSv4.1/4.2. This commit uses those to implement the SP4_MACH_CRED
case for the NFSv4.1/4.2 ExchangeID operation since the Linux
NFSv4.1/4.2 client is now using this for Kerberized mounts.
The Linux Kerberized NFSv4.1/4.2 mounts currently work without
support for this because Linux will fall back to SP4_NONE,
but there is no guarantee this fallback will work forever.
This commit only affects Kerberized NFSv4.1/4.2 mounts from
Linux at this time.
MFC after: 3 months
The socket argument is superfluous, as a tcpcb always has one and
only one socket.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39434
API contract requires VOPs to handle EXDEV internally, worst case by
falling back to the generic copy routine. This broke with the recent
changes.
While here whack custom loop to lock 2 vnodes with vn_lock_pair, which
provides the same functionality internally. write start/finish around
it plays no role so got eliminated.
One difference is that vn_lock_pair always takes an exclusive lock on
both vnodes. I did not patch around it because current code takes an
exclusive lock on the target vnode. zfs supports shared-locking for
writes, so this serializes different calls to the routine as is, despite
range locking inside. At the same time you may notice the source vnode
can get some traffic if only shared-locked, thus once more this goes
the safer route of exclusive-locking. Note this should be patched to
use shared-locking for both once the feature is considered stable.
Technically the switch to vn_lock_pair should be a separate change, but
it would only introduce churn immediately whacked by the rest of the
patch.
[note: technically the review is still in progress, but so is the
active breakage]
Sponsored by: Rubicon Communications, LLC ("Netgate")
MAC flapping occurs when a bridge receives packets with the same source MAC
address on different member interfaces. The common reasons are:
- user roams from one bridge port to another
- user has wrong network setup, bridge loops e.g.
- someone set duplicated ethernet address on his/her nic
- some bad guy / virus / trojan send spoofed packets
if_bridge currently updates the bridge routing entry silently hence it is hard
to diagnose.
Emit logs when MAC address port flapping occurs to make it easier to diagnose.
Reviewed by: kp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D39375
Both BBR and Rack have the ability to log socket options, which is currently disabled. Rack
has an experimental SaD (Sack Attack Detection) algorithm that should be made available. Also
there is a t_maxpeak_rate that needs to be removed (its un-used).
Reviewed by: tuexen, cc
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39427