If VOP_ADD_WRITECOUNT() or adv locking failed, so VOP_CLOSE() needs to
be called, we cannot use fp fo_close() when there is no fp. This occurs
when e.g. kernel code directly calls vn_open() instead of the open(2)
syscall.
In this case, VOP_CLOSE() can be called directly, after possible lock
upgrade.
Reported by: nvass@gmx.com
PR: 255119
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29830
- SNDSTAT_LABEL_* are renamed to SNDST_DSPS_*, and SNDSTAT_LABEL_DSPS
becomes SNDST_DSPS.
- Centralize channel number/rate/formats into a single nvlist
The above nvlist is named "info_play" and "info_rec"
- Expose only encoding format in pfmts/rfmts. Userland has no direct
access to AFMT_ENCODING/CHANNEL/EXTCHANNEL macros, thus it serves no
meaning to expose too much information through this pair of labels.
However pminrate/rminrate, pmaxrate/rmaxrate, pfmts/rfmts are
deprecated and will be removed in future.
This commit keeps ioctls ABI compatibility with __FreeBSD_version
1400006 for now. In future the compat ABI with 1400006 will be removed
once audio/virtual_oss is rebuilt.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29770
Currently, PCB caching mechanism relies on the rib generation
counter (rnh_gen) to invalidate cached nhops/LLE entries.
With certain fib algorithms, it is now possible that the
datapath lookup state applies RIB changes with some delay.
In that scenario, PCB cache will invalidate on the RIB change,
but the new lookup may result in the same nexthop being returned.
When fib algo finally gets in sync with the RIB changes, PCB cache
will not receive any notification and will end up caching the stale data.
To fix this, introduce additional counter, rnh_gen_rib, which is used
only when FIB_ALGO is enabled.
This counter is incremented by the control plane. Each time when fib algo
synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value.
Differential Revision: https://reviews.freebsd.org/D29812
Reviewed by: donner
MFC after: 2 weeks
Address multiple issues with strict rtsock message validation.
D28668 "normalisation" approach was based on the assumption that
we always have at least "standard" sockaddr len.
It turned out to be false - certain older applications like quagga
or routed abuse sin[6]_len field and set it to the offset to the
first fully-zero bit in the mask. It is impossible to normalise
such sockaddrs without reallocation.
With that in mind, change the approach to use a distinct memory
buffer for the altered sockaddrs. This allows supporting the older
software while maintaining the guarantee on the "standard" sockaddrs.
PR: 255273,255089
Differential Revision: https://reviews.freebsd.org/D29826
MFC after: 3 days
iwnstats was not compiling because of some issues raised by the clang
compiler due to -Werror. As a tool it is not connected to world build.
Add missing field "barker_mrc" initialization in struct
iwn_sensitivity_limits for -Wmissing-field-initializers, remove unused
pointer *is on iwn_stats_*_print functions and unused variables for
-Wunused-parameter and -Wunused-variable.
The value for field "barker_mrc" of struct iwn2030_sensitivity_limits
was obtained from linux 3.2 wireless/iwlwifi driver code (iwl-2000.c:115
.barker_corr_th_min_mrc = 390).
Also set BINDIR in Makefile to make it possible to install under
/usr/local/sbin/iwnstats as it require super user.
Reviewed by: adrian
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29800
In certain cases, e.g. a SYN-flood from a limited set of hosts,
the TCP hostcache becomes the main contention point. To solve
that, this change introduces lockless lookups on the hostcache.
The cache remains a hash, however buckets are now CK_SLIST. For
updates a bucket mutex is obtained, for read an SMR section is
entered.
Reviewed by: markj, rscheff
Differential revision: https://reviews.freebsd.org/D29729
This is further rework of 08d9c92027. Now we carry the knowledge of
lock type all the way through tcp_input() and also into tcp_twcheck().
Ideally the rlocking for pure SYNs should propagate all the way into
the alternative TCP stacks, but not yet today.
This should close a race when socket is bind(2)-ed but not yet
listen(2)-ed and a SYN-packet arrives racing with listen(2), discovered
recently by pho@.
When a rescue retransmission is successful, rather than
inserting new holes to the left of it, adjust the old
rescue entry to cover the missed sequence space.
Also, as snd_fack may be stale by that point, pull it forward
in order to never create a hole left of snd_una/th_ack.
Finally, with DSACKs, tcp_sack_doack() may be called
with new full ACKs but a DSACK block. Account for this
eventuality properly to keep sacked_bytes >= 0.
MFC after: 3 days
Reviewed By: kbowling, tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29835
This change makes the TCP LRO code more generic and flexible with regards
to supporting multiple different TCP encapsulation protocols and in general
lays the ground for broader TCP LRO support. The main job of the TCP LRO code is
to merge TCP packets for the same flow, to reduce the number of calls to upper
layers. This reduces CPU and increases performance, due to being able to send
larger TSO offloaded data chunks at a time. Basically the TCP LRO makes it
possible to avoid per-packet interaction by the host CPU.
Because the current TCP LRO code was tightly bound and optimized for TCP/IP
over ethernet only, several larger changes were needed. Also a minor bug was
fixed in the flushing mechanism for inactive entries, where the expire time,
"le->mtime" was not always properly set.
To avoid having to re-run time consuming regression tests for every change,
it was chosen to squash the following list of changes into a single commit:
- Refactor parsing of all address information into the "lro_parser" structure.
This easily allows to reuse parsing code for inner headers.
- Speedup header data comparison. Don't compare field by field, but
instead use an unsigned long array, where the fields get packed.
- Refactor the IPv4/TCP/UDP checksum computations, so that they may be computed
recursivly, only applying deltas as the result of updating payload data.
- Make smaller inline functions doing one operation at a time instead of
big functions having repeated code.
- Refactor the TCP ACK compression code to only execute once
per TCP LRO flush. This gives a minor performance improvement and
keeps the code simple.
- Use sbintime() for all time-keeping. This change also fixes flushing
of inactive entries.
- Try to shrink the size of the LRO entry, because it is frequently zeroed.
- Removed unused TCP LRO macros.
- Cleanup unused TCP LRO statistics counters while at it.
- Try to use __predict_true() and predict_false() to optimise CPU branch
predictions.
Bump the __FreeBSD_version due to changing the "lro_ctrl" structure.
Tested by: Netflix
Reviewed by: rrs (transport)
Differential Revision: https://reviews.freebsd.org/D29564
MFC after: 2 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Extract the state killing code from pfioctl() and rephrase the filtering
conditions for readability.
No functional change intended.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29795
In clang 12.0.0.rc2, going from weak to global is now a hard error:
```
/usr/src/stand/libsa/amd64/_setjmp.S:67:25: error: _longjmp changed binding to STB_GLOBAL
.text; .p2align 4,0x90; .globl _longjmp; .type _longjmp,@function; _longjmp:; .cfi_startproc
```
And the other way is a warning, but we have -Werror:
```
error: __start_set_Xcommand_set changed binding to STB_WEAK [-Werror,-Winline-asm]
error: __stop_set_Xcommand_set changed binding to STB_WEAK [-Werror,-Winline-asm]
```
ref: https://reviews.llvm.org/D90108
Reviewed By: arichardson
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29159
At a recent testing event I found out that I had misinterpreted
RFC5661 where it describes the stripe size in the File Layout's
nfl_util field. This patch fixes the pNFS File Layout server
so that it returns the correct value to the NFSv4.1/4.2 pNFS
enabled client.
This affects almost no one, since pNFS server configurations
are rare and the extant pNFS aware NFS clients seemed to
function correctly despite the erroneous stripe size.
It *might* be needed for correct behaviour if a recent
Linux client mounts a FreeBSD pNFS server configuration
that is using File Layout (non-mirrored configuration).
MFC after: 2 weeks
Add support for current and future client platform PCI IDs. These are
all I219 variants and have no known driver changes versus previous
generation client platform I219 variants.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29801
There are a number of issues in the e1000 multicast filter handling
that have been present for a long time. Take the updated approach from
ixgbe(4) which does not have the issues.
The issues are outlined in the PR, in particular this solves crossing
over and under the hardware's filter limit, not programming the
hardware filter when we are above its limit, disabling SBP (show bad
packets) when the tunable is enabled and exiting promiscuous mode, and
an off-by-one error in the em_copy_maddr function.
PR: 140647
Reported by: jtl
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29789
We don't need to set the bits here since the if/else if/else statements
fully cover setting these bit pairs.
Reported by: markj
Reviewed by: markj, erj
Approved by: #intel_networking
MFC aftter: 1 week
Differential Revision: https://reviews.freebsd.org/D29827
Only allocate struct_mm after we checked that other threads do not carry
useful mm_struct. If they don't, drop process lock, allocate, and recheck.
Note that for M_NOWAIT allocations we could avoid dropping process lock,
but I do not think that this increased complexity is useful.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
Create and use zones for task and mm. Reserve items in zones based on the
estimation of the max number of interrupts in the system. Use M_USE_RESERVE
to allow to take reserved items when allocation occurs from the interrupt
thread context.
Of course, this would only work first time we allocate the task for
interrupt thread. If interrupt is deallocated and allocated anew,
creating a new thread, it might be that zone is depleted. It still
should be good enough for practical uses.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
For anonymous objects, provide a handle kvo_me naming the object,
and report the handle of the backing object. This allows userspace
to deconstruct the shadow chain. Right now the handle is the address
of the object in KVA, but this is not guaranteed.
For the same anonymous objects, report the swap space used for actually
swapped out pages, in kvo_swapped field. I do not believe that it is
useful to report full 64bit counter there, so only uint32_t value is
returned, clamped to the max.
For kinfo_vmentry, report anonymous object handle backing the entry,
so that the shadow chain for the specific mapping can be deconstructed.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29771
In particular, this avoids malloc(9) calls when from early tunable handling,
with no working malloc yet.
Reported and tested by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Usually rule counters are reset to zero on every update of the ruleset.
With keepcounters set pf will attempt to find matching rules between old
and new rulesets and preserve the rule counters.
MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29780
PFRULE_REFS should never be used by userspace, so hide it behind #ifdef
_KERNEL.
MFC after: never
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29779
Split the PFRULE_REFS flag from the rule_flag field. PFRULE_REFS is a
kernel-internal flag and should not be exposed to or read from
userspace.
MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29778
IEEE Std 802.1D-2004 Section 17.14 defines permitted ranges for timers.
Incoming BPDU messages should be checked against the permitted ranges.
The rest of 17.14 appears to be enforced already.
PR: 254924
Reviewed by: kp, donner
Differential Revision: https://reviews.freebsd.org/D29782
This is required for the current Arch Linux binaries to work.
PR: 254112
Reviewed By: emaste
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29218
- Use malloc(9) to allocate ivhd_hdrs list. The previous assumption
that there are at most 10 IVHDs in a system is not true. A counter
example would be a system with 4 IOMMUs, and each IOMMU is related
to IVHDs type 10h, 11h and 40h in the ACPI IVRS table.
- Always scan through the whole ivhd_hdrs list to find IVHDs that has
the same DeviceId but less prioritized IVHD type.
Sponsored by: The FreeBSD Foundation
MFC with: 74ada297e8
Reviewed by: grehan
Approved by: lwhsu (mentor)
Differential Revision: https://reviews.freebsd.org/D29525
The NAV (network allocation vector) register reflects the current MAC
tracking of NAV - when it will stay quiet before transmitting.
Other devices transmit their frame durations in their 802.11 PHY headers
and all devices that hear a frame - even if it's one in an encoding
they don't understand - will understand the low bitrate PHY header that
includes the frame duration. So, they'll set NAV to this value so
they'll stay quiet until the transmit completes.
Anyway, sometimes the PHY NAV header is garbled and sometimes, notably
older broadcom devices, will fake a long NAV so they can get "cleaner" air
for local calibration. When this happens, the hardware will stay quiet
for quite some time and this can lead to missed/stuck beacons, or
(for Very Large Values) a MAC hang.
This code just adds the ability to get/set the NAV; the driver will
need to take care of using it during transmit hangs and beacon misses
to see if it's due to a trash looking NAV.
Fix a few 'if(' to be 'if (' in a few places, per style(9) and
overwhelming usage in the rest of the kernel / tree.
MFC After: 3 days
Sponsored by: Netflix
We prefer 'while (0)' to 'while(0)' according to grep and stlye(9)'s
space after keyword rule. Remove a few stragglers of the latter.
Many of these usages were inconsistent within the file.
MFC After: 3 days
Sponsored by: Netflix
The 'ticket' and 'my_ticket' arguments are both read and written within
the same asm block. Clang is stricter with the constraints than gcc4
was, so accepts the '=r' at face value and will happily overwrite
registers that "should" be preserved.
Mark these operands to not clobber other operands, so they get their own
registers.
This fixes a panic on bringing up the octe interfaces.
Fib algo uses a per-family array indexed by the fibnum to store
lookup function pointers and per-fib data.
Each algorithm rebuild currently requires re-allocating this array
to support atomic change of two pointers.
As in reality most of the changes actually involve changing only
data pointer, add a shortcut performing in-flight pointer update.
MFC after: 2 weeks
Some algorithms may require updating datapath and control plane
algo pointers after the (batched) updates.
Export fib_set_datapath_ptr() to allow setting the new datapath
function or data pointer from the algo.
Add fib_set_algo_ptr() to allow updating algo control plane
pointer from the algo.
Add fib_epoch_call() epoch(9) wrapper to simplify freeing old
datapath state.
Reviewed by: zec
Differential Revision: https://reviews.freebsd.org/D29799
MFC after: 1 week
Adding support for TCP over UDP allows communication with
TCP stacks which can be implemented in userspace without
requiring special priviledges or specific support by the OS.
This is joint work with rrs.
Reviewed by: rrs
Sponsored by: Netflix, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29469
We must make sure that incoming packets will never overflow the netmap
buffers, even when the user is using the offset feature. In the typical
scenario, the netmap buffer is 2KiB and, with an MTU of 1500, there are
~500 bytes available for user offsets.
Unfortunately, some NICs accept incoming packets even when they are
larger then the MTU. This means that the only way to stop DMA from
overflowing the netmap buffers, when offsets are allowed, is to choose
a hardware buffer length which is smaller than the netmap buffer
length. For most NICs and for 2KiB netmap buffers, this means 1024
bytes, which is unconveniently small.
The current code will select the small hardware buf size even when
offsets are not in use. The main purpose of this change is to
fix this bug by returning to the normal behavior for the no-offsets
case.
At the same time, the patch pushes the handling of the offset case
to the lower level driver code, so that it can be made NIC-specific
(in future patches).
driver_t was supposed to just be a quick hack for 4.x
compatibility. However, it's been documented now as the preferred API
rather than the replacement kobj_class_t. Drop the note about 4.x since
it's clear we're a bit late to retiring its use through the tree with
almost 1500 references to driver_t.
Sponsored by: Netflix
Maintain code similarity between RACK and base stack
for ECN. This may not strictly be necessary, depending
when a state transition to FIN_WAIT_1 is done in RACK
after a shutdown() or close() syscall.
MFC after: 3 days
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29658
First, two of those four checks are unreachable.
Second, I don't believe there should be ">=" instead of ">".
Third, bus_dma(9) already returns the same EFBIG if ">".
This fixes false I/O errors in worst S/G cases with maxphys >= 2MB.
MFC after: 1 week
This is the April update to vendor/wpa committed upstream
2021/04/07.
This is MFV efec822389.
Suggested by: philip
Reviewed by: philip
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D29744
Explicitly disable ring synchronization before calling
callbacks that may result in a hardware reset.
Before this patch we relied on capturing the down/up events which,
however, may not be issued by all drivers.