While refactoring an earlier series of changes during review, the
'saved_data' variable stopped being used at the bottom of if_ioctl().
Suggested by: brooks
Reviewed by: brooks, imp, kib
Fixes: d17e0940f7 Rework compat shims in ifioctl().
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D30197
The hotplug script will be executed only once for each backend,
regardless of the frontend triggering reconnections. Fix blkback to
deal with the hotplug script being executed only once, so that
reconnections don't stall waiting for a hotplug script execution
that will never happen.
As a result of the fix move the initialization of dev_mode, dev_type
and dev_name to the watch callback, as they should be set only once
the first time the backend connects.
This fix is specially relevant for guests wanting to use UEFI OVMF
firmware, because OVMF will use Xen PV block devices and disconnect
afterwards, thus allowing them to be used by the guest OS. Without
this change the guest OS will stall waiting for the block backed to
attach.
Fixes: de0bad0001 ('blkback: add support for hotplug scripts')
MFC after: 1 week
Sponsored by: Citrix Systems R&D
Rack now after the previous commit is very careful to translate any
value in the hostcache for srtt/rttvar into its proper format. However
there is a snafu here in that if tp->srtt is 0 is the only time that
the HC will actually restore the srtt. We need to then only convert
the srtt restored when it is actually restored. We do this by making
sure it was zero before the call to cc_conn_init and it is non-zero
afterwards.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30213
Looping back router multicast traffic signifficantly
stresses network stack. Add possibility to disable or enable
loopbacked based on sysctl value.
Reported by: Daniel Deville
Reviewed by: mw
Differential Revision: https://reviews.freebsd.org/D29947
There is a race condition between V_ip_mrouter de-init
and ip_mforward handling. It might happen that mrouted
is cleaned up after V_ip_mrouter check and before
processing packet in ip_mforward.
Use epoch call aproach, similar to IPSec which also handles
such case.
Reported by: Damien Deville
Obtained from: Stormshield
Reviewed by: mw
Differential Revision: https://reviews.freebsd.org/D29946
There are two module declarations in the nfscl.ko module for "nfscl"
and "nfs". Both of these declarations had MODULE_DEPEND() calls.
This patch deletes the MODULE_DEPEND() calls for "nfs" to avoid
confusion with respect to what modules this module is dependent upon.
The patch also adds comments explaining why there are two module
declarations within the module.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30102
vn_fullpath_any_smr() will return a positive error number if the
caller-supplied buffer isn't big enough. In this case the error must be
propagated up, otherwise we may copy out uninitialized bytes.
Reported by: syzkaller+KMSAN
Reviewed by: mjg, kib
MFC aftr: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30198
It reopens the passed file descriptor, checking the file backing vnode'
current access rights against open mode. In particular, this flag allows
to convert file descriptor opened with O_PATH, into operable file
descriptor, assuming permissions allow that.
Reviewed by: markj
Tested by: Andrew Walker <awalker@ixsystems.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30148
Recover from excessive losses without reverting to a
retransmission timeout (RTO). Disabled by default, enable
with sysctl net.inet.tcp.do_lrd=1
Reviewed By: #transport, rrs, tuexen, #manpages
Sponsored by: Netapp, Inc.
Differential Revision: https://reviews.freebsd.org/D28931
The hostcache up to now as been updated in the discard callback
but without checking if we are all done (the race where there are
more than one calls and the counter has not yet reached zero). This
means that when the race occurs, we end up calling the hc_upate
more than once. Also alternate stacks can keep there srtt/rttvar
in different formats (example rack keeps its values in microseconds).
Since we call the hc_update *before* the stack fini() then the
values will be in the wrong format.
Rack on the other hand, needs to convert items pulled from the
hostcache into its internal format else it may end up with
very much incorrect values from the hostcache. In the process
lets commonize the update mechanism for srtt/rttvar since we
now have more than one place that needs to call it.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30172
Distinguish between truly invalid requests and those that fail because
we've already joined the group. Both cases fail, but differentiating
them allows userspace to make more informed decisions about what the
error means.
For example. radvd tries to join the all-routers group on every SIGHUP.
This fails, because it's already joined it, but this failure should be
ignored (rather than treated as a sign that the interface's multicast is
broken).
This puts us in line with OpenBSD, NetBSD and Linux.
Reviewed by: donner
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30111
On rk3399 the VOP-little node has a single 'port' property (not a
collection of 'ports' or indexed ports).
Reviewed by: manu
Sponsored by: UKRI
Differential Revision: https://reviews.freebsd.org/D30165
There is a NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute to the server.
It also eanbles the client to
acquire and set the attribute for a NFSv4
server that supports the attribute.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30156
platforms that for whatever reason cannot include the RATELIMIT option
can still work with rack. It adds two dummy functions that rack will
call and find out that the highest hw supported b/w is 0 (which
kinda makes sense and rack is already prepared to handle).
Reviewed by: Michael Tuexen, Warner Losh
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30163
IF non-existend gateway was specified, the code responsible for calculating
an updated nexthop group, returned the same already-used nexthop group.
After the route table update, the operation result contained the same
old & new nexthop groups. Thus, the code responsible for decomposing
the notification to the list of simple nexthop-level notifications,
was not able to find any differences. As a result, it hasn't updated any
of the "simple" notification fields, resulting in empty rtentry pointer.
This empty pointer was the direct reason of a panic.
Fix the problem by returning ESRCH when the new nexthop group is the same
as the old one after applying gateway filter.
Reported by: Michael <michael.adm at gmail.com>
PR: 255665
MFC after: 3 days
This allows us to kill states created from a rule with route-to/reply-to
set. This is particularly useful in multi-wan setups, where one of the
WAN links goes down.
Submitted by: Steven Brown
Obtained from: https://github.com/pfsense/FreeBSD-src/pull/11/
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30058
Introduce an nvlist based alternative to DIOCKILLSTATES.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30054
div_output_outbound() and div_output_inbound() relied on the caller to
free the mbuf if an error occurred. However, this is contrary to the
semantics of their callees, ip_output(), ip6_output() and
netisr_queue_src(), which always consume the mbuf. So, if one of these
functions returned an error, that would get propagated up to
div_output(), resulting in a double free.
Fix the problem by making div_output_outbound() and div_output_inbound()
responsible for freeing the mbuf in all cases.
Reported by: Michael Schmiedgen <schmiedgen@gmx.net>
Tested by: Michael Schmiedgen
Reviewed by: donner
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30129
When unwinding the stack, we may encounter a stack frame in a poisoned
region of the stack, triggering a false positive.
Reviewed by: andrew, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30126
KASAN does not insert redzones around global variables and so is not
susceptible to the problem that led to us disabling ASAN for linker set
elements in the first place (see commit fe3d8086fb).
Reviewed by: andrew, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30126
issues.
A) Not enough hdrlen was being calculated when a UDP tunnel is
in place.
and
B) Not enough memory is allocated in racks fsb. We need to
overbook the fsb to include a udphdr just in case.
Submitted by: Peter Lei
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D30157
This is OBJT_SWAP pager, specialized for tmpfs. Right now, both swap pager
and generic vm code have to explicitly handle swap objects which are tmpfs
vnode v_object, in the special ways. Replace (almost) all such places with
proper methods.
Since VM still needs a notion of the 'swap object', regardless of its
use, add yet another type-classification flag OBJ_SWAP. Set it in
vm_object_allocate() where other type-class flags are set.
This change almost completely eliminates the knowledge of tmpfs from VM,
and opens a way to make OBJT_SWAP_TMPFS loadable from tmpfs.ko.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070
Put each type into dedicated line, which makes addition of new
types cleaner.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070
Allow vp_heldp argument to be NULL, in which case the returned vnode
is not held for tmpfs swap objects.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070
Makes the code in vm_object collapse/page_remove cleaner
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070
This eliminates the staircase of conditions in vm_map_entry_set_vnode_text().
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070
specialized for swap and vnode pagers, and used to implement
vm_object_set_writeable_dirty().
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070
Fill lines with the function definitions.
Use local var to shorten repeated extra-long expressions.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30070
It is needed to invalidate cache in case of inode space removal
to avoid situation, when extents cache returns not exist extent.
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29931
It is possible to walk thru inode extents if EXT2FS_PRINT_EXTENTS
macro is defined. The extents headers magics and physical blocks
ranges are checked during extents walk.
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29932
I saw a situation where the driver set CAM_AUTOSNS_VALID on a failed ccb
even though SRB_STATUS_AUTOSENSE_VALID was not set in the status.
The actual sense data remained all zeros.
The problem seems to be that create_storvsc_request() always sets
hv_storvsc_request::sense_info_len, so checking for sense_info_len != 0
is not enough to determine if any auto-sense data is actually available.
Reviewed by: whu, imp
MFC after: 2 weeks
Sponsored by: CyberSecure
Differential Revision: https://reviews.freebsd.org/D30124
The dev field is placed into the inode structure.
The major/minor numbers conversion to/from linux compatile
format happen during on-disk inodes writing/reading.
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29930
The birthtime field of struct vattr does not checked
for VNOVAL in case of ext2_setattr() and produce incorrect
inode birthtime values.
Found using pjdfstest:
pjdfstest/tests/utimensat/03.t
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29929
Virtio modern has the common data organized in little endian, but
on powerpc64 BE it was reading and writing in the wrong endian.
Submitted by: Leonardo Bianconi <leonardo.bianconi@eldorado.org.br>
Reviewed by: bryanv, alfredo
Sponsored by: Eldorado Research Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28947
Only LS1046A and LS1028A require the base clk to be divided by 2.
Implement that by moving the divider to a SoC specific data.
This commit fixes base clk setup for the entire SoC family,
including the already suported LS2160A.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30120
The new driver provides probe and attach functions for the NXP LS1028A
clockgen and passes configuration information to QorIQ clockgen class.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30125
When _ISOC11_SOURCES is defined for glibc at the same time
__POSIX_C_SOURCE is defined, it extends the __POSIX_C_SOURCE definition
by exaclty what C11 adds to the spec for each system header. We follow
both OpenBSD's and glibc's convention by also C11 or higher compliation
mode is selected.
The Open Group is working on issuing a new version of the POSIX standard
that will realign the standard from C99 to a newer version of C. This
commit is a stop-gap measure for greater compatibility until that
environment has been standardized.
Reviewed by: brooks@, arichards@, Olivier Certne
(comments tweaked before commit)
PR: 255290
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29902
Attaching and detaching devices can be heavy-weight and detaching can
sleep waiting for events. For that reason using the system-wide
single-threaded taskqueue_thread is not really appropriate.
There is even a possibility for a deadlock if taskqueue_thread is used
for detaching.
In fact, there is an easy to reproduce deadlock involving nvme, pass
and a sudden removal of an NVMe device.
A pass peripheral would not release a reference on an nvme sim until
pass_shutdown_kqueue() is executed via taskqueue_thread. But the
taskqueue's thread is blocked in nvme_detach() -> ... -> cam_sim_free()
because of the outstanding reference.
MFC after: 10 days
Sponsored by: CyberSecure
Reviewed by: mav, imp
Differential Revision: https://reviews.freebsd.org/D30144
zfsd uses a device's physical path attribute to automatically replace a
missing ZFS disk when a blank disk is inserted into the same physical
slot. Currently gmultipath passes through its underlying providers'
physical path attribute. That may cause zfsd to replace a missing
gmultipath provider with a newly arrived, single-path disk. That would
be bad.
This commit fixes that problem by simply appending "/mp" to the
underlying providers' physical path, in a manner similar to what geli
already does.
Sponsored by: Axcient
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D29941
This fixes several breakages (panics) since the tcp_lro code was
committed that have been reported. Quite a few new features are
now in rack (prefecting of DGP -- Dynamic Goodput Pacing among the
largest). There is also support for ack-war prevention. Documents
comming soon on rack..
Sponsored by: Netflix
Reviewed by: rscheff, mtuexen
Differential Revision: https://reviews.freebsd.org/D30036
Summary:
Some methods are split between DMAP and non-DMAP, conditional on
hw_direct_map variable. Rather than checking this variable every time,
use it to install different functions via IFUNCs.
Reviewed By: luporl
Differential Revision: https://reviews.freebsd.org/D30071
When verifying, byte-by-byte, that the user-supplied counters are
zero-filled, sysctl_igmp_stat() would check for zero before checking the
loop bound. Perform the checks in the correct order.
Reported by: KASAN
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
When copying from the old buffer to the new buffer, we don't know the
requested size of the old allocation, but only the size of the
allocation provided by UMA. This value is "alloc". Because the copy
may access bytes in the old allocation's red zone, we must mark the full
allocation valid in the shadow map. Do so using the correct size.
Reported by: kp
Tested by: kp
Sponsored by: The FreeBSD Foundation
Centralize logic for handling compat ioctls into two blocks of code at
the start and end of the ioctl routine. This avoids the conversion
logic being spread out both in multiple blocks in ifioctl as well as
various helper functions.
Reviewed by: brooks, kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D29891
In FreeBSD, the current time is computed from uptime + boottime. Uptime
is a continuous, smooth function that's monotonically increasing. To
effect changes to the current time, boottime is adjusted. boottime is
mutable and shouldn't be cached against future need. Document the
current implementation, with the caveat that we may stop stepping
boottime on resume in the future and will step uptime instead (noted in
the commit message, but not in the code).
Sponsored by: Netflix
Reviewed by: phk, rpokala
Differential Revision: https://reviews.freebsd.org/D30116
Add description for what each of the parameters are to the cam_sim_alloc
call. Add some additional context for the mtx and queue parameters to
explain what special values passed in mean.
MFC After: 3 days
Reviewed by: mav@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30115
DRIVER_OK status is set after device_attach() succeeds. For now postpone
disk_create to attach_completed() method.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Reviewed by: grehan
Approved by: lwhsu (mentor)
Differential Revision: https://reviews.freebsd.org/D30049
Remove all the 'entry' and 'return' probes; they clutter up the source
and are redundant to FBT.
Reviewed By: dchagin
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30040
DXR maintains compressed lookup structures with a trivial search
procedure. A two-stage trie is indexed by the more significant bits of
the search key (IPv4 address), while the remaining bits are used for
finding the next hop in a sorted array. The tradeoff between memory
footprint and search speed depends on the split between the trie and
the remaining binary search. The default of 20 bits of the key being
used for trie indexing yields good performance (see below) with
footprints of around 2.5 Bytes per prefix with current BGP snapshots.
Rebuilding lookup structures takes some time, which is compensated for by
batching several RIB change requests into a single FIB update, i.e. FIB
synchronization with the RIB may be delayed for a fraction of a second.
RIB to FIB synchronization, next-hop table housekeeping, and lockless
lookup capability is provided by the FIB_ALGO infrastructure.
DXR works well on modern CPUs with several MBytes of caches, especially
in VMs, where is outperforms other currently available IPv4 FIB
algorithms by a large margin.
Synthetic single-thread LPM throughput test method:
kldload test_lookup; kldload dpdk_lpm4; kldload fib_dxr
sysctl net.route.test.run_lps_rnd=N
sysctl net.route.test.run_lps_seq=N
where N is the number of randomly generated keys (IPv4 addresses) which
should be chosen so that each test iteration runs for several seconds.
Each reported score represents the best of three runs, in million
lookups per second (MLPS), for two bechmarks (RND & SEQ) with two FIBs:
host: single interface address, local subnet route + default route
BGP: snapshot from linx.routeviews.org, 887957 prefixes, 496 next hops
Bhyve VM on an Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60 GHz:
inet.algo host, RND host, SEQ BGP, RND BGP, SEQ
bsearch4 40.6 20.2 N/A N/A
radix4 7.8 3.8 1.2 0.6
radix4_lockless 18.0 9.0 1.6 0.8
dpdk_lpm4 14.4 5.0 14.6 5.0
dxr 70.3 34.7 43.0 19.5
Intel(R) Core(TM) i5-5300U CPU @ 2.30 GHz:
inet.algo host, RND host, SEQ BGP, RND BGP, SEQ
bsearch4 47.0 23.1 N/A N/A
radix4 8.5 4.2 1.9 1.0
radix4_lockless 19.2 9.5 2.5 1.2
dpdk_lpm4 31.2 9.4 31.6 9.3
dxr 84.9 41.4 51.7 23.6
Intel(R) Core(TM) i7-4771 CPU @ 3.50 GHz:
inet.algo host, RND host, SEQ BGP, RND BGP, SEQ
bsearch4 59.5 29.4 N/A N/A
radix4 10.8 5.5 2.5 1.3
radix4_lockless 24.7 12.0 3.1 1.6
dpdk_lpm4 29.1 9.0 30.2 9.1
dxr 101.3 49.9 69.8 32.5
AMD Ryzen 7 3700X 8-Core Processor @ 3.60 GHz:
inet.algo host, RND host, SEQ BGP, RND BGP, SEQ
bsearch4 70.8 35.4 N/A N/A
radix4 14.4 7.2 2.8 1.4
radix4_lockless 30.2 15.1 3.7 1.8
dpdk_lpm4 29.9 9.0 30.0 8.9
dxr 163.3 81.5 99.5 44.4
AMD Ryzen 5 5600X 6-Core Processor @ 3.70 GHz:
inet.algo host, RND host, SEQ BGP, RND BGP, SEQ
bsearch4 93.6 46.7 N/A N/A
radix4 18.9 9.3 4.3 2.1
radix4_lockless 37.2 18.6 5.3 2.7
dpdk_lpm4 51.8 15.1 51.6 14.9
dxr 218.2 103.3 114.0 49.0
Reviewed by: melifaro
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29821
Add a LPS benchmark variant which introduces artificial dependencies
between successive lookups. While here, instead of writing the results
from the lookups to a huge array, add them to an accumulator, in a more
lightweight attempt at preventing the CPU's OOO machinery from
discarding the lookup results if they would be completely unused.
net.route.test.run_lps_rnd measures LPS throughput with independent
uniformly random keys
net.route.test.run_lps_seq measures LPS throughput with uniformly
random keys with artificial interdependencies
Reviewed by: melifaro
MFC after: 7 days
Differential Revision: https://reviews.freebsd.org/D30096
Document what __FreeBSD_version means a bit better by documenting the
sorts of events it should be bumped for. Also include a handy shorthand
for what it means. Add a some advice for how frequently to change this
as well.
Added a note about the approved way to parse this from the param.h file,
though that was not in the review. All in-tree users have been updated
to this method prior to this commit. Move and reword the comment that
was on the same line.
Suggestions by: greg@unrelenting, arch@
Reviewed by: rgrimes@ (earlier version).
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29850
The _ext event notification includes the address being added/removed and
that gives the driver an easy way to ignore non-IPv6 addresses. Remove
'tom' from the handler's name while here, it was moved out of t4_tom a
long time ago.
MFC after: 1 week
Sponsored by: Chelsio Communications
Add a new control message to move ethernet addresses to a given link
in ng_bridge(4). Send this message instead of doing the work directly.
This decouples the read-only activity from the modification under a
more strict writer lock.
Decoupling the work is a prerequisite for multithreaded operation.
Approved by: manpages (bcr), kp (earlier version)
MFC: 3 weeks
Differential Revision: https://reviews.freebsd.org/D28516
This fixes strace(1) erroneously reporting return values
as "Function not implemented", combined with reporting the binary
ABI as X32.
Very similar code in linux_ptrace_getregs() is left as it is - it's
probably wrong too, but I don't have a way to test it.
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29927
When loading attributes from the cache, the NFS client is careful to
copy only the fields that it initialized. After fetching attributes
from the server, however, it would copy the entire vattr structure
initialized from the RPC response, so uninitialized stack bytes would
end up being copied to userspace. In particular, va_birthtime (v2 and
v3) and va_gen (v3) had this problem.
Use a common subroutine to copy fields provided by the NFS client, and
ensure that we provide a dummy va_gen for the v3 case.
Reviewed by: rmacklem
Reported by: KMSAN
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30090
AIM (adaptive interrupt moderation) was part of BSD11 driver. Upon IFLIB
migration, AIM feature got lost. Re-introducing AIM back into IFLIB
based IXGBE driver.
One caveat is that in BSD11 driver, a queue comprises both Rx and Tx
ring. Starting from BSD12, Rx and Tx have their own queues and rings.
Also, IRQ is now only configured for Rx side. So, when AIM is
re-enabled, we should now consider only Rx stats for configuring EITR
register in contrast to BSD11 where Rx and Tx stats were considered to
manipulate EITR register.
Reviewed by: gallatin, markj
Sponsored by: NetApp, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D27344
Several protocol methods take a sockaddr as input. In some cases the
sockaddr lengths were not being validated, or were validated after some
out-of-bounds accesses could occur. Add requisite checking to various
protocol entry points, and convert some existing checks to assertions
where appropriate.
Reported by: syzkaller+KASAN
Reviewed by: tuexen, melifaro
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29519
These should never get values large enough for sign to matter, but one
of them becoming negative could cause problems.
MFC after: 1 week
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D29327
devvn_refthread() will initialize *devp only if it succeeds, so check for
success before comparing with fp->f_data. Other devvn_refthread()
callers are careful to do this.
Reported by: KMSAN
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30068
Some filesystems, e.g., pseudofs and the NFSv3 client, do not provide
one.
Reviewed by: kib
Reported by: KMSAN
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30091
Otherwise, if !smp_started is true, then smp_rendezvous_cpus_done() will
harmlessly perform an atomic RMW on an uninitialized variable.
Reported by: KMSAN
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
User-supplied data might make this loop too time-consuming. Divide
directly, and handle both the possibility that we were woken up earlier,
and arithmetic overflows/underflows from the calculation.
Reported and tested by: pho (previous version)
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30069
It writes the core of live stopped process to the file descriptor
provided as an argument.
Based on the initial version from https://reviews.freebsd.org/D29691,
submitted by Michał Górny <mgorny@gentoo.org>.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955
This way threads in ptracestop can be discovered by debugger
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955
Set a new P2_PTRACEREQ flag around the request Wait for the target .
process P2_PTRACEREQ flag to clear before setting ours .
Otherwise, we rely on the moment that the process lock is not dropped
until the stopped target state is important. This is going to be no
longer true after some future change.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955
It unsuspends single suspended thread, passed as the argument.
It is up to the caller to arrange the target thread to suspend later,
since the state of the process is not changed from stopped. In particular,
the unsuspended thread must not leave to userspace, since boundary code
is not prepared to this situation.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955
The helper removes the thread from a sleep queue, assuming that it would
need to sleep. The sleepq_remove_nested() function is intended for quite
special case, where suspended thread from traced stopped process is
temporary unsuspended to do some work on behalf of the debugger in the
target context, and this work might require sleep.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955
- SVC_ALL request dumping all map entries, including those marked as
non-dumpable
- SVC_NOCOMPRESS disallows compressing the dump regardless of the sysctl
policy
- SVC_PC_COREDUMP is provided for future use by userspace core dump
request
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29955
From my understanding this could happen with iSCSI LUNs with
unusually long names. The bug would make CAM fail to retrieve
the full inquiry data. Instead of bumping the size of the local
variable, just use a macro.
Reviewed By: imp, mav
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
X-NetApp-PR: #50
Differential Revision: https://reviews.freebsd.org/D29991
Add PCI IDs for Intel Apollo Lake Series HSUARTs:
# pciconf -ll
drv selector class rev hdr vendor device subven subdev
uart0@pci0:0:24:0: 118000 0b 00 8086 5abc 8086 7270
uart1@pci0:0:24:1: 118000 0b 00 8086 5abe 8086 7270
uart2@pci0:0:24:2: 118000 0b 00 8086 5ac0 8086 7270
uart3@pci0:0:24:3: 118000 0b 00 8086 5aee 8086 7270
NB (Intel Document Number 336256-004US):
1. The E3900 and A3900 Series Processors support four LPSS_UART ports,
while the N- and J- Series Processors support only LPSS_UART [2:1]
ports.
2. The LPSS_UART1 port is dedicated for discrete Global Navigation
Satellite System (GNSS). This port can be used for generic UART
functionality if GNSS is not used.
3. The LPSS_UART2 port is dedicated for host OS debug.
4. The LPSS_UART0 and LPSS_UART3 ports are for generic UART functionality.
5. Only UART [1:0] ports support DMA.
PR: 255556
Submitted by: Jose Luis Duran <jlduran@gmail.com>
MFC after: 1 week
Fix what appears to have been a small copy/paste typo in ifconfig(8)'s
documentation (man page and header file).
Not that it matters anymore.
Reference: Table I-2 in IEEE Std 802.1Q-2014.
PR: 255557
Submitted by: Jose Luis Duran <jlduran@gmail.com>
MFC after: 1 week
When estimating working set size, measure only allocation batches, not free
batches. Allocation and free patterns can be very different. For example,
ZFS on vm_lowmem event can free to UMA few gigabytes of memory in one call,
but it does not mean it will request the same amount back that fast too, in
fact it won't.
Update working set size on every reclamation call, shrinking caches faster
under pressure. Lack of this caused repeating vm_lowmem events squeezing
more and more memory out of real consumers only to make it stuck in UMA
caches. I saw ZFS drop ARC size in half before previous algorithm after
periodic WSS update decided to reclaim UMA caches.
Introduce voluntary reclamation of UMA caches not used for a long time. For
each zdom track longterm minimal cache size watermark, freeing some unused
items every UMA_TIMEOUT after first 15 minutes without cache misses. Freed
memory can get better use by other consumers. For example, ZFS won't grow
its ARC unless it see free memory, since it does not know it is not really
used. And even if memory is not really needed, periodic free during
inactivity periods should reduce its fragmentation.
Reviewed by: markj, jeff (previous version)
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D29790
When processing INIT and INIT-ACK information, also during
COOKIE processing, delete the current association, when it
would end up in an inconsistent state.
MFC after: 3 days
PR#255523 reported that a file copy for a file with a large hole
to EOF on ZFS ran slowly over NFSv4.2.
The problem was that vn_generic_copy_file_range() would
loop around reading the hole's data and then see it is all
0s. It was coded this way since UFS always allocates a data
block near the end of the file, such that a hole to EOF never exists.
This patch modifies vn_generic_copy_file_range() to check for a
ENXIO returned from VOP_IOCTL(..FIOSEEKDATA..) and handle that
case as a hole to EOF. asomers@ confirms that it works for his
ZFS test case.
PR: 255523
Tested by: asomers
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30076
Not all interrupt controllers enable IPIs by default as the Arm
GIC specs make it an implementation defined option. As at least two
hypervisors have also previously masked the IPIs on boot.
As we already enable these IPIs on the non-boot CPUs it is expected
this is a safe operation.
Differential Revision: https://reviews.freebsd.org/D26975
This will allow us to allocate an unmapped memory resource, then
later map it with a specific memory attribute.
This is also needed for virtio with the modern PCI attachment.
Reviewed by: kib (via D29723)
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D29694
It is defined as a uint64_t in the UEFI spec. As it's not used as a
pointer by the kernel follow this and define it as the same in the
kernel.
Reviewed by: kib, manu, imp
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D29759
On arm64 we currently use a non-posted write for device memory, however
we should move to use posted writes. This is expected to work on most
hardware, however we will need to support a non-posted option for some
broken hardware.
Reviewed by: imp, manu, bcr (manpage)
Differential Revision: https://reviews.freebsd.org/D29722
Summary:
Since PCPU can live in a GPR for a while longer, let it, rather than
re-getting it in yet another register. MFSPR is an expensive operation,
12 clock latency on POWER9, so the fewer operations we need, the better.
Since the check is tightly coupled to the fetch, by reducing the number
of fetch+check, we reduce the stalls, and improve the performance
marginally. Buildworld was measured at a ~5-7% improvement on a single
run.
Reviewed By: nwhitehorn
Differential Revision: https://reviews.freebsd.org/D30003
It has to be zeroed before committing it to device.
We do that by allocating it with M_ZERO, but there was no
memory barrier or cache flush to ensure its sees it zeroed.
This fixes MSIX on LS1028A SoC.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30033
When sending an IPI, if a previous IPI is still pending delivery,
native_lapic_ipi_vectored() waits for the previous IPI to be sent.
We've seen a few inexplicable panics with the current timeout of 50 ms.
Increase the timeout to 1 second and make it tunable.
No hardware specification mentions a timeout in this case; I checked
the Intel SDM, Intel MP spec, and Intel x2APIC spec. Linux and illumos
wait forever. In Linux, see __default_send_IPI_shortcut() in
arch/x86/kernel/apic/ipi.c. In illumos, see apic_send_ipi() in
usr/src/uts/i86pc/io/pcplusmp/apic_common.c. However, misbehaving hardware
could hang the system if we wait forever.
Reviewed by: mav kib
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D29942
Its use is for cases where some filler is needed for cmd, or we need an
indication that there were no cmd supplied, and so on.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29935
Filter on fifos is real filter for the object, and not a filesystem
events filter like EVFILT_VNODE.
Reported by: markj using syzkaller
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
A testing on the real hardware uncovered an issue, and since I do not have
access to the machine, disable until the bug can be fixed.
Reported by: "Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
When setting up trampoline mapping for LA57 switcher, it is possible
that TLB still has some random mapping at that address.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Drivers can specify padding of received frames with iri_pad field.
This can be used to enforce ip alignment by hardware.
Iflib ignored that padding when processing small frames,
which rendered this feature inoperable.
I found it while writing a driver for a NIC that can ip align
received packets. Note that this doesn't change behavior of existing
drivers as they all set iri_pad to 0.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: gallatin
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30009
If we reassemble a packet we modify the IP header (to set the length and
remove the fragment offset information), but we failed to update the
checksum. On certain setups (mostly where we did not re-fragment again
afterwards) this could lead to us sending out packets with incorrect
checksums.
PR: 255432
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30026
The original implementation only supports getting the address from legacy
BIOS (by searching for the SMBIOS_SIG pattern in a fixed address space).
Try to get the SMBIOS table from EFI through efirt (EFI Runtime Services)
firstly. Continue to search in the legacy BIOS if a NULL address is
returned from EFI.
By this way the ipmi function supports both legacy BIOS and UEFI systems.
Reviewed by: dab, vangyzen
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D30007
There is no need to panic in if_transmit if the checksums requested are
inconsistent with the frame being transmitted. This typically indicates
that the kernel and driver were built with different INET/INET6 options,
or there is some other kernel bug. The driver should just throw away
the requests that it doesn't understand and move on.
MFC after: 1 week
Sponsored by: Chelsio Communications
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts.
Approved by: erj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29876
pipe_poll() may add the calling thread to the selinfo lists of both ends
of a pipe. It is ok to do this for the local end, since we know we hold
a reference on the file and so the local end is not closed. It is not
ok to do this for the remote end, which may already be closed and have
called seldrain(). In this scenario, when the polling thread wakes up,
it may end up referencing a freed selinfo.
Guard the selrecord() call appropriately.
Reviewed by: kib
Reported by: syzkaller+KASAN
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30016
Teach poll(2) to support Linux-style POLLRDHUP events for sockets, if
requested. Triggered when the remote peer shuts down writing or closes
its end.
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29757
Provide wrapper for the rnh_walktree_from() rib callback.
As currently `struct rib_head` is considered internal to the
routing subsystem, this wrapper is necessary to maintain isolation
from the external code.
Differential Revision: https://reviews.freebsd.org/D29971
MFC after: 1 week
Add suspend/resume callbacks to the driver and a live reset built around
them. This commit covers the basic NIC and future commits will expand
this functionality to other stateful parts of the chip. Suspend and
resume operate on the chip (the t?nex nexus device) and affect all its
ports. It is not possible to suspend/resume or reset individual ports.
All these operations can be performed on a running NIC. A reset will
look like a link bounce to the networking stack.
Here are some ways to exercise this functionality:
/* Manual suspend and resume. */
# devctl suspend t6nex0
# devctl resume t6nex0
/* Manual reset. */
# devctl reset t6nex0
/* Manual reset with driver sysctl. */
# sysctl dev.t6nex.0.reset=1
/* Automatic adapter reset on any fatal error. */
# hw.cxgbe.reset_on_fatal_err=1
Suspend disables the adapter (DMA, interrupts, and the port PHYs) and
marks the hardware as unavailable to the driver. All ifnets associated
with the adapter are still visible to the kernel but operations that
require hardware interaction will fail with ENXIO. All ifnets report
link-down while the adapter is suspended.
Resume will reattach to the card, reconfigure it as before, and recreate
the queues servicing the existing ifnets. The ifnets are able to send
and receive traffic as soon as the link comes back up.
Reset is roughly the same as a suspend and a resume with at least one of
these events in between: D0->D3Hot->D0, FLR, PCIe link retrain.
MFC after: 1 month
Relnotes: yes
Sponsored by: Chelsio Communications
Commit aad780464f added a function called nfscl_delegreturnvp()
to return delegations during the NFS VOP_RECLAIM().
The function erroneously assumed that nm_clp would
be non-NULL. It will be NULL for NFSV4.0 mounts until
a regular file is opened. It will also be NULL during
vflush() in nfs_unmount() for a forced dismount.
This patch adds a check for clp == NULL to fix this.
Also, since it makes no sense to call nfscl_delegreturnvp()
during a forced dismount, the patch adds a check for that
case and does not do the call during forced dismounts.
PR: 255436
Reported by: ish@amail.plala.or.jp
MFC after: 2 weeks
It was reported that a NFSv4.1 Linux client mount against
a FreeBSD12 server was hung, with the TCP connection in
CLOSE_WAIT state on the server.
When a NFSv4.1/4.2 mount is done and the back channel is
bound to the TCP connection, the soclose() is delayed until
a new TCP connection is bound to the back channel, due to
a reference count being held on the SVCXPRT structure in
the krpc for the socket. Without the soclose() call, the socket
will remain in CLOSE_WAIT and this somehow caused the Linux
client to hang.
This patch adds calls to soshutdown(.., SHUT_WR) that
are performed when the server side krpc sees that the
socket is no longer usable. Since this can be done
before the back channel is bound to a new TCP connection,
it allows the TCP connection to proceed to CLOSED state.
PR: 254590
Reported by: jbreitman@tildenparkcapital.com
Reviewed by: tuexen
Comments by: kevans
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29526
* Fix 82574 Link Status Changes, carrying the OTHER mask bit around as
needed.
* Move igb-class LSC re-arming out of FAST back into the handler.
* Clarify spurious/other interrupt re-arms in FAST.
In MSI-X mode, 82574 and igb-class devices use an interrupt filter to
handle Link Status Changes. We want to do LSC re-arms in the handler
to take advantage of autoclear (EIAC) single shot behavior.
82574 uses 'Other' in ICR and IMS for LSC interrupt types when in MSI-X
mode, so we need to set and re-arm the 'Other' bit during attach and
after ICR reads in the FAST handler if not an LSC or after handling on
LSC due to autoclearing.
This work was primarily done to address the referenced PR, but inspired
some clarification and improvement for igb-class devices once the
intentions of previous bug fix attempts became clearer.
PR: 211219
Reported by: Alexey <aserp3@gmail.com>
Tested by: kbowling (I210 lagg), markj (I210)
Approved by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29943
Currently, most of the rib(9) KPI does not use rnh pointers, using
fibnum and family parameters to determine the rib pointer instead.
This works well except for the case when we initialize new rib pointers
during fib growth.
In that case, there is no mapping between fib/family and the new rib,
as an entirely new rib pointer array is populated.
Address this by delaying fib algo initialization till after switching
to the new pointer array and updating the number of fibs.
Set datapath pointer to the dummy function, so the potential callers
won't crash the kernel in the brief moment when the rib exists, but
no fib algo is attached.
This change allows to avoid creating duplicates of existing rib functions,
with altered signature.
Differential Revision: https://reviews.freebsd.org/D29969
MFC after: 1 week
During early qemu development, the /soc node was marked as compatible
with "riscv-virtio-soc" instead of "simple-bus".
This was changed in qemu 53f54508dae6 in Sep 2018, and predates the
baseline required qemu version (5.0) for riscv by a wide margin.
The generic simplebus code handles attachment in all cases nowadays.
Sponsored by: Tag1 Consulting, Inc.
Reviewed by: jrtc27, mhorne
Differential Revision: https://reviews.freebsd.org/D30011
A lot more generic cam related things are done in mmc_sim so this simplify
the driver a lot.
Differential Revision: https://reviews.freebsd.org/D27487
Reviewed by: kibab
A lot more generic cam related things are done in mmc_sim so this simplify
the driver a lot.
Differential Revision: https://reviews.freebsd.org/D27486
Reviewed by: imp
This adds a generic sim that abstract a lot of what needs to be implemented
in a driver for mmccam support.
A new interface with three methods is added :
- mmc_sim_get_tran_settings: Use to get what the controller supports in term
of capabilities, freq etc ...
- mmc_sim_set_tran_settings: Use to change the speed/freq/etc of the
sdcard host controller
- mmc_sim_cam_request: Used for MMCIO requests
Differential Revision: https://reviews.freebsd.org/D27485
Reviewed by: kibab
Modular fib lookup framework features logic that allows
route update batching for the algorithms that cannot easily
apply the routing change without rebuilding. As a result,
dataplane lookups may return old data until the the sync
takes place. With the default sync timeout of 50ms, it is
possible that new binary like ping(8) executed exactly after
route(8) will still use the old fib data.
To address some aspects of the problem, framework executes
all rtable changes without RTF_GATEWAY synchronously.
To fix the aforementioned problem, this diff extends sync
execution for all RTF_STATIC routes (e.g. ones maintained by
route(8).
This fixes a bunch of tests in the networking space.
Reported by: ci, arichardson
MFC after: 2 weeks
b31fbebeb3 introduced alloc_sockaddr_aligned() which, in fact,
failed to produce aligned addresses.
Reported by: Oskar Holmlund <oskar.holmlund at yahoo.com>
MFC after: immediately
33cb3cb2e3 introduced an `rib_head` structure field under the
FIB_ALGO define. This may be problematic for the CTF, as some
of the files including `route_var.h` do not have `fib_algo`
defined.
Make dtrace happy by making the field unconditional.
Suggested by: markj
For a pNFS mount, the NFSv4.1/4.2 client uses compound RPCs that
have both Open and LayoutGet operations in them.
If the pNFS server were tp reply NFSERR_DELAY for one of these
compounds, the retry after a delay cannot be handled by
newnfs_request(), since there is a reference held on the open
state for the Open operation in them.
Fix this by adding these RPCs to the "don't do delay here"
list in newnfs_request().
This patch is only needed if the mount is using pNFS (the "pnfs"
mount option) and probably only matters if the MDS server
is issuing delegations as well as pNFS layouts.
Found by code inspection.
MFC after: 2 weeks
Commit 4281bfec36 patched the server so that the
callback session slot would be free'd for reuse when
a callback attempt fails.
However, this can often result in the sequence# for
the session slot to be advanced such that the client
end will reply NFSERR_SEQMISORDERED.
To avoid the NFSERR_SEQMISORDERED client reply,
this patch negates the sequence# advance for the
case where the callback has failed.
The common case is a failed back channel, where
the callback cannot be sent to the client, and
not advancing the sequence# is correct for this
case. For the uncommon case where the client's
reply to the callback is lost, not advancing the
sequence# will indicate to the client that the
next callback is a retry and not a new callback.
But, since the FreeBSD server always sets "csa_cachethis"
false in the callback sequence operation, a retry
and a new callback should be handled the same way
by the client, so this should not matter.
Until you have this patch in your NFSv4.1/4.2 server,
you should consider avoiding the use of delegations.
Even with this patch, interoperation with the
Linux NFSv4.1/4.2 client in kernel versions prior
to 5.3 can result in frequent 15second delays if
delegations are enabled. This occurs because, for
kernels prior to 5.3, the Linux client does a TCP
reconnect every time it sees multiple concurrent
callbacks and then it takes 15seconds to recover
the back channel after doing so.
MFC after: 2 weeks
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC. This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.
This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):
/* Idempotent */
alloc_foo
{
if (!SW_ALLOCATED)
init_iq/init_eq/init_fl no-fail sw init
alloc_iq_fl/alloc_eq/alloc_wrq may-fail sw alloc
add_foo_sysctls, etc. no-fail post-alloc items
if (!HW_ALLOCATED)
alloc_iq_fl_hwq/alloc_eq_hwq hw resource allocation
}
/* Idempotent */
free_foo
{
if (!HW_ALLOCATED)
free_iq_fl_hwq/free_eq_hwq release hw resources
if (!SW_ALLOCATED)
free_iq_fl/free_eq/free_wrq release sw resources
}
The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent. The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.
MFC after: 1 month
Sponsored by: Chelsio Communications
Stop further processing of a packet when detecting that it
contains an INIT chunk, which is too small or is not the only
chunk in the packet. Still allow to finish the processing
of chunks before the INIT chunk.
Thanks to Antoly Korniltsev and Taylor Brandstetter for reporting
an issue with the userland stack, which made me aware of this
issue.
MFC after: 3 days
parse_notes relies on the caller-supplied callback to initialize "res".
Two callbacks are used in practice, brandnote_cb and note_fctl_cb, and
the latter fails to initialize res. Fix it.
In the worst case, the bug would cause the inner loop of check_note to
examine more program headers than necessary, and the note header usually
comes last anyway.
Reviewed by: kib
Reported by: KMSAN
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29986
Revert rest of de8dd262c4 since it's now unused.
jhibbits@ introduced this to give powerpc MMU functions IFUNC like
performance while retaining the kobj interface, speeding up operations
10-20%. Since there was only ever one instance of the mmu interface
active at any given time, we could cache the looked up results more
agressively.
powerpc migrated to using IFUNCs to get an even larger performance boost
in 45b69dd63e, deleting the two files it was added to in de8dd262c4.
However, there's few, if any, other potential applications of this to
the tree today. It's now unused and undocumented. Retire it to eliminate
this wart and to preclude the need to document it. Should a simmilar
case arise in the future, the code is in git...
Discusssed with: jhibbits@
Reviewed by: jhb@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29997
When parsing the nvlist for a struct pf_addr_wrap we unconditionally
tried to parse "ifname". This broke for PF_ADDR_TABLE when the table
name was longer than IFNAMSIZ. PF_TABLE_NAME_SIZE is longer than
IFNAMSIZ, so this is a valid configuration.
Only parse (or return) ifname or tblname for the corresponding
pf_addr_wrap type.
This manifested as a failure to set rules such as these, where the pfctl
optimiser generated an automatic table:
pass in proto tcp to 192.168.0.1 port ssh
pass in proto tcp to 192.168.0.2 port ssh
pass in proto tcp to 192.168.0.3 port ssh
pass in proto tcp to 192.168.0.4 port ssh
pass in proto tcp to 192.168.0.5 port ssh
pass in proto tcp to 192.168.0.6 port ssh
pass in proto tcp to 192.168.0.7 port ssh
Reported by: Florian Smeets
Tested by: Florian Smeets
Reviewed by: donner
X-MFC-With: 5c11c5a365
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29962
This is needed by the drm-kmod 5.5 update and is similar in logic to the
existing wait_event_killable macro.
Reviewed by: hselasky, manu
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29987
Add 'syncok' field to ifconfig's pfsync interface output. This allows
userspace to figure out when pfsync has completed the initial bulk
import.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29948
Allow up to 5 labels to be set on each rule.
This offers more flexibility in using labels. For example, it replaces
the customer 'schedule' keyword used by pfSense to terminate states
according to a schedule.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29936
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts (this driver covers over 20 years
of silicon)
Reviewed by: erj
Approved by: markj
Sponsored by: Pink Floyd - Any Colour You Like (in kind)
Differential Revision: https://reviews.freebsd.org/D29872
iflib now supports mapping each (TX,RX) queue pair to the same CPU
(default), to separate CPUs, or to a pair of physical and logical CPUs
that share the same L2 cache. The mapping mechanism supports unequal
numbers of TX and RX queues, with the excess queues always being
mapped to consecutive physical CPUs. When the platform cannot
distinguish between physical and logical CPUs, all are treated as
physical CPUs. See the comment on get_cpuid_for_queue() for the
entire matrix.
The following device-specific tunables influence the mapping process:
dev.<device>.<unit>.iflib.core_offset (existing)
dev.<device>.<unit>.iflib.separate_txrx (existing)
dev.<device>.<unit>.iflib.use_logical_cores (new)
The following new, read-only sysctls provide visibility of the mapping
results:
dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu
When an iflib driver allocates TX softirqs without providing reference
RX IRQs, iflib now binds those TX softirqs to CPUs using the above
mapping mechanism (that is, treats them as if they were TX IRQs).
Previously, such bindings were left up to the grouptaskqueue code and
thus fell outside of the iflib CPU mapping strategy.
Reviewed by: kbowling
Tested by: olivier, pkelsey
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D24094
After a vnode is recycled it can no longer be
acquired via vfs_hash_get() and, as such,
a delegation for the vnode cannot be recalled.
In the unlikely event that a delegation still
exists when the vnode is being recycled, return
the delegation since it will no longer be
recallable.
Until you have this patch in your NFSv4 client,
you should consider avoiding the use of delegations.
MFC after: 2 weeks
Without this patch, if a NFSv4 server recalled a
delegation when the file is not open, the renew
thread would block in the NFS VOP_INACTIVE()
trying to acquire the client state lock that it
already holds.
This patch fixes the problem by delaying the
vrele() call until after the client state
lock is released.
This bug has been in the NFSv4 client for
a long time, but since it only affects
delegation when recalled due to another
client opening the file, it got missed
during previous testing.
Until you have this patch in your client,
you should avoid the use of delegations.
MFC after: 2 weeks
Option `FIB_ALGO` gates new modular fib lookup functionality,
enabling more performant routing table lookups and improving
control plane convergence under the load.
Detailed feature description is available in D27401.
Reviewed By: olivier, gnn
Differential Revision: https://reviews.freebsd.org/D28434
Traditionally we had 2 sources of information whether the
added/delete route request targets network or a host route:
netmask (RTA_NETMASK) and RTF_HOST flag.
The former one is tricky: netmask can be empty or can explicitly
specify the host netmask. Parsing netmask sockaddr requires per-family
parsing and that's what rtsock code traditionally avoided. As a result,
consistency was not enforced and it was possible to specify network with
the RTF_HOST flag and vice versa.
Continue normalization efforts from D29826 and D29826 and ensure that
RTF_HOST flag always reflects host/network data from netmask field.
Differential Revision: https://reviews.freebsd.org/D29958
MFC after: 2 days
PIPE_MINDIRECT determines at what (blocking) write size one-copy
optimizations are applied in pipe(2) I/O. That threshold hasn't
been tuned since the 1990s when this code was originally
committed, and allowing run-time reconfiguration will make it
easier to assess whether contemporary microarchitectures would
prefer a different threshold.
(On our local RPi4 baords, the 8k default would ideally be at least
32k, but it's not clear how generalizable that observation is.)
MFC after: 3 weeks
Reviewers: jrtc27, arichardson
Differential Revision: https://reviews.freebsd.org/D29819
Previously we've returned the error from native ptrace(2), ENOMEM.
This confused Linux strace(2).
Reviewed By: emaste
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29925
structure is zeroed, by setting the VNET after checking the mbuf count
for zero. It appears there are some cases with early interrupts on some
network devices which still trigger page-faults on accessing a NULL "ifp"
pointer before the TCP LRO control structure has been initialized.
This basically preserves the old behaviour, prior to
9ca874cf74 .
No functional change.
Reported by: rscheff@
Differential Revision: https://reviews.freebsd.org/D29564
MFC after: 2 weeks
Sponsored by: Mellanox Technologies // NVIDIA Networking
Allow new enclosure to replace previously existing one if there is
no completely unused table entry, same as it is done for devices.
If we can not process DPM due to corruption -- wipe it and restart
from scratch. Otherwise I don't see a way to recover persistence if
something go wrong and there is no BIOS to recover it for us.
Together this solves a problem that appeared when 9300-8i firmware
update to 16.00.10.00 somehow switched its mapping mode from Device
Persistence to Enclosure/Slot without wiping the DPM table. It made
HBA completely unusable, since overflowed and conflicting mapping
table was unable to map any of enclosures and so devices.
Also while there make some enclosure mapping errors more informative.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
3e7bae0821 turns the BUS_READ_IVAR() failure from a warning into a
KASSERT. For certain PCI audio devices such like snd_csa(4) and
snd_emu10kx(4), the ac97_create() keeps the device handler generated
by device_add_child(pci_dev, "pcm"), which is not really a PCI device
handler. This in turn causes the subsequent pci_get_subdevice()
inside ac97_initmixer() triggering a panic.
This patch tries to put a bandaid for the aforementioned pcm device
children such that they can use the correct PCI handler(from parent)
to avoid a KASSERT panic in the INVARIANTS kernel.
Tested with: snd_csa(4), snd_ich(4), snd_emu10kx(4)
Reviewed by: imp
MFC after: 1 month
When the NFSv4.1/4.2 server does a callback to a client
on the back channel, it will use a session slot in the
back channel session. If the back channel has failed,
the callback will fail and, without this patch, the
session slot will not be released.
As more callbacks are attempted, all session slots
can become busy and then the nfsd thread gets stuck
waiting for a back channel session slot.
This patch frees the session slot upon callback
failure to avoid this problem.
Without this patch, the problem can be avoided by leaving
delegations disabled in the NFS server.
MFC after: 2 weeks
This reverts a portion of 274579831b ("capsicum: Limit socket
operations in capability mode") as at least rtsol and dhcpcd rely on
being able to configure network interfaces while in capability mode.
Reported by: bapt, Greg V
Sponsored by: The FreeBSD Foundation
Changes to the LRO code have exposed a bug in iflib where devices
which are not capable of doing LRO are still calling
tcp_lro_flush_all(), even when they have not initialized the LRO
context. This used to be mostly harmless, but the LRO code now sets
the VNET based on the ifp in the lro context and will try to access it
through a NULL ifp resulting in a panic at boot.
To fix this, we unconditionally initializes LRO so that we have a
valid LRO context when calling tcp_lro_flush_all(). One alternative is
to check the device capabilities before calling tcp_lro_flush_all() or
adding a new state flag in the ctx. However, it seems unwise to add an
extra, mostly useless test for higher performance devices when we can
just initialize LRO for all devices.
Reviewed by: erj, hselasky, markj, olivier
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29928
There are two kinds of routines in the driver that read statistics from
the hardware: the cxgbe_* variants read the per-port MPS/MAC registers
and the vi_* variants read the per-VI registers. They can be called
from the 1Hz callout or if_get_counter. All stats collection now takes
place under the callout lock and there is a new flag to indicate that
these routines should not access any hardware register.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
While the PV SCSI SG list can handle 512k of SG entries, it can only do
so for I/O that's aligned to 4k or better. newfs_msdos does unaligned
I/O, so triggers too long for host errors in cam when a 512k I/O is
attempted. Prefer power of 2 256k to the absolute maximum 508k, though
that can be revisited should the latter show to give significant
performance improvement.
MFC After: 3 days
Tested by: darius on discord (508k version of patch)
Sponsored by: Netflix
Since then, the FreeBSD USB stack has got proper USB RNDIS support.
PR: 254345
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Update the reference of which file to update in the doc tree when
bumping __FreeBSD_version.
This change is to catch up with commit f8fed61b80 in the doc repository.
MFC after: 3 days
Approved by: lwhsu (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29920
"i" is not used in this loop at all. There's no need to initialize and
increment it.
Reviewed by: markj@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29898
Previously, a page fault taken during copyin/out and related functions
would run the entire fault handler while permitting direct access to
user addresses. This could also leak across context switches (e.g. if
the page fault handler was preempted by an interrupt or slept for disk
I/O).
To fix, clear SUM in assembly after saving the original version of
SSTATUS in the supervisor mode trapframe.
Reviewed by: mhorne, jrtc27
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D29763
Notify the TOE driver when when an ICMP type 3 code 4 (Fragmentation
needed and DF set) message is received for an offloaded connection.
This gives the driver an opportunity to lower the path MTU for the
connection and resume transmission, much like what the kernel does for
the connections that it handles.
Reviewed by: glebius@
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D29755
Also add an M_ASSERTMAPPED() macro to verify that all mbufs in the chain
are mapped. Use it in ipfw_nat, which operates on a chain returned by
m_megapullup().
PR: 255164
Reviewed by: ae, gallatin
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29838
If VOP_ADD_WRITECOUNT() or adv locking failed, so VOP_CLOSE() needs to
be called, we cannot use fp fo_close() when there is no fp. This occurs
when e.g. kernel code directly calls vn_open() instead of the open(2)
syscall.
In this case, VOP_CLOSE() can be called directly, after possible lock
upgrade.
Reported by: nvass@gmx.com
PR: 255119
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29830
- SNDSTAT_LABEL_* are renamed to SNDST_DSPS_*, and SNDSTAT_LABEL_DSPS
becomes SNDST_DSPS.
- Centralize channel number/rate/formats into a single nvlist
The above nvlist is named "info_play" and "info_rec"
- Expose only encoding format in pfmts/rfmts. Userland has no direct
access to AFMT_ENCODING/CHANNEL/EXTCHANNEL macros, thus it serves no
meaning to expose too much information through this pair of labels.
However pminrate/rminrate, pmaxrate/rmaxrate, pfmts/rfmts are
deprecated and will be removed in future.
This commit keeps ioctls ABI compatibility with __FreeBSD_version
1400006 for now. In future the compat ABI with 1400006 will be removed
once audio/virtual_oss is rebuilt.
Sponsored by: The FreeBSD Foundation
Reviewed by: hselasky
Approved by: philip (mentor)
Differential Revision: https://reviews.freebsd.org/D29770
Currently, PCB caching mechanism relies on the rib generation
counter (rnh_gen) to invalidate cached nhops/LLE entries.
With certain fib algorithms, it is now possible that the
datapath lookup state applies RIB changes with some delay.
In that scenario, PCB cache will invalidate on the RIB change,
but the new lookup may result in the same nexthop being returned.
When fib algo finally gets in sync with the RIB changes, PCB cache
will not receive any notification and will end up caching the stale data.
To fix this, introduce additional counter, rnh_gen_rib, which is used
only when FIB_ALGO is enabled.
This counter is incremented by the control plane. Each time when fib algo
synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value.
Differential Revision: https://reviews.freebsd.org/D29812
Reviewed by: donner
MFC after: 2 weeks
Address multiple issues with strict rtsock message validation.
D28668 "normalisation" approach was based on the assumption that
we always have at least "standard" sockaddr len.
It turned out to be false - certain older applications like quagga
or routed abuse sin[6]_len field and set it to the offset to the
first fully-zero bit in the mask. It is impossible to normalise
such sockaddrs without reallocation.
With that in mind, change the approach to use a distinct memory
buffer for the altered sockaddrs. This allows supporting the older
software while maintaining the guarantee on the "standard" sockaddrs.
PR: 255273,255089
Differential Revision: https://reviews.freebsd.org/D29826
MFC after: 3 days
iwnstats was not compiling because of some issues raised by the clang
compiler due to -Werror. As a tool it is not connected to world build.
Add missing field "barker_mrc" initialization in struct
iwn_sensitivity_limits for -Wmissing-field-initializers, remove unused
pointer *is on iwn_stats_*_print functions and unused variables for
-Wunused-parameter and -Wunused-variable.
The value for field "barker_mrc" of struct iwn2030_sensitivity_limits
was obtained from linux 3.2 wireless/iwlwifi driver code (iwl-2000.c:115
.barker_corr_th_min_mrc = 390).
Also set BINDIR in Makefile to make it possible to install under
/usr/local/sbin/iwnstats as it require super user.
Reviewed by: adrian
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29800
In certain cases, e.g. a SYN-flood from a limited set of hosts,
the TCP hostcache becomes the main contention point. To solve
that, this change introduces lockless lookups on the hostcache.
The cache remains a hash, however buckets are now CK_SLIST. For
updates a bucket mutex is obtained, for read an SMR section is
entered.
Reviewed by: markj, rscheff
Differential revision: https://reviews.freebsd.org/D29729
This is further rework of 08d9c92027. Now we carry the knowledge of
lock type all the way through tcp_input() and also into tcp_twcheck().
Ideally the rlocking for pure SYNs should propagate all the way into
the alternative TCP stacks, but not yet today.
This should close a race when socket is bind(2)-ed but not yet
listen(2)-ed and a SYN-packet arrives racing with listen(2), discovered
recently by pho@.
When a rescue retransmission is successful, rather than
inserting new holes to the left of it, adjust the old
rescue entry to cover the missed sequence space.
Also, as snd_fack may be stale by that point, pull it forward
in order to never create a hole left of snd_una/th_ack.
Finally, with DSACKs, tcp_sack_doack() may be called
with new full ACKs but a DSACK block. Account for this
eventuality properly to keep sacked_bytes >= 0.
MFC after: 3 days
Reviewed By: kbowling, tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29835
This change makes the TCP LRO code more generic and flexible with regards
to supporting multiple different TCP encapsulation protocols and in general
lays the ground for broader TCP LRO support. The main job of the TCP LRO code is
to merge TCP packets for the same flow, to reduce the number of calls to upper
layers. This reduces CPU and increases performance, due to being able to send
larger TSO offloaded data chunks at a time. Basically the TCP LRO makes it
possible to avoid per-packet interaction by the host CPU.
Because the current TCP LRO code was tightly bound and optimized for TCP/IP
over ethernet only, several larger changes were needed. Also a minor bug was
fixed in the flushing mechanism for inactive entries, where the expire time,
"le->mtime" was not always properly set.
To avoid having to re-run time consuming regression tests for every change,
it was chosen to squash the following list of changes into a single commit:
- Refactor parsing of all address information into the "lro_parser" structure.
This easily allows to reuse parsing code for inner headers.
- Speedup header data comparison. Don't compare field by field, but
instead use an unsigned long array, where the fields get packed.
- Refactor the IPv4/TCP/UDP checksum computations, so that they may be computed
recursivly, only applying deltas as the result of updating payload data.
- Make smaller inline functions doing one operation at a time instead of
big functions having repeated code.
- Refactor the TCP ACK compression code to only execute once
per TCP LRO flush. This gives a minor performance improvement and
keeps the code simple.
- Use sbintime() for all time-keeping. This change also fixes flushing
of inactive entries.
- Try to shrink the size of the LRO entry, because it is frequently zeroed.
- Removed unused TCP LRO macros.
- Cleanup unused TCP LRO statistics counters while at it.
- Try to use __predict_true() and predict_false() to optimise CPU branch
predictions.
Bump the __FreeBSD_version due to changing the "lro_ctrl" structure.
Tested by: Netflix
Reviewed by: rrs (transport)
Differential Revision: https://reviews.freebsd.org/D29564
MFC after: 2 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Extract the state killing code from pfioctl() and rephrase the filtering
conditions for readability.
No functional change intended.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29795
In clang 12.0.0.rc2, going from weak to global is now a hard error:
```
/usr/src/stand/libsa/amd64/_setjmp.S:67:25: error: _longjmp changed binding to STB_GLOBAL
.text; .p2align 4,0x90; .globl _longjmp; .type _longjmp,@function; _longjmp:; .cfi_startproc
```
And the other way is a warning, but we have -Werror:
```
error: __start_set_Xcommand_set changed binding to STB_WEAK [-Werror,-Winline-asm]
error: __stop_set_Xcommand_set changed binding to STB_WEAK [-Werror,-Winline-asm]
```
ref: https://reviews.llvm.org/D90108
Reviewed By: arichardson
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29159
At a recent testing event I found out that I had misinterpreted
RFC5661 where it describes the stripe size in the File Layout's
nfl_util field. This patch fixes the pNFS File Layout server
so that it returns the correct value to the NFSv4.1/4.2 pNFS
enabled client.
This affects almost no one, since pNFS server configurations
are rare and the extant pNFS aware NFS clients seemed to
function correctly despite the erroneous stripe size.
It *might* be needed for correct behaviour if a recent
Linux client mounts a FreeBSD pNFS server configuration
that is using File Layout (non-mirrored configuration).
MFC after: 2 weeks
Add support for current and future client platform PCI IDs. These are
all I219 variants and have no known driver changes versus previous
generation client platform I219 variants.
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29801
There are a number of issues in the e1000 multicast filter handling
that have been present for a long time. Take the updated approach from
ixgbe(4) which does not have the issues.
The issues are outlined in the PR, in particular this solves crossing
over and under the hardware's filter limit, not programming the
hardware filter when we are above its limit, disabling SBP (show bad
packets) when the tunable is enabled and exiting promiscuous mode, and
an off-by-one error in the em_copy_maddr function.
PR: 140647
Reported by: jtl
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29789
We don't need to set the bits here since the if/else if/else statements
fully cover setting these bit pairs.
Reported by: markj
Reviewed by: markj, erj
Approved by: #intel_networking
MFC aftter: 1 week
Differential Revision: https://reviews.freebsd.org/D29827
Only allocate struct_mm after we checked that other threads do not carry
useful mm_struct. If they don't, drop process lock, allocate, and recheck.
Note that for M_NOWAIT allocations we could avoid dropping process lock,
but I do not think that this increased complexity is useful.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
Create and use zones for task and mm. Reserve items in zones based on the
estimation of the max number of interrupts in the system. Use M_USE_RESERVE
to allow to take reserved items when allocation occurs from the interrupt
thread context.
Of course, this would only work first time we allocate the task for
interrupt thread. If interrupt is deallocated and allocated anew,
creating a new thread, it might be that zone is depleted. It still
should be good enough for practical uses.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
For anonymous objects, provide a handle kvo_me naming the object,
and report the handle of the backing object. This allows userspace
to deconstruct the shadow chain. Right now the handle is the address
of the object in KVA, but this is not guaranteed.
For the same anonymous objects, report the swap space used for actually
swapped out pages, in kvo_swapped field. I do not believe that it is
useful to report full 64bit counter there, so only uint32_t value is
returned, clamped to the max.
For kinfo_vmentry, report anonymous object handle backing the entry,
so that the shadow chain for the specific mapping can be deconstructed.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D29771
In particular, this avoids malloc(9) calls when from early tunable handling,
with no working malloc yet.
Reported and tested by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Usually rule counters are reset to zero on every update of the ruleset.
With keepcounters set pf will attempt to find matching rules between old
and new rulesets and preserve the rule counters.
MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29780
PFRULE_REFS should never be used by userspace, so hide it behind #ifdef
_KERNEL.
MFC after: never
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29779
Split the PFRULE_REFS flag from the rule_flag field. PFRULE_REFS is a
kernel-internal flag and should not be exposed to or read from
userspace.
MFC after: 4 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29778
IEEE Std 802.1D-2004 Section 17.14 defines permitted ranges for timers.
Incoming BPDU messages should be checked against the permitted ranges.
The rest of 17.14 appears to be enforced already.
PR: 254924
Reviewed by: kp, donner
Differential Revision: https://reviews.freebsd.org/D29782
This is required for the current Arch Linux binaries to work.
PR: 254112
Reviewed By: emaste
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29218
- Use malloc(9) to allocate ivhd_hdrs list. The previous assumption
that there are at most 10 IVHDs in a system is not true. A counter
example would be a system with 4 IOMMUs, and each IOMMU is related
to IVHDs type 10h, 11h and 40h in the ACPI IVRS table.
- Always scan through the whole ivhd_hdrs list to find IVHDs that has
the same DeviceId but less prioritized IVHD type.
Sponsored by: The FreeBSD Foundation
MFC with: 74ada297e8
Reviewed by: grehan
Approved by: lwhsu (mentor)
Differential Revision: https://reviews.freebsd.org/D29525
The NAV (network allocation vector) register reflects the current MAC
tracking of NAV - when it will stay quiet before transmitting.
Other devices transmit their frame durations in their 802.11 PHY headers
and all devices that hear a frame - even if it's one in an encoding
they don't understand - will understand the low bitrate PHY header that
includes the frame duration. So, they'll set NAV to this value so
they'll stay quiet until the transmit completes.
Anyway, sometimes the PHY NAV header is garbled and sometimes, notably
older broadcom devices, will fake a long NAV so they can get "cleaner" air
for local calibration. When this happens, the hardware will stay quiet
for quite some time and this can lead to missed/stuck beacons, or
(for Very Large Values) a MAC hang.
This code just adds the ability to get/set the NAV; the driver will
need to take care of using it during transmit hangs and beacon misses
to see if it's due to a trash looking NAV.
Fix a few 'if(' to be 'if (' in a few places, per style(9) and
overwhelming usage in the rest of the kernel / tree.
MFC After: 3 days
Sponsored by: Netflix
We prefer 'while (0)' to 'while(0)' according to grep and stlye(9)'s
space after keyword rule. Remove a few stragglers of the latter.
Many of these usages were inconsistent within the file.
MFC After: 3 days
Sponsored by: Netflix
The 'ticket' and 'my_ticket' arguments are both read and written within
the same asm block. Clang is stricter with the constraints than gcc4
was, so accepts the '=r' at face value and will happily overwrite
registers that "should" be preserved.
Mark these operands to not clobber other operands, so they get their own
registers.
This fixes a panic on bringing up the octe interfaces.
Fib algo uses a per-family array indexed by the fibnum to store
lookup function pointers and per-fib data.
Each algorithm rebuild currently requires re-allocating this array
to support atomic change of two pointers.
As in reality most of the changes actually involve changing only
data pointer, add a shortcut performing in-flight pointer update.
MFC after: 2 weeks
Some algorithms may require updating datapath and control plane
algo pointers after the (batched) updates.
Export fib_set_datapath_ptr() to allow setting the new datapath
function or data pointer from the algo.
Add fib_set_algo_ptr() to allow updating algo control plane
pointer from the algo.
Add fib_epoch_call() epoch(9) wrapper to simplify freeing old
datapath state.
Reviewed by: zec
Differential Revision: https://reviews.freebsd.org/D29799
MFC after: 1 week
Adding support for TCP over UDP allows communication with
TCP stacks which can be implemented in userspace without
requiring special priviledges or specific support by the OS.
This is joint work with rrs.
Reviewed by: rrs
Sponsored by: Netflix, Inc.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29469
We must make sure that incoming packets will never overflow the netmap
buffers, even when the user is using the offset feature. In the typical
scenario, the netmap buffer is 2KiB and, with an MTU of 1500, there are
~500 bytes available for user offsets.
Unfortunately, some NICs accept incoming packets even when they are
larger then the MTU. This means that the only way to stop DMA from
overflowing the netmap buffers, when offsets are allowed, is to choose
a hardware buffer length which is smaller than the netmap buffer
length. For most NICs and for 2KiB netmap buffers, this means 1024
bytes, which is unconveniently small.
The current code will select the small hardware buf size even when
offsets are not in use. The main purpose of this change is to
fix this bug by returning to the normal behavior for the no-offsets
case.
At the same time, the patch pushes the handling of the offset case
to the lower level driver code, so that it can be made NIC-specific
(in future patches).
driver_t was supposed to just be a quick hack for 4.x
compatibility. However, it's been documented now as the preferred API
rather than the replacement kobj_class_t. Drop the note about 4.x since
it's clear we're a bit late to retiring its use through the tree with
almost 1500 references to driver_t.
Sponsored by: Netflix
Maintain code similarity between RACK and base stack
for ECN. This may not strictly be necessary, depending
when a state transition to FIN_WAIT_1 is done in RACK
after a shutdown() or close() syscall.
MFC after: 3 days
Reviewed By: tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D29658
First, two of those four checks are unreachable.
Second, I don't believe there should be ">=" instead of ">".
Third, bus_dma(9) already returns the same EFBIG if ">".
This fixes false I/O errors in worst S/G cases with maxphys >= 2MB.
MFC after: 1 week
This is the April update to vendor/wpa committed upstream
2021/04/07.
This is MFV efec822389.
Suggested by: philip
Reviewed by: philip
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D29744
Explicitly disable ring synchronization before calling
callbacks that may result in a hardware reset.
Before this patch we relied on capturing the down/up events which,
however, may not be issued by all drivers.
As full support of RFC6675 is in place, deprecating
net.inet.tcp.rfc6675_pipe and enabling by default
net.inet.tcp.sack.revised.
Reviewed By: #transport, kbowling, rrs
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28702
"It looks like it would be less confusing to rename 'count' to
something like 'idx', since that's what it's used for in this
function."
Reviewed by: erj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29798
Adds OPAL_CONSOLE_WRITE error handling and implements a call to
OPAL_CONSOLE_WRITE_BUFFER_SPACE to verify if there's enough space
before writing to console.
This fixes serial port output getting corrupted on fast writes, like
on "dmesg" output.
Tested on Raptor Blackbird running powerpc64 BE kernel
Reviewed by: luporl
Sponsored by: Eldorado Reserach Institute (eldorado.org.br)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29063
There is a weird limit of AGTIAPI_MAX_DMA_SEGS (128) S/G segments per
I/O since the initial driver import. I don't know why it was added,
can only guess some hardware limitation, but in worst case it means
maximum I/O size of 508KB. Respect it to be safe, rounding to 256KB.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
It is a direct request for data corruptions, one report of which we
have received. I am very surprised that only one.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Without it, Qt5 apps from Focal fail to start, being unable to load
their plugins. It's also necessary for glibc 2.33, as found in recent
Arch snapshots.
PR: 254112
Reviewed By: kib
Sponsored by: The FreeBSD Foundation, EPSRC
Differential Revision: https://reviews.freebsd.org/D28192
Use M_NOWAIT flag when hash growing is called from callout.
PR: 255041
Reviewed by: kevans
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D29772
A security feature from c06f087ccb appeared to be a huge bottleneck
under SYN flood. To mitigate that add a sysctl that would make
syncache(4) globally visible, ignoring UID/GID, jail(2) and mac(4)
checks. When turned on, we won't need to call crhold() on the listening
socket credential for every incoming SYN packet.
Reviewed by: bz
This reverts commit 9edaceca81.
It turns out that the Linux client intentionally does an NFSv4.1
RPC with only a Sequence operation in it and with "seqid + 1"
for the slot. This is used to re-synchronize the slot's seqid
and the client expects the NFS4ERR_SEQ_MISORDERED error reply.
As such, revert the patch, so that the server remains RFC5661
compliant.
Initial fib algo implementation was build on a very simple set of
principles w.r.t updates:
1) algorithm is ether able to apply the change synchronously (DIR24-8)
or requires full rebuild (bsearch, lradix).
2) framework falls back to rebuild on every error (memory allocation,
nhg limit, other internal algo errors, etc).
This changes brings the new "intermediate" concept - batched updates.
Algotirhm can indicate that the particular update has to be handled in
batched fashion (FLM_BATCH).
The framework will write this update and other updates to the temporary
buffer instead of pushing them to the algo callback.
Depending on the update rate, the framework will batch 50..1024 ms of updates
and submit them to a different algo callback.
This functionality is handy for the slow-to-rebuild algorithms like DXR.
Differential Revision: https://reviews.freebsd.org/D29588
Reviewed by: zec
MFC after: 2 weeks
Restore 525e07418c after the iflib conversion of igb(4). This
reenables random MAC address generation when attaching to a VF with a
zeroed MAC.
PR: 253535
Reported by: Balaev PA <mail@void.so>
Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29785
The boundary differentiating "lem" vs "em" class devices was wrong
after the iflib conversion of lem(4).
The Packet Buffer size for 82547 class chips was not set correctly
after the iflib conversion of lem(4).
These changes restore functionality on an 82547 for the submitter.
PR: 236119
Reported by: Jeff Gibbons <jgibbons@protogate.com>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29766
This is a debugging tunable that shouldn't have retained this setting
after the initial iflib conversion of the driver
PR: 248934
Reported by: Franco Fichtner <franco@opnsense.org>
Reviewed by: markj
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D29768
As this controller requires firmware patch downloading to operate.
"Intel Wireless 7265" support in iwmbtfw(8) is yet to be done.
Tested by: arrowd et al
PR: 228787
MFC after: 2 weeks
Unconditional execution of "clear feature" request at SETUP stage was
workaround for probe failures on ng_ubt.ko re-kldloading which is
unnecessary now.
Reviewed by: hselasky
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29775