When we create a table without counters, add an entry and later
re-define the table to have counters we wound up trying to read
non-existent counters.
We now cope with this by attempting to add them if needed, removing them
when they're no longer needed and not trying to read from counters that
are not present.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34131
Reduce the burden to maintain correct and
extensible ECN related code across multiple
stacks and codepaths.
Formally no functional change.
Incidentially this establishes correct
ECN operation in one instance.
Reviewed By: rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34162
sbcut() returns mbufs in reverse order so is not suitable for reading
data from the socket buffer. Instead, check for already-received data
in the receive worker thread before passing offload PDUs up to the
iSCSI layer. This uses soreceive() to read data from the socket and
is also to use M_WAITOK since it now runs from a worker thread instead
of an interrupt thread.
Also, fix decoding of the data segment length for pre-offload PDUs.
Reported by: Jithesh Arakkan @ Chelsio
Fixes: a8c4147edc cxgbei: Parse all PDUs received prior to enabling offload mode.
Sponsored by: Chelsio Communications
FUSE file systems that do not set FUSE_NO_OPENDIR_SUPPORT do not
guarantee that d_off will be valid after closing and reopening a
directory. That conflicts with NFS's statelessness, that results in
unresolvable bugs when NFS reads large directories, if:
* The file system _does_ change the d_off field for the last directory
entry previously returned by VOP_READDIR, or
* The file system deletes the last directory entry previously seen by
NFS.
Rather than doing a poor job of exporting such file systems, it's better
just to refuse.
Even though this is technically a breaking change, 13.0-RELEASE's
NFS-FUSE support was bad enough that an MFC should be allowed.
MFC after: 3 weeks.
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33726
In its lowest common denominator, FUSE does not require that a directory
entry's d_off field is valid outside of the lifetime of the directory's
FUSE file handle. But since NFS is stateless, it must reopen the
directory on every call to VOP_READDIR. That means reading the
directory all the way from the first entry. Not only does this create
an O(n^2) condition for large directories, but it can also result in
incorrect behavior if either:
* The file system _does_ change the d_off field for the last directory
entry previously seen by NFS, or
* The file system deletes the last directory entry previously seen by
NFS.
Handily, for file systems that set FUSE_NO_OPENDIR_SUPPORT d_off is
guaranteed to be valid for the lifetime of the directory entry, there is
no need to read the directory from the start.
MFC after: 3 weeks
Reviewed by: rmacklem
The FUSE protocol does not require that a directory entry's d_off field
outlive the lifetime of its directory's file handle. Since the NFS
server must reopen the directory on every VOP_READDIR call, that means
it can't pass uio->uio_offset down to the FUSE server. Instead, it must
read the directory from 0 each time. It may need to issue multiple
FUSE_READDIR operations until it finds the d_off field that it's looking
for. That was the intention behind SVN r348209 and r297887, but a logic
bug prevented subsequent FUSE_READDIR operations from ever being issued,
rendering large directories incompletely browseable.
MFC after: 3 weeks
Reviewed by: rmacklem
For write operation pseudofs creates an sbuf with the data.
Use this data instead of the uio as it's not usable anymore after
uiomove.
Reviewed by: hselasky
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D34114
Modules no longer call kernel functions for atomic ops, and since the
previous commit, we always use lock prefix.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: jhb, markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34153
Atomics have significant other use besides providing in-system
primitives for safe memory updates. They are used for implementing
communication with out of system software or hardware following some
protocols.
For instance, even UP kernel might require a protocol using atomics to
communicate with the software-emulated device on SMP hypervisor. Or
real hardware might need atomic accesses as part of the proper
management protocol.
Another point is that UP configurations on x86 are extinct, so slight
performance hit by unconditionally use proper atomics is not important.
It is compensated by less code clutter, which in fact improves the
UP/i386 lifetime expectations.
Requested by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: Elliott Mitchell, imp, jhb, markj, royger
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34153
While here clean up the names for the naming convention of the other
registers in this file.
Reviewed by: kib, mhorne (earlier version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34060
Summary:
This switch is based off of the AR8327/AR8337 external switch/PHY.
However unlike the AR8327/AR8337 it itself doesn't have any PHYs;
instead an external PHY connects to it using the PSGMII port.
Differential Revision: https://reviews.freebsd.org/D34112
Reviewed by: manu
This code is inspired by the ar40xx code in openwrt, which itself
is based on the Qualcomm QCA-SSDK. Both of these sources are, amusingly,
BSD licenced - and thus I have included some of the comments in the
hardware workaround paths to document some of the magic numbers.
This adds the ethernet MAC and ethernet switch definitions.
I've rewritten the header file and the DTS based on documentation
and the required driver fields rather than the GPL'ed
ones from openwrt.
Differential Revision: https://reviews.freebsd.org/D34111
Reviewed by: manu
This adds support for the IPQ4018/IPQ4019 MDIO bus. This is used to
talk to external PHYs and switches. (There's an internal switch
in the IPQ4018/IPQ4019 as well, but it's accessible via MMIO/AXI.)
Differential Revision: https://reviews.freebsd.org/D34110
Reviewed by: manu
A lot more generic cam related things were done in mmc_sim so this
simplifies the driver a lot.
Differential Revision: https://reviews.freebsd.org/D32154
Reviewed by: imp
After b5d227b0 FreeBSD was panicking on boot with "Duplicate free" in
UMA. Analyzing the asm, the '1' mask was treated as an integer, rather
than a long, causing 'slw' (shift left word) to be used for the shifting
instruction, not 'sld' (shift left double). This means the upper bits
of the bitfield were not getting used, resulting in corruption of the
bitfield.
While fixing this, the 'and' check of the mask does not need to be
recorded, so don't record (drop the '.').
There seem to be systems returning some garbage here. I still don't
know why, but at least I hope this check fix indefinite printf loop.
MFC after: 2 weeks
setsockopt() grants full access to the deprecated
TOS byte. For TCP, mask out the ECN codepoint, so that
only the DSCP portion can be adjusted.
Reviewed By: tuexen, hselasky, #manpages, #transport, debdrup
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34154
GCC's -Wformat complains about NULL format strings passed to
iwl_fw_dbg_collect_trig (though the function handles NULL format
strings). Curious that upstream iwlwifi in Linux is built with GCC
and explicitly opts into this warning via the __printf() attribute.
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D34146
Due to variable size of struct ctl_ha_msg_mode ctl_isc_announce_mode()
sent only first 4 bytes of modified mode page to the other HA side,
that caused its corruption there, noticeable only after failover.
I've found alike bug also in ctl_isc_announce_lun(), but there it was
sending slightly more than needed, that is a smaller problem.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
74cf7cae4d ("softclock: Use dedicated ithreads for running callouts.")
switched callouts away from the swi infrastructure. It turns out that
this was a major source of entropy in early boot, which we've now lost.
As a result, first boot on hardware without a 'fast' entropy source
would block waiting for fortuna to be seeded with little hope of
progressing without manual intervention.
Let's resolve it by explicitly harvesting entropy in callout_process()
if we've handled any callouts. cc/curthread/now seem to be reasonable
sources of entropy, so use those.
Discussed with: jhb (also proposed initial patch)
Reported by: many
Reviewed by: cem, markm (both csprng)
Differential Revision: https://reviews.freebsd.org/D34150
TCP RACK can cache the IP header while preparing
a new TCP packet for transmission. Thus all the
IP ECN codepoint bits need to be assigned, without
assuming a clear field beforehand.
Reviewed By: tuexen, kbowling, #transport
MFC after: 3 days
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34148
In order to consistently provide access to all
(including reserved) TCP header flag bits,
use an accessor function tcp_get_flags and
tcp_set_flags. Also expand any flag variable from
uint8_t / char to uint16_t.
Reviewed By: hselasky, tuexen, glebius, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D34130
When FILEMON_SET_FD is used, the filemon handle effectively wraps the
passed file. In particular, the handle may be inherited by a child
process, or transferred over a unix domain socket, so we must verify
that the backing file permits this.
Reported by: syzbot+36e6be9e02735fe66ca8@syzkaller.appspotmail.com
Reviewed by: emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34128
Consistently only pass the inp and the sopt around. Don't pass the
so around, since in a upcoming commit tcp_ctloutput_set() will be
called from a context different from setsockopt(). Also expect
the inp to be locked when calling tcp_ctloutput_[gs]et(), this is
also required for the upcoming use by tcpsso, a command line tool
to set socket options.
Reviewed by: glebius, rscheff
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D34151
According to Broadcom, mixing 64-bit SGEs with 32-bit chain entries can
lead to IOC Fault code 0x40000d04. This fault code has been observed to
suddenly increase on certain machines when the OCA firmware images are
deployed. The hardware interprets all elements of a 64-bit SGE, even
ones marked as 32-bit. Depending on the other bits, this will just work,
but sometimes generate the above fault. Broadcom recommends this
practice, and the Linux and NetBSD drivers follow it.
Rework the chaining code to use MPI2_SGE_CHAIN64 instead of
MPI2_SGE_CHAIN32. Adjust MPS_SGC_SIZE from 8 to 12 to match the size of
the new structure. Flag the structure as being 64-bits now. Since
MPS_SGE64_SIZE and MPS_SGC_SIZE are the same now, mps_push_sge could be
simplified (after the same fashion of mpr). The different number of
cases collapse to whether or not there's room for the segments and if
not we need a chain, however these changes haven't been made yet as the
current code handles those cases properly with the new defines.
Made chain_busaddr 64-bits, even though we ask for all allocations to be
below 4GB for this tag. Use it to set both parts of the CHAIN64 address
rather than baking the 4GB assumption. Add asserts around the allocation
to detect and BUSDMA bugs in allocation.
Remove asserts and associated comment in mpi_pre_fw_download and
mpi_pre_fw_upload. The code does not, it seems, depend on this
invariant. The mpr driver has similar code, no asserts and also doesn't
depend on this.
Adjust comments to reflect the updated size.
Sponsored by: Netflix
Reviewed by: scottl, mav
Differential Revision: https://reviews.freebsd.org/D34016
I see no apparent need to avoid waiting on the lock just because
vinactive() may be called on another thread while the thread that
cleared the vnode refcount has the lock dropped. In fact, this
can at least lead to a panic of the form "vn_lock: error <errno>
incompatible with flags" if LK_RETRY was passed to VOP_LOCK().
In this case LK_NOWAIT may cause the underlying FS to return an
error which is incompatible with LK_RETRY.
Reported by: pho
Reviewed by: kib, markj, pho
Differential Revision: https://reviews.freebsd.org/D34109
The unionfs root vnode will always share a lock with its lower vnode.
If unionfs was mounted with the 'below' option, this will also be the
vnode covered by the unionfs mount. During unmount, the covered vnode
will be locked by dounmount() while the unionfs root vnode will be
locked by vgone(). This effectively requires recursion on the same
underlying like, albeit through two different vnodes.
Reported by: pho
Reviewed by: kib, markj, pho
Differential Revision: https://reviews.freebsd.org/D34109
VOP_LOCK() may be handed a vnode that is concurrently reclaimed.
unionfs_lock() accounts for this by checking for empty vnode private
data under the interlock. But it incorrectly asserts that the vnode
is using the unionfs dispatch table before making this check.
Reverse the order, and also update KASSERT_UNIONFS_VNODE() to provide
more useful information.
Reported by: pho
Reviewed by: kib, markj, pho
Differential Revision: https://reviews.freebsd.org/D34109
Commit b0b7d978b6 changed the NFSv4 server's default
behaviour to check the file's mode or ACL for permission to
open the file, to be Linux and Solaris compatible.
However, it turns out that Linux makes an exception for
the case of Claim_delegate_cur(_fh).
When a NFSv4 client is returning a delegation, it must
acquire Opens against the server to replace the ones
done locally in the client. The client does this via
an Open operation with Claim_delegate_cur(_fh). If
this operation fails, due to a change to the file's
mode or ACL after the delegation was issued, the
client does not have any way to retain the open.
As such, the Linux client allows the file's owner
to perform an Open with Claim_delegate_cur(_fh)
no matter what the mode or ACL allows.
This patch makes the FreeBSD server allow this case,
to be Linux compatible.
This patch only affects the case where delegations
are enabled, which is not the default.
MFC after: 2 weeks
If port resume fails, likely the USB device is detached. Ignore such errors,
because else the USB stack might try forever trying to resume the device,
before it will proceed detaching it.
MFC after: 1 week
Sponsored by: NVIDIA Networking
Eliminate shlq $3,address shift after masking of the va is done, which
is needed to convert pt_entry_t[] array index into byte offset.
Do it by preshifting the mask, and compensating the right shift of va.
Suggested by: alc
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33786
instead of only OBJ_ANON objects that are backing, as it is now.
This is required for e.g. vm_meter is_object_active() detection, and
should be useful in some more cases.
Use refcount KPI for all objects, regardless of owning the object lock,
and the fact that currently OBJ_ANON cannot change for the live object.
Noted and reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33549