For a pNFS mount, the NFSv4.1/4.2 client uses compound RPCs that
have both Open and LayoutGet operations in them.
If the pNFS server were tp reply NFSERR_DELAY for one of these
compounds, the retry after a delay cannot be handled by
newnfs_request(), since there is a reference held on the open
state for the Open operation in them.
Fix this by adding these RPCs to the "don't do delay here"
list in newnfs_request().
This patch is only needed if the mount is using pNFS (the "pnfs"
mount option) and probably only matters if the MDS server
is issuing delegations as well as pNFS layouts.
Found by code inspection.
MFC after: 2 weeks
When fsck_ffs is running in interactive mode and finds unlinked files,
it offers to either unlink them or place them in a lost+found directory.
If the lost+found directory option is requested and no lost+found
directory exists, fsck_ffs offers to create one. When creating one,
it must allocate an inode and a filesystem block. It attempts to
allocate them from the first cylinder group. If the first cylinder
group has a bad check hash, it gives up.
This change expands the search into later cylinder groups when the
first one fails with a bad check hash.
Reported by: Chuck Silvers
Tested by: Chuck Silvers
MFC after: 1 week
Sponsored by: Netflix
Commit 4281bfec36 patched the server so that the
callback session slot would be free'd for reuse when
a callback attempt fails.
However, this can often result in the sequence# for
the session slot to be advanced such that the client
end will reply NFSERR_SEQMISORDERED.
To avoid the NFSERR_SEQMISORDERED client reply,
this patch negates the sequence# advance for the
case where the callback has failed.
The common case is a failed back channel, where
the callback cannot be sent to the client, and
not advancing the sequence# is correct for this
case. For the uncommon case where the client's
reply to the callback is lost, not advancing the
sequence# will indicate to the client that the
next callback is a retry and not a new callback.
But, since the FreeBSD server always sets "csa_cachethis"
false in the callback sequence operation, a retry
and a new callback should be handled the same way
by the client, so this should not matter.
Until you have this patch in your NFSv4.1/4.2 server,
you should consider avoiding the use of delegations.
Even with this patch, interoperation with the
Linux NFSv4.1/4.2 client in kernel versions prior
to 5.3 can result in frequent 15second delays if
delegations are enabled. This occurs because, for
kernels prior to 5.3, the Linux client does a TCP
reconnect every time it sees multiple concurrent
callbacks and then it takes 15seconds to recover
the back channel after doing so.
MFC after: 2 weeks
The driver uses both software resources (locks, callouts, memory for
descriptors and for bookkeeping, sysctls, etc.) and hardware resources
(VIs, DMA queues, TCAM entries, etc.) to operate the NIC. This commit
splits the single *_ALLOCATED flag used to track all these resources
into separate *_SW_ALLOCATED and *_HW_ALLOCATED flags.
This is the simplified pseudocode that now applies to most queues (foo
can be ctrlq/txq/rxq/ofld_txq/ofld_rxq):
/* Idempotent */
alloc_foo
{
if (!SW_ALLOCATED)
init_iq/init_eq/init_fl no-fail sw init
alloc_iq_fl/alloc_eq/alloc_wrq may-fail sw alloc
add_foo_sysctls, etc. no-fail post-alloc items
if (!HW_ALLOCATED)
alloc_iq_fl_hwq/alloc_eq_hwq hw resource allocation
}
/* Idempotent */
free_foo
{
if (!HW_ALLOCATED)
free_iq_fl_hwq/free_eq_hwq release hw resources
if (!SW_ALLOCATED)
free_iq_fl/free_eq/free_wrq release sw resources
}
The routines that take the driver to FULL_INIT_DONE and VI_INIT_DONE and
back are now all idempotent. The quiesce routines pay attention to the
HW_ALLOCATED flag and will not wait on the hardware for pidx/cidx
updates and other completions if this flag is not set.
MFC after: 1 month
Sponsored by: Chelsio Communications
Stop further processing of a packet when detecting that it
contains an INIT chunk, which is too small or is not the only
chunk in the packet. Still allow to finish the processing
of chunks before the INIT chunk.
Thanks to Antoly Korniltsev and Taylor Brandstetter for reporting
an issue with the userland stack, which made me aware of this
issue.
MFC after: 3 days
The mangling was present in the initial revision of the script, but its
purpose is not clear. It may have been to avoid defining make(1)
variables with a dash in the name, but this is permitted. Furthermore,
it results in invalid dependency information if a dependency's name
contains an underscore, causing e.g., libcompiler_rt-dev to depend on
libcompiler-rt, and resulting in warnings when installing base system
packages. Remove the mangling.
Reviewed by: manu
MFC after: 2 months
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29883
parse_notes relies on the caller-supplied callback to initialize "res".
Two callbacks are used in practice, brandnote_cb and note_fctl_cb, and
the latter fails to initialize res. Fix it.
In the worst case, the bug would cause the inner loop of check_note to
examine more program headers than necessary, and the note header usually
comes last anyway.
Reviewed by: kib
Reported by: KMSAN
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29986
Revert rest of de8dd262c4 since it's now unused.
jhibbits@ introduced this to give powerpc MMU functions IFUNC like
performance while retaining the kobj interface, speeding up operations
10-20%. Since there was only ever one instance of the mmu interface
active at any given time, we could cache the looked up results more
agressively.
powerpc migrated to using IFUNCs to get an even larger performance boost
in 45b69dd63e, deleting the two files it was added to in de8dd262c4.
However, there's few, if any, other potential applications of this to
the tree today. It's now unused and undocumented. Retire it to eliminate
this wart and to preclude the need to document it. Should a simmilar
case arise in the future, the code is in git...
Discusssed with: jhibbits@
Reviewed by: jhb@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29997
Add a test case where the pfctl optimizer will generate a table
automatically. These tables have long names, which we accidentally broke
in the nvlist ADDRULE ioctl.
Reviewed by: melifaro
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29989
When parsing the nvlist for a struct pf_addr_wrap we unconditionally
tried to parse "ifname". This broke for PF_ADDR_TABLE when the table
name was longer than IFNAMSIZ. PF_TABLE_NAME_SIZE is longer than
IFNAMSIZ, so this is a valid configuration.
Only parse (or return) ifname or tblname for the corresponding
pf_addr_wrap type.
This manifested as a failure to set rules such as these, where the pfctl
optimiser generated an automatic table:
pass in proto tcp to 192.168.0.1 port ssh
pass in proto tcp to 192.168.0.2 port ssh
pass in proto tcp to 192.168.0.3 port ssh
pass in proto tcp to 192.168.0.4 port ssh
pass in proto tcp to 192.168.0.5 port ssh
pass in proto tcp to 192.168.0.6 port ssh
pass in proto tcp to 192.168.0.7 port ssh
Reported by: Florian Smeets
Tested by: Florian Smeets
Reviewed by: donner
X-MFC-With: 5c11c5a365
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29962
This is needed by the drm-kmod 5.5 update and is similar in logic to the
existing wait_event_killable macro.
Reviewed by: hselasky, manu
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D29987
Add 'syncok' field to ifconfig's pfsync interface output. This allows
userspace to figure out when pfsync has completed the initial bulk
import.
Reviewed by: donner
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29948
Now that we support having multiple labels on a rule ensure that we can
use each rule label to kill states.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29938
Allow up to 5 labels to be set on each rule.
This offers more flexibility in using labels. For example, it replaces
the customer 'schedule' keyword used by pfSense to terminate states
according to a schedule.
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29936
This is just clerical work to ease bug triage and may be used to set
expectations around the ability for anyone in the community to perform
testing and development on older parts (this driver covers over 20 years
of silicon)
Reviewed by: erj
Approved by: markj
Sponsored by: Pink Floyd - Any Colour You Like (in kind)
Differential Revision: https://reviews.freebsd.org/D29872
iflib now supports mapping each (TX,RX) queue pair to the same CPU
(default), to separate CPUs, or to a pair of physical and logical CPUs
that share the same L2 cache. The mapping mechanism supports unequal
numbers of TX and RX queues, with the excess queues always being
mapped to consecutive physical CPUs. When the platform cannot
distinguish between physical and logical CPUs, all are treated as
physical CPUs. See the comment on get_cpuid_for_queue() for the
entire matrix.
The following device-specific tunables influence the mapping process:
dev.<device>.<unit>.iflib.core_offset (existing)
dev.<device>.<unit>.iflib.separate_txrx (existing)
dev.<device>.<unit>.iflib.use_logical_cores (new)
The following new, read-only sysctls provide visibility of the mapping
results:
dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu
When an iflib driver allocates TX softirqs without providing reference
RX IRQs, iflib now binds those TX softirqs to CPUs using the above
mapping mechanism (that is, treats them as if they were TX IRQs).
Previously, such bindings were left up to the grouptaskqueue code and
thus fell outside of the iflib CPU mapping strategy.
Reviewed by: kbowling
Tested by: olivier, pkelsey
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D24094
After a vnode is recycled it can no longer be
acquired via vfs_hash_get() and, as such,
a delegation for the vnode cannot be recalled.
In the unlikely event that a delegation still
exists when the vnode is being recycled, return
the delegation since it will no longer be
recallable.
Until you have this patch in your NFSv4 client,
you should consider avoiding the use of delegations.
MFC after: 2 weeks
Without this patch, if a NFSv4 server recalled a
delegation when the file is not open, the renew
thread would block in the NFS VOP_INACTIVE()
trying to acquire the client state lock that it
already holds.
This patch fixes the problem by delaying the
vrele() call until after the client state
lock is released.
This bug has been in the NFSv4 client for
a long time, but since it only affects
delegation when recalled due to another
client opening the file, it got missed
during previous testing.
Until you have this patch in your client,
you should avoid the use of delegations.
MFC after: 2 weeks
Previously it depended on sysctl, which itself has no dependencies,
so rcorder(8) had a bit too much flexibility when choosing when to run
it. Make sure it runs just between 'fsck' and 'root'.
Reviewed By: jmg, imp
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29748
Option `FIB_ALGO` gates new modular fib lookup functionality,
enabling more performant routing table lookups and improving
control plane convergence under the load.
Detailed feature description is available in D27401.
Reviewed By: olivier, gnn
Differential Revision: https://reviews.freebsd.org/D28434
Traditionally we had 2 sources of information whether the
added/delete route request targets network or a host route:
netmask (RTA_NETMASK) and RTF_HOST flag.
The former one is tricky: netmask can be empty or can explicitly
specify the host netmask. Parsing netmask sockaddr requires per-family
parsing and that's what rtsock code traditionally avoided. As a result,
consistency was not enforced and it was possible to specify network with
the RTF_HOST flag and vice versa.
Continue normalization efforts from D29826 and D29826 and ensure that
RTF_HOST flag always reflects host/network data from netmask field.
Differential Revision: https://reviews.freebsd.org/D29958
MFC after: 2 days
Most of the routing tests create per-test VNET, making
it harder to repeat the failure with CLI tools.
Provide an additional route/nexthop data on failure.
Differential Revision: https://reviews.freebsd.org/D29957
Reviewed by: kp
MFC after: 2 weeks
PIPE_MINDIRECT determines at what (blocking) write size one-copy
optimizations are applied in pipe(2) I/O. That threshold hasn't
been tuned since the 1990s when this code was originally
committed, and allowing run-time reconfiguration will make it
easier to assess whether contemporary microarchitectures would
prefer a different threshold.
(On our local RPi4 baords, the 8k default would ideally be at least
32k, but it's not clear how generalizable that observation is.)
MFC after: 3 weeks
Reviewers: jrtc27, arichardson
Differential Revision: https://reviews.freebsd.org/D29819
Previously we've returned the error from native ptrace(2), ENOMEM.
This confused Linux strace(2).
Reviewed By: emaste
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29925
structure is zeroed, by setting the VNET after checking the mbuf count
for zero. It appears there are some cases with early interrupts on some
network devices which still trigger page-faults on accessing a NULL "ifp"
pointer before the TCP LRO control structure has been initialized.
This basically preserves the old behaviour, prior to
9ca874cf74 .
No functional change.
Reported by: rscheff@
Differential Revision: https://reviews.freebsd.org/D29564
MFC after: 2 weeks
Sponsored by: Mellanox Technologies // NVIDIA Networking
Allow new enclosure to replace previously existing one if there is
no completely unused table entry, same as it is done for devices.
If we can not process DPM due to corruption -- wipe it and restart
from scratch. Otherwise I don't see a way to recover persistence if
something go wrong and there is no BIOS to recover it for us.
Together this solves a problem that appeared when 9300-8i firmware
update to 16.00.10.00 somehow switched its mapping mode from Device
Persistence to Enclosure/Slot without wiping the DPM table. It made
HBA completely unusable, since overflowed and conflicting mapping
table was unable to map any of enclosures and so devices.
Also while there make some enclosure mapping errors more informative.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
3e7bae0821 turns the BUS_READ_IVAR() failure from a warning into a
KASSERT. For certain PCI audio devices such like snd_csa(4) and
snd_emu10kx(4), the ac97_create() keeps the device handler generated
by device_add_child(pci_dev, "pcm"), which is not really a PCI device
handler. This in turn causes the subsequent pci_get_subdevice()
inside ac97_initmixer() triggering a panic.
This patch tries to put a bandaid for the aforementioned pcm device
children such that they can use the correct PCI handler(from parent)
to avoid a KASSERT panic in the INVARIANTS kernel.
Tested with: snd_csa(4), snd_ich(4), snd_emu10kx(4)
Reviewed by: imp
MFC after: 1 month