hese kstats are often expensive to compute so we want to avoid them
unless specifically requested.
The following kstats are affected by this change:
kstat.zfs.${pool}.multihost
kstat.zfs.${pool}.misc.state
kstat.zfs.${pool}.txgs
kstat.zfs.misc.fletcher_4_bench
kstat.zfs.misc.vdev_raidz_bench
kstat.zfs.misc.dbufs
kstat.zfs.misc.dbgmsg
PR: 249258
Reported by: mjg
Reviewed by: mjg, allanjude
Obtained from: https://github.com/openzfs/zfs/pull/11099
Sponsored by: iXsystems, Inc.
Ensure we also skip descendants of SKIP nodes when iterating through children
of an explicitly specified node.
Reported by: np
Reviewed by: np
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26833
This will pave the way of setting ssthresh differently in TCP CUBIC, according
to RFC8312 section 4.7.
No functional change, only code movement.
Submitted by: chengc_netapp.com
Reviewed by: rrs, tuexen, rscheff
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26807
This would be more accurately expressed as COMPAT_LINUXKPI implying or
requiring backlight, but config(8) doesn't really have a way to express
that. This fixes the build with COMPAT_LINUXKPI specified in one's kernel
config.
Remove unused oidpp parameter from sysctl_sysctl_next_ls and
add high level comments to describe how it works.
No functional change.
Reviewed by: imp
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D26854
r326145 corrected do_execve() to return EJUSTRETURN upon success so that
important registers are not clobbered. This had the side effect of tapping
out 'failures' for all *execve(2) audit records, which is less than useful
for auditing purposes.
Audit exec returns earlier, where we can know for sure that EJUSTRETURN
translates to success. Note that this unsets TDP_AUDITREC as we commit the
audit record, so the usual audit in the syscall return path will do nothing.
PR: 249179
Reported by: Eirik Oeverby <ltning-freebsd anduin net>
Reviewed by: csjp, kib
MFC after: 1 week
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26922
Remove code that supported pre-2011 kernels. CTLTYPE_S64 was defined
in rev 217616. All supported branches have it, so remove its compat
definition as OBE.
platforms.
This allows to not depend on the IOMMU macro in AHCI driver.
Requested by: kib
Suggested by: andrew
Reviewed by: kib
Sponsored by: Innovate DSbD
Differential Revision: https://reviews.freebsd.org/D26887
PCIe allows for MSI-X BAR to be either dedicated, or MSI-X Table may
be co-located in some functional BAR. In the later case xhci(4) is
unable to allocate active resource for the table because BAR is
already activated.
Handle it by checking for this special case, and not try to alloc
resource if MSI-X BAR is IO.
Reported and tested by: emaste
Reviewed by: emaste, hselasky
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D26913
The previous scheme for calculating the total size was doing sizeof
on the struct and then adding the wanted space for the buffer.
nc_name is at offset 58 while sizeof(struct namecache) is 64.
With CACHE_PATH_CUTOFF of 39 bytes and 1 byte of padding we were
allocating 104 bytes for the entry and never accounting for the 6
byte padding, wasting that space.
It no longer protects any of tested fields, keeping all the checks racy.
While here make vtryrecycle drop the vnode on its own. Avoids an additional
lock trip.
The NTB hardware starting with Skylake has some changes to the register
map and the doorbell interface. Add a new NTB_XEON_GEN3 device type and
use it to conditionalize driver logic that differs from the existing
Xeon code.
Reviewed by: vangyzen
Discussed with: cem, Bret Ketchum <Bret.Ketchum@dell.com>
MFC after: 1 month
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26683
seems to use it - it works fine without it, but still.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26898
module by name and not only by the version information, so that
"kldstat -q -m cuse" works.
Found by: Goran Mekic <meka@tilda.center>
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
- Get the number of classes from chip_params.
- Get the number of ethofld tids from the firmware.
- Do not let tcp_ratelimit allocate all traffic classes.
Sponsored by: Chelsio Communications
In certain edge cases, the NIC might have only received a partial TLS
record which it needs to return to the driver. For example, if the
local socket was closed while data was still in flight, a partial TLS
record might be pending when the connection is closed. Receiving a
RST in the middle of a TLS record is another example. When this
happens, the firmware returns the the partial TLS record as plain TCP
data via CPL_RX_DATA. Handle these requests by returning an error to
OpenSSL (via so_error for KTLS or via an error TLS record header for
the older Chelsio OpenSSL interface).
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: Revision: https://reviews.freebsd.org/D26800
It does not change anything immediately, but allows further support of
Command Priority, Status Qualifier and new task management functions.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST.
The NOCACHE lookup will make the namecache unnecessarily evict the existing entry,
and then fallback to the fs lookup routine eventually leading namei to return an
error as the directory is already there.
For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this triggers
fallbacks to the slowpath for concurrently executing lookups.
Tested by: pho
Discussed with: kib
pagezero(). Ultimately, they use the same method for bulk zeroing, but
the generality of bzero() requires size and alignment checks that
pagezero() does not.
Eliminate an unnecessary #include.
Reviewed by: emaste, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D26876
the failover protocol is supported due to limitations in the IPoIB
architecture. Refer to the lagg(4) manual page for how to configure
and use this new feature. A new network interface type,
IFT_INFINIBANDLAG, has been added, similar to the existing
IFT_IEEE8023ADLAG .
ifconfig(8) has been updated to accept a new laggtype argument when
creating lagg(4) network interfaces. This new argument is used to
distinguish between ethernet and infiniband type of lagg(4) network
interface. The laggtype argument is optional and defaults to
ethernet. The lagg(4) command line syntax is backwards compatible.
Differential Revision: https://reviews.freebsd.org/D26254
Reviewed by: melifaro@
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Size of the per-process semaphore undo structure (semusz) depends on
the number of the per-process undos. If kern.ipc.semume is adjusted,
semusz must be adjusted as well, and it makes no sense to delegate
adjustment to user. Make it automatic.
Reported and tested by: Olef <o.vandestadt@gmail.com>
PR: 250361
Reviewed by: jhb, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26826
The firmware can allocate ingress and egress context ids anywhere from
its configured range. Size the iq/eq maps to match the entire range
instead of assuming that the firmware always allocates the first
available context id.
Reported by: Baptiste Wicht @ Verisign
MFC after: 1 week
Sponsored by: Chelsio Communications
Use ELR register value instead of LR for PMC_TRAPFRAME_TO_PC macro since
it's the former that indicates PC if the interrupted execution thread.
This fixes a bug where pmcstat lost the leaf function of the call chain
and started with the second function in the chain.
Although this change is an improvement over the previous logic there is still
posibility for incomplete data: if the leaf function does not have stack
variables and does not call any other functions compiler would not generate
a stack frame for it and the FP value would point to the caller's frame, so
instead of the actual "caller1 -> caller2 -> leaf" chain only
"caller1 -> leaf" would be captured.
Sponsored by: Ampere Computing
Submitted by: Klara, Inc.
Add missing break to prevent falling through to the default case statement
and returning EINVAL for all session configs.
Sponsored by: Ampere Computing
Submitted by: Klara, Inc.
802.1ad interfaces are created with ifconfig using the "vlanproto" parameter.
Eg., the following creates a 802.1Q VLAN (id #42) over a 802.1ad S-VLAN
(id #5) over a physical Ethernet interface (em0).
ifconfig vlan5 create vlandev em0 vlan 5 vlanproto 802.1ad up
ifconfig vlan42 create vlandev vlan5 vlan 42 inet 10.5.42.1/24
VLAN_MTU, VLAN_HWCSUM and VLAN_TSO capabilities should be properly
supported. VLAN_HWTAGGING is only partially supported, as there is
currently no IFCAP_VLAN_* denoting the possibility to set the VLAN
EtherType to anything else than 0x8100 (802.1ad uses 0x88A8).
Submitted by: Olivier Piras
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D26436
ubuf buffer is too small. It should be 18 if a NULL is not needed,
or 19 to hold the NULL terminator for the full 64-BIT value plus
the 0x prefix.
Submitted by: bret_ketchum@dell.com
Reviewed by: markj mav
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D26893
Instead, add arguments to vmapbuf. Since this argument is
always a pointer use a type of void * and cast to vm_offset_t in
vmapbuf. (In CheriBSD we've altered vm_fault_quick_hold_pages to
take a pointer and check its bounds.)
In no other situtation does b_data contain a user pointer and vmapbuf
replaces b_data with the actual mapping.
Suggested by: jhb
Reviewed by: imp, jhb
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26784
It helps to reduce complexity with debugging of large ipfw rulesets.
Also define several constants and translators, that can by used by
dtrace scripts with this probe.
Reviewed by: gnn
Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D26879
Improve the code reconstructing en_tw in struct fpreg32 from FXSAVE
results so that all register states are indicated correctly. The
previous code unconditionally mapped non-empty register state to
'normalized value' constant. The new code explicitly distinguishes
the 'zero value' and 'special value' constants as well. This improves
consistency between real FSAVE and translation from FXSAVE, and
ensures that tests using PT_GETFPREGS can rely on a single correct
value independently of the underlying implementation.
PR: 250454
Sponsored by: The FreeBSD Foundation
Obtained from: Moritz Systems
Submitted by: Michał Górny <mgorny@moritz.systems>
Discussed with: emaste
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26856
- Remove "opt_geom.h", no kernel options are used.
- Remove <sys/sysctl.h>, no sysctl functionality is used here.
- Remove <sys/bio.h>, requirements for bio moved out in r112534.
- Remove <sys/lock.h> and <sys/mutex.h>, last used by DROP_GIANT() and
PICKUP_GIANT(), which were removed in r115624.
- Remove <sys/disk.h> and <sys/kernel.h>, not used.
Reviewed by: phk, kevans (mentor)
Approved by: phk, kevans (mentor)
Differential Revision: https://reviews.freebsd.org/D26805
Currently, this supports SHA1 and SHA2-{224,256,384,512} both as plain
hashes and in HMAC mode on both amd64 and i386. It uses the SHA
intrinsics when present similar to aesni(4), but uses SSE/AVX
instructions when they are not.
Note that some files from OpenSSL that normally wrap the assembly
routines have been adapted to export methods usable by 'struct
auth_xform' as is used by existing software crypto routines.
Reviewed by: gallatin, jkim, delphij, gnn
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26821
Steam's Anti-Cheat might depend on it.
PR: 248223
Analyzed by: Alex S <iwtcex@gmail.com>
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26816
This uses the .incbin directive to pull in the MFS image contents.
Using assembly directly ensures that symbols can be defined with the
name and properties (such as .size) desired without having to rename
symbols, etc. via a second objcopy invocation. Since it is compiled
by the C compiler driver, it also avoids the need for all of the
EMBEDFS* make variables.
Suggested by: jrtc27
Reviewed by: kib, markj
Obtained from: CheriBSD
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26781
Some controllers use PCI function 1 as the requester ID for DMA transfers,
but the controllers are not PCI multifunction.
Set the iommu buswide flag for them. This should instruct an IOMMU driver
to use the same translation rule for all the devices and functions of
a bus.
This was discovered on the ARM Neoverse N1 System Development Platform
(ARM N1SDP).
Bug reference: https://bugzilla.kernel.org/show_bug.cgi?id=42679
Reported by: andrew
Reviewed by: kib, mav
Sponsored by: Innovate DSbD
Differential Revision: https://reviews.freebsd.org/D26857
pvscsi and vmxnet3 build and work. Exclude vmci for now as it contains
x86-specific assembly.
Reported by: Vincent Milum Jr
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Flow control was disabled during initial TOE TLS development to
workaround a hang (and to match the Linux TOE TLS support for T6).
The rest of the TOE TLS code maintained credits as if flow control was
enabled which was inherited from before the workaround was added with
the exception that the receive window was allowed to go negative.
This negative receive window handling (rcv_over) was because I hadn't
realized the full implications of disabling flow control.
To clean this up, re-enable flow control on TOE TLS sockets. The
existing TPF_FORCE_CREDITS workaround is sufficient for the original
hang. Now that flow control is enabled, remove the rcv_over
workaround and instead assert that the receive window never goes
negative matching plain TCP TOE sockets.
Reviewed by: np
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26799
- Check for null pointers in the crypto_drivers[] array when checking
for empty slots in crypto_select_kdriver().
- Handle the case where crypto_kdone() is invoked on a request where
krq_cap is NULL due to not finding a matching driver.
Reviewed by: markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26811
If lower VOP relocked the lower vnode, it is possible that nullfs
vnode was reclaimed meantime. In this case nullfs vnode no longer
shares lock with lower vnode, which breaks locking protocol.
Check for the condition and acquire nullfs vnode lock if detected.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
It is a common pattern for filesystems' VOP_INACTIVE() implementation
to forcibly reclaim the vnode when its state is final. For instance,
UFS vnode with zero link count is removed, and since it is
inactivated, the last open reference on it is dropped.
On the other hand, vnode might get spurious usecount reference for
many reasons. If the spurious reference exists while vgonel() checks
for active state of the vnode, it would recurse into VOP_INACTIVE().
Fix it by checking and not doing inactivation when vgone() was called
from inactive VOP.
Reported and tested by: pho
Discussed with: mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
During tinderbox and similar workloads negative entries get at least one
hit before they get evicted. In the current scheme this avoidably promotes
them.
Be conservative and stick to 2 hits for now.
The TF_TOE flag is the check used in the rest of the network stack to
determine if TOE is active on a socket. There is at least one path in
the cxgbe(4) TOE driver that can leave the tod pointer non-NULL on a
socket not using TOE.
Reported by: Sony Arpita Das <sonyarpitad@chelsio.com>
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26803
Only one MIPS-specific driver implements support for one of the
asymmetric operations. There are no in-kernel users besides
/dev/crypto. The only known user of the /dev/crypto interface was the
engine in OpenSSL releases before 1.1.0. 1.1.0 includes a rewritten
engine that does not use the asymmetric operations due to lack of
documentation.
Reviewed by: cem, markj
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26810
Pad the icmp6stat structure so that we can add more counters in the
future without breaking compatibility again, last done in r358620.
Annotate the rarely executed error paths with __predict_false while
here.
Reviewed by: bz, melifaro
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26578
This will cause the VM to back sufficiently large .text sections, such
as those in zfs.ko or amdgpu.ko on amd64, with superpage mappings when
possible.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26802
When a reserve of free items is configured for a zone, the reserve must
not be reclaimed under memory pressure. Modify keg_drain() to simply
respect the reserved pool.
While here remove an always-false uk_freef == NULL check (kegs that
shouldn't be drained should set _NOFREE instead), and make sure that the
keg_drain() KTR statement does not reference an uninitialized variable.
Reviewed by: alc, rlibby
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26772
zone_import() fetches a free or partially free slab from the keg and
then uses its items to populate an array, typically filling a bucket.
If a single allocation causes the keg to drop below its minimum reserve,
the inner loop ends. However, if the bucket is still not full and
M_USE_RESERVE is specified, the outer loop will continue to fetch items
from the keg.
If M_USE_RESERVE is specified and the number of free items is below the
reserved limit, we should return only a single item. Otherwise, if the
bucket size is larger than the reserve, all of the reserved items may
end up in a single per-CPU bucket, invisible to other CPUs.
Reviewed by: rlibby
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26771
BT_MAXALLOC (4) is the number of boundary tags required to complete an
allocation in the worst case: two to clip a free segment, and two to
import from a parent arena. vmem_xalloc() preallocates four boundary
tags before attempting a search to simplify the segment allocation code.
It implements a loop that:
1) ensures that BT_MAXALLOC boundary tags are available,
2) attempts to find and clip a free segment satisfying the allocation
constraints, and failing that,
3) attempts to import a segment.
On !UMA_MD_SMALL_ALLOC platforms the btag zone has to handle recusion:
it needs boundary tags to allocate boundary tags. Thus we reserve
2 * BT_MAXALLOC * mp_ncpus tags for use when recursing: the factor of 2
is because there are two layers of vmem arenas, the per-domain arena and
global arena. For a single thread, 2 * BT_MAXALLOC tags should be
sufficient.
Because of the way the loop is structured, BT_MAXALLOC tags are not
sufficient. The first bt_fill() call may allocate BT_MAXALLOC tags,
then import a segment (consuming two tags), then attempt to top up the
preallocation before carving into the imported free segment, thus
requiring up to six tags in the worst case. Because we don't
preallocate that many, this bug can cause deadlocks in rare scenarios.
Fix the problem by moving the preallocation out the loop. This assumes
that only a single import is ever required to satisfy an allocation
request.
Thanks to manu, emaste and lwhsu for helping test debug patches.
Reported by: Jenkins (hardware CI lab)
Reviewed by: alc, kib, rlibby
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26770
Rewrite the code that maintains pm_active and invalidates EPTP-tagged
TLB entries in C. Previously this work was done in vmx_enter_guest(),
in assembly, but there is no good reason for that and it makes the TLB
invalidation algorithm for nested page tables harder to review.
No functional change intended. Now, an error from the invept
instruction results in a kernel panic rather than a vmexit. Such errors
should occur only as a result of VMM bugs.
Reviewed by: grehan, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26830
This allows the interrupt controller driver only need a small change to
create a map for the page the device will write to raise an interrupt.
Submitted by: andrew
Reviewed by: kib
Sponsored by: Innovate DSbD
Differential Revision: https://reviews.freebsd.org/D26705
In the functions that copy between userspace and kernel space we check the
user space address is valid before performing the copy. These are mostly
identical within each type of function so create two macros to perform the
check.
Obtained from: CheriBSD
Sponsored by: Innovate UK
This directory doesn't exist and causes gcc-6.4 to complain about
a non-existent include directory
Approved by: kevans, imp
Differential Revision: https://reviews.freebsd.org/D26846
Factor out the priv(9) checks into OS specifc code so other OSes can equally
implement them. This sorts out those XXX in the net80211 code.
We provide 3 arguments (cmd, vap, ifp) where available to the functions, in
order to allow other OSes to use that data but also in case we'd add auditing
to these check to have the information available. For now the arguments are
marked __unused.
PR: 249403
Reported by: martin(NetBSD)
Reviewed by: adrian, martin(NetBSD)
MFC after: 10 days
Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")
Differential Revision: https://reviews.freebsd.org/D26541
connections over multiple paths.
Multipath routing relies on mbuf flowid data for both transit
and outbound traffic. Current code fills mbuf flowid from inp_flowid
for connection-oriented sockets. However, inp_flowid is currently
not calculated for outbound connections.
This change creates simple hashing functions and starts calculating hashes
for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the
system.
Reviewed by: glebius (previous version),ae
Differential Revision: https://reviews.freebsd.org/D26523
another thread has started to send in a CCB and already checked
the queue wasn't frozen, we would end up with iscsi_action()
being called despite the queue is now frozen.
Add a check to make sure this doesn't happen . Perhaps this should
be fixed at the CAM level instead, but given how the send queue and
SIM are governed by two separate mutexes, it is somewhat hard to do.
Reviewed by: imp, mav
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26750
The former clobbers some registers that shouldn't be touched.
Reviewed by: kib (earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26406
Turns out the dummy rlimits fix prlimit(1), but break su(8)
(login-1:4.5-1ubuntu2) - although not sudo(8), for some reason.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26814
Implement two macros IEEE80211_VHTCAP_SUPP_CHAN_WIDTH_IS_160MHZ()
and its 80+80 counter part to check in vhtcaps for appropriate
levels of support and use the macros throughout the code.
Add vht160_chan_ranges/is_vht160_valid_freq and handle analogue
to vht80 in various parts of the code.
Add ieee80211_add_channel_cbw() which also takes the CBW flag
fields and make the former ieee80211_add_channel() a wrapper to it.
With the CBW flags we can add HT/VHT channels passing them to
getflags() for the 2/5ghz functions.
In ifconfig(8) add the regdomain_addchans() support for VHT160
and VHT80P80.
With this (+ regdoain.xml updates) VHT160 channels can be
configured, listed, and pass regdomain where appropriate.
Tested with: iwlwifi
Reviewed by: adrian
MFC after: 10 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26712
Add a show sysinit command to ddb (similar to show vnet_sysinit) which
proved to be helpful to debug some ordering issues on early-mid kernel
start panics.
The previous scheme only looked at negative entry count in relation to the
total count, leading to tons of spurious evictions if the cache is not
significantly populated.
Instead, only try the above if negative entry count goes beyond namecache
capacity.
As was done for L3 PTEs in r362853, mask out the reserved bits when
extracting the physical address from an L2 PTE. Future versions of the
spec or custom implementations may make use of these reserved bits, in
which case the resulting physical address could be incorrect.
Submitted by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: kp, mhorne
Differential Revision: https://reviews.freebsd.org/D26607
Split everything into neg, debug, param and stat categories.
The legacy nchstats sysctl (queried e.g., by systat) remains untouched.
While here rename some vars to be easier on the eye.
- declutter sysctl vfs.cache by moving relevant entries into
vfs.cache.neg
- add a little more parallelism to eviction by replacing the
global lock with an atomically modified counter
- track more statistics
The code needs further effort.
- fix panic due to tqid overflow
- Improve libzfs_error_init messages
- Expose zfetch_max_idistance tunable
- Make dbufstat work on FreeBSD
- Fix EIO after resuming receive of new dataset over an existing one
on some more advanced C features.
This fixes gcc-toolchain build of exception.S.
Reported and tested by: kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Add support for ARC-1886, NVMe/SAS/SATA controller.
Many thanks to Areca for continuing to support FreeBSD.
Submitted by: 黃清隆 <ching2048 areca com tw>
MFC after: 2 weeks
These were missed in the previous pass. The extensions (partially)
supported by this change are:
- ARMv8.2-FHM, Floating-point multiplication variant
- ARMv8.4-LSE, Large System Extensions
- ARMv8.4-DIT, Data Independent Timing instructions
Reviewed by: andrew, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26707
This brings these definitions in sync with the ARMv8.6 version of the
architecture reference manual.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26706
This patch adds 80% of UINT32_MAX limit on sequence number.
When sequence number reaches limit kernel sends SADB_EXPIRE message to
IKE daemon which is responsible to perform rekeying.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: ae
Differential revision: https://reviews.freebsd.org/D22370
Obtained from: Semihalf
Sponsored by: Stormshield
Implement support for including IPsec ESN (Extended Sequence Number) to
both encrypt and authenticate mode (eg. AES-CBC and SHA256) and combined
mode (eg. AES-GCM). Both ESP and AH protocols are updated. Additionally
pass relevant information about ESN to crypto layer.
For the ETA mode the ESN is stored in separate crp_esn buffer because
the high-order 32 bits of the sequence number are appended after the
Next Header (RFC 4303).
For the AEAD modes the high-order 32 bits of the sequence number
[e.g. RFC 4106, Chapter 5 AAD Construction] are included as part of
crp_aad (SPI + ESN (32 high order bits) + Seq nr (32 low order bits)).
Submitted by: Grzegorz Jaszczyk <jaz@semihalf.com>
Patryk Duda <pdk@semihalf.com>
Reviewed by: jhb, gnn
Differential revision: https://reviews.freebsd.org/D22369
Obtained from: Semihalf
Sponsored by: Stormshield
As RFC 4304 describes there is anti-replay algorithm responsibility
to provide appropriate value of Extended Sequence Number.
This patch introduces anti-replay algorithm with ESN support based on
RFC 4304, however to avoid performance regressions window implementation
was based on RFC 6479, which was already implemented in FreeBSD.
To keep things clean and improve code readability, implementation of window
is kept in seperate functions.
Submitted by: Grzegorz Jaszczyk <jaz@semihalf.com>
Patryk Duda <pdk@semihalf.com>
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D22367
Obtained from: Semihalf
Sponsored by: Stormshield
defaults, makes core files smaller, and fixes applications which use
pthread_join(3) in a wrong way, namely Steam.
This is based on a patch submitted by Jason Yang, which I've reworked
to set the limit instead of only changing the value reported (which
is enough to fix the bug for Linux pthreads, but could be confusing).
PR: 248225
Submitted by: Jason_YH_Yang at wistron.com (earlier version)
Analyzed by: Alex S <iwtcex@gmail.com>
Reviewed by: emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26778
This flag is going to be used by IKE daemon to signal if
Extended Sequence Number feature is going to be used.
Value for this flag was taken from OpenBSD source code
6b4cbaf181
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: ae
Differential revision: https://reviews.freebsd.org/D22366
Obtained from: Semihalf
Sponsored by: Stormshield
This patch adds support for IPsec ESN (Extended Sequence Numbers) in
encrypt and authenticate mode (eg. AES-CBC and SHA256) and combined mode
(eg. AES-GCM).
For the encrypt and authenticate mode the ESN is stored in separate
crp_esn buffer because the high-order 32 bits of the sequence number are
appended after the Next Header (RFC 4303).
For the combined modes the high-order 32 bits of the sequence number
[e.g. RFC 4106, Chapter 5 AAD Construction] are part of crp_aad
(prepared by netipsec layer in case of ESN support enabled), therefore
non visible diff around combined modes.
Submitted by: Grzegorz Jaszczyk <jaz@semihalf.com>
Patryk Duda <pdk@semihalf.com>
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D22365
Obtained from: Semihalf
Sponsored by: Stormshield
This patch adds support for IPsec ESN (Extended Sequence Numbers) in
encrypt and authenticate mode (eg. AES-CBC and SHA256) and combined mode
(eg. AES-GCM).
For encrypt and authenticate mode the ESN is stored in separate crp_esn
buffer because the high-order 32 bits of the sequence number are
appended after the Next Header (RFC 4303).
For combined modes the high-order 32 bits of the sequence number [e.g.
RFC 4106, Chapter 5 AAD Construction] are part of crp_aad (prepared by
netipsec layer in case of ESN support enabled), therefore non visible
diff around combined modes.
Submitted by: Grzegorz Jaszczyk <jaz@semihalf.com>
Patryk Duda <pdk@semihalf.com>
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D22364
Obtained from: Semihalf
Sponsored by: Stormshield
This permits requests (netipsec ESP and AH protocol) to provide the
IPsec ESN (Extended Sequence Numbers) in a separate buffer.
As with separate output buffer and separate AAD buffer not all drivers
support this feature. Consumer must request use of this feature via new
session flag.
Submitted by: Grzegorz Jaszczyk <jaz@semihalf.com>
Patryk Duda <pdk@semihalf.com>
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D24838
Obtained from: Semihalf
Sponsored by: Stormshield
The staleness reported in an error cause is in us, not ms.
Enforce limits on the life time via sysct; and socket options
consistently. Update the description of the sysctl variable to
use the right unit. Also do some minor cleanups.
This also fixes an interger overflow issue if the peer can
modify the cookie. This was reported by Felix Weinrank by fuzz testing
the userland stack and in
https://oss-fuzz.com/testcase-detail/4800394024452096
MFC after: 3 days
Hiding this feature behind RB_VERBOSE is gratuitous. The tunable is enough
to limit its use to only those who explicitly request it.
Suggested by: kevans
This simplifies the code while allowing for concurrent negative eviction
down the road.
Cache misses increased slightly due to higher rate of evictions allowed by
the change.
The current algorithm remains too aggressive.
It is reported to fix kernel panics when early unsolicited responses
delivered to the CODEC device not having driver attached yet.
PR: 250248
Reported by: Rajeev Pillai <rajeev_v_pillai@yahoo.com>
Reviewed by: avg
MFC after: 2 weeks
Only assign the address from the iovec to bio_data if it is a kernel
address. This was the single place where bio_data stored (however
briefly) a userspace pointer.
Reviewed by: imp, markj
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26783
VMware now has arm64 support; move these to MI files in advance of
building them on arm64.
PR: 250308
Reported by: Vincent Milum Jr
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
motherboard temperatures. In particular, the U4 northbridge die is very
hard to cool or heat effectively with fans and is not responsive to load.
It generally sits around 64C, where it seems happy, so (like Linux) just
declare that to be its target temperature.
This makes the PowerMac G5 much less loud, with no change in the
temperatures of any system components.
MFC after: 2 weeks
Offensive) the Linux Steam client likes to occasionally scan the game
process memory, presumably as part anti-cheat measures. Turns out
the client also expects each inode entry to be followed by a space
character, otherwise the parsing code crashes.
PR: 248216
Submitted by: Alex S <iwtcex@gmail.com>
MFC after: 2 weeks
The try lock loop in HN_LOCK put the thread spinning on cpu if the lock
is not available. It is possible to cause deadlock if the thread holding
the lock is sleeping. Relinquish the cpu to work around this problem even
it doesn't completely solve the issue. The priority inversion could cause
the livelock no matter how less likely it could happen. A more complete
solution may be needed in the future.
Reported by: Microsoft, Netapp
MFC after: 2 weeks
Sponsored by: Microsoft
It is possible that the vmbus pcib channel is revoked during attach path.
The attach path could be waiting for response from host and this response will never
arrive since the channel has already been revoked from host point of view. Check
this situation during wait complete and return failed if this happens.
Reported by: Netapp
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D26486
Some vnodes come with a hack which inherits the fplookup flag despite having vops
which don't provide the routine.
Reported by: YAMAMOTO Shigeru <shigeru@os-hackers.jp>
Ampere Altra in a dual socket configuration has 12 ITSes for the
12 PCIe root complexes. The NIRQ interrupts are statically split
between each child of the gic bus, so here we increase that
value. 16k is enough for
(#cpus * #its * max_pcie_bifurcation) LPIs + (#SPIs and #PPIs)
Reviewed by: jhb
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing
Differential Revision: https://reviews.freebsd.org/D26766
RIght now PCB_KERNFPU is used both as indication that kernel prepared
hardware FPU context to use and that the thread is fpu-kern
thread. This also breaks fpu_kern_enter(FPU_KERN_NOCTX), since
fpu_kern_leave() then clears PCB_KERNFPU.
Introduce new flag PCB_KERNFPU_THR which indicates that the thread is
fpu-kern. Do not clear PCB_KERNFPU if fpu-kern thread leaves noctx
fpu region.
Reported and tested by: jhb (amd64)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25511
From Linux sources and several datasheets I looked at, it seems that
the workaround is only needed on families 0xf and 0x10. For instance,
Ryzens do not implement the accessed MSR at all, it is documented as
reserved. Also, hypervisors should not allow guest to put CPU into
idle state, so activate workaround only when on bare hardware.
While there, style the code:
move MSR defines to specialreg.h
move identification to initcpu.c
Reported by: whu
Reviewed by: avg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26470
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h. This fixes default gcc build for mips.
Reviewed by: alc, scottph
Tested by: kevans (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26741
Weirdly, I needed to sprinkle more parens here to get gcc-as in 6.4
to correctly generate things.
Without them, I'd get an unknown variable reference to SKEIN_ASM_UNROLL1024.
This at least links now, but I haven't run any test cases against it.
It may be worthwhile doing it in case gcc-as demands we liberally sprinkle
more brackets around variables in .if statements.
Thanks to ed for the suggestion of just sprinkling more brackets to
see if that helped.
Reviewed by: emaste
This field was not in specs when the driver was written, but now there
are SSDs with the reported latency of 10s, where hardcoded value of 5s
seems to be not enough sometimes, causing shutdown timeout messages.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
These already use the load variant that simulates userspace access.
Remove the macros that enable normal loads and stores from userspace
as they are unneeded.
Sponsored by: Innovate UK
For some reason I don't want to really understand, the following
happens with gnu as.
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S: Assembler messages:
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S:466: Error: found '(', expected: ')'
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S:466: Error: junk at end of line, first unrecognized character is `('
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S:795: Error: found '(', expected: ')'
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S:795: Error: junk at end of line, first unrecognized character is `('
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S:885: Error: non-constant expression in ".if" statement
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S:885: Error: non-constant expression in ".if" statement
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S:885: Error: non-constant expression in ".if" statement
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S:885: Error: non-constant expression in ".if" statement
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S:885: Error: non-constant expression in ".if" statement
/home/adrian/git/freebsd/src/sys/crypto/skein/amd64/skein_block_asm.S:885: Error: non-constant expression in ".if" statement
After an exhaustive search and experimentation at 11pm, I discovered that
putting them in parentheses fixes the compilation.
Ed pointed out that I could likely fix this in a bunch of other
locations but I'd rather leave these alone until other options
are enabled.
Tested:
* gcc-6, amd64
Reviewed by: emaste
Compiling it with LLVM 10 triggers https://bugs.llvm.org/show_bug.cgi?id=44351
While LLVM 11 is the default compiler, I regularly build with
CROSS_TOOLCHAIN=llvm10 or use system packages for clang on Linux/macOS and
those have not been updated to 11 yet.
It is lightweight way to check if an IPv4 address exists.
Submitted by: Roy Marples
Reviewed by: gnn, melifaro
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26636
Call M_SETFIB() to make sure the IPoIB packet is directed to the correct
interface-specific FIB.
This was sufficient to allow general-purpose routing using the default FIB,
and a separate FIB for routing between IPoIB on ib0 and IPoEthernet on mce0.
Reviewed by: hselasky
Obtained from: Anmol Kumar <anmolk at panasas dot com>
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D25239
The 32-bit counter eventually wraps to 0 which is a sentinel for invalid
id.
Make it 64-bit on LP64 platforms and 0-check otherwise.
Note: Linux counterpart uses id stored per queue instead of a global.
I did not check going that way is feasible with the goal being the
minimal fix doing the job.
Reported by: YAMAMOTO Shigeru <shigeru@os-hackers.jp>
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D26759
When SIOCAIFADDR ioctl configures an IPv4 address that is already exist,
it removes old ifaddr. When this IPv4 address is only one configured on
the interface, this also leads to leaving from AllHosts multicast group.
Then an address is added again, but due to the bug, this doesn't lead
to joining to AllHosts multicast group.
Submitted by: yannis.planus_alstomgroup.com
Reviewed by: gnn
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D26757
NAT64LSN requires the presence of upper level protocol header
in a IPv4 datagram to find corresponding state to make translation.
Now it will be handled automatically by nat64lsn instance.
Reviewed by: melifaro
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D26758
On the target side TLB shootdown IPI handler, prevent the compiler
from performing a forward store optimization which may mask a
subsequent update to the scoreboard by the initiator.
Reported by: Max Laier, Anton Rang
Discussed with: kib
Sponsored by: Dell EMC Isilon
This is a simplistic approach which encrypts each TLS record in two
separate passes: one to generate the MAC and a second to encrypt.
This supports TLS 1.0 connections with implicit IVs as well as TLS
1.1+ with explicit IVs.
Reviewed by: gallatin
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26730
Due to a weakness in the TLS 1.0 protocol, OpenSSL will periodically
send empty TLS records ("empty fragments"). These TLS records have no
payload (and thus a page count of zero). m_uiotombuf_nomap() was
returning NULL instead of an empty mbuf, and a few places needed to be
updated to treat an empty TLS record as having a page count of "1" as
0 means "no work to do" (e.g. nothing to encrypt, or nothing to mark
ready via sbready()).
Reviewed by: gallatin
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26729
arm64 has a similar wrapper. This permits defining <machine/fpu.h> as
the standard header for fpu_kern_*.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26753
This also fixes a bug in the existing list_add_rcu() where the
prev->prev pointer was updated to the new element instead of
next->prev. Currently this function is not widely used.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Even if a kif doesn't have an ifp or if_group pointer we still can't delete it
if it's referenced by a rule. In other words: we must check rulerefs as well.
While we're here also teach pfi_kif_unref() not to remove kifs with flags.
Reported-by: syzbot+b31d1d7e12c5d4d42f28@syzkaller.appspotmail.com
MFC after: 2 weeks
When trapping on a wrote access to a buffer the kernel has mapped as write
only we should only pass the VM_PROT_WRITE flag. Previously the call to
vm_fault_trap as the VM_PROT_READ flag was unexpected.
Reported by: manu
Sponsored by: Innovate UK
Add support to the _STANDALONE environment enough bits of the kernel
that we can compile it. We still have a small zstd_shim.c since there
were 3 items that were a bit hard to nail down and may be cleaned up
in the future. These go hand in hand with a number of commits to
sys/sys in the past weeks, should this need be MFCd.
Discussed with: mmacy (in review and on IRC/Slack)
Reviewed by: freqlabs (on openzfs repo)
Differential Revision: https://reviews.freebsd.org/D26218
Both s_len and s_size are ssize_t, so their differece is also more
properly a ssize_t not a size_t. Also, assert that len is <= size when
we enter. This should always be the case. Ensure that we have that one
byte that we write to the end of the buffer before we do so, though
the error should already be set on the buffer if not, and the only
times we supply 'partial' buffers they should be plenty large.
Reviewed by: cem, jhb (prior version, I did cem's suggestion)
Differential Revsion: https://reviews.freebsd.org/D26752
hints work on per-channel basis as documented, rather than chip-wide. Also,
when configured via hints, return BUS_PROBE_NOWILDCARD on successful hints
match, so that the hints don't bogusly match other types of i2c chips.
If userspace tries to set flags (e.g. 'set skip on <ifspec>') and <ifspec>
doesn't exist we should create a kif so that we apply the flags when the
<ifspec> does turn up.
Otherwise we'd end up in surprising situations where the rules say the
interface should be skipped, but it's not until the rules get re-applied.
Reviewed by: Lutz Donnerhacke <lutz_donnerhacke.de>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26742
There's a number of types we forward declare for the kernel. We need
struct ucred for the ZSTD ZFS integration, so go ahead and forward
declare it here too.
This patch has the driver for 10Gigabit Ethernet controller in AMD
SoC. This driver is written compatible to the Iflib framework. The
existing driver is for the old version of hardware. The submitted
driver here is for the recent versions of the hardware where the Ethernet
controller is PCI-E based.
Submitted by: Rajesh Kumar <rajesh1.kumar@amd.com>
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D25793
It seems that in r354857 I got more than one thing wrong.
Convert the SYSCTL_OPAQUE to a SYSCTL_PROC to properly export the these
days allocated and not longer static per-vnet viftable array.
This fixes a problem with netstat -g which would show bogus information
for the IPv4 Virtual Interface Table.
PR: 246626
Reported by: Ozkan KIRIK (ozkan.kirik gmail.com)
MFC after: 3 days
Push the root seed version to userspace through the VDSO page, if
the RANDOM_FENESTRASX algorithm is enabled. Otherwise, there is no
functional change. The mechanism can be disabled with
debug.fxrng_vdso_enable=0.
arc4random(3) obtains a pointer to the root seed version published by
the kernel in the shared page at allocation time. Like arc4random(9),
it maintains its own per-process copy of the seed version corresponding
to the root seed version at the time it last rekeyed. On read requests,
the process seed version is compared with the version published in the
shared page; if they do not match, arc4random(3) reseeds from the
kernel before providing generated output.
This change does not implement the FenestrasX concept of PCPU userspace
generators seeded from a per-process base generator. That change is
left for future discussion/work.
Reviewed by: kib (previous version)
Approved by: csprng (me -- only touching FXRNG here)
Differential Revision: https://reviews.freebsd.org/D22839
There is no functional change for the existing Fortuna random(4)
implementation, which remains the default in GENERIC.
In the FenestrasX model, when the root CSPRNG is reseeded from pools due to
an (infrequent) timer, child CSPRNGs can cheaply detect this condition and
reseed. To do so, they just need to track an additional 64-bit value in the
associated state, and compare it against the root seed version (generation)
on random reads.
This revision integrates arc4random(9) into that model without substantially
changing the design or implementation of arc4random(9). The motivation is
that arc4random(9) is immediately reseeded when the backing random(4)
implementation has additional entropy. This is arguably most important
during boot, when fenestrasX is reseeding at 1, 3, 9, 27, etc., second
intervals. Today, arc4random(9) has a hardcoded 300 second reseed window.
Without this mechanism, if arc4random(9) gets weak entropy during initial
seed (and arc4random(9) is used early in boot, so this is quite possible),
it may continue to emit poorly seeded output for 5 minutes. The FenestrasX
push-reseed scheme corrects consumers, like arc4random(9), as soon as
possible.
Reviewed by: markm
Approved by: csprng (markm)
Differential Revision: https://reviews.freebsd.org/D22838
Fortuna remains the default; no functional change to GENERIC.
Big picture:
- Scalable entropy generation with per-CPU, buffered local generators.
- "Push" system for reseeding child generators when root PRNG is
reseeded. (Design can be extended to arc4random(9) and userspace
generators.)
- Similar entropy pooling system to Fortuna, but starts with a single
pool to quickly bootstrap as much entropy as possible early on.
- Reseeding from pooled entropy based on time schedule. The time
interval starts small and grows exponentially until reaching a cap.
Again, the goal is to have the RNG state depend on as much entropy as
possible quickly, but still periodically incorporate new entropy for
the same reasons as Fortuna.
Notable design choices in this implementation that differ from those
specified in the whitepaper:
- Blake2B instead of SHA-2 512 for entropy pooling
- Chacha20 instead of AES-CTR DRBG
- Initial seeding. We support more platforms and not all of them use
loader(8). So we have to grab the initial entropy sources in kernel
mode instead, as much as possible. Fortuna didn't have any mechanism
for this aside from the special case of loader-provided previous-boot
entropy, so most of these sources remain TODO after this commit.
Reviewed by: markm
Approved by: csprng (markm)
Differential Revision: https://reviews.freebsd.org/D22837
DTS must be synced with the kernel, add a freebsd,dts-version string in
the root node of each DTS that we compile so we can later in the kernel
check that it contain a correct value.
Reviewed by: imp, mmel
Differential Revision: https://reviews.freebsd.org/D26724
r366344 corrected the optimized amd64 skein assembly implementation, so
we can now enable it again.
Also add a dependency on this Makefile for the skein_block object, so
that it will be rebuit (similar to r366362).
PR: 248221
Sponsored by: The FreeBSD Foundation
r365732 was the first attempt to get an accurate count but it was
writing to some read-only registers to clear them and that obviously
didn't work. Instead, note the counter's value when it is supposed to
be cleared and subtract it from future readings.
dev.<port>.stats.rx_fcs_error should not be serviced from the MPS
register for T6.
The stats.* sysctls should all use T5_PORT_REG for T5 and above. This
must have been missed in the initial T5 support years ago. Fix it while
here.
MFC after: 3 days
Sponsored by: Chelsio Communications
Truncating requires an exclusive lock, but it was not taken if the
filesystem indicates support for shared writes. This only concerns
ZFS.
In particular fixes cp of files which have trailing holes.
Reported by: bdrewery
The module handler code invokes a MOD_UNLOAD event immediately if
MOD_LOAD fails. The result was that if seminit() failed, semunload()
was invoked twice. semunload() is not idempotent however and would
try to remove it's process_exit eventhandler twice resulting in a
panic.
Reviewed by: kib, markj
Obtained from: CheriBSD
MFC after: 1 month
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26696
Use of dead_vnodeops would result in a panic instead of returning the intended
EOPNOTSUPP error.
While here make sure to abort, not just try to return a partial result.
The former allows the regular lookup to restart from scratch, while the latter
makes it stuck with an unusable vnode.
Reported by: kevans
Create the RISC-V NOTES and LINT files. As of r366559, LINT configs are
no longer generated but checked in to the tree.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D26502
Allow the DSCP codepoint also to be configurable
for the traffic in the direction from the initiator
to the target, such that writes and any requests
are also treated in the appropriate QoS class.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26714
Consider the currently in-use TCP options when
calculating the amount of new data to be injected during
SACK loss recovery. That addresses the effect that very small
(new) segments could be injected on partial ACKs while
still performing a SACK loss recovery.
Reported by: Liang Tian
Reviewed by: tuexen, chengc_netapp.com
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26446
This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt)
option IP(V6)_VLAN_PCP, which can be set to -1 (interface
default), or explicitly to any priority between 0 and 7.
Note that for untagged traffic, explicitly adding a
priority will insert a special 801.1Q vlan header with
vlan ID = 0 to carry the priority setting
Reviewed by: gallatin, rrs
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26409
Extend netstat to display TCP stack and detailed congestion state
Adding the "-c" option used to show detailed per-connection
congestion control state for TCP sessions.
This is one summary patch, which adds the relevant variables into
xtcpcb. As previous "spare" space is used, these changes are ABI
compatible.
Reviewed by: tuexen
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26518
makeLINT.mk isn't needed or used anymore, remove it and all the files
it uses.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D26540
Now that config(8) has supported include for 19 years, transition to
including the NOTES files. include support didn't exist at the time,
nor did the envvar stuff recently added. Now that it does, eliminate
the building of LINT files by just including everything you need.
Note: This may cause conflicts with updating in some cases.
find sys -name LINT\* -rm
is suggested across this commit to remove the generated LINT
files.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D26540
Without this patch, when vn_generic_copy_file_range() is
doing a large copy, it will remain in the function for a
considerable amount of time, delaying handling of any
outstanding signals until the copy completes.
This patch adds checks for signals that need to be
processed after each successful data copy cycle.
When sig_intr() returns non-zero, vn_generic_copy_file_range()
will return.
The check "if (len < savlen)" ensures that some data
has been copied, so that progress will be made.
Note that, since copy_file_range(2) is allowed to
return fewer bytes copied than requested, it
will never return EINTR/ERESTART when sig_intr()
returns non-zero.
Reviewed by: kib, asomers
Differential Revision: https://reviews.freebsd.org/D26620
The precedence of the '&' operator is less than of '+'. Added braces
do change the order of evaluation into the natural one, in my opinion.
On the other hand, the value of the expression should not change since
all elements should have page-aligned values.
This fixes a gcc warning reported.
Reported by: adrian
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Normally when a buffer with B_BARRIER is written, the flag is cleared
by g_vfs_strategy() when creating bio. But in some cases FFS buffer
might not reach g_vfs_strategy(), for instance when copy-on-write
reports an error like ENOSPC. In this case buffer is returned to
dirty queue and might be written later by other means. Among then
bdwrite() reasonably asserts that B_BARRIER is not set.
In fact, the only current use of B_BARRIER is for lazy inode block
initialization, where write of the new inode block is fenced against
cylinder group write to mark inode as used. The situation could be
seen that we break dependency by updating cg without written out
inode. Practically since CoW was not able to find space for a copy of
inode block, for the same reason cg group block write should fail.
Reported by: pho
Discussed with: chs, imp, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26511
Check td_flags for relevant AST requests lock-less. This opens the
race slightly wider where sig_intr() returns false negative, but might
be it is worth it.
Requested by: mjg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Specifically, if lookup() returned any error and the topping directory
was not latched, which means that (non-existent) path did not returned
to the topping location, give ENOTCAPABLE a priority over the lookup()
error.
PR: 249960
Reviewed by: emaste, ngie
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26695
Add support for sysctl 'machdep.uprintf_signal' that prints debugging
information on trap signal.
Reviewed by: jhibbits, luporl, bdragon
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26004
APM BIOS was relevant only to early laptops (approximately P166 or
P200 and slower). These have not been relevant for a long time, and
this code has been untested for a long time (as far as I can
tell). The APM compat code in ACPI and the apm(8) command is not being
retired. Both of these items are still in use (apm(8) is more
scriptable than the replacement acpiconf, for the most part). This has
been commented out of i386 GENERIC since 2002. This code is not
relevant to any other port.
Discussed on: arch@
The correct way to identify the end of the metadata is two adjacent
entries set to zero/MODINFO_END. I made a typo and this was checking the
first entry twice.
Reported by: rpokala
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
The boot metadata (also referred to as modinfo, or preload metadata)
provides information about the size and location of the kernel,
pre-loaded modules, and other metadata (e.g. the EFI framebuffer) to be
consumed during by the kernel during early boot. It is encoded as a
series of type-length-value entries and is usually constructed by
loader(8) and passed to the kernel. It is also faked on some
architectures when booted by other means.
Although much of the module information is available via kldstat(8),
there is no easy way to debug the metadata in its entirety. Add some
routines to parse this data and allow it to be printed to the console
during early boot or output via a sysctl.
Since the output can be lengthly, printing to the console is gated
behind the debug.dump_modinfo_at_boot kenv variable as well as the
BOOTVERBOSE flag. The sysctl to print the metadata is named
debug.dump_modinfo.
Reviewed by: tsoome
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26687
These kind of drops come for free in the sense that they do not use the
filter TCAM or any other resource that wouldn't normally be used during
rx. Frames dropped by the hardware get counted in the MAC's rx stats
but are not delivered to the driver.
hw.cxgbe.attack_filter
Set to 1 to enable the "attack filter". Default is 0. The attack
filter will drop an incoming frame if any of these conditions is true:
src ip/ip6 == dst ip/ip6; tcp and src/dst ip is not unicast; src/dst ip
is loopback (127.x.y.z); src ip6 is not unicast; src/dst ip6 is loopback
(::1/128) or unspecified (::/128); tcp and src/dst ip6 is mcast
(ff00::/8).
hw.cxgbe.drop_ip_fragments
Set to 1 to drop all incoming IP fragments. Default is 0. Note that
this drops valid frames.
hw.cxgbe.drop_pkts_with_l2_errors
Set to 1 to drop incoming frames with Layer 2 length or checksum errors.
Default is 1.
hw.cxgbe.drop_pkts_with_l3_errors
Set to 1 to drop incoming frames with IP version, length, or checksum
errors. Default is 0.
hw.cxgbe.drop_pkts_with_l4_errors
Set to 1 to drop incoming frames with Layer 4 length, checksum, or other
errors. Default is 0.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
It is possible for elf_reloc_local() to fail in the unlikely case of
an unsupported relocation type. If this occurs, do not continue to
process the file.
Reviewed by: kib, markj (earlier version)
MFC after: 1 week
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26701
This code was iteratively implemented during the work on various WiFi
drivers -- from individual functions to a macro-created implementations
for the various bit sized needed (and then extended to more for
comepleteness). Some of the bit combinations do not seem to make sense
so are left out.
The __bf_shf(x) was obtained from D26681 [1].
Requested by: manu [1]
Reviewed by: hselasky, manu
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26708
Sort a few VHT160 and 80+80 lines, update some comments, and remove
a superfluous ','.
No functional changes intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
It is unlikely, but possible, that an unrecognized or unsupported
relocation type is encountered while trying to load a kernel module. If
this occurs we should offer the symbol index as a hint to the user.
While here, fix some small style issues.
Reviewed by: markj, kib (amd64 part, in D26701)
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Cleanup all host resources, SYSCTLs, MSIX vectors and memory used
by the host and only leave the device allocated memory behind, if any,
because it may still be in use, when the PCI remove function is called.
Else future probe calls may fail due to SYSCTLs already existing.
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking