Allow the resending of DATA chunks to be controlled by the caller,
which allows retiring sctp_mtu_size_reset() in a separate commit.
Also improve the computaion of the overhead and use 32-bit integers
consistently.
Thanks to Timo Voelker for pointing me to the code.
MFC after: 3 days
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.
Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.
The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).
The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.
This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.
One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.
Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.
The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.
Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D33451
Since popcnt is only supported by CPUTYPE=nehalem and later, ensure that
this instruction is only emitted when appropriate. Otherwise, programs
using the library can abort with SIGILL.
See also: https://github.com/llvm/llvm-project/issues/52893
PR: 258156
Reported by: Eric Rucker <bhtooefr@bhtooefr.org>
MFC after: 3 days
Add functionality for extents validation inside the filesystem
extents block. The main logic is implemented under
ext4_validate_extent_entries() function, which verifies extents
or extents indexes depending of extent depth value.
PR: 259112
Reported by: Robert Morris
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33375
Rename ext2_dirbadentry() to ext2_check_direntry(). Add directory
entry inode value check, and call ext2_check_direntry() in all cases.
The dirchk sysctl is removed.
PR: 259024,259041
Reported by: Robert Morris
Reviewed by: pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33374
For years CTL block backend limited I/O size to 1MB, splitting larger
requests into sequentially processed chunks. It is sufficient for
most of use cases, since typical initiators rarely use bigger I/Os.
One of known exceptions is VMWare VAAI offload, by default sending up
to 8 4MB EXTENDED COPY requests same time. CTL internally converted
those into 32 1MB READ/WRITE requests, that could overwhelm the block
backend, having finite number of processing threads and making more
important interactive I/Os to wait in its queue. Previously it was
partially covered by CTL core serializing sequential reads to help
ZFS speculative prefetcher, but that serialization was significantly
relaxed after recent ZFS improvements.
With the new settings block backend receives 8 4MB requests, that
should be easier for both CTL itself and the underlying storage.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Rather than duplicating the switches in crypto_auth_hash() and
crypto_cipher(), copy the algorithm constants from the new session
ioctl into a csp directly which permits using the functions in
crypto.c.
Reviewed by: markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33613
Text requests and responses can span multiple PDUs. In that case, the
sender sets the Continue bit in non-final PDUs and the Final bit in
the last PDU. The receiver responds to non-final PDUs with an empty
text PDU.
To support this, add a more abstract API in libiscsi which accepts and
receives key sets rather than PDUs. These routines internally send or
receive one or more PDUs. Use these new functions to replace the
handling of TextRequest and TextResponse PDUs in discovery sessions in
both ctld and iscsid.
Note that there is not currently a use case for large Text requests
and those are still always sent as a single PDU. However, discovery
sessions can return a text response listing targets that spans
multiple PDUs, so the new API supports sending and receiving multi-PDU
responses.
Reported by: Jithesh Arakkan @ Chelsio
Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D33548
If you look at the SMBIOS specification, we'll find something is
missing. In particular at offset 0Dh is supposed to be the OEM-defined
field. This should go between security and height. It is not legal to
actually skip this and will lead to other folks not properly
interpreting later parts of the table.
https://www.illumos.org/issues/14312
Reviewed by: jhb
Submitted by: Robert Mustacchi <rm@fingolfin.org>
Obtained from: ilumos
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33682
With IPv4 over IPv6 nexthops and IP->MPLS support, there is a need
to distingush "upper" e.g. traffic family and "neighbor" e.g. LLE/gateway
address family. Store them explicitly in the private part of the nexthop data.
While here, store nhop fibnum in nhop_prip datastructure to make it self-contained.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33663
Introduce a new function, lltable_get(), to retrieve lltable pointer
for the specified interface and family.
Use it to avoid all-iftable list traversal when adding or deleting
ARP/ND records.
Differential Revision: https://reviews.freebsd.org/D33660
MFC after: 2 weeks
If GEOM is idle but the root device is not yet present when we enter
vfs_mountroot_wait_if_necessary, we call vfs_mountroot_wait to wait
for root holds (e.g. CAM or USB initialization). Upon returning from
vfs_mountroot_wait, we wait 100 ms at a time until the root device
shows up.
Since the root device most likely appeared during vfs_mountroot_wait
-- waiting for subsystems which may be responsible for the root
device is the whole purpose of that function -- it makes sense to
check if the device is now present rather than printing a warning
and pausing for 100 ms before checking.
Reviewed by: trasz
Fixes: a3ba3d09c2 Make root mount wait mechanism smarter
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D33593
In the case of a root hold related to the initialization of a disk
device, a flurry of GEOM tasting is likely to take place as soon as
the device is initialized and the root hold is released. If we
don't wait for GEOM idle it's easy for vfs_mountroot to "win" the
race and proceed before the root filesystem GEOM is ready.
Reviewed by: imp
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D33592
While the message is technically correct, it's not particularly
helpful in the case where we're only waiting a few ms; this case
occurs frequently on EC2 arm64 instances with CAM initialization
racing to release its root hold before vfs_mountroot reaches this
point. Only print the message if we end up waiting for more than
one second.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D33591
Other ar implementations (GNU, LLVM) use -T to mean thin archive
rather than use only the first fifteen characters of the archive member
name. We support both -T and -f for this, with -f documented as an
alias of -T.
An exp-run showed that the ports invoking `ar -T` expect thin archives,
not truncated names. Switch -f to be the documented flag for this
behaviour, and emit a warning when -T is used.
The warning will be changed to an error in the future (in main), once
ports no longer use -T.
PR: 260523 [exp-run]
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
As with other runtime components like libc or libcxxrt.
If desired we can stop linking devd statically after this change (to
achive approximately no net change in required root filesystem size).
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33123
It is only called in the file that defines it, so make it static and
remove the declaration from the header.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33688
The first time we start bhyve with a passthru device everything is fine
as on boot we do enable BARs. If a driver (unload) inside bhyve disables
the BAR(s) as some Linux drivers do, we need to make sure we re-enable
them on next bhyve start.
If we are trying to mmap a disabled BAR for MSI-X (PCIOCBARMMAP)
the kernel will give us an EBUSY.
While we were re-enabling the BAR(s) in the current code loop
cfginit() was writing the changes out too late to the real hardware.
Move the call to init_msix_table() after the register on the real
hardware was updated. That way the kernel will be happy and the
mmap will succeed and bhyve will start.
Also simplify the code given the last argument to init_msix_table()
is unused we do not need to do checks for each bar. [1]
MFC after: 3 days
PR: 260148
Pointed out by: markj [1]
Sponsored by: The FreeBSD Foundation
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D33628
Match igb(4) as in f7926a6d0c. From Vincenzo, this check is redundant
to setup providing us an IGC_RXD_STAT_VP bit and would make for an
unexpected condition if IFCAP_VLAN_HWTAGGING were not set but the tag
was stripped, which would be passed up the stack breaking isolation.
PR: 260068
Approved by: vmaffione
MFC after: 1 month
This will always sleep at least once, so it's a slow path by definition.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D33387
Summary: It's currently just as stable as powerpc64, with more ports working.
Reviewers: alfredo, bdragon, luporl, jhibbits, #manpages
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D33610
Functions manipulating mbuf tags are using an int type for passing the
'type' parameter, but the internal tag storage is using a 16bit
integer to store it. This leads to the following code:
t = m_tag_alloc(...,0xffffffff,...,...);
m_tag_prepend(m, t);
r = m_tag_locate(m ,...,0xffffffff, NULL);
Returning r == NULL because m_tag_locate doesn't truncate the type
parameter when doing the match. This is unexpected because the type of
the 'type' parameter is int, and the caller doesn't need to know about
the internal truncations.
Fix this by making the 'type' parameter of type uint16_t in order to
match the size of its internal storage and make it obvious to the
caller the actual size of the parameter.
While there also use uint uniformly replacing the existing u_int
instances.
Reviewed by: kp, donner, glebius
Differential revision: https://reviews.freebsd.org/D33680
sge->ctrlq is not always allocated during attach (eg. if firmware
initialization fails) and detach should be able to deal with this.
MFC after: 1 week
Sponsored by: Chelsio Communications
If a "raw" IPv6 address (denoted by a leading '[') is used as a target
address, then 'arg' is incremented by one to skip over the '['.
However, this meant that at the end of the function the wrong address
was passed to free(). With malloc junking enabled and given suitably
small strings, malloc() would happily overwrite the correct number of
bytes with junk, but off by one byte overwriting the byte after the
allocation.
This manifested as the first byte of the 'HeaderDigest' key being
overwritten causing the key name on the wire to be sent as
'\x5eaderDigest' which the target rejected.
Reported by: Jithesh Arakkan @ Chelsio
Found with: ASAN (via WITH_ASAN=yes)
Sponsored by: Chelsio Communications
Reported and tested by: Michael Butler <imb@protected-networks.net>
Reviewed by: jhb, kib
Fixes: 62d09b46ad ("x86: Defer LAPIC calibration until after timecounters are available")
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33669
We only attempt to gracefully handle absence of an APIC if "device
atpic" is defined in the kernel configuration.
Suggested by: kib
Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447