++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CHANGES
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Version : 1.26.6.0
Date : 01/03/2022
================================================================================
Fixes
-----
BASE:
- Fixed one module eeprom read failure.
- Fixed an issue with speed selection when 40G and 25G are advertised and
supported.
- Fixed a random traffic hang when T5 receives invalid ets BW in dcbx
messages from a switch.
- Fixed very long link up time with few switches.
================================================================================
Obtained from: Chelsio Communications
MFC after: 2 weeks
Sponsored by: Chelsio Communications
This fixes a driver panic during stats collection when a port's id does
not match its tx channel. The bug affected only the T580 card running
with a non-default VPD.
Reported by: Suhas Lokesha @ Chelsio
MFC after: 1 week
Sponsored by: Chelsio Communications
A feature of arm64's instruction for TLB invalidation is the ability
to determine whether cached intermediate entries, i.e., L{0,1,2}_TABLE
entries, are invalidated in addition to the final entry, e.g., an
L3_PAGE entry.
Update pmap_invalidate_{page,range}() to support both types of
invalidation, allowing the caller to determine which type of
invalidation is performed.
Update the callers to request the appropriate type of invalidation.
Eliminate redundant TLB invalidations in pmap_abort_ptp() and
pmap_remove_l3_range().
Add a comment to pmap_invalidate_all() making clear that it always
invalidates entries at all levels.
As expected, these changes result in a tiny yet measurable
performance improvement.
Reviewed by: kib, markj
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D33705
Provide structure inpcbstorage, that holds zones and lock names for
a protocol. Initialize it with global protocol init using macro
INPCBSTORAGE_DEFINE(). Then, at VNET protocol init supply it as
the main argument to the in_pcbinfo_init(). Each VNET pcbinfo uses
its private hash, but they all use same zone to allocate and SMR
section to synchronize.
Note: there is kern.ipc.maxsockets sysctl, which controls UMA limit
on the socket zone, which was always global. Historically same
maxsockets value is applied also to every PCB zone. Important fact:
you can't create a pcb without a socket! A pcb may outlive its socket,
however. Given that there are multiple protocols, and only one socket
zone, the per pcb zone limits seem to have little value. Under very
special conditions it may trigger a little bit earlier than socket zone
limit, but in most setups the socket zone limit will be triggered
earlier. When VIMAGE was added to the kernel PCB zones became per-VNET.
This magnified existing disbalance further: now we have multiple pcb
zones in multiple vnets limited to maxsockets, but every pcb requires a
socket allocated from the global zone also limited by maxsockets.
IMHO, this per pcb zone limit doesn't bring any value, so this patch
drops it. If anybody explains value of this limit, it can be restored
very easy - just 2 lines change to in_pcbstorage_init().
Differential revision: https://reviews.freebsd.org/D33542
Now that each module handles its global and VNET initialization
itself, there is no VNET related stuff left to do in domain_init().
Differential revision: https://reviews.freebsd.org/D33541
There left only three modules that used dom_init(). And netipsec
was the last one to use dom_destroy().
Differential revision: https://reviews.freebsd.org/D33540
The function now modifies pr_usrreqs only, which are always
global. Rename it to pr_usrreqs_init().
Differential revision: https://reviews.freebsd.org/D33538
The historical BSD network stack loop that rolls over domains and
over protocols has no advantages over more modern SYSINIT(9).
While doing the sweep, split global and per-VNET initializers.
Getting rid of pr_init allows to achieve several things:
o Get rid of ifdef's that protect against double foo_init() when
both INET and INET6 are compiled in.
o Isolate initializers statically to the module they init.
o Makes code easier to understand and maintain.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D33537
largepage_mprotect maps a superpage and later extends the mapping. This
occasionally fails with ASLR disabled. To fix this, first try to
reserve a sufficiently large virtual address region.
Reported by: Jenkins
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Some AMD Geode-based systems end up using the 8254 PIT to calibrate the
TSC during late calibration, which doesn't work because that
timecounter's mask (65535) is much smaller than its frequency (1193182).
Moreover, early calibration is done against the 8254 timer anyway.
Work around the problem by simply using early calibration results if no
high-quality timecounters exist.
PR: 260868
Fixes: 22875f88799e ("x86: Implement deferred TSC calibration")
Reported and tested by: mike@sentex.net, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
Reviewed by: imp, kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33730
Currently, any errors when adding a PIC child handler are ignored,
instead just continuing on to registering that PIC as an MSI, and
ignoring any errors that occur for that too.
Reviewed by: andrew
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33342
Currently intr_pic_add_handler either returns the PIC you gave it (which
is useless and risks causing confusion about whether it's creating
another PIC) or, on error, NULL. Instead, convert it to return an int
error code as one would expect.
Note that the only consumer of this API, arm64's gicv3_its, does not use
the return value, so no uses need updating to work with the revised API.
Reviewed by: markj, mmel
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33341
In previous versions of BSD ar -T was an alias for -f -- use only the
first 15 characters of archive member names. In GNU ar and LLVM ar -T
creates a thin archive.
The -f / old BSD ar -T functionality is not particularly useful, and
ignoring -T still results in a usable and compatible (but not thin)
archive.
An exp-run found a few ports invoking ar -T but they all expect thin
archives. In addition, -T will be used to specify thin archives after
a migration to LLVM-ar.
PR: 260523 [exp-run]
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33553
Bhyve allocates small 64 bit BARs below 4 GB and generates ACPI tables
based on this allocation. If the guest decides to relocate those BARs
above 4 GB, it could lead to mismatching ACPI tables. Especially
when using OVMF with enabled bus enumeration it could cause
issues. OVMF relocates all 64 bit BARs above 4 GB. The guest OS
may be unable to recover from this situation and disables some PCI
devices because their BARs are located outside of the MMIO space
reported by ACPI. Avoid this situation by giving the guest more
space for relocating BARs.
Let's be paranoid. The available space for BARs below 4 GB is 512 MB
large. Use a slop of 512 MB. It'll allow the guest to relocate all
BARs below 4 GB to an address above 4 GB. We could run into issues
when we exceeding the memlimit above 4 GB. However, this space has
a size of 32 GB. Even when using many PCI device with large BARs
like framebuffer or when using multiple PCI busses, it's very
unlikely that we run out of space due to the large slop.
Additionally, this situation will occur on startup and not at runtime
which is much better.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D33118
At the moment, you only have one single chance to read the fwctl
signature. At boot bhyve is in the state IDENT_WAIT. It's then
possible to switch to IDENT_SEND. After bhyve sends the signature,
it switches to REQ. From now on it's impossible to switch back to
IDENT_SEND to read the signature. For that reason, only a single
driver can read the signature. A guest can't use two drivers to
identify that fwctl is present. It gets even worse when using
OVMF. OVMF uses a library to access fwctl. Therefore, every single
OVMF driver would try to read the signature. Currently, only a
single OVMF driver accesses the fwctl. So, there's no issue with
it yet. However, no OS driver would have a chance to detect fwctl when
using OVMF because it's signature was already consumed by OVMF.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D31981
E.g. Framebuffers can require large space and BARs need to be aligned
by their size. If BARs aren't allocated by size, it'll cause much
fragmentation of the MMIO space. Reduce fragmentation by ordering
the BAR allocation on their size to reduce the risk of
OUT_OF_MMIO_SPACE issues.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D28278
Some passthru devices only support MSI instead of MSI-X. For those
devices the initialization of MSI-X table will fail. Re-add the
check erroneously removed in f1442847c9404d4bc5f5524a0c3362dd39cb14f9.
MFC after: 3 days
X-MFC with: f1442847c9404d4bc5f5524a0c3362dd39cb14f9
PR: 260148
Reviewed by: manu, bz
Differential Revision: https://reviews.freebsd.org/D33728
Remove vestiges of arm big endian support. Also use the more proper
MACHINE_CPUARCH instead of MACHINE to test for that here.
This leaves powerpc as the only big endian arch.
Sponsored by: Netflix
Mips had a number of special cases that disabled features that didn't
work. Remove them all. However, retain the llvm mips bits because that
requires a lot more effort to unwind and will be done separately.
Sponsored by: Netflix
Remove the tweaks to the compiler, as well as additional command line
args to get the proper endian, word size and floating style.
Sponsored by: Netflix
Extend the dnctl (dummynet config) tool to be able to read commands from
a file, just like ipfw already does.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33627
The implementation simply passes the text ref to the appropriate
underlying vnode. Without this, the default [un]set_text
implementation will only manage the text ref on the unionfs vnode,
causing it to be out of sync with the underlying filesystems and
potentially allowing corruption of executable file contents.
On INVARIANTS kernels, it also readily produces a panic on process
termination because the VM object representing the executable mapping
is backed by the underlying vnode, not the unionfs vnode.
PR: 251342
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33611
Use atomics to track the writecount granted to the underlying FS,
and avoid holding the vnode interlock while calling the underling FS'
VOP_ADD_WRITECOUNT(). This also fixes a WITNESS warning about nesting
the same lock type. Also add comments explaining why we need to track
the writecount on the unionfs vnode in the first place. Finally,
simplify writecount management to only use the upper vnode and assert
that we shouldn't have an active writecount on the lower vnode through
unionfs.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33611
OpenSSH has dropped libwrap support in OpenSSH 6.7p in 2014
(f2719b7c in github.com/openssh/openssh-portable) and we
maintain the patch ourselves since 2016 (a0ee8cc636cd).
Over the years, the libwrap support has deteriotated and probably
that was reason for removal upstream. Original idea of libwrap was
to drop illegitimate connection as soon as possible, but over the
years the code was pushed further down and down and ended in the
forked client connection handler.
The negative effects of late dropping is increasing attack surface
for hosts that are to be dropped anyway. Apart from hypothetical
future vulnerabilities in connection handling, today a malicious
host listed in /etc/hosts.allow still can trigger sshd to enter
connection throttling mode, which is enabled by default (see
MaxStartups in sshd_config(5)), effectively casting DoS attack.
Note that on OpenBSD this attack isn't possible, since they enable
MaxStartups together with UseBlacklist.
A only negative effect from early drop, that I can imagine, is that
now main listener parses file in /etc, and if our root filesystems
goes bad, it would get stuck. But unlikely you'd be able to login
in that case anyway.
Implementation details:
- For brevity we reuse the same struct request_info. This isn't
a documented feature of libwrap, but code review, viewing data
in a debugger and real life testing shows that if we clear
RQ_CLIENT_NAME and RQ_CLIENT_ADDR every time, it works as intended.
- We set SO_LINGER on the socket to force immediate connection reset.
- We log message exactly as libwrap's refuse() would do.
Differential revision: https://reviews.freebsd.org/D33044
in handling the cpuset sizes different from sizeof(cpuset_t).
For both cases, cpuset size shorter than sizeof(cpuset_t) results
in EINVAL on Linux.
For sched_getaffinity(), be more permissive and accept cpuset size
larger than our cpuset_t, by clipping the syscall argument and zeroing
the rest of the output buffer. For sched_setaffinity(), we should allow
shorter cpusets than current ABI size, again zeroing the rest of the bits.
With this change, python os.sched_get/setaffinity functions work.
Reported by: se
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The MINIMUM_SUPPORTED_OSREL is 1002501 (FreeBSD 10.3), and xlocale is
supported there.
While I'm there, explicitly use config.h generated with --disable-bzlib
--disable-xzlib instead of deleting them manually.
MFC after: 2 weeks
With crafted input to the G_GATE_CMD_CREATE ioctl, geom_gate can be made
to print kernel memory to the system console, potentially revealing
sensitive data from whatever was previously in that memory page.
But but but: this is a case of the sys admin misconfiguring, and you'd
need root privileges to do this.
Submitted By: Johannes Totz <jo@bruelltuete.com>
MFC after: 2 weeks
Reviewed By: asomers
Differential Revision: https://reviews.freebsd.org/D31727
If I'm not mistaken, the underlying sendmsg() for nvlist_send() is
failing with ENOBUFS. In turn, nvlist_recv() returns NULL because it
didn't receive the expected number of file descriptors.
Adjusting net.local.dgram.recvspace worked on my local machine, but on
CI the test still fails consistently.
PR: 260891
An earlier version of this code computed the TSC frequency in kHz.
When the code was changed to compute the frequency more accurately,
the variable name was not updated.
Reviewed by: markj
Fixes: 22875f88799e x86: Implement deferred TSC calibration
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33696
It's possible that the "early" TSC calibration gave us a value which
is known to be exact; in that case, skip the later re-calibration.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33695
The code here tries to be smart and zeroes out both di_db and di_ib with
a single bzero call, thereby overrunning the di_db subobject. This is
fine on most architectures, if a little dodgy. However, on CHERI, the
compiler can optionally restrict the bounds on pointers to subobjects to
just that subobject, in order to mitigate intra-object buffer overflows,
and this is enabled in CheriBSD's pure-capability kernels.
Instead, use separate bzero calls for each array, and let the compiler
optimise it as it sees fit; even if it's not generating inline zeroing
code, Clang will happily optimise two consecutive bzero's to a single
larger call.
Reviewed by: mckusick
Differential Revision: https://reviews.freebsd.org/D33651
Shortlinks occupy the space of both di_db and di_ib when used. However,
everywhere that wants to read or write a shortlink takes a pointer do
di_db and promptly runs off the end of it into di_ib. This is fine on
most architectures, if a little dodgy. However, on CHERI, the compiler
can optionally restrict the bounds on pointers to subobjects to just
that subobject, in order to mitigate intra-object buffer overflows, and
this is enabled in CheriBSD's pure-capability kernels.
Instead, clean this up by inserting a union such that a new di_shortlink
can be added with the right size and element type, avoiding the need to
cast and allowing the use of the DIP macro to access the field. This
also mirrors how the ext2fs code implements extents support, with the
exact same structure other than having a uint32_t i_data[] instead of a
char di_shortlink[].
Reviewed by: mckusick, jhb
Differential Revision: https://reviews.freebsd.org/D33650