Per the i2c spec, a slave device can stretch SCL idefinitely, so 25ms is
a bit arbitrary in general. smbus does specify an optional timeout
recovery mechanism to be done at about 25~35ms, but the IPMI SSIF spec
says that BMCs don't have any obligation to implement that.
The BMC on Altra seems to mostly respond within 25ms, but occasionally
will stretch SCL for ~300 msec.
Also, the count_us mechanism seems to actually timeout around 25%
earlier than it would claim (timeout really happening around 19ms
instead of 25ms).
Sponsored by: Ampere Computing LLC
Submitted by: Klara Inc.
Reviewed by: manu, imp
Differential Revision: https://reviews.freebsd.org/D28747
Notable upstream changes:
bf156c966 Remove unused abd_alloc_scatter_offset_chunkcnt
658fb8020 Add "compatibility" property for zpool feature sets
This update introduces a new pool property called "compatibility"
that can be used to enable a limited set of pool features on pool
creation and "stick" to it, so the "zpool upgrade" does not
accidentally enable features that are not desired. The value of
this property may then be changed later.
See zpool-features(5) for more information about the "compatibility"
pool property.
Obtained from: OpenZFS
MFC after: 2 weeks
KCSAN complains about racy accesses in the locking code. Those races are
fine since they are inside a TD_SET_RUNNING() loop that expects the value
to be changed by another CPU.
Use relaxed atomic stores/loads to indicate that this variable can be
written/read by multiple CPUs at the same time. This will also prevent
the compiler from doing unexpected re-ordering.
Reported by: GENERIC-KCSAN
Test Plan: KCSAN no longer complains, kernel still runs fine.
Reviewed By: markj, mjg (earlier version)
Differential Revision: https://reviews.freebsd.org/D28569
In the new ENA-based instances like c6gn, the vector table moved to a
new PCIe bar - BAR1. Previously it was always located on the BAR0, so
the resources were already allocated together with the registers.
As the FreeBSD isn't doing any resource allocation behind the scenes,
the driver is responsible to allocate them explicitly, before other
parts of the OS (like the PCI code allocating MSIx) will be able to
access them.
To determine dynamically BAR on which the MSIx vector table is present
the pci_msix_table_bar() is being used and the new BAR is allocated if
needed.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 3 days
Read the PF-only hardware settings directly in get_params__post_init.
Split the rest into two routines used by both the PF and VF drivers: one
that reads the SGE rx buffer configuration and another that verifies
miscellaneous hardware configuration.
MFC after: 1 week
Sponsored by: Chelsio Communications
__NO_TLS was originally added to disable use of _Thread in the locale
code in libc in 82dd5016bd. At the time
libc did not support TLS on MIPS (I believe), but TLS support was
added to libc (at least _set_tp.c) for MIPS about a month after
__NO_TLS was added, but __NO_TLS was still left around.
Reviewed by: imp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D28713
__NO_TLS was originally added to disable use of _Thread in the locale
code in libc in 82dd5016bd. The initial
RISC-V import set this for RISC-V presumably due to immaturity in the
toolchains at the time. However, TLS via _Thread works fine in both
GCC and clang on RISC-V.
Reviewed by: mhorne, imp
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D28712
This macro returns true if a provided virtual address is contained
in the kernel's clean submap.
In CHERI kernels, the buffer cache and transient I/O map are allocated
as separate regions. Abstracting this check reduces the diff relative
to FreeBSD. It is perhaps slightly more readable as well.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D28710
or EX_SHLOCK. Do it by setting a vnode iflag indicating that the locking
exclusive open is in progress, and not allowing F_LOCK request to make
a progress until the first open finishes.
Requested by: mckusick
Reviewed by: markj, mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28697
This updates r311987/fb1d9b7f4113d which allowed any number of vectors to be
used. Since we're just attaching one instance, the meaning of more than one
vector is not clear and seems to cause problems. Fall back to old methods for
these cards.
PR: 235016
Submitted by: David Cross
ACPI Sec 5.2.16.5 (SRAT, GIC Interrupt Translation Service (ITS)
Affinity Structure) says:
> The GIC ITS Affinity Structure provides the association between
> a GIC ITS and a proximity domain. This enables the OSPM to
> discover the memory that is closest to the ITS, and use that in
> allocating its management tables and command queue.
Previously the ITS driver was using the proximity domain to
restrict which CPUs can be targeted by an LPI. We keep that logic
just for the original dual socket ThunderX which cannot forward
LPIs between sockets.
We also use the SRAT entry for its intended purpose of attempting
to allocate ITS table structures near the ITS.
Reviewed by: andrew
Sponsored by: Ampere Computing LLC
Differential Revision: https://reviews.freebsd.org/D28340
These errors do not clear so to NULL, so the existing check was
treating these failures as success. The rest of do_pass_establish()
then tried to use the listen socket as if it was a connection socket
newly created by syncache_expand().
In addition, for negative return values, do not send a RST to the
peer.
Reported by: Sony Arpita Das @ Chelsio
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D28243
The fallback for __align_up() used by roundup2() uses __typeof__()
which doesn't work for bitfields. This fixes the build on GCC which
uses the fallback.
Reviewed by: arichardson, markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D28599
This follows the behavior on x86 where edge triggered interrupts are
not disabled when executing the handler. Because the ITS is a shared
resource, contention for the command queue lock can be substantial.
Suggested by: gallatin
Reviewed by: andrew
Tested by: gallatin
Sponsored by: Ampere Computing LLC
Differential Revision: https://reviews.freebsd.org/D28709
The motivation is to provide access to these registers from userspace
via ptrace(2) requests PT_GETDBREGS and PT_SETDBREGS.
This change breaks the ABI of these particular requests, but is
justified by the fact that the intended consumers (debuggers) have not
been taught to use them yet. Making this change now enables active
upstream work on lldb to begin using this interface, and take advantage
of the hardware debugging registers available on the platform.
PR: 252860
Reported by: Michał Górny (mgorny@gentoo.org)
Reviewed by: andrew, markj (earlier version)
Tested by: Michał Górny (mgorny@gentoo.org)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28415
This is a prerequisite to allowing the use of hardware watchpoints for
userspace debuggers.
This is also a slight departure from the x86 behaviour, since `si_addr`
returns the data address that triggered the watchpoint, not the
address of the instruction that was executed. Otherwise, there is no
straightforward way for the application to determine which watchpoint
was triggered. Make a note of this in the siginfo(3) man page.
Reviewed by: jhb, markj (earlier version)
Tested by: Michał Górny (mgorny@gentoo.org)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28561
In particular, we want to disallow setting breakpoints on kernel
addresses from userspace. The control register fields are validated or
ignored as appropriate.
Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28560
a further CPU enhancements for compressed acks. These
are acks that are compressed into an mbuf. The transport
has to be aware of how to process these, and an upcoming
update to rack will do so. You need the rack changes
to actually test and validate these since if the transport
does not support mbuf compression, then the old code paths
stay in place. We do in this commit take out the concept
of logging if you don't have a lock (which was quite
dangerous and was only for some early debugging but has
been left in the code).
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D28374
These should only fail if we use them incorrectly, so assert that they
succeed.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (“Netgate”’)
These functions always return 0, which is good, because the code calling
them doesn't handle this error gracefully.
As the functions always succeed remove their return value, and the code
handling their errors (because it was never executed anyway).
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (“Netgate”’)
When refill_fl() fails to allocate large (9/16KB) mbuf cluster, it
falls back to safe (4KB) ones. But it still saved into sd->zidx
the original fl->zidx instead of fl->safe_zidx. It caused problems
with the later use of that cluster, including memory and/or data
corruption.
While there, make refill_fl() to use the safe zone for all following
clusters for the call, since it is unlikely that large succeed.
MFC after: 3 days
Sponsored by: iXsystems, Inc.
Reviewed by: np, jhb
Differential Revision: https://reviews.freebsd.org/D28716
Traditionally routing socket code did almost zero checks on
the input message except for the most basic size checks.
This resulted in the unclear KPI boundary for the routing system code
(`rtrequest*` and now `rib_action()`) w.r.t message validness.
Multiple potential problems and nuances exists:
* Host bits in RTAX_DST sockaddr. Existing applications do send prefixes
with hostbits uncleared. Even `route(8)` does this, as they hope the kernel
would do the job of fixing it. Code inside `rib_action()` needs to handle
it on its own (see `rt_maskedcopy()` ugly hack).
* There are multiple way of adding the host route: it can be DST without
netmask or DST with /32(/128) netmask. Also, RTF_HOST has to be set correspondingly.
Currently, these 2 options create 2 DIFFERENT routes in the kernel.
* no sockaddr length/content checking for the "secondary" fields exists: nothing
stops rtsock application to send sockaddr_in with length of 25 (instead of 16).
Kernel will accept it, install to RIB as is and propagate to all rtsock consumers,
potentially triggering bugs in their code. Same goes for sin_port, sin_zero, etc.
The goal of this change is to make rtsock verify all sockaddr and prefix consistency.
Said differently, `rib_action()` or internals should NOT require to change any of the
sockaddrs supplied by `rt_addrinfo` structure due to incorrectness.
To be more specific, this change implements the following:
* sockaddr cleanup/validation check is added immediately after getting sockaddrs from rtm.
* Per-family dst/netmask checks clears host bits in dst and zeros all dst/netmask "secondary" fields.
* The same netmask checking code converts /32(/128) netmasks to "host" route case
(NULL netmask, RTF_HOST), removing the dualism.
* Instead of allowing ANY "known" sockaddr families (0<..<AF_MAX), allow only actually
supported ones (inet, inet6, link).
* Automatically convert `sockaddr_sdl` (AF_LINK) gateways to
`sockaddr_sdl_short`.
Reported by: Guy Yur <guyyur at gmail.com>
Reviewed By: donner
Differential Revision: https://reviews.freebsd.org/D28668
MFC after: 3 days
More and more code migrates from lock-based protection to the NET_EPOCH
umbrella. It requires some logic changes, including, notably, refcount
handling.
When we have an `ifa` pointer and we're running inside epoch we're
guaranteed that this pointer will not be freed.
However, the following case can still happen:
* in thread 1 we drop to 0 refcount for ifa and schedule its deletion.
* in thread 2 we use this ifa and reference it
* destroy callout kicks in
* unhappy user reports bug
To address it, new `ifa_try_ref()` function is added, allowing to return
failure when we try to reference `ifa` with 0 refcount.
Additionally, existing `ifa_ref()` is enforced with `KASSERT` to provide
cleaner error in such scenarious.
Reviewed By: rstone, donner
Differential Revision: https://reviews.freebsd.org/D28639
MFC after: 1 week
It fixes loopback route installation for the interfaces
in the different fibs using the same prefix.
Reviewed By: donner
PR: 189088
Differential Revision: https://reviews.freebsd.org/D28673
MFC after: 1 week
jail_remove(2) includes a loop that sends SIGKILL to all processes
in a jail, but skips processes in PRS_NEW state. Thus it is possible
the a process in mid-fork(2) during jail removal can survive the jail
being removed.
Add a prison flag PR_REMOVE, which is checked before the new process
returns. If the jail is being removed, the process will then exit.
Also check this flag in jail_attach(2) which has a similar issue.
Reported by: trasz
Approved by: kib
MFC after: 3 days
iflib_init_locked() assumes that iflib_stop() has been called, however,
it is not called for media changes.
iflib_if_init_locked() calls stop then init, so fixes the problem.
PR: 253473
MFC after: 3 days
Reviewed by: markj
Sponsored by: Juniper Networks, Inc., Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D28667
linux_shared_page_init() creates an object and grabs and maps a single
page to back the VDSO. When destroying the VDSO object, we failed to
destroy the mapping and free KVA. Fix this.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28696
FreeBSD when running as a dom0 under Xen is not supposed to access the
run time services directly, and instead should proxy the calls through
Xen using an hypercall interface that exposes access to selected run
time services.
Implement the efirt interface on top of the Xen provided hypercalls.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D28621
Introduce a set of hooks for MI EFI public functions, so that a new
implementation can be done. This will be used to implement the Xen PV
EFI interface that's used when running FreeBSD as a Xen dom0 from UEFI
firmware. Also make the efi_status_to_errno non-static since it will
be used to evaluate status return values from the PV interface.
No functional change indented.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, imp
Differential revision: https://reviews.freebsd.org/D28620
Allow setting the bootmethod variable from the Xen PVH entry point, in
order to be able to correctly set the underlying firmware mode when
booted as a dom0.
Move the bootmethod variable to be defined in x86/cpu_machdep.c
instead so it can be shared by both i386 and amd64.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D28619
Add some basic multiboot2 infrastructure to the EFI loader in order to
be capable of booting a FreeBSD/Xen dom0 when booted from UEFI.
Only a very limited subset of the multiboot2 protocol is implemented
in order to support enough to boot into Xen, the implementation
doesn't intend to be a full multiboot2 capable implementation.
Such multiboot2 functionality is hooked up into the amd64 EFI loader,
which is the only architecture that supports Xen dom0 on FreeBSD.
The options to boot a FreeBSD/Xen dom0 system are exactly the same as
on BIOS, and requires setting the xen_kernel and xen_cmdline options
in loader.conf.
Sponsored by: Citrix Systems R&D
Reviewed by: tsoome, imp
Differential revision: https://reviews.freebsd.org/D28497
- improved pipe calculation which does not degrade under heavy loss
- engaging in Loss Recovery earlier under adverse conditions
- Rescue Retransmission in case some of the trailing packets of a request got lost
All above changes are toggled with the sysctl "rfc6675_pipe" (disabled by default).
Reviewers: #transport, tuexen, lstewart, slavash, jtl, hselasky, kib, rgrimes, chengc_netapp.com, thj, #manpages, kbowling, #netapp, rscheff
Reviewed By: #transport
Subscribers: imp, melifaro
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D18985
Citing Kirk:
The previous code [before 8563de2f27 -- kib] did not call
vnode_pager_setsize() but worked because later in ffs_snapshot() it
does a UFS_WRITE() to output the snaplist. Previously the UFS_WRITE()
allocated the extra block at the end of the file which caused it to do
the needed vnode_pager_setsize(). But the new code had already allocated
the extra block, so UFS_WRITE() did not extend the size and thus did not
do the vnode_pager_setsize().
PR: 253158
Reported by: Harald Schmalzbauer <bugzilla.freebsd@omnilan.de>
Reviewed by: mckusick
Tested by: cy
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
If uio_offset is past end of the object size, calculated resid is negative.
Delegate handling this case to the locked read, as any other non-trivial
situation.
PR: 253158
Reported by: Harald Schmalzbauer <bugzilla.freebsd@omnilan.de>
Tested by: cy
Sponsored by: The FreeBSD Foundation
MFC after: 1 week