* Zero gw_sdl if switching to interface route - the assumption
that underlying storage is zeroed is incorrect with route changes.
* Apply proper flag mask to rte.
Reported by: vangyzen
- Move assertions out of the main loop to avoid duplicate conditional
expressions, and improve assertion messages.
- Fix va_next updates. In some cases we were not doing the wraparound
check before continuing the loop.
- Use the right va_next. In pmap_advise() and pmap_copy() we would step
through 1G pages 2M at a time.
- Copy 1G mappings in pmap_copy().
Reviewed by: alc, kib
MFC with: r365518
Sponsored by: Juniper Networks, Inc., Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26463
Due to a HW bug in the RockChip PCIe implementation, attempting to access
a non-existent register in the configuration space will throw an exception.
Use new bus functions bus_peek() and bus_poke() to overcomme this limitation.
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.
Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP and/or access to PCI(e) register when device is in deep sleep state.
This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).
MFC after: 1 month
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25371
Recent testing of the NFS-over-TLS code found a LOR between the mutex lock
used for sessions and the sleep lock used for server side krpc socket
structures in nfsrv_checksequence(). This was fixed by r365789.
A similar bug exists in nfsrv_bindconnsess(), where SVC_RELEASE() is called
while mutexes are held.
This patch applies a fix similar to r365789, moving the SVC_RELEASE() call
down to after the mutexes are released.
This patch fixes the problem by moving the SVC_RELEASE() call in
nfsrv_checksequence() down a few lines to below where the mutex is released.
MFC after: 1 week
Fix the logic so it works as it appears.
Reported by: Coverity
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: D26211 (in progress, so omitting full URL)
vm_ooffset_t is now unsigned. Remove some tests for negative values,
or make other adjustments accordingly.
Reported by: Coverity
Reviewed by: kib markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D26214
Move the initialization of these variables to the beginning of their
respective functions.
On our end this creates a small amount of unneeded churn, as these
variables are properly initialized before their first use in all cases.
However, changing this benefits at least one downstream consumer
(NetApp) by allowing local and future modifications to these functions
to be made without worrying about where the initialization occurs.
Reviewed by: melifaro, rscheff
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26454
Hardware assistance includes checksumming (tx and rx), TSO, and RSS on
the inner traffic in a VXLAN tunnel.
Relnotes: Yes
Sponsored by: Chelsio Communications
This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx
and rx), TSO, and RSS if the NIC is capable of performing these operations on
inner VXLAN traffic.
A VXLAN interface inherits the capabilities of its vxlandev interface if one is
specified or of the interface that hosts the vxlanlocal address. If other
interfaces will carry traffic for that VXLAN then they must have the same
hardware capabilities.
On transmit, if_vxlan verifies that the outbound interface has the required
capabilities and then translates the CSUM_ flags to their inner equivalents.
This tells the hardware ifnet that it needs to operate on the inner frame and
not the outer VXLAN headers.
An event is generated when a VXLAN ifnet starts. This allows hardware drivers to
configure their devices to expect VXLAN traffic on the specified incoming port.
On receive, the hardware does RSS and checksum verification on the inner frame.
if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is
not very clear why it didn't do this already.
Future work:
Rx: it should be possible to avoid the first trip up the protocol stack to get
the frame to if_vxlan just so it can decapsulate and requeue for a second trip
up the stack. The hardware NIC driver could directly call an if_vxlan receive
routine for VXLAN traffic instead.
Rx: LRO. depends on what happens with the previous item. There will have to to
be a mechanism to indicate that it's time for if_vxlan to flush its LRO state.
Reviewed by: kib@
Relnotes: Yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25873
This will be used by some upcoming changes to if_vxlan(4). RFC 7348 (VXLAN)
says that the UDP checksum "SHOULD be transmitted as zero. When a packet is
received with a UDP checksum of zero, it MUST be accepted for decapsulation."
But the original IPv6 RFCs did not allow zero UDP checksum. RFC 6935 attempts
to resolve this.
Reviewed by: kib@
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25873
These are similar to the existing VLAN capabilities.
Reviewed by: kib@
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25873
These are being added to support VXLAN but will work for GENEVE as well.
ENCAP_RSVD1 will likely become ENCAP_GENEVE in the future.
The size of struct mbuf does not change and that means this change can be MFC'd.
If size wasn't a constraint a cleaner way may have been to add inner_csum_flags
and inner_csum_data to go with csum_flags and csum_data.
Reviewed by: kib@
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25873
Change the zone setup:
- Allow slabs to be returned to the OS
- Set the number of slots to the max devctl will queue before discarding
- Reserve 2% of the max (capped at 100) for low memory allocations
- Disable per-cpu caching since we don't need it and we avoid some pathologies
Change the alloation strategiy a bit:
- If a normal allocation fails, try to get the reserve
- If a reserve allocation fails, re-use the oldest-queued entry for storage
- If there's a weird race/failure and nothing on the queue to steal, return NULL
This addresses two main issues in the old code:
- If devd had died, and we're generating a lot of messages, we have an
unbounded leak. This new scheme avoids the issue that lead to this.
- The MPASS that was 'sure' the allocation couldn't have failed turned out
to be wrong in some rare cases. The new code doesn't make this assumption.
Since we reserve only 2% of the space, we go from about 1MB of
allocation all the time to more like 50kB for the reserve.
Reviewed by: markj@
Differential Revision: https://reviews.freebsd.org/D26448
Since r347532 (merged to stable/12) we only count user-wired pages
towards the system limit. However, we now also treat pages wired by
hypervisors (bhyve and virtualbox) as user-wired, so starting VMs with
large amounts of RAM tends to fail due to the low limit.
The purpose of the limit is to provide a seatbelt, not to impose some
policy on the use of wired memory. Thus, increase the default limit to
allow reasonable VM configurations to work without tuning.
Reviewed by: kib
Discussed with: dougm
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26424
This is followup to r365477.
If pre-formatted device has GPT and a partition covering
last available LBAs and the device is attached using
a bridge reducing amount of LBAs, then it could be not enough
forcing GEOM to use primary GPT. Also, we should make it possible
to recover GPT and this requires either deleting or resizing the partition.
This change enables "gpart delete" and "gpart resize" commands
on corrupted GPT with following "gpart recover".
It still does not allow modifying corrupted GPT without
preliminary setting sysctl kern.geom.part.check_integrity=0
For example:
# gpart show da0
=> 34 3906963389 da0 GPT (1.8T) [CORRUPT]
34 262144 1 ms-reserved (128M)
262178 2014 - free - (1.0M)
264192 3906764943 2 freebsd-swap (1.8T)
# gpart resize -i 2 -s 3900000000 da0
# gpart recover da0
Reported by: Alex Korchmar
MFC after: 3 days
Both before and after job control adjustments.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26416
Orphans affect job control state, we must account for them when
changing pg_jobc.
Instead of p_pptr, use proc_realparent() to get parent relevant for
job control.
Use correct calculation of the parent for exiting process. For jobc
purposes, we must use realparent, but if it is also exiting, we should
fall to reaper, then recursively find non-exiting reaper.
Reported by: trasz
PR: 249257
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26416
Use ddb pager.
Make lines more compact.
Eliminate unneeded casts.
Print more job-control related info when reporting process group.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26416
Split TMPFS_NODE_ACCCESSED bit into dedicated byte that can be updated
atomically without locks or (locked) atomics.
tn_update_getattr() change also contains unrelated bug fix.
Reported by: lwhsu
PR: 249362
Reviewed by: markj (previous version)
Discussed with: mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26451
As with .text, the aim is to ensure that executable sections are
segregated from the rest, to avoid creation of writeable and executable
mappings. Recent versions of LLVM emit a PLT in firmware modules.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26444
Recent testing of the NFS-over-TLS code found a LOR between the mutex lock
used for sessions and the sleep lock used for server side krpc socket
structures.
The code in nfsrv_checksequence() would call SVC_RELEASE() with the mutex
held. Normally this is ok, since all that happens is SVC_RELEASE()
decrements a reference count. However, if the socket has just been shut
down, SVC_RELEASE() drops the reference count to 0 and acquires a sleep
lock during destruction of the server side krpc structure.
This patch fixes the problem by moving the SVC_RELEASE() call in
nfsrv_checksequence() down a few lines to below where the mutex is released.
MFC after: 1 week
Or it could be explained as lockless (for vnode lock) reads. Reads
are performed from the node tn_obj object. Tmpfs regular vnode object
lifecycle is significantly different from the normal OBJT_VNODE: it is
alive as far as ref_count > 0.
Ensure liveness of the tmpfs VREG node and consequently v_object
inside VOP_READ_PGCACHE by referencing tmpfs node in tmpfs_open().
Provide custom tmpfs fo_close() method on file, to ensure that close
is paired with open.
Add tmpfs VOP_READ_PGCACHE that takes advantage of all tmpfs quirks.
It is quite cheap in code size sense to support page-ins for read for
tmpfs even if we do not own tmpfs vnode lock. Also, we can handle
holes in tmpfs node without additional efforts, and do not have
limitation of the transfer size.
Reviewed by: markj
Discussed with and benchmarked by: mjg (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346
Avoid tmpfs mount and node locks when ref count is greater than zero,
which is the case until node is being destroyed by unlink or unmount.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346
There are several negative side-effects of not calling into VOP layer
at all for page cache reads. The biggest is the missed activation of
EVFILT_READ knotes.
Also, it allows filesystem to make more fine grained decision to
refuse read from page cache.
Keep VIRF_PGREAD flag around, it is still useful for nullfs, and for
asserts.
Reviewed by: markj
Tested by: pho
Discussed with: mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346
The pointer to vnode is already stored into f_vnode, so f_data can be
reused. Fix all found users of f_data for DTYPE_VNODE.
Provide finit_vnode() helper to initialize file of DTYPE_VNODE type.
Reviewed by: markj (previous version)
Discussed with: freqlabs (openzfs chunk)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346
From Franco:
The iflib rewrite forced the promisc flag but it was not reported
to the system. Noticed on a stock VM that went into unsolicited
promisc mode when dhclient was started during bootup.
PR: 248869
Submitted by: Franco Fichtner <franco@opnsense.org>
Reviewed by: erj@
MFC after: 3 days
This re-adds the opt_rss.h header to the driver and includes some
RSS-specific headers when RSS is defined.
PR: 249191
Submitted by: Milosz Kaniewski <milosz.kaniewski@gmail.com>
MFC after: 3 days
Due to a check that should have been an endian check being an #if 0,
the wrong checksum mask table was being used on LE, which was causing
extreme strangeness in DNS resolution -- *some* hosts would be resolvable,
but most would not.
This fixes DNS resolution.
(I am committing some parts of the LE patchset ahead of time to reduce the
amount of work I have to do while committing the main patchset.)
Sponsored by: Tag1 Consulting, Inc.
Intercept and report #UD to VM on SVM/AMD in case VM tried to execute an
SVM instruction. Otherwise, SVM allows execution of them, and instructions
operate on host physical addresses despite being executed in guest mode.
Reported by: Maxime Villard <max@m00nbsd.net>
admbug: 972
CVE: CVE-2020-7467
Reviewed by: grehan, markj
Differential revision: https://reviews.freebsd.org/D26313
This function wasn't converted to use the new locking protocol in
r333744. Make it use the PCB lock for synchronizing connection state.
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26300
unp_pcb_owned_lock2() has some sharp edges and forces callers to deal
with a bunch of cases. Simplify it:
- Rename to unp_pcb_lock_peer().
- Return the connected peer instead of forcing callers to load it
beforehand.
- Handle self-connected sockets.
- In unp_connectat(), just lock the accept socket directly. It should
not be possible for the nascent socket to participate in any other
lock orders.
- Get rid of connect_internal(). It does not provide any useful
checking anymore.
- Block in unp_connectat() when a different thread is concurrently
attempting to lock both sides of a connection. This provides simpler
semantics for callers of unp_pcb_lock_peer().
- Make unp_connectat() return EISCONN if the socket is already
connected. This fixes a race[1] when multiple threads attempt to
connect() to different addresses using the same datagram socket.
Upper layers will disconnect a connected datagram socket before
calling the protocol connect's method, but there is no synchronization
between this and protocol-layer code.
Reported by: syzkaller [1]
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26299
The allocated memory is only required for SOCK_STREAM and SOCK_SEQPACKET
sockets.
Reviewed by: kevans
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26298
In all cases, PCBs are unlocked after unp_disconnect() returns. Since
unp_disconnect() may release the last PCB reference, callers may have to
bump the refcount before the call just so that they can release them
again.
Change unp_disconnect() to release PCB locks as well as connection
references; this lets us remove several refcount manipulations. Tighten
assertions.
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26297
unp_pcb_lock_pair() seems like a better name. Also make it handle the
case where the two sockets are the same instead of making callers do it.
No functional change intended.
Reviewed by: glebius, kevans, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26296
- Use refcount_init().
- Define an INVARIANTS-only zone destructor to assert that various
bits of PCB state aren't left dangling.
- Annotate unp_pcb_rele() with __result_use_check.
- Simplify control flow.
Reviewed by: glebius, kevans, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26295
- Define a locking key for unpcb members.
- Rewrite some of the locking protocol description to make it less
verbose and avoid referencing some subroutines which will be renamed.
- Reorder includes.
Reviewed by: glebius, kevans, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26294
It's included by header pollution in most of the compile
environments. However, in the standalone envirnment, it's not
included. Go ahead and include it always since the overhead is low and
it is simpler that way.
MFC After: 3 days
When we bring in geli into the boot loader, we are single threaded so
we don't have to worry about locking. We have no mutexes, and don't need
to use them, so comment it out.
MFC After: 3 days
We don't need to do the busy dance for this driver. It's handled by
destroy_dev() entirely. Since all we did was busy/unbusy in
open/close, just delete them. We therefore don't need to track closes
either.
Reviewed by: ian@
Differential Revision: https://reviews.freebsd.org/D26431
in stand.h typically, but when this is included we can define it
multiple times. However, we don't define bool in stand.h at the
moment, so allow it to be defined inside types.h when we're building
for the standalone environment.
MFC After: 3 days
vendor ID string to say just "Microchip Technology" -- the buyout of
Standard Microsystems happened in 2012 and the SMC/SMSC names are pretty
much retired at this point.
PR: 241406
Use MACHINE_CPUARCH with arm64 (aarch64) when we build code that could run
on any 64-bit Arm instruction set. This will simplify checks in downstream
consumers targeting prototype instruction sets.
The only place we check for MACHINE_ARCH == aarch64 is when building the
device tree blobs. As these are targeting current generation ISAs.
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D26370
On ibm,extended-clock-frequency, ensure we be64toh() the value.
On clock-frequency, remove the right-shifting hack (which was needed due to
reading a 32 bit value into a 64 bit variable) and switch to OF_getencprop()
for reading (which will handle endian conversion internally.)
Reviewed by: jhibbits (in irc)
Sponsored by: Tag1 Consulting, Inc.
There are a couple of places in the tree that directly parse the newvers.sh
script looking for the BRANCH variable. I found two locations, one in
release/Makefile and the other in bin/freebsd-version/Makefile.
While there is a good argument that BRANCH_OVERRIDE should properly
propagate in those circumstances and the new behavior is thus better, the
reality is this change broke freebsd-update's ability to find timestamps in
binaries and resulted in a large number of gratuitous changes.
Reported by: freebsd-update
Discussed with: cperciva
MFC after: 1 day
for it to sit in the syscall fast path.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D26368
Add enough infrastructure for interrupts on children of the pl061 GPIO
controller. As gpiobus already provided these the pl061 driver also needs
to pass requests up the newbus hierarchy.
Currently there are no children that expect to configure interrupts, however
this is expected to change to support the ACPI Event Information interface.
Sponsored by: Innovate UK
that can be extended, but also ensure compile-time type checking. Refactor
common code out of arch-specific implementations. Move the mpr and mps
drivers to this new API. The template type remains visible to the consumer
so that it can be allocated on the stack, but should be considered opaque.
The change in D26397 will need a __FreeBSD_version to base off of for
bootstrapping crunchgen, to avoid avoidable build failures just because the
host has an outdated crunchgen.
asomers@ reported a crash on an NFSv4.0 server with a backtrace of:
kdb_backtrace
vpanic
panic
nfsrv_docallback
nfsrv_checkgetattr
nfsrvd_getattr
nfsrvd_dorpc
nfssvc_program
svc_run_internal
svc_thread_start
fork_exit
fork_trampoline
where the panic message was "docallb", which indicates that a callback
was attempted when the ClientID is unconfirmed.
This would not normally occur, but it is possible to have an unconfirmed
ClientID structure with delegation structure(s) chained off it if the
client were to issue a SetClientID with the same "id" but different
"verifier" after acquiring delegations on the previously confirmed ClientID.
The bug appears to be that nfsrv_checkgetattr() failed to check for
this uncommon case of an unconfirmed ClientID with a delegation structure
that no longer refers to a delegation the client knows about.
This patch adds a check for this case, handling it as if no delegation
exists, which is the case when the above occurs.
Although difficult to reproduce, this change should avoid the panic().
PR: 249127
Reported by: asomers
Reviewed by: asomers
MFC after: 1 week
Differential Revision: https://reviews.freebbsd.org/D26342
To make it easier to work with this in the future, convert to c99
designated initializer syntax.
Tested on powerpc, powerpc64, and powerpc64le. No functional change.
Sponsored by: Tag1 Consulting, Inc.
The intention of the bus_be naming was for those to be the no-endian-swapping
and for the bus_le to be endian-swapping in all the functions.
This naming breaks down when we're actually are running in LE and need to
use the opposite sense.
As such, rename bs_be_* to native_bs_* and rename bs_le_* to swapped_bs_*.
No functional change.
Sponsored by: Tag1 Consulting, Inc.
Swap the BE and LE bus_space tags when on LE, and adjust the nexus tag
to match.
This is prep for a a followup that makes the powerpc bus_space macros easier
to maintain in the future.
Sponsored by: Tag1 Consulting, Inc.
Implement pmap_mincore() for moea64.
This will need some slight tweaks when large page support in HPT lands.
Submitted by: Fernando Eckhardt Valle <fernando.valle@eldorado.org.br>
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D26314
Fixes for Raspberry Pi 4B PCIe / USB:
- Pass through a DMA tag for the controller.
- In theory the controller can access the lower 3 GB, but testing found
that unreliable. OpenBSD also restricts DMA to the lowest 960 MiB.
- Rename some constants to be a bit more meaningful.
Submitted by: Robert Crowston, crowston at protonmail.com
Reviewed by: mkarels, outside reviewers
Differential Revision: https://reviews.freebsd.org/D26344
For configurations without x2APIC support (guests, older hardware), the global
LAPIC MMIO mapping will trigger false-positive KCSan reports as it will appear
that multiple CPUs are concurrently reading and writing the same address.
This isn't actually true, as the underlying physical access will be performed
on the local CPU's APIC. Additionally, because LAPIC access can happen during
event timer configuration, the resulting KCSan printf can produce a panic due
to attempted recursion on event timer resources.
Add a __nosanitizethread preprocessor define to prevent the compiler from
inserting TSan hooks, and apply it to the x86 LAPIC accessors.
PR: 249149
Reported by: gbe
Reviewed by: andrew, kib
Tested by: gbe
Differential Revision: https://reviews.freebsd.org/D26354
This update adds support for:
HW VLAN tagging
HW checksum offload for IPv4 and IPv6
tx and rx aggreegation (for full gige speeds)
multiple transactions
In my testing, I am able to get 900-950Mbps depending upon
TCP or UDP, which is a significant improvement over the previous
91Mbps (~8kint/sec*1500bytes/packet*1packet/int).
Reviewed by: hselasky
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D25809
freebsd-update(8) builds, where BRANCH is suffixed with -p0 for
builds.
Noticed by: gordon
With help from: cperciva
MFC after: 3 days
MFC note: before 12.2-BETA2
Sponsored by: Rubicon Communications, LLC (netgate.com)
In r365419 ieee80211_media_change() callers were updated to not longer
act on the obselete ENETRESET return code.
While in the old days iwm has done a stop/init cycle in these cases,
this was not executed since r193340.
As a consequence simplify iwm code as well by passing ieee80211_media_change()
right to ieee80211_vap_attach() as there is no more need for a local
implementation.
Reported by: Tomoaki AOKI (junchoon dec.sakura.ne.jp)
Tested by: Tomoaki AOKI (junchoon dec.sakura.ne.jp)
MFC after: 3 days
X-MFC: fix is already in stable/12
PR: 248955
Return values are passed in a0, so read it from there. We also pass a1 through
to userspace, as the ABI allows small structs to be returned in registers
a0/a1. While here read the register values directly from the trapframe rather
than rtval, and remove the now unneeded argument from dtrace_invop().
Set fbtp_roffset so that we get the correct return location in arg0.
Reviewed by: markj
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D26389
Summary of changes:
- Assorted bug fixes
- Support for newer versions of the device firmware
- Suspend/resume support
- Support for Lenient Link Mode for E82X devices (e.g. can try to link with
SFP/QSFP modules with bad EEPROMs)
- Adds port-level rx_discards sysctl, similar to ixl(4)'s
This version of the driver is intended to be used with DDP package 1.3.16.0,
which has already been updated in a previous commit.
Tested by: Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after: 3 days
MFC with: r365332, r365550
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D26322
On write with SHM_GROW_ON_WRITE, use proper truncate.
Do not allow to grow largepage shm if F_SEAL_GROW is set. Note that
shrinks are not supported at all due to unmanaged mappings.
Call to vm_pager_update_writecount() is only valid for swap objects,
skip it for unmanaged largepages.
Largepages cannot support write sealing.
Do not writecnt largepage mappings.
Reported by: kevans
Reviewed by: kevans, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26394
* Add LOAD_LR_NIA define. This is preferred to "bl 1f; 1:" because it
doesn't pollute the branch predictor.
* Add magic sequence to return the CPU to the correct endianness after
jumping to cross-endian code, similar to the sequence from Linux.
Sponsored by: Tag1 Consulting, Inc.
As the pl061 driver can be an interrupt controller attach it earlier in the
boot so other drivers can use it.
Use a new GPIO xref to not conflict with the existing root interrupt
controller.
Sponsored by: Innovate UK
It could be used in various IOMMU platforms, not only DMAR.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26373
On arm64 we may boot via ACPI. In this case we will still try to manage the
gpio providers as if we are using FDT. Fix this by checking if the FDT node
is valid before registering a cross reference.
Sponsored by: Innovate UK
There were multiple bugs in the OPAL RTC code which had never been
discovered, as the default configuration of OPAL machines is to
have the BMC / FSP control the RTC.
* Fix calling convention for setting the time -- the variables are passed
directly in CPU registers, not via memory.
* Fix bug in the bcd encoding routines. (from jhibbits)
Tested on POWER9 Talos II (BE) and POWER9 Blackbird (LE).
Reviewed by: jhibbits (in irc)
Sponsored by: Tag1 Consulting, Inc.
During the DCTCP improvements, use of CCF_DELACK was
removed. This change is just to rename the unused flag
bit to prevent use of it, without also re-implementing
the tcp_input and tcp_output interfaces.
No functional change.
Reviewed by: chengc_netapp.com, tuexen
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D26181
Created with shm_open2(SHM_LARGEPAGE) and then configured with
FIOSSHMLPGCNF ioctl, largepages posix shared memory objects guarantee
that all userspace mappings of it are served by superpage non-managed
mappings.
Only amd64 for now, both 2M and 1G superpages can be requested, the
later requires CPU feature.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24652
Currently kernel requests deletion for the certain routes with specified gateway,
but this gateway is not actually checked. With multipath routes, internal gateway
checking becomes mandatory. Add the logic performing this check.
Generalise RTF_PINNED routes to the generic route priorities, simplifying the logic.
Add lookup_prefix() function to perform exact match search based on data in @info.
Differential Revision: https://reviews.freebsd.org/D26356
The entries and their clip boundaries must be aligned on supported
superpages sizes from pagesizes[]. vm_map operations return Mach
error KERN_INVALID_ARGUMENT, which is usually translated to EINVAL, if
it would require clip not at the boundary.
In other words, entries force preserving virtual addresses superpage
properties.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24652
The flag requests entry of non-managed superpage mapping of size
pagesizes[psind] into the page table.
Pmap supports fake wiring of the largepage mappings. Only attributes
of the largepage mapping can be changed by calling pmap_enter(9) over
existing mapping, physical address of the page must be unchanged.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24652
purpose of epoch_trace() and for calling subsequent panic, but to keep
code fully under INVARIANTS, so don't use bare function call to panic().
However, at the last stage of review a true value slipped in, while
always false was assumed. I checked that in email archive with kib@.
Noticed by: trasz
stacks with a SENT_FIN outstanding. Both rack and bbr will only send a
FIN if all data is ack'd so this must be enforced. Also if the previous stack
sent the FIN we need to make sure in rack that when we manufacture the
"unknown" sends that we include the proper HAS_FIN bits.
Note for BBR we take a simpler approach and just refuse to switch.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D26269
Add support for user-supplied callbacks into phys pager operations,
providing custom getpages(), haspage(), and populate() methods
implementations. Pager stores user data ptr/val in the object to
provide context.
Add phys_pager_allocate() helper that takes user ops table as one of
the arguments.
Current code for these methods is moved to the 'default' ops table,
assigned automatically when vm_pager_alloc() is used.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24652
When emulating a single thread system for testing reasons, mp_maxid can
be 0. This trips up our math for calculating the order.
Account for this to fix xive attachment when emulating a single-thread
core on qemu powernv (a configuration that doesn't exist in the real world.)
Sponsored by: Tag1 Consulting, Inc.
Future changes would require additional initialization of OBJT_PHYS
objects, and vm_object_allocate() is not suitable for it.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24652
When doing a build for a modern CPUTYPE, llvm will throw errors if obsolete
instructions are used, even if they will never run due to runtime checks.
Hiding the dssall instruction from the assembler fixes kernel build when
overriding CPUTYPE, without having any effect on the generated binary.
This has been in my local tree for over a year and is well tested across
a variety of machines.
Sponsored by: Tag1 Consulting, Inc.
When enabling an interrupt, assert that we do in fact have a root PIC.
This would have saved me some debugging effort.
Sponsored by: Tag1 Consulting, Inc.
The cryptodev_process() method should either return 0 if it has
completed a request, or ERESTART to defer the request until later. If
a request encounters an error, the error should be reported via
crp_etype before completing the request via crypto_done().
Fix a few more drivers noticed by asomers@ similar to the fix in
r365389. This is an old bug, but went unnoticed since crypto requests
did not start failing as a normal part of operation until digest
verification was introduced which can fail requests with EBADMSG.
PR: 247986
Reported by: asomers
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D26361
There are multiple USB/SATA bridges on the market that unconditionally
cut some LBAs off connected media. This could be a problem
for pre-partitioned drives so GEOM complains and does not create
devices in /dev for slices/partitions preventing access to existing data.
We have kern.geom.part.check_integrity that allows us to correct
partitioning if changed from default 1 to 0 but it works for MBR only.
If backup copy of GPT is unavailable due to decreases number of LBAs,
kernel still does not give access to partitions and prints to dmesg:
GEOM: md0: corrupt or invalid GPT detected.
GEOM: md0: GPT rejected -- may not be recoverable.
This change makes it work for GPT too, so it created partitions in /dev
and prints to dmesg this instead:
GEOM: md0: the secondary GPT table is corrupt or invalid.
GEOM: md0: using the primary only -- recovery suggested.
Then "gpart recover" re-created backup copy of GPT
and allows further manipulations with partitions.
This change is no-op for default configuration having
kern.geom.part.check_integrity=1
Reported by: Alex Korchmar
MFC after: 3 days.
Doing it this way eliminates the assumption about homogeneous support
for the feature, since HWCAP values are only set if support is present
on all CPUs.
Reviewed by: tuexen, markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26032
Expose some of the new HWCAP features added in r65304. This includes the
addition of elf_hwcap2 into the sysvec, and a separate function to parse
for those features.
This only exposes features which require no further configuration, e.g.
indicating the presence of certain instructions. Larger features (SVE)
will not be advertised until we actually support them. The exact list of
features/extensions this patch exposes is:
- ARMv8.0-DGH
- ARMv8.0-SB
- ARMv8.2-BF16
- ARMv8.2-DCCVADP
- ARMv8.2-I8MM
- ARMv8.4-LRCPC
- ARMv8.5-CondM
- ARMv8.5-FRINT
- ARMv8.5-RNG
- PSTATE.SSBS
While here, annotate elf_hwcap and elf_hwcap2 as __read_frequently, and
move the declarations to the machine/md_var.h header.
Submitted by: mikael@ (D22314 portion)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26031
Differential Revision: https://reviews.freebsd.org/D22314
FreeBSD exports CPU features as bits in the AT_HWCAP and AT_HWCAP2
vectors via elf_aux_info(3). This interface is similar to getauxval(3)
on Linux, and for simplicity to consumers we try to maintain an
identical set of feature flags on arm64.
The first batch of AT_HWCAP flags were added in r350166, corresponding
to definitions that already existed in Linux. Unfortunately, one flag
was missed, and a portion of the values are shifted one bit to the right
as a result.
Add the missing definition for HWCAP_ASIMDHP, and adjust the affected
values to match their Linux counterparts.
Although this is an ABI-breaking change, there is no plan to provide
compat code for old binaries. An audit of our ports tree and other
software via Debian code search indicates that there are not yet any
consumers of this interface for FreeBSD/arm64.
Bump __FreeBSD_version to be on the safe side, in case compat code needs
to be added in the future.
Reviewed by: emaste, manu
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26329
There's a race where dying vnets move their interfaces back to their original
vnet, and if_epair cleanup (where deleting one interface also deletes the other
end of the epair). This is commonly triggered by the pf tests, but also by
cleanup of vnet jails.
As we've not yet been able to fix the root cause of the issue work around the
panic by not dereferencing a NULL softc in epair_qflush() and by not
re-attaching DYING interfaces.
This isn't a full fix, but makes a very common panic far less likely.
PR: 244703, 238870
Reviewed by: lutz_donnerhacke.de
MFC after: 4 days
Differential Revision: https://reviews.freebsd.org/D26324
This option was marked as broken because our riscv64-xtoolchain-gcc
package lacked support. Since we are moving away from xtoolchain gcc in
favor of freebsd-gcc9, there should be no issue in enabling this option
by default.
Notably, this enables -Wformat errors.
Reviewed by: kp, jhb
Differential Revision: https://reviews.freebsd.org/D26320
RISC-V is currently built with -Wno-format, which is how these went
undetected. Address them now before re-enabling those warnings.
Differential Revision: https://reviews.freebsd.org/D26319
A PL061 is a simple 8 pin GPIO controller. This GPIO device is used to
signal an internal request for shutdown on some virtual machines including
Arm-based Amazon EC2 instances.
Submitted by: Ali Saidi <alisaidi_amazon.com> (previouss version)
Reviewed by: Ali Saidi, manu
Differential Revision: https://reviews.freebsd.org/D24065