This was announced to happen after the 12 relases.
Remove a depeciated ABI.
The complete removal is for HEAD only. I'll remove the #define in
stable/13 as MFC, so the code will still exist in 13.x, but will not
included by default. Earlier versions will not be affected.
Reviewed by: kp
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D28518
Previously the flags were passed as-is, which could resulted
in spurious EAGAIN returned for non-blocking sockets, which
broke some Steam games.
PR: 248065
Reported By: Alex S <iwtcex@gmail.com>
Tested By: Alex S <iwtcex@gmail.com>
Reviewed By: emaste
MFC After: 3 days
Sponsored By: The FreeBSD Foundation
This is the first patch of a series of necessary steps
to make ng_bridge(4) multithreaded.
Reviewed by: melifaro (network), afedorov
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D28125
Handling of unknown MACs on an bridge with incomplete learning
capabilites (aka uplink ports) can be defined in different ways.
The classical approach is to broadcast unicast frames send to an
unknown MAC, because the unknown devices can be everywhere. This mode
is default for ng_bridge(4).
In the case of dedicated uplink ports, which prohibit learning of MAC
addresses in order to save memory and CPU cycles, the broadcast
approach is dangerous. All traffic to the uplink port is broadcasted
to every downlink port, too. In this case, it's better to restrict the
distribution of frames to unknown MAC to the uplink ports only.
In order to keep the chance small and the handling as natural as
possible, the first attached link is used to determine the behaviour
of the bridge: If it is an "uplink" port, then the bridge switch from
classical mode to restricted mode.
Reviewed By: kp
Approved by: kp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D28487
The ng_bridge(4) node is designed to work in moderately small
environments. Connecting such a node to a larger network rapidly fills
the MAC table for no reason. It even become complicated to obtain data
from the gettable message, because the result is too large to
transmit.
This patch introduces, two new functionality bits on the hooks:
- Allow or disallow MAC address learning for incoming patckets.
- Allow or disallow sending unknown MACs through this hook.
Uplinks are characterized by denied learing while sending out
unknowns. Normal links are charaterized by allowed learning and
sending out unknowns.
Reviewed by: kp
Approved by: kp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D23963
The ipfilter NAT table host map size is a tunable that defaults to
a macro value defined at build time. HOSTMAP_SIZE is saved in softn
(the ipnat softc) at initialization. It can be tuned (changed) at runtime
using the ipf -T command. If the hostmap_size tunable is adjusted the
calculation to determine where to put new entries in the table was
incorrect. Use the tunable in the NAT softc instead of the static build
time value.
MFC after: 1 week
lang/rust needs COMPAT_FREEBSD11 to build, even though powerpc64le itself is supported only since 13.0.
I also corrected a comment, because if we ever have lib32 for powerpc64le, it will be for powerpcle.
Reviewed by: bdragon (on IRC)
Examples of inconsistencies with the current state:
- references LRU of all entries, removed years ago
- references a non-existent lock (neglist)
- claims negative entries have a NULL target
It will be replaced with a more accurate and more informative
description.
In the meantime take it out so it stops misleading.
On arm64 we can select how strongly we order device memory. Currently
we use the strongest type of non-Gathering, non-Reordering, no Early
write acknowledgement. This is equivalent to VM_MEMATTR_SO in the 32-bit
arm code.
Create a new memory type to remove the no Early write acknowledgement
option to create a memory attribute that is equivalent to the arm
VM_MEMATTR_DEVICE.
Keep the the old nGnRnE memory as what we provide for VM_MEMATTR_DEVICE
until we can test nGnRE on more hardware. A method for dynamically
switching back may be needed as at least one vendor is known to have
broken nGnRE memory.
Sponsored by: Innovate UK
FreeBSD pvscsi and vmx work with VMware ESXi Arm "Fling"; provide these
in GENERIC for a convenient out-of-the-box experience.
PR: 253202
Reported by: Vincent Milum Jr
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
X700 family of controllers has limited number of available VLAN
HW filters. Driver did not handle properly a case when user
assigned more VLANs to the interface which had all filters
already in use. Fix that by disabling HW filtering when
it is impossible to create filters for all requested VLANs.
Keep track of registered VLANs using bitstring to be able
to re-enable HW filtering when number of requested VLANs
drops below the limit.
Also switch all allocations to use M_IXL malloc type
to ease detecting memory leaks in the driver.
Reviewed by: erj
Tested by: gowtham.kumar.ks@intel.com
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28137
When OFED was upgraded to Linux v4.9, a bunch of Linux-specific
netlink changes were dropped. Unfortunately, there was a mismerge
in this process and as a result ib_sa_cancel_query() would fail to
cancel an outstanding MAD.
This was causing rdma_destroy_id() to hang indefinitely waiting
for the MAD to complete and release the final reference.
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D28421
Reviewed by: hselasky, kib
MFC after: 2 months
Consider the following scenario:
1. A delayed_work struct in the WORK_ST_TIMER state.
2. Thread A calls mod_delayed_work()
3. Thread B (a callout thread) simultaneously calls
linux_delayed_work_timer_fn()
The following sequence of events is possible:
A: Call linux_cancel_delayed_work()
A: Change state from TIMER TO CANCEL
B: Change state from CANCEL to TASK
B: taskqueue_enqueue() the task
A: taskqueue_cancel() the task
A: Call linux_queue_delayed_work_on(). This is a no-op because the
state is WORK_ST_TASK.
As a result, the delayed_work struct will never be invoked. This is
causing address resolution in ib_addr.c to stop permanently, as it
never tries to reschedule a task that it thinks is already scheduled.
Fix this by introducing locking into the cancel path (which
corresponds with the lock held while the callout runs). This will
prevent the callout from changing the state of the task until the
cancel is complete, preventing the race.
Differential Revision: https://reviews.freebsd.org/D28420
Reviewed by: hselasky
MFC after: 2 months
Submitted by: Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by: luporl, alfredo, kadesai (on email)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26531
Add endiannes conversions in order to support big-endian platforms
Submitted by: Andre Fernando da Silva <andre.silva@eldorado.org.br>
Reviewed by: luporl, alfredo, kadesai (on email)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D26531
If a M_WAITOK contig alloc fails, the VM subsystem will try to
reclaim contiguous memory twice before actually failing the
request. On a system with 64GB of RAM I've observed this take
400-500ms before it finally gives up, and I believe that this
will only be worse on systems with even more memory.
In certain contexts this delay is extremely harmful, so add a flag
that will skip reclaim for allocation requests to allow those
paths to opt-out of doing an expensive reclaim.
Sponsored by: Dell Inc
Differential Revision: https://reviews.freebsd.org/D28422
Reviewed by: markj, kib
- limit maximum segment size to 2048 bytes. Although dwmmc supports a buffer
fragment with a maximum length of 4095 bytes, use the nearest lower power
of two as the maximum fragment size. Otherwise, busdma create excessive
buffer fragments.
- fix off by one error in computation of the maximum data transfer length.
- in addition, reserve two DMA descriptors that can be used by busdma
bouncing. The beginning or end of the buffer can be misaligned.
- Don’t ignore errors passed to bus_dmamap_load() callback function.
- In theory, a DMA engine may be running at time when next dma descriptor is
constructed. Create a full DMA descriptor before OWN bit is set.
MFC after: 2 weeks
The lock around callout_drain() is unnecessary and may cause
deadlock when one closes a timer descriptor during timer execution.
Reviewed By: delphij
Submitted By: ankohuu_outlook.com (Shunchao Hu)
Differential Revision: https://reviews.freebsd.org/D28148
This reverts commit 710e45c4b8.
It breaks for some corner cases on big endian ppc64.
Given the stage of the release process it is best to revert for now.
Reported by: jhibbits
On Linux, read(2) from a timerfd file descriptor returns an unsigned
8-byte integer (uint64_t) containing the number of expirations
that have occurred, if the timer has already expired one or more
times since its settings were last modified using timerfd_settime(),
or since the last successful read(2). That's to say, once we do
a read or call timerfd_settime(), timer fd's expiration count should
be zero. Some Linux applications create timerfd and add it to epoll
with LT mode, when event comes, they do timerfd_settime instead
of read to stop event source from trigger. On FreeBSD,
timerfd_settime(2) didn't set the count to zero, which caused high
CPU utilization.
Submitted by: ankohuu_outlook.com (Shunchao Hu)
Differential Revision: https://reviews.freebsd.org/D28231
This makes roundup2/rounddown2 type- and const-preserving and allows
using it on pointer types without casting to uintptr_t first. Not
performing pointer-to-integer conversions also helps the compiler's
optimization passes and can therefore result in better code generation.
When using it with integer values there should be no change other than
the compiler checking that the alignment value is a valid power-of-two.
I originally implemented these builtins for CHERI a few years ago and
they have been very useful for CheriBSD. However, they are also useful
for non-CHERI code so I was able to upstream them for Clang 10.0.
Rationale from the clang documentation:
Clang provides builtins to support checking and adjusting alignment
of pointers and integers. These builtins can be used to avoid relying
on implementation-defined behavior of arithmetic on integers derived
from pointers. Additionally, these builtins retain type information
and, unlike bitwise arithmetic, they can perform semantic checking on
the alignment value.
There is also a feature request for GCC, so GCC may also support it in
the future: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98641
Reviewed By: brooks, jhb, imp
Differential Revision: https://reviews.freebsd.org/D28332
The RW fields in this register reset to architecturally unknown values,
so initialize these to the proper rounding and denormal mode.
MFC after: 1 week
This fixes an issue where a private key contained bits that should
have been cleared by the clamping process, but were passed through
to the scalar multiplication routine and resulted in an invalid
public key.
Issue diagnosed (and an initial fix proposed) by shamaz.mazum in
PR 252894.
This fix suggested by Jason Donenfeld.
PR: 252894
Reported by: shamaz.mazum
Reviewed by: dch
MFC after: 3 days
ROUTE_MPATH was added to the GENERIC kernel in r368648.
According to the plan in D27428, it was enabled with `net.route.multipath` sysctl set to 0.
Given enough time has passed, this change enables route multipath by default.
The goal is to ship FreeBSD 13 with multipath turned on.
Reviewed By: donner, olivier
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28423
kern.proc.proc_td returns the process table with an entry for each
thread. Previously the description included "no threads", presumably
a cut-and-pasteo in 2648efa621.
Description suggested by PauAmma.
PR: 253146
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
to be a true RFC 6598 NAT444 setup, where each network segment (e.g. user,
subnet) can have their own dedicated port aliasing ranges.
Reviewed by: donner, kp
Approved by: 0mp (mentor), donner, kp
Differential Revision: https://reviews.freebsd.org/D23450
DataSN for solicited Data-Out is per-R2T. Since we handle whole R2T
in one go, we don't need to store it anywhere, especially in global
per-command structure. This may allow us to handle multiple R2T per
command at once, if we decide, or may be relax locking.
Rename the second use of that field to io_referenced_task_tag.
MFC after: 1 month
Userspace has OFED build enabled for quite some time, but kernel modules
were not. This is useless config because any userspace IB code requires
kernel support. So enable modules build by default.
Move WITH_OFED to WITHOUT_OFED since defaults are now enabled.
Reviewed by: emaste, hselasky, kevans
MFC after: 3 days
Sponsored by: NVidia Networking / Mellanox Technologies
Differential Revision: https://reviews.freebsd.org/D28460
This avoids a SIGILL when running these tests on QEMU (which
defaults to a basic amd64 CPU without SSE4.2).
This commit also tests the table-based implementations in addition to
testing the hw-accelerated crc32 versions.
Reviewed By: cem, kib, markj
Differential Revision: https://reviews.freebsd.org/D28395
One common method of EOI'ing an interrupt at the IO-APIC level is to
switch the pin to edge triggering mode and then back into level mode.
That would cause the IRR bit to be cleared and thus further interrupts
to be injected. FreeBSD does indeed use that method if the IO-APIC EOI
register is not supported.
The bhyve IO-APIC emulation code didn't clear the IRR bit when doing
that switch, and was also missing acknowledging the IRR state when
trying to inject an interrupt in vioapic_send_intr.
Reviewed by: grehan
Differential revision: https://reviews.freebsd.org/D28238
After modifying a redirection entry only try to inject an interrupt if
the pin is in level mode, pins in edge mode shouldn't take into
account the line assert status as they are triggered by edge changes,
not the line status itself.
Reviewed by: grehan
Differential revision: https://reviews.freebsd.org/D28237
vioapic_send_intr does already check whether the pin is masked before
injecting the interrupt, there's no need to do it in vioapic_write
also.
No functional change intended.
Reviewed by: grehan
Differential revision: https://reviews.freebsd.org/D28236
The conscious decision was made not to perform any indentation or
whitespace cleanup while cleaning out old redunant #ifdefs. The
reason for this was to avoid confusing future readers of history and
diffs with cosmetic changes, making bisection of any possible bugs
introduced more difficult. This commit cleans up the whitespace
detritus left behind from the previous #ifdef cleanup commits.
MFC after: 1 week
In the old days when K&R C and STD C were each in use a workaround
(read hack) was required to allow the same code to work on each
without modification. All C compilers support STD C. We can finally
put the __P prototype to rest.
MFC after: 1 week
As we get started with no memory allocator, we set up static font data
for font passed by loader (if there is any). At this time, we also must
set refcount 1, and refcount will get incremented in cnprobe() callback.
At some point the memory allocator will be available, and we will set up
properly allocated font data, but we should not disturb the refcount.
PR: 253147
Update zfs_config.h to match latest merge in FreeBSD
The version string is declared as 2.0.0-FreeBSD_gf11b09dec to provide
more information about the loaded module:
- the OpenZFS version in base is 2.0
- we are using the in tree-module ("FreeBSD")
- the last merged OpenZFS git revision ("gf11b09dec")
With future merges the git revision tag should be updated.
As we are merging from OpenZFS master branch and already include features
like dRAID, referencing patchlevel releases (2.0.1, 2.0.2) is pointless.
Reviewed by: freqlabs
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28447
Memory and PCI resources are freed with no particular order. This could
cause use-after-frees when detaching following a failed attach. For
instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but
iflib_tqg_detach() attempts to access this array. Similarly, adapter
queues gets freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to
access adapter queues to free PCI resources.
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D27634
The existing implementation relies on each trap handler saving a normal
stack frame record, which is a waste of time and space when we're
already saving a trapframe to the stack. It's also wrong as it currently
saves LR not ELR.
Instead of patching it up, rewrite it based on the RISC-V implementation
with inspiration from the amd64 implementation for how to handle
vectored traps to provide an improved implementation. This includes
compressing the information down to one line like other architectures
rather than the highly-verbose old form that repeats itself by printing
LR and FP in one frame only to print them as PC and SP in the next. It
also includes printing out actually useful information about the traps
that occurred, though FAR is not saved in the trapframe so we cannot
print it (in general it can be clobbered between when the trap happened
and now), only ESR.
The AAPCS also allows the stack frame record to be located anywhere in
the frame, not just the top, so the caller's SP is not at a fixed offset
from the callee's FP like on almost all other architectures in
existence. This means there is no way to derive the caller's SP in the
unwinder, and so we have to drop that bit of (unused) state everywhere.
Reviewed by: jhb, markj
Differential Revision: https://reviews.freebsd.org/D28026