This would enqueue an event to send the gratuitous arp on a dying lagg
interface without any physical ports attached to it.
Apart from that, the taskqueue_drain() on lagg_clone_destroy() runs too
late, when the ifp data structure is already freed. Fix that too.
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
There are no alternatives defined, so there's no point in keeping them. Also,
they weren't around every inline asm block anyway. Without __GNUCLIKE_ASM
defined, the guarded functions return garbage.
Reported by: Andrew Thompson
Clang complains about the shift of (1 << 7) into a int8_t changing the value:
warning: implicit conversion from 'int' to 'int8_t' (aka 'signed char') changes
value from 128 to -128 [-Wconstant-conversion]
Squash this warning by forcing clang to see it as an unsigned bit.
This seems odd, given that it's still a conversion of 128->-128, but I'm
guessing the explicit unsigned attribute notifies clang that sign really doesn't
matter in this case.
Reported by: Mark Millard <markmi AT dsl-only DOT net>
MFC after: 2 weeks
Summary:
Clang throws the following warning in powerpc intr_machdep:
/usr/src/sys/powerpc/powerpc/intr_machdep.c:454:15: warning: comparison of
constant -1 with expression of type 'enum intr_trigger' is always false
[-Wtautological-constant-out-of-range-compare]
if (i->trig == -1)
~~~~~~~ ^ ~~
This may lead to legitimate problems with aggressive optimizations, if not now
then in the future. To avoid this, add a new enum, INTR_TRIG_INVALID, set to
-1, and use this new enumeration in these checks.
Test Plan: Compile test.
Reviewed By: jhb, kib
Differential Revision: https://reviews.freebsd.org/D9300
Summary:
atomic_fcmpset_*() is analogous to atomic_cmpset(), but saves off the read value
from the target memory location into the 'old' pointer in the case of failure.
Requested by: mjg
Differential Revision: https://reviews.freebsd.org/D9325
After some digging and looking at packet traces, it looks like the
sequence number allocation being done by net80211 doesn't meet
802.11-2012.
Specifically, group addressed frames (broadcast, multicast) have
sequence numbers allocated from a separate pool, even if they're
QoS frames.
This patch starts to try and address this, both on transmit and
receive.
* When receiving, don't throw away multicast frames for now.
It's sub-optimal, but until we correctly track group addressed
frames via another TID counter, this is the best we can do.
* When doing A-MPDU checks, don't include group addressed frames
in the sequence number checks.
* When transmitting, don't allocate group frame sequence numbers
from the TID, instead use the NONQOS TID for allocation.
This may fix iwn(4) 11n because I /think/ this was one of the
handful of places where ni_txseqs[] was being assigned /outside/
of the driver itself.
This however doesn't completely fix things - notably the way that
TID assignment versus WME assignment for driver hardware queues
will mess up multicast ordering. For example, if all multicast
QoS frames come from one sequence number space but they're
expected to obey the QoS value assigned, they'll end up in
different queues in the hardware and go out in different
orders.
I can't fix that right now and indeed fixing it will require some
pretty heavy lifting of both the WME<->TID QoS assignment, as well
as figuring out what the correct way for drivers to behave.
For example, both iwn(4) and ath(4) shouldn't put QoS multicast
traffic into the same output queue as aggregate traffic, because
the sequence numbers are all wrong. So perhaps the correct thing
to do there is ignore the WME/TID for QoS traffic and map it all
to the best effort queue or something, and ensure it doesn't
muck up the TID/blockack window tracking. However, I'm /pretty/
sure that is still going to happen.
.. maybe I should disable multicast QoS frames in general as well,
but I don't know what that'll do for whatever the current state
of 802.11s mesh support is.
Tested:
* STA mode, ath10k NIC
* AP mode, AR9344/AR9580 AP
* iperf tcp/udp tests with concurrent multicast QoS traffic.
Before this, iperfs would fail pretty quickly because the sending
AP would start sending out QoS multicast frames that would be
out of order from the rest of the TID traffic, causing the blockack
window to get way, way out of sync.
This now doesn't occur.
TODO:
* verify which QoS frames SHOULD be tagged as M_AMPDU_MPDU.
For example, QoS NULL frames shouldn't be tagged!
Reviewed by: avos
Differential Revision: https://reviews.freebsd.org/D9357
protection change.
On superpage promotion, x86 pmaps do not invalidate existing 4K
entries for the superpage range, because they are compatible with the
promoted 2/4M entry. But the invalidation on superpage removal or
protection change only did single INVLPG with the base address of the
superpage. This reliably flushed superpage TLB entry, and 4K entry
for the first page of the superpage, potentially leaving other 4K TLB
entries lingering. Do the invalidation of the whole superpage range
to correct the problem.
Note that the precise invalidation is done by x86 code for kernel_pmap
only, for user pmaps whole (per-AS) TLB is flushed. This made the bug
well hidden, because promotions of the kernel mappings require
specific load.
Reported and tested by: Jonathan Looney <jtl@netflix.com> (previous version)
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
convention for interfaces, because only one stf(4) interface can exist
in the system.
This disallow the use of unit numbers different than 0, however, it is
possible to create the clone without specify the unit number (wildcard).
In the wildcard case we must update the interface name before return.
This fix an infinite recursion in pf code that keeps track of network
interfaces and groups:
1 - a group for the cloned type of the interface is added (stf in this
case);
2 - the system will now try to add an interface named stf (instead of
stf0) to stf group;
3 - when pfi_kif_attach() tries to search for an already existing 'stf'
interface, the 'stf' group is returned and thus the group is added
as an interface of itself;
This will now cause a crash at the first attempt to traverse the groups
which the stf interface belongs (which loops over itself).
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
in the i386 pmap.
The curcpu macro loads the per-cpu data pointer as its first step,
so the remaining steps of pcpu_find(curcpu) are circular.
get_pcpu() is already implemented for arm, arm64, and risc-v.
My plan is to implement it for the remaining architectures and use
it to replace several instances of pcpu_find(curcpu) in MI code.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D9370
-G0 is sufficent except on old version of clang (<3.8) and such versions
are unlikely to be generally useful on mips64.
Reported by: sbruno
Sponsored by: DARPA, AFRL
initialized, this can cause a divide by zero (if the VNET initialization
takes to long to complete).
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
While GCC's __nonnull__ attribute is generally useful to prevent misuse of
some functions it also tends to do rather dangerous "optimizations". Now
that we have replaced (r312934) all such uses with the clang nullability
qualifiers, the GCC attribute is unnecessary.
Remove the definition completely to prevent its use in system's headers.
Replace uses of the GCC __nonnull__ attribute with the clang nullability
qualifiers. The replacement should be transparent for clang developers as
the new qualifiers will produce the same warnings and will be useful for
static checkers but will not cause aggressive optimizations.
GCC will not produce such warnings and developers will have to use
upgraded GCC ports built with the system headers from r312538.
Hinted by: Apple's Libc-1158.20.4, Bionic libc
MFC after: 11.1 Release
Differential Revision: https://reviews.freebsd.org/D9004
This interface type ("a parent interface of wlanX") is not used since
r287197
Reviewed by: adrian, glebius
Differential Revision: https://reviews.freebsd.org/D9308
driver to support exposing a GEOM device, which can be used to mount
Avalon-attached ROMs, reserved areas of DRAM, etc, as a filesystem:
commit 9deb1e60eaaaf7a3687e48c58af5efd756f32ec6
Author: Robert N. M. Watson <robert.watson@cl.cam.ac.uk>
Date: Sat Mar 5 20:33:12 2016 +0000
Use format strings with make_dev(9) in avgen(4).
commit 0bf2176c23e7425bfa042c08a24f8a25fe6d8885
Author: Robert N. M. Watson <robert.watson@cl.cam.ac.uk>
Date: Tue Mar 1 10:23:23 2016 +0000
Implement a new "geomio" configuration argument to altera_avgen(4),
the generic I/O device we attach to various BERI peripherals. The new
option requests that, instead of exposing the underlying device via a
special device node in /dev, it instead be exposed via geom(4),
allowing it to be used with filesystems. The current implementation
does not allow a device to be exposed both for file/mmap and geom, so
one of the two models must be selected when configuring it via FDT or
device.hints. A typical use of the new option will be:
sri-cambridge,geomio = "rw";
MFC after: 1 week
Sponsored by: DARPA, AFRL
CheriBSD, which attempt to work around an inherent race in the UART's
control-register design in detecting whether JTAG is currently,
present, which will otherwise lead to moderately frequent output
drops when running in polled rather than interrupt-driven operation.
Now, these drops are quite infrequent.
commit 9f33fddac9215e32781a4f016ba17eab804fb6d4
Author: Robert N. M. Watson <robert.watson@cl.cam.ac.uk>
Date: Thu Jul 16 17:34:12 2015 +0000
Add a new sysctl, hw.altera_jtag_uart.ac_poll_delay, which allows the
(default 10ms) delay associated with a full JTAG UART buffer combined
with a lack of a JTAG-present flag to be tuned. Setting this higher
may cause some JTAG configurations to be more reliable when printing
out low-level console output at a speed greater than the JTAG UART is
willing to carry data. Or it may not.
commit 73992ef7607738b2973736e409ccd644b30eadba
Author: Robert N. M. Watson <robert.watson@cl.cam.ac.uk>
Date: Sun Jan 1 15:13:07 2017 +0000
Minor improvements to the Altera JTAG UART device driver:
- Minor rework to the logic to detect JTAG presence in order to be a bit
more resilient to inevitable races: increase the retry period from two
seconds to four seconds for trying to find JTAG, and more agressively
clear the miss counter if JTAG has been reconnected. Once JTAG has
vanished, stop prodding the miss counter.
- Do a bit of reworking of the output code to frob the control register
less by checking whether write interrupts are enabled/disabled before
changing their state. This should reduce the opportunity for races
with JTAG discovery (which are inherent to the Altera
hardware-software interface, but can at least be minimised).
- Add statistics relating to interrupt enable/disable/JTAG
discovery/etc.
With these changes, polled-mode JTAG UART ttys appear substantially
more robust.
MFC after: 1 week
Sponsored by: DARPA, AFRL
Thank glebius for pointing this out:
"The network stuff shall not be added to sys/eventhandler.h"
Reviewed by: David_A_Bright_DELL.com, sephe, glebius
Approved by: sephe (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D9345
default, and the fewer changes relative to the upstream u-boot the
better.
Add compatibility links for the old names.
Add dts file for BeagleBone Green while we're here.
Upstream GCC and devel/mips64-gcc use "octeon+" as the CPU setting for
the Octeon processor in the EdgeRouter Lite. As of r312899 the base
system GCC 4.2.1 accepts octeon+ as an alias for the Octeon support
added in r208737 for the same CPU.
Sponsored by: The FreeBSD Foundation
regardless of what the default stack for the system is set to.
With current/default behavior, after changing the default tcp stack, the
application needs to be restarted to pick up that change. Setting this new knob
net.inet.tcp.functions_inherit_listen_socket_stack to '0' would change that
behavior and make any new connection use the newly selected default tcp stack.
Reviewed by: rrs
MFC after: 2 weeks
Sponsored by: Limelight Networks
We found routing performance dropped significantly when configuring
FreeBSD as a router, we are applying the following changes in order to
resolve those issues and hopefully perform better.
- don't prefetch the flags array, we usually don't need it
- prefetch the next cache line of each of the software descriptor arrays as
well as the first cache line of each of the next four packets' mbufs and
clusters
- reduce max copy size to 63 bytes
- convert rx soft descriptors from array of structures to a structure of arrays
- update copyrights
Submitted by: Matt Macy <mmacy@nextbsd.org>
undo_offload_socket() is only called by t4_connect() during a connection
setup failure, but t4_connect() still owns the TOE PCB and frees ita
after undo_offload_socket() returns. Release a reference in
undo_offload_socket() resulted in a double-free which panicked when
t4_connect() performed the second free. The reference release was
added to undo_offload_socket() incorrectly in r299210.
MFC after: 1 week
Sponsored by: Chelsio Communications
FASTTRAP_MAX_INSTR_SIZE is the largest valid value of a tracepoint, so
correct the assertion accordingly. This limit was hit with a 15-byte NOP.
Reported by: bdrewery
MFC after: 1 week
Sponsored by: Dell EMC Isilon
The intended use is to annotate frequently used globals which either rarely
change (and thus can be grouped in the same cacheline) or are an atomic counter
(which means it may benefit from being the only variable in the cacheline).
Linker script support is provided only for amd64. Architectures without it risk
having other variables put in, i.e. as if they were not annotated. This is
harmless from correctness point of view.
Reviewed by: bde (previous version)
MFC after: 1 month
Recent changes in the pseudo header accessor prototypes start to
use common code RxQ handle on datapath. The handle was located
at the end of the structure with members not used on datapath.
Reviewed by: philip
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D9359
TxQ is destroyed on stop and last used tag should be reset to default 0
on the next start.
Reviewed by: philip
Sponsored by: Solarflare Communications, Inc.
MFC after: 2 days
Differential Revision: https://reviews.freebsd.org/D9358
The MLX5 driver has four different types of DMA allocations which are
now allocated using busdma:
1) The 4K firmware DMA-able blocks. One busdma object per 4K allocation.
2) Data for firmware commands use the 4K firmware blocks split into four 1K blocks.
3) The 4K firmware blocks are also used for doorbell pages.
4) The RQ-, SQ- and CQ- DMA rings. One busdma object per allocation.
After this patch the mlx5en driver can be used with DMAR enabled in
the FreeBSD kernel.
MFC after: 1 week
Sponsored by: Mellanox Technologies
- When device disappears from PCI indicate error device state and:
1) Trigger command completion for all pending commands
2) Prevent new commands from executing and return:
- success for modify and remove/cleanup commands
- failure for create/query commands
3) When reclaiming pages for a device in error state don't ask FW to
return all given pages, just release the allocated memory
MFC after: 1 week
Sponsored by: Mellanox Technologies
PCI device(s), changes:
- alloc_entry() now clears bit for page slot entry aswell
- update of cmd->ent_arr[] is now under cmd->alloc_lock
- complete command if alloc_entry() fails
MFC after: 1 week
Sponsored by: Mellanox Technologies
By default reading the diagnostic counters is disabled. The firmware
decides which counters are supported and only those supported show up
in the dev.mce.X.diagnostics sysctl tree.
To enable reading of diagnostic counters set one or more of the
following sysctls to one:
dev.mce.X.conf.diag_general_enable=1
dev.mce.X.conf.diag_pci_enable=1
MFC after: 1 week
Sponsored by: Mellanox Technologies
consistent return values from the mlx5e_sq_has_room_for()
function. The two counters are incremented by different threads under
different locks.
MFC after: 1 week
Sponsored by: Mellanox Technologies
register, in addition to configuring it as input with the pinmux driver.
There was a control register bit commented as "no desc in datasheet". A
later revision of the manual reveals the bit to be an input/output control
for the timer pin. In addition to configuring capture or pulse mode, you
apparently have to separately configure the pin direction in the timer
control register.
Before this change, the timer block was apparently driving a signal onto a
pad configured by pinmux as input. Capture mode still accidentally worked
for me during testing because I was using a very strong signal source that
just out-muscled the weaker drive from the misconfigured pin.
* allocate an ext bit for fragment offload. Some NICs (like the ath10k
hardware in native wifi or 802.3 mode) support doing packet fragmentation
in firmware/hardware, so we don't have to do it here.
* allocate an ext bit for VHT and start using it.
* Although the hardware is awake, the power state handling doesn't think so.
So just explicitly wake it up early in setup so ath_hal calls don't complain.
* We shouldn't be transmitting or ACKing frames during DFS CAC or on passive
channels before we hear a beacon. So, start laying down comments in the
places where this work has to be done.
Note:
* The main bit missing from finishing this particular bit of work is a state
call to transition a VAP from passive to non-passive when a beacon is heard.
CAC is easy, it's an interface state. So, I'll go and add a method to control
that soon.
all of them in terms of an sbuf-based back-end, xpt_path_sbuf. This
unifies the implementation, but more importantly it stops the output
fropm being split between 4 or more invocations of printf. The
multiple invocations cause interleaving of the messages on the
console during boot, making the output of disk discovery often
unintelligible. This change helps a lot, but more work is needed.
Reviewed by: ken, mav
Sponsored by: Netflix
Commit r312747 ("Setup decoding windows for ARMADA38X") resulted
in build failing for Marvell platforms, which don't have AHCI controller.
This patch provides a fix by adding dummy functions for such cases.
On the occasion rename register dump routine to decode_win_ahci_dump,
in order to avoid confusion.
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
- Replace pcpu_find(curcpu) with get_pcpu(), which is much
more direct.
- Remove armv4 pcpu fields which I added in r286296 but never
needed to use.
- armv6 pc_qmap_addr was leftover from the old armv6 pmap
implementation. Rename it and put it to use in the new one.
Noted by: skra
Reviewed by: skra
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D9312
under a shared read lock. This patch attempts to upgrade the lock to
an exclusive write lock. If the exclusive write lock fails to be
obtained, the current fragment is not placed at the head of the list.
This portion of the patch was inspired by NetBSD ip_frag.c r1.4 (which
effectively removed the section of code that performed the reordering).
The patch to sys/contrib/ipfilter/netinet/ip_compat.h adds the
MUTEX_TRY_UPGRADE macro to support the patch to ip_frag.c.
The patch to contrib/ipfilter/lib/rwlock_emul.c supports this patch
by emulating the mutex in userspace when exercised by ipftest(1).
Inspired by: NetBSD ip_frag.c r1.4
MFC after: 1 month
This calls ioctl() handlers for the different interfaces in the bridge.
These handlers expect to get called in an ioctl context where it's safe
for them to sleep. We may not sleep with the bridge lock held.
However, we still need to protect the interface list, to ensure it
doesn't get changed while we iterate over it.
Use BRIDGE_XLOCK(), which prevents bridge members from being removed.
Adding bridge members is safe, because it uses LIST_INSERT_HEAD().
This caused panics when adding xen interfaces to a bridge.
PR: 216304
Reviewed by: ae
MFC after: 1 week
Sponsored by: RootBSD
Differential Revision: https://reviews.freebsd.org/D9290
(intentionally) deleted first and then completely added again (so all the
events, announces and hooks are given a chance to run).
This cause an issue with CARP where the existing CARP data structure is
removed together with the last address for a given VHID, which will cause
a subsequent fail when the address is later re-added.
This change fixes this issue by adding a new flag to keep the CARP data
structure when an address is not being removed.
There was an additional issue with IPv6 CARP addresses, where the CARP data
structure would never be removed after a change and lead to VHIDs which
cannot be destroyed.
Reviewed by: glebius
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
Taking closer look on my ASM1062 I found that it has bunch of issues around
error recovery: reported wrong CCS, failed commands reported as completed,
READ LOG EXT times out after NCQ error. This patch workarounds first two
problems, that were making ATAPI devices close to unusable on these HBAs.
MFC after: 2 weeks
In r263232 sys/capability.h was renamed to sys/capsicum.h, to avoid
conflicts with a capability.h header found on other operating systems.
Sponsored by: The FreeBSD Foundation
success and a good value. Only then try to use it and set the MSIX_ENABLE
bit.
With the current em(4) driver we have observed failures in this case in a
specific environment when pci_find_cap() would not return the assumed
value, which meant we ended up writing to PCI register 2 (PCI_DEVICE_ID)
which is read-only.
PR: 216456
Submitted by: bz
This file provides support for AHCI mode on Armada38x
and adds new optional AHCI device to arm/mv/files.mv.
Submitted by: Konrad Adamczyk <ka@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D9222
It occurred that some Marvell integrated controllers
require additional time after soft reset to work properly.
Introduce new quirk (AHCI_Q_MRVL_SR_DEL), that enable
such operation.
Submitted by: Konrad Adamczyk <ka@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: mav
Differential revision: https://reviews.freebsd.org/D9221
It is necesarry to open memory windows on internal bus for
AHCI driver to work correctly.
Submitted by: Konrad Adamczyk <ka@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D9220
Configure decoding windows only for devices with
enabled nodes in FDT.
Submitted by: Konrad Adamczyk <ka@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D9219
method
Method could be used before we can access device_t structure.
As per simple phandle_t handle we can access FDT to check
if specified node has been enabled.
It will be used in Marvell's common configuration code.
Submitted by: Konrad Adamczyk <ka@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: zbb, meloun-miracle-cz
Differential revision: https://reviews.freebsd.org/D9218
Adding SHA256 support to Marvell crypto driver resulted in regression
for older SoC's, not capable of handling this mode in hardware.
Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu>
Obtained from: Stormshield
Sponsored by: Stormshield
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D9215
This commit introduces following changes in order to get rid of
ifdef's from all around the driver.
* Introduce sc_soc_id field in cesa_softc structure - this value is
obtained in cesa_attach() anyway, so make use of it.
* Replace ifdefs with SoC ID checks.
* Perform PM control status only for relevant SoC's.
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D9247
Marvell Armada 38x is supported in 3 variants,
so take all into consideration in crypto driver
attach routine.
Submitted by: Marcin Wojtas <mw@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Reviewed by: zbb
Differential revision: https://reviews.freebsd.org/D9248
* Currently supports only Armada38X family but other Marvell SoC's
can be added if needed.
* Provides temperature is C deg.
* To print the temperature one can use:
sysctl dev.armada_thermal.0.temperature
Submitted by: Zbigniew Bodek <zbb@semihalf.com>
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D9217
HWPMC_HOOKS is enabled in GENERIC and triggers some work avoidable in the
common (module not loaded) case.
In particular this avoids permission checks + lock downgrade
singlethreaded and in cases were an executable mapping is found the pmc
sx lock is no longer bounced.
Note this is a band aid.
MFC after: 1 week
Add additionally safety and overflow checks to clock_ts_to_ct and the
BCD routines while we're here.
Perform a safety check in sys_clock_settime() first to avoid easy local
root panic, without having to propagate an error value back through
dozens of APIs currently lacking error returns.
PR: 211960, 214300
Submitted by: Justin McOmie <justin.mcomie at gmail.com>, kib@
Reported by: Tim Newsham <tim.newsham at nccgroup.trust>
Reviewed by: kib@
Sponsored by: Dell EMC Isilon, FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D9279
Add internal tracking of smp startup status to reliably figure out
what methods are to be used to get gtaskqueue up and running.
e1000:
Calculating this pointer gives undefined behaviour when (last == -1)
(it is before the buffer). The pointer is always followed. Panics
occurred when it points to an unmapped page. Otherwise, the pointed-to
garbage tends to not have the E1000_TXD_STAT_DD bit set in it, so in the
broken case the loop was usually null and the function just returned, and
this was acidentally correct.
Submitted by: bde
Reported by: Matt Macy <mmacy@nextbsd.org>
Add internal tracking of smp startup status to reliably figure out
what methods are to be used to get gtaskqueue up and running.
e1000:
Calculating this pointer gives undefined behaviour when (last == -1)
(it is before the buffer). The pointer is always followed. Panics
occurred when it points to an unmapped page. Otherwise, the pointed-to
garbage tends to not have the E1000_TXD_STAT_DD bit set in it, so in the
broken case the loop was usually null and the function just returned, and
this was acidentally correct.
Submitted by: bde
Reviewed by: Matt Macy <mmacy@nextbsd.org>
If "capacity" LU option is set, ramdisk backend now implements featured
thin provisioned disk, storing data in malloc(9) allocated memory blocks
of pblocksize bytes (default PAGE_SIZE or 4KB). Additionally ~0.2% of LU
size is used for indirection tree (bigger pblocksize reduce the overhead).
Backend supports all unmap and anchor operations. If configured capacity
is overflowed, proper error conditions are reported.
If "capacity" LU option is not set, the backend operates mostly the same
as before without allocating real storage: writes go to nowhere, reads
return zeroes, reporting that all LBAs are unmapped.
This backend is still mostly oriented on testing and benchmarking (it is
still a volatile RAM disk), but now it should allow to run real FS tests,
not only simple dumb dd.
MFC after: 2 weeks
This makes it easier for the userland script to find the releated
VF interface.
Reviewed by: sephe
Approved by: sephe (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D9101
Hyper-V's NIC SR-IOV implementation needs a Hyper-V synthetic NIC and
a VF NIC to work together (both NICs have the same MAC address), mainly to
support seamless live migration.
When the VF device becomes UP (or DOWN), the synthetic NIC driver needs
to switch the data path from the synthetic NIC to the VF (or the opposite).
Note: multicast/broadcast packets are still received through the synthetic
NIC and we need to inject the packets through the VF interface (if the VF is
UP), even if the synthetic NIC is DOWN (so we need to force the rxfilter
to be NDIS_PACKET_TYPE_PROMISCUOUS, when the VF is UP).
Reviewed by: sephe
Approved by: sephe (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8964
Hyper-V's NIC SR-IOV implementation needs a Hyper-V synthetic NIC and
a VF NIC to work together, mainly to support seamless live migration.
When the VF device becomes UP (or DOWN), the synthetic NIC driver needs
to switch the data path from the synthetic NIC to the VF (or the opposite).
So the synthetic NIC driver needs to know when a VF device is becoming
UP or DOWN and hence the patch is made.
Reviewed by: sephe
Approved by: sephe (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8963
It's unnecessary because the upper nework stack does the same checking.
In the case of Hyper-V SR-IOV, we need to remove the checking because
1) multicast/broadcast packets are still received through the synthetic
NIC and we need to inject the packets through the VF interface;
2) we must inject the packets even if the synthetic NIC is down, or has
a different MTU from the VF device.
Reviewed by: sephe
Approved by: sephe (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8962
This will be used by the coming NIC SR-IOV patch.
Reviewed by: sephe
Approved by: sephe (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8909
Variables "fast" and "active" are both constant in lacp_port_create(), but
comments mispleadingly suggest that "fast" can be changed via ioctl. The
constant values control the value of "lp->lp_state", so it too is constant,
and the code for assigning different value to it is essentially dead.
Remove both "fast" and "active", and set "lp->lp_state" unconditionally;
that gets rid of the dead code and misleading comments.
CID: 1305692
CID: 1305734
Reported by: asomers
Reviewed by: asomers
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D9302
Remove custom DTS duplicate of tda19988 node and use upstream-provided
one introduced by r295436. This duplication created two tdaX devices
which confused fb driver into using only 640x480 area while setting
display to native resolution.
Reported by: Michael Smith
MFC after: 3 days
* limit cabq to 64 - in practice if this stays at ath_txbuf then
all buffers can be tied up by a very busy broadcast domain (eg ARP
storm, way too much MDNS/NETBIOS). It's been like this in the
freebsd-wifi-build AP project for the longest time.
* Now that I figured out the hilarity inherent in aggregate forming
and AR9380 EDMA work, change the per-node to 64 frames by default.
I'll do some more work to shorten the queue latency introduced when
doing data so TCP isn't so terrible, but it's now no longer /always/
tens of milliseconds of extra latency when doing active iperf tests.
Notes:
The reason for the extra latency is partly tx/rx taskqueue handling and
scheduling, and partly due to a lack of airtime/QoS awareness of per-node
traffic. Ideally we'd have different limits/priorities on the QoS/TID
levels per node so say, voice/video data got a better share of buffer
allocations over best effort/bulk data, but we currently don't implement
that. It's not /hard/ to do, I just need to do it.
Tested:
* AR9380 (STA), AR9580 (hostap) - both with the relevant changes.
TCP is now at around 180mbit with rate control and RTS protection
enabled. UDP stays at 355mbit at MCS23, no HT protection.
This is two fixes, which establishes what I /think/ is pretty close to the
theoretical PHY maximum speed on the AR9380 devices.
* When doing A-MPDU on a TID, don't queue to the hardware directly if
the hardware queue is busy. This gives us time to get more packets
queued up (and the hardware is busy, so there's no point in queuing
more to the hardware right now) to potentially form an A-MPDU.
This fixes up the throughput issue I was seeing where a couple hundred
single frames were being sent a second interspersed between A-MPDU
frames. It just happened that the software queue had exactly one
frame in it at that point. Queuing it until the hardware finishes
transmitting isn't exactly costly.
* When determining whether to dequeue from a software node/TID queue into
the hardware queue, fix up the checks to work right for EDMA chips
(ar9380 and later.) Before it was not dispatching anything until
the FIFO was empty. Now we allow it to dispatch another aggregate
up to the hardware aggregate limit, like I intended with the earlier
work.
This allows a 5GHz HT40, short-GI, "htprotmode off" test at MCS23
to achieve 357 Mbit/sec in a one-way UDP test. The stars have to be
aligned /just right/ so there are no retries but it can happen.
Just don't expect it to work in an OTA test if your 2yo is running
around the room - MCS23 is very very sensitive to channel conditions.
Tested:
* AR9380 STA (test) -> AR9580 hostap
TODO:
* More thorough testing on pre-AR9380 chips (AR5416, AR9160, AR9280)
* (Finally) teach ath_rate_sample about throughput/latency rather than
air time, so I can get good transmit rates with a 2yo running around.
When investigating performance on UDP TX on the AR9380 I found that the
following sequence was occuring:
* INTR
* EINPROGRESS - nothing yet
* INTR
* TXSTATUS - process a TX completion for an aggregate
* INTR, INTR
* TXSTATUS - process a TX completion for an aggregate
* TXD, TXD ... populate frames from the hardware queue and submit
What should be happening is a completed TXSTATUS fires off more packets
that are queued on active TIDs.
What /was/ happening was after that first TXSTATUS the TX queue hardware queue
was still empty, so it didn't push anything into the FIFO. Only after the
second TXSTATUS did any progress get made.
This is one of two commits - it ensures that the software TX queue scheduler
is called /after/ TX completion, otherwise no frames from the software staging
queues will be processed into the hardware queues.
The second commit will fix it so it populates aggregate frames correctly
when the above occurs - right now ath_txq_sched() is called, but it doesn't
populate anything because its pre-check conditions are wrong.
Whilst here, add/tweak debugging.
Tested:
* AR9380 STA (testing device) -> AR9580 hostap
Building kernel with devel/powerpc64-gcc (6.2.0) yields the following error:
/usr/src/sys/powerpc/powerpc/db_trace.c:299:20: error: calling
'__builtin_frame_address' with a nonzero argument is unsafe
[-Werror=frame-address]
Work around this by dereferencing the frame address manually instead.
PR: 215600
Reported by: Mark Millard <markmi AT dsl-only DOT net>
MFC after: 2 weeks
This ioctl has been considered legacy by upstream since the DTrace code
was first imported, and is unused. The removal also allows some
simplification of dtrace_helper_slurp().
Also remove a bogus copyout in the DTRACEHIOC_ADDDOF handler. Due to a
bug, it would overwrite an in-memory copy of the DOF header rather than
the passed-in DOF helper. Moreover, DTRACEHIOC_ADDDOF already copies the
helper back out automatically since its argument has the IOC_OUT attribute.
critical_exit().
Based on the discussion with: jhb
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential revision: D9276
MFC after: 1 week
Our base binutils sets -many by default anyway, but external gcc may not do
this.
PR: kern/215948
Submitted by: Mark Millard <markmi AT dsl-only DOT net>
Reported by: Mark Millard
MFC after: 2 weeks
This is supposed to only be applied to the first subframe and only if
RTS/CTS is being done. I'm still not yet checking RTS/CTS exchange status
so it's just happening for all subframes on AR9380 and later.
This gets MCS23 throughput up from around 250mbit to 303mbit with RTS/CTS
protection enabled, and around 330mbit with no HT protection enabled.
Now, MCS23 has a PHY rate of 450mbit and we should be seeing closer to
400mbit for a straight one-way UDP test, but this beats the previous
maximum throughput.
Tested:
* AR9380 (STA) -> AR9580 (AP) - STA with the modifications, doing UDP TX
test using iperf.
mappings for armv6 pmap zero and copy operations to the MD PCPU region.
Change sysmap initialization to only allocate KVA pages for CPUs that
are actually present.
While here, collapse CMAP3 into CMAP2 (their use was mutually exclusive
anyway) and "recover" some space in PCPU padding that has always been
available due to 64-byte cacheline padding.
Reviewed by: skra
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D9172
descriptor state will not change anymore). This seems to eliminate the
race where we can miss a stalled queue under high load.
While here remove the unnecessary curly brackets.
Reported by: Konstantin Kormashev <konstantin@netgate.com>
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
Set both IEEE80211_HTCAP_LDPC and IEEE80211_HTC_TXLDPC capability flags
if LDPC is supported + set 'do_ldpc = 1' only when it is not disabled,
not just supported.
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D9277
- Pad small packets to 60 bytes and not 64 (exclude the CRC bytes);
- Pad the packet using m_append(9), if the packet has enough space for
padding, which is usually true, it will not be necessary append a newly
allocated mbuf to the chain.
Suggested by: yongari
MFC after: 3 days
Sponsored by: Rubicon Communications, LLC (Netgate)
It is only a first step and not perfect, but better then nothing.
The main blocker is CAM target frontend, that can not be unloaded,
since CAM does not have mechanism to unregister periph driver now.
MFC after: 2 weeks
Stop testing for LK_RETRY and error multiple times. Also postpone the
VI_DOOMED until after LK_RETRY was seen as it reads from the vnode.
No functional changes.
A recent change enforced the VAP limit as well as the peer limit.
I now need to actually set iv_ampdu_limit or we don't transmit more
than 8K sized aggregates.
This restores the expected (suboptimal, but still much faster) behaviour.
Tested:
* AR9380, STA mode
SDM states that CLFLUSHOPT instructions can be ordered with other
writes by SFENCE, heavier MFENCE is not required.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
The length of the scsi_set_timestamp_parameters struct was incorrect. LTO-5
drives don't care, but LTO-7 drives do.
Reviewed by: Sam Klopsch
MFC after: 2 weeks
Sponsored by: Spectra Logic Corp
configtimer().
During normal operation "state->nextcallopt" will always be less than
or equal to "state->nextcall" and checking only "state->nextcallopt"
before calling "callout_process()" is sufficient. However when
"configtimer()" is called a race might happen requiring both of these
binary times to be checked.
Short description of race:
1) A configtimer() call will reset both "state->nextcall" and
"state->nextcallopt" to the same binary time.
2) If a "callout_reset()" call happens between "configtimer()" and the
next "callout_process()" call, "state->nextcallopt" will get updated
and "state->nextcall" will remain at the current time. Refer to logic
inside cpu_new_callout().
3) getnextcpuevent() only respects "state->nextcall" and returns this
value over and over again, even if it is in the past, until "now >=
state->nextcallopt" becomes true. Then these two time variables are
corrected by a "callout_process()" call and the situation goes back to
normal.
The problem manifests itself in different ways. The common factor is
the timer process(es) consume all CPU on one or more CPU cores for a
long time, blocking other kernel processes from getting execution
time. This can be seen by very high interrupt counts as displayed by
"vmstat -i | grep timer" right after boot.
When EARLY_AP_STARTUP was enabled in r310177 the likelyhood of hitting
this bug apparently increased.
Example output from "vmstat -i" before patch:
cpu0:timer 7591 69
cpu9:timer 39031773 358089
cpu4:timer 9359 85
cpu3:timer 9100 83
cpu2:timer 9620 88
Example output from "vmstat -i" after patch:
cpu0:timer 4242 34
cpu6:timer 5531 44
cpu3:timer 6450 52
cpu1:timer 4545 36
cpu9:timer 7153 58
Before the patch cpu9 in the example above, was spinning in a loop in
order to reach 39 million interrupts just a few seconds after
bootup. After the patch the timer interrupt counts are more or less
consistent.
Discussed with: mav @
Reported by: several people
MFC after: 1 week
Sponsored by: Mellanox Technologies
When ixgbe receives an interrupt indicating that a new optical module
may have been inserted, it discards all of its current media types
by calling ifmedia_removeall() and then creates a new set of media
types for the supported media on the new module. However,
ifmedia_removeall() was maintaining a pointer to whatever the
current media type was before the call to ifmedia_removealL().
The result of this was that any attempt to read the current media
type of the interface (e.g. via ifconfig) would return potentially
garbage data from free memory (or if one were particularly unlucky
on an architecture that does not malloc() from a direct map, page
fault the kernel).
Fix this by NULL'ing out the current media field in if_media.c,
and have ixgbe update the current media type after recreating
them.
Submitted by: Matt Joras <matt.joras AT gmail DOT com>
Reviewed by: sbruno, erj
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D9164
For consistency with the qualifiers added in r310977, define a new
qualifier _Null_unspecified which is also defined in clang 3.7+.
Add two new macros:
__NULLABILITY_PRAGMA_PUSH
__NULLABILITY_PRAGMA_POP
These are for use in headers when we want avoid noisy warnings if
some pointers are left without nullability annotations.
These are added with way ahead of their first use to teach the GCC
ports headers of their existance before their first use.
- Add new sysctl node to control the transmit packet bufring.
- Add optimised version of the transmit routine which output packets
directly to the DMA ring instead of using bufring in case the transmit
lock is congested. This can reduce the number of taskswitches which in
turn influence the overall system CPU usage, depending on the
workload.
- Add " TX" suffix to debug name for transmit mutexes to silence some
witness warnings about aquiring duplicate locks having same name.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Suggested by: gallatin @
6569 large file delete can starve out write ops
illumos/illumos-gate@ff5177ee8bff5177ee8bhttps://www.illumos.org/issues/6569
The core issue I've found is that there is no throttle for how many
deletes get assigned to one TXG. As a results when deleting large files
we end up filling consecutive TXGs with deletes/frees, then write
throttling other (more important) ops.
There is an easy test case for this problem. Try deleting several
large files (at least 1/2 TB) while you do write ops on the same
pool. What we've seen is performance of these write ops (let's
call it sideload I/O) would drop to zero.
More specifically the problem is that dmu_free_long_range_impl()
can/will fill up all of the dirty data in the pool "instantly",
before many of the sideload ops can get in. So sideload
performance will be impacted until all the files are freed.
The solution we have tested at Nexenta (with positive results)
creates a relatively simple throttle for how many "free" ops we let
into one TXG.
However this solution exposes other problems that should also be
addressed. If we are to slow down freeing of data that means one
has to wait even longer (assuming vnode ref count of 1) to get shell
back after an rm or for NFS thread to finish the free-ing op.
To avoid this the proposed solution is to call zfs_inactive() async
for "large" files. Async freeing then begs for the reclaimed space
to be accounted for in the zpool's "freeing" prop.
The other issue with having a longer delete is the inability to
export/unmount for a longer period of time. The proposed solution
is to interrupt freeing of blocks when a fs is unmounted.
Author: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Reviewed by: avg
Differential Revision: D9008
It's possible to get EFAULT when writing a segment backed by a file
if the segment extends beyond the file.
The core dump could still be useful if we skip the rest of the segment
and proceed to other segements.
The skipped segment (or a portion of it) will be zero-filled.
While there, use 'const' to signify that core_write() only reads the
buffer and use __DECONST before calling vn_rdwr_inchunks() because it
can be used for both reading and writing.
Before the change:
kernel: Failed to write core file for process mmap_trunc_core (error 14)
kernel: pid 77718 (mmap_trunc_core), uid 1001: exited on signal 6
After the change:
kernel: Failed to fully fault in a core file segment at VA 0x800645000 with size 0x4000 to be written at offset 0x29000 for process mmap_trunc_core
kernel: pid 4901 (mmap_trunc_core), uid 1001: exited on signal 6 (core dumped)
Reviewed by: julian, kib
Obtained from: Panzura (older version of the change)
MFC after: 5 days
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D9233
The error is:
vmm_dev.c: In function 'alloc_memseg':
vmm_dev.c:261:11: error: null argument where non-null required (argument 1) [-Werror=nonnull]
Apparently, the gcc is unable to figure out that if a ternary operator
produced a non-NULL value once, then the operator with exactly the same
operands would produce the same value again.
MFC after: 1 week
Add own state variable to track if a sendqueue is stopped or not.
This will prevent traffic from entering the sendqueue while it is
being destroyed.
Update drain function to wait for traffic to be transmitted before
returning when the link state is active.
Add extra checks in transmit path for stopped SQ's.
While at it:
- Use likely() for a mbuf pointer check.
- Remove redundant IFF_DRV_RUNNING check.
MFC after: 1 week
Sponsored by: Mellanox Technologies
There were several places where reference to compression were left
unfinished. Furthermore, KASSERTs contained references to MPPC_INVALID
which is not defined in the tree and therefore were sure to break with
INVARIANTS: comment them out.
Reported by: Eugene Grosbein
PR: 216265
MFC after: 3 days
All of the printing from the tables file now has wrappers so that the
handling is cleaner and it's possible to print something out (say, during
development) without having to fight the global debug flags. This re-org
will also make it easier to have the tables be compiled out at build time
if desired.
Other than fixing some minor bugs, there are no user-visible changes from
this change
Sponsored by: Netflix, Inc.
Differential Revision: D9238
users that choose not to use EARLY_AP_STARTUP.
There is still an initialization issue/panic with !SMP and !EARLY_AP_STARTUP
that we have yet to resolve.
Submitted by: bde
The option "nonc" disables using of namecache for the created mount,
by default namecache is used. The rationale for the option is that
namecache duplicates the information which is already kept in memory
by tmpfs. Since it believed that namecache scales better than tmpfs,
or will scale better, do not enable the option by default. On the
other hand, smaller machines may benefit from lesser namecache
pressure.
Discussed with: mjg
Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
For directories, node->tn_spec.tn_dir.tn_parent pointer to the parent
is used. For non-directories, the implementation is naive, all
directory nodes are scanned to find a dirent linking the specified
node. This can be significantly improved by maintaining tn_parent for
all nodes, later.
Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
On dotdot lookup and fhtovp operations, it is possible for the file
represented by tmpfs node to be removed after the thread calculated
the pointer. In this case, tmpfs_alloc_vp() accesses freed memory.
Introduce the reference count on the nodes. The allnodes list from
tmpfs mount owns 1 reference, and threads performing unlocked
operations on the node, add one transient reference. Similarly, since
struct tmpfs_mount maintains the list where nodes are enlisted,
refcount it by one reference from struct mount and one reference from
each node on the list. Both nodes and tmpfs_mounts are removed when
refcount goes to zero.
Note that this means that nodes and tmpfs_mounts might survive some
time after the node is deleted or tmpfs_unmount() finished. The
tmpfs_alloc_vp() in these cases returns error either due to node
removal (tn_nlinks == 0) or because of insmntque1(9) error.
Tested by: pho (as part of larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Commit r270423 fixed a regression in sched_yield() that was introduced
in earlier changes. Unfortunately, at the same time it introduced an
new regression. The problem is that SWT_RELINQUISH (6), like all other
SWT_* constants and unlike SW_* flags, is not a bit flag. So, (flags &
SWT_RELINQUISH) is true in cases where that was not really indended,
for example, with SWT_OWEPREEMPT (2) and SWT_REMOTEPREEMPT (11).
A straight forward fix would be to use (flags & SW_TYPE_MASK) ==
SWT_RELINQUISH, but my impression is that the switch types are designed
mostly for gathering statistics, not for influencing scheduling
decisions.
So, I decided that it would be better to check for SW_PREEMPT flag
instead. That's also the same flag that was checked before r239157.
I double-checked how that flag is used and I am confident that the flag
is set only in the places where we really have the preemption:
- critical_exit + td_owepreempt
- sched_preempt in the ULE scheduler
- sched_preempt in the 4BSD scheduler
Reviewed by: kib, mav
MFC after: 4 days
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D9230
sure the XHCI controller is reset after halting it. The problem is
clearly a BIOS bug as the suspend and resume is failing without
loading the XHCI driver. The same happens when using Linux and the
XHCI driver is not loaded.
Submitted by: Yanko Yankulov <yanko.yankulov@gmail.com>
PR: 216261
MFC after: 1 week
As suggested in r167010, use the structure type and macros to access and
modify UFS2 extended attributes. Add assertions that pointers are
aligned in places where we now access the data through a structure
pointer, instead of character-by-character.
PR: 216127
Reported by: dewayne at heuristicsystems.com.au
Reviewed by: kib@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D9225
disabled (Hi netmap!).
Only remove the CRC bytes from packets when the hardware tell us to do so.
Fixes the 'discard frame w/o leading ethernet header' issues.
Sponsored by: Rubicon Communications, LLC (Netgate)
Remove TMPFS_ASSERT_ELOCKED(). Its claims are already stated by other
asserts nearby and by VFS guarantees.
Change TMPFS_ASSERT_LOCKED() and one inlined place to use
ASSERT_VOP_(E)LOCKED() instead of hand-rolled imprecise asserts.
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Edit comments which explain no longer relevant details, and add
locking annotations to the struct tmpfs_node members.
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The ea_name string is not nul-terminated. Correct the documentation.
Because the subsequent field is padded to 8 bytes, and the padding is
zeroed, the ea_name string will appear to be nul-terminated whenever the
length isn't exactly one (mod eight).
This was introduced in r167010 (2007).
Additionally, mark the length fields as unsigned. This particularly
matters for the single byte ea_namelength field, which can represent
extended attribute names up to 255 bytes long.
No functional change.
PR: 216127
Reported by: dewayne at heuristicsystems.com.au
Reviewed by: kib@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D9206
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.
- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API support rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.
- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().
- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.
- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.
- How rate limiting works:
1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.
2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.
3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. Upon next ip_output() a new "snd_tag" will be tried allocated.
4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.
Reviewed by: wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision: https://reviews.freebsd.org/D3687
Sponsored by: Mellanox Technologies
MFC after: 3 months
the fifth argument to functions being traced, however there was an error
where the userspace stack was being used. This may be invalid leading to
a kernel panic if this address is unmapped.
Submitted by: Graeme Jenkinson <graeme.jenkinson@cl.cam.ac.uk>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D9229
As the efi_devpath_last_node() and efi_devpath_trim() can return NULL
pointers, the consumers of this API should check the the NULL pointers.
Same for efinet_dev_init() using calloc().
Reported by: Robert Mustacchi <rm@joyent.com>
Reviewed by: jhb, allanjude
Approved by: allanjude (mentor)
Differential Revision: https://reviews.freebsd.org/D9203
Clang apparently requires the explicit form of this instruction, and rejects
uses which ignore the optional cmpD register. This was the only use of the
shorthand form of the instruction, so just fix it up to match the others.
PR: kern/215681
Submitted by: Mark Millard
Reported by: Mark Millard <markmi _AT_ dsl-only.net>
MFC after: 2 weeks
Languages like C++17 and Go provide direct support for slice types:
pointer/length pairs. The CloudABI generator now has more complete for
this, meaning that for the C binding, pointer/length pairs now use an
automatic naming scheme of ${name} and ${name}_len.
Apart from this change and some reformatting, the ABI definitions are
identical. Binary compatibility is preserved entirely.
This field has no practical use and never readed. Initiators already
receive respective residual size from frontends. Removed field had
different semantics, which looks useless, and was never passed through
by any frontend.
While there, fix kern_data_resid field support in case of HA, missed in
r312291.
MFC after: 13 days
This lock was replaced from rwlock in r272840. But unlike rwlock, rmlock
doesn't allow recursion on rm_rlock(), so at this time fix this with
RM_RECURSE flag. Later we need to change ipfw to avoid such recursions.
PR: 216171
Reported by: Eugene Grosbein
MFC after: 1 week
The Zedboard has a hardware bug where initialization of the USB PHY
occasionally fails on boot-up. Fix regression in -CURRENT when
kernel panics on such occasion. 11-RELEASE branch works fine
PR: 215862
Submitted by: Thomas Skibo <thoma555-bsd@yahoo.com>