This driver seems to have a bug. The bug was carefully saved during
conversion. In the al_eth_mac_table_unicast_add() the argument 'addr',
which is the actual address is unused. So, the function is called as
many times as we have addresses, but with the exactly same argument
list. This doesn't make any sense, but was preserved.
- In em_msix_link(), properly handle IGB-class devices after the iflib(4)
conversion again by only setting EM_MSIX_LINK for the EM-class 82574
and by re-arming link interrupts unconditionally, i. e. not only in
case of spurious interrupts. This fixes the interface link state change
detection for the IGB-class. [1]
- In em_if_update_admin_status(), only re-arm the link state change
interrupt for 82574 and also only if such a device uses MSI-X, i. e.
takes advantage of autoclearing. In case of INTx and MSI as well as
for LEM- and IGB-class devices, re-arming isn't appropriate here and
setting EM_MSIX_LINK isn't either.
While at it, consistently take advantage of the hw variable.
PR: 236724 [1]
Differential Revision: https://reviews.freebsd.org/D21924
This patch is part of an effort to make bhyve networking (in particular TCP)
faster. The key strategy to enhance TCP throughput is to let the whole packet
datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet
overhead is amortized over a large number of bytes.
This capability is supported in the guest by means of the vtnet(4) driver,
which is able to handle TSO/LRO packets leveraging the virtio-net header
(see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf).
A bhyve VM exchanges packets with the host through a network backend,
which can be vale(4) or if_tap(4).
While vale(4) supports TSO/LRO packets, if_tap(4) does not.
This patch extends if_tap(4) with the ability to understand the virtio-net
header, so that a tapX interface can process TSO/LRO packets.
A couple of ioctl commands have been added to configure and probe the
virtio-net header. Once the virtio-net header is set, the tapX interface
acquires all the IFCAP capabilities necessary for TSO/LRO.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D21263
They're formatted into the device name like unit numbers, anyway; store the
number in mda_unit => si_drv0 like dev2unit() expects.
No functional change intended.
Sponsored by: Dell EMC Isilon
The sentinel value for "use the rest of the region," -1, isn't zero modulo
PAGE_SIZE. Relax the check to permit the intended special value.
X-MFC-With: r353110
Sponsored by: Dell EMC Isilon
Follow-up to incomplete pedantic change in r353691 by actually fixing the
default implementation to match the interface type. Mea culpa.
X-MFC-With: r353691, r339754
After r339754, the additional interface parameter was accidentally left out
of the default acpi_generic_id_probe implementation. Apparently this does
not cause any real problems, so this fix is mostly stylistic.
No functional change intended.
X-MFC-With: r339754
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport. It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).
It is mostly a verbatim code lift from netdump(4). Netdump(4) remains
the only consumer (until the rest of this patch series lands).
The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c. The
separation is not perfect and future improvement is welcome. Supporting
INET6 is a long-term goal.
Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.
The only functional change here is the mbuf allocation / tracking. Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time. If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.
No other functional change intended.
Reviewed by: markj (earlier version)
Some discussion with: emaste, jhb
Objection from: marius
Differential Revision: https://reviews.freebsd.org/D21421
r259680 added support to vt(4) for printing double-width characters.
Remove the comment that claims no support.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
This change applies some suggestions by delphij from D21979.
A write-only variable is removed.
There is a diagnostic message if the driver does not recognize the chip.
A chained if-statement is converted to a switch.
MFC after: 3 weeks
From Jake:
When updating the device statistics, report whether or not we have
received any pause frames to the iflib stack. This allows the iflib
stack to avoid generating a Tx hang message while the device is paused.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: gallatin@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21870
From Jake:
Notify the iflib stack of whether we received any pause frames during
the timer window. This allows the stack to avoid reporting a Tx hang due
to the device being paused.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: gallatin@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21869
From Jake:
The e1000 driver sets the iflib shared context isc_pause_frames value to
the number of received xoff frames. This is done so that the iflib
watchdog timer won't trigger a Tx Hang due to pause frames.
Unfortunately, the function simply sets it to the value of the xoffrxc
counter. Once the device has received a single XOFF packet, the driver
always reports that we received pause frames. This will prevent the Tx
hang detection entirely from that point on.
Fix this by assigning isc_pause_frames to a non-zero value if we
received any XOFF packets in the last timer interval.
We could attempt to calculate the total number of received packets by
doing a subtraction, but the iflib stack only seems to check if
isc_pause_frames is non-zero.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: gallatin@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21868
the slot is flagged as 'embedded'.
The features related to embedded and shared slots were added in v3.0 of
the sdhci spec. Hardware prior to v3 sometimes supported 1.8v on non-
removable devices in embedded systems, but had no way to indicate that
via the standard sdhci registers (instead they use out of band metadata
such as FDT data).
This change adds the controller specification version to the check for
whether to filter out the 1.8v selection. On older hardware, the 1.8v
option is allowed to remain. On 3.0 or later it still requires the
embedded-slot flag to remain.
This is part of the fix for PR 241301 (eMMC not detected on Beaglebone).
Changes to the sdhci_ti driver are also needed for a full fix.
PR: 241301
This allows to remove a bunch of low level code.
Also, superio(4) provides safer interaction with other drivers
that work with Super I/O configuration registers.
Tested only on PCengines APU2:
superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
wbwd0: <Nuvoton NCT6102 (0xc4/0x53) Watchdog Timer> at WDT ldn 0x08 on superio0
The watchdog output is incorrectly wired on that system and the watchdog
does not really do it its job, but the pulse can be seen with a signal
analyzer.
Reviewed by: delphij, bcr (man page)
MFC after: 19 days
Differential Revision: https://reviews.freebsd.org/D21979
This is where it logically belongs.
The change allows to drop a bunch of low lewel code.
Reviewed by: gonzo
MFC after: 19 days
Differential Revision: https://reviews.freebsd.org/D21980
From Zach:
Intel documentation indicates that backplane X550EM_X KR devices do not
support Energy Efficient Ethernet. Prior to this patch, X552 devices
(device ID 0x15AB) will crash the system when transitioning EEE state
via sysctl.
Signed-off-by: Zach Vargas <zvargas@xes-inc.com>
PR: 240320
Submitted by: Zach Vargas <zvargas@xes-inc.com>
Reviewed by: erj@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21673
Rescan a PCI bus when the ACPI_NOTIFY_BUS_CHECK event is posted to a
PCI bus.
Reviewed by: scottl
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21948
Install ACPI notify handlers on PCI devices with an _EJ0 method. This
handler is invoked when devices are added or removed.
- When an ACPI_NOTIFY_DEVICE_CHECK event posts, rescan the parent bus
device. Note that strictly speaking we only need to rescan the
specified device, but BUS_RESCAN is what is available, so we rescan
the entire bus.
- When an ACPI_NOTIFY_EJECT_REQUEST event posts, detach the device
associated with the ACPI handle, invoke the _EJ0 method, and then
delete the device.
Eventually this might be changed to vector notify events to devd in
userspace where devctl can be used instead to permit more complex
actions such as graceful unmounting of filesystems.
Tested by: cperciva
Reviewed by: cperciva, imp, scottl
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21948
Ignore only ENOENT (no DTS properties found) and ENODEV (driver not
present) non-zero return values from ext_resources.
Reviewed by: manu
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22043
Atomics are used for page busy and valid state when the shared busy is
held. The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594
This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21548
(resets, regulators, clocks) are not available.
Rely on a system initialization done by a bootloader in that cases.
This fixes operation on Terasic DE10-Pro (an Intel Stratix 10
development kit).
Sponsored by: DARPA, AFRL
Refactor nvdimm_spa_memattr() routine and callers to just save the value at
initialization and use the value directly. The reference value from NFIT,
MemoryMapping, is read only once, so the associated memattr could never
change.
No functional change.
Sponsored by: Dell EMC Isilon
starting at the max. domain, and then work down. Then existing FreeBSD
drivers will attach. Interrupt routing from the VMD MSI-X to the NVME
drive is not well known, so any interrupt is sent to all children that
register.
VROC used Intel meta data so graid(8) works with it. However, graid(8)
supports RAID 0,1,10 for read and write. I have some early code to
support writes with RAID 5. Note that RAID 5 can have life issues
with SSDs since it can cause write amplification from updating the parity
data.
Hot plug support needs a change to skip the following check to work:
if (pcib_request_feature(dev, PCI_FEATURE_HP) != 0) {
in sys/dev/pci/pci_pci.c.
Looked at by: imp, rpokala, bcr
Differential Revision: https://reviews.freebsd.org/D21383
This ensures the clip task won't race with t4_destroy_clip_table.
While here, make some mutex destroys unconditional since attach always
initializes them.
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21952
If the bootloader enabled DMA we need to fully reset the DMA controller
otherwise we might have some stale data in it that provoke weird
behavior.
MFC after: 1 week
This adds a TOE hook to allocate a KTLS session. It also recognizes
TLS mbufs in the socket buffer and sends those to the NIC using a TLS
work request to encrypt the record before segmenting it.
TOE TLS support must be enabled via the dev.t6nex.<N>.tls sysctl in
addition to enabling KTLS.
Reviewed by: np, gallatin
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21891
The PCI block in the adapter requires this field to be set to a valid
queue ID. It is not clear why it did not fail on all machines, but
the effect was that crypto operations reading input data via DMA
failed with an internal PCI read error on machines with 128G or more
of RAM.
Reported by: gallatin
Reviewed by: np
MFC after: 3 days
Sponsored by: Chelsio Communications
As noted by the commit message, callouts are now persistant
and should not be in the auto-zero section of the RQ's and SQ's.
This fixes an assert when using the TX completion event
factor feature with mlx5en(4).
Found by: gallatin@
MFC after: 3 days
Sponsored by: Mellanox Technologies
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.
Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.
Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D21882
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
nvdimm_e820 is a newbus pseudo driver that looks for "legacy" e820 PRAM
spans and creates ordinary-looking SPA devfs nodes for them
(/dev/nvdimm_spaN).
As these legacy regions lack real NFIT SPA regions and namespace
definitions, they must be administratively sliced up externally using
device.hints. This is similar in purpose to the Linux memmap= mechanism.
It is assumed that systems with working NFIT tables will not have any use
for this driver, and that that will be the prevailing style going forward,
so if there are no explicit hints provided, this driver does not
automatically create any devices.
Reviewed by: kib (previous version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21885
Create an attachment file for the existing ACPI attachment, and create a
new FDT attachment for the generic-ehci driver.
Submitted by: andrew (Original version)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D19389
Regression introduced in r343629 when malloc result was renamed from spa to
spa_mapping and the 'spa' name was instead used to iterate a table, but the
free() target was not updated.
Reviewed by: kib, scottph
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21871
The PRM suggests random 0 - 10ms to prevent multiple waiters on the same
interval in order to avoid starvation.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
Before attempting to initialize the command interface we must wait till
the fw_initializing bit is clear.
If we fail to meet this condition the hardware will drop our
configuration, specifically the descriptors page address. This scenario
can happen when the firmware is still executing an FLR flow and did not
finish yet so the driver needs to wait for that to finish.
Linux commits:
6c780a0267b8
b8a92577f4be.
MFC after: 3 days
Sponsored by: Mellanox Technologies
in mlx5en(4) after r348254.
The unlimited send tags are shared amount multiple connections and are
not allocated per send tag allocation request. Only increment the refcount.
MFC after: 3 days
Sponsored by: Mellanox Technologies
It may happen during link down that the running state may be observed
non-zero in the transmit routine, right before the running state is
cleared. This may end up using a destroyed mutex.
Make all channel mutexes and callouts persistant.
Preserve receive and send queue statistics during link toggle.
MFC after: 3 days
Sponsored by: Mellanox Technologies
in mlx5core. The EEPROM information is not only a property of the
mlx5en(4) driver.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
The following sysctls are added:
dev.mce.N.conf.qos.cable_length
dev.mce.N.conf.qos.buffers_size
dev.mce.N.conf.qos.buffers_prio
Submitted by: kib@
MFC after: 3 days
Sponsored by: Mellanox Technologies
All prints in mlx5core should use on of the macros:
mlx5_core_err/dbg/warn
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
In case of health counter fails to increment it indicates a bad device health.
In case when the syndrome indicated by firmware is 0x0, this indicates that
firmware is unable to respond to initialization segment reads.
Add proper print in this case.
Submitted by: slavash@
MFC after: 3 days
Sponsored by: Mellanox Technologies
MPFS is a logical switch in the Mellanox device which forward packets
based on a hardware driven L2 address table, to one or more physical-
or virtual- functions. The physical- or virtual- function is required
to tell the MPFS by using the MPFS firmware commands, which unicast
MAC addresses it is requesting from the physical port's traffic.
Broadcast and multicast traffic however, is copied to all listening
physical- and virtual- functions and does not need a rule in the MPFS
switching table.
Linux commit: eeb66cdb682678bfd1f02a4547e3649b38ffea7e
MFC after: 3 days
Sponsored by: Mellanox Technologies
Add the 512 bytes limit of RDMA READ and the size of remote address to the max
SGE calculation.
Submitted by: slavash@
Linux commit: 288c01b746aa
MFC after: 3 days
Sponsored by: Mellanox Technologies