These registers have read side effects and a read at just the right
(wrong?) time can trash some internal hw state.
Obtained from: Chelsio Communications
MFC after: 1 week
Sponsored by: Chelsio Communications
The problem is that ns8250_bus_probe() accesses a field from the
ns8250_softc, which embeds the generic UART softc, but the ns8250_softc
hasn't yet been allocated because we're still probing.
This is a regression from commit 0aefb0a63c. This fixed a problem
where one of the upper four IER bits, which are usually reserved, needs
to be set in order to get RX interrupts before the RX FIFO is full. At
the same time, we avoid clearing those reserved bits (see commit
58957d8717, though other UART drivers I looked at do not bother with
this).
So, copy what ns8250_init() does to disable interrupts, since we don't
know what the "right" mask is at this point.
Reported by: syzbot+f256beefd0df9eb796e7@syzkaller.appspotmail.com
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31124
Changes since 1.25.6.0 are listed here. This list comes from the
Release Notes for "Chelsio Unified Wire 3.14.0.4 for Linux" dated
2021-07-08.
Fixes
-----
BASE:
- Wait 5ms before and after the i2c command that clears the mod_select.
This fixes incorrect port module type read from i2c.
Obtained from: Chelsio Communications
MFC after: 1 week
Sponsored by: Chelsio Communications
Properly allocate all mlx5en(4) structures from correct numa domain.
While at it cleanup unused numa domain integers deriving from the
Linux version of mlx5en(4).
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Make sure the "uid" field gets properly set when destroying DCT and QP
objects by making a copy of the field when creating such objects.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Currently when we map the hca_core_clock page to the user space,
there are vulnerable registers, one of which is semaphore, on
this page as well. If user read the wrong offset, it can modify the
above semaphore and hang the device.
Hence, mapping the hca_core_clock page to the user space only when
user required it specifically.
After this patch, mlx4 core_clock won't be mapped to user space by
default. Oppose to current state, where mlx4 core_clock is always mapped
to user space.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
To avoid congestion on the same PCI memory register space when
traffic consists mostly of small packets.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The driver expects all TLS tags to be returned to the driver before
it can free the UMA zone where the TLS tags reside.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
IB spec says that a lid should be ignored when link layer is Ethernet,
for example when building or parsing a CM request message (CA17-34).
However, since ib_lid_be16() and ib_lid_cpu16() validates the slid,
not only when link layer is IB, we set the slid to zero to prevent
false warnings in the kernel log.
Linux commit:
65389322b28f81cc137b60a41044c2d958a7b950
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
RoCE is short for Remote direct memory access over Converged Ethernet.
ECN is short for Explicit Congestion Notification.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Querying the PCI config space for offline for every firmware command blocks
the PCI bus and affects performance. Especially for packet pacing and TLS
when objects are frequently created and destroyed.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.
Linux commit:
f696bf6d64b195b83ca1bdb7cd33c999c9dcf514
7bb1fafc2f163ad03a2007295bb2f57cfdbfb630
d34ac5cd3a73aacd11009c4fc3ba15d7ea62c411
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
These fields declare which timestamp mode is supported
by the device per RQ/SQ/QP.
In addition add the ts_format field to the select the mode
for RQ/SQ/QP.
Linux commit:
a6a217dddcd544f6b75f0e2a60b6e84c1d494b7e
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
All callers to ib_modify_qp_is_ok() provides enum ib_qp_state makes the
checks of out-of-scope redundant. Let's remove them together with updating
function signature to return boolean result.
While at it remove unused "ll" parameter from ib_modify_qp_is_ok().
Linux commit:
19b1f54099b6ee334acbfbcfbdffd1d1f057216d
d31131bba5a1630304c55ea775c48cc84912ab59
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
In order to improve readability, add ib_port_phys_state enum to replace
the use of magic numbers.
Linux commit:
72a7720fca37fec0daf295923f17ac5d88a613e1
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Extended atomics are supported with RC and XRC QP types, but Linux commit
a60109dc9a95 added an unneeded check to to_mlx5_access_flags().
This broke XRC QPs.
The following ib_atomic_bw invocation over XRC reproduces the issue:
ib_atomic_bw -d mlx5_1 --connection=XRC --atomic_type=FETCH_AND_ADD
It is safe to remove such checks because the QP type was already checked
in ib_modify_qp_is_ok(), which was previously called from
mlx5_ib_modify_qp().
Linux commit:
13f8d9c16693afb908ead3d2a758adbe6a79eccd
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The maximum page size in the mkey context is 2GB.
Until today, we didn't enforce this requirement in the code, and therefore,
if we got a page size larger than 2GB, we have passed zeros in the
log_page_shift instead of the actual value and the registration failed.
This patch limits the driver to use compound pages of 2GB for mkeys.
Linux commit:
762f899ae7875554284af92b821be8c083227092
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
The patch simplifies mlx5_ib_cont_pages and fixes the following
issues in the original implementation:
First issues is related to alignment of the PFNs. After the check
base + p != PFN, the alignment of the PFN wasn't checked. So the PFN
sequence 0, 1, 1, 2 would result in a page_shift of 13 even though
the 3rd PFN is not 8KB aligned.
This wasn't actually a bug because it was supported by all the
existing mlx5 compatible device, but we don't want to require
this support in all future devices.
Another issue is because the inner loop didn't advance PFN so
the test "if (base + p != pfn)" always failed for SGE with
len > (1<<page_shift).
Linux commit:
d67bc5d4e3e100d762c0f57ea67f28bc219698a6
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
- Upon error more completion events than requested may be generated,
particularly when using the completion event factor feature.
- Count number of event errors in the transmit path.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
On some environments, such as certain SRIOV VF configurations, RoCE is
not supported for mlx5 Ethernet ports. Currently, the driver will not
open IB device on that port.
This is problematic, since we do want user-space RAW Ethernet (RAW_PACKET
QPs) functionality to remain in place. For that end, enhance the relevant
driver flows such that we do create a device instance in that case.
Linux commit:
ca5b91d63192ceaa41a6145f8c923debb64c71fa
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
Make the mlx5e_mode_table[] array one dimensional, because there is only
one entry, 10G ER/LR, which share the same protocol bit.
This patch only adds support for basic sub-type distinguishing for the
extended protocol bits. Use verbose ifconfig eeprom output to get actual
media type.
Remove write only "connector_type" variable while at it.
MFC after: 1 week
Reviewed by: kib
Sponsored by: Mellanox Technologies // NVIDIA Networking
This code practically has not sleeping points, so Giant is locked for very
long time.
Noted and reviewed by: hselasky
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Import Linux commit 534b1204ca4694db1093b15cf3e79a99fcb6a6da
Add reserved mapping to cover all the register in order to avoid setting
arbitrary values to newer FW which implements the reserved fields.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies // NVIDIA Networking
MFC after: 1 week
Import Linux commit ce28f0fd670ddffcd564ce7119bdefbaf08f02d3:
Add reserved mapping to cover all the register in order to avoid
setting arbitrary values to newer FW which implements the reserved
fields.
Taken from: https://patches.linaro.org/patch/417255/
Reviewed by: hselasky
Sponsored by: Mellanox Technologies // NVIDIA Networking
MFC after: 1 week
In particular, avoid creating TIR or installing flow rules for VXLAN
if the capability is disabled.
Reported and reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
Handlers maintain flow rules and inform hardware about non-standard VxLAN
port in use. The database of the vxlan end points is maintained.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies/NVidia Networking
MFC after: 1 week
sys/dev/sound/pci/hda/hdaa_patches.c:
match_pin_patches: Use HDA_DEV_MATCH instead of regular ==
sys/dev/sound/pci/hda/pin_patch_realtek.h:
Add quirk for Lenovo laptops when ALC298 is used.
This controller supports 2.5G/1G/100MB/10MB speeds, and allows
tx/rx checksum offload, TSO, LRO, and multi-queue operation.
The driver was derived from code contributed by Intel, and modified
by Netgate to fit into the iflib framework.
Thanks to Mike Karels for testing and feedback on the driver.
Reviewed by: bcr (manpages), kbowling, scottl, erj
MFC after: 1 month
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30668
As in commit 831850d8b0, this routine can trigger false positives, so
exclude it from instrumentation.
Reported by: pho
Sponsored by: The FreeBSD Foundation
during suspend/resume cycle. Previously used bus_generic_suspend_intr and
bus_generic_resume_intr may cause interrupt storm because of missed
interrupt acknowledges caused by blocking of intr handler.
Reported by: J.R. Oldroyd <jr_AT_opal_DOT_com>
MFC after: 1 week
These have been found in some Arm ACPI tables generated by edk2, e.g.
when describing the pl011 uart on the Arm AEMv8 model.
Reviewed by: imp, jkim
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31110
During the removal of cam_sim_alloc_dev() in
aeb04e88f5 for sdhci.c and the
follow-up build-fix in a72af82e31
slot->dev and slot->bus got mixed up for MMCCAM; slot->dev is
only used in the !MMCCAM case so is uninitialised here leading to
a panic; switch back to slot->bus to return to the status quo.
Reviewed by: imp (ack on arm@)
X-Differential Revision: https://reviews.freebsd.org/D30857
With the v5.13 device-tree update speed of the CPU switch port was
changed to 2.5G. Reflect that in the driver.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
New Samsung 980 SSDs report Namespace Preferred Write Alignment of
8 (4KB) and Namespace Preferred Write Granularity of 32 (16KB).
My quick tests show that 16KB is a minimal sequential write size
when the SSD reaches peak IOPS, so writing much less is very slow.
But writing slightly less or slightly more does not change much,
so it seems not so much a size granularity as minimum I/O size.
Thinking about different stripesize consumers:
- Partition alignment should be based on NPWA by definition.
- ZFS ashift in part of forcing alignment of all I/Os should also
be based on NPWA. In part of forcing size granularity, if really
needed, it may be set to NPWG, but too big value can make ZFS too
space-inefficient, and the 16KB is actually the biggest supported
value there now.
- ZFS recordsize/volblocksize could potentially be tuned up toward
NPWG to work as I/O size granularity, but enabled compression makes
it too fuzzy. And those are normally user-configurable things.
- ZFS I/O aggregation code could definitely use Optimal Write Size
value and may be NPWG, but we don't have fields in GEOM now to report
the minimal and optimal I/O sizes, and even maximal is not reported
outside GEOM DISK to be used by ZFS.
MFC after: 1 week
Make it work, but change the interface to be safe for non-root users. In
particular, right now interface only works for the tables which can be
minimally parsed by kernel to determine the table size. Then, userspace can
query the table size, after that it provides a buffer of needed size
and kernel copies out just table to userspace.
Main advantage is that user no longer need to be able to read /dev/mem,
the disadvantage is the need to have minimal parsers aware of the table
types. Right now the parsers are implemented for ESRT and PROP tables.
Future extension of the present interface might be a return of only
the table physical address, in case kernel does not have suitable
parser yet. Then, a privileged user could read the table from /dev/mem.
This extension, which logically equivalent to the old (non-worked)
EFIIOC_GET_TABLE variant, is not implemented until needed.
Submitted by: Pavel Balaev <pavel.balaev@3mdeb.com>
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30104
Coherently read the phase bit of the status completion record. We loop
over the completion record array, looking for all the transactions in
the same phase that have been completed. In doing that, we have to be
careful to read the status field first, and if it indicates a complete
record, we need to read and process that record. Otherwise, the host
might be overtaken by device when reading this completion record,
leading to a mistaken belief that the record is in phase. This leads to
the code using old values and looking at an already completed entry, which
has no current tracker.
To work around this problem, we read the status and make sure it is in
phase, we then re-read the entire completion record guaranteeing it's
complete, valid, and consistent . In addition we resync the dmatag to
reflect changes since the prior loop for the bouncing dma case.
Reviewed by: jrtc27@, chuck@
Found by: jrtc27 (this fix is based in part on her D30995 fix)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31002
Remove __packed from nvme_command, nvme_completion and
nvme_dsm_trim. Add super-alignment to nvme_completion since it's always
at least that aligned in hardware (and in our existing uses of it
embedded in structures). It generates better code in
nvme_qpair_process_completions on riscv64 because otherwise the ABI
assumes a 4-byte alignment, and the same on all other platforms.
Reviewed by: jrtc27@, mav@, chuck@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31001
Put the { on the same line as the struct nvme_foo when we define these
structures. It's FreeBSD standard and these were inconsistent.
Sponsored by: Netflix
Subtract one SGE for the case of misaligned address. Also take into
account maximum number of sectors reported by firmware, that gives
nicer 256KB limit instead of 276KB calculated from the SGE limit.
While there, remove number of I/O size checks, duplicating what is
already checked by CAM and busdma(9).
MFC after: 1 month
Sponsored by: iXsystems, Inc.
phandle_t is a uint32_t type, <= 0 comparison doesn't work with it as intended.
This caused the ofw_pci code to attach to PCI bus on ACPI based systems.
Since 3eae4e106a ("Fix error value returned by ofw_bus_gen_get_node().")
ofw subsystem can only return -1 for invalid nodes. Use that.
MFC after: 4 weeks
Reviewed by: mw
Differential revision: https://reviews.freebsd.org/D30953
Make it possible to specify event codes without an offset of
PMC_EV_ARMV8_FIRST, by setting a machine-dependent flag. This is
required to make use of event definitions from pmu-events.
Reviewed by: ray (slightly earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30602
This will be used to detect supported pmu events. The expected format is
the MIDR register with the revision and variant fields masked. See also:
lib/libpmc/pmu-events/arch/arm64/mapfile.csv.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30601
Fix forgotten argument and type error. MMCCAM isn't enabled by default,
and I'd mistakenly thought it was, so these went undetected precommit.
Sponsored by: Netflix
xpt_bus_register and xpt_bus_deregister returns a hybrid error that's
neither a cam_status, nor an errno, but a mix of both. Update
xpt_bus_register and xpt_bus_deregister to return an errno. The vast
majority of current users compare against zero, which can also be
spelled CAM_SUCCESS. Nobody uses CAM_FAILURE, so remove that symbol
to prevent comfusion (nothing returns it either).
Where the return value is saved, ensure that the variable 'error' is
used to store an errno and 'status' is used to store a cam_status where
it makes the code clearer (usually just in functions that already mix
and match). Where the return value isn't used at all, avoid storing it
at all.
Reviewed by: scottl@, mav@ (earlier version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30860
Use the new xpt_path_device to get the device_t using the periph's path.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30855
This makes it more usable in that dhclient will autolaunch from devd
now when cdce devices are plugged in.. It also sets the baudrate, but
this isn't exported via tools, and CDCE doesn't have a good way to
specify the media type, so there isn't a good way to tell userland
what the speed is currently...
Reviewed by: hps
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D30625
The driver used to configure all available classes with some default
parameters on attach and the rest of t4_sched.c was written with the
assumption that all traffic classes are always valid in the hardware.
But this resulted in a lot of informational messages being logged in the
firmware's circular log, crowding out other more useful messages.
This change leaves the tx scheduler alone during attach to reduce the
spam in the devlog. The state of every class is now tracked separately
from its flags and there is support for an 'uninitialized' state.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Recent firmwares are able to utilize the traffic classes of tx channels
that were previously unused. This effectively doubles the number of
traffic classes available per port for 2 port cards. Stop using the raw
per-channel value in the driver and ask the firmware for the number of
usable traffic classes instead.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Some of the changes in this release:
* Large LLQ headers,
* Bug/stability fixes,
* Change of the README/Documentation.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Update the driver in order not to break its compilation
and make use of the new ENA logging system
Migrate platform code to the new logging system provided by ena_com
layer.
Make ENA_INFO the new default log level.
Remove all explicit use of `device_printf`, all new logs requiring one
of the log macros to be used.
IO queue related attributes are registered statically at driver attach
with the rest of the ENA specific sysctl nodes. However, the number of
queues can be changed at runtime via the `ena_sysctl_io_queues_nb`
request, leading to a potential exposure of attributes for non-existing
queues.
Introduce a new `ena_sysctl_update_queue_node_nb` function, which
updates the sysctl nodes after the number of queues is altered.
This happens by either registering or unregistering node specific oids,
based on a delta between the previous and current queue count.
NOTE: All unregistered oids must be registered again before the driver
detach, e.g. by another call to this function.
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Calling free on a NULL pointer is valid, as appropriate check is already
done internally:
/* free(NULL, ...) does nothing */
if (addr == NULL)
return;
Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Default LLQ (Low-latency queue) maximum header size is 96 bytes and can
be too small for some types of packets - like IPv6 packets with multiple
extension. This can be fixed, by using large LLQ headers.
If the device supports larger LLQ headers, the user can activate this
feature by setting sysctl tunable 'hw.ena.force_large_llq_header' to '1'
in the /boot/loader.conf file.
In case the device isn't supporting this feature, the default value (96B)
will be used.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
According to man style(9), only C-style comments should be used.
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Implement support for the NXP LS1028A SoC MDIO controller.
It is attached to the internal PCI root complex.
The controller is used to communicate with PHYs of ports connected
to the internal switch.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30731
ENETC it a gigabit Ethernet controller found on the LS1028A board.
It supports basic VLAN offloads - tag extraction, injection and hardware
filtering. Inband MDIO connectivity is used for link status
monitoring through the miibus interface. Fixed-link mode is also
supported, which allows for operation of internal cpu to switch port.
Since no admin interrupts are present in hardware, link status polling
has to be used.
Due to a hardware bug software reset of the NIC results in a external
abort. Because of that most of the hardware initialization is done
during attach. This also means that in the case of an fatal error full
board reset is required.
The enetc_hw.h header was imporoted from Linux. It is dual licensed.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30729
Provide common MDIO code for two LS1028 ENETC controllers -
an external one found on the PCI bus and internal one found in ENETC.
Submitted by: Lukasz Hajec <lha@semihalf.com>
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30730
By definition ofw_bus_get_node() should consistently return -1 when there
is no associated OF node.
MFC after: 4 weeks
Discussed with: nwhitehorn
Analyzed in: https://reviews.freebsd.org/D30761
ddfc9c4c59 was missing changes to two files to complete the
bus_child_pnpinfo_str->bus_child_pnpinfo. This fixes the broken kernel
builds.
Sponsored by: Netflix
Now that the upper layers all go through a layer to tie into these
information functions that translates an sbuf into char * and len. The
current interface suffers issues of what to do in cases of truncation,
etc. Instead, migrate all these functions to using struct sbuf and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.
Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.
Document these new interfaces with man pages, and oversight from before.
Reviewed by: jhb, bcr
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29937
If the connection is in the process of disconnecting, ic_socket can be
NULL. For icl_cxgbei_conn_transfer_setup(), lock the connection and
check ic_socket before using it. For icl_cxgbei_conn_task_setup(),
the caller already holds the connection lock, so assert it and bail
early with ECONNRESET if the connection is disconnecting.
Reported by: Jithesh Arakkan @ Chelsio
Fixes: f949967c8e cxgbei: Fix a race between transfer setup and a peer reset.
The INQUIRY command may return a CAM_DATA_RUN_ERR code, even when
it succeeds. This happens during driver startup, causing the
current and further inquiries to be aborted, resulting in some
missing information about the controller.
Reviewed by: imp
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D30843
Reserve one page for the DMA subsystem, that may need it when the I/O
buffer is not page aligned.
Without this change, writes with the maximum allowed size failed, if:
- physical memory was fragmented, making it necessary to use one DMA
segment for each page
- the buffer to be written was not page aligned, causing the DMA
subsystem to need one extra segment
In the scenario above, the DMA subsystem would run out of segments,
resulting in a write with no SG segments, that would fail.
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D30798
ni_txseqs is kept as 16-bit counter, but we need to trim the upper four
bits as they may have special meanings for the firmware / hardware.
For instance, bit 15 enables hardware / firmware generation of sequence
numbers that overrides sequence numbers programmed by the driver.
Reviewed by: adrian
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D30814
Given all the code does operate on struct ifnet, the last step in this
longer series of changes now is to rename struct net_device to
struct ifnet (that is what it was defined to in the LinuxKPi code).
While mlx4 and OFED are "shared" code the decision was made years ago
to not write it based on the netdevice KPI but the native ifnet KPI
for most of it. This commit simply spells this out and with that
frees "struct netdevice" to be re-done on LinuxKPI to become a more
native/mixed implementation over time as needed by, e.g., wireless
drivers.
Sponsored by: The FreeBSD Foundation
MFC after: 10 days
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D30515
The NIC TLS and TOE TLS modes in cxgbe(4) both work with TLS key
contexts. Previously, TOE TLS supported TLS key contexts created by
two different methods, and NIC TLS had a separate bit of code copied
from NIC TLS but specific to KTLS. Now that TOE TLS only supports
KTLS, pull common code for creating TLS key contexts and programming
them into on-card memory into t4_keyctx.c.
Sponsored by: Chelsio Communications
TOE TLS offload was first supported via a customized OpenSSL developed
by Chelsio with proprietary socket options prior to KTLS being present
either in FreeBSD or upstream OpenSSL. With the addition of KTLS in
both places, cxgbe's TOE driver was extended to support TLS offload
via KTLS as well. This change removes the older interface leaving
only the KTLS bindings for TOE TLS.
Since KTLS was added to TOE TLS second, it was somehat shoe-horned
into the existing code. In addition to removing the non-KTLS TLS
offload, refactor and simplify the code to assume KTLS, e.g. not
copying keys into a helper structure that mimic'ed the non-KTLS mode,
but using the KTLS session object directly when constructing key
contexts.
This also removes some unused code to send TX keys inline in work
requests for TOE TLS. This code was never enabled, and was arguably
sending the wrong thing (it was not sending the raw key context as we
do for NIC TLS when using inline keys).
Sponsored by: Chelsio Communications
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly. As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING(). No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Last an(4) devices have been End Of Life and End Of Sale in 2007.
Time to remove this driver.
Differential Revision: https://reviews.freebsd.org/D30679
Reviewed by: imp (earlier version), emaste (earlier version)
Sponsored by: Diablotin Systems
Last an(4) devices have been End Of Life and End Of Sale in 2007.
Time to remove this driver.
Differential Revision: https://reviews.freebsd.org/D30678
Reviewed by: imp (earlier version), adrian (earlier version)
MFC after: 3 days
Sponsored by: Diablotin Systems
This framework is initial implementation of the simple-audio-card compatible
audio driver framework. It provides glue for CPU/codec/aux device.
Differential Revision: https://reviews.freebsd.org/D27830
Some arm64 SoCs have nodes in their fdts that describe devices
connected to the internal PCI bus. One such SoC is Freescale LS1028A.
In order to access information stored in them we need to add ofw bus
support to pci. Pass devinfo request up to our parent, which
is responsible for parsing all the information.
It allows to use ofw interface on PCI devices that support it.
This method is similar to sys/dev/acpica/acpi_pci.c.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30181
Some arm64 SoCs have nodes in their fdts that describe devices
connected to the internal PCI bus. One such SoC is Freescale LS1028A.
It expects the nodes to be mapped to devices enumerated using the standard
PCI method. Mapping is done by reading device and function ids from "reg"
property. Information is dts is used to describe MDIO/PHY connected
to a given interface.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30180
ThunderX is the only board known to use them.
Move them to the ThunderX PCIe driver.
Submitted by: Kornel Duleba <mindal@semihalf.com>
Reviewed by: andrew
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential Revision: https://reviews.freebsd.org/D30179
The vmbus ISR needs to live in a trampoline. Dynamically allocating a
trampoline at driver initialization time poses some difficulties due to
the fact that the KENTER macro assumes that the offset relative to
tramp_idleptd is fixed at static link time. Another problem is that
native_lapic_ipi_alloc() uses setidt(), which assumes a fixed trampoline
offset.
Rather than fight this, move the Hyper-V ISR to i386/exception.s. Add a
new HYPERV kernel option to make this optional, and configure it by
default on i386. This is sufficient to make use of vmbus(4) after the
4/4 split. Note that vmbus cannot be loaded dynamically and both the
HYPERV option and device must be configured together. I think this is
not too onerous a requirement, since vmbus(4) was previously
non-functional.
Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: whu, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30577
During boot we warn that the kbd and openfirm drivers are Giant-locked
and may be deleted. Generally, the warning helps signal that certain
old drivers are not being maintained and are subject to removal, but
this doesn't really apply to certain drivers which are harder to
detangle from Giant.
Add a flag, D_GIANTOK, that devices can specify to suppress the
misleading warning. Use it in the kbd and openfirm drivers.
Reviewed by: imp, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30649
Normally raw interrupt handler is provided by the kernel text. But
vmbus module registers its own handler that needs to be mapped into
userspace mapping on PTI kernels.
Reported and reviewed by: whu
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30310
pqisrc_is_firmware_feature_enabled shouldn't be declared inline in a
header, and then static inline in the .c function. Remove this stray
declartion from the header. gcc6 complains, but clang does not.
Sponsored by: Netflix
When a DMA chain can't be loaded, set the state to STATE_INQUEUE so that
the mp[rs]_complete_command can properly fail the command.
Sponsored by: Netflix
When the mpr(4) and mps(4) drivers probe a SATA device, they issue an
ATA Identify command (via mp{s,r}sas_get_sata_identify()) before the
target is fully setup in the driver. The drivers wait for completion of
the identify command, and have a 5 second timeout. If the timeout
fires, the command is marked with the SATA_ID_TIMEOUT flag so it can be
freed later.
That is where the use-after-free problem comes in. Once the ATA
Identify times out, the driver sends a target reset, and then frees any
identify commands that have timed out. But, once the target reset
completes, commands that were queued to the drive are returned to the
driver by the controller.
At that point, the driver (in mp{s,r}_intr_locked()) looks up the
command descriptor for that particular SMID, marks it CM_STATE_BUSY and
sends it on for completion handling.
The problem at this stage is that the command has already been freed,
and put on the free queue, so its state is CM_STATE_FREE. If INVARIANTS
are turned on, we get a panic as soon as this command is allocated,
because its state is no longer CM_STATE_FREE, but rather CM_STATE_BUSY.
So, the solution is to not free ATA Identify commands that get stuck
until they actually return from the controller. Hopefully this works
correctly on older firmware versions. If not, it could result in
commands hanging around indefinitely. But, the alternative is a
use-after-free panic or assertion (in the INVARIANTS case).
This also tightens up the state transitions between CM_STATE_FREE,
CM_STATE_BUSY and CM_STATE_INQUEUE, so that the state transitions happen
once, and we have assertions to make sure that commands are in the
correct state before transitioning to the next state. Also, for each
state assertion, we print out the current state of the command if it is
incorrect.
mp{s,r}.c: Add a new sysctl variable, dump_reqs_alltypes,
that controls the behavior of the dump_reqs sysctl.
If dump_reqs_alltypes is non-zero, it will dump
all commands, not just the commands that are in the
CM_STATE_INQUEUE state. (You can see the commands
that are in the queue by using mp{s,r}util debug
dumpreqs.)
Make sure that the INQUEUE -> BUSY state transition
happens in one place, the mp{s,r}_complete_command
routine.
mp{s,r}_sas.c: Make sure we print the current command type in
command state assertions.
mp{s,r}_sas_lsi.c:
Add a new completion handler,
mp{s,r}sas_ata_id_complete. This completion
handler will free data allocated for an ATA
Identify command and free the command structure.
In mp{s,r}_ata_id_timeout, do not set the command
state to CM_STATE_BUSY. The command is still in
queue in the controller. Since we were blocking
waiting for this command to complete, there was
no completion handler previously. Set the
completion handler, so that whenever the command
does come back, it will get freed properly.
Do not free ATA Identify commands that have timed
out in mp{s,r}sas_add_device(). Wait for them
to actually come back from the controller.
mp{s,r}var.h: Add a dump_reqs_alltypes variable for the new
dump_reqs_alltypes sysctl.
Make sure we print the current state for state
transition asserts.
This was tested in the Spectra Logic test bed (as described in the
review), as well Netflix's Open Connect fleet (where panics dropped from
a dozen or two a month to zero).
Reviewed by: imp@ (who is handling the commit with ken's OK)
Sponsored by: Spectra Logic
Differential Revision: https://reviews.freebsd.org/D25476
Use the accessor function to get the softc for this sim. This also drops
an unneeded cast.
Sponsored by: Netflix
Reviewed by: mav@, hselasky@
Differential Revision: https://reviews.freebsd.org/D30360
if (sb == NULL) { ... sb->s_error } is going to be a bad time. Return
ENOMEM when we cannot allocate an sbuf for the sysctl rather than
dereferencing the NULL pointer just returned.
Reviewed by: manu@, allanjude@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30373
The need for !! over (bool) pre-dates gcc 4.2, so go with the patch
as-submitted because the kernel tends to prefer that.
Suggested by: emaste@
Sponsored by: Netflix
Currently, mmc_fdt_gpio_get_{present,readonly} return all time true.
true ^ 100b = true
false ^ 100b = true
since that's done after promotion to integers. Use !! to convert
the bit to a bool before xor.
Reviewed by: imp@ (converted to (bool) to !! for portability)
Pull Request: https://github.com/freebsd/freebsd-src/pull/461
Update mmc_switch_status to ignore a few CRC errrors when asking for the
card status after setting the new rate with CMD6. Since the card may
take a little while to make the switch, it's possible we'll get a
communications error if we sent the command at the wrong time. Several
low end laptops needs this workaround as they have a window that seems
longer than other systems. This is known to fix at least the Acer Aspire
A114-32-P7E5.
Reviewed by: imp@, manu@
Differential Revision: https://reviews.freebsd.org/D24740
This patch adds the necessary methods resolution to the sdhci_xenon
driver which are required to configure UHS modes for SD/MMC devices.
Apart from the two generic routines, the custom sdhci_xenon_set_uhs_timing
function is responsible for setting the SDHCI_HOST_CONTROL2 register
with appropriate mode select values - in case of HS200 and HS400
they are non-standard.
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Marvell
Differential Revision: https://reviews.freebsd.org/D30565
MFC after: 2 weeks
Improve the VCCQ voltage switch, so that to properly
handle the SDHCI_HOST_CONTROL2 register signaling
flags and along with manipulating the regulator.
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Marvell
Differential Revision: https://reviews.freebsd.org/D30564
MFC after: 2 weeks
Until now the "no-1-8-v" DT flag wrongly disabled the SDHCI_CAN_VDD_180
- slot 1.8V power supply capability, whereas it refers to the signaling
voltage. Fix the sdhci_xenon_read_4 and allow to disable the UHS modes
depending on the DT property or PHY slow mode. While at it - make sure
the unsupported 1.2V signaling is always disabled and not reported
in the bootverbose log.
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Marvell
Differential Revision: https://reviews.freebsd.org/D30563
MFC after: 2 weeks
The mmc_fdt_parse allows to parse more MMC-related
FDT properties. Start using it. "wp-inverted" property,
VQMMC and newly added VMMC power supply parsing
is now done in a generic code.
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Marvell
Differential Revision: https://reviews.freebsd.org/D30562
MFC after: 2 weeks
With this change the host controller drivers can set the MMC capabilities
(e.g. using mmc_fdt_parse() helper) before calling sdhci_init_slot().
This way the configuration dump (eg. in bootverbose) can include the
possible additional information.
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Marvell
Differential Revision: https://reviews.freebsd.org/D30561
MFC after: 2 weeks
This patch adds support for the SDHCI_CAN_DO_64BIT
capability, so that to allow 64-bit DMA operation
for the controllers which support this feature.
Reviewed by: manu
Obtained from: Semihalf
Sponsored by: Marvell
Differential Revision: https://reviews.freebsd.org/D30560
MFC after: 2 weeks
DBG2 ACPI table description [1] specifies three subtypes
related to 16550 UART:
0x0 - 16550 compatible
0x1 - 16550 subset
0x12 - 16550 compatible with parameters defined in Generic Address Structure (GAS)
It turned out however, that the Windows OS treats 0x0 subtype as
legacy x86 UART with 8-bit access. ARM SoCs can use types 0x1 (16550 with
fixed mmio32 access) or 0x12 (16550 with fully respected GAS contents).
Switch Marvell SoCs ACPI UART subtype to 0x1 - thanks to that the same firmware
can run properly with UART output in FreeBSD, Windows 10, Linux and ESXI
hypervisor. Tests showed the older firmware versions that use 0x0
UART subtype in SPCR table continue to display output properly.
[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/bringup/acpi-debug-port-table
Obtained from: Semihalf
Sponsored by: ARM
Differential revision: https://reviews.freebsd.org/D30386
MFC after: 2 weeks
Use the correct SGL limit within iw_cxgbe, firmwares >= 1.25.6.0 support
upto 512 entries per MR.
Obtained from: Chelsio Communications
MFC after: 1 week
Sponsored by: Chelsio Communications
Commit message of the identical change in Linux driver says:
"When an I2C HID device is powered off during system sleep, as a result
of removing its power resources (by the ACPI core) the interrupt line
might go low as well. This results inadvertent interrupts."
This change fixes suspend/resume on Asus S510UQ laptops.
While here add a couple of typo fixes as well as a slight change to the
iichid_attach() code to have the power_on flag set properly.
Submitted by: J.R. Oldroyd <jr_AT_opal_DOT_com>
Reviewed by: wulf
MFC after: 1 week
Changes since 1.25.0.0 are listed here. This list comes from the
Release Notes for the "Chelsio Unified Wire v3.14.0.3 for Linux"
release dated 2021-05-21.
Fixes
-----
BASE:
- Fixed Back to back T6 100G-CR4 link coming up with NO FEC sometimes.
- [T5] Try to bring up link in 1G speed if link doesn't come up on 10G.
- Fixed a bug to not allow BaseR fec in 100G speed.
- Fixed linkup issues on BT adapter in 1G and 100M speed.
- Fixed an issue to allow driver to send VI_ENABLE multiple times (once
with rx disable and then later rx enable).
- Fixed rate limiting not working on class number 16 to 30.
- Fixed backward compatibility issue in port type interpretation with vpd
version 0x80.
ETH:
- Fixed a case when firmware failed to deliver NIC WR completion to host.
- No rate limit support for WR ETH_TX_PKTS2 due to performance reasons.
OFLD
- Fixed a connection hang in SO adapters when tp_plen_max (set by driver)
is more than the window size.
- Added fw_filter_vnic_mode to firmware API file (t4fw_interface.h)
- Use correct rx channel in coprocessor crypto completion (CPL_FW6_PLD). This
was causing out of order completion to host.
FOiSCSI
- Fixed a crash due to unaligned access of ipv6 address.
- Fixed a crash during lun reset.
Enhancements
------------
ETH:
- Rate limiting support added for encapsulated (vxlan, nvgre, geneve) NIC TCP
packets.
OFLD:
- More than 128 SGLs supported in FW_RI_FR_NSMR_WR. Now, more than 16GB
(upto 64GB) of PBLs can be written with single FW_RI_FR_NSMR_WR.
Obtained from: Chelsio Communications
MFC after: 1 month
Sponsored by: Chelsio Communications
Part of the nvme recovery process for errors is to reset the
card. Sometimes, this results in failing the entire controller. When nda
is in use, we free the sim, which will sleep until all the I/O has
completed. However, with only one thread, the request fail task never
runs once the reset thread sleeps here. Create two threads to allow I/O
to fail until it's all processed and the reset task can proceed.
This is a temporary kludge until I can work out questions that arose
during the review, not least is what was the race that queueing to a
failure task solved. The original commit is vague and other error paths
in the same context do a direct failure. I'll investigate that more
completely before committing changing that to a direct failure. mav@
raised this issue during the review, but didn't otherwise object.
Multiple threads, though, solve the problem in the mean time until other
such means can be perfected.
Reviewed by: jhb@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D30366
If an iSCSI connection is shutdown abruptly (e.g. by a RST from the
peer), pending iSCSI PDUs and page pod work requests can be in the
ulp_pduq when the final CPL is received indicating the death of the
connection.
Reported by: Jithesh Arakkan @ Chelsio
In 4427ac3675, the TOM driver stopped sending work requests to
program iSCSI page pods directly and instead queued them to be written
asynchronously with iSCSI PDUs. The queue of mbufs to send is
protected by the inp lock. However, the inp cannot be safely obtained
from the toep since a RST from the remote peer might have cleared
toep->inp asynchronously in an ithread. To fix, obtain the inp from
the socket as is already done in icl_cxgbei_conn_pdu_queue_cb() and
fail the new transfer setup with ECONNRESET if the connection has been
reset.
To avoid passing sockets or inps into the page pod routines, pull the
mbufq out of the two relevant page pod routines such that the routines
queue new work request mbufs to a caller-supplied mbufq.
Reported by: Jithesh Arakkan @ Chelsio
Fixes: 4427ac3675