The current implementation does not handle timeout in case of command
with callback request, and this can lead to deadlock if the command
doesn't get firmware response. Add delayed callback timeout work
before posting the command to firmware. In case of real firmware
command completion we will cancel the delayed work. In case of
firmware command timeout the callback timeout handler will be called
and it will simulate firmware completion with timeout error.
linux commit 65ee67084589c1783a74b4a4a5db38d7264ec8b5
MFC after: 1 week
Sponsored by: Mellanox Technologies
Call command completion handler in case of timeout when working in
interrupts mode. Avoid flushing the commands workqueue after acquiring
the semaphores to prevent a potential deadlock.
linux commit commit 9cba4ebcf374c3772f6eb61f2d065294b2451b49
MFC after: 1 week
Sponsored by: Mellanox Technologies
Creating a UD address handle from user-space or from the kernel-space,
when the link layer is ethernet, requires resolving the remote L3
address into a L2 address. Doing this from the kernel is easy because
the required ARP(IPv4) and ND6(IPv6) address resolving APIs are readily
available. In userspace such an interface does not exist and kernel
help is required.
It should be noted that in an IP-based GID environment, the GID itself
does not contain all the information needed to resolve the destination
IP address. For example information like VLAN ID and SCOPE ID, is not
part of the GID and must be fetched from the GID attributes. Therefore
a source GID should always be referred to as a GID index. Instead of
going through various racy steps to obtain information about the
GID attributes from user-space, this is now all done by the kernel.
This patch optimises the L3 to L2 address resolving using the existing
create address handle uverbs interface, retrieving back the L2 address
as an additional user-space information structure.
This commit combines the following Linux upstream commits:
IB/core: Let create_ah return extended response to user
IB/core: Change ib_resolve_eth_dmac to use it in create AH
IB/mlx5: Make create/destroy_ah available to userspace
IB/mlx5: Use kernel driver to help userspace create ah
IB/mlx5: Report that device has udata response in create_ah
MFC after: 1 week
Sponsored by: Mellanox Technologies
checks to recognize own network devices when using mlx5ib. This patch fixes
an issues where mlx5ib fails to recognize mceX network devices for use with
RoCE.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The IA32 memory model guarantees that all writes are seen in the program
order. Also, any access to the uncacheable memory flushes the store
buffers. As the consequence, SFENCE instruction is (almost) never needed,
in particular, it is not needed to ensure the correct order of updates as
seen by a PCIe device.
Use atomic_thread_fence_rel() instead of wb() to only emit compiler barriers
on x86 there. Other architectures get the right barrier instruction as
well.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies
MFC after: 1 week
Driver support is only provided for ConnectX4/5.
System-time timestamp is calculated based on the free-running counter
timestamp provided by hardware. Driver periodically samples the
counter to calibrate it against the system clock and uses linear
interpolation to convert. Stability of the crystal which drives the
clock is +-50 ppm at the operational temperature, which makes the
algorithm good enough.
The calculation is somewhat delicate because all values are 64bit and
overflow the naive formula for linear interpolation. The calculation
drops the least significant bits in advance, see the PREC shift in
mlx5_mbuf_tstmp().
Hardware stamps can be turned off by 'ifconfig mceN -hwrxtsmp'. Buggy
firmware might result in small but visible errors in the reported
timestamps, detectable e.g. by nonsensical (negative) RTT values for
LAN pings.
Reviewed by: gallatin, hselasky
Sponsored by: Mellanox Technologies
Differential revision: https://reviews.freebsd.org/D12638
Different compilers may optimise the enum type in different ways. This ensures
coherency when range checking the value of enums in ibcore.
Sponsored by: Mellanox Technologies
MFC after: 1 week
The hardware is capable of 2 requestor endianness modes for standard 8
byte atomics: BE (0x0) and host endianness (0x1). Read the supported
modes from hca atomic capabilities and configure HW to host endianness
mode if supported.
Sponsored by: Mellanox Technologies
MFC after: 1 week
the coming ibcore and mlx5ib updates in order to support traffic redirection
to so-called raw ethernet QPs.
Remove unused E-switch related routines and files while at it.
Sponsored by: Mellanox Technologies
MFC after: 1 week
iWarp and RoCE in ibcore. The selection of RDMA_PS_TCP can not be used
to indicate iWarp protocol use. Backport the proper IB device
capabilities from Linux upstream to distinguish between iWarp and
RoCE. Only allocate the additional socket required for iWarp for RDMA
IDs when at least one iWarp device present. This resolves
interopability issues between iWarp and RoCE in ibcore
Reviewed by: np @
Differential Revision: https://reviews.freebsd.org/D12563
Sponsored by: Mellanox Technologies
MFC after: 3 days
Remote DMA over Converged Ethernet, RoCE, for the ConnectX-4 series of
PCI express network cards.
There is currently no user-space support and this driver only supports
kernel side non-routable RoCE V1. The krping kernel module can be used
to test this driver. Full user-space support including RoCE V2 will be
added as part of the ongoing upgrade to ibcore from Linux 4.9. Otherwise
this driver is feature equivalent to mlx4ib(4). The mlx5ib(4) kernel
module will only be built when WITH_OFED=YES is specified.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
input errors in the mlx5en(4) driver. This improves the sysadmin view of
physical port errors.
Submitted by: gallatin@
MFC after: 1 week
Sponsored by: Mellanox Technologies
Code inspection reveals the busdma unload and free functions
do not write to the belonging dma tag and does not need to be
serialized. This allows mlx5_fwp_free() to be called from
software interrupt context.
MFC after: 3 days
Sponsored by: Mellanox Technologies
Add io_mapping_init_wc() and add a third (unused) parameter to
io_mapping_map_wc().
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11286
The MLX5 driver has four different types of DMA allocations which are
now allocated using busdma:
1) The 4K firmware DMA-able blocks. One busdma object per 4K allocation.
2) Data for firmware commands use the 4K firmware blocks split into four 1K blocks.
3) The 4K firmware blocks are also used for doorbell pages.
4) The RQ-, SQ- and CQ- DMA rings. One busdma object per allocation.
After this patch the mlx5en driver can be used with DMAR enabled in
the FreeBSD kernel.
MFC after: 1 week
Sponsored by: Mellanox Technologies
- When device disappears from PCI indicate error device state and:
1) Trigger command completion for all pending commands
2) Prevent new commands from executing and return:
- success for modify and remove/cleanup commands
- failure for create/query commands
3) When reclaiming pages for a device in error state don't ask FW to
return all given pages, just release the allocated memory
MFC after: 1 week
Sponsored by: Mellanox Technologies
PCI device(s), changes:
- alloc_entry() now clears bit for page slot entry aswell
- update of cmd->ent_arr[] is now under cmd->alloc_lock
- complete command if alloc_entry() fails
MFC after: 1 week
Sponsored by: Mellanox Technologies
By default reading the diagnostic counters is disabled. The firmware
decides which counters are supported and only those supported show up
in the dev.mce.X.diagnostics sysctl tree.
To enable reading of diagnostic counters set one or more of the
following sysctls to one:
dev.mce.X.conf.diag_general_enable=1
dev.mce.X.conf.diag_pci_enable=1
MFC after: 1 week
Sponsored by: Mellanox Technologies
consistent return values from the mlx5e_sq_has_room_for()
function. The two counters are incremented by different threads under
different locks.
MFC after: 1 week
Sponsored by: Mellanox Technologies
- Add new sysctl node to control the transmit packet bufring.
- Add optimised version of the transmit routine which output packets
directly to the DMA ring instead of using bufring in case the transmit
lock is congested. This can reduce the number of taskswitches which in
turn influence the overall system CPU usage, depending on the
workload.
- Add " TX" suffix to debug name for transmit mutexes to silence some
witness warnings about aquiring duplicate locks having same name.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Suggested by: gallatin @
Add own state variable to track if a sendqueue is stopped or not.
This will prevent traffic from entering the sendqueue while it is
being destroyed.
Update drain function to wait for traffic to be transmitted before
returning when the link state is active.
Add extra checks in transmit path for stopped SQ's.
While at it:
- Use likely() for a mbuf pointer check.
- Remove redundant IFF_DRV_RUNNING check.
MFC after: 1 week
Sponsored by: Mellanox Technologies
The firmware/hardware does not generate additional completion
events unless we post new buffers. Use a timer to try to post
more buffers in case we are temporarily out of mbufs. Else
the receive schedule completely stops.
Sponsored by: Mellanox Technologies
MFC after: 1 week
When mlx5e_sq_xmit() returns an error code and the mbuf pointer is set,
we should not free the mbuf, because the caller will keep the mbuf in
the drbr. Make sure the mbuf pointer is correctly set upon function
exit.
Sponsored by: Mellanox Technologies
MFC after: 1 week
The hardware MTU size can't be set to a value less than 1500 bytes due
to side-band management support. Allow setting the software MTU size
below 1500 bytes, thus creating a mismatch between hardware and
software MTU sizes.
Sponsored by: Mellanox Technologies
MFC after: 1 week
Try to reuse code to setup sendqueues when possible by making some static
functions global. Further split the mlx5e_close_sq_wait() function to
separate out reusable parts.
Sponsored by: Mellanox Technologies
MFC after: 1 week
This change also reduces the size of the mlx5e_sq structure so that the last
queue_state element will fit into the previous cacheline and then the mlx5e_sq
structure becomes one cacheline less for amd64.
Sponsored by: Mellanox Technologies
MFC after: 1 week
Make some functions and structures global to allow for code reuse
when creating rate limiting sendqueues.
Sponsored by: Mellanox Technologies
MFC after: 1 week
Move setting of CQ moderation mode together with the other
CQ moderation parameters. Pass completion event vector as
a separate argument to mlx5e_open_cq(), because its value is
different for each call. Pass mlx5e_priv pointer instead of
mlx5e_channel pointer so that code can be used by rate
limiting sendqueues.
Sponsored by: Mellanox Technologies
MFC after: 1 week
This change allows for reusing the transmit path for so called
rate limited senqueues. While at it optimise some pointer lookups
in the fast path.
Sponsored by: Mellanox Technologies
MFC after: 1 week
- Add new firmware commands and update existing ones.
- Add more firmware related structures and update existing ones.
- Some minor fixes, like adding missing \n to some prints.
Sponsored by: Mellanox Technologies
MFC after: 1 week
Clear the device description to avoid use after free because the
bsddev is not destroyed when the mlx5en module is unloaded. Only when
the parent mlx5 module is unloaded the bsddev is destroyed. This fixes
a panic on listing sysctls which refer strings in the bsddev after the
mlx5en module has been unloaded.
Sponsored by: Mellanox Technologies
MFC after: 1 week
driver. This change significantly increases the overall RX aggregation
ratio for heavily loaded networks handling 10-80 thousand simultaneous
connections.
Remove the turbo LRO code and all references to it which has now been
superceeded by the tcp_lro_queue_mbuf() function.
Tested by: Netflix
Sponsored by: Mellanox Technologies
MFC after: 1 week
This patch adds the missing pieces needed for device setup using the
mlx5en driver inside a virtual machine which is providing hardware
access through SR-IOV.
Sponsored by: Mellanox Technologies
MFC after: 1 week
tunable SYSCTL's. Linux module parameters are associated with the
module they belong to. FreeBSD does not share this concept of a parent
module. Instead add macros which define the prefix to use for the
module parameters in the LinuxKPI consumers.
While at it convert all "bool" LinuxKPI module parameters to "byte"
type, because we don't have a "bool" type of SYSCTL in FreeBSD.
Sponsored by: Mellanox Technologies
MFC after: 1 week
Store the last doorbell write in the mlx5e_sq structure and write the
doorbell to the hardware when the transmit routine finishes
transmitting all queued mbufs.
Sponsored by: Mellanox Technologies
Tested by: Netflix
MFC after: 1 week
This patch implements a sysctl which allows setting a factor, N, for
how many work queue elements can be generated before requiring a
completion event. When a completion event happens the code simulates N
completion events instead of only one. When draining a transmit queue,
N-1 NOPs are transmitted at most, to force generation of the final
completion event. Further a timer is running every HZ ticks to flush
any remaining data off the transmit queue when the tx_completion_fact
> 1.
The goal of this feature is to reduce the PCI bandwidth needed when
transmitting data.
Sponsored by: Mellanox Technologies
Tested by: Netflix
MFC after: 1 week
function to error out early when no port module is present and doing
eeprom access. This also prevents error codes from filling up in
dmesg.
Sponsored by: Mellanox Technologies
Tested by: Netflix
MFC after: 1 week
And factor out tcp_lro_rx_done, which deduplicates the same logic with
netinet/tcp_lro.c
Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5725
after changing the HW LRO sysctl when previously in up state.
Reviewed by: gnn
Sponsored by: Mellanox Technologies
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D4941