Commit Graph

124 Commits

Author SHA1 Message Date
Dawid Gorecki
d10ec3ad77 ena: do not call reset if device is unresponsive
If the device becomes unresponsive, the driver will not be able to
finish the reset process correctly. Timeout during version validation
indicates that the device is currently not responding. In that case
do not perform the reset and instead reschedule timer service. Because
of that the driver will continue trying to reset the device until it
succeeds or is detached.

Submitted by: Dawid Gorecki <dgr@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2022-01-23 20:48:33 +01:00
Dawid Gorecki
78554d0c70 ena: start timer service on attach
The timer service was started when the interface was brought up and it
was stopped when it was brought down. Since ena_up requires the device
to be responsive, triggering the reset would become impossible if the
device became unresponsive with the interface down.

Since most of the functions in timer service already perform the check
to see if the device is running, this only requires starting the callout
in attach and stopping it when bringing the interface up or down to
avoid race between different admin queue calls.

Since callout functions for timer service are always called with the
same arguments, replace callout_{init,reset,drain} calls with
ENA_TIMER_{INIT,RESET,DRAIN} macros.

Submitted by: Dawid Gorecki <dgr@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2022-01-23 20:48:32 +01:00
Marcin Wojtas
eb4c4f4a2e ena: merge ena-com v2.5.0 upgrade
Merge commit '2530eb1fa01bf28fbcfcdda58bd41e055dcb2e4a'

Adjust the driver to the upgraded ena-com part twofold:

First update is related to the driver's NUMA awareness.

Allocate I/O queue memory in NUMA domain local to the CPU bound to the
given queue, improving data access time. Since this can result in
performance hit for unaware users, this is done only when RSS
option is enabled, for other cases the driver relies on kernel to
allocate memory by itself.

Information about first CPU bound is saved in adapter structure, so
the binding persists after bringing the interface down and up again.

If there are more buckets than interface queues, the driver will try to
bind different interfaces to different CPUs using round-robin algorithm
(but it will not bind queues to CPUs which do not have any RSS buckets
associated with them). This is done to better utilize hardware
resources by spreading the load.

Add (read-only) per-queue sysctls in order to provide the following
information:
- queueN.domain: NUMA domain associated with the queue
- queueN.cpu:    CPU affinity of the queue

The second change is for the CSUM_OFFLOAD constant, as ENA platform
file has removed its definition. To align to that change, it has been
added to the ena_datapath.h file.

Submitted by: Artur Rojek <ar@semihalf.com>
Submitted by: Dawid Gorecki <dgr@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2022-01-23 20:27:13 +01:00
Artur Rojek
6d1ef2abd3 ena: Implement full RSS reconfiguration
Bind RX/TX queues and MSI-X vectors to matching CPUs based on the RSS
bucket entries.

Introduce sysctls for the following RSS functionality:
- rss.indir_table:      indirection table mapping
- rss.indir_table_size: indirection table size
- rss.key:              RSS hash key (if Toeplitz used)

Said sysctls are only available when compiled without `option RSS`, as
kernel-side RSS support currently doesn't offer RSS reconfiguration.

Migrate the hash algorithm from CRC32 to Toeplitz and change the initial
hash value to 0x0 in order to match the standard Toeplitz implementation.
Provide helpers for hash key inversion required for HW operations.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2021-09-02 01:06:53 +02:00
Artur Rojek
223c8cb12e ena: Add missing statistics
Provide the following sysctl statistics in order to stay aligned with
the Linux driver:
* rx_ring.csum_good
* tx_ring.unmask_interrupt_num

Also rename the 'bad_csum' statistic name to 'csum_bad' for alignment.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2021-09-02 01:06:47 +02:00
Artur Rojek
07aff471c0 ena: Share ena_global_lock between driver instances
In order to use `ena_global_lock` in sysctl context, it must be kept
outside the driver instance's software context, as sysctls can be called
before attach and after detach, leading to lock use before sx_init and
after sx_destroy otherwise.
Solve this issue by turning `ena_global_lock` into a file scope
variable, shared between all instances of the driver and associated
sysctl context, and in turn initialized/destroyed in dedicated
SYSINIT/SYSUNINIT functions.
As a side effect, this change also fixes existing race in the reset
routine, when simultaneously accessing sysctl exposed properties.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2021-09-02 01:06:37 +02:00
Artur Rojek
986e7b9227 ena: Move RSS logic into its own source files
Delegate RSS related functionality into separate .c/.h files in
preparation for the full RSS support.

While at it, reorder functions and remove prototypes for ones with
internal linkage.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2021-09-02 01:06:26 +02:00
Artur Rojek
cb98c439d6 ena: Add locking assertions
ENA silently assumed that ena_up, ena_down and ena_start_xmit routines
should be called within locked context. Driver's logic heavily assumes
on concurrent access to those routines, so for safety and better
documentation about this assumption, the locking assertions were added
to the above functions.

The assertion was added only for the main steps (skipping the helper
functions) which can be called from multiple places including the kernel
and the driver itself.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2021-09-02 01:06:21 +02:00
Artur Rojek
77160654a1 ena: Add extra log messages
Stay aligned with the Linux driver by adding the following logs:
* inform the user about retrying queue creation
* warn on non-empty ena_tx_buffer.mbuf prior to ena_tx_map_mbuf

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2021-09-02 01:06:12 +02:00
Artur Rojek
433ab9b698 ena: Prevent reset after device destruction
Check for ENA_FLAG_TRIGGER_RESET inside a locked context in order to
avoid potential race conditions with ena_destroy_device. This aligns the
reset task logic with the Linux driver.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2021-09-02 01:06:06 +02:00
Marcin Wojtas
3fc5d816f8 Merge tag 'vendor/ena-com/2.4.0'
Update the driver in order not to break its compilation
and make use of the new ENA logging system

Migrate platform code to the new logging system provided by ena_com
layer.

Make ENA_INFO the new default log level.

Remove all explicit use of `device_printf`, all new logs requiring one
of the log macros to be used.
2021-06-24 16:15:18 +02:00
Marcin Wojtas
0e7d31f63b ena: hide sysctl nodes for unused ENA queues
IO queue related attributes are registered statically at driver attach
with the rest of the ENA specific sysctl nodes. However, the number of
queues can be changed at runtime via the `ena_sysctl_io_queues_nb`
request, leading to a potential exposure of attributes for non-existing
queues.

Introduce a new `ena_sysctl_update_queue_node_nb` function, which
updates the sysctl nodes after the number of queues is altered.
This happens by either registering or unregistering node specific oids,
based on a delta between the previous and current queue count.

NOTE: All unregistered oids must be registered again before the driver
detach, e.g. by another call to this function.

Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2021-06-24 16:02:39 +02:00
Marcin Wojtas
ddec69e6a7 ena: remove surplus NULL checks when freeing ENA resources
Calling free on a NULL pointer is valid, as appropriate check is already
done internally:

/* free(NULL, ...) does nothing */
if (addr == NULL)
    return;

Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2021-06-24 16:02:39 +02:00
Marcin Wojtas
beaadec9ea ena: add support for the large LLQ headers in ENA
Default LLQ (Low-latency queue) maximum header size is 96 bytes and can
be too small for some types of packets - like IPv6 packets with multiple
extension. This can be fixed, by using large LLQ headers.

If the device supports larger LLQ headers, the user can activate this
feature by setting sysctl tunable 'hw.ena.force_large_llq_header' to '1'
in the /boot/loader.conf file.

In case the device isn't supporting this feature, the default value (96B)
will be used.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2021-06-24 16:02:39 +02:00
Michal Krawczyk
1c808fcd85 Allocate BAR for ENA MSIx vector table
In the new ENA-based instances like c6gn, the vector table moved to a
new PCIe bar - BAR1. Previously it was always located on the BAR0, so
the resources were already allocated together with the registers.

As the FreeBSD isn't doing any resource allocation behind the scenes,
the driver is responsible to allocate them explicitly, before other
parts of the OS (like the PCI code allocating MSIx) will be able to
access them.

To determine dynamically BAR on which the MSIx vector table is present
the pci_msix_table_bar() is being used and the new BAR is allocated if
needed.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 3 days
2021-02-18 13:54:36 +01:00
Marcin Wojtas
7d2e6f207e Rename descriptions of the supported ENA devices
Some of the PCI ID were described as ENA with LLQ support - it's not
fully accurate and because of that, their names were changed.

Instead of LLQ, use RSERV0 for the description of those devices.

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon, Inc
MFC after:      1 week
Differential revision:  https://reviews.freebsd.org/D27119
2020-11-18 15:20:01 +00:00
Marcin Wojtas
f180142c76 Add ENI metrics for the ENA driver
The new HAL allows the driver to read extra ENI stats. Exact meaning of
each of them can be found in base/ena_defs/ena_admin_defs.h file and
structure ena_admin_eni_stats.

Those stats are being updated inside of the timer service, which is
executed every second.
ENI metrics are turned off by default. They can be enabled, using the
sysctl node: dev.ena.X.eni_metrics.update_delay
0 value in this node means that the update is turned off. Other values
determine how many seconds must pass, before ENI metrics will be
updated.

They can be acquired, using sysctl:

sysctl dev.ena.X.eni_metrics

Where X stands for the interface number.

Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon, Inc
MFC after:      1 week
Differential revision: https://reviews.freebsd.org/D27118
2020-11-18 15:17:55 +00:00
Marcin Wojtas
0835cc783b Add SPDX license tag to the ENA driver files
Refering to guide: https://wiki.freebsd.org/SPDX the SPDX tag should not
replace the standard license text, however it should be added over the
standard license text to make the automation easier.

Because of that, the old license was kept, but the SPDX tag was added
on top of every ENA driver file.

Submited by:    Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon, Inc
MFC after:      1 week
Differential revision:  https://reviews.freebsd.org/D27117
2020-11-18 15:07:34 +00:00
Marcin Wojtas
c74443892c Add Rx offsets support for the ENA driver
For the first descriptor in a chain the data may start at an offset.
It is optional feature of some devices, so the driver must ack that
it supports it.

The data pointer of the mbuf is simply shifted by the given value.

Submitted by:   Maciej Bielski <mba@semihalf.com>
Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon, Inc
MFC after:      1 week
Differential revision:  https://reviews.freebsd.org/D27116
2020-11-18 15:02:12 +00:00
Marcin Wojtas
9eb1615f33 Adjust ENA driver files to latest ena-com changes
* Use the new API of ena_trace_*
* Fix typo syndrom --> syndrome
* Remove validation of the Rx req ID (already performed in the ena-com)
* Remove usage of deprecated ENA_ASSERT macro

Submitted by:   Ido Segev <idose@amazon.com>
Submitted by:   Michal Krawczyk <mk@semihalf.com>
Obtained from:  Semihalf
Sponsored by:   Amazon, Inc
MFC after:      1 week
Differential revision:  https://reviews.freebsd.org/D27115
2020-11-18 14:59:22 +00:00
Marcin Wojtas
4f8f476e73 Fix completion descriptors alignment for the ENA
The latest generation hardware requires IO CQ (completion queue)
descriptors memory to be aligned to a 4K. It needs that feature for
the best performance.

Allocating unaligned descriptors will have a big performance impact as
the packet processing in a HW won't be optimized properly. For that
purpose adjust ena_dma_alloc() to support it.

It's a critical fix, especially for the arm64 EC2 instances.

Submitted by: Ido Segev <idose@amazon.com>
Obtained from: Amazon, Inc
MFC after: 1 week
Differential revision:  https://reviews.freebsd.org/D27114
2020-11-18 14:50:12 +00:00
Andriy Gapon
b40dd828bd teach ena driver about RSS kernel option
Networking is broken if the driver configures its (virtual) hardware to
use a hash algorithm (or a key) different from the one that the network
stack (software RSS) uses.  This can be seen with connections initiated
from the host.  The PCB will be placed into the hash table based on the
hash value calculated by the software.  The hardware-calculated hash
value in reponse packets will be different, so the PCB won't be found.

Tested with a kernel compiled with 'options RSS' on an instance with ena
driver.

Reviewed by:	mw, adrian
MFC after:	2 weeks
Sponsored by:	Panzura
Differential Revision: https://reviews.freebsd.org/D24733
2020-06-23 04:58:36 +00:00
Marcin Wojtas
2287afd818 Update ENA driver version to v2.2.0
Driver version upgrade is connected with support for the new device
fetures, like Tx drops reporting or disabling meta caching.

Moreover, the driver configuration from the sysctl was reworked to
provide safer and better flow for configuring:
* number of IO queues (new feature),
* drbr size on Tx,
* Rx queue size.

Moreover, a lot of minor bug fixes and improvements were added.

Copyright date in the license of the modified files in this release was
updated to 2020.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
2020-05-26 16:11:46 +00:00
Marcin Wojtas
9bf7da9517 Fix double-free bug within ena_detach()
There is ena_free_all_io_rings_resources() called twice on device
detach:

ena_detach():

ena_destroy_device():
/* First call */
ena_free_all_io_rings_resources()

/* Second call */
ena_free_all_io_rings_resources()

The double-free causes panic() on kldunload, for example.

As the ena_destroy_device() is also called by ena_reset_task() it is
better to stay unchanged. Thus, remove the "Second call" of the function.

Submitted by:  Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 16:02:10 +00:00
Marcin Wojtas
0b432b702e Allow disabling meta caching for ENA Tx path
Determined by a flag passed from the device. No metadata is set within
ena_tx_csum when caching is disabled.

Submitted by:  Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 16:00:30 +00:00
Marcin Wojtas
9762a033da Create ENA IO queues with optional backoff
If requested size of IO queues is not supported try to decrease it until
finding the highest value that can be satisfied.

Submitted by:  Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:58:48 +00:00
Marcin Wojtas
56d41ad5fe Add sysctl node for ENA IO queues number adjustment
By default, in ena_attach() the driver attempts to acquire
ena_adapter::max_num_io_queues MSI-X vectors for the purpose of IO
queues, however this is not guaranteed. The number of vectors acquired
depends also on system resources availability.

Regardless of that, enable the number of effectively used IO queues to
be further limited through the sysctl node.

Example: Assumming that there are 8 IO queues configured by default, the
command

$ sysctl dev.ena.0.io_queues_nb=4

will reduce the number of available IO queues to 4. Similarly, the value
can be also increased up to maximum supported value. A value higher than
maximum supported number of IO queues is ignored. Zero is ignored too.

Submitted by:  Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:57:02 +00:00
Marcin Wojtas
e2735b095b Fix assumptions about number of IO queues in the ENA
Make the ena_adapter::num_io_queues a number of effectively used IO
queues. While the ena_adapter::max_num_io_queues is an upper-bound
specified by the HW, the ena_adapter::num_io_queues may be lower than
that, depending on runtime system resources availability.

On reset, there are called ena_destroy_device() and then
ena_restore_device(). The latter calls, in turn, ena_enable_msix(),
which will attempt to re-acquire ena_adapter::max_num_io_queues of
MSIX vectors again.

Thus, the value of ena_adapter::num_io_queues may be different before
and after reset. For this reason, free the IO rings structures (drbr,
counters) in ena_destroy_device() and allocate again in
ena_restore_device().

Submitted by:  Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:54:32 +00:00
Marcin Wojtas
2182354622 Rework ENA Tx buffer ring size reconfiguration
This method has been aligned with the way how the Rx queue size is being
updated - so it's now done synchronously instead of resetting the
device.

Moreover, the input parameter is now being validated if it's a power of
2. Without this, it can cause kernel panic.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:50:30 +00:00
Marcin Wojtas
7d8c4fee95 Rework ENA Rx queue size configuration
This patch reworks how the Rx queue size is being reconfigured and how
the information from the device is being processed.

Reconfiguration of the queues and reset of the device in order to make
the changes alive isn't the best approach. It can be done synchronously
and it will let to pass information if the reconfiguration was
successful to the user. It now is done in the ena_update_queue_size()
function.

To avoid reallocation of the ring buffer, statistic counters and the
reinitialization of the mutexes when only new size has to be assigned,
the io queues initialization function has been split into 2 stages:
basic, which is just copying appropriate fields and the advanced, which
allocates and inits more advanced structures for the IO rings.

Moreover, now the max allowed Rx and Tx ring size is being kept
statically in the adapter and the size of the variables holding those
values has been changed to uint32_t everywhere.

Information about IO queues size is now being logged in the up routine
instead of the attach.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:48:06 +00:00
Marcin Wojtas
92dc69a727 Mark the ENA driver as epoch ready
Recent changes to the epoch requires driver to notify that they knows
epoch in order to prevent input packet function to enter epoch each
time the packet is received.

ENA is using NET_TASK for handling Rx, so it's entering epoch
automatically whenever this task is being executed.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:45:54 +00:00
Marcin Wojtas
579d23aa96 Improve indentation in ena_up() and ena_down()
If the conditional check for ENA_FLAG_DEV_UP is negated, the body of the
function can have smaller indentation and it makes the code cleaner.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:44:08 +00:00
Marcin Wojtas
6959869eae Use single global lock in the ENA driver
Currently, the driver had 2 global locks - one was sx lock used for
up/down synchronization and the second one was mutex, which was used
for link configuration and timer service callout.

It is better to have single lock for that. We cannot use mutex, as it
can sleep and cause witness errors in up/down configuration, so sx lock
seems to be the only choice.

Callout cannot use sx lock, but the timer service is MP safe, so we just
need to avoid race between ena_down() and ena_detach(). It can be
avoided by acquiring sx lock.

Simple macros were added that are encapsulating implementation of the
lock and makes the code cleaner.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:39:41 +00:00
Marcin Wojtas
7926bc4492 Add trigger reset function in the ENA driver
As the reset triggering is no longer a simple macro that was just
setting appropriate flag, the new function for triggering reset was
added. It improves code readability a lot, as we are avoiding additional
indentation.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:37:55 +00:00
Marcin Wojtas
aa9c3226b9 Remove unused argument from static function in ena.c
The function ena_enable_msix_and_set_admin_interrupts takes two
arguments while the second is not used and so can be spared. This is a
static function, only ena.c is affected.

Submitted by:  Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:33:43 +00:00
Marcin Wojtas
6c84cec373 Enable Tx drops reporting in the ENA driver
Tx drops statistics are fetched from HW every ena_keepalive_wd() call
and are observable using one of the commands:
* sysctl dev.ena.0.hw_stats.tx_drops
* netstat -I ena0 -d

Submitted by:  Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:31:28 +00:00
Marcin Wojtas
8483b844e7 Adjust ENA driver to the new HAL
* Removed adaptive interrupt moderation (not suported on FreeBSD).
* Use ena_com_free_q_entries instead of ena_com_free_desc.
* Don't use ENA_MEM_FREE outside of the ena_com.
* Don't use barriers before calling doorbells as it's already done in
  the HAL.
* Add function that generates random RSS key, common for all driver's
  interfaces.
* Change admin stats sysctls to U64.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2020-05-26 15:29:19 +00:00
Marcin Wojtas
04cf2b885d Optimize ENA Rx refill for low memory conditions
Sometimes, especially when there is not much memory in the system left,
allocating mbuf jumbo clusters (like 9KB or 16KB) can take a lot of time
and it is not guaranteed that it'll succeed. In that situation, the
fallback will work, but if the refill needs to take a place for a lot of
descriptors at once, the time spent in m_getjcl looking for memory can
cause system unresponsiveness due to high priority of the Rx task. This
can also lead to driver reset, because Tx cleanup routine is being
blocked and timer service could detect that Tx packets aren't cleaned
up. The reset routine can further create another unresponsiveness - Rx
rings are being refilled there, so m_getjcl will again burn the CPU.
This was causing NVMe driver timeouts and resets, because network driver
is having higher priority.

Instead of 16KB jumbo clusters for the Rx buffers, 9KB clusters are
enough - ENA MTU is being set to 9K anyway, so it's very unlikely that
more space than 9KB will be needed.

However, 9KB jumbo clusters can still cause issues, so by default the
page size mbuf cluster will be used for the Rx descriptors. This can have a
small (~2%) impact on the throughput of the device, so to restore
original behavior, one must change sysctl "hw.ena.enable_9k_mbufs" to
"1" in "/boot/loader.conf" file.

As a part of this patch (important fix), the version of the driver
was updated to v2.1.2.

Submitted by:   cperciva
Reviewed by:    Michal Krawczyk <mk@semihalf.com>
Reviewed by:    Ido Segev <idose@amazon.com>
Reviewed by:    Guy Tzalik <gtzalik@amazon.com>
MFC after:      3 days
PR:             225791, 234838, 235856, 236989, 243531
Differential Revision: https://reviews.freebsd.org/D24546
2020-05-07 11:28:39 +00:00
Marcin Wojtas
888810f0fb Rework and simplify Tx DMA mapping in ENA
Driver working in LLQ mode in some cases can send only few last segments
of the mbuf using DMA engine, and the rest of them are sent to the
device using direct PCI transaction. To map the only necessary data, two DMA
maps were used. That solution was very rough and was causing a bug - if
both maps were used (head_map and seg_map), there was a race in between
two flows on two queues and the device was receiving corrupted
data which could be further received on the other host if the Tx cksum
offload was enabled.

As it's ok to map whole mbuf and then send to the device only needed
segments, the design was simplified to use only single DMA map.

The driver version was updated to v2.1.1 as it's important bug fix.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
2020-02-24 15:35:31 +00:00
Gleb Smirnoff
6c3e93cb5a Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process
incoming packets in taskqueue context.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D23518
2020-02-11 18:57:07 +00:00
Marcin Wojtas
358bcc4c6c Add support for ENA NETMAP partial initialization
In NETMAP mode not all queues need to be allocated to NETMAP. Some of
them could be left to the kernel. Configuration is managed by the flags
nr_mode and nr_pending_mode provided per each NETMAP kring.

ENA driver checks those flags and perform proper rings initialization.

Differential Revision: https://reviews.freebsd.org/D21937
Submitted by: Rafal Kozik <rk@semihalf.com>
              Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-10-31 16:02:42 +00:00
Marcin Wojtas
6f2128c7ed Add support for ENA NETMAP Tx
Two new tables are added to ena_tx_buffer structure:
* netmap_map_seg stores DMA mapping structures,
* netmap_buf_idx stores buff indexes taken from the slots.

When Tx resources are being set, the new mapping structures are created
and netmap Tx rings are being reset.

When Tx resources are being released, used netmap bufs are unmapped from
DMA and then mapping structures are destroyed.

When Tx interrupt occurrs, ena_netmap_tx_irq is called.

ena_netmap_txsync callback signalizes that there are new packets which
should be transmitted.
First, it fills ena_netmap_ctx. Then it performs two actions:
* ena_netmap_tx_frames moves packets from netmap ring to NIC,
* ena_netmap_tx_cleanup restores buffers from NIC and gives them back
to the userspace app.
0 is returned in case of Tx error that could be handled by the driver.

ena_netmap_tx_frames checks if there are packets ready for transmission.
Then, for each of them, ena_netmap_tx_frame is called. If error occurs,
transmitting is stopped, but if the error was cause due to HW ring being
full, information about that is not propagated to the userspace app.
When all packets are ready, doorbell is written to NIC and netmap ring
state is updated.

Parsing of one packet is done by the ena_netmap_tx_frame function.
First, it checks if number of slots does not exceed NIC limit. Invalid
packets are being dropped and the error is propagated to the upper
layer. As each netmap buffer has equal size, which is typically greater
then 2KiB, there shouldn't be any packets which contain too many slots.
Then, the ena_com_tx_ctx structure is being filled. As netmap does not
support any hardware offloads, ena_com_tx_meta structure is set to zero.
After that, ena_netmap_map_slots maps all memory slots for DMA.
If the device works in the LLQ mode, the push header is being determined
by checking if the header fits within the first socket.
If so, the portion of data is being copied directly from the slot.
In other case, the data is copied to the intermediate buffer.
First slots are treated the same as as the others, because DMA mapping
has no impact on LLQ mode. Index of each netmap buffer is taken from
slot and stored in netmap_buf_idx array. In case of mapping error,
memory is unmapped and packets are put back to the netmap ring.

ena_netmap_tx_cleanup performs out of order cleanup of sent buffers.
First, req_id is taken and is validated. As validate_tx_req_id from
ena.c is specific to kernels mbuf, another implementation is provided.
Each req_id is cleaned up by ena_netmap_tx_clean_one function. Buffers
are being unmaped from DMA and put back to netmap ring. In the end,
state of netmap and NIC rings are being updated.

Differential Revision: https://reviews.freebsd.org/D21936
Submitted by: Rafal Kozik <rk@semihalf.com>
              Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-10-31 15:59:29 +00:00
Marcin Wojtas
9a0f2079ca Add support for ENA NETMAP Rx
Most of code used for Rx ring initialization could be reused in NETMAP.
Reset of NETMAP ring and new alloc method was added. Driver decides if
use kernels mbufs or NETMAPs slots based on IFCAP_NETMAP flag. It
allows to reuse ena_refill_rx_bufs, which provides proper handling of
Rx out of order completion.

ena_netmap_alloc_rx_slot takes exactly the same arguments as
ena_alloc_rx_mbuf, but instead of allocating one mbuf it takes one slot
from NETMAP ring. Based on queue id proper netmap_ring is found. As
NETMAP provides the "partial opening" feature not all of the rings are
avaiable. Not used points to invalid ring. If there is available slot,
it is taken from the ring. Its buffer is mapped to DMA and its index is
stored in ena_rx_buffer field in ena_rx_buffer structure. Then ena_buf
is filled with addresses and ring state is updated.

Cleanup is handled by ena_netmap_free_rx_slot. It unmaps DMA and returns
buffer to ring. As we could not return more bufs than we have taken and
we should not override occupied slots, buf_index should be 0. It is
being checked by assertion.

ena_netmap_rxsync callback puts received packets back to NETMAP ring and
passes them to user space by updating ring pointers. First it fills
ena_netmap_ctx.
Then it performs two actions:
* ena_netmap_rx_frames moves received frames from NIC to NETMAP ring,
* ena_netmap_rx_cleanup fills NIC ring with slots released by userspace
app.

In case of Rx error that could be handled by NIC driver (for example by
performing reset) rx sync should return 0.

ena_netmap_rx_frames first checks if NETMAP ring is in consistent
state and then in the loop receives new frames. When all available
frames are taken nr_hwtail is updated.

Receiving one frame is handled by ena_netmap_rx_frame. If no error
occurrs, each Descriptor is loaded by ena_netmap_rx_load_desc function.
If packets take more than one segments NS_MOREFRAG flag must be set in
all, but not last slot. In case of wrong req_id packet is removed from
NETMAP ring. If packet is successful received counters are updated.

Refiling of NIC ring is performed by ena_netmap_rx_cleanup function.
It calculates number of available slots and call ena_refill_rx_bufs with
proper number.

Differential Revision: https://reviews.freebsd.org/D21935
Submitted by: Rafal Kozik <rk@semihalf.com>
              Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-10-31 15:57:44 +00:00
Marcin Wojtas
d17b7d87ee Introduce NETMAP support in ENA
Mock implementation of NETMAP routines is located in ena_netmap.c/.h
files. All code is protected under the DEV_NETMAP macro. Makefile was
updated with files and flag.

As ENA driver provide own implementations of (un)likely it must be
undefined before including NETMAP headers.

ena_netmap_attach function is called on the end of NIC attach. It fills
structure with NIC configuration and callbacks. Then provides it to
netmap_attach. Similarly netmap_detach is called during ena_detach.

Three callbacks are used.
nm_register is implemented by ena_netmap_reg. It is called when user
space application open or close NIC in NETMAP mode. Current action is
recognized based on onoff parameter: true means on and false off. As
NICs rings need to be reconfigured ena_down and ena_up are reused.
When user space application wants to receive new packets from NIC
nm_rxsync is called, and when there are new packets ready for Tx
nm_txsync is called.

Differential Revision: https://reviews.freebsd.org/D21934
Submitted by: Rafal Kozik <rk@semihalf.com>
              Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-10-31 15:51:18 +00:00
Marcin Wojtas
38c7b96517 Split Rx/Tx from initialization code in ENA driver
Move Rx/Tx routines to separate file.
Some functions:
* ena_restore_device,
* ena_destroy_device,
* ena_up,
* ena_down,
* ena_refill_rx_bufs
could be reused in upcoming netmap code in the driver. To make it
possible, they were moved to ena.h header.

Differential Revision: https://reviews.freebsd.org/D21933
Submitted by:  Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-10-31 15:44:26 +00:00
Marcin Wojtas
2439228158 Fix ENA keep-alive timeout due to prolonged reset
When the ENA_FLAG_DEVICE_RUNNING flag is disabled, the AENQ handlers
aren't executed. To fix that, the watchdog timestamp should be updated
just before enabling the watchdog.

Timer service was always being enabled, even if the device wasn't up
before the reset. That shouldn't happen, as the timer service is being
executed only for working interface.

Differential Revision: https://reviews.freebsd.org/D21932
Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-10-31 15:39:54 +00:00
Marcin Wojtas
472d4784a8 Add WC support for arm64 in the ENA driver
As the pmamp_change_attr() is public on arm64 since r351131, it can be
used on the arm64 to map memory range as with the write combined
attribute.

It requires the driver to use generic VM_MEMATTR_WRITE_COMBINING flag
instead of the x86 specific PAT_WRITE_COMBINING.

Differential Revision: https://reviews.freebsd.org/D21931
Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-10-31 15:38:17 +00:00
Marcin Wojtas
9d0073e413 Update ENA version to v2.0.0
ENAv2 introduces many new features, bug fixes and improvements.

Main new features are LLQ (Low Latency Queues) and independent queues
reconfiguration using sysctl commands.

The year in copyright notice was updated to 2019.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:52:32 +00:00
Marcin Wojtas
858659f752 Improve ENA reset handling
For easier debugging, the reset is being triggered and the reset reason is
being set only in case it is done for the first time. Such approach will
ensure that the first reset reason is not going to be overwritten and
will make it easier for debugging.

Also, add a reset trigger upon invalid Tx requested ID.

Submitted by:  Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:45:41 +00:00
Marcin Wojtas
77958fcdab Fix NULL pointer dereference in ena_up()
If the call to ena_up() in ena_restore_device() fails, next usage of
`ifconfig up` will cause NULL pointer dereference.

This patch adds additional checks to prevent that.

Submitted by:  Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by:  Amazon, Inc.
2019-05-30 13:42:52 +00:00