Sometimes, especially when there is not much memory in the system left,
allocating mbuf jumbo clusters (like 9KB or 16KB) can take a lot of time
and it is not guaranteed that it'll succeed. In that situation, the
fallback will work, but if the refill needs to take a place for a lot of
descriptors at once, the time spent in m_getjcl looking for memory can
cause system unresponsiveness due to high priority of the Rx task. This
can also lead to driver reset, because Tx cleanup routine is being
blocked and timer service could detect that Tx packets aren't cleaned
up. The reset routine can further create another unresponsiveness - Rx
rings are being refilled there, so m_getjcl will again burn the CPU.
This was causing NVMe driver timeouts and resets, because network driver
is having higher priority.
Instead of 16KB jumbo clusters for the Rx buffers, 9KB clusters are
enough - ENA MTU is being set to 9K anyway, so it's very unlikely that
more space than 9KB will be needed.
However, 9KB jumbo clusters can still cause issues, so by default the
page size mbuf cluster will be used for the Rx descriptors. This can have a
small (~2%) impact on the throughput of the device, so to restore
original behavior, one must change sysctl "hw.ena.enable_9k_mbufs" to
"1" in "/boot/loader.conf" file.
As a part of this patch (important fix), the version of the driver
was updated to v2.1.2.
Submitted by: cperciva
Reviewed by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: Ido Segev <idose@amazon.com>
Reviewed by: Guy Tzalik <gtzalik@amazon.com>
MFC after: 3 days
PR: 225791, 234838, 235856, 236989, 243531
Differential Revision: https://reviews.freebsd.org/D24546