It allows application driver get initial link state without racing with
hardware interrupts, thanks to the context rmlock held here.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Those events may be reported as soon as callback is registered, if the link
is enabled by hardware or some other application.
While there, clean link_is_up variable on link down event.
MFC after: 1 week
This driver supports both NTB-to-NTB and NTB-to-Root Port modes (though
the second with predictable complications on hot-plug and reboot events).
I tested it with PEX 8717 and PEX 8733 chips, but expect it should work
with many other compatible ones too. It supports up to two NT bridges
per chip, each of which can have up to 2 64-bit or 4 32-bit memory windows,
6 or 12 scratchpad registers and 16 doorbells. There are also 4 DMA engines
in those chips, but they are not yet supported.
While there, rename Intel NTB driver from generic ntb_hw(4) to more specific
ntb_hw_intel(4), so now it is on par with this new ntb_hw_plx(4) driver and
alike to Linux naming.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
This fixes interrupt storms on hardware using legacy level-triggered
interrupts, since doorbell processing could take time after interrupt
handler completion, that triggered extra interrupts in a loop.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Since the doorbell bit is already set when interrupt handler is called,
the event was not propagated to upper layer. It was working normally
because present code was not using masking actively, but that is going
to change.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
I believe it never worked correctly for more the one queue even in Linux.
This fixes case when one of consumer drivers is not loaded on one side,
but its queues still announced as ready if something else brought link up.
While there, remove some pointless NULL checks.
New design allows to attach multiple consumers to ntb_transport(4) instance.
Previous design obtained from Linux theoretically allowed that, but was not
practically usable (Linux also has only one consumer driver now).
New design allows hardware resources to be split between several consumers.
For example, one BAR can be dedicated for remote memory access, while other
resources can be used for packet transport for virtual Ethernet interface.
And even without resource split, this code allows to specify which consumer
driver should attach the hardware.
From some points this makes the code even closer to Linux one, even though
Linux does not provide the described flexibility.
Calling it earlier increases the window when MSIX info may change.
This change does not solve the problem completely, but seems logical.
Complete solution should probably include link reset in case of MSIX
remap to trigger new negotiation, but we have no way to get notified
about that now.
I don't know what errata is mentioned there, I was unable to find it, but
setting limit before the base simply does not work at all. According to
specification attempt to set limit out of the present window range resets
it to zero, effectively disabling it. And that is what I see in practice.
Fixing this properly disables access for remote side to our memory until
respective xlat is negotiated and set. As I see, Linux does the same.
At that point link is quite likely not established yet, so messing with
scratch registers is premature there. Original commit message mentioned
code diff reduction from Linux, but this line is not present in Linux now.
For some reason hack with sending MSI-X interrupts by writing to remote
LAPIC memory works only for 32-bit BARs, that are available only if split
BARs mode is enabled in BIOS. If it is not, complain loudly and fall back
to less efficient workaround.
For compatibility reasons make driver not report any checksum offload by
default, since there is indeed none. But if administrator knows that
interface is used only for local traffic, he can enable fake checksum
offload manually on both sides to save some CPU cycles, since the data
are already protected by CRC32 of PCIe link.
Sponsored by: iXsystems, Inc.
This allows at least first three doorbells to work very close to normal
hardware, properly signaling events to upper layers without spurious or
lost events. Doorbells above the first three may still report spurious
events due to lack of reliable information, but they are rarely used.
It is odd idea to serialize different MSI-X vectors. Use of rmlocks
here allows them to execute in parallel, but still protects ctx.
If upper layers require any additional serialization -- they can
do it by themselves.
This follows NTB subsystem modularization in Linux, tuning it to FreeBSD
native NewBus interfaces. This change allows to support different types
of hardware with different drivers, support multiple NTB instances in a
system, ntb_transport module use for needs other then if_ntb, etc.
Sponsored by: iXsystems, Inc.
Since SBARxSZ register can be write-once, it can be unusable for disabling
the SBAR. For such case also set SBARxBASE to zero to not intersect with
config BAR.
This allows IPv6 link local addresses (and other IPv6 functionality) to work.
PR: 210355
Submitted by: Steve Wahl and David Bright (both at Dell Inc.)
Reviewed by: cem, mav
Tested by: mav (on Intel hardware)
Approved by: re (kib)
MFC after: 5 days
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D6885
BAR size to 1MB. According to Xeon v3 specifications and my tests, that
size register is write-once and so not writeable after BIOS written it.
Instead of that, make the code work with BAR of any sufficient size,
properly calculating offset within its base. It also simplifies the code.
Discussed with: cem
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
NTB_MSIX_RECEIVED status, before making upper layers overwrite it.
This is not completely perfect, but now it works better then before.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
This patch comes from Dave Jiang's Linux tree, davejiang/ntb. It hasn't
been accepted into Linus' tree, so I do not have an authoritative SHA1
to point at. Original commit log:
=====================================================================
A hardware errata causes the NTB to hang when heavy bi-directional
traffic in addition to the usage of BAR0/1 (where the registers reside,
including the doorbell registers to trigger interrupts).
This workaround is only available on Haswell and Broadwell platform.
The workaround is to enable split BAR in the BIOS to allow the 64bit
BAR4 to be split into two 32bit BAR4 and BAR5. The BAR4 shall be pointed
to LAPIC region of the remote host. We will bypass the db mechanism and
directly trigger the MSIX interrupts. The offsets and vectors are
exchanged during transport scratch pad negotiation. The scratch pads are
now overloaded in order to allow the exchange of the information. This
gets around using the doorbell and prevents the lockup with additional
pcode changes in BIOS.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
=====================================================================
Notable changes in the FreeBSD version of this patch:
* The MSIX BAR is configurable, like hw.ntb.b2b_mw_idx (msix_mw_idx).
The Linux version of the patch only uses BAR4.
* MSIX negotiation aborts if the link goes down.
Obtained from: Linux (Dual BSD/GPL driver)
Sponsored by: EMC / Isilon Storage Division
Replace the hw.ntb.enable_writecombine tunable with
hw.ntb.default_mw_pat. It can be set with several specific numerical
values to select a caching type. Any bogus value is treated as
Uncacheable (UC).
The ntb_mw_set_wc() KPI has removed the restriction that the selected
mode must be one of UC, WC, or WB.
Sponsored by: EMC / Isilon Storage Division