Commit Graph

69 Commits

Author SHA1 Message Date
gallatin
7b2639cc55 Make mxge do a better job recovering from NIC h/w faults
by checking PCI config space when the NIC is not
transmitting.  Previously, a h/w fault would not have been
detected if the NIC was down, or handling an RX only
workload.
2009-10-20 18:58:28 +00:00
gallatin
8f9031d9fe Move mxge(4)'s NIC watchdog reset handler from
a callout to a taskqueue
2009-10-19 20:51:27 +00:00
gallatin
e020b62ba4 Two more mxge watchdog fixes:
1) Restore the PCI Express control register after a watchdog
   reset.  This is required because the device will come out
   of watchdog reset with the pectl reg at its default state,
   and important BIOS configuration (like max payload size)
   could be lost.

2) Call mxge_start_locked() for every tx queue before dropping
   the lock in the watchdog handler.   This is required, as
   the queue's buf ring may have filled during the reset.
2009-09-30 14:42:06 +00:00
gallatin
aeed8258a9 Improve mxge watchdog routine's ability to reliably reset a failed NIC:
- Mark the link as down, so if watchdog reset fails, link watching
    failover software can notice it
- Don't send MXGEFW_CMD_ETHERNET_DOWN if the NIC has been reset, it is
    not needed, and will fail on a freshly reset NIC.
- Ensure the transmit routines aren't attempting to PIO write to doorbells
    while the NIC is being reset.
- Download the correct f/w, rather than using the EEPROM f/w after reset.
- Export a count of the number of watchdog resets via sysctl
- Zero all f/w stats at reset.  This will lead to less confusing
    diagnostic output when investigating NIC failures.

MFC after:	3 days
2009-09-21 20:16:10 +00:00
gallatin
f5d9e7af24 Add support for throttling transmit bandwidth. This is most commonly
used to reduce packet loss on high delay (WAN) paths with a
slow link.
2009-09-21 14:41:07 +00:00
gallatin
315ba5eabf mxge's tunable hw.mxge.rss_hash_type cannot be set from the
loader, because it uses a reserved suffix (_type).  Fix
this by removing the "_" and renaming the tunable to
hw.mxge.rss_hashtype.  The old (rss_hash_type) tunable is
still fetched, in case people load the driver via scripts.
When both are present in the kernel environment,
the new value (hw.mxge.rss_hashtype) overrides the old
value.

Approved by:	re (kib)
2009-07-22 11:57:34 +00:00
rwatson
be5740a255 Use if_maddr_rlock()/if_maddr_runlock() rather than IF_ADDR_LOCK()/
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs.  This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.

For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.

Approved by:	re (kib)
MFC after:	6 weeks
2009-06-26 11:45:06 +00:00
gallatin
ce189363a0 Add a dying flag to prevent races at detach.
I tried re-ordering ether_ifdetach(), but this created a new race
where sometimes, when under heavy receive load (>1Mpps) and running
tcpdump, the machine would panic.  At panic, the ithread was still in
the original (not dead) if_input() path, and was accessing stale BPF
data structs.  By using a dying flag, I can close the interface prior
to if_detach() to be certain the interface cannot send packets up in
the middle of ether_ifdetach.
2009-06-24 21:09:56 +00:00
gallatin
87595b5020 Allow admin to specify the initial mtu upon driver load
for mxge.
2009-06-24 14:47:32 +00:00
gallatin
6f1a23c328 - Fix bug where device would loose promisc setting when reset.
- Allow all rss hash modes to be chosen
2009-06-23 20:22:34 +00:00
gallatin
7e640cb24d Revert most of 193311 so as to track mxge transmit stats
on a per-ring basis and avoid racy (and costly) updates
to the ifp stats via drbr by defining NO_SLOW_STATS

Discussed with: kmacy
2009-06-23 19:04:25 +00:00
gallatin
260369c166 Implement minimal set of changes suggested by bz to make
mxge no longer depend on INET.
2009-06-23 17:42:06 +00:00
gallatin
0ccb73a323 Buf-ring fixes for mxge
- always maintain byte/mcast/drop stats via drbr
- move #define of IFNET_BUF_RING so that its picked
  up by all files in the driver
- conditionalize IFNET_BUF_RING on the FreeBSD_version
  bump just after it appeared in the tree.

Sponsored by: Myricom Inc.
2009-06-02 16:52:33 +00:00
gallatin
427620ee06 Set an rx jumbo cluster to the correct size before
using bus_dmamap_load_mbuf_sg() on it. This
prevents data corruption when the mxge MTU is
between 4076 and 8172 on machines with 4KB
pages and MXGE_VIRT_JUMBOS is in use (which it
isn't, in -current or -stable)
2009-06-01 19:16:57 +00:00
gallatin
2e2b8cdbbc Updates to mxge for multiple tx/rx rings:
- Update mxge to use if_transmit(), and the new buf_ring
  interfaces, so as to enable multiple transmit queues.
  Use of if_transmit() is conditional on IFNET_BUF_RING,
  and is enabled by default (as in if_em).

- Record a flow id on receive if receive hashing is active.
  I currently only record the rx ring id (0..8) rather than
  the 32-bit topelitz hash result, as doing the latter would
  require shifting the driver to use a larger rx return ring.

Sponsored by:  Myricom, Inc.
2009-04-27 15:45:54 +00:00
gallatin
0df1aedf21 Fix cut/paste error in previous commit and use the
correct value for SFP+ reserved media type.

MFC after:  1 week
2009-02-17 22:25:19 +00:00
gallatin
6bae60e426 Better support for recent Myricom 10GbE NICs
- Update to firmware 1.4.39 for dual-chip NIC (10G-PCIE2-xxx)
  support, and SFP+ i2c support

- Identify newer "B" NICs (10G-PCIEx-8B-x) correctly, rather than
  mis-identifying them as "A" NICs (cosmetic only)

- Identify the IFM_10G_LRM ifmedia type, where applicable.

- Identify ifmedia types for SFP+ based NICs

- Update copyright

Sponsored by: Myricom
MFC after: 1 week
2009-02-17 22:15:58 +00:00
rdivacky
cc499f763e Remove obsolete C preprocessor assertions.
Approved by:    kib (mentor)
2009-02-12 18:33:56 +00:00
gallatin
187e163c24 Restore sfence semantics in mxge after the introduction
of a global mfence based mb() in r185162
2008-11-24 19:00:57 +00:00
gallatin
57b9f1fb86 Clean up mxge's use of callouts as pointed out by jhb,
and handle NIC hardware watchdog resets.

- remove buggy code at the top of mxge_tick() which tried
  to detect a race which is already detected in the kernel's
  callout code.

- move callout_stop() and callout_reset() into mxge_close()
  mxge_open() rather than doing the callout manipulation
  all over the place.

- use callout_drain(), rather than callout_stop() to prevent
  a potential race between mxge_tick() and mxge_detach()
  which could lead to softclock using a destroyed mutex

- restructure the mxge_tick() and mxge_watchdog_reset()
  routines to avoid resetting a callout, and then
  immediately stopping it if the watchdog reset routine
  is called, and fails.

- enable the driver to handle NIC hardware watchdog
  resets by restoring the NIC's PCI config space, which is
  lost when the NIC hardware watchdog triggers.

Reviewed by: jhb (previus version)
2008-07-17 15:46:35 +00:00
gallatin
f4c98f7455 Initialize if_baudrate using IF_Gbps() macro.
Note that if_baudrate is a long, and 32-bits isn't enough to properly
represent 10Gb/s.

Pointed out by: dwhite
2008-04-02 13:59:43 +00:00
gallatin
d574f1c4f5 Remove dead code which makes a call to mem_range_attr_set().
This fixes a bug where mxge did not declare a dependancy on
mem(4), and failed to load with options nomem.

Pointed out by: antoine
2008-03-12 15:36:00 +00:00
gallatin
0e60a1b33c Now that mxge supports MSI-X interrupts, reverse the logic and flag
legacy interrupts rather than MSI as a special case.  Prior to this
commit, the interrupt handler was doing the slow handshaking with
the device to ensure the legacy interrupt was lowered in both
the legacy and MSI-X case.  This handshaking was not
required for MSI-X.
2008-02-14 16:24:14 +00:00
gallatin
960266ca30 Add minimally invasive shims to ease MFCs of mxge back as far
as RELENG_6

Sponsored by: Myricom, Inc.
2008-02-14 00:09:59 +00:00
gallatin
98cf716012 Only reset driver state when a hardware error is detected.
Preserve warning but do not reset if we enter the routine
without seeing a hardware error.
2008-01-28 13:20:51 +00:00
gallatin
7bafc7ac86 Take advantage of the new physically contiguous 9K jumbos in 8. 2008-01-22 22:04:31 +00:00
gallatin
0957a8f3c9 Add optional support to mxge for MSI-X interrupts and multiple receive
queues (which we call slices).  The NIC will steer traffic into up to
hw.mxge.max_slices different receive rings based on a configurable
hash type (hw.mxge.rss_hash_type).

Currently the driver defaults to using a single slice, so the default
behavior is unchanged.  Also, transmit from non-zero slices is
disabled currently.
2008-01-15 20:34:49 +00:00
gallatin
c169939914 Add support for a new device id (9). Mxge NICs with the new
device id support MSI-X.

Approved by: re (bmah)
2007-09-13 21:29:02 +00:00
gallatin
51a89ea67c - Fix a bug which could cause a panic when enabling LRO
on an down mxge interface
- Fix a bug where mxge reported the link state as
   active when it wasn't (after ifconfig down).
- Prevent spurious watchdog resets when link partner is not consuming
- Add support for CX4 and popular XFP media detection
- Update the firmware and associated header files to 1.4.25

Approved by: re (kensmith)
2007-08-22 13:22:12 +00:00
gallatin
6c7e29b98e - Enable static building of mxge(4) and its firmware.
- Add custom .c wrappers for the firmware, rather than the standard
  firmware(9) generated firmware objects to work around toolchain
  problems on ia64 involving linking objects produced by
  ld -b -binary into the kernel.

- Move from using Myricom's ".dat" firmware blobs to using Myricom's
  zlib compressed ".h" firmware header files.  This is done to
  facilitate the custom wrappers, and saves a fair amount of wired
  memory in the case where the firmware is built in, or preloaded.

- Fix two compile issues in mxge which only appear on non-i386/amd64.

Reviewed by: mlaier, mav (earlier version with just zlib support)
Glanced at by: sam
Approved by: re (kensmith)
2007-07-19 16:16:00 +00:00
gallatin
46995e966f Update the mxge(4) driver's copyright to 2007, and drop
the binary distribution clause.

Approved by: re (bmah)
2007-07-12 16:04:55 +00:00
gallatin
8b5aa76c01 Also mark writecombine as enabled when PAT is used to enable
it rather than MTRRs.
2007-06-17 00:09:51 +00:00
gallatin
18afb27e89 correct some limits on interrupt proccessing so that
fast forwarding back out the same mxge interface works nicely.
2007-06-14 19:35:03 +00:00
gallatin
ac79d73e56 Use the new IFCAP_LRO to enable/disable LRO. 2007-06-12 19:15:16 +00:00
gallatin
ee7a5f8c9a Small LRO related fixes for mxge:
- Allow LRO to be enabled / disabled at runtime
- Fix a double-free at module unload time.
- Only update timestamp in lro merge when it is present in the frame
Sponsored by: Myricom
2007-06-11 14:01:10 +00:00
gallatin
e2b0046ba1 Use pmap_change_attr() to setup a write combine attribute for our
device memory, rather than relying on the less reliable MTRR method
used by mem_range_attr_set().

Glanced at by: jhb
2007-06-05 15:02:14 +00:00
gallatin
4395bbf72e - Use m_getcl() rather than m_getjcl() when we're allocating 2KB
clusters.  This helps quite a bit on my low end machines (improves
performance by about 300Kpps when being blasted by a hardware
packet generator).
- Include one extended f/w counter forgotten in earlier commit

Sponsored by: Myricom Inc.
2007-05-25 19:38:32 +00:00
gallatin
649ffbffeb Add support for "hardware" vlan tag insertion & removal emulation
in the mxge driver so as to be able to do checksum offload
on vlans.  This is good enough to achieve 10GbE line rate on vlans.
2007-05-23 16:25:40 +00:00
gallatin
eeec783554 mxge cleanups:
- Remove code to use the special wc_fifo.  It has been disabled by default
  in our other drivers as it actually slows down transmit by a small amount

- Dynamically determine the amount of space required for the rx_done
  ring rather than hardcoding it.

- Compute the number of tx descriptors we are willing to transmit per
  frame as the minimum of 128 or 1/4 the tx ring size.

- Fix a typo in the tx dma tag setup which could lead to unnecessary
  defragging of TSO packets (and potentially even dropping TSO packets
  due to EFBIG being returned).

- Add a counter to keep track of how many times we've needed to
  defragment a frame.  It should always be zero.

- Export new extended f/w counters via sysctl

Sponsored by: Myricom, Inc.
2007-05-22 15:57:49 +00:00
gallatin
03b25c7049 Improve mxge receive performance:
- Update to the latest (1.4.18) f/w.  This f/w introduces a new
  receive mode which allows us to use FreeBSD's physically discontinuous
  MJUM9BYTES clusters.

- Switch the driver from chaining MJUMPAGESIZE clusters to using
  MJUM9BYTES clusters to avoid mbuf chaining overheads.  Due to this
  change, people running obsolete f/w images will be limited to an MTU of
  PAGE_SIZE - 16.

- Add (disabled by default) support for Large Receive Offload.

Sponsored by: Myricom, Inc.
2007-05-21 18:32:27 +00:00
gallatin
d2dcd8a3d5 - Add handling of MXGEFW_CMD_UNKNOWN in mxge_send_cmd().
- Convert mxge_send_cmd result handling to a switch rather
  than adding a new elseif for MXGEFW_CMD_UNKNOWN

Sponsored by: Myricom Inc.
2007-05-08 18:45:43 +00:00
gallatin
7a83edd79b Firmware update & improvements to firmware selection:
- Update to latest (1.4.17) firmware.

- Use the new MXGEFW_CMD_UNALIGNED_TEST (added in firmare 1.4.16) to
  have the firmware tell us if the PCIe chipset supports aligned PCIe
  completions.

- Hard to maintain, and frequently out of date whitelist of PCIe
  chipsets known to produce aligned completions removed, as it has been
  replaced in its role of selecting the correct firmware to run by the
  use of MXGEFW_CMD_UNALIGNED_TEST.

- Break the dma test out of mxge_reset() and into its own function
  (mxge_dma_test()) so it can be used by both the normal DMA test, and
  to run the unaligned test.

- Improved support for enabling ECRCs

Sponsored by: Myricom Inc.
2007-05-08 14:19:43 +00:00
gallatin
8b90b3b132 -Fix an mbuf leak caused by a cut&paste bug where the small ring's mbufs
were never freed, but the big ring was freed twice.
-Don't supply rx hw csums for frames which are padded beyond the
 length specified in the ip header.  If the padding is non-zero,
 the hw csum will be incorrect for such frames.

Sponsored by: Myricom
2007-04-27 13:11:50 +00:00
gallatin
c277843c22 - Fix a bug in the TSO transmit routine where frames which had
been defragged and had their headers in the same cluster as their
payload would be fed to the NIC in header-sized chunks, and would
likely exceed the number of available transmit descriptors.

- If a TSO frame exceeds the number of available transmit descriptors,
don't leak busdmma resources when freeing it.

Sponsored by: Myricom Inc.
2007-04-03 10:41:33 +00:00
jhb
b0b93a3c55 Optimize sx locks to use simple atomic operations for the common cases of
obtaining and releasing shared and exclusive locks.  The algorithms for
manipulating the lock cookie are very similar to that rwlocks.  This patch
also adds support for exclusive locks using the same algorithm as mutexes.

A new sx_init_flags() function has been added so that optional flags can be
specified to alter a given locks behavior.  The flags include SX_DUPOK,
SX_NOWITNESS, SX_NOPROFILE, and SX_QUITE which are all identical in nature
to the similar flags for mutexes.

Adaptive spinning on select locks may be enabled by enabling the
ADAPTIVE_SX kernel option.  Only locks initialized with the SX_ADAPTIVESPIN
flag via sx_init_flags() will adaptively spin.

The common cases for sx_slock(), sx_sunlock(), sx_xlock(), and sx_xunlock()
are now performed inline in non-debug kernels.  As a result, <sys/sx.h> now
requires <sys/lock.h> to be included prior to <sys/sx.h>.

The new kernel option SX_NOINLINE can be used to disable the aforementioned
inlining in non-debug kernels.

The size of struct sx has changed, so the kernel ABI is probably greatly
disturbed.

MFC after:	1 month
Submitted by:	attilio
Tested by:	kris, pjd
2007-03-31 23:23:42 +00:00
gallatin
a8be3c5db7 Fix a bug which could lead to receive side lockup when WC is disabled.
When submitting rx buffers and not using WC fifo, always replace the
invalid DMA address with the real one, otherwise allocation failures
could lead to the invalid DMA address being given to the NIC, and
that would cause the receive side to lockup.
2007-03-27 15:55:32 +00:00
piso
6a2ffa86e5 o break newbus api: add a new argument of type driver_filter_t to
bus_setup_intr()

o add an int return code to all fast handlers

o retire INTR_FAST/IH_FAST

For more info: http://docs.freebsd.org/cgi/getmsg.cgi?fetch=465712+0+current/freebsd-current

Reviewed by: many
Approved by: re@
2007-02-23 12:19:07 +00:00
gallatin
a5cba253f7 Work around a firmware bug where broadcast frames would be incorrectly
treated as multicast frames and filtered, but when only when "adopting"
running firmware.  By "adopting", I mean using pre-existing firmware
loaded from eeprom at PCI reset, rather than firmware loaded by the
driver.
2007-02-21 17:34:05 +00:00
luigi
bc574e3db5 Cleanup and document the implementation of firmware(9) based on
a version that i posted earlier on the -current mailing list,
and subsequent feedback received.

The core of the change is just in sys/firmware.h and kern/subr_firmware.c,
while other files are just adaptation of the clients to the ABI change
(const-ification of some parameters and hiding of internal info,
so this is fully compatible at the binary level).

In detail:
- reduce the amount of information exported to clients in struct firmware,
  and constify the pointer;

- internally, document and simplify the implementation of the various
  functions, and make sure error conditions are dealt with properly.

The diffs are large, but the code is really straightforward now (i hope).

Note also that there is a subtle issue with the implementation of
firmware_register(): currently, as in the previous version, we just
store a reference to the 'imagename' argument, but we should rather
copy it because there is no guarantee that this is a static string.
I realised this while testing this code, but i prefer to fix it in
a later commit -- there is no regression with respect to the past.

Note, too, that the version in RELENG_6 has various bugs including
missing locks around the module release calls, mishandling of modules
loaded by /boot/loader, and so on, so an MFC is absolutely necessary
there.  I was just postponing it until this cleanup to avoid doing
things twice.

MFC after: 1 week
2007-02-15 17:21:31 +00:00
gallatin
dd7403e0de - Add 99% of a callout based watchdog. The remaining 1% is waiting
for pci_cfg_restore() to be exported.  It was tested using a
  hackily accessed pci_cfg_restore().

- Add ifmedia_removeall() to mxge_detach() in order to stop leaking
  an ifaddr

- Fix a small acounting bug introduced by the locking code shuffle
  which could cause spurious watchdog resets now that we have a
  watchdog.

Sponsored by: Myricom
2007-01-31 19:53:36 +00:00