yongari d19de5111b bus_dma(9) conversion and make txp(4) work on all architectures.
o Header file cleanup.
o bus_dma(9) conversion.
  - Removed all consumers of vtophys(9) and converted to use
    bus_dma(9).
  - Typhoon2 functional specification says the controller supports
    64bit DMA addressing. However all Typhoon controllers are known
    to lack of DAC support so 64bit DMA support was disabled.
  - The hardware can't handle more 16 fragmented Tx DMA segments so
    teach txp(4) to collapse these segments to be less than 16.
  - Added Rx buffer alignment requirements(4 bytes alignment) and
    implemented fixup code to align receive frame. Previously
    txp(4) always copied Rx frame to align it on 2 byte boundary
    but its copy overhead is much higher than unaligned access on
    i386/amd64. Alignment fixup code is now applied only for
    strict-alignment architectures. With this change i386 and
    amd64 will get instant Rx performance boost. Typhoon2 datasheet
    mentions a command that pads arbitrary bytes in Rx buffer but
    that command does not work.
  - Nuked pointer trick in descriptor ring. This does not work on
    sparc64 and replaced it with bcopy. Alternatively txp(4) can
    embed a 32 bits index value into the descriptor and compute
    real buffer address but it may make code complicated.
  - Added endianness support code in various Tx/Rx/command/response
    descriptor access. With this change txp(4) should work on all
    architectures.
o Added comments for known firmware bugs(Tx checksum offloading,
  TSO, VLAN stripping and Rx buffer padding control).
o Prefer faster memory space register access to I/O space access.
  Added fall-back mechanism to use alternative I/O space access.
  The hardware supports both memory and I/O mapped access. Users
  can still force to use old I/O space access by setting
  hw.txp.prefer_iomap tunable to 1 in /boot/loader.conf.
o Added experimental suspend/resume methods.
o Nuke error prone Rx buffer handling code and implemented local
  buffer management with TAILQ. Be definition the controller can't
  pass the last received frame to host if no Rx free buffers are
  available to use as head and tail pointer of Rx descriptor ring
  can't have the same value. In that case the Rx buffer pointer in
  Rx buffer ring still holds a valid buffer and txp_rxbuf_reclaim()
  can't fill Rx buffers as the first buffer is still valid. Instead
  of relying on the value of Rx buffer ring, introduce local buffer
  management code to handle empty buffer situation. This should fix
  a long standing bug which completely hangs the controller under
  high network load. I could easily trigger the issue by sending 64
  bytes UDP frames with netperf. I have no idea how this bugs was
  not fixed for a long time.
o Converted ithread interrupt handler to filter based one.
o Rearranged txp_detach routine such that it's now used for general
  clean-up routine.
o Show sleep image version on device attach time. This will help
  to know what action should be taken depending on sleep image
  version. The version information in datasheet was wrong for newer
  NV images so I followed Linux which seems to correctly extract
  version numbers from response descriptors.
o Firmware image is no longer downloaded in device attach time. Now
  it is reloaded whenever if_init is invoked. This is to ensure
  correct operation of hardware when something goes wrong.
  Previously the controller always run without regard to running
  state of firmware. This change will add additional controller
  initialization time but it give more robust operation as txp(4)
  always start off from a known state. The controller is put into
  sleep state until administrator explicitly up the interface.
o As firmware is loaded in if_init handler, it's now possible to
  implement real watchdog timeout handler. When watchdog timer is
  expired, full-reset the controller and initialize the hardware
  again as most other drivers do. While I'm here use our own timer
  for watchdog instead of using if_watchdog/if_timer interface.
o Instead of masking specific interrupts with TXP_IMR register,
  program TXP_IER register with the interrupts to be raised and
  use TXP_IMR to toggle interrupt generation.
o Implemented txp_wait() to wait a specific state of a controller.
o Separate boot related code from txp_download_fw() and name it
  txp_boot() to handle boot process.
o Added bus_barrier(9) to host to ARM communication.
o Added endianness to all typhoon command processing. The ARM93C
  always expects little-endian format of command/data.
o Removed __STRICT_ALIGNMENT which is not valid on FreeBSD.
  __NO_STRICT_ALIGNMENT is provided for that purpose on FreeBSD.
  Previously __STRICT_ALIGNMENT was unconditionally defined for
  all architectures.
o Rewrote SIOCSIFCAP ioctl handler such that each capability can be
  controlled by ifconfig(8). Note, disabling VLAN hardware tagging
  has no effect due to the bug of firmware.
o Don't send TXP_CMD_CLEAR_STATISTICS to clear MAC statistics in
  txp_tick(). The command is not atomic. Instead, just read the
  statistics and reflect saved statistics to the statistics.
  dev.txp.%d.stats sysctl node provides detailed MAC statistics.
  This also reduces a lot of waste of CPU cycles as processing a
  command ring takes a very long time on ARM93C. Note, Rx
  multicast and broadcast statistics does not seem to right. It
  might be another bug of firmware.
o Implemented link state change handling in txp_tick(). Now sending
  packets is allowed only after establishing a valid link. Also
  invoke link state change notification whenever its state is
  changed so pseudo drivers like lagg(4) that relies on link state
  can work with failover or link aggregation without hacks.
  if_baudrate is updated to resolved speed so SNMP agents can get
  correct bandwidth parameters.
o Overhauled Tx routine such that it now honors number of allowable
  DMA segments and checks for 4 free descriptors before trying to
  send a frame. A frame may require  4 descriptors(1 frame
  descriptor, 1 or more frame descriptors, 1 TSO option descriptor,
  one free descriptor to prevent descriptor wrap-around) at least
  so it's necessary to check available free descriptors prior to
  setting up DMA operation.
o Added a sysctl variable dev.txp.%d.process_limit to control
  how many received frames should be served in Rx handler. Valid
  ranges are 16 to 128(default 64) in unit of frames.
o Added ALTQ(4) support.
o Added missing IFCAP_VLAN_HWCSUM as txp(4) can offload checksum
  calculation as well as VLAN tag insertion/stripping.
o Fixed media header length for VLAN.
o Don't set if_mtu in device attach, it's already set in
  ether_ifattach().
o Enabled MWI.
o Fixed module unload panic when bpf listeners are active.
o Rearranged ethernet address programming logic such that it works
   on strict-alignment architectures.
o Removed unused member variables in softc.
o Added support for WOL.
o Removed now unused TXP_PCI_LOMEM/TXP_PCI_LOIO.
o Added wakeup command TXP_BOOTCMD_WAKEUP definition.
o Added a new firmware version query command, TXP_CMD_READ_VERSION.
o Removed volatile keyword in softc as bus_dmamap_sync(9) should
  take care of this.
o Removed embedded union trick of a structure used to to access
  a pointer on LP64 systems.
o Added a few TSO related definitions for struct txp_tcpseg_desc.
  However TSO is not used at all due to the limitation of hardware.
o Redefined PKT_MAX_PKTLEN to theoretical maximum size of a frame.
o Switched from bus_space_{read|write}_4 to bus_{read|write}_4.
o Added a new macro TXP_DESC_INC to compute next descriptor index.

Tested by:	don.nasco <> gmail dot com
2009-03-12 01:14:47 +00:00
..