275 Commits

Author SHA1 Message Date
Pyun YongHyeon
beaa2ae169 Create sysctl node(dev.bge.%d.forced_collapse) instead of
hw.bge.forced_collapse. hw.bge.forced_collapse affects all bge(4)
controllers on system which may not be desirable behavior of the
sysctl node. Also allow the sysctl node to be modified at any
time.

Reviewed by:	bde (initial version)
2009-12-08 17:54:23 +00:00
Pyun YongHyeon
9766cbd144 Partially revert r200228. For mini RCB case, bge(4) still has to
disable mini ring without regard to mini ring support.

Reported by:	marcel
Tested by:	marcel
2009-12-08 03:24:29 +00:00
Pyun YongHyeon
2a141b9412 Don't access jumbo frame related registers if controller lacks the
feature. These registers are reserved on controllers that have no
support for jumbo frame.
Only BCM5700 has mini ring so do not poke mini ring related
registers if controller is not BCM5700.

Reviewed by:	marius
2009-12-07 19:26:54 +00:00
Pyun YongHyeon
6fe124d275 Remove PHY isolate/power down code in bge_stop(). The isolation
handler in brgphy(4) does not exist and brgphy(4) just resets the
PHY and returns EINVAL as it has no isolation handler. I also agree
on Marius's opinion that stop handler of every NIC driver seems to
be the wrong place for implementing PHY isolate/power down.
If we need PHY isolate/power down it should be implemented in
brgphy(4) and users should administratively down the PHY.

Reviewed by:	marius
2009-12-07 19:18:23 +00:00
Pyun YongHyeon
d94f2b8506 Add workaround to overcome hardware limitation which allows only a
single outstanding DMA read operation. Most controllers targeted to
client with PCIe bus interface(e.g. BCM5761) may have this
limitation. Controllers targeted at servers do not have this
limitation.
Collapsing mbuf chains to reduce the number of memory reads before
transmitting was the most effective way to work around this. I got about
940Mbps from 850Mbps with mbuf collapsing on BCM5761. However it
takes a lot of CPU cycles to collapse mbuf chains so add tunable to
control the number of allowed TX buffers before collapsing. The
default value is 0 which effectively disables the forced collapsing.
For most cases 2 would yield best performance(about 930Mbps)
without much sacrificing CPU cycles.
Note the collapsing is only activated when the controller is on
PCIe bus and the frame does not need TSO operation. TSO does not
seem to suffer from the hardware limitation because the payload
size is much bigger than normal IP datagram.
Thanks to davidch@ who told me the limitation of client controllers
and actually gave possible workarounds to mitigate the limitation.

Reviewed by:	davidch, marius
2009-12-03 23:57:06 +00:00
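
The collapse policy described in d94f2b8506 can be sketched roughly as below. This is an illustration only, not the driver's code: struct tx_policy and should_collapse() are hypothetical names, and treating the tunable as a simple "more buffers than the threshold" test is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device state the commit describes. */
struct tx_policy {
	bool is_pcie;          /* controller sits on a PCIe bus */
	int  forced_collapse;  /* tunable; 0 disables forced collapsing */
};

/*
 * Decide whether an mbuf chain made of `nsegs` buffers should be
 * collapsed before transmit. Assumption: collapse only when the chain
 * has more buffers than the tunable allows.
 */
static bool
should_collapse(const struct tx_policy *p, int nsegs, bool needs_tso)
{
	assert(p != NULL && nsegs >= 0);

	if (p->forced_collapse <= 0)
		return false;           /* default: feature disabled */
	if (!p->is_pcie || needs_tso)
		return false;           /* only PCIe, and never for TSO frames */
	return nsegs > p->forced_collapse;
}
```

With the tunable set to 2 on a PCIe controller, a five-buffer non-TSO chain would be collapsed while a two-buffer chain would be sent as is.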
Pyun YongHyeon
6a15578d8a Fix typo which inverted the logic, which in turn disabled MSI.
Pointy hat to:  yongari
2009-11-25 17:51:14 +00:00
Pyun YongHyeon
7e6acdf12b Make sure one shot MSI is enabled.
Submitted by:	marius
2009-11-25 17:30:38 +00:00
Pyun YongHyeon
fd4d32feb2 BGE_FLAG_40BIT_BUG should be set before creating DMA tags.
Pointy hat to:  yongari
2009-11-24 17:46:58 +00:00
Pyun YongHyeon
30f57f615b Reduce status block size DMAed by controller. bge(4) uses single
Tx/Rx/Rx return ring such that large part of status block was not
used at all. All bge(4) controllers except BCM5700 AX/BX have a
feature to control the size of status block. So use minimum status
block size allowed in controller. This reduces the size of the DMAed
status block from 80 bytes to 32 bytes.
2009-11-22 21:45:55 +00:00
Pyun YongHyeon
2e1d4df419 Add missing function prototype in r199671. 2009-11-22 21:20:26 +00:00
Pyun YongHyeon
ca3f1187f1 Implement TSO for BCM5755 or newer controllers. Some controllers
seem to require a special firmware to use TSO. But the firmware is
not available to FreeBSD and Linux claims that the TSO performed by
the firmware is slower than hardware based TSO. Moreover the
firmware based TSO has one known bug which can't handle TSO if
ethernet header + IP/TCP header is greater than 80 bytes. The
workaround for the TSO bug exists but it seems to be more expensive
than not using TSO at all. Some hardware also has the TSO bug so
limit the TSO to the controllers that are not affected by TSO issues
(e.g. 5755 or higher).
While I'm here set VLAN tag bit on all descriptors that belong to
a frame instead of only the first descriptor of a frame. The datasheet
is not clear how to handle VLAN tag bit but it worked either way in
my testing. This simplifies TSO configuration a little bit.

Big thanks to davidch@ who sent me detailed TSO information.
Without this I was not able to implement it.

Tested by:	current
2009-11-22 21:16:30 +00:00
Pyun YongHyeon
f681b29a6d Fix two long standing bugs on bge(4). Most pre BCM5755 controllers
have a DMA bug when buffer address crosses a multiple of the 4GB
boundary(e.g. 4GB, 8GB, 12GB etc). Limit DMA address to be within
4GB address for these controllers. The second DMA bug limits DMA
address to be within 40bit address space. This bug applies to
BCM5714 and BCM5715 and 5708(bce(4) controller). This is not
actually a MAC controller bug but an issue with the embedded PCIe
to PCI-X bridge in the device. So for BCM5714/BCM5715 controllers
also limit the DMA address to be within 40bit address space.
Special thanks to davidch@ who gave me detailed errata information.
I think this change will fix long standing bge(4) instability
issues on systems with more than 4GB memory.

Reviewed by:	davidch
2009-11-22 20:50:27 +00:00
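
The two address constraints named in f681b29a6d can be stated as simple range checks. The sketch below is only an illustration of the arithmetic; crosses_4gb() and fits_40bit() are hypothetical helpers, and the driver itself expresses such limits through its DMA tags rather than by testing each buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BOUNDARY_4GB (UINT64_C(1) << 32)
#define LIMIT_40BIT  (UINT64_C(1) << 40)

/* True if the buffer [addr, addr + len) spans a multiple of 4GB. */
static bool
crosses_4gb(uint64_t addr, uint64_t len)
{
	assert(len > 0);
	/* Compare the 4GB "page" of the first and the last byte. */
	return (addr / BOUNDARY_4GB) != ((addr + len - 1) / BOUNDARY_4GB);
}

/* True if every byte of the buffer is addressable with 40 bits. */
static bool
fits_40bit(uint64_t addr, uint64_t len)
{
	assert(len > 0);
	return addr < LIMIT_40BIT && len <= LIMIT_40BIT - addr;
}
```

A buffer that starts 4KB below the 4GB mark and is 8KB long crosses the boundary; the same buffer at 4KB length ends exactly at the last byte below 4GB and does not.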
Pyun YongHyeon
dfe0df9a76 For MSI case, interrupt is not shared and we don't need to force
PCI flush to get correct status block update. Add an optimized
interrupt handler that is activated for MSI case. Actual interrupt
handling is done by taskqueue such that the handler does not
require driver lock for Rx path. The MSI capable bge(4) controllers
automatically disable further interrupts once they enter interrupt
state so we don't need PIO access to disable interrupt in interrupt
handler.
2009-11-22 20:31:40 +00:00
Pyun YongHyeon
b9c05fa593 Cache Rx producer/Tx consumer index as soon as we know status block
update and then clear status block. Previously it used to access
these indices without synchronization which may cause problems when
bounce buffers are used. Also add missing bus_dmamap_sync(9) in
polling handler. Since we now update status block in driver, adjust
bus_dmamap_sync(9) for status block.
2009-11-22 20:02:13 +00:00
Pyun YongHyeon
167fdb62e3 Rearrange bge_start_locked to see whether we can send more frames by
checking IFF_DRV_RUNNING and IFF_DRV_OACTIVE flags. Also if we
have less than 16 free send BDs set IFF_DRV_OACTIVE and try it
later. Previously bge(4) used to reserve 16 free send BDs after
loading dma maps but hardware just needs one reserved send BD. If
producer index has the same value as consumer index it means the Tx
queue is empty.
While I'm here check IFQ_DRV_IS_EMPTY first to save one lock
operation.
2009-11-22 19:44:11 +00:00
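
The producer/consumer rule mentioned in 167fdb62e3 (equal indices mean an empty Tx queue, so one send BD stays reserved) can be sketched as follows. RING_CNT and the helper names are assumptions chosen for the example, not values taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CNT 512u	/* hypothetical ring size, must be a power of two */

/* Equal indices mean nothing is queued. */
static bool
ring_empty(uint32_t prod, uint32_t cons)
{
	return prod == cons;
}

/* Number of descriptors queued, accounting for index wraparound. */
static uint32_t
ring_used(uint32_t prod, uint32_t cons)
{
	assert(prod < RING_CNT && cons < RING_CNT);
	return (prod - cons) & (RING_CNT - 1);
}

/*
 * One descriptor is kept in reserve so that a completely full ring is
 * never mistaken for an empty one (prod == cons).
 */
static uint32_t
ring_free(uint32_t prod, uint32_t cons)
{
	return RING_CNT - 1 - ring_used(prod, cons);
}
```

Because one slot is always held back, at most RING_CNT - 1 descriptors can be in flight at any time.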
Pyun YongHyeon
d77e9fa7be Controller does not write Rx descriptors, remove BUS_DMASYNC_PREREAD. 2009-11-22 19:17:32 +00:00
Pyun YongHyeon
0aaf10578c Use capability pointer to access PCIe registers rather than
directly access them at fixed address. While I'm here don't touch
other bits of PCIe device control register except max payload size.

Reviewed by:	marius
2009-11-22 19:11:34 +00:00
Pyun YongHyeon
d648358b0b Due to newly added PCIe capabilities fallback code for finding the
PCIe capability did not work right on recent controllers. Remove
FreeBSD 6.x support code.

Reviewed by:	marius
2009-11-22 18:47:56 +00:00
Pyun YongHyeon
1b90d0bd3e Fix typo introduced in r199011.
Pointed out by:	marius
2009-11-22 18:34:15 +00:00
Pyun YongHyeon
1715ec0d32 Remove extra white space. 2009-11-22 18:30:19 +00:00
Pyun YongHyeon
5c1da2fac0 Controller does not update Tx descriptors(send BDs) after sending
frames so remove unnecessary BUS_DMASYNC_PREREAD and
BUS_DMASYNC_POSTREAD of bus_dmamap_sync(9).
2009-11-10 20:29:20 +00:00
Pyun YongHyeon
e6bf277eff Zero out Tx/Rx descriptors before using them. Also add missing
bus_dmamap_sync(9) after Tx descriptor initialization.
2009-11-09 23:09:18 +00:00
Pyun YongHyeon
aa94f33338 Add missing bus_dmamap_sync(9) before issuing kick command. 2009-11-09 22:58:30 +00:00
Pyun YongHyeon
4d3a629c65 Correct disabling checksum offloading for BCM5700 B0. 2009-11-09 00:16:50 +00:00
Pyun YongHyeon
f5a034f95a Partially revert r199035.
Revision 1.158 says only lower ten bits of
BGE_RXLP_LOCSTAT_IFIN_DROPS register is valid. For BCM5761 case it
seems the controller maintains 16bits value for the register.
However 16bits are still too small to count all dropped packets
happened in a second. To get a correct counter we have to read the
register in bge_rxeof() which would be too expensive.

Pointed out by:	bde
2009-11-08 19:59:54 +00:00
Pyun YongHyeon
e238d4ead1 Count number of inbound packets which were chosen to be discarded
as input errors. Also count out of receive BDs as input errors.
2009-11-08 01:30:35 +00:00
Pyun YongHyeon
25dc84f22f Don't count input errors twice, we always read input errors from
MAC in bge_tick. Previously it used to show more number of input
errors. I noticed actual input errors were less than 8% even for
64 bytes UDP frames generated by netperf.
Since we always access BGE_RXLP_LOCSTAT_IFIN_DROPS register in
bge_tick, remove useless code protected by #ifdef notyet.
2009-11-08 01:13:38 +00:00
Pyun YongHyeon
61ccb9da43 Tell upper layer we support long frames. ether_ifattach()
initializes it to ETHER_HDR_LEN so we have to override it after
calling ether_ifattach().
While I'm here remove setting if_mtu value, it's initialized in
ether_ifattach().
2009-11-07 20:37:38 +00:00
Pyun YongHyeon
03e78bd096 Fix I missed in r199011. Rx ring index also should be updated.
If we fill Rx ring full instead of half we can simplify this logic
but this requires more experimentation.
2009-11-07 02:10:59 +00:00
Pyun YongHyeon
943787f3a7 Reimplement Rx buffer allocation to handle dma map load failure.
Introduce two spare dma maps for standard buffer and jumbo buffer
respectively. If loading a dma map failed reuse previously loaded
dma map. This should fix the use of an unloaded dma map in case of dma
map load failure. Also don't blindly unload dma map and defer
dma map sync and unloading operation until we know dma map for new
buffer is successfully loaded. This change saves unnecessary dma
load/unload operation. Previously bge(4) tried to reuse mbuf
with unloaded dma map which is really bad thing in bus_dma(9)
perspective.
While I'm here update if_iqdrops if we can't allocate Rx buffers.
2009-11-07 01:01:33 +00:00
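
The spare-map technique described in 943787f3a7 can be illustrated with a small user-space model. Everything here is hypothetical (struct fake_map, fake_load(), rx_newbuf()); it stands in for the real bus_dma(9) calls and only shows the ordering: load into the spare map first, and touch the old map only after the load succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a DMA map and an Rx ring slot. */
struct fake_map { int id; bool loaded; };
struct rx_slot  { struct fake_map *map; int buf; };
struct rx_ring  { struct rx_slot slot[4]; struct fake_map *spare; };

/* Simulated map load; fails when `ok` is false. */
static int
fake_load(struct fake_map *m, bool ok)
{
	if (!ok)
		return -1;
	m->loaded = true;
	return 0;
}

/* Replace the buffer in slot `i`; on failure the old buffer stays usable. */
static int
rx_newbuf(struct rx_ring *r, size_t i, int newbuf, bool load_ok)
{
	struct fake_map *old;

	assert(i < 4 && r->spare != NULL);

	/* Load the new buffer into the spare map; the old map is untouched. */
	if (fake_load(r->spare, load_ok) != 0)
		return -1;

	/* Only now unload the old map and swap it with the spare. */
	old = r->slot[i].map;
	old->loaded = false;
	r->slot[i].map = r->spare;
	r->slot[i].buf = newbuf;
	r->spare = old;
	return 0;
}
```

On a failed load the slot keeps its previously loaded map and buffer, so the old buffer can be handed back to the ring without ever referencing an unloaded map.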
Pyun YongHyeon
c215fd771d Do bus_dmamap_sync call only if frame size is greater than
standard buffer size. If controller is not capable of handling
jumbo frame, interface MTU couldn't be larger than standard MTU
which in turn means the received frame should fit in standard buffer. This
fixes bus_dmamap_sync being called for jumbo ring even if
interface is configured to use standard MTU.
Also if total frame size could be fit into standard buffer don't
use jumbo buffers.
2009-11-06 23:49:20 +00:00
Pyun YongHyeon
a669a81f0b bge(4) already switched to use UMA backed page allocator and local
memory allocator for jumbo frame was removed long time ago. Remove
macros that are no longer used.
2009-11-06 22:37:29 +00:00
Pyun YongHyeon
c3bbfed430 Correct MSI mode register bits. 2009-11-06 01:11:59 +00:00
Pyun YongHyeon
3ee5d7da8e Make bge_newbuf_std()/bge_newbuf_jumbo() return actual error code
for buffer allocation. If driver knows we are out of Rx buffers let
controller stop. This should fix panic when interface is run even
if it had no configured Rx buffers.
2009-11-04 21:06:54 +00:00
Pyun YongHyeon
0ac56796f7 Remove common DMA tag used for TX/RX mbufs and create Tx DMA tag
and Rx DMA tag separately. Previously it used a common mbuf DMA tag
for both Tx and Rx path but Rx buffer(standard ring case) should
have a single DMA segment and maximum buffer size of the segment
should be less than or equal to MCLBYTES. This change also makes it
possible to add TSO with minor changes.
2009-11-04 20:57:52 +00:00
Pyun YongHyeon
a23634a177 Convert bge_newbuf_std to use bus_dmamap_load_mbuf_sg(9). Note,
bge_newbuf_std still has a bug for handling dma map load failure
under high network load. Just reusing mbuf is not enough as driver
already unloaded the dma map of the mbuf. Graceful recovery needs
more work.
Ideally we can just update dma address part of a Rx descriptor
because the controller never overwrites the Rx descriptor. This
requires some Rx initialization code changes and it would be done
later after fixing other incorrect bus_dma(9) usages.
2009-11-04 20:40:38 +00:00
Pyun YongHyeon
a41504a9b1 Use correct dma tag for jumbo buffer. 2009-11-04 20:19:21 +00:00
Stanislav Sedov
15eda8010b - On entrance to the rx_eof sync RX rings maps with POSTWRITE flag
instead of POSTREAD: the hardware does not touch this memory (CPU
  updates it).  It is already synchronized as PREWRITE after the
  processing is done.

- Synchronize RX return ring memory in rx_eof.  This is needed
  as the device updates this memory when it receives packets.

- Decouple the synchronization of BGE status block in the interrupt
  service routine: perform PREREAD synchronization only after all accesses
  to this block are finished.  This seems to be more natural.

Reviewed by:	yongari, marius
MFC after:	2 weeks
2009-10-21 11:50:18 +00:00
Bjoern A. Zeeb
44b636910b Immediately after clearing a pending callout that didn't make it due
to the lock we hold, disable interrupts, and announce to the firmware
that we are shutting down. Especially do this before disabling blocks.

This makes some types of machines with asf enabled no longer hang upon
boot, when we start configuring the interface.

PR:			i386/96382, kern/100410, kern/122252, kern/116328
Reported by:		erwin
Hardware provided by:	TDC A/S
Reviewed by:		stas
Tested by:		stas
2009-10-13 20:22:12 +00:00
Stanislav Sedov
3889907fb2 - Give a name to the host coalescing bug fix WDMA mode register bit instead
of using hardcoded value in the code.
Obtained from:	OpenBSD
2009-10-07 14:29:48 +00:00
Stanislav Sedov
a57795536a - Add support for new BGE chips (5761, 5784 and 57780). These chips use the new
BGE_PCI_PRODID_ASICREV register to store the chip identifier and its revision.
- Add new grouping macro for 5755+ chips (BGE_IS_5755_PLUS).
- Add IDs for Fujitsu-branded Broadcom adapters.

PR:             kern/127587
Tested by:      Thomas Quinot <thomas@quinot.org> (BCM5761 A0)
MFC after:	2 weeks
Obtained from:  OpenBSD
2009-10-07 13:12:43 +00:00
Stanislav Sedov
7f21e273a8 - Do not try to reevaluate current RX production index on each
loop iteration as it can be updated by the card while we
  process the RX ring forcing us to process RX descriptors
  for which DMA synchronisation operation has not been
  performed.  This fixes the bug when bge(4) drops packets
  under high load.

Discussed with:	yongari, marius
Approved by:	re (kib)
MFC after:	1 week
2009-08-18 21:07:39 +00:00
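
The fix in 7f21e273a8 amounts to reading the Rx producer index once and looping against that snapshot. A minimal model of the idea follows; struct status_block, simulate_device() and rx_process() are hypothetical, and the callback merely imitates the card advancing the index while the loop runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RET_RING_CNT 8u	/* hypothetical return ring size */

/* volatile models memory that the device may update at any time. */
struct status_block { volatile uint32_t rx_prod; };

typedef void (*rx_cb)(struct status_block *sb, uint32_t idx);

/* Imitates the card producing one more descriptor per handled packet. */
static void
simulate_device(struct status_block *sb, uint32_t idx)
{
	(void)idx;
	sb->rx_prod = (sb->rx_prod + 1) % RET_RING_CNT;
}

/* Process descriptors up to the producer index seen on entry. */
static unsigned
rx_process(struct status_block *sb, uint32_t *cons, rx_cb cb)
{
	unsigned n = 0;
	uint32_t prod;

	assert(sb != NULL && cons != NULL && *cons < RET_RING_CNT);

	/* Snapshot once; a real driver would sync the ring up to here. */
	prod = sb->rx_prod;
	while (*cons != prod) {
		if (cb != NULL)
			cb(sb, *cons);
		*cons = (*cons + 1) % RET_RING_CNT;
		n++;
	}
	return n;
}
```

Descriptors that arrive while the loop runs are left for the next call, after the next synchronization, instead of being read before they have been synced.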
Robert Watson
eb956cd041 Use if_maddr_rlock()/if_maddr_runlock() rather than IF_ADDR_LOCK()/
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs.  This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.

For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.

Approved by:	re (kib)
MFC after:	6 weeks
2009-06-26 11:45:06 +00:00
Attilio Rao
8cf7d13d7a Fix return values appropriately.
Tested by:	zec
2009-05-30 17:56:19 +00:00
Attilio Rao
d4da719cf6 s/rk_npkts/rx_npkts
Reported by:	zec
2009-05-30 17:25:14 +00:00
Attilio Rao
1abcdbd127 When user_frac in the polling subsystem is low it is going to keep the
CPU busy for longer than necessary.  Additionally, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations a new generic mechanism can be
implemented in proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.

In order to implement such mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).

Bump __FreeBSD_version in order to signal such situation.

Reviewed by:	emaste
Sponsored by:	Sandvine Incorporated
2009-05-30 15:14:44 +00:00
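
The KPI change in 1abcdbd127 makes a polling handler report how many packets it processed. A rough model of such a handler is shown below; struct fake_nic and nic_poll() are hypothetical and do not reproduce the real handler signature, they only show a budget-limited loop that returns its count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device model: just a count of packets waiting. */
struct fake_nic { int pending; };

/* Handle at most `budget` packets and return the number handled. */
static int
nic_poll(struct fake_nic *nic, int budget)
{
	int n = 0;

	assert(nic != NULL && budget >= 0);

	while (n < budget && nic->pending > 0) {
		nic->pending--;	/* stands in for receiving one packet */
		n++;
	}
	return n;
}
```

A caller could use a return value below the budget as a hint that the interface has run dry and need not be polled again in the same tick.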
Xin LI
9fe569d8f9 Some comment/space changes (FALLTHRU -> FALLTHROUGH, space after while). 2009-05-14 22:36:56 +00:00
Xin LI
25e13e6895 Try to work around a race where bge_stop() may sneak in when bge_rxeof()
drops and re-grabs the softc mutex in the middle, resulting in kernel
trap 12.  This may happen when a lot of traffic is being hammered on
one bge(4) interface while the system is shutting down.

Reported by:	Alexander Sack <pisymbol gmail com>
PR:		kern/134548
MFC After:	2 weeks
2009-05-14 22:33:37 +00:00
Marius Strobl
c9ffd9f058 - Ensure that INTx isn't disabled, as these chips apparently have a
quirk requiring it to be enabled even when using MSI. This makes
  the latter work again after r189285.
- Remove a comment which no longer applies since r190194.
2009-03-23 14:36:50 +00:00
Marius Strobl
4f09c4c7e5 - In bge_ifmedia_upd_locked() take advantage of LIST_FOREACH().
- If boot verbose, print asicrev, chiprev and bus type on attach.
- For PCI Express devices:
  1) Adjust max read request size to 4Kbytes
  2) Turn on FIFO_LONG_BURST in RDMA during bge_blockinit()
  Though 1) does not seem to have much to do with the poor TX performance
  observed on PCI Express bge(4), 2) does fix the problem. [1]
- Nuke the RX CPU self-diag, which prevents working cards from working
  (Linux tg3 does not have this diag and neither does OpenBSD's bge(4)).
  The increasing of the firmware handshaking timeout to 20000 retries
  done as part of the original commit isn't merged as we already have a
  much higher BGE_TIMEOUT of 100000.

PR:		119361 [1]
Obtained from:	tg3 via DragonflyBSD [1], DragonflyBSD
2009-03-21 00:23:07 +00:00