Commit Graph

97 Commits

Author SHA1 Message Date
scottl
e256afc28c Disable interrupts while we are setting up the handler. The interrupt really
shouldn't be set up or enabled until much later, but that will be investigated
at a later time.
2006-01-13 05:04:27 +00:00
scottl
57bb282532 Significant performance improvements for the if_em driver:
- Only update the rx ring consumer pointer after running through the rx loop,
  not with each iteration through the loop.
- If possible, use a fast interupt handler instead of an ithread handler.  Use
  the interrupt handler to check and squelch the interrupt, then schedule a
  taskqueue to do the actual work.  This has three benefits:
  - Eliminates the 'interrupt aliasing' problem found in many chipsets by
    allowing the driver to mask the interrupt in the NIC instead of the
    OS masking the interrupt in the APIC.
  - Allows the driver to control the amount of work done in the interrupt
    handler.  This results in what I call 'adaptive polling', where you get
    the latency benefits of a quick response to interrupts with the
    interrupt mitigation and work partitioning of polling.  Polling is still
    an option in the driver, but I consider it orthogonal to this work.
  - Don't hold the driver lock in the RX handler.  The handler and all data
    associated is effectively serialized already.  This eliminates the cost of
    dropping and reaquiring the lock for every receieved packet.  The result
    is much lower contention for the driver lock, resulting in lower CPU usage
    and lower latency for interactive workloads.

The amount of work done in the taskqueue is controlled by the sysctl
dev.em.N.rx_processing_limit

and tunable
hw.em.rx_process_limit

Setting these to -1 effectively removes the limit.

The fast interrupt and taskqueue can be disabled by defining NO_EM_FASTINTR.
This work has been shown to increase fast-forwarding from ~570 kpps to
~750 kpps (note that the same NIC hardware seems unable to transmit more than
800 kpps, so this increase appears to be limited almost solely by the
hardware).  Gains have been shown in other workloads, ranging from better
performance to elimination of over-saturation livelocks.

Thanks to Andre Opperman for his time and resources from his network
performance project in performing much of the testing.  Thanks to Gleb
Smirnoff and Danny Braniss for their help in testing also.
2006-01-11 00:30:25 +00:00
glebius
4c64851f64 A style nit. 2005-12-28 09:37:04 +00:00
glebius
58ee46ace4 Tidy up em_resume():
- Don't call em_init_locked() twice.
  - Collapse two if() blocks into one.
2005-12-28 08:58:28 +00:00
glebius
455bdd04af Add simple suspend and resume methods. We call em_stop() on suspend and
em_init() on resume. With this change the network is ready right after
resume, without half minute lag.

Tested by:	Jacques Garrigue
2005-12-26 10:39:21 +00:00
glebius
546c2347de Add a quirk to fix resume on some laptops.
Reported by:	joe
Reported by:	Huang wen hui <huang gddsn.org.cn>
Reported by:	Jacques Garrigue <garrigue math.nagoya-u.ac.jp>
PR:		kern/89825
2005-12-22 09:09:39 +00:00
glebius
175e16aa4d - Fix VLAN_INPUT_TAG() macro, so that it doesn't touch mtag in
case if memory allocation failed.
- Remove fourth argument from VLAN_INPUT_TAG(), that was used
  incorrectly in almost all drivers. Indicate failure with
  mbuf value of NULL.

In collaboration with:	yongari, ru, sam
2005-12-18 18:24:27 +00:00
yongari
c549c6b4f0 Add jumbo frame support for architectures with strict alignment.
Reviewed by:	glebius
2005-12-16 08:29:43 +00:00
glebius
0f4e96e8f5 On the 82571 and newer chipset the ICR register is meaningful only
if the E1000_ICR_INT_ASSERTED bit is set.

Submitted by:	Jack Vogel
2005-12-02 08:33:56 +00:00
cognet
3c971e85df Remember the bus_dmamap_t where we loaded the mbuf, and sync this map instead
of tx_buffer->map, or we could end up syncing the wrong map.
2005-11-24 15:13:47 +00:00
glebius
f9340d9884 Merge in new driver version from Intel - 3.2.18.
The most important change is support for adapters based on
82571 and 82572 chips.

Tested on:	82547EI on i386
Tested on:	82540EM on sparc64
2005-11-24 01:44:49 +00:00
yongari
85fff8c580 busdma cleanup for em(4).
- don't force busdma to pre-allocate bounce pages for parent tag.
 - use system supplied roundup2 macro instead of rolling its own version.
 - TX/RX decriptor length should be multiple of 128. There is no
   no need to expand the size with the multiple of 4096.
 - don't create/destroy DMA maps in TX/RX handlers. Use pre-allocated
   DMA maps. Since creating DMA maps on sparc64 is time consuming
   operations(resource mananger overhead), this change should boost
   performance on sparc64. I could get > 2x speedup on Ultra60.
 - TX/RX descriptors could be aligned on 128 boundary. Aligning them
   on PAGE_SIZE is waste of resource.
 - don't blindly create TX DMA tag with size of MCLBYTES * 8. The size
   is only valid under jumbo frame environments. Instead of using the
   hardcoded value, re-compute necessary size on the fly.
 - RX side bus_dmamap_load_mbuf_sg(9) support.
 - remove unused macro EM_ROUNDUP and constant EM_MMBA.

Reviewed by:	scottl
Tested by:	glebius
2005-11-21 04:17:43 +00:00
glebius
a384e41446 - Backout last change, since it is memory overkill for a non busy host or
for a notebook with em(4) adapter.
- Introduce tunables em.hw.txd and em.hw.rxd, which allow administrator
  to configure number of transmit and receive descriptors.
- Check em.hw.txd and em.hw.rxd against hardware limits [*] and require
  them to be multiple of 128.

[*] According to comments in if_em.h the 82540EM/82541ER chips can handle
    more than 256 descriptors. Since we don't have this hardware to test,
    we decided to mimic NetBSD wm(4) driver, that limits these chips to
    256 descriptors.

In collaboration with:	yongari
2005-11-17 10:13:18 +00:00
ru
f70f525b49 - Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
  through ifp anyway.  IF_LLADDR() works faster, and all (except
  one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
  and drop the IFP2ENADDR() macro; all users have been converted
  to use IF_LLADDR() instead.
2005-11-11 16:04:59 +00:00
glebius
91564dc239 Give a try to autoconfiguring the number of transmit and receive
descriptors depending on chip revision.
2005-11-10 11:44:37 +00:00
glebius
e40bee48f7 - Introduce two more stat counters, counting number of RX
overruns and number of watchdog timeouts.
- Do not log(9) RX overrun events, since this pessimizes
  things under load [1].
- Do not increase if->if_oerrors in em_watchdog(), since
  this leads to counter slipping back, when if->if_oerrors
  is recalculated in em_update_stats_counters(). Instead
  increase watchdog counter in em_watchdog() and take it
  into account in em_update_stats_counters().

Submitted by:	ade [1]
2005-11-09 15:23:54 +00:00
yongari
c43546c985 Make em(4) work on big-endian architectures.
- disable jumbo frame support on strict alignment architectures due
   to the limitation of hardware. The driver needs a fix-up code for
   RX side. The fix will show up in near future.
 - fix endian issue for 82544 on PCI-X bus. I couldn't test this as
   I don't have the NIC/hardware.
 - prefer PCIR_BAR to hardcoded EM_MMBA.
 - Properly checks for for 64bit BAR [1]
 - replace inl/outl with bus_space(9) [1]
 - fix endian issue on VLAN handling.
 - reorder header files and remove unnecessary one.

Reviewed by:	cognet
No response from:	pdeuskar, tackerman
Obtained from:	OpenBSD [1]
2005-11-09 08:43:18 +00:00
rwatson
4094ae5452 Put probe-time printf of adapter speed and duplex behind bootverbose:
since the link takes a bit to negotiate, the information is pretty
much never available during the probe.  As such, the boot output
pretty much always prints N/A for speed and duplex.  Since we print
out the output of ifconfig during the user space boot, this early
boot information is also generally redundant, and added to the noise.

MFC after:	2 weeks
2005-10-31 19:59:40 +00:00
glebius
8f54edce78 Some more minor cleanups of em(4) driver:
- Destroy mutex in case of attach failure. [1]
  - Lock properly em_watchdog(). [1]
  - Lock properly em_sysctl_int_delay(). [1]
  - Remove unused global adapter linked list.
  - Remove unused dma_size field from struct em_dma_alloc.
  - Do not touch interface statistics, that must be edited
    only by upper layers. [1]

Submitted by:	yongari [1]
2005-10-20 09:55:49 +00:00
glebius
dfc409a7f8 Revamp interrupt handling in em(4) driver:
o Do not mask the RX overrun interrupt.

o Rewrite em_intr():
  - Axe EM_MAX_INTR.
  - Cycle acknowledging interrupts and processing
    packets until zero interrupt cause register is
    read.
  - If RX overrun comes in log this fact. [ NetBSD also
    resets adapter in this case, but my tests showed that
    this is not needed and only pessimizes behavior under
    heavy load. ]
  - Since almost all functions is rewritten, style the
    remaining lines.

This fixes em(4) interfaces wedging under high load.

In collaboration with:	wpaul, cognet
Obtained from:		NetBSD
2005-10-20 08:46:43 +00:00
glebius
1767a8e54e In the em_process_receive_interrupts() cycle check the IFF_DRV_RUNNING
flag. This fixes panic, when 'ifconfig em0 down' was called and it calls
em_stop() while the em_process_receive_interrupts() has temporarily
dropped the lock.
2005-10-19 13:34:48 +00:00
cognet
f5ba28afbc - Use BUS_DMASYNC_PREWRITE in em_get_buf(), as the adapter is about to read
the descriptors set.
- In em_process_receive_interrupts(), call bus_dmamap_sync() for the
descriptors set each time we modify one descriptor, instead of doing it only
at the function exit, to make sure the adapters know he can re-use the
descriptor.
This helps on arm with write-back data cache (and possibly on other arches
with bounce pages, I don't know) under heavy network load. Without this,
if we attempt to process more than num_rx_desc descriptors, the adapter
would just stop processing rx interrupts.
2005-10-18 00:42:10 +00:00
glebius
128747ab0b From the PR:
The receive function em_process_receive_interrupts() unlocks the
  adapter while ether_input() processes the packet, and then locks
  it back. In the meantime, em_init() may be called, either from
  em_watchdog() from softclock interrupt or from the ifconfig(8)
  program. The em_init() resets the card, in particular it sets
  adapter->next_rx_desc_to_check to 0 and resets hardware RX Head
  and Tail descriptor pointers. The loop in
  em_process_receive_interrupts() does not expect these things to
  change, and a mess may result.

This fixes long wedges of em(4) interfaces receive part under high
load and IP fastforwarding enabled.

PR:		kern/87418
Submitted by:	Dmitrij Tejblum <tejblum yandex-team.ru>
2005-10-14 11:00:15 +00:00
glebius
30cb5eaab9 Cleanup from __FreeBSD_version. 2005-10-14 10:34:46 +00:00
glebius
9efbae40b7 - Don't pollute opt_global.h with DEVICE_POLLING and introduce
opt_device_polling.h
- Include opt_device_polling.h into appropriate files.
- Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that
  can be compiled as loadable modules.

Reviewed by:	bde
2005-10-05 10:09:17 +00:00
glebius
f41a83bf42 Big polling(4) cleanup.
o Axe poll in trap.

o Axe IFF_POLLING flag from if_flags.

o Rework revision 1.21 (Giant removal), in such a way that
  poll_mtx is not dropped during call to polling handler.
  This fixes problem with idle polling.

o Make registration and deregistration from polling in a
  functional way, insted of next tick/interrupt.

o Obsolete kern.polling.enable. Polling is turned on/off
  with ifconfig.

Detailed kern_poll.c changes:
  - Remove polling handler flags, introduced in 1.21. The are not
    needed now.
  - Forget and do not check if_flags, if_capenable and if_drv_flags.
  - Call all registered polling handlers unconditionally.
  - Do not drop poll_mtx, when entering polling handlers.
  - In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
  - In netisr_poll() axe the block, where polling code asks drivers
    to unregister.
  - In netisr_poll() and ether_poll() do polling always, if any
    handlers are present.
  - In ether_poll_[de]register() remove a lot of error hiding code. Assert
    that arguments are correct, instead.
  - In ether_poll_[de]register() use standard return values in case of
    error or success.
  - Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
    poll_switch() goes through interface list and enabled/disables polling.
    A message that kern.polling.enable is deprecated is printed.

Detailed driver changes:
  - On attach driver announces IFCAP_POLLING in if_capabilities, but
    not in if_capenable.
  - On detach driver calls ether_poll_deregister() if polling is enabled.
  - In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
    flag. If there is no, then unlocks and returns.
  - In ioctl handler driver checks for IFCAP_POLLING flag requested to
    be set or cleared. Driver first calls ether_poll_[de]register(), then
    obtains driver lock and [dis/en]ables interrupts.
  - In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
    If present, then returns.This is important to protect from spurious
    interrupts.

Reviewed by:	ru, sam, jhb
2005-10-01 18:56:19 +00:00
glebius
678546cfbf In em_process_receive_interrupts() store and clear adapter->fmt. This
make function reenterable. In the runtime the race is masked by serializing
of em_process_receive_interrupts() either by interrupt thread, or by
polling. The race can be triggered when polling is switched on or off.
2005-09-29 13:23:34 +00:00
glebius
1f2c9e5c2c Remove queue check from last commit. In most cases there is smth in queue,
when start function is called.

Reviewed by:	ru
2005-09-20 14:52:57 +00:00
glebius
b167d0deb2 Check IFF_DRV_RUNNING and presense of packets in queue before calling
em_start_locked(). This fixes panic on shutdown with active traffic
passing through router.

Sponsored by:	Rambler
2005-09-20 13:37:17 +00:00
imp
4e70215e6b Make sure that we call if_free(ifp) after bus_teardown_intr. Since we
could get an interrupt after we free the ifp, and the interrupt
handler depended on the ifp being still alive, this could, in theory,
cause a crash.  Eliminate this possibility by moving the if_free to
after the bus_teardown_intr() call.
2005-09-19 03:10:21 +00:00
ru
743dc61d13 Fix "Memory modified after free" panic on detach, caused by accessing
already freed struct ifnet.
2005-09-14 10:28:01 +00:00
rwatson
5d770a09e8 Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags.  Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags.  This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by:	pjd, bz
MFC after:	7 days
2005-08-09 10:20:02 +00:00
rwatson
9918d13b80 Modify device drivers supporting multicast addresses to lock if_addr_mtx
over iteration of their multicast address lists when synchronizing the
hardware address filter with the network stack-maintained list.

Problem reported by:	Ed Maste (emaste at phaedrus dot sandvine dot ca>
MFC after:		1 week
2005-08-03 00:18:35 +00:00
ru
c7592f527d Add missing ether_poll_deregister(). This is still not enough to
kldunload/kldload without a panic.  The same (but worse) problem
is also present in ixgb(4).
2005-08-02 08:44:45 +00:00
brooks
567ba9b00a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
tackerman
8b2ddc8fab Changes to update driver with latest Intel driver version 2.1.7
- Changed from using explicit devices id to using descriptive labels.
- Added support for 82573 and 82546 Quad adapters.
- Corrected support for 82547EI and 82541ER (mac_type was not assigned)
- Removed #ifdef DBG_STATS and extraneous code.

if_em_hw.c/if_em_hw.h
- Added support for 82573 and 82546 Quad adapters.
- Brought forward Intel's most current mac and phy changes.
2005-05-26 23:32:02 +00:00
glebius
9c5eef63b7 Run em_local_timer() once per second instead of running it once per 2 seconds.
This makes gathering of error stats more precise, and netstat(1) output look
right.

Reviewed by:	tackerman
2005-04-05 07:06:47 +00:00
imp
a2e81fc93f Use BUS_PROBE_DEFAULT for pci probe return value 2005-03-05 18:30:12 +00:00
glebius
e11596c717 Call if_link_state_change() when link status changes.
PR:		kern/76890
Reviewed by:	rwatson, sam
2005-02-04 18:36:04 +00:00
yar
75baf17e62 Respect the current setting of IFCAP_VLAN_HWTAGGING on
the interface when going to toggle VLAN support for
internal reasons.  If the IFCAP_VLAN_HWTAGGING bit is
cleared, we should rely on the (re)init routine to turn
VLAN support off and never touch the relevant hardware bits.

This applies to other capability bits, too.  The user
obviously has a reason for clearing a capability bit,
e.g., if his particular NIC is buggy and hangs if a
certain hardware capability is turned on even for a
fraction of a second.

The flag adapter->em_insert_vlan_header still is set or
reset irrespective of the IFCAP_VLAN_HWTAGGING setting,
as before, in order to handle the case when a user sets
promiscuous mode on an interface first and later turns
its IFCAP_VLAN_HWTAGGING bit on.

This change might look orthogonal to rev#1.85, but in fact
it is not.  It introduces bugfixes that hopefully will make
implementing the general scheme mentioned in the commit
message of rev#1.85 easier.
2005-01-26 13:44:47 +00:00
rwatson
ab578b0a37 Disable use of hardware VLAN tagging and stripping in if_em in the default
configuration: it appears to work properly in the non-promiscuous case, but
we've not yet implemented a more general solution that maintains full
functionality with promiscuous mode enabled.  While my hope is that we can
get one implemented soon, this will improve functionality substantially in
the mean time.

MFC after:	3 days
2005-01-26 11:40:58 +00:00
scottl
22b7e374a0 Convert if_em to the new bus_dmamap_load_sg() interface. The old callback
was really just a waste of cycles, so this streamlines it considerably.
2005-01-15 20:52:15 +00:00
tackerman
e6938cd35d Corrected a workaround that should only be applied to one adapter. Workaround
was causing device hangs when incorrectly applied to other adapters.

PR:		kern/66634
2005-01-01 19:57:23 +00:00
tackerman
e01f208ae1 Added device id support for Intel 82541ER and 82546GB dual port PCIE adapter.
PR:		 None
2005-01-01 19:54:39 +00:00
rwatson
0188f1bf36 Further refine the if_em vlan fix in if_em.c:1.53:
- Because em_encap() can now fail in a way that leaves us without an
  mbuf chain, potentially set *m_headp to NULL if that happens, so that
  the caller can do the right thing.  This case can occur when we try
  to prepend the vlan header mbuf but can't allocate additional memory.

- Modify the caller of em_encap() to detect a NULL m_head and not try
  to queue the mbuf if that happens.

- When em_encap() fails, make sure to call bus_dmamap_destroy() to
  clean up.
2004-11-14 20:20:28 +00:00
rwatson
56c8fde970 Correct a bug in the if_em driver relating to the use of vlans with
promiscuous mode introduced in 1.45, which programs the em card not
to strip or prepend tags when in promiscuous mode without also
modifying behavior to manually prepend a vlan header in the event
that the card isn't doing it on transmit.  Due to a feature of card
operation, if the global VLAN prepend/strip register isn't set,
setting the VLAN tag flag on individual packet descriptors will
cause the packet to be transmitted using ISL encapsulation rather
than 802.1Q VLAN encapsulation.

This fix causes em_encap() to prepend the header by tracking whether
the card is configured to temporarily disable prepending/stripping
due to promiscuous mode.  As a result, entering promiscuous mode on
the parent em interface no longer causes vlans to appear to "wedge"
or transmit ISL-encapsulated frames, which typically will not be
configured/spoken by the other endpoints on the VLAN trunk.  This
bug may also exist in other drivers, and the additional vlan
encapsulation logic should be abstracted and centralized in
if_vlan.c if so.

RELENG_5_3 candidate.

MFC after:	1 week
Tested by:	pjd, rwatson
Reported by:	astesin at ukrtelecom dot net
Reported by:	Mike Tancsa <mike at sentex dot net>
Reported by:	Iasen Kostov <tbyte at OTEL dot net>
2004-11-12 11:03:07 +00:00
bms
f53d94c2fb Move per-instance sysctls under the per-device-instance tree.
Reviewed by:	mux
Prodded by:	rwatson
2004-11-11 15:31:38 +00:00
phk
6b5c577ba3 Put the "Link is up/down" printfs behind bootverbose. gigE is not so uncommon
that we need to tell people about every cable in the network anymore.  It can
be enabled for debugging purposes with "boot -v".
2004-11-03 14:11:18 +00:00
mux
043596afd3 Add missing bus_dmamap_sync() calls. If you are using an architecture
with a weak memory model or x86 + PAE (or more specifically, your
driver is using bounce pages) and you have had problems with em(4),
this may fix it.  At least this is needed to have em(4) work properly
on FreeBSD/arm.

Original version by:	cognet
Reviewed by:		tackerman
Tested by:		cognet
2004-10-19 23:31:44 +00:00
scottl
57d3e400ee Use an alignment of 1 instead of PAGE_SIZE for the rx and tx buffer tags.
Since the e1000 DMA engines hava no constraints on the alignment of buffer
transfers, there is no reason to tell busdma that there is.  This save a
minimum of 1 malloc call per packet, which translates to eliminating 4 locks.
It also means that buffers are not needlessly bounced when transfered.  The
end result is a 38% improvement in pps in a 4 way bridging environment.

Obtained from: Sandvine, Inc.
2004-10-19 02:39:27 +00:00