Commit Graph

137 Commits

Author SHA1 Message Date
Gleb Smirnoff
9297f2a72e The procedure of raceless switching between polling mode and
taskqueued interrupt mode is going to be quite complex. Since
the polling mode is considered legacy feature for em(4) driver,
the decision is made to make polling and new interrupt handler
mutually exclusive, selected at compile time.

If kernel is compiled with DEVICE_POLLING, the fast taskqueued
interrupt handler code is disabled and the em_poll() and legacy
em_intr() functions are enabled. Otherwise, legacy functions
are disabled and only em_intr_fast() code is compiled.

Discussed with:		scottl
2006-06-06 08:03:49 +00:00
Gleb Smirnoff
86c9af6b69 Fix static array overrun.
(This will be also fixed in next vendor release.)

Coverity ID:	916
Reviewed by:	Jack Vogel
2006-05-17 07:38:58 +00:00
Olivier Houchard
5d1e6e68fc Bring back arm-specific workaround from rev 1.15:
Do not use the IO-mapping to issue the reset on the 82546 on arm. For some
reason, it results in corrupted descriptors.
2006-04-13 15:10:25 +00:00
Gleb Smirnoff
2a7d090a2e Restore accidentially removed rev. 1.3 2006-04-07 10:18:24 +00:00
Gleb Smirnoff
a80d10932e Merge in new driver from Intel, version 5.1.5. Adds support for some
new chips and improves support for already supported ones.

Some details, important for future merges:
  - if_em.c merged manually, viewing diff between new vendor
    driver and previous one.
  - if_em_hw.h dropped in from vendor, and then restored revisions
    1.16, 1.17, 1.18.
  - if_em_hw.c dropped in from vendor, and then two liner change made,
    that restores support for two rare chips.
2006-04-06 17:09:03 +00:00
Gleb Smirnoff
9cc9484cb5 Back out 1.112,1.113. I don't have enough resources to fix breakages
introduced by this change.
2006-02-22 14:11:16 +00:00
Gleb Smirnoff
37790e7f74 Fix fallout from last commit - we need to program the MAC address in em_init(). 2006-02-15 14:27:38 +00:00
Gleb Smirnoff
95c01e57d9 em_hardware_init() in em_init() is not needed, and leads to annoying
link flap.

Submitted by:	ru, Mike Tancsa
2006-02-15 13:45:02 +00:00
Gleb Smirnoff
1d66a76549 Set ifp->if_baudrate according to current speed. 2006-02-15 11:38:33 +00:00
Gleb Smirnoff
c47c580e83 - Rename em_print_link_status() to em_update_link_status().
- In em_attach() remove em_check_for_link(). Not needed here, since
  already done in em_hardware_init().
- In em_attach() replace the printing block with call to
  em_update_link_status().
- Remove modification of sc->link_state from em_hardware_init() and
  from em_media_status(). This makes em_update_link_status() a
  single point of change. Call em_update_link_status() where needed.
2006-02-15 10:51:11 +00:00
Gleb Smirnoff
eb0ec08982 - Second style(9) megacleanup.
- Rename "adapter" to "sc"/"softc", to be like other drivers.

  (-13 Kb less source code)
2006-02-15 08:39:50 +00:00
Gleb Smirnoff
207ebbc424 Move includes from if_em.h to if_em.c and sort them. 2006-02-14 13:11:36 +00:00
Gleb Smirnoff
4f4a363562 Fix two important typos in watchdog handling:
- Restart watchdog if we *did* processed any descriptors. [1]
- Log the watchdog event if the link is *up*. [2]

PR:		kern/92948 [1]
Submitted by:	Mihail Balikov <mihail.balikov interbgc.com> [1]
PR:		kern/92895 [2]
Submitted by:	Vladimir Ivanov <wawa yandex-team.ru> [2]
2006-02-09 12:57:17 +00:00
Gleb Smirnoff
d640400bc5 Since em(4) taskqueue is a new network context, we need to conditionally
lock Giant here.

Submitted by:	Andrey V. Elsukov <bu7cher yandex.ru>
2006-02-07 13:11:13 +00:00
Scott Long
73450b30c8 Now that the em driver no longer needs to directly touch the scheduler, remove some
unneeded headers.
2006-02-04 16:50:14 +00:00
Gleb Smirnoff
479b23b772 This driver can do hardware VLAN tagging + checksum offloading.
In collaboration with:	Mihail Balikov <mihail.balikov interbgc.com>
2006-01-30 13:45:55 +00:00
Scott Long
db161bd596 Squash another invalid use of BUS_DMA_ALLOCNOW.
MFC After: 3 days
2006-01-28 15:50:19 +00:00
Maxime Henrion
b337232591 Fix a race condition by initializing the taskqueue before registering
the fast interrupt handler that uses it.  This fixes a panic at boot
time when em_intr_fast() calls taskqueue_enqueue().
2006-01-22 01:06:55 +00:00
Gleb Smirnoff
5eb5c9639a An attemp to make driver more readable and attaractive for further
hacking:
  - Remove all spaces at eol.
  - Improve style(9) in most frequently edited functions.
  - In em_encap() push variables for 82544 workaround in the block
    where they are only used.
  - In em_get_buf() remove unused variable.
2006-01-20 11:38:25 +00:00
Scott Long
0f92108d32 Add the following to the taskqueue api:
taskqueue_start_threads(struct taskqueue **, int count, int pri,
			const char *name, ...);

This allows the creation of 1 or more threads that will service a single
taskqueue.  Also rework the taskqueue_create() API to remove the API change
that was introduced a while back.  Creating a taskqueue doesn't rely on
the presence of a process structure, and the proc mechanics are much better
encapsulated in taskqueue_start_threads().  Also clean up the
taskqueue_terminate() and taskqueue_free() functions to safely drain
pending tasks and remove all associated threads.

The TASKQUEUE_DEFINE and TASKQUEUE_DEFINE_THREAD macros have been changed
to use the new API, but drivers compiled against the old definitions will
still work.  Thus, recompiling drivers is not a strict requirement.
2006-01-14 01:55:24 +00:00
Scott Long
77d9852a55 Fix the interrupt race for real. Don't register the interrupt until after
the the interface has been configured.  I'm not sure how this could ever
have worked before, but it should be fixed now.  Also break out the interrupt
degresitration function into it's own step.
2006-01-13 08:18:04 +00:00
Scott Long
87a444e62e Disable interrupts while we are setting up the handler. The interrupt really
shouldn't be set up or enabled until much later, but that will be investigated
at a later time.
2006-01-13 05:04:27 +00:00
Scott Long
2ff7d1b635 Significant performance improvements for the if_em driver:
- Only update the rx ring consumer pointer after running through the rx loop,
  not with each iteration through the loop.
- If possible, use a fast interupt handler instead of an ithread handler.  Use
  the interrupt handler to check and squelch the interrupt, then schedule a
  taskqueue to do the actual work.  This has three benefits:
  - Eliminates the 'interrupt aliasing' problem found in many chipsets by
    allowing the driver to mask the interrupt in the NIC instead of the
    OS masking the interrupt in the APIC.
  - Allows the driver to control the amount of work done in the interrupt
    handler.  This results in what I call 'adaptive polling', where you get
    the latency benefits of a quick response to interrupts with the
    interrupt mitigation and work partitioning of polling.  Polling is still
    an option in the driver, but I consider it orthogonal to this work.
  - Don't hold the driver lock in the RX handler.  The handler and all data
    associated is effectively serialized already.  This eliminates the cost of
    dropping and reaquiring the lock for every receieved packet.  The result
    is much lower contention for the driver lock, resulting in lower CPU usage
    and lower latency for interactive workloads.

The amount of work done in the taskqueue is controlled by the sysctl
dev.em.N.rx_processing_limit

and tunable
hw.em.rx_process_limit

Setting these to -1 effectively removes the limit.

The fast interrupt and taskqueue can be disabled by defining NO_EM_FASTINTR.
This work has been shown to increase fast-forwarding from ~570 kpps to
~750 kpps (note that the same NIC hardware seems unable to transmit more than
800 kpps, so this increase appears to be limited almost solely by the
hardware).  Gains have been shown in other workloads, ranging from better
performance to elimination of over-saturation livelocks.

Thanks to Andre Opperman for his time and resources from his network
performance project in performing much of the testing.  Thanks to Gleb
Smirnoff and Danny Braniss for their help in testing also.
2006-01-11 00:30:25 +00:00
Gleb Smirnoff
6bbdde89f5 A style nit. 2005-12-28 09:37:04 +00:00
Gleb Smirnoff
dd157f8102 Tidy up em_resume():
- Don't call em_init_locked() twice.
  - Collapse two if() blocks into one.
2005-12-28 08:58:28 +00:00
Gleb Smirnoff
0b6169c0fa Add simple suspend and resume methods. We call em_stop() on suspend and
em_init() on resume. With this change the network is ready right after
resume, without half minute lag.

Tested by:	Jacques Garrigue
2005-12-26 10:39:21 +00:00
Gleb Smirnoff
0014abfcc4 Add a quirk to fix resume on some laptops.
Reported by:	joe
Reported by:	Huang wen hui <huang gddsn.org.cn>
Reported by:	Jacques Garrigue <garrigue math.nagoya-u.ac.jp>
PR:		kern/89825
2005-12-22 09:09:39 +00:00
Gleb Smirnoff
d147662cd3 - Fix VLAN_INPUT_TAG() macro, so that it doesn't touch mtag in
case if memory allocation failed.
- Remove fourth argument from VLAN_INPUT_TAG(), that was used
  incorrectly in almost all drivers. Indicate failure with
  mbuf value of NULL.

In collaboration with:	yongari, ru, sam
2005-12-18 18:24:27 +00:00
Pyun YongHyeon
e3829d1f54 Add jumbo frame support for architectures with strict alignment.
Reviewed by:	glebius
2005-12-16 08:29:43 +00:00
Gleb Smirnoff
02a101a611 - Polling can be used on SMP.
- A kernel module can support polling.
2005-12-12 19:29:30 +00:00
Ruslan Ermilov
f4e9888107 Fix -Wundef. 2005-12-04 02:12:43 +00:00
Gleb Smirnoff
9d6457b44c On the 82571 and newer chipset the ICR register is meaningful only
if the E1000_ICR_INT_ASSERTED bit is set.

Submitted by:	Jack Vogel
2005-12-02 08:33:56 +00:00
Olivier Houchard
5498dbb282 Remember the bus_dmamap_t where we loaded the mbuf, and sync this map instead
of tx_buffer->map, or we could end up syncing the wrong map.
2005-11-24 15:13:47 +00:00
Gleb Smirnoff
7ddd4e4126 Merge in new driver version from Intel - 3.2.18.
The most important change is support for adapters based on
82571 and 82572 chips.

Tested on:	82547EI on i386
Tested on:	82540EM on sparc64
2005-11-24 01:44:49 +00:00
Pyun YongHyeon
77805d0e09 busdma cleanup for em(4).
- don't force busdma to pre-allocate bounce pages for parent tag.
 - use system supplied roundup2 macro instead of rolling its own version.
 - TX/RX decriptor length should be multiple of 128. There is no
   no need to expand the size with the multiple of 4096.
 - don't create/destroy DMA maps in TX/RX handlers. Use pre-allocated
   DMA maps. Since creating DMA maps on sparc64 is time consuming
   operations(resource mananger overhead), this change should boost
   performance on sparc64. I could get > 2x speedup on Ultra60.
 - TX/RX descriptors could be aligned on 128 boundary. Aligning them
   on PAGE_SIZE is waste of resource.
 - don't blindly create TX DMA tag with size of MCLBYTES * 8. The size
   is only valid under jumbo frame environments. Instead of using the
   hardcoded value, re-compute necessary size on the fly.
 - RX side bus_dmamap_load_mbuf_sg(9) support.
 - remove unused macro EM_ROUNDUP and constant EM_MMBA.

Reviewed by:	scottl
Tested by:	glebius
2005-11-21 04:17:43 +00:00
Gleb Smirnoff
7f4f47d3fa - Backout last change, since it is memory overkill for a non busy host or
for a notebook with em(4) adapter.
- Introduce tunables em.hw.txd and em.hw.rxd, which allow administrator
  to configure number of transmit and receive descriptors.
- Check em.hw.txd and em.hw.rxd against hardware limits [*] and require
  them to be multiple of 128.

[*] According to comments in if_em.h the 82540EM/82541ER chips can handle
    more than 256 descriptors. Since we don't have this hardware to test,
    we decided to mimic NetBSD wm(4) driver, that limits these chips to
    256 descriptors.

In collaboration with:	yongari
2005-11-17 10:13:18 +00:00
Ruslan Ermilov
4a0d6638b3 - Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
  through ifp anyway.  IF_LLADDR() works faster, and all (except
  one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
  and drop the IFP2ENADDR() macro; all users have been converted
  to use IF_LLADDR() instead.
2005-11-11 16:04:59 +00:00
Gleb Smirnoff
7f7364b113 Give a try to autoconfiguring the number of transmit and receive
descriptors depending on chip revision.
2005-11-10 11:44:37 +00:00
Gleb Smirnoff
7d62e533fb - Introduce two more stat counters, counting number of RX
overruns and number of watchdog timeouts.
- Do not log(9) RX overrun events, since this pessimizes
  things under load [1].
- Do not increase if->if_oerrors in em_watchdog(), since
  this leads to counter slipping back, when if->if_oerrors
  is recalculated in em_update_stats_counters(). Instead
  increase watchdog counter in em_watchdog() and take it
  into account in em_update_stats_counters().

Submitted by:	ade [1]
2005-11-09 15:23:54 +00:00
Pyun YongHyeon
94e3c643cd Make em(4) work on big-endian architectures.
- disable jumbo frame support on strict alignment architectures due
   to the limitation of hardware. The driver needs a fix-up code for
   RX side. The fix will show up in near future.
 - fix endian issue for 82544 on PCI-X bus. I couldn't test this as
   I don't have the NIC/hardware.
 - prefer PCIR_BAR to hardcoded EM_MMBA.
 - Properly checks for for 64bit BAR [1]
 - replace inl/outl with bus_space(9) [1]
 - fix endian issue on VLAN handling.
 - reorder header files and remove unnecessary one.

Reviewed by:	cognet
No response from:	pdeuskar, tackerman
Obtained from:	OpenBSD [1]
2005-11-09 08:43:18 +00:00
Robert Watson
66767c79cf Put probe-time printf of adapter speed and duplex behind bootverbose:
since the link takes a bit to negotiate, the information is pretty
much never available during the probe.  As such, the boot output
pretty much always prints N/A for speed and duplex.  Since we print
out the output of ifconfig during the user space boot, this early
boot information is also generally redundant, and added to the noise.

MFC after:	2 weeks
2005-10-31 19:59:40 +00:00
Gleb Smirnoff
6bec9a1eff Some more minor cleanups of em(4) driver:
- Destroy mutex in case of attach failure. [1]
  - Lock properly em_watchdog(). [1]
  - Lock properly em_sysctl_int_delay(). [1]
  - Remove unused global adapter linked list.
  - Remove unused dma_size field from struct em_dma_alloc.
  - Do not touch interface statistics, that must be edited
    only by upper layers. [1]

Submitted by:	yongari [1]
2005-10-20 09:55:49 +00:00
Gleb Smirnoff
5422f907d4 Revamp interrupt handling in em(4) driver:
o Do not mask the RX overrun interrupt.

o Rewrite em_intr():
  - Axe EM_MAX_INTR.
  - Cycle acknowledging interrupts and processing
    packets until zero interrupt cause register is
    read.
  - If RX overrun comes in log this fact. [ NetBSD also
    resets adapter in this case, but my tests showed that
    this is not needed and only pessimizes behavior under
    heavy load. ]
  - Since almost all functions is rewritten, style the
    remaining lines.

This fixes em(4) interfaces wedging under high load.

In collaboration with:	wpaul, cognet
Obtained from:		NetBSD
2005-10-20 08:46:43 +00:00
Gleb Smirnoff
a7c54158fc In the em_process_receive_interrupts() cycle check the IFF_DRV_RUNNING
flag. This fixes panic, when 'ifconfig em0 down' was called and it calls
em_stop() while the em_process_receive_interrupts() has temporarily
dropped the lock.
2005-10-19 13:34:48 +00:00
Olivier Houchard
60d41c425b - Use BUS_DMASYNC_PREWRITE in em_get_buf(), as the adapter is about to read
the descriptors set.
- In em_process_receive_interrupts(), call bus_dmamap_sync() for the
descriptors set each time we modify one descriptor, instead of doing it only
at the function exit, to make sure the adapters know he can re-use the
descriptor.
This helps on arm with write-back data cache (and possibly on other arches
with bounce pages, I don't know) under heavy network load. Without this,
if we attempt to process more than num_rx_desc descriptors, the adapter
would just stop processing rx interrupts.
2005-10-18 00:42:10 +00:00
Gleb Smirnoff
66e58c20dd From the PR:
The receive function em_process_receive_interrupts() unlocks the
  adapter while ether_input() processes the packet, and then locks
  it back. In the meantime, em_init() may be called, either from
  em_watchdog() from softclock interrupt or from the ifconfig(8)
  program. The em_init() resets the card, in particular it sets
  adapter->next_rx_desc_to_check to 0 and resets hardware RX Head
  and Tail descriptor pointers. The loop in
  em_process_receive_interrupts() does not expect these things to
  change, and a mess may result.

This fixes long wedges of em(4) interfaces receive part under high
load and IP fastforwarding enabled.

PR:		kern/87418
Submitted by:	Dmitrij Tejblum <tejblum yandex-team.ru>
2005-10-14 11:00:15 +00:00
Gleb Smirnoff
106e9401db Cleanup from __FreeBSD_version. 2005-10-14 10:34:46 +00:00
Gleb Smirnoff
f0796cd26c - Don't pollute opt_global.h with DEVICE_POLLING and introduce
opt_device_polling.h
- Include opt_device_polling.h into appropriate files.
- Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that
  can be compiled as loadable modules.

Reviewed by:	bde
2005-10-05 10:09:17 +00:00
Gleb Smirnoff
4092996774 Big polling(4) cleanup.
o Axe poll in trap.

o Axe IFF_POLLING flag from if_flags.

o Rework revision 1.21 (Giant removal), in such a way that
  poll_mtx is not dropped during call to polling handler.
  This fixes problem with idle polling.

o Make registration and deregistration from polling in a
  functional way, insted of next tick/interrupt.

o Obsolete kern.polling.enable. Polling is turned on/off
  with ifconfig.

Detailed kern_poll.c changes:
  - Remove polling handler flags, introduced in 1.21. The are not
    needed now.
  - Forget and do not check if_flags, if_capenable and if_drv_flags.
  - Call all registered polling handlers unconditionally.
  - Do not drop poll_mtx, when entering polling handlers.
  - In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
  - In netisr_poll() axe the block, where polling code asks drivers
    to unregister.
  - In netisr_poll() and ether_poll() do polling always, if any
    handlers are present.
  - In ether_poll_[de]register() remove a lot of error hiding code. Assert
    that arguments are correct, instead.
  - In ether_poll_[de]register() use standard return values in case of
    error or success.
  - Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
    poll_switch() goes through interface list and enabled/disables polling.
    A message that kern.polling.enable is deprecated is printed.

Detailed driver changes:
  - On attach driver announces IFCAP_POLLING in if_capabilities, but
    not in if_capenable.
  - On detach driver calls ether_poll_deregister() if polling is enabled.
  - In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
    flag. If there is no, then unlocks and returns.
  - In ioctl handler driver checks for IFCAP_POLLING flag requested to
    be set or cleared. Driver first calls ether_poll_[de]register(), then
    obtains driver lock and [dis/en]ables interrupts.
  - In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
    If present, then returns.This is important to protect from spurious
    interrupts.

Reviewed by:	ru, sam, jhb
2005-10-01 18:56:19 +00:00
Gleb Smirnoff
482b02b3f3 In em_process_receive_interrupts() store and clear adapter->fmt. This
make function reenterable. In the runtime the race is masked by serializing
of em_process_receive_interrupts() either by interrupt thread, or by
polling. The race can be triggered when polling is switched on or off.
2005-09-29 13:23:34 +00:00