Commit Graph

235 Commits

Author SHA1 Message Date
Kip Macy
3f345a5d09 Greatly simplify cxgb by removing almost all of the custom mbuf management logic
- remove mbuf iovec - useful, but adds too much complexity when isolated to
   the driver

- remove driver private caching - insufficient benefit over UMA to justify
  the added complexity and maintenance overhead

- remove separate logic for managing multiple transmit queues, with the
  new drbr routines the control flow can be made to much more closely resemble
  legacy drivers

- remove dedicated service threads, with per-cpu callouts one can get the same
  benefit much more simply by registering a callout 1 tick in the future if there
  are still buffered packets

- remove embedded mbuf usage - Jeffr's changes will (I hope) soon be integrated
  greatly reducing the overhead of using kernel APIs for reference counting
  clusters

- add hysteresis to descriptor coalescing logic

- add coalesce threshold sysctls to allow users to decide at run-time
  between optimizing for forwarding / UDP or optimizing for TCP

- add once per second watchdog to effectively close the very rare races
  occurring from coalescing

- incorporate Navdeep's changes to the initialization path required to
  convert port and adapter locks back to ordinary mutexes (silencing BPF
  LOR complaints)

- enable prefetches in get_packet and tx cleaning

Reviewed by:	navdeep@
MFC after:	2 weeks
2009-06-19 23:34:32 +00:00
Sam Leffler
d659538f72 r193336 moved ifq_detach to if_free which broke if_alloc followed
by if_free (w/o doing if_attach); move ifq_attach to if_alloc and
rename ifq_attach/detach to ifq_init/ifq_delete to better identify
their purpose

Reviewed by:	jhb, kmacy
2009-06-15 19:50:03 +00:00
George V. Neville-Neil
02c7d9a64f Re-add the send queue tunable for people who do not use buffering.
Reviewed by:	jhb
MFC after:	3 days
2009-06-11 21:32:26 +00:00
George V. Neville-Neil
74b0900f05 Add a missing error statistic, the number of FCS errors on receive.
Reviewed by:	jhb
MFC after:	1 day
2009-06-10 14:34:56 +00:00
Kip Macy
a913be0917 - add drbr routines for accessing #qentries and conditionally dequeueing
- track bytes enqueued in buf_ring
2009-06-09 19:19:16 +00:00
Bjoern A. Zeeb
8d8bc0182e After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
John Baldwin
74fb0ba732 Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
  locked.  It is not permissible to call soisconnected() with this lock
  held; however, so socket upcalls now return an integer value.  The two
  possible values are SU_OK and SU_ISCONNECTED.  If an upcall returns
  SU_ISCONNECTED, then the soisconnected() will be invoked on the
  socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls.  The
  API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
  the receive socket buffer automatically.  Note that a SO_SND upcall
  should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
  instead of calling soisconnected() directly.  They also no longer need
  to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
  state machine, but other accept filters no longer have any explicit
  knowlege of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
  while invoking soreceive() as a temporary band-aid.  The plan for
  the future is to add a new flag to allow soreceive() to be called with
  the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
  buffer locked.  Previously sowakeup() would drop the socket buffer
  lock only to call aio_swake() which immediately re-acquired the socket
  buffer lock for the duration of the function call.

Discussed with:	rwatson, rmacklem
2009-06-01 21:17:03 +00:00
Marko Zec
b2bc853659 Update VNET base pointer setting macro to use a correct source of
vnet context.

Approved by:	julian (mentor)
2009-06-01 21:10:23 +00:00
George V. Neville-Neil
e3503bc98d Rework interrupt bringup and teardown.
Calculate the exact number of vectors we'll use before calling
pci_alloc_msix.  Don't grab nine all the time.

Call cxgb_setup_interrupts once per T3, not once per port.  Ditto
for cxgb_teardown_interrupts.

Don't leak resources when interrupt setup fails in the middle.

Obtained from:	Navdeep Parhar
MFC after:	10 days
2009-05-27 20:13:36 +00:00
George V. Neville-Neil
7529967798 Partial reversion of previous commit. The CXGB_SHUTDOWN flag does NOT
need to be inverted when doing an ifconfig down of an interface.

Pointed out by:	Navdeep Parhar
MFC after: 1 week
2009-05-22 18:26:47 +00:00
George V. Neville-Neil
c2009a4c90 Fix a possible panic cxgb_controller_attach() routine that would occur
only if prepping the adapter failed.

Slight adjustment to comments.

Fix a bug whereby downing the interface didn't preven it from
processing packets.

Submitted by:	Navdeep Parhar
MFC after:	1 week
2009-05-22 15:06:03 +00:00
George V. Neville-Neil
0bbdea7776 Integrate three changes from Chelsio.
1) Add a sysctl that will say what type of PHYs exist on the card.
2) Fix a bug that occurs when an AEL 2005 PHY resets without a transciever
in the card.
3) Unify the PHY link detection code.

Obtained from:	Navdeep Parhar
MFC after:	10 days
2009-05-21 15:08:03 +00:00
George V. Neville-Neil
3cf138bb79 Modified the attach and detach routines to handle bringing ports up
and down more cleanly.  This addresses a problem where if we have the
link flap during boot the driver would lock up the system.

Reviewed by:	jhb
MFC after:	1 week
2009-05-21 14:43:12 +00:00
Warner Losh
00b4e54ae7 We no longer need to use d_thread_t, migrate to struct thread *. 2009-05-20 17:29:21 +00:00
Kip Macy
24df53962e fix bug introduced by last change
Submitted by:	Navdeep Parhar
2009-05-12 03:30:25 +00:00
Marko Zec
21ca7b57bd Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
Kip Macy
bf541f371e simplify by removing dead code 2009-04-27 22:54:30 +00:00
Robert Watson
78b5071407 Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and
TCPSTAT_INC(), rather than directly manipulating the fields across the
kernel.  This will make it easier to change the implementation of
these statistics, such as using per-CPU versions of the data structures.

MFC after:	3 days
2009-04-11 22:07:19 +00:00
Kip Macy
80cb9f211a Import "flowid" support for serializing flows across transmit queues
Reviewed by:	rwatson and jeli
2009-04-10 06:16:14 +00:00
George V. Neville-Neil
0c1ff9c605 Minor updates to the Chelsio driver, including removing an LOR.
Submitted by:	Navdeep Parhar at Chelsio
Reviewed by:	gnn
MFC after: 	3 weeks
2009-03-23 19:58:26 +00:00
George V. Neville-Neil
6b6e256ae1 Fix a bug in the recent update to the Chelsio driver.
The tick routine was not being restarted in the init_locked routine
which could resulted in loss of carrier when updating the MTU.

Submitted by:	Navdeep Parhar at Chelsio
MFC after:	3 weeks
2009-03-21 17:09:00 +00:00
Robert Watson
a6c19108c5 Prefer ENETDOWN to ENXIO when returning queuing errors due to a link
down, interface down, etc, with if_cxgb's if_transmit routine.

MFC after:	3 days
Reviewed by:	kmacy
2009-03-10 22:35:45 +00:00
George V. Neville-Neil
f2d8ff04fe Update the Chelsio driver to the latest bits from Chelsio
Firmware upgraded to 7.1.0 (from 5.0.0).
T3C EEPROM and SRAM added; Code to update eeprom/sram fixed.
fl_empty and rx_fifo_ovfl counters can be observed via sysctl.
Two new cxgbtool commands to get uP logic analyzer info and uP IOQs
Synced up with Chelsio's "common code" (as of 03/03/09)

Submitted by:	 Navdeep Parhar at Chelsio
Reviewed by:	gnn
MFC after:	2 weeks
2009-03-10 19:22:45 +00:00
Bjoern A. Zeeb
33553d6e99 For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
George V. Neville-Neil
837f41b067 Check in the actual module recognition code for the Chelsio
driver.

Obtained from:	Chelsio Inc.
2008-12-18 14:21:35 +00:00
Bjoern A. Zeeb
dcdb4371ca Use inc_flags instead of the inc_isipv6 alias which so far
had been the only flag with random usage patterns.
Switch inc_flags to be used as a real bit field by using
INC_ISIPV6 with bitops to check for the 'isipv6' condition.

While here fix a place or two where in case of v4 inc_flags
were not properly initialized before.[1]

Found by:	rwatson during review [1]
Discussed with:	rwatson
Reviewed by:	rwatson
MFC after:	4 weeks
2008-12-17 12:52:34 +00:00
Qing Li
6e6b3f7cbc This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle, Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
  provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
  me maintaining that branch before the svn conversion
2008-12-15 06:10:57 +00:00
George V. Neville-Neil
7f15419bb7 Bug fix to support N310 version of Chelsio cards (board ID 1088).
Obtained from:	Chelsio Inc.
MFC after:	3 days
2008-12-06 02:10:53 +00:00
George V. Neville-Neil
5197f3abd7 Re submit code to print the part and serial number for Chelsio cards.
The original code was accidentally removed in another commit.

MFC after: 1 day
2008-12-05 21:40:11 +00:00
George V. Neville-Neil
9036240993 Fix a bug with the ael1006 PHY. The bug shows up as persistent but incomplete
packet loss, of between 10-30%. The fix is to put the PHY into
and take it out of local loopback mode when resetting the interface.

Obtained from:	Chelsio Inc.
MFC after:	3 days
2008-12-04 20:32:53 +00:00
Bjoern A. Zeeb
4b79449e2f Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
George V. Neville-Neil
c9c0c99f30 Bug fix from Chelsio which addresses the issue of the device resetting
when it sees only received packets.  In some cases where a device only
recieves data it mistakenly thinks that its transmitting side is broken
and resets the device.

Obtained from:	Chelsio Inc.
MFC after:	3 days
2008-12-02 15:42:47 +00:00
Kip Macy
f3c713fdaf - fix bug where dnsperf would stop transmitting after a few seconds
- break complex conditionals in to multiple lines to avoid wrapping
- remove copious unused debug statements
- be more aggressive about cleaning in the calling thread
- eliminate usage of ENOSPC
- increase number of iterations that cxgbsp can do
- eliminate "initerr" usage to simplify ENOBUFS handling
- when coalescing pass all packets to BPF
- always set overrun if hardware queue is full
2008-12-02 07:01:18 +00:00
Kip Macy
e44730ed21 The pkthdr field is flowid not rss_hash 2008-12-02 00:51:56 +00:00
Kip Macy
6b9003b6e6 - fix multiqueue conditional
- don't leak mbuf tags in the non-conditional case

Found by: Navdeep Parhar
2008-12-02 00:48:08 +00:00
Kip Macy
6b58e3f4db integrate use after free fixes from private branch
Found by: kkenn@
2008-12-02 00:39:50 +00:00
Kip Macy
69c4b66c6d null out m_next when marshalling a packet 2008-12-01 05:44:08 +00:00
Kip Macy
f35c2d6551 Update internal mac stats every time the tick task is called
if we don't do this "netstat -w 1" will frequently see negative
differences in packets sent
2008-12-01 05:43:30 +00:00
Kip Macy
45839f4aeb don't manually track statistics 2008-12-01 04:42:39 +00:00
Kip Macy
ceac50eb77 Proper fix for tracking ifnet statistics 2008-12-01 04:41:45 +00:00
Kip Macy
5eba27fe0c Add backward compatibility ifdefs for non-multiq kernels 2008-11-23 07:30:07 +00:00
Kip Macy
5d96be4eb8 work around periodic leak on queue overrun by enabling coalescing of packets in to
work requests by default
2008-11-23 00:22:52 +00:00
Kip Macy
098fb9b469 intr_machdep.h breaks build on some arches and is not needed 2008-11-23 00:13:25 +00:00
Kip Macy
a02573bc06 - enable multiple transmit queues
- invert sense of hw.cxgb.singleq tunable to hw.cxgb.multiq
- don't wake up transmitting thread by default
- add per tx queue ifaltq to handle ALTQ
- remove several unused functions in cxgb_multiq.c
- add several sysctls: multiq_tx_enable, coalesce_tx_enable,
  and wakeup_tx_thread
- this obsoletes the hw.cxgb.snd_queue_len as ifq is replaced
  by a buf_ring
2008-11-22 08:05:05 +00:00
Kip Macy
db7f0b974f - bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
  allow drivers to efficiently manage multiple hardware queues
  (i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.
2008-11-22 05:55:56 +00:00
George V. Neville-Neil
2a1b9f07fc Several small additions to the Chelsio 10G driver.
1) Fix a bug in dealing with the Alerus 1006 PHY which prevented the
device from ever coming back up once it had been set to down.

2) Add a kernel tunable (hw.cxgb.snd_queue_len) which makes it possible
to give the device more than IFQ_MAXLEN entries in its send queue.  The
default remains 50.

3) Add code to place the card'd identification and serial number into
its description (%desc) so that users can tell which card they have
installed.
2008-11-21 19:22:25 +00:00
Marko Zec
44e33a0758 Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions.  As a rule,
initialization at instatiation for such variables should never be
introduced again from now on.  Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact.  In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-19 09:39:34 +00:00
Kip Macy
d1340303b9 Update firmware version check
make ddp a tunable

Obtained from:	Chelsio Inc.
MFC after:	3 days
2008-11-12 04:45:09 +00:00
Bjoern A. Zeeb
ca6bc428e7 For now our LRO code (tcp_lro.c) only supports IPv4 properly thus
only enable if INET is on.

Reviewed by:	kmacy
MFC after:	2 months
2008-11-06 10:35:46 +00:00
Bjoern A. Zeeb
34627f9384 Hide AF_INET specific ioctl handling under #ifdef INET.
Reviewed by:	kmacy
MFC after:	2 months
2008-11-06 10:17:57 +00:00