branches:
Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.
This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.
Approved by: re(scottl)
o use if_input for input packet processing
o don't strip the Ethernet header for input packets
o use BPF_* macros bpf tapping
o call ether_ioctl to handle default ioctl case
o track vlan changes
Reviewed by: many
Approved by: re
MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes.
ti.4: Update the ti(4) man page to include information on the
TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
and also include information about the new character
device interface and the associated ioctls.
man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated
links.
jumbo.9: New man page describing the jumbo buffer allocator
interface and operation.
zero_copy.9: New man page describing the general characteristics of
the zero copy send and receive code, and what an
application author should do to take advantage of the
zero copy functionality.
NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.
conf/files: Add uipc_jumbo.c and uipc_cow.c.
conf/options: Add the 5 options mentioned above.
kern_subr.c: Receive side zero copy implementation. This takes
"disposable" pages attached to an mbuf, gives them to
a user process, and then recycles the user's page.
This is only active when ZERO_COPY_SOCKETS is turned on
and the kern.ipc.zero_copy.receive sysctl variable is
set to 1.
uipc_cow.c: Send side zero copy functions. Takes a page written
by the user and maps it copy on write and assigns it
kernel virtual address space. Removes copy on write
mapping once the buffer has been freed by the network
stack.
uipc_jumbo.c: Jumbo disposable page allocator code. This allocates
(optionally) disposable pages for network drivers that
want to give the user the option of doing zero copy
receive.
uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are
enabled if ZERO_COPY_SOCKETS is turned on.
Add zero copy send support to sosend() -- pages get
mapped into the kernel instead of getting copied if
they meet size and alignment restrictions.
uipc_syscalls.c:Un-staticize some of the sf* functions so that they
can be used elsewhere. (uipc_cow.c)
if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
calling malloc() with M_WAITOK. Return an error if
the M_NOWAIT malloc fails.
The ti(4) driver and the wi(4) driver, at least, call
this with a mutex held. This causes witness warnings
for 'ifconfig -a' with a wi(4) or ti(4) board in the
system. (I've only verified for ti(4)).
ip_output.c: Fragment large datagrams so that each segment contains
a multiple of PAGE_SIZE amount of data plus headers.
This allows the receiver to potentially do page
flipping on receives.
if_ti.c: Add zero copy receive support to the ti(4) driver. If
TI_PRIVATE_JUMBOS is not defined, it now uses the
jumbo(9) buffer allocator for jumbo receive buffers.
Add a new character device interface for the ti(4)
driver for the new debugging interface. This allows
(a patched version of) gdb to talk to the Tigon board
and debug the firmware. There are also a few additional
debugging ioctls available through this interface.
Add header splitting support to the ti(4) driver.
Tweak some of the default interrupt coalescing
parameters to more useful defaults.
Add hooks for supporting transmit flow control, but
leave it turned off with a comment describing why it
is turned off.
if_tireg.h: Change the firmware rev to 12.4.11, since we're really
at 12.4.11 plus fixes from 12.4.13.
Add defines needed for debugging.
Remove the ti_stats structure, it is now defined in
sys/tiio.h.
ti_fw.h: 12.4.11 firmware.
ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13,
and my header splitting patches. Revision 12.4.13
doesn't handle 10/100 negotiation properly. (This
firmware is the same as what was in the tree previously,
with the addition of header splitting support.)
sys/jumbo.h: Jumbo buffer allocator interface.
sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to
indicate that the payload buffer can be thrown away /
flipped to a userland process.
socketvar.h: Add prototype for socow_setup.
tiio.h: ioctl interface to the character portion of the ti(4)
driver, plus associated structure/type definitions.
uio.h: Change prototype for uiomoveco() so that we'll know
whether the source page is disposable.
ufs_readwrite.c:Update for new prototype of uiomoveco().
vm_fault.c: In vm_fault(), check to see whether we need to do a page
based copy on write fault.
vm_object.c: Add a new function, vm_object_allocate_wait(). This
does the same thing that vm_object allocate does, except
that it gives the caller the opportunity to specify whether
it should wait on the uma_zalloc() of the object structre.
This allows vm objects to be allocated while holding a
mutex. (Without generating WITNESS warnings.)
vm_object_allocate() is implemented as a call to
vm_object_allocate_wait() with the malloc flag set to
M_WAITOK.
vm_object.h: Add prototype for vm_object_allocate_wait().
vm_page.c: Add page-based copy on write setup, clear and fault
routines.
vm_page.h: Add page based COW function prototypes and variable in
the vm_page structure.
Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.
Tested on: i386, alpha, sparc64
The constant I was using was correct, but I mislabeled it as 256K when
it should have been 512K. This doesn't actually change the code, but
it clarifies things somewhat.
Submitted by: Chuck Cranor <chuck@research.att.com>
- Use pci_get_powerstate()/pci_set_powerstate() in all the other drivers
that need them so we don't have to fiddle with the PCI power management
registers directly.
- Use pci_enable_busmaster()/pci_enable_io() to turn on busmastering and
PIO/memory mapped accesses.
- Add support to the RealTek driver for the D-Link DFE-530TX+ which has
a RealTek 8139 with its own PCI ID. (Submitted by Jason Wright)
- Have the SiS 900/National DP83815 driver be sure to disable PME
mode in sis_reset(). This apparently fixes a problem on some
motherboards where the DP83815 chip fails to receive packets.
(Submitted by Chuck McCrobie <mccrobie@cablespeed.com>)
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.
Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
All calls to mtx_init() for mutexes that recurse must now include
the MTX_RECURSE bit in the flag argument variable. This change is in
preparation for an upcoming (further) mutex API cleanup.
The witness code will call panic() if a lock is found to recurse but
the MTX_RECURSE bit was not set during the lock's initialization.
The old MTX_RECURSE "state" bit (in mtx_lock) has been renamed to
MTX_RECURSED, which is more appropriate given its meaning.
The following locks have been made "recursive," thus far:
eventhandler, Giant, callout, sched_lock, possibly some others declared
in the architecture-specific code, all of the network card driver locks
in pci/, as well as some other locks in dev/ stuff that I've found to
be recursive.
Reviewed by: jhb
Have if_ti stop "hiding" the softc pointer in the buffer region. Rather,
use the available void * passed to the free routine and pass the softc
pointer through there.
To note: in MEXTADD(), TI_JUMBO_FRAMELEN should probably be TI_JLEN. I left it
unchanged, because this way I'm sure to not damage anything in this respect...
takes care of all the 10/100 and gigE PCI drivers that I've done.
Next will be the wireless drivers, then the USB ones. I may pick up
some stragglers along the way. I'm sort of playing this by ear: if
anyone spots any places where I've screwed up horribly, please let me
know.
the drivers.
* Remove legacy inx/outx support from chipset and replace with macros
which call busspace.
* Rework pci config accesses to route through the pcib device instead of
calling a MD function directly.
With these changes it is possible to cleanly support machines which have
more than one independantly numbered PCI busses. As a bonus, the new
busspace implementation should be measurably faster than the old one.
that should be better.
The old code counted references to mbuf clusters by using the offset
of the cluster from the start of memory allocated for mbufs and
clusters as an index into an array of chars, which did the reference
counting. If the external storage was not a cluster then reference
counting had to be done by the code using that external storage.
NetBSD's system of linked lists of mbufs was cosidered, but Alfred
felt it would have locking issues when the kernel was made more
SMP friendly.
The system implimented uses a pool of unions to track external
storage. The union contains an int for counting the references and
a pointer for forming a free list. The reference counts are
incremented and decremented atomically and so should be SMP friendly.
This system can track reference counts for any sort of external
storage.
Access to the reference counting stuff is now through macros defined
in mbuf.h, so it should be easier to make changes to the system in
the future.
The possibility of storing the reference count in one of the
referencing mbufs was considered, but was rejected 'cos it would
often leave extra mbufs allocated. Storing the reference count in
the cluster was also considered, but because the external storage
may not be a cluster this isn't an option.
The size of the pool of reference counters is available in the
stats provided by "netstat -m".
PR: 19866
Submitted by: Bosko Milekic <bmilekic@dsuper.net>
Reviewed by: alfred (glanced at by others on -net)
the 12.4.11 firmware with a few changes to the link handling code merged
in from the 12.4.13 release. I'm doing this because the 12.4.13 firmware
doesn't seem to handle 10/100 link settings properly on 1000baseT cards.
Note that the revision codes still identify the firmware as 12.4.13
because both ti_fw2.h and ti_fw.h have to have the same revision values,
and I wanted to keep the 12.4.13 firmware for Tigon 1 cards.
It's nice to have firmware source.
cards. This basically involves switching to the 12.4.13 firmware, plus
a couple of minor tweaks to the driver.
Also changed the jumbo buffer allocation scheme just a little to avoid
'failed to allocate jumbo buffer' conditions in certain cases.
ether_ifdetach().
The former consolidates the operations of if_attach(), ng_ether_attach(),
and bpfattach(). The latter consolidates the corresponding detach operations.
Reviewed by: julian, freebsd-net
of the individual drivers and into the common routine ether_input().
Also, remove the (incomplete) hack for matching ethernet headers
in the ip_fw code.
The good news: net result of 1016 lines removed, and this should make
bridging now work with *all* Ethernet drivers.
The bad news: it's nearly impossible to test every driver, especially
for bridging, and I was unable to get much testing help on the mailing
lists.
Reviewed by: freebsd-net
there are stubs compiled into the kernel if BPF support is not enabled,
there aren't any problems with unresolved symbols. The modules in /modules
are compiled with BPF support enabled anyway, so the most this will do is
bloat GENERIC a little.
declaration for the interface driver from "foo" to "if_foo" but leave the
declaration for the miibus attached to the interface driver alone. This
lets the internal module name be "if_foo" while still allowing the miibus
instances to attach to "foo."
This should allow ifconfig to autoload driver modules again without
breaking the miibus attach.
This whole idea isn't going to work until somebody makes the bus/kld
code smarter. The idea here is to change the module's internal name
from "foo" to "if_foo" so that ifconfig can tell a network driver from
a non-network one. However doing this doesn't work correctly no matter
how you slice it. For everything to work, you have to change the name
in both the driver_t struct and the DRIVER_MODULE() declaration. The
problems are:
- If you change the name in both places, then the kernel thinks that
the device's name is now "if_foo", so you get things like:
if_foo0: <FOO ethernet> irq foo at device foo on pcifoo
if_foo0: Ethernet address: foo:foo:foo:foo:foo:foo
This is bogus. Now the device name doesn't agree with the logical
interface name. There's no reason for this, and it violates the
principle of least astonishment.
- If you leave the name in the driver_t struct as "foo" and only
change the names in the DRIVER_MODULE() declaration to "if_foo" then
attaching drivers to child devices doesn't work because the names don't
agree. This breaks miibus: drivers that need to have miibuses and PHY
drivers attached never get them.
In other words: damned if you do, damned if you don't.
This needs to be thought through some more. Since the drivers that
use miibus are broken, I have to change these all back in order to
make them work again. Yes this will stop ifconfig from being able
to demand load driver modules. On the whole, I'd rather have that
than having the drivers not work at all.
length for mini receive ring. The max length was MHLEN, however the mbufs
are actually shortened to MHLEN - ETHER_ALIGN to force payload alignment.
PR: 13793
This fixes, at least, panics in ncr_attach() on i386's with about 5MB
of memory. The restriction was a hack to leave some low memory for ISA
DMA, but on i386's we now allocate pages from the top down, so all the
restriction did was cause our allocations to fail when there is no free
memory above 1MB.
a PCI memory mapped region, rman_get_bushandle() returns what happens
to be a kernel virtual address pointing to the base of the PCI shared
memory window. However this is not the behavior on all platforms:
the only thing you should do with the bushandle is pass it to the
bus_spare_read()/bus_space_write() routines. If you actually do want
the kernel virtual address of the base of the PCI memory window, you
need to use rman_get_virtual().
The problem is that at the moment, rman_get_virtual() returns a physical
address, which is bad. In order to get the kernel virtual address we
need, we have to play with it a little.
Presumeably this behavior will be changed, but in the meantime the
Tigon driver won't work. So for the moment, I'm adding a kludge to
make things happy on the alpha: the correct kernel virtual address
is calculated from the value returned by rman_get_virtual(). This
should be removed once rman_get_virtual() starts doing the right
thing.
This should make the Tigon actuall work on the alpha now.
critical mbuf fields to sane values. Simplify the use of ETHER_ALIGN to
enforce payload alignment, and turn it on on the x86 as well as alpha
since it helps with NFS which wants the payload to be longword aligned
even though the hardware doesn't require it.
This fixes a problem with the ti driver causing an unaligned access trap
on the Alpha due to m_adj() sometimes not setting the alignment correctly
because of incomplete mbuf initialization.
in ti_rxeof() instead. This doesn't really seem to provide much in the
way of a performance boost, and I'm pretty sure it can cause mbuf leakage
in some extreme cases.
positively not let ti_encap() fill up the TX ring all the way and wrap
around. This fixes a potential transmit lockup where a really fast
machine (or particular TX traffic pattern) can overrun the end of the
ring.
Reported by: John Plevyak <jplevyak@inktomi.com>
preparation for tsunami support. Previous chipsets' direct-mapped DMA
mask was always 1024*1024*1024. The Tsunami chipset needs it to be
2*1024*1024*1024
These changes should not affect the i386 port
Reviewed by: Doug Rabson <dfr@nlsystems.com>
in the transmit code: the TX descriptor ring, and a 'shadow' ring of mbuf
pointers, one for each TX descriptor. When transmitting a packet that
consists of several fragments in an mbuf chain, we link each fragment
to a descriptor in the TX ring, but we only save a pointer to the mbuf
chain. This pointer is saved in the shadow ring entry which corresponds
to the first fragment in the packet. Later, ti_txeof() can release the
whole chain with a single m_freem() call. (We need the second ring to
keep track of the virtual addresses of the mbuf chains.)
The problem with this is that the Tigon isn't actually through with the
mbuf chain until it reaches the last fragment (which has the TI_BDFLAG_END
bit set), however the current scheme releases the mbuf chain as soon as
the first fragment is consumed. This is wrong, since the mbufs can then
be yanked out from under the Tigon and modified before the other fragments
can be transmitted.
The fix is to make a one line change to ti_encap() so that it saves the
mbuf chain pointer in the shadow ring entry that corresponds to the last
fragment in TX ring instead of the first. This prevents the mbufs from
being released until the last fragment is transmitted.
Painstakingly diagnosed and fixed by: Robert Picco <picco@mail.wevinc.com>
Brought to my attention by: dg
#define COMPAT_PCI_DRIVER(name,data) DATA_SET(pcidevice_set,data)
.. to 2.2.x and 3.x if people think it's worth it. Driver writers can do
this if it's not defined. (The reason for this is that I'm trying to
progressively eliminate use of linker_sets where it hurts modularity and
runtime load capability, and these DATA_SET's keep getting in the way.)
bug in the stats accounting (nicSendBDs counter was bogus when TX ring was
configured to be in host memory).
Update if_tireg.h to look for new firmware fix level.
from ever catching up to the transmit consumer index. We can't let this
happen because ti_txeof() depends on the assumption that producer == consumer
means the ring is empty, and producer != consumer means the ring has some
number of active descriptors in it.
Oh, I forgot to mention: this driver also works on FreeBSD/alpha (big
thanks to Andrew Gallatin). And there is a 2.2.x version available for
those who stubbornly refuse to upgrade.
Networks Tigon 1 and Tigon 2 chipsets. There are a _lot_ of OEM'ed
gigabit ethernet adapters out there which use the Alteon chipset so
this driver covers a fair amount of hardware. I know that it works with
the Alteon AceNIC, 3Com 3c985 and Netgear GA620, however it should also
work with the DEC/Compaq EtherWORKS 1000, Silicon Graphics Gigabit
ethernet board, NEC Gigabit Ethernet board and maybe even the IBM and
and Sun boards. The Netgear board is the cheapest (~$350US) but still
yields fairly good performance.
Support is provided for jumbo frames with all adapters (just set the
MTU to something larger than 1500 bytes), as well as hardware multicast
filtering and vlan tagging (in conjunction with the vlan support in
-current, which I should merge into -stable soon). There are some hooks
for checksum offload support, but they're turned off for now since
FreeBSD doesn't have an officially sanctioned way to support checksum
offloading (yet).
I have not added the 'device ti0' entry to GENERIC since the driver
with all the firmware compiled in is quite large, and it doesn't really
fit into the category of generic hardware.