- Don't use a single big DMA block for all rings. Create separate
DMA area for each ring instead. Currently the following DMA
areas are created:
Event ring, standard RX ring, jumbo RX ring, RX return ring,
hardware MAC statistics and producer/consumer status area.
For Tigon II, mini RX ring and TX ring are additionally created.
- Added missing bus_dmamap_sync(9) in various TX/RX paths.
- TX ring is no longer created for Tigon 1 such that it saves more
resources on Tigon 1.
- Data sheet is not clear about alignment requirement of each ring
so use 32 bytes alignment for normal DMA area but use 64 bytes
alignment for jumbo RX ring where the extended RX descriptor
size is 64 bytes.
- For each TX/RX buffers use separate DMA tag(e.g. the size of a
DMA segment, total size of DMA segments etc).
- Tigon allows separate DMA area for event producer, RX return
producer and TX consumer which is really cool feature. This
means TX and RX path could be independently run in parallel.
However ti(4) uses a single driver lock so it's meaningless
to have separate DMA area for these producer/consumer such that
this change creates a single status DMA area.
- It seems Tigon has no limits on DMA address space and I also
don't see any problem with that but old comments in driver
indicates there could be issues on descriptors being located in
64bit region. Introduce a tunable, dev.ti.%d.dac, to disable
using 64bit DMA in driver. The default is 0 which means it would
use full 64bit DMA. If there are DMA issues, users can disable
it by setting the tunable to 0.
- Do not increase watchdog timer in ti_txeof(). Previously driver
increased the watchdog timer whenever there are queued TX frames.
- When stat ticks is set to 0, skip processing ti_stats_update(),
avoiding bus_dmamap_sync(9) and updating if_collisions counter.
- MTU does not include FCS bytes, replace it with
ETHER_VLAN_ENCAP_LEN.
With these changes, ti(4) should work on PAE environments.
Many thanks to Jay Borkenhagen for remote hardware access.
have administrators control them. ti(4) provides a character
device to control various other features of driver via ioctls but
users had to write their own code to manipulate these parameters.
It seems some default values for these parameters are not optimal
on today's system but leave it as it was and let administrators
change them. The following parameters could be changed:
dev.ti.%d.rx_coal_ticks
dev.ti.%d.rx_max_coal_bds
dev.ti.%d.tx_coal_ticks
dev.ti.%d.tx_max_coal_bds
dev.ti.%d.tx_buf_ratio
dev.ti.%d.stat_ticks
The interface has to be brought down and up again before a change
takes effect.
ti(4) controller supports hardware MAC counters with additional
DMA statistics. So it's doable to export these counters via
sysctl interface. Unfortunately, these counters are cumulative
such that driver have to either send an explicit clear command to
controller after extracting them or have to maintain internal
counters to get actual changes. Neither look good to me so
counters were not exported via sysctl.
Pre-allocate the memory in device attach time. While I'm here
remove unnecessary reassignment of error variable as it was already
initialized. Also added a missing driver lock in TIIOCSETTRACE
handler.
allocator with UMA backed jumbo allocator by default. Previously
ti(4) used sf_buf(9) interface for jumbo buffers but it was broken
at this moment such that enabling jumbo frame caused instant panic.
Due to the nature of sf_buf(9) it heavily relies on VM changes but
it seems ti(4) was not received much blessing from VM gurus. I
don't understand VM magic and implications used in driver either.
Switching to UMA backed jumbo allocator like other network drivers
will make jumbo frame work on ti(4).
While I'm here, fully allocate all RX buffers. This means ti(4) now
uses 512 RX buffer and 1024 mini RX buffers.
To use sf_buf(9) interface for jumbo buffers, introduce a new
'options TI_SF_BUF_JUMBO'. If it is proven that sf_buf(9) is better
for jumbo buffers, interesting developers can fix the issue in
future.
ti(4) still needs more bus_dma(9) cleanups and should use separate
DMA tag/map for each ring(standard, jumbo, mini, command, event
etc) but it should work on all platforms except PAE.
Special thanks to Jay[1] who provided complete remote debugging
access.
Tested by: Jay Borkenhagen <jayb <> braeburn dot org > [1]
o Do not blindly UP controller when MTU is changed. Reinitialize
controller only if driver is running.
o Remove useless ti_stop() in ti_watchdog() since ti_init_locked()
always invokes ti_stop().
checksum offloading and VLAN hardware tag insertion/stripping from
the currently enabled hardware offloading capabilities.
Previously if_hwassist, which was initialized to TX/RX checksum
offloading, was blindly used to enable both TX and RX checksum
offloading such that disabling either TX or RX checksum offloading
was not possible.
ti(4) controllers support TX/RX checksum offloading with VLAN
tagging so announce TX/RX checksum offloading capability over VLAN
to vlan(4).
Make VLAN hardware tag insertion/stripping honors currently enabled
interface capability instead of blindly enabling VLAN hardware
tagging. This change allows disabling hardware support of VLAN tag.
Because ti(4) supports VLAN oversized frames, make network stack
know the capability by setting if_hdrlen.
While I'm here, rewrite SIOCSIFCAP handler and make sure to
reinitialize controller whenever TX/RX checksum offloading and VLAN
hardware tagging option is changed. The requirement of controller
reinitialization comes from the limitation of Tigon I/II firmware.
Tigon I/II firmware requires all related RCBs should be
reinitialized whenever any of its hardware offloading capabilities
change.
vlan(4) is also notified whenever the parent interface's capability
changes such that it can correctly handle TX/RX checksum offloading
based on parent interface's enabled offloading capabilities.
RX checksum offloading handler was changed to make upper stack use
controller computed partial checksum value. Previously, ti(4) just
set the computed value for any frames(IPv4, IPv6) and the value was
not used in upper stack because driver didn't set CSUM_DATA_VALID
such that upper network stack had to recompute checksum of TCP/UDP
packets. I have no idea how this was not noticed for a long time.
With this change, upper network stack does not have to fully
recompute the checksum such that calculating pseudo checksum based
on partial checksum is sufficient to know whether received packet's
checksum is correct or not. However, I don't know why ti(4) does
not have controller compute pseudo checksum as controller has
ability to do it. I'm just guessing enabling that feature could
trigger a firmware bug or could be slower than computing it on host
side so just leave it as it was.
In order not to produce false positives, ti(4) now checks whether
controller actually computed IP or TCP/UDP checksum by checking
ti_flags field.
state changes. Hide superfluous link up/down message under
bootverbose since if_link_state_change(9) shows that information.
While I'm here, change baudrate with the resolved speed of the
established link instead of blindly setting it 1G. Unfortunately,
it seems there is no way to differentiate 10/100Mbps from
non-gigabit link so just assume we established a 100Mbps link if
current link is not a gigabit link.
This was broken in r175872.
We have a UMA backed jumbo allocator and that is much better
implementation than having a local jumbo buffer allocator in
driver. This local allocator would be removed in near future but
fixing build before removal wouldn't be a bad idea.
if_watchdog and if_timer.
- Fix some issues in detach for sn(4), ste(4), and ti(4). Primarily this
means calling ether_ifdetach() before anything else.
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.
For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.
Approved by: re (kib)
MFC after: 6 weeks
free function controlable, instead of passing the KVA of the buffer
storage as the first argument.
Fix all conventional users of the API to pass the KVA of the buffer
as the first argument, to make this a no-op commit.
Likely break the only non-convetional user of the API, after informing
the relevant committer.
Update the mbuf(9) manual page, which was already out of sync on
this point.
Bump __FreeBSD_version to 800016 as there is no way to tell how
many arguments a CPP macro needs any other way.
This paves the way for giving sendfile(9) a way to wait for the
passed storage to have been accessed before returning.
This does not affect the memory layout or size of mbufs.
Parental oversight by: sam and rwatson.
No MFC is anticipated.
If these drivers are setting M_VLANTAG because they are stripping the
layer 2 802.1Q headers, then they need to be re-inserting them so any
bpf(4) peers can properly decode them.
It should be noted that this is compiled tested only.
MFC after: 3 weeks
sparc64 GENERIC and the sound device drivers known working on sparc64
to use bus_get_dma_tag() to obtain the parent DMA tag so we can get rid
of the sparc64_root_dma_tag kludge eventually. Except for ath(4), sk(4),
stge(4) and ti(4) these changes are runtime tested (unless I booted up
the wrong kernels again...).
m_pkthdr.ether_vlan. The presence of the M_VLANTAG flag on the mbuf
signifies the presence and validity of its content.
Drivers that support hardware VLAN tag stripping fill in the received
VLAN tag (containing both vlan and priority information) into the
ether_vtag mbuf packet header field:
m->m_pkthdr.ether_vtag = vlan_id; /* ntohs()? */
m->m_flags |= M_VLANTAG;
to mark the packet m with the specified VLAN tag.
On output the driver should check the mbuf for the M_VLANTAG flag to
see if a VLAN tag is present and valid:
if (m->m_flags & M_VLANTAG) {
... = m->m_pkthdr.ether_vtag; /* htons()? */
... pass tag to hardware ...
}
VLAN tags are stored in host byte order. Byte swapping may be necessary.
(Note: This driver conversion was mechanic and did not add or remove any
byte swapping in the drivers.)
Remove zone_mtag_vlan UMA zone and MTAG_VLAN definition. No more tag
memory allocation have to be done.
Reviewed by: thompsa, yar
Sponsored by: TCP/IP Optimization Fundraise 2005
if_watchdog, etc., or in functions used only in these methods.
In all other functions in the driver use device_printf().
- Use __func__ instead of typing function name.
Submitted by: Alex Lyashkov <umka sevcity.net>
Use proper pointer dereference to inform modified mbuf chains to
caller.
While I'm here perform checksum offload setup after loading DMA
maps.
In collaboration with: glebius
requiried to keep consistent softc state before/after callback function
invocation and supposed to be sligntly faster than previous one as it
wouldn't incur callback overhead. With this change callback function
was gone.
- Decrease TI_MAXTXSEGS to 32 from 128. It seems that most mbuf chain
length is less than 32 and it would be re-packed with m_defrag(9) if
its chain length is larger than TI_MAXTXSEGS. This would protect ti(4)
against possible kernel stack overflow when txsegs[] is put on stack.
Alternatively, we can embed the txsegs[] into softc. However, that
would waste memory and make Tx/Rx speration hard when we want to
sperate Tx/Rx handlers to optimize locking.
- Fix dma map tracking used in Tx path. Previously it used the dma map
of the last mbuf chain in ti_txeof() which was incorrect as ti(4)
used dma map of the first mbuf chain when it loads a mbuf chain with
bus_dmamap_load_mbuf(9). Correct the bug by introducing queues that
keep track of active/inactive dma maps/mbuf chain.
- Use ti_txcnt to check whether driver need to set watchdog timer instead
of blidnly clearing the timer in ti_txeof().
- Remove the 3rd arg. of ti_encap(). Since ti(4) now caches the last
descriptor index(ti_tx_saved_prodidx) used in Tx there is no need to
pass it as a fuction arg.
- Change data type of producer/consumer index to int from u_int16_t in
order to remove implicit type conversions in Tx/Rx handlers.
- Check interface queue before getting a mbuf chain to reduce locking
overhead.
- Check number of available Tx descriptores to be 16 or higher in
ti_start(). This wouldn't protect Tx descriptor shortage but it would
reduce number of bus_dmamap_unload(9) calls in ti_encap() when we are
about to running out of Tx descriptors.
- Command NIC to send packets ony when the driver really has packets
enqueued. Previously it always set TI_MB_SENDPROD_IDX which would
command NIC to DMA Tx descriptors into NIC local memory regardless
of Tx descriptor changes.
Reviewed by: scottl
. remove unnecessay header files after Scott's bus_dma(9) commit.
. remove global variable tis which was introduced at the time of
zero_copy(9) changes. The variable tis was not used at all. The
same applyes to ti_links in softc so axe it.
. deregister variables.
. axe ti_vhandle and switch to use explicit register access for
accessing NIC local memory. Creates three variants of ti_mem to
read/write NIC local memory(ti_mem_read, ti_mem_write) and clearing
NIC local memory(ti_mem_zero). This greatly enhances code
readability and have ti(4) drop using shared memory scheme for
Tigon 1. As Tigon 1 switched to use explicit register access for Tx,
axe ti_tx_ring_nic/ti_cmd_ring in softc.(Tigon 2 used to host ring
scheme which means there is no need to access NIC local memory via
register access for Tx and NIC would DMA the modified Tx rings into
its local memory.) [1]
. introduce new macro TI_EVENT_*/TI_CMD_* to handle NIC envent/command.
Instead of using bit fields assginment for accessing the event, use
shift operations to set/get it. [1]
. add additional check for valid DMA tags in ti_free_dmamaps().
. add missing bus_dmamap_sync/bus_dmamap_unload in ti_free_*_ring_*.
. fix locking nits(MTX_RECURSE mutex) and make ti(4) MPSAFE.
. change data type of ti_rdata_phys to bus_addr_t and don't blindly
cast to uint32_t.
. rearrange detach path and make ti(4) survive during device detach.
. for Tigon 1, use explicit register access for checking Tx descriptors
in ti_encap()/ti_txeof(). [1]
. properly call bus_dmamap_sync(9) for updating statistics.
. remove extra semicolon in ti_encap()
. rewrite loading MAC address to work on strict-alignment architectures.
. move TI_RD_OFF macro to if_tireg.h
. axe ETHER_ALIGN as it's already defined in <net/ethernet.h>.
. make macros immuine from expansion by adding parenthesis and do-while.
. remove alpha specific hack as vtophys(9) is no longer used in ti(4)
after Scott's bus_dma(9) fix.
Reviewed by: scottl
Obtained from: OpenBSD [1]
case if memory allocation failed.
- Remove fourth argument from VLAN_INPUT_TAG(), that was used
incorrectly in almost all drivers. Indicate failure with
mbuf value of NULL.
In collaboration with: yongari, ru, sam
to use busdma. Unlike most of the other drivers, but similar to the
if_em driver, pre-allocate the dmamaps at init time instead of allocating
them on the fly when descriptors need to be filled. This isn't ideal right
now because a map is allocated for every descriptor slot in the tx, rx, mini,
and jumbo rings (which is a lot!) in order to simplify the bookkeeping, even
though the driver might support filling only a subset of those slots.
Luckily, maps are typically NULL on i386 and amd64, so the cost isn't
very high. It could be an issue with sparc64, but the driver isn't endian
clean either, and that is a much bigger problem to solve first.
Note that jumbo frame support is under-tested, and I'm not even sure if
it till really works correctly given the evil VM magic that is does.
The changes here attempt to preserve the existing semanitcs.
Thanks to Martin Nillson for contributing the Netgear card for this work.
MFC-After: 3 weeks
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.