Specifically - never jack the TX FIFO threshold up to the absolute
maximum; always leave enough space for two DMA transactions to
appear.
This is a paranoia from the Linux ath9k driver. It can't hurt.
Obtained from: Linux ath9k
The default is to limit them to what the hardware is capable of.
Add sysctl twiddles for both the non-RTS and RTS protected aggregate
generation.
Whilst here, add some comments about stuff that I've discovered during
my exploration of the TX aggregate / delimiter setup path from the
reference driver.
- bear with me, there are lots of white space changes, I would not
do them, but I am a mere consumer of this stuff and if these drivers
are to stay in shape they need to be taken.
em driver changes: support for the new i217/i218 interfaces
igb driver changes:
- TX mq start has a quick turnaround to the stack
- Link/media handling improvement
- When link status changes happen the current flow control state
will now be displayed.
- A few white space/style changes.
lem driver changes:
- the shared code uncovered a bogus write to the RLPML register
(which does not exist in this hardware) in the vlan code,this
is removed.
CSUM_TCP too. They are all set explicitly by the kernel usually.
While here, fix an unrelated bug where hardware L4 checksum calculation
was accidentally disabled for some IPv6 packets.
Reported by: alfred@
MFC after: 3 days
This has reduced the number of TX delimiter and data underruns when
doing large UDP transfers (>100mbit).
This stops any HAL_INT_TXURN interrupts from occuring, which is a good
sign!
Obtained from: Qualcomm Atheros
* Delete this debugging print - I used it when debugging the initial
TX descriptor chaining code. It now works, so let's toss it.
It just confuses people if they enable TX descriptor debugging as they
get two slightly different versions of the same descriptor.
* Indenting.
part of ts_status. Thus:
* make sure we decode them from ts_flags, rather than ts_status;
* make sure we decode them regardless of whether there's an error or not.
This correctly exposes descriptor configuration errors, TX delimiter
underruns and TX data underruns.
Make dcons input polling adaptive, reducing poll rate to 1Hz after several
minutes of inactivty to reduce global interrupt rate. Most of users never
used FireWire debugging, so it is not very useful to consume power by it.
unnecessarily by a user thread waiting to run on a specific CPU after
calling sched_bind().
Reviewed by: rstone
Approved by: emaste (co-mentor)
Sponsored by: Sandvine Incorporated
MFC after: 1 week
- Replace divisor numbers with more descirptive names
- Properly calculate minimum frequency for SDHCI 3.0
- Properly calculate frequency for SDHCI 3.0 in mmcbr_set_clock
- Add min_freq method to sdhci_if.m and provide default
implementation. By re-implementing this method hardware
drivers can control frequency controller operates when
executing initialization sequence
actually do have to reinitialise the RX side of things after an RX
descriptor EOL error.
* Revert a change of mine from quite a while ago - don't shortcut the
RX initialisation path. There's a RX FIFO bug in the earlier chips
(I'm not sure when it was fixed in this series, but it's fixed
with the AR9380 and later) which causes the same RX descriptor to
be written to over and over. This causes the descriptor to be
marked as "done", and this ends up causing the whole RX path to
go very strange. This should fixed the "kickpcu; handled X packets"
message spam where "X" is consistently small.
enumeration lock. Make sure all callers of usbd_enum_lock() check the return
value. Remove the control transfer specific lock. Bump the FreeBSD version
number, hence external USB modules may need to be recompiled due to a USB
device structure change.
MFC after: 1 week
My changed had some rather significant behavioural changes to throughput.
The two issues I noticed:
* With if_start and the ifnet mbuf queue, any temporary latency
would get eaten up by some mbufs being queued. With ath_transmit()
queuing things to ath_buf's, I'd only get 512 TX buffers before I
couldn't queue any further frames.
* There's also some non-zero latency involved with TX being pushed
into a taskqueue via direct dispatch. Any time the scheduler didn't
immediately schedule the ath TX task would cause extra latency.
Various 1ge/10ge drivers implement both direct dispatch (if the TX
lock can be acquired) and deferred task transmission (if the TX lock
can't be acquired), with frames being pushed into a drbd queue.
I'll have to do this at some point, but until I figure out how to
deal with 802.11 fragments, I'll have to wait a while longer.
So what I saw:
* lots of extra latency, specially under load - if the taskqueue
wasn't immediately scheduled, things went pear shaped;
* any extra latency would result in TX ath_buf's taking their sweet time
being replenished, so any further calls to ath_transmit() would drop
mbufs.
* .. yes, there's no explicit backpressure here - things are just dropped.
Eek.
With this, the general performance has gone up, but those subtle if_start()
related race conditions are back. For some reason, this is doubly-obvious
with the AR5416 NIC and I don't quite understand why yet.
There's an unrelated issue with AR5416 performance in STA mode (it's
fine in AP mode when bridging frames, weirdly..) that requires a little
further investigation. Specifically - it works fine on a Lenovo T40
(single core CPU) running a March 2012 9-STABLE kernel, but a Lenovo T60
(dual core) running an early November 2012 kernel behaves very poorly.
The same hardware with an AR9160 or AR9280 behaves perfectly.
every architecture's busdma_machdep.c. It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code. The MD busdma is then given a chance to do any final processing
in the complete() callback.
The cam changes unify the bus_dmamap_load* handling in cam drivers.
The arm and mips implementations are updated to track virtual
addresses for sync(). Previously this was done in a type specific
way. Now it is done in a generic way by recording the list of
virtuals in the map.
Submitted by: jeff (sponsored by EMC/Isilon)
Reviewed by: kan (previous version), scottl,
mjacob (isp(4), no objections for target mode changes)
Discussed with: ian (arm changes)
Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
when they're being called from the TX completion handler.
Going (back) through the taskqueue is just adding extra locking and
latency to packet operations. This improves performance a little bit
on most NICs.
It still hasn't restored the original performance of the AR5416 NIC
but the AR9160, AR9280 and later NICs behave very well with this.
Tested:
* AR5416 STA (still tops out at ~ 70mbit TCP, rather than 150mbit TCP..)
* AR9160 hostap (good for both TX and RX)
* AR9280 hostap (good for both TX and RX)
so that simultaneous access cannot happen. Protect scratch area using
the enumeration lock. Also reduce stack usage in usbd_transfer_setup()
by moving some big stack members to the scratch area. This saves around
200 bytes of stack.
- Fix a whitespace.
MFC after: 1 week
freed memory cannot be used during detach.
- Remove all panic() calls from the urtw driver because
panic() is not appropriate here.
- Remove redundant checks for device detached in
device detach callbacks.
- Use DEVMETHOD_END to mark end of device methods.
MFC after: 2 weeks
crappy 802.11n performance, sigh.)
With the AR5416, aggregates need to be limited to 8KiB if RTS/CTS is
enabled. However, larger aggregates were going out with RTSCTS enabled.
The following was going on:
* The first buffer in the list would have RTS/CTS enabled in
bf->bf_state.txflags;
* The aggregate would be formed;
* The "copy over the txflags from the first buffer" logic that I added
blanked the RTS/CTS TX flags fields, and then copied the bf_first
RTS/CTS flags over;
* .. but that'd cause bf_first to be blanked out! And thus the flag
was cleared;
* So the rest of the aggregate formation would run with those flags
cleared, and thus > 8KiB aggregates were formed.
The driver is now (again) correctly limiting aggregate formation for
the AR5416 but there are still other pending issues to resolve.
Tested:
* AR5416, STA mode