the firmware (instead of just the main firmware version) when evaluating
firmware compatibility. Document the new "hw.cxgbe.fw_install" knob
being introduced here.
This should fix kern/173584 too. Setting hw.cxgbe.fw_install=2 will
mostly do what was requested in the PR but it's a bit more intelligent
in that it won't reinstall the same firmware repeatedly if the knob is
left set.
PR: kern/173584
MFC after: 5 days
rate.
This fixes two things:
* The intermediary rates now also have their EWMA values changed;
* The existing code was using the wrong value for longtries - so the
EWMA stats were only adjusted for the first rate and not subsequent
rates in a MRR setup.
TODO:
* Merge the EWMA updates into update_stats() now..
* Remove ar5416UpdateChainmasks();
* Remove the TX chainmask override code from the ar5416 TX descriptor
setup routines;
* Write a driver method to calculate the current chainmask based on the
operating mode and update the driver state;
* Call the HAL chainmask method before calling ath_hal_reset();
* Use the currently configured chainmask in the TX descriptors rather than
the hardware TX chainmasks.
Tested:
* AR5416, STA/AP mode - legacy and 11n modes
Right now the only way to set the chainmask is to set the hardware
configured chainmask through capabilities. This is fine for forcing
the chainmask to be something other than what the hardware is capable
of (eg to reduce TX/RX to one connected antenna) but it does change what
the HAL hardware chainmask configuration is.
For operational mode changes, it (may?) make sense to separately control
the TX/RX chainmask.
Right now it's done as part of ar5416_reset.c - ar5416UpdateChainMasks()
calculates which TX/RX chainmasks to enable based on the operating mode.
(1 for legacy and whatever is supported for 11n operation.) But doing
this in the HAL is suboptimal - the driver needs to know the currently
configured chainmask in order to correctly enable things for each
TX descriptor. This is currently done by overriding the chainmask
config in the ar5416 TX routines but this has to disappear - the AR9300
HAL support requires the driver to dynamically set the TX chainmask based
on the TX power and TX rate in order to meet mini-PCIe slot power
requirements.
So:
* Introduce a new HAL method to set the operational chainmask variables;
* Introduce null methods for the previous generation chipsets;
* Add new driver state to record the current chainmask separate from
the hardware configured chainmask.
Part #2 of this will involve disabling ar5416UpdateChainMasks() and moving
it into the driver; as well as properly programming the TX chainmask
based on the currently configured HAL chainmask.
Tested:
* AR5416, STA mode - both legacy (11a/11bg) and 11n rates - verified
that AR_SELFGEN_MASK (the chainmask used for self-generated frames like
ACKs and RTSes) is correct, as well as the TX descriptor contents is
correct.
currnet initialization sequence. Set it to simple mode only so that
systems can be updated from stable/7 to newer installations.
At some point, we should figure out why we cannot initialize performant
mode on this board.
PR: kern/153361
Reviewed by: scottl
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
- Remove vestigial null pointer tests after malloc(..., M_WAITOK).
- Remove vestigal qualhack union
- Use strlcpy() instead of the error-prone strncpy() when parsing
EEPROM and copying strings
- Check the MAC address in the EEPROM strings more strictly.
- Expand the macro MXGE_NEXT_STRING() at its only user. Due to a typo,
the macro was very confusing.
- Remove unnecessary buffer limit check. The buffer is double-NUL
terminated per construction.
PR: kern/176369
Submitted by: Christoph Mallon <christoph.mallon gmx.de>
might have been enabled for them- now that we use all 32 bits of handle.
Fast Posting doesn't pass the full 32 bits.
Noticed by: Bugs in NetBSD. Only a NetBSD user might actually still use such old hardware.
MFC after: 1 week
Some hardware like DMA and GPIO controllers might require
more then 8 interrupts per device instance.
Submitted by: Daisuke Aoyama <aoyama at peach.ne.jp>
Discussed with: gber@, raj@
data, introduced by r246713. There are two places where ata_request is
filled in ATA_CAM: ata_cam_begin_transaction() and ata_cam_request_sense().
In the first case DMA should be done for addresses from the CCB. In second
case, DMA should be done to the different address, the address of the sense
buffer inside the CCB structure itself.
- Some mxge nics may store the serial number in the SN2 field of the
EEPROM. These will also have an SN=0 field, so parse the SN2 field,
and give it precedence.
- Skip MXGEFW_CMD_UNALIGNED_TEST on mxge nics which do not require it.
This saves roughly 10ms per port at device attach time.
Sponsored by: Myricom
MFC After: 7 days
- Re-fix build by restoring local removed in r247151, but protected
by #if defined(INET) || defined(INET6) so that the compile
succeeds in the !(INET||INET6) case.
- Protect call to in_pseudo() with an #ifdef INET, to allow
a kernel to link with mxge when INET is not compiled in.
- Also remove an errant (improperly commented) obsolete debugging printf
Thanks to Glebius for pointing out the !(INET||INET6) build issue.
Sponsored by: Myricom
MFC After: 7 days
an incorrectly calculated RTS duration value when transmitting aggregates.
These earlier 802.11n NICs incorrectly used the ACK duration time when
calculating what to put in the RTS of an aggregate frame. Instead it
should have used the block-ack time. The result is that other stations
may not reserve enough time and start transmitting _over_ the top of
the in-progress blockack field. Tsk.
This workaround is to popuate the burst duration field with the delta
between the ACK duration the hardware is using and the required duration
for the block-ack. The result is that the RTS field should now contain
the correct duration for the subsequent block-ack.
This doesn't apply for AR9280 and later NICs.
Obtained from: Qualcomm Atheros
- Add support for IPv6 rx csum offload
- Finally switch mxge from using its own driver lro, to
using tcp_lro
MFC after: 7 days
Sponsored by: Myricom Inc.
DragonFlyBSD, so it certainly doesn't need splsoftvm(). Remove it.
# I doubt this driver will now compile on older FreeBSD versions or DFBSD
# We should consider unifdefing it since that code seems unmaintained.
Specifically - never jack the TX FIFO threshold up to the absolute
maximum; always leave enough space for two DMA transactions to
appear.
This is a paranoia from the Linux ath9k driver. It can't hurt.
Obtained from: Linux ath9k
The default is to limit them to what the hardware is capable of.
Add sysctl twiddles for both the non-RTS and RTS protected aggregate
generation.
Whilst here, add some comments about stuff that I've discovered during
my exploration of the TX aggregate / delimiter setup path from the
reference driver.
- bear with me, there are lots of white space changes, I would not
do them, but I am a mere consumer of this stuff and if these drivers
are to stay in shape they need to be taken.
em driver changes: support for the new i217/i218 interfaces
igb driver changes:
- TX mq start has a quick turnaround to the stack
- Link/media handling improvement
- When link status changes happen the current flow control state
will now be displayed.
- A few white space/style changes.
lem driver changes:
- the shared code uncovered a bogus write to the RLPML register
(which does not exist in this hardware) in the vlan code,this
is removed.
CSUM_TCP too. They are all set explicitly by the kernel usually.
While here, fix an unrelated bug where hardware L4 checksum calculation
was accidentally disabled for some IPv6 packets.
Reported by: alfred@
MFC after: 3 days
This has reduced the number of TX delimiter and data underruns when
doing large UDP transfers (>100mbit).
This stops any HAL_INT_TXURN interrupts from occuring, which is a good
sign!
Obtained from: Qualcomm Atheros
* Delete this debugging print - I used it when debugging the initial
TX descriptor chaining code. It now works, so let's toss it.
It just confuses people if they enable TX descriptor debugging as they
get two slightly different versions of the same descriptor.
* Indenting.
part of ts_status. Thus:
* make sure we decode them from ts_flags, rather than ts_status;
* make sure we decode them regardless of whether there's an error or not.
This correctly exposes descriptor configuration errors, TX delimiter
underruns and TX data underruns.
Make dcons input polling adaptive, reducing poll rate to 1Hz after several
minutes of inactivty to reduce global interrupt rate. Most of users never
used FireWire debugging, so it is not very useful to consume power by it.
unnecessarily by a user thread waiting to run on a specific CPU after
calling sched_bind().
Reviewed by: rstone
Approved by: emaste (co-mentor)
Sponsored by: Sandvine Incorporated
MFC after: 1 week
- Replace divisor numbers with more descirptive names
- Properly calculate minimum frequency for SDHCI 3.0
- Properly calculate frequency for SDHCI 3.0 in mmcbr_set_clock
- Add min_freq method to sdhci_if.m and provide default
implementation. By re-implementing this method hardware
drivers can control frequency controller operates when
executing initialization sequence
actually do have to reinitialise the RX side of things after an RX
descriptor EOL error.
* Revert a change of mine from quite a while ago - don't shortcut the
RX initialisation path. There's a RX FIFO bug in the earlier chips
(I'm not sure when it was fixed in this series, but it's fixed
with the AR9380 and later) which causes the same RX descriptor to
be written to over and over. This causes the descriptor to be
marked as "done", and this ends up causing the whole RX path to
go very strange. This should fixed the "kickpcu; handled X packets"
message spam where "X" is consistently small.
enumeration lock. Make sure all callers of usbd_enum_lock() check the return
value. Remove the control transfer specific lock. Bump the FreeBSD version
number, hence external USB modules may need to be recompiled due to a USB
device structure change.
MFC after: 1 week
My changed had some rather significant behavioural changes to throughput.
The two issues I noticed:
* With if_start and the ifnet mbuf queue, any temporary latency
would get eaten up by some mbufs being queued. With ath_transmit()
queuing things to ath_buf's, I'd only get 512 TX buffers before I
couldn't queue any further frames.
* There's also some non-zero latency involved with TX being pushed
into a taskqueue via direct dispatch. Any time the scheduler didn't
immediately schedule the ath TX task would cause extra latency.
Various 1ge/10ge drivers implement both direct dispatch (if the TX
lock can be acquired) and deferred task transmission (if the TX lock
can't be acquired), with frames being pushed into a drbd queue.
I'll have to do this at some point, but until I figure out how to
deal with 802.11 fragments, I'll have to wait a while longer.
So what I saw:
* lots of extra latency, specially under load - if the taskqueue
wasn't immediately scheduled, things went pear shaped;
* any extra latency would result in TX ath_buf's taking their sweet time
being replenished, so any further calls to ath_transmit() would drop
mbufs.
* .. yes, there's no explicit backpressure here - things are just dropped.
Eek.
With this, the general performance has gone up, but those subtle if_start()
related race conditions are back. For some reason, this is doubly-obvious
with the AR5416 NIC and I don't quite understand why yet.
There's an unrelated issue with AR5416 performance in STA mode (it's
fine in AP mode when bridging frames, weirdly..) that requires a little
further investigation. Specifically - it works fine on a Lenovo T40
(single core CPU) running a March 2012 9-STABLE kernel, but a Lenovo T60
(dual core) running an early November 2012 kernel behaves very poorly.
The same hardware with an AR9160 or AR9280 behaves perfectly.
every architecture's busdma_machdep.c. It is done by unifying the
bus_dmamap_load_buffer() routines so that they may be called from MI
code. The MD busdma is then given a chance to do any final processing
in the complete() callback.
The cam changes unify the bus_dmamap_load* handling in cam drivers.
The arm and mips implementations are updated to track virtual
addresses for sync(). Previously this was done in a type specific
way. Now it is done in a generic way by recording the list of
virtuals in the map.
Submitted by: jeff (sponsored by EMC/Isilon)
Reviewed by: kan (previous version), scottl,
mjacob (isp(4), no objections for target mode changes)
Discussed with: ian (arm changes)
Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris),
amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
when they're being called from the TX completion handler.
Going (back) through the taskqueue is just adding extra locking and
latency to packet operations. This improves performance a little bit
on most NICs.
It still hasn't restored the original performance of the AR5416 NIC
but the AR9160, AR9280 and later NICs behave very well with this.
Tested:
* AR5416 STA (still tops out at ~ 70mbit TCP, rather than 150mbit TCP..)
* AR9160 hostap (good for both TX and RX)
* AR9280 hostap (good for both TX and RX)
so that simultaneous access cannot happen. Protect scratch area using
the enumeration lock. Also reduce stack usage in usbd_transfer_setup()
by moving some big stack members to the scratch area. This saves around
200 bytes of stack.
- Fix a whitespace.
MFC after: 1 week
freed memory cannot be used during detach.
- Remove all panic() calls from the urtw driver because
panic() is not appropriate here.
- Remove redundant checks for device detached in
device detach callbacks.
- Use DEVMETHOD_END to mark end of device methods.
MFC after: 2 weeks
crappy 802.11n performance, sigh.)
With the AR5416, aggregates need to be limited to 8KiB if RTS/CTS is
enabled. However, larger aggregates were going out with RTSCTS enabled.
The following was going on:
* The first buffer in the list would have RTS/CTS enabled in
bf->bf_state.txflags;
* The aggregate would be formed;
* The "copy over the txflags from the first buffer" logic that I added
blanked the RTS/CTS TX flags fields, and then copied the bf_first
RTS/CTS flags over;
* .. but that'd cause bf_first to be blanked out! And thus the flag
was cleared;
* So the rest of the aggregate formation would run with those flags
cleared, and thus > 8KiB aggregates were formed.
The driver is now (again) correctly limiting aggregate formation for
the AR5416 but there are still other pending issues to resolve.
Tested:
* AR5416, STA mode
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then when sending with xmit(), encounting
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.
The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue() only with drbr_peek(). We most
of the time, in putback, would not need to copy it
back since most likey the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimial case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will need a test and atomic in the put back possibly
a seperate putback_mc() in the ring buf.
Reviewed by: jhb@freebsd.org, jlv@freebsd.org
using /dev/consolectl close. This fixes a problem where if
a USB mouse is detached while a button is pressed, that
button is never released.
MFC after: 1 week
requires 8 bytes alignment on RX buffer. Given that non-jumbo
frame works on any alignments I guess this DMA limitation for RX
buffer could be jumbo frame specific one. Also I'm not sure
whether this DMA limitation is related with 64bit DMA. Previously
age(4) disabled 64bit DMA addressing due to silent data corruption.
So we may need more testing on re-enabling 64bit DMA in future.
While I'm here, change mbuf chaining algorithm to use fixed sized
buffer and force software checksum if controller reports length
error. According to QAC, RFD is not updated at all for jumbo frame
so it works just like alc(4) controllers. This change also added
alignment fixup for strict alignment architectures. Because I'm
not aware of any non-x86 machines that use age(4) controllers it's
just for completeness at this moment.
Wit this change, jumbo frame should work with age(4).
Tested by: Christian Gusenbauer < c47g <> gmx dot at >
MFC after: 1 week
passed in by smartd of smartmontools.
While at it, hint the compiler that 32-bit PIO is the most likely
case (idea from Linux) and use bus_{read,write}_stream_2(9) instead
of bus_{read,write}_multi_stream_2(9) for single count reads/writes.
MFC after: 1 week
This hack is picked up from Linux, which claims that it follows
Windows behavior.
PR: amd64/174409
Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>,
KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>,
Slawa Olhovchenkov <slw@zxy.spb.ru>
MFC after: 13 days
x86 buses
Otherwise the uart hardware could be in such a state after the resume
where IER is cleared and thus no interrupts are generated.
This behavior is observed and tested with QEMU, so I am comitting this
change to help with my debugging.
There has been no feedback from users of serial ports on real hardware.
MFC after: 20 days
The "blackhole" driver was used in conjunction with bhyve to sequester
pci devices intended for passthru until vmm.ko was loaded. This was
useful at one point because vmm.ko could not be loaded at boot time.
The same functionality can now be achieved by loading vmm.ko via the
loader along with the kernel.
Discussed with: grehan
Obtained from: NetApp
case 0x3E: /* Per Intel document 325462-045US 01/2013. */
Add manpage to document all the goodness that is available in this
processor model.
No support for uncore events at this time.
Submitted by: hiren panchasara <hiren.panchasara@gmail.com>
Reviewed by: davide, jimharris, sbruno
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
Right now, ic_curchan seems to be updated rather quickly (ie, during
the ioctl) and before the driver gets notified of what's going on.
So what I was seeing was:
* NIC was in channel X;
* It generates PHY errors for channel X;
* an ioctl comes along from userland and changes things to channel Y;
* .. this updates ic_curchan, but hasn't yet reset the hardware;
* in parallel, RX is occuring and it looks at ic_curchan;
* .. which is channel Y, so events get stamped with that now.
Sigh.
into the FreeBSD boot loader, typically for non-USB aware BIOSes, EFI systems
or embedded platforms. This is also useful for out of the system compilation
of the FreeBSD USB stack for various purposes. The USB kernel files can
now optionally include a global header file which should include all needed
definitions required to compile the FreeBSD USB stack. When the global USB
header file is included, no other USB header files will be included by
default.
Add new file containing the USB stack configuration for the
FreeBSD loader build.
Replace some __FBSDID()'s by /* $FreeBSD$ */ comments. Now all
USB files follow the same style.
Use cases:
- console in loader via USB
- loading kernel via USB
Discussed with: Hiroki Sato, hrs @ EuroBSDCon
bug in old versions of QEMU (and Xen, and other places using QEMU code).
On those buggy emulated UARTs, the "TX idle" interrupt gets lost; with
this workaround, we spinwait for the TX to happen and then send ourselves
the interrupt. It's ugly but it works, while minimizing the impact on
the code for the !broken_txfifo case.
MFC after: 2 weeks
If a BUSDMA load operation results in a single segment which
is greater than the PAGE_SIZE, the USB computed physical
addresses will not be correct. Make sure that the first
segment is unfolded like the sub-sequent segments are into
USB_PAGE_SIZE big ranges.
Found by: Alexander Nedotsukov
MFC after: 1 week
flush wait on the Gen2 chipsets. Confirmed by the inspection of the
Linux agp code.
Submitted by: Taku YAMAMOTO <taku@tackymt.homeip.net>
MFC after: 2 weeks
cannot be freed while do_pass_accept_req is running. This closes a race
where do_pass_establish on another CPU (the driver chose a different
queue for the new tid) expands the synq entry into a full PCB and then
releases the only hold on it, all while do_pass_accept_req is still
running.
MFC after: 3 days
the separate ath0 TX taskq.
Whilst here, make sure that the TX software scheduler is also
running out of the TX task, rather than the ath0 taskqueue.
Make sure that the tx taskqueue is blocked/unblocked as necessary.
This allows for a little more parallelism on multi-core machines,
as well as (eventually) supporting a higher task priority for TX
tasks, allowing said TX task to preempt an already running RX or
TX completion task.
Tested:
* AR5416, AR9280 hostap and STA modes
- Make bge_lookup_{rev,vendor}() static.
- Factor out chip identification rather than duplicating the code.
- Sanitize bge_probe() a bit (don't hardcode buffer sizes, allow
bge_lookup_vendor() to return NULL so the excessive panic() three
can be removed there, etc.) and return BUS_PROBE_DEFAULT rather than
hardcoding 0.
- According to the Linux tg3 driver, BCM57791 and BCM57795 aren't
capable of Gigabit Ethernet.
- Check the return value of taskqueue_start_threads().
- At least the Saturn chips of 501-6738 cards need a delay after freezing
the external GMII pins before the internal PHY is accessible again. So
wait a bit after (un)freezing these. Also don't touch the other bits of
that configuration register. [1]
- Take advantage of nitems().
Reported and tested by: Paul Keusemann [1]
MFC after: 3 days
By setting dev.netmap.fwd=1 (or enabling the feature with a per-ring flag),
packets are forwarded between the NIC and the host stack unless the
netmap client clears the NS_FORWARD flag on the individual descriptors.
This feature greatly simplifies applications where some traffic
(think of ARP, control traffic, ssh sessions...) must be processed
by the host stack, whereas the bulk is handled by the netmap process
which simply (un)marks packets that should not be forwarded.
The default is chosen so that now a netmap receiver operates
in a mode very similar to bpf.
Of course there is no free lunch: traffic to/from the host stack
still operates at OS speed (or less, as there is one extra copy in
one direction).
HOWEVER, since traffic goes to the user process before being
reinjected, and reinjection occurs in a user context, you get some
form of livelock protection for free.
Add a missing 0 to the mask for byte0 of C_SIZE.
The previous mask (0xc) worked except that the last 0-1536K of the disk
could not be accessed since we were shifting the (wrong) bits we did
mask off the right edge.
generating binary diffs.
- Constify a few strings used in the driver.
- Style changes to make the driver compile with default clang settings.
Approved by: HighPoint Technologies
MFC after: 3 days
This is easily possible now that the TX is protected by a single
lock, rather than a per-TXQ (and thus per-TID) lock.
Only set CLRDMASK if none of the destinations are filtered.
This likely will need some tuning when it comes time to do UASPD/PS-POLL
TX, however at that point it should be manually set anyway.
Tested:
* AR9280, STA mode
TODO:
* More thorough testing in AP mode
* test other chipsets, just to be safe/sure.
'bhyve' was developed by grehan@ and myself at NetApp (thanks!).
Special thanks to Peter Snyder, Joe Caradonna and Michael Dexter for their
support and encouragement.
Obtained from: NetApp
Make umass return an error code if SCSI sense retrieval request
has failed. Make sure scsi_error_action honors SF_NO_RETRY and
SF_NO_RECOVERY in all cases, even if it cannot parse sense bytes.
Reviewed by: hselasky (umass), scottl (cam)
two upcoming features:
semi-transparent mode:
when a device is opened in this mode, the
user program will be able to mark slots that must be forwarded
to the "other" side (i.e. from NIC to host stack, or viceversa),
and the forwarding will occur automatically at the next netmap syscall.
This saves the need to open another file descriptor and do
the forwarding manually.
direct-forwarding mode:
when operating with a VALE port, the user can specify in the slot
the actual destination port, overriding the forwarding decision
made by a lookup of the destination MAC. This can be useful to
implement packet dispatchers.
No API changes will be introduced.
No new functionality in this patch yet.
chip hangs.
* Always do a reset in ath_bmiss_proc(), regardless of whether the
hardware is "hung" or not. Specifically, for spectral scan, there's
likely a whole bunch of potential hangs that we don't (yet) recognise
in the HAL. So to avoid staying RX deaf persisting until the station
disassociates, just do a no-loss reset.
* Set sc_beacons=1 in STA mode. During a reset, the beacon programming
isn't done. (It's likely I need to set sc_syncbeacons during a hang
reset, but I digress.) Thus after a reset, there's no beacon timer
programming to send a BMISS interrupt if beacons aren't heard ..
thus if the AP disappears, you won't get notified and you'll have to
reset your interface.
This hasn't yet fixed all of the hangs that I've seen when debugging
spectral scan, but it's certainly reduced the hang frequency and it
should improve general STA stability in very noisy environments.
Tested:
* AR9280, STA mode, spectral scan off/on
PR: kern/175227
when an interface is going down.
Right now it's quite possible (but very unlikely!) that ath_reset()
or similar is called, leading to a beacon config call, in parallel with
the last VAP being destroyed.
This likely should be fixed by making sure the bmiss/bstuck/watchdog
taskqueues are canceled whenever the last VAP is destroyed.