This set of patches fixes support for systems with more than 32 cores.
Details include:
sfxge: RXQ index (not label) comes from FW in flush done/failed events
Change the second argument name of the efx_rxq_flush_done_ev_t and
efx_rxq_flush_failed_ev_t prototypes to highlight that RXQ index (not label)
comes from FW in flush done and failed events.
sfxge: TXQ index (not label) comes from FW in flush done event
Change the second argument name of the efx_txq_flush_done_ev_t prototype to
highlight that TXQ index (not label) comes from FW in flush done event.
sfxge: use TXQ type as label to support more than 32 TXQs
There are 3 TXQs in event queue 0 and 1 TXQ (with TCP/UDP checksum offload)
in all other event queues.
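
A minimal sketch of the layout described above, under the assumption that
TXQ indices 0..2 sit in event queue 0 (one per checksum-offload type) and
that event queue n > 0 owns the single TXQ with index n + 2. The enum and
function names are illustrative only and are not the driver's identifiers.

```c
#include <assert.h>

/* Hypothetical TXQ types; the type doubles as the event label. */
enum txq_type {
	TXQ_NON_CKSUM = 0,
	TXQ_IP_CKSUM,
	TXQ_IP_TCP_UDP_CKSUM,
	TXQ_NTYPES
};

/*
 * Map an event queue index plus a label (TXQ type) back to a TXQ index.
 * Event queue 0 hosts one TXQ per type; every other event queue hosts
 * only the TCP/UDP checksum offload TXQ (assumed numbering, see above).
 */
static unsigned int
txq_index_from_label(unsigned int evq_index, enum txq_type label)
{
	assert(label < TXQ_NTYPES);
	if (evq_index == 0)
		return ((unsigned int)label);
	assert(label == TXQ_IP_TCP_UDP_CKSUM);
	return (evq_index + TXQ_NTYPES - 1);
}
```

Because the label only has to distinguish TXQ types within one event queue,
the total number of TXQs is no longer bounded by the label range.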
Submitted by: Andrew Rybchenko <Andrew.Rybchenko at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
and finish the job. ncurses is now the only Makefile in the tree that
uses it since it wasn't a simple mechanical change, and will be
addressed in a future commit.
logical volume state changes.
Currently, I view this as a critical fix for users and will MFC this rapidly as
my testing has shown data loss when the disk is failed by removing it when
under some amount of write activity and this code panics the box.
Reviewed by: mav@ scottl@
MFC after: 3 days
Sponsored by: Yahoo! Inc.
Use soreadable()/sowriteable() in socket upcalls to avoid extra wakeups
until we have enough data to read or space to write.
Increase partial receive len from 1K to 128K to not wake up on every
received packet.
This significantly reduces lock contention and CPU usage and improves
throughput for large I/Os on NICs without TSO and LRO.
Reviewed by: trasz
Sponsored by: iXsystems, Inc.
motherboard. PHY hardware used for the controller responded at
all possible addresses which in turn resulted in having 32 PHYs
for the controller. If driver detects "MSI K9N6PGM2-V2 (MS-7309)"
motherboard, tell miibus(4) PHY is located at 0.
Tested by: Chris H
o Unmute terminal when done with driver replacement.
o Move init fonts to early point.
o Minor cleanup.
MFC after: 6 days
X-MFC-with: r264244 r264242
Sponsored by: The FreeBSD Foundation
{MIO,SER}5xxxx chips instead of treating all of them as PUC_PORT_2S.
Among others, this fixes the hang seen when trying to probe the
nonexistent second UART on an actually 1-port chip.
Obtained from: NetBSD (BAR layouts)
MFC after: 3 days
Sponsored by: Bally Wulff Games & Entertainment GmbH
tracked BAW actually is.
The net80211 code that completes a BAR will set tid->txa_start (the
BAW start) to whatever value was called when sending the BAR.
Now, in case there are bugs in my driver code that cause the BAW
to slip along, we should make sure that the new BAW we start
at is actually what we currently have it at, not what we've sent.
This totally breaks the specification and so this stays a printf().
If it happens then I need to know and fix it.
Whilst here, add some debugging updates:
* add TID logging to places where it's useful;
* use SEQNO().
match how it's used.
This is another bug that led to aggregate traffic hanging because
the BAW tracking stopped being accurate. In this instance, a filtered
frame that exceeded retries would return a non-error, which would
mean the caller would never remove it from the BAW. But it wouldn't
be added to the filtered list, so it would be lost forever. There'd
thus be a hole in the BAW that would never get transmitted and
this leads to a traffic hang.
Tested:
* Routerstation Pro, AR9220 AP
we did suspend it.
The whole suspend/resume TID queue thing is supposed to be a matched
reference count - a subsystem (eg addba negotiation, BAR transmission,
filtered frames, etc) is supposed to call pause() once and then resume()
once.
ath_tx_tid_filt_comp_complete() is called upon the completion of any
filtered frame, regardless of whether the driver had already seen
a filtered frame and called pause().
So only call resume() if tid->isfiltered = 1, which indicates that
we had called pause() once.
This fixes a seemingly whacked and different problem - traffic hangs.
What was actually going on:
* There'd be some marginal link with crappy behaviour, causing filtered
frames and BAR TXing to occur;
* A BAR TX would occur, setting the new BAW (block-ack window) to seqno n;
* .. and pause() would be called, blocking further transmission;
* A filtered frame completion would occur from the hardware, but with
tid->isfiltered = 0 which indicates we haven't actually marked
the queue yet as filtered;
* ath_tx_tid_filt_comp_complete() would call resume(), continuing
transmission;
* Some frames would be queued to the hardware, since the TID is now no
longer paused;
* .. and if some make it out and are ACKed successfully, the new BAW
may be seqno n+1 or more;
* .. then the BAR TX completes and sets the new seqno back to n.
At this point the BAW tracking would be loopy because the BAW
start was modified but the BAW ring buffer wasn't updated in lock
step.
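
The matched pause()/resume() rule can be sketched roughly as follows. This
is a simplified model and not the driver code: struct tid_state and the
tid_*() helpers are hypothetical names, and locking is left out.

```c
#include <assert.h>

/* Hypothetical, simplified per-TID state. */
struct tid_state {
	int paused;	/* pause reference count */
	int isfiltered;	/* 1 if the filtered-frame path called pause() */
};

static void
tid_pause(struct tid_state *tid)
{
	tid->paused++;
}

static void
tid_resume(struct tid_state *tid)
{
	assert(tid->paused > 0);
	tid->paused--;
}

/* Called when the first filtered frame is seen. */
static void
tid_filt_begin(struct tid_state *tid)
{
	if (!tid->isfiltered) {
		tid->isfiltered = 1;
		tid_pause(tid);
	}
}

/* Called on completion of any filtered frame. */
static void
tid_filt_comp_complete(struct tid_state *tid)
{
	if (!tid->isfiltered)
		return;		/* we never paused, so don't resume */
	tid->isfiltered = 0;
	tid_resume(tid);
}
```

With the isfiltered check in place, a filtered completion that arrives while
only the BAR path holds a pause reference leaves that reference alone.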
Tested:
* Routerstation Pro + AR9220 AP
that are being done by the OS.
For now this'll match up with the "wakeups"; although I'll dig deeper into
this to see if we can determine which sleep state the CPU managed to get
into. Most things I've seen these days only expose up to C2 or C3 via
ACPI even though the CPU goes all the way down to C6 or C7.
o Mute terminal while vt(4) driver change in progress.
o Reset VDF_TEXTMODE before init new driver.
o Assign default font, if new driver is not in TEXTMODE.
o Do not update screen while driver changing.
Resolved by: adrian
Reported by: tyler
MFC after: 7 days
Sponsored by: The FreeBSD Foundation
CLOCAL and HUPCL control flags. There are legit reasons for allowing
those to be changed. When /etc/ttys has the "3wire" type (without a
baudrate) for the serial port that is the low-level console, then
this change has no effect.
Obtained from: Juniper Networks, Inc.
other modes supported by the FTDI serial adapter chips.
In addition to adding the new ioctls, this change removes all the code
that reset the chip at attach and open/close time, and also the code
that turned on RTS/CTS flow control on open without any permission to do
so (that was just always a bug in the driver).
When FTDI chips are configured as GPIO or MPSSE or other special-purpose
uses by an attached serial eeprom, the chip will power on with certain
pins driven or floating, and it's important that the driver not do
anything to the chip to perturb that unless it receives a specific
command to do so. When used for "plain old serial comms" the chip
powers on into the right mode and never needs to be reset while it's
running to operate properly, so this change is transparent to most users.
before changing the divisor bits in the register. We were writing a zero
to the register, which clears the enable, but also cleared the divisor bits
at the same time. That's a violation of the sdhci spec, which says the
divisor can only be changed when the clock is disabled. This has worked
okay on most hardware for years, but the TI OMAP controller would misbehave
after changing the divisor improperly.
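
A rough sketch of the ordering the spec asks for, using a plain variable as
a stand-in for the clock control register. The bit positions, the clk_write()
helper and the violation counter are assumptions made for illustration; this
is not the sdhci(4) code.

```c
#include <assert.h>
#include <stdint.h>

#define CLK_SD_ENABLE	0x0004u	/* SD clock enable bit (assumed layout) */
#define CLK_DIV_MASK	0xff00u	/* divisor field (assumed layout) */
#define CLK_DIV_SHIFT	8

/* Stand-in for the 16-bit clock control register. */
static uint16_t clk_reg;
/* Counts writes that change the divisor while the clock is enabled. */
static int div_changed_while_enabled;

static void
clk_write(uint16_t val)
{
	if ((clk_reg & CLK_SD_ENABLE) != 0 &&
	    ((clk_reg ^ val) & CLK_DIV_MASK) != 0)
		div_changed_while_enabled++;
	clk_reg = val;
}

static void
set_divisor(uint8_t div)
{
	uint16_t val;

	assert((CLK_DIV_MASK >> CLK_DIV_SHIFT) == 0xffu);
	/* Step 1: clear only the enable bit, keeping the divisor intact. */
	val = (uint16_t)(clk_reg & ~CLK_SD_ENABLE);
	clk_write(val);
	/* Step 2: with the clock stopped, program the new divisor. */
	val = (uint16_t)((val & ~CLK_DIV_MASK) |
	    ((unsigned int)div << CLK_DIV_SHIFT));
	clk_write(val);
	/* Step 3: re-enable the clock. */
	clk_write((uint16_t)(val | CLK_SD_ENABLE));
}
```

Writing a plain zero while the clock is running trips the counter in this
model, which is the behaviour the change removes.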
Submitted by: Svatopluk Kraus <onwahe@gmail.com>
Ensure that first_func is set to 0 on every iteration of the PCI slot
enumeration loop after the first. There is a continue statement that would
cause first_func to stay at 1 on any PCI device where slot 0 has no
functions, until we find a slot that does have a function. This would cause
us to not enumerate the first PCI function on the device.
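
A minimal sketch of the loop shape, assuming a toy bus model: the
slot_present array, the probed table and enumerate() are hypothetical and
only show why the reset belongs in the loop header.

```c
#include <assert.h>

#define NSLOTS	4
#define NFUNCS	2

/* Records which (slot, function) pairs the sketch has probed. */
static int probed[NSLOTS][NFUNCS];

static void
enumerate(const int slot_present[NSLOTS])
{
	int s, f, first_func;

	assert(slot_present != 0);
	/* Function 0 of slot 0 is assumed to have been probed already. */
	first_func = 1;
	/*
	 * Resetting first_func in the loop header means the reset also
	 * runs when the body exits early through "continue".
	 */
	for (s = 0; s < NSLOTS; s++, first_func = 0) {
		if (!slot_present[s])
			continue;
		for (f = first_func; f < NFUNCS; f++)
			probed[s][f] = 1;
	}
}
```

If the reset instead sat at the bottom of the loop body, an empty slot 0
would skip it and function 0 of the first populated slot would be missed.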
Credit to markj@ for spotting the bug.
X-MFC-With: r264011
While I'm here, remove aue_eeprom_getword() as its only usage is to
read station address and make it more readable. This change is
inspired by NetBSD.
With this change, aue(4) should work on big endian architectures.
PR: 188177
default wMaxPacketSize (64 or 512 bytes). This actually helps older FTDI
devices (which were USB 1/full speed) more than the new H-series high
speed, but even for the new chips it helps cut the number of interrupts
when doing very high speed (3-12mbaud).
This avoids extra locking in icl_pdu_queue(); the upper layer needs to call
it while holding its own lock anyway, to avoid sending PDUs out of order.
Sponsored by: The FreeBSD Foundation
PCIe Alternate RID Interpretation (ARI) is an optional feature that
allows devices to have up to 256 different functions. It is
implemented by always setting the PCI slot number to 0 and
re-purposing the 5 bits used to encode the slot number to instead
contain the function number. Combined with the original 3 bits
allocated for the function number, this allows for 256 functions.
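
As a sketch of the two interpretations of the low 8 bits of a RID (the
helper names are hypothetical and are not part of the pci(4) code):

```c
#include <assert.h>
#include <stdint.h>

/* Conventional encoding: bits 7:3 are the slot, bits 2:0 the function. */
static void
rid_decode_legacy(uint8_t devfn, unsigned int *slot, unsigned int *func)
{
	assert(slot && func);
	*slot = (devfn >> 3) & 0x1f;
	*func = devfn & 0x07;
}

/* ARI: the slot is always 0 and all 8 bits carry the function number. */
static void
rid_decode_ari(uint8_t devfn, unsigned int *slot, unsigned int *func)
{
	assert(slot && func);
	*slot = 0;
	*func = devfn;
}
```

For example, 0xfa decodes as slot 31, function 2 without ARI and as slot 0,
function 250 with ARI.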
This is enabled by default, but it's expected to be a no-op on currently
supported hardware. It's a prerequisite for supporting PCI SR-IOV, and
I want the ARI support to go in early to help shake out any bugs in it.
ARI can be disabled by setting the tunable hw.pci.enable_ari=0.
Reviewed by: kib
MFC after: 2 months
Sponsored by: Sandvine Inc.
Recent FTDI chips have the ability to operate at up to 12mbps. The newer
chips with faster clocks have the same usb vendor/product IDs as the older
chips; the bcdDevice field must be used to detect the newer versions. This
change includes a new function to do that instead of using just the IDs from
the vendor/product table.
The code to choose the baud clock divisor is completely rewritten. In
addition to supporting the new higher clock rates, the rewrite fixes a
longstanding bug in the old code which put the high bits of the fractional
part of the divisor into the wrong place in the wIndex field. That bug
was mostly harmless -- it accidentally didn't affect standard baud rates
and would only show up when using relatively fast non-standard rates.
My PCI RID changes somehow got intermixed with my PCI ARI patch when I
committed it. I may have accidentally applied a patch to a non-clean
working tree. Revert everything while I figure out what went wrong.
Pointy hat to: rstone
out 32 is not enough to support a full sized TSO packet.
While I'm here fix a long standing bug introduced in r169632 in
bce(4) where it didn't include L2 header length of TSO packet in
the maximum DMA segment size calculation.
In collaboration with: rmacklem
MFC after: 2 weeks
o Move vd_bitbltchr vga's driver method to vd_maskbitbltchr.
o Implement new vd_bitbltchr method for vga driver. (It does a single write
for 8 pixels, so it should be a bit faster).
MFC after: 7 days
Sponsored by: The FreeBSD Foundation
The vt(9) crash on resume is fixed, but Xorg still has a damaged screen on
resume (at least with i915kms), so it is better to switch to VT0 before
suspend and back on resume.
Sponsored by: The FreeBSD Foundation
A statically allocated terminal window does not have an initialized callout
handler, so we have to initialize it even for an existing window if it is
the console window.
Reported by: gjb and many
Tested by: gjb
MFC after: 7 days
Sponsored by: The FreeBSD Foundation
The previous implementation limited only the put queue size (used when the
Tx lock can't be acquired), but the get queue could grow without bound,
which results in mbuf pool exhaustion and latency growth.
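
The idea of bounding the get queue can be sketched like this. The
struct get_queue type and the gq_*() helpers are hypothetical; the real
driver queues mbufs and uses its own limits.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
	struct pkt *next;
};

/* Hypothetical "get" queue with an explicit upper bound. */
struct get_queue {
	struct pkt *head;
	struct pkt **tailp;
	unsigned int count;
	unsigned int max;
};

static void
gq_init(struct get_queue *q, unsigned int max)
{
	q->head = NULL;
	q->tailp = &q->head;
	q->count = 0;
	q->max = max;
}

/* Returns 0 on success, -1 if the queue is full and the caller must drop. */
static int
gq_enqueue(struct get_queue *q, struct pkt *p)
{
	assert(p != NULL);
	if (q->count >= q->max)
		return (-1);
	p->next = NULL;
	*q->tailp = p;
	q->tailp = &p->next;
	q->count++;
	return (0);
}
```

Rejecting the packet at enqueue time keeps both memory use and queueing
latency bounded by the chosen limit.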
Submitted by: Andrew Rybchenko <Andrew.Rybchenko at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
These are needed to diagnose TX hangs that I and hiren are seeing.
Without it, the only way we'll see debugging is by having ATH_DEBUG_SW_TX
enabled and that is going to be very, very spammy.
ATH_DEBUG_RESET is fine; it's only going to be done during stuck beacon
situations in AP mode.
Whilst I'm here, and now that it's behind debugging, let's just disable
the "print only one" conditional. I'll eventually make it more tunable.
Tested:
* AR9220, hostap mode.
create character devices. The deadlock can happen if an application is
issuing IOCTLs which require USB refcounting, at the same time the USB
device is detaching.
There is already a counter in place in the USB device structure to
detect this situation, but it was not always checked ahead of invoking
functions that might destroy character devices, like detach, set
configuration, set alternate interface or detach active kernel driver.
Reported by: Daniel O'Connor <doconnor@gsoft.com.au>
MFC after: 1 week
device is asleep.
This doesn't avoid logging errors for things that are actually OK to
access whilst the chip is asleep (eg, the RTC registers, 0x7000->0x70ff
on the AR5416 and later).
But, this is a pretty good indicator if things are accessed incorrectly.
Tested:
* AR5416, STA
This way the state changes from sleep->awake before the registers are poked
and from awake->sleep after the registers are poked.
This way spurious warnings aren't printed by my (to be committed)
debugging code.
Tested:
* AR5416, STA
Yes, this means that sc_invalid is slightly racy, but there are other
issues here which need fixing.
This fixes a source of eventual LORs - ath_init() grabs ATH_LOCK to do
work and releases it before it calls ieee80211_start_all().
ieee80211_start_all() will grab the net80211 comlock to iterate over
the VAPs.
TODO:
* .. I should just migrate the ieee80211_start_all() work to a
deferred task so it can be done later; it doesn't have to be
immediately done.
Tested:
* AR5416, STA mode
then threads can sleep on the pip condition.
Avoid deadlocking such threads by correctly awakening the sleeping ones
after the pip is finished.
swapoff side of the bug can likely result in shutdown deadlocks.
Sponsored by: EMC / Isilon Storage Division
Reported by: pho, pluknet
Tested by: pho
- More flexible cluster size selection, including the ability to fall
back to a safe cluster size (PAGE_SIZE from zone_jumbop by default) in
case an allocation of a larger size fails.
- A single get_fl_payload() function that assembles the payload into an
mbuf chain for any kind of freelist. This replaces two variants: one
for freelists with buffer packing enabled and another for those without.
- Buffer packing with any sized cluster. It was limited to 4K clusters
only before this change.
- Enable buffer packing for TOE rx queues as well.
- Statistics and tunables to go with all these changes. The driver's
man page will be updated separately.
MFC after: 5 weeks
mbuf should be owned by if_transmit function in any case.
Submitted by: Andrew Rybchenko <Andrew.Rybchenko at oktetlabs.ru>
Sponsored by: Solarflare Communications, Inc.
The NetBSD Foundation states "Third parties are encouraged to change the
license on any files which have a 4-clause license contributed to the
NetBSD Foundation to a 2-clause license."
This change removes clauses 3 and 4 from copyright / license blocks that
list The NetBSD Foundation as the only copyright holder.
Sponsored by: The FreeBSD Foundation
controller initialization.
The spec says OS drivers should send this command after controller
initialization completes successfully, but other NVMe OS drivers are
not sending this command. This change will therefore reduce differences
between the FreeBSD and other OS drivers.
Sponsored by: Intel
MFC after: 3 days
Replace usage of db_active in Xen console with kdb_active.
Reported by: Andrzej Tobola <ato@iem.pw.edu.pl>
Approved by: gibbs
Sponsored by: Citrix Systems R&D
As a prerequisite for multiple queues, the guest must have MSIX enabled.
Unfortunately, to work around device passthrough bugs, FreeBSD disables
MSIX when running as a VMware guest due to the hw.pci.honor_msi_blacklist
tunable; this tunable must be disabled for multiple queues.
Also included are various minor changes from the projects/vmxnet branch.
MFC after: 1 month
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.
MFC after: 3 weeks
Add support for MSI interrupts in the puc(9) driver. By default the driver
will prefer MSI interrupts to legacy interrupts. A tunable,
hw.puc.msi_disable, has been added to force the allocation of legacy
interrupts.
Reviewed by: jhb@
MFC after: 2 weeks
Sponsored by: Sandvine Inc.
interface, in the r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.
o Remove the if_baudrate_pf crutch.
o Make all fields of struct if_data a fixed, machine-independent size. The
kinds of data it holds (packet counters, etc) are by no means MD, and it
is a bug that on amd64 we have 64-bit counters while on i386 they are
32-bit, which at modern speeds overflow within a second.
This also removes quite a lot of COMPAT_FREEBSD32 code.
o Give 16 bits to the ifi_datalen field. This field was provided to
make future changes to if_data less ABI breaking. Unfortunately the
8 bit size of it had effectively limited sizeof if_data to 256 bytes.
o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.
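
To put a number on the overflow concern, here is a small sketch that
computes how long a 32-bit byte counter lasts at a given line rate. The
helper name is made up for illustration, and it assumes the link runs at
full line rate.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds until a 32-bit byte counter wraps at a given line rate. */
static uint64_t
wrap_time_ms(uint64_t bits_per_sec)
{
	const uint64_t counter_range = UINT64_C(1) << 32;	/* bytes */

	assert(bits_per_sec > 0);
	return (counter_range * 8 * 1000 / bits_per_sec);
}
```

At 40 Gbit/s this gives 858 ms, so a 32-bit byte counter wraps in under a
second; at 1 Gbit/s it still wraps after roughly 34 seconds.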
__FreeBSD_version bumped.
Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
When running as a PVH guest, there's no emulated i8254, so we need to
use the Xen PV timer as the early source for DELAY. This change allows
for different implementations of the early DELAY function and
implements a Xen variant for it.
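
Roughly, the hook mechanism looks like the sketch below. The struct layout,
the stub delay functions and early_delay_sketch() are simplified stand-ins
chosen for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of an init_ops-style hook table. */
struct init_ops {
	void (*early_clock_source_init)(void);
	void (*early_delay)(int usec);
};

static int i8254_calls, xen_calls;

/* Stubs standing in for the real clock sources. */
static void i8254_init_stub(void) { }
static void i8254_delay_stub(int usec) { (void)usec; i8254_calls++; }
static void xen_clock_init_stub(void) { }
static void xen_delay_stub(int usec) { (void)usec; xen_calls++; }

/* Bare metal default: use the i8254. */
static struct init_ops init_ops = {
	.early_clock_source_init = i8254_init_stub,
	.early_delay = i8254_delay_stub,
};

/* A PVH guest replaces the hooks early in boot. */
static void
use_xen_pvh_hooks(void)
{
	init_ops.early_clock_source_init = xen_clock_init_stub;
	init_ops.early_delay = xen_delay_stub;
}

/* The generic delay dispatches through whichever hook is installed. */
static void
early_delay_sketch(int usec)
{
	assert(init_ops.early_delay != NULL);
	init_ops.early_delay(usec);
}
```

Callers keep using a single delay entry point, and only the platform setup
code needs to know which clock source is behind it.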
Approved by: gibbs
Sponsored by: Citrix Systems R&D
dev/xen/timer/timer.c:
dev/xen/timer/timer.h:
- Implement Xen early delay functions using the PV timer and declare
them.
x86/include/init.h:
- Add hooks for early clock source initialization and early delay
functions.
i386/i386/machdep.c:
pc98/pc98/machdep.c:
amd64/amd64/machdep.c:
- Set early delay hooks to use the i8254 on bare metal.
- Use clock_init (that will in turn make use of init_ops) to
initialize the early clock source.
amd64/include/clock.h:
i386/include/clock.h:
- Declare i8254_delay and clock_init.
i386/xen/clock.c:
- Rename DELAY to i8254_delay.
x86/isa/clock.c:
- Introduce clock_init that will take care of initializing the early
clock by making use of the init_ops hooks.
- Move non ISA related delay functions to the newly introduced delay
file.
x86/x86/delay.c:
- Add moved delay related functions.
- Implement generic DELAY function that will use the init_ops hooks.
x86/xen/pv.c:
- Set PVH hooks for the early delay related functions in init_ops.
conf/files.amd64:
conf/files.i386:
conf/files.pc98:
- Add delay.c to the kernel build.
This should not introduce any functional change, and makes the
functions suitable to be called before we have actually mapped the
vcpu_info struct on a per-cpu basis.
Approved by: gibbs
Sponsored by: Citrix Systems R&D
dev/xen/timer/timer.c:
- Remove critical_{enter,exit}, the clock code will already be called
with preemption disabled when needed. Add a comment to that regard
in xentimer_get_timecount.
- Allow xen_fetch_vcpu_time to be called with a specific vcpu_info
that will be used to fetch current time.
- Assert that xentimer_et_start will always be called with preemption
disabled.
This adds and enables the PV console used on XEN kernels to
GENERIC/XENHVM kernels in order for it to be used on PVH.
Approved by: gibbs
Sponsored by: Citrix Systems R&D
dev/xen/console/console.c:
- Define console_page.
- Move xc_printf debug function from i386 XEN code to generic console
code.
- Rework xc_printf.
- Use xen_initial_domain instead of open-coded checks for Dom0.
- Gate the attach of the PV console to PV(H) guests.
dev/xen/console/xencons_ring.c:
- Allow the PV Xen console to output earlier by directly signaling
the event channel in start_info if the event channel is not yet
initialized.
- Use HYPERVISOR_start_info instead of xen_start_info.
i386/include/xen/xen-os.h:
- Remove prototype for xc_printf since it's now declared in global
xen-os.h
i386/xen/xen_machdep.c:
- Remove previous version of xc_printf.
- Remove definition of console_page (now it's defined in the console
itself).
- Fix some printf formatting errors.
x86/xen/pv.c:
- Add some early boot debug messages using xc_printf.
- Set console_page based on the value passed in start_info.
xen/xen-os.h:
- Declare console_page and add prototype for xc_printf.
baudrate of the device special file, and makes sure that on open(2) the
UART is programmed with the correct baudrate. This then eliminates the
need in uart_tty_param() to override the speed setting.
private per-chip HAL.
This allows the ah_osdep.[ch] code to check whether the power state is
valid for doing chip programming.
It should be a no-op for normal driver work but it does require a
clean kernel/module rebuild, as the size of HAL structures has changed.
Now, this doesn't track whether the hardware is ACTUALLY awake,
as NETWORK_SLEEP wakes the chip up for a short period when traffic
is received. This doesn't actually set the power mode to AWAKE, so
we have to be careful about how we touch things.
But it's enough to start down the path of implementing station mode
chipset power savings, as a large part of the silliness is making
sure the chip is awake during periodic calibration / ANI and
random places where transmit may be occurring. I'd rather not have a repeat
of debugging power save on ath9k, where races with calibration
and transmit path stuff took a couple years to shake out.
Tested:
* AR5416, STA mode
This fixes kernel panic during boot, caused by incompatibility of recent
CAM locking changes and this bus scanner code.
Submitted by: Microsoft
MFC after: 1 week
Centrino 2230 firmware.
This fixes the general statistics block to be actually valid.
I've verified this by contrasting the output of iwnstats before and
after the change. The general block is now correct.
Tested:
* Intel 5100 (old format stats message)
* Intel 2230 (new format stats message)
(pvid=1) and we already configure them to send to other ports.
Setting pvid=portnum would mean that there were separate vlangroups
for each port, but 'leaking' into other ports. The result? Traffic from
every port was flooded to all other ports.
Tested:
* DB120, AR9344 + AR8327 switch
The OpenWRT AR8xxx switch support flushes the ATU (address translation
unit) after each port link 'up' status change. I've modified this to
just flush on any port transition.
Whilst here, bump the number of ports on the AR8327 to 6, rather than
the default of 5. It's DB120 specific; I'll go and make this configurable
later.
There's some debugging code in here still; I am still debugging whether
this is or isn't working fully.
Tested:
* DB120, AR9344 + AR8327 switch
Obtained from: OpenWRT
This patch does four things:
* it globally disables mirroring;
* it globally sets the mirroring on each port to be disabled;
* the initial port setup now programs a portmask for the port to allow
transmission (forwarding) to all other ports bar itself;
* the vlan setup path now programs the portmask for the port to
allow transmission (forwarding) to all other ports bar itself.
Before this, I hard-coded the portmask to 0x3f which would mean all
ports (bar port 6, which currently isn't hooked up to anything.)
This means that traffic would be duplicated back out the port it
was received on. I bet this wasn't .. optimal.
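
The per-port mask can be sketched as below. The helper name and the nports
parameter are illustrative and not taken from the switch driver.

```c
#include <assert.h>
#include <stdint.h>

/* Forwarding mask for a port: every port on the switch except itself. */
static uint32_t
port_fwd_mask(unsigned int port, unsigned int nports)
{
	uint32_t all;

	assert(nports > 0 && nports < 32 && port < nports);
	all = (UINT32_C(1) << nports) - 1;
	return (all & ~(UINT32_C(1) << port));
}
```

With six ports, the old hard-coded value corresponds to 0x3f for every
port, whereas this yields 0x3e for port 0 and 0x37 for port 3.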
In any case, this _seems_ to make DHCP from my macosx laptop
work through this access point. I'll do some further testing
to ensure it's actually working correctly on all my devices.
Tested:
* DB120, AR8327 switch
It turns out that there's a variant format of the RX statistics notification
from the intel firmware. It's even more whacked - the non-BT variant has
bluetooth fields; apparently some later NICs return even _more_ bluetooth
related fields.
I'll commit the statistics structure changes here - it's a no-op for the
driver. I'll later teach the driver code to populate a statistics structure
from the received message after reformatting things correctly.
I don't _think_ it's going to fix anything related to sensitivity programming
as the CCK/OFDM (non-11n) fields are in the same place for both formats.
But the HT structure and the general statistics aren't in the same place.
I'll go find some NIC(s) that spit out the other format and when I find one,
I'll go and update the driver to handle things correctly.
Tested:
* Intel 5100 (which returns the legacy, non-BT format)
Obtained from: Linux iwlwifi
match the device. Pinctrl will need to be added before this will work,
in addition to migrating the current board_foo.c method of configuring
these pins to something else. Non-FDT systems won't be affected, yet.