I couldn't test arge0->arge1 bridging, only arge0 VLAN bridging.
The DIR-825C1 only hooks up arge0 to the switch GMAC0 and so
you need to abuse VLANs to test.
Tested:
* DIR-825C1 (AR9344)
This part seems to work bug-free with single byte TX/RX buffer alignment.
This drops the CPU requirement to bridge 100mbit iperf from 100% CPU
to ~ 50% CPU.
Tested:
* AP121 (AR9330) SoC, highly magic netbooted kernel + USB rootfs
due to 4mb flash, 16mb RAM; doing bridging between arge0 and arge1.
Notes:
* Yes, I likely can also turn this on for the AR934x SoC family now.
But since hardware design apparently follows similar branching
strategies to software design, I'll go and make sure all the AR934x's
that made it out into shipping products work before I flip it on.
This was triggering when using it as an AP bridge rather than an ethernet
bridge.
The code is unclear but it works; I'll fix it to be clearer and test
performance at a later stage.
The existing code meets the "alignment" requirement for the l3 payload
by offsetting the mbuf by uint64_t and then calling an rx fixup routine
to copy the frame backwards by 2 bytes. This DWORD aligns the
L3 payload so tcp, etc doesn't panic on unaligned access.
This is .. slow.
For arge MACs that support 1 byte TX/RX address alignment, we can do
the "other" hack: offset the RX address of the mbuf so the L3 payload
again is hopefully DWORD aligned.
This is much cheaper - since TX/RX is both 1 byte align ready (thanks
to the previous commit) there's no bounce buffering going on and there
is no rx fixup copying.
This gets bridging performance up from 180mbit/sec -> 410mbit/sec.
There's around 10% of CPU cycles spent in _bus_dmamap_sync(); I'll
investigate that later.
Tested:
* QCA955x SoC (AP135 reference board), bridging arge0/arge1
by programming the switch to have two vlangroups in dot1q mode:
# ifconfig bridge0 inet 192.168.2.20/24
# etherswitchcfg config vlan_mode dot1q
# etherswitchcfg vlangroup0 members 0,1,2,3,4
# etherswitchcfg vlangroup1 vlan 2 members 5,6
# etherswitchcfg port5 pvid 2
# etherswitchcfg port6 pvid 2
# ifconfig arge1 up
# ifconfig bridge0 addm arge1
The early ethernet MACs (I think AR71xx and AR913x) require that both
TX and RX require 4-byte alignment for all packets.
The later MACs have started relaxing the requirements.
For now, the 1-byte TX and 1-byte RX alignment requirements are only for
the QCA955x SoCs. I'll add in the relaxed requirements as I review the
datasheets and do testing.
* Add a hardware flags field and 1-byte / 4-byte TX/RX alignment.
* .. defaulting to 4-byte TX and 4-byte RX alignment.
* Only enforce the TX alignment fixup if the hardware requires a 4-byte
TX alignment. This avoids a call to m_defrag().
* Add counters for various situations for further debugging.
* Set the 1-byte and 4-byte busdma alignment requirement when
the tag is created.
This improves the straight bridging performance from 130mbit/sec
to 180mbit/sec, purely by removing the need for TX path bounce buffers.
The main performance issue is the RX alignment requirement and any RX
bounce buffering that's occuring. (In a local test, removing the RX
fixup path and just aligning buffers raises the performance to above
400mbit/sec.
In theory it's a no-op for SoCs before the QCA955x.
Tested:
* QCA9558 SoC in AP135 board, using software bridging between arge0/arge1.
When the system has more than a single PCI domain, the bus numbers
are not unique, thus they cannot be used for "pci" device numbering.
Change bus numbers to -1 (i.e. to-be-determined automatically)
wherever the code did not care about domains.
Reviewed by: jhb
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3406
and start teaching subsystems about it.
The Atheros MIPS platforms don't guarantee any kind of FIFO consistency
with interrupts in hardware. So software needs to do a flush when it
receives an interrupt and before it calls the interrupt handler.
There are new ones for the QCA934x and QCA955x, so do a few things:
* Get rid of the individual ones (for ethernet and IP2);
* Create a mux and enum listing all the variations on DDR flushes;
* replace the uses of IP2 with the relevant one (which will typically
be "PCI" here);
* call the USB DDR flush before calling the real USB interrupt handlers;
* call the ethernet one upon receiving an interrupt that's for us,
rather than never calling it during operation.
Tested:
* QCA9558 (TP-Link archer c7 v2)
* AR9331 (Carambola 2)
TODO:
* PCI, USB, ethernet, etc need to do a double-check to see if the
interrupt was truely for them before doing the DDR. For now I
prefer "correct" over "fast".
The QCA955x looks a lot like the AR724x PCIe controller, except it
supports two root complexes. Unfortunately I only have one, so
although this code has started down the path of supporting more than
one, it's definitely not yet ready.
Tested:
* AP135 board (QCA9558 SoC), with the 11ac NIC swapped for an AR9380
PCIe NIC.
Notes:
* Yes, this driver isn't very pretty. I decided to commit what I have
versus holding onto something that isn't yet finished. It is enough
to bring up the above NIC and interrupt routing works, so it's a good
start.
* However, yes, the DDR flush routine hooks need to be fixed up.
I don't think I'm firing the right one at the moment.
This is needed with the pl011 driver. Before this change it would default
to a shift of 0, however the hardware places the registers at 4-byte
addresses meaning the value should be 2.
This patch fixes this for the pl011 when configured using the fdt. The
other drivers have a default value of 0 to keep this a no-op.
MFC after: 1 week
A lot of these dinky atheros based MIPS boards don't have a nice, well,
anything consistent defining their MAC addresses for things.
The Atheros reference design boards will happily put MAC addresses
into the wifi module calibration data like they should, and individual
ethernet MAC addresses into the calibration area in flash.
That makes my life easy - "hint.arge.X.eeprommac=<addr>" reads from
that flash address to extract a MAC, and everything works fine.
However, aside from some very well behaved vendors (eg the Carambola 2
board), everyone else does something odd.
eg:
* a MAC address in the environment (eg ubiquiti routerstation/RSPRO)
that you derive arge0/arge1 MAC addresses from.
* a MAC address in flash that you derive arge0/arge1 MAC addresses from.
* The wifi devices having their own MAC addresses in calibration data,
like normal.
* The wifi devices having a fixed, default or garbage value for a MAC
address in calibration data, and it has to be derived from the
system MAC.
So to support this complete nonsense of a situation, there needs to be
a few hacks:
* The "board" MAC address needs to be derived from somewhere and squirreled
away. For now it's either redboot or a MAC address stored in calibration
flash.
* Then, a "map" set of hints to populate kenv with some MAC addresses
that are derived/local, based on the board address. Each board has
a totally different idea of what you do to derive things, so each
map entry has an "offset" (+ve or -ve) that's added to the board
MAC address.
* Then if_arge (and later, if_ath) should check kenv for said hint and
if it's found, use that rather than the EEPROM MAC address - which may
be totally garbage and not actually work right.
In order to do this, I've undone some of the custom redboot expecting
hacks in if_arge and the stuff that magically adds one to the MAC
address supplied by the board - instead, as I continue to test this
out on more hardware, I'll update the hints file with a map explaining
(a) where the board MAC should come from, and (b) what offsets to use
for each device.
The aim is to have all of the tplink, dlink and other random hardware
we run on have valid MAC addresses at boot, so (a) people don't get
random B:S:D❌x:x ethernet MACs, and (b) the wifi MAC is valid
so it works rather than trying to use an invalid address that
actually upsets systems (think: multicast bit set in BSSID.)
Tested:
* TP-Link TL_WDR3600 - subsequent commits will add the hints map
and the if_ath support.
TODO:
* Since this is -HEAD, and I'm all for debugging, there's a lot of
printf()s in here. They'll eventually go under bootverbose.
* I'd like to turn the macaddr routines into something available
to all drivers - too many places hand-roll random MAC addresses
and parser stuff. I'd rather it just be shared code.
However, that'll require more formal review.
* More boards.
The AR934x (and maybe others in this family) have a more complicated
GPIO mux. The AR71xx just has a single function register for a handful
of "GPIO or X" options, however the AR934x allows for one of roughly
100 behaviours for each GPIO pin.
So, this adds a quick hints based mechanism to configure the output
functions, which is required for some of the more interesting board
configurations. Specifically, some use external LNAs to improve
RX, and without the MUX/output configured right, the 2GHz RX side
will be plain terrible.
It doesn't yet configure the "input" side yet; I'll add that if
it's required.
Tested:
* TP-Link TL-WDR3600, testing 2GHz STA/AP modes, checking some
basic RX sensitivity things (ie, "can I see the AP on the other
side of the apartment that intentionally has poor signal reception
from where I am right now.")
Whilst here, fix a silly bug in the maxpin routine; I was missing
a break.
A lot of these embedded boards don't have a unique MAC address per
device stored somewhere unique - sometimes they'll have one MAC
for both arge NICs; someties they'll have one MAC for both arge NICs
/and/ the ath NICs. In these instances, we need to derive device
specific MAC addresses from the base MAC address.
These functions will be used by some follow-up code that'll slot
into if_arge and if_ath.
Otherwise, the initial media speed would change if a PHY is hooked up,
sending PHY speed notifications. For the AP135 at least, the RGMII
PHY has a static speed/duplex configured and if the PHY plumbing
attaches the PHY to the if_arge interface, the first link speed change
from 1000/full will set the MAC to something that isn't useful.
This shouldn't affect any other platforms - everything I looked at is
using hard-coded speed/duplex as static, as they're facing a switch
with no PHY attached.
There's two EHCI controllers in the QCA955x SoCs - they have different
interrupts available via various demux registers, but they both tie to
IP3.
So for now, allow them to be sharable so they can hang off of IP3.
There's a lot more to come - the QCA955x has a bunch more GPIO MUX
configuration, reminiscent of what the ARM chips let you do - but
it'll have to come later.
The QCA955x has more mux interrupts going on - and the AR934x actually does,
but I cheated and assigned wlan and pcie to the same interrupt line.
They are, there's just a status register mux that I should've been using.
Luckily this isn't too bad a change in itself - almost all of the
Atheros MIPS configurations use a _BASE file to inherit from.
Except PB92, which I should really fix up at some point.
The AR934x will use the legacy apb for now until I write its replacement.
The QCA955x SoC I'm doing bring-up on will have a separate qca955x_apb.c
implementation that includes hooking into IP2/IP3 and doing further
interrupt demuxing as appropriate.
APB mux.
It's larger than the AR71xx because it needs to replace the nexus
for some devices (notably wifi) and the wifi driver (if_ath_ahb.c)
reads the SPI data directly at early boot whilst it's memory mapped
in.
I'm eventually going to rip it out and replace it with a firmware
interface similar to what exists for the if_ath_pci.c path -
something early on (likely something new that I'll write) will
suck in the calibration data into a firmware API blob and that'll
be accessed from if_ath_ahb.c.
But, one thing at a time.
Tested:
* QCA955x SoC, AP135 development board
This adds the initial frequency poking and configures up enough
for it to boot and spit out data over the console.
There's still a whole bunch of work to do in the reset path
and devices to support this thing, but hey, it's alive!
ath> go 0x80050100
## Starting application at 0x80050100 ...
CPU platform: Atheros AR9558 rev 0
CPU Frequency=720 MHz
CPU DDR Frequency=600 MHz
CPU AHB Frequency=200 MHz
platform frequency: 720 MHz
CPU reference clock: 0 MHz
CPU MDIO clock: 40 MHz
Done at: hackathon
Obtained from: Linux OpenWRT, Qualcomm Atheros
There's likely a bunch of register offsets that I have to add the
register window base to before I use them.
Done at: Hackathon
Obtained from: Linux OpenWRT
The AR934x and later (which will turn up eventually) have a new GPIO
output configuration option - a real MUX rather than a "GPIO or this
function."
For now I'm squirreling it away in the CPU code just so it's done -
I may move this to the GPIO layer later.
Specifically, this is required for setting up some boards that have
external receive side LNA (low noise amplifier) that gets switched on/off
by the on-chip wireless MAC. If we don't add this support for those
boards then we'll end up with really poor performance.
(I don't yet have one of those APs, but it'll likely show up in a week.)
Obtained from: Linux OpenWRT