* use barriers in a slightly better fashion. You can blame this
glass of whiskey on putting barriers in the wrong spot. Grr adrian.
* steal/rewrite the mdio busy check from ag7100 from openwrt and
refactor the existing code out. This is .. more correct.
This seems to fix the boot-to-boot variation that I've been seeing
and it quietens the switch port status flapping.
Tested:
* QCA9558 SoC (AP135.)
Obtained from: Linux OpenWRT
This driver and the linux ag71xx driver both treat the transmit ring
as a circular linked list of descriptors. There's no "end" pointer
that is ever NULL - instead, it expects the MAC to hit a finished
descriptor (ARGE_DESC_EMPTY) and stop.
Now, since it's a circular buffer, we may end up with the hardware
hitting the beginning of our multi-descriptor frame before we've finished
setting it up. It then DMA's it in, starts sending it, and we finish
writing out the new descriptor. The hardware may then write its
completion for the next descriptor out; then we do, and when we next
read it it'll show up as "not done" and transmit completion stops.
This unfortunately manifests itself as the transmit queue always
being active and a massive TX interrupt storm. We need to actively
ACK packets back from the transmit engine and if we don't (eg because
we think the transmit isn't finished but it is) then the unit will
just keep generating interrupts.
I hit this finally with the below testing setup. This fixed it for me.
Strictly speaking I should put in a sync in between writing out all of
the descriptors and writing out that final descriptor.
Tested:
* QCA9558 SoC (AP135 reference board) w/ arge1 + vlans acting as a
router, and iperf -d (tcp, bidirectional traffic.)
Obtained from: Linux OpenWRT (ag71xx_main.c.)
The MIPS busdma sync operations currently are a big no-op on coherent memory.
This isn't strictly correct behaviour as we need a SYNC in here to ensure that
the writes have finished and are visible in main memory before the MMIO accesses
occur. This will have to be addressed in a later commit.
But, before that happens, let's at least do a flush here to make things
more "correct".
This is required for even remotely sensible behaviour on mips74k with
write-through memory enabled.
The mips74k programmers guide notes that reads can be re-ordered, even
uncached ones, so we need an explicit SYNC between them.
Yes, this is a case of a driver author actively doing a bus barrier
operation.
This ends up being necessary when the mips74k core is run in write-back
mode rather than write-through mode. That's coming in an upcoming
commit.
Tested:
* mips74k, QCA9558 SoC (AP135 reference board), arge<->arge interface
routing traffic tests.
send frames.
This matches the other check for space.
"enough" is a misnomer, for "reasons". The biggest reason is that
the TX ring is actually a circular linked list, with no head/tail pointers.
This is just a bit more headroom between head/tail so we have time to
schedule frames before we hit where the hardware is at.
Ideally this would be tunable and a little larger.
This should make it easier to track down interrupt storms from arge.
Tested:
* AP135 (QCA955x) SoC - defaults to ARGE_DEBUG enabled
* Carambola2 (AR9331 SoC) - defaults to ARGE_DEBUG disabled
I couldn't test arge0->arge1 bridging, only arge0 VLAN bridging.
The DIR-825C1 only hooks up arge0 to the switch GMAC0 and so
you need to abuse VLANs to test.
Tested:
* DIR-825C1 (AR9344)
This part seems to work bug-free with single byte TX/RX buffer alignment.
This drops the CPU requirement to bridge 100mbit iperf from 100% CPU
to ~ 50% CPU.
Tested:
* AP121 (AR9330) SoC, highly magic netbooted kernel + USB rootfs
due to 4mb flash, 16mb RAM; doing bridging between arge0 and arge1.
Notes:
* Yes, I likely can also turn this on for the AR934x SoC family now.
But since hardware design apparently follows similar branching
strategies to software design, I'll go and make sure all the AR934x's
that made it out into shipping products work before I flip it on.
This was triggering when using it as an AP bridge rather than an ethernet
bridge.
The code is unclear but it works; I'll fix it to be clearer and test
performance at a later stage.
The existing code meets the "alignment" requirement for the l3 payload
by offsetting the mbuf by uint64_t and then calling an rx fixup routine
to copy the frame backwards by 2 bytes. This DWORD aligns the
L3 payload so tcp, etc doesn't panic on unaligned access.
This is .. slow.
For arge MACs that support 1 byte TX/RX address alignment, we can do
the "other" hack: offset the RX address of the mbuf so the L3 payload
again is hopefully DWORD aligned.
This is much cheaper - since TX/RX is both 1 byte align ready (thanks
to the previous commit) there's no bounce buffering going on and there
is no rx fixup copying.
This gets bridging performance up from 180mbit/sec -> 410mbit/sec.
There's around 10% of CPU cycles spent in _bus_dmamap_sync(); I'll
investigate that later.
Tested:
* QCA955x SoC (AP135 reference board), bridging arge0/arge1
by programming the switch to have two vlangroups in dot1q mode:
# ifconfig bridge0 inet 192.168.2.20/24
# etherswitchcfg config vlan_mode dot1q
# etherswitchcfg vlangroup0 members 0,1,2,3,4
# etherswitchcfg vlangroup1 vlan 2 members 5,6
# etherswitchcfg port5 pvid 2
# etherswitchcfg port6 pvid 2
# ifconfig arge1 up
# ifconfig bridge0 addm arge1
The early ethernet MACs (I think AR71xx and AR913x) require that both
TX and RX require 4-byte alignment for all packets.
The later MACs have started relaxing the requirements.
For now, the 1-byte TX and 1-byte RX alignment requirements are only for
the QCA955x SoCs. I'll add in the relaxed requirements as I review the
datasheets and do testing.
* Add a hardware flags field and 1-byte / 4-byte TX/RX alignment.
* .. defaulting to 4-byte TX and 4-byte RX alignment.
* Only enforce the TX alignment fixup if the hardware requires a 4-byte
TX alignment. This avoids a call to m_defrag().
* Add counters for various situations for further debugging.
* Set the 1-byte and 4-byte busdma alignment requirement when
the tag is created.
This improves the straight bridging performance from 130mbit/sec
to 180mbit/sec, purely by removing the need for TX path bounce buffers.
The main performance issue is the RX alignment requirement and any RX
bounce buffering that's occuring. (In a local test, removing the RX
fixup path and just aligning buffers raises the performance to above
400mbit/sec.
In theory it's a no-op for SoCs before the QCA955x.
Tested:
* QCA9558 SoC in AP135 board, using software bridging between arge0/arge1.
and start teaching subsystems about it.
The Atheros MIPS platforms don't guarantee any kind of FIFO consistency
with interrupts in hardware. So software needs to do a flush when it
receives an interrupt and before it calls the interrupt handler.
There are new ones for the QCA934x and QCA955x, so do a few things:
* Get rid of the individual ones (for ethernet and IP2);
* Create a mux and enum listing all the variations on DDR flushes;
* replace the uses of IP2 with the relevant one (which will typically
be "PCI" here);
* call the USB DDR flush before calling the real USB interrupt handlers;
* call the ethernet one upon receiving an interrupt that's for us,
rather than never calling it during operation.
Tested:
* QCA9558 (TP-Link archer c7 v2)
* AR9331 (Carambola 2)
TODO:
* PCI, USB, ethernet, etc need to do a double-check to see if the
interrupt was truely for them before doing the DDR. For now I
prefer "correct" over "fast".
A lot of these dinky atheros based MIPS boards don't have a nice, well,
anything consistent defining their MAC addresses for things.
The Atheros reference design boards will happily put MAC addresses
into the wifi module calibration data like they should, and individual
ethernet MAC addresses into the calibration area in flash.
That makes my life easy - "hint.arge.X.eeprommac=<addr>" reads from
that flash address to extract a MAC, and everything works fine.
However, aside from some very well behaved vendors (eg the Carambola 2
board), everyone else does something odd.
eg:
* a MAC address in the environment (eg ubiquiti routerstation/RSPRO)
that you derive arge0/arge1 MAC addresses from.
* a MAC address in flash that you derive arge0/arge1 MAC addresses from.
* The wifi devices having their own MAC addresses in calibration data,
like normal.
* The wifi devices having a fixed, default or garbage value for a MAC
address in calibration data, and it has to be derived from the
system MAC.
So to support this complete nonsense of a situation, there needs to be
a few hacks:
* The "board" MAC address needs to be derived from somewhere and squirreled
away. For now it's either redboot or a MAC address stored in calibration
flash.
* Then, a "map" set of hints to populate kenv with some MAC addresses
that are derived/local, based on the board address. Each board has
a totally different idea of what you do to derive things, so each
map entry has an "offset" (+ve or -ve) that's added to the board
MAC address.
* Then if_arge (and later, if_ath) should check kenv for said hint and
if it's found, use that rather than the EEPROM MAC address - which may
be totally garbage and not actually work right.
In order to do this, I've undone some of the custom redboot expecting
hacks in if_arge and the stuff that magically adds one to the MAC
address supplied by the board - instead, as I continue to test this
out on more hardware, I'll update the hints file with a map explaining
(a) where the board MAC should come from, and (b) what offsets to use
for each device.
The aim is to have all of the tplink, dlink and other random hardware
we run on have valid MAC addresses at boot, so (a) people don't get
random B:S:D❌x:x ethernet MACs, and (b) the wifi MAC is valid
so it works rather than trying to use an invalid address that
actually upsets systems (think: multicast bit set in BSSID.)
Tested:
* TP-Link TL_WDR3600 - subsequent commits will add the hints map
and the if_ath support.
TODO:
* Since this is -HEAD, and I'm all for debugging, there's a lot of
printf()s in here. They'll eventually go under bootverbose.
* I'd like to turn the macaddr routines into something available
to all drivers - too many places hand-roll random MAC addresses
and parser stuff. I'd rather it just be shared code.
However, that'll require more formal review.
* More boards.
Otherwise, the initial media speed would change if a PHY is hooked up,
sending PHY speed notifications. For the AP135 at least, the RGMII
PHY has a static speed/duplex configured and if the PHY plumbing
attaches the PHY to the if_arge interface, the first link speed change
from 1000/full will set the MAC to something that isn't useful.
This shouldn't affect any other platforms - everything I looked at is
using hard-coded speed/duplex as static, as they're facing a switch
with no PHY attached.
handle packets up to 1536 bytes)
This fixes the need to frag that could happen when using vlans on top of
if_arge (which is a common case for the use the switch ports as individual
NICs).
Previously to this commit any vlan setup with if_arge as parent would have
the MTU of the parent interface reduced by the size of dot1q header
(4 bytes).
Tested on TP-Link 1043ND (where the WAN port is just a switch port setup to
tag packets in a different VLAN than the LAN ports).
Reported and tested by: Harm Weites (harm at weites.com)
In particular, don't check the value of the bus_dma map against NULL
to determine if either bus_dmamem_alloc() or bus_dmamap_load() succeeded.
Instead, assume that bus_dmamap_load() succeeeded (and thus that
bus_dmamap_unload() should be called) if the bus address for a resource
is non-zero, and assume that bus_dmamem_alloc() succeeded (and thus
that bus_dmamem_free() should be called) if the virtual address for a
resource is not NULL.
In many cases these bugs could result in leaks when a driver was detached.
Reviewed by: yongari
MFC after: 2 weeks
'eeprommac'.
The existing driver would just make arge units past 0 take the primary
MAC and increment it by the unit number, without correct address wrapping.
That has to be fixed at a later date.
Tested:
* Atheros DB120 reference obard
return BUS_PROBE_NOWILDCARD from their probe routines to avoid claiming
wildcard devices on their parent bus. Do a sweep through the MIPS tree.
MFC after: 2 weeks
The MDIO bus frequency is configured as a divisor off of the MDIO bus
reference clock. For the AR9344 and later, the MDIO bus frequency can
be faster than normal (ie, up to 100MHz) and thus a static divisor may
not be very applicable.
So, for those boards that may require an actual frequency to be selected
regardless of what crazy stuff the vendor throws in uboot, one can now
set the MDIO bus frequency. It uses the MDIO frequency and the target
frequency to choose a divisor that doesn't exceed the target frequency.
By default it will choose:
* DIV_28 on everything; except
* DIV_58 on the AR9344 to be conservative.
Whilst I'm here, add some comments about the defaults being not quite
right. For the other internal switch devices (like the AR933x, AR724x)
the divisor can be higher - it's internal and the reference MDIO clock
is much lower than 100MHz.
The divisor tables and loop code is inspired from Linux/OpenWRT. It's very
simple; I didn't feel that reimplementing it would yield a substantially
different solution.
Tested:
* AR9331 (mips24k)
* AR9344 (mips74k)
Obtained from: Linux/OpenWRT
up and running.
* The MAC FIFO configurations needed updating;
* Reset the MDIO block at the same time the MAC block is reset;
* The default divisor needs changing as the DB120 runs at a higher
base MDIO bus clock compared to other chips.
The long-term fix is to allow the system to have a target MDIO bus
clock rate and then calculate the most suitable divider to meet
that. This will likely need implementing before stable external
PHY or switch support can be committed.
Tested:
* AR9344 (mips74k)
* AR9331 (mips24k)
add some packet(s) to tx ring and arge_stop() is called before receive the
sent packet interrupt from hardware. Fix arge_stop() to unload the in use
dma tags and free the associated mbuf.
PR: 178319, 163670
Approved by: adrian (mentor)
(re)start the interface when it is down. This change fix a race with
BOOTP where the response packet is lost because the interface is being
reset by a netmask change right after send the packet.
PR: 178318
Approved by: adrian (mentor)
form xx:xx:xx:xx:xx:xx complete with ":" characters taking of 18 bytes
instead of 6 integers. Expose a "readascii" tuneable to handle this case.
Remove restriction on eepromac assignement for the first dev instance only.
Add eepromac address for DIR-825 to hints file.
Add readascii hint for DIR-825
Reviewed by: adrian@
This seems to break at least my test board here (AR71xx + AR8316 switch
PHY). Since I do have a whole sleuth of "normal" PHY boards (with
an AR71xx on a normal PHY port), I'll do some further testing with those
to determine whether this is a general issue, or whether it's limited
to the behaviour of the "fake" dedicated PHY port mode on these atheros
switches.
* Flesh out the PLL configuration fetch function, which will return the PLL
configuration based on the unit number and speed.
* Remove the PLL speed config logic from the AR71xx/AR91xx chip PLL config
function - pass in a 'pll' value instead.
* Modify arge_set_pll() to:
+ fetch the PLL configuration
+ write the PLL configuration
+ update the MII speed configuration.
This will allow if_arge to override the PLL configuration as required.
Obtained from: Linux/Atheros/OpenWRT
This is only done if the ARGE_MDIO option is included.
* Shuffle the arge MDIO bus into a separate device, that needs to be
probed early (use hint.argemdio.X.order=0)
* hint.arge.X.mdio now specifies which miiproxy to rendezvous with.
* Call MAC/MDIO bus init during MDIO attach, not arge attach.
This is done regardless:
* Shift the arge MAC and MDIO bus reset code into separate functions
and call it early during MDIO bus attach. It's required for
correct MDIO bus IO to occur on AR71xx/AR91xx devices.
* Remove the AR71xx/AR91xx centric assumption that there's only one
MDIO bus. The initial code mapped miibus0(arge0) and miibus1(arge1)
MII register operations to the MII0 (arge0) register space. The
AR724x (and later, upcoming chipsets) have two MDIO busses and
the second is very much in use.
TODO:
* since the multiphy behaviour has changed (where now a phymask of >1
PHY will still be enumerated), multiphy setups may be quite wrong.
I'll go and fix these so they still have a chance of working, at least.
until the switch PHY support appears in -HEAD.
Submitted by: Stefan Bethke <stb@lassitu.de>
function.
From the submitter:
This patch fixes an issue I encountered using an NFS root with an
ar71xx-based MikroTik RouterBoard 450G on -current where the kernel fails
to contact a DHCP/BOOTP server via if_arge when it otherwise should be able
to. This may be the same issue that Monthadar Al Jaberi reported against
an RSPRO on 6 March, as the signature is the same:
%%%
DHCP/BOOTP timeout for server 255.255.255.255
DHCP/BOOTP timeout for server 255.255.255.255
DHCP/BOOTP timeout for server 255.255.255.255
.
.
.
DHCP/BOOTP timeout for server 255.255.255.255
DHCP/BOOTP timeout for server 255.255.255.255
arge0: initialization failed: no memory for rx buffers
DHCP/BOOTP timeout for server 255.255.255.255
arge0: initialization failed: no memory for rx buffers
%%%
The primary issue that I found is that the DHCP/BOOTP message that
bootpc_call() is sending never makes it onto the wire, which I believe is
due to the following:
- Last December, a change was made to the ifioctl that bootpc_call() uses
to adjust the netmask around the sosend().
- The new ioctl (SIOCAIFADDR) performs an if_init when invoked, whereas the
old one (SIOCSIFNETMASK) did not.
- if_arge maintains its own sense of link state in sc->arge_link_status.
- On a single-phy interface, sc->arge_link_status is initialized to 0 in
arge_init_locked().
- sc->arge_link_status remains 0 until a phy state change notification
causes arge_link_task to run, notice the link is up, and set it to 1.
- The inits caused by the ifioctls in bootpc_call are reinitializing the
interface, but not the phy, so sc->arge_link_status goes to 0 and remains
there.
- arge_start_locked() always sees sc->arge_link_status == 0 and returns
without queuing anything.
The attached patch changes arge_init_locked() such that in the single-phy
case, instead of initializing sc->arge_link_status to 0, it runs
arge_link_task() to set it according to the current phy state. This change
has allowed my setup to mount an NFS root successfully.
Submitted by: Patrick Kelsey <kelsey@ieee.org>
Reviewed by: juli
I had some interesting hangs until I realised I should try flushing the
DDR FIFO register and lo and behold, hangs stopped occuring.
I've put in a few DDR flushes here and there in case people decide to
reuse some of these functions. It's very very likely they're almost
all superflous.
To test:
* Connect to a network with a _lot_ of broadcast traffic
* Do this:
# while true; do ifconfig arge0 down; ifconfig arge0 up; done
This fixes the mbuf exhaustion that has been reported when the interface
state flaps up/down.
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.
Because driver is accessing a common MII structure in
mii_pollstat(), updating user supplied structure should be done
before dropping a driver lock.
Reported by: Karim (fodillemlinkarimi <> gmail dot com)
(reporting IFM_LOOP based on BMCR_LOOP is left in place though as
it might provide useful for debugging). For most mii(4) drivers it
was unclear whether the PHYs driven by them actually support
loopback or not. Moreover, typically loopback mode also needs to
be activated on the MAC, which none of the Ethernet drivers using
mii(4) implements. Given that loopback media has no real use (and
obviously hardly had a chance to actually work) besides for driver
development (which just loopback mode should be sufficient for
though, i.e one doesn't necessary need support for loopback media)
support for it is just dropped as both NetBSD and OpenBSD already
did quite some time ago.
- Let mii_phy_add_media() also announce the support of IFM_NONE.
- Restructure the PHY entry points to use a structure of entry points
instead of discrete function pointers, and extend this to include
a "reset" entry point. Make sure any PHY-specific reset routine is
always used, and provide one for lxtphy(4) which disables MII
interrupts (as is done for a few other PHYs we have drivers for).
This includes changing NIC drivers which previously just called the
generic mii_phy_reset() to now actually call the PHY-specific reset
routine, which might be crucial in some cases. While at it, the
redundant checks in these NIC drivers for mii->mii_instance not being
zero before calling the reset routines were removed because as soon
as one PHY driver attaches mii->mii_instance is incremented and we
hardly can end up in their media change callbacks etc if no PHY driver
has attached as mii_attach() would have failed in that case and not
attach a miibus(4) instance.
Consequently, NIC drivers now no longer should call mii_phy_reset()
directly, so it was removed from EXPORT_SYMS.
- Add a mii_phy_dev_attach() as a companion helper to mii_phy_dev_probe().
The purpose of that function is to perform the common steps to attach
a PHY driver instance and to hook it up to the miibus(4) instance and to
optionally also handle the probing, addition and initialization of the
supported media. So all a PHY driver without any special requirements
has to do in its bus attach method is to call mii_phy_dev_attach()
along with PHY-specific MIIF_* flags, a pointer to its PHY functions
and the add_media set to one. All PHY drivers were updated to take
advantage of mii_phy_dev_attach() as appropriate. Along with these
changes the capability mask was added to the mii_softc structure so
PHY drivers taking advantage of mii_phy_dev_attach() but still
handling media on their own do not need to fiddle with the MII attach
arguments anyway.
- Keep track of the PHY offset in the mii_softc structure. This is done
for compatibility with NetBSD/OpenBSD.
- Keep track of the PHY's OUI, model and revision in the mii_softc
structure. Several PHY drivers require this information also after
attaching and previously had to wrap their own softc around mii_softc.
NetBSD/OpenBSD also keep track of the model and revision on their
mii_softc structure. All PHY drivers were updated to take advantage
as appropriate.
- Convert the mebers of the MII data structure to unsigned where
appropriate. This is partly inspired by NetBSD/OpenBSD.
- According to IEEE 802.3-2002 the bits actually have to be reversed
when mapping an OUI to the MII ID registers. All PHY drivers and
miidevs where changed as necessary. Actually this now again allows to
largely share miidevs with NetBSD, which fixed this problem already
9 years ago. Consequently miidevs was synced as far as possible.
- Add MIIF_NOMANPAUSE and mii_phy_flowstatus() calls to drivers that
weren't explicitly converted to support flow control before. It's
unclear whether flow control actually works with these but typically
it should and their net behavior should be more correct with these
changes in place than without if the MAC driver sets MIIF_DOPAUSE.
Obtained from: NetBSD (partially)
Reviewed by: yongari (earlier version), silence on arch@ and net@
levels. TX would hang, RX wouldn't. A bit of digging showed the interface
send queue was full, but IFF_DRV_OACTIVE was clear and the hardware TX
queue was empty.
It turns out that there wasn't a check to drain the interface send
queue once hardware TX had completed, so if the interface send queue
had filled up in the meantime, subsequent packets would be dropped
by the higher layers and if_start (and thus arge_start()) would never
be called.
The fix is simple - call arge_start_locked() in the software interrupt
handler after the hardware TX queue has been handled or a TX underrun
occured. This way the interface send queue gets drained.