Adrian Chadd 240de6998b arge: don't do the rx fixup copy and just offset the mbuf by 2 bytes
The existing code meets the "alignment" requirement for the l3 payload
by offsetting the mbuf by uint64_t and then calling an rx fixup routine
to copy the frame backwards by 2 bytes.  This DWORD aligns the
L3 payload so tcp, etc doesn't panic on unaligned access.

This is .. slow.

For arge MACs that support 1 byte TX/RX address alignment, we can do
the "other" hack: offset the RX address of the mbuf so the L3 payload
again is hopefully DWORD aligned.

This is much cheaper - since TX/RX is both 1 byte align ready (thanks
to the previous commit) there's no bounce buffering going on and there
is no rx fixup copying.

This gets bridging performance up from 180mbit/sec -> 410mbit/sec.
There's around 10% of CPU cycles spent in _bus_dmamap_sync(); I'll
investigate that later.

Tested:

* QCA955x SoC (AP135 reference board), bridging arge0/arge1
  by programming the switch to have two vlangroups in dot1q mode:

# ifconfig bridge0 inet 192.168.2.20/24
# etherswitchcfg config vlan_mode dot1q
# etherswitchcfg vlangroup0 members 0,1,2,3,4
# etherswitchcfg vlangroup1 vlan 2 members 5,6
# etherswitchcfg port5 pvid 2
# etherswitchcfg port6 pvid 2
# ifconfig arge1 up
# ifconfig bridge0 addm arge1
2015-10-21 01:41:18 +00:00
..
2015-01-05 02:00:41 +00:00
2015-01-05 01:59:44 +00:00
2015-03-02 01:23:59 +00:00
2015-07-03 07:00:24 +00:00