unnecessary filter configuration code in nge_init_locked().
While I'm here add a check for driver running state for multicast
filter handling. Also remove unnecessary assignment to error
variable since it is cleared in the function entry.
Suggested by: brad@OpenBSD.org
In particular, don't check the value of the bus_dma map against NULL
to determine if either bus_dmamem_alloc() or bus_dmamap_load() succeeded.
Instead, assume that bus_dmamap_load() succeeeded (and thus that
bus_dmamap_unload() should be called) if the bus address for a resource
is non-zero, and assume that bus_dmamem_alloc() succeeded (and thus
that bus_dmamem_free() should be called) if the virtual address for a
resource is not NULL.
In many cases these bugs could result in leaks when a driver was detached.
Reviewed by: yongari
MFC after: 2 weeks
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
them, please let me know if not). Most of these are of the form:
static const struct bzzt_type {
[...list of members...]
} const bzzt_devs[] = {
[...list of initializers...]
};
The second const is unnecessary, as arrays cannot be modified anyway,
and if the elements are const, the whole thing is const automatically
(e.g. it is placed in .rodata).
I have verified this does not change the binary output of a full kernel
build (except for build timestamps embedded in the object files).
Reviewed by: yongari, marius
MFC after: 1 week
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.
take advantage of it instead of duplicating it. This reduces the size of
the i386 GENERIC kernel by about 4k. The only potential in-tree user left
unconverted is xe(4), which generally should be changed to use miibus(4)
instead of implementing PHY handling on its own, as otherwise it makes not
much sense to add a dependency on miibus(4)/mii_bitbang(4) to xe(4) just
for the MII bitbang'ing code. The common MII bitbang'ing code also is
useful in the embedded space for using GPIO pins to implement MII access.
- Based on lessons learnt with dc(4) (see r185750), add bus barriers to the
MII bitbang read and write functions of the other drivers converted in
order to ensure the intended ordering. Given that register access via an
index register as well as register bank/window switching is subject to the
same problem, also add bus barriers to the respective functions of smc(4),
tl(4) and xl(4).
- Sprinkle some const.
Thanks to the following testers:
Andrew Bliznak (nge(4)), nwhitehorn@ (bm(4)), yongari@ (sis(4) and ste(4))
Thanks to Hans-Joerg Sirtl for supplying hardware to test stge(4).
Reviewed by: yongari (subset of drivers)
Obtained from: NetBSD (partially)
Because driver is accessing a common MII structure in
mii_pollstat(), updating user supplied structure should be done
before dropping a driver lock.
Reported by: Karim (fodillemlinkarimi <> gmail dot com)
(reporting IFM_LOOP based on BMCR_LOOP is left in place though as
it might provide useful for debugging). For most mii(4) drivers it
was unclear whether the PHYs driven by them actually support
loopback or not. Moreover, typically loopback mode also needs to
be activated on the MAC, which none of the Ethernet drivers using
mii(4) implements. Given that loopback media has no real use (and
obviously hardly had a chance to actually work) besides for driver
development (which just loopback mode should be sufficient for
though, i.e one doesn't necessary need support for loopback media)
support for it is just dropped as both NetBSD and OpenBSD already
did quite some time ago.
- Let mii_phy_add_media() also announce the support of IFM_NONE.
- Restructure the PHY entry points to use a structure of entry points
instead of discrete function pointers, and extend this to include
a "reset" entry point. Make sure any PHY-specific reset routine is
always used, and provide one for lxtphy(4) which disables MII
interrupts (as is done for a few other PHYs we have drivers for).
This includes changing NIC drivers which previously just called the
generic mii_phy_reset() to now actually call the PHY-specific reset
routine, which might be crucial in some cases. While at it, the
redundant checks in these NIC drivers for mii->mii_instance not being
zero before calling the reset routines were removed because as soon
as one PHY driver attaches mii->mii_instance is incremented and we
hardly can end up in their media change callbacks etc if no PHY driver
has attached as mii_attach() would have failed in that case and not
attach a miibus(4) instance.
Consequently, NIC drivers now no longer should call mii_phy_reset()
directly, so it was removed from EXPORT_SYMS.
- Add a mii_phy_dev_attach() as a companion helper to mii_phy_dev_probe().
The purpose of that function is to perform the common steps to attach
a PHY driver instance and to hook it up to the miibus(4) instance and to
optionally also handle the probing, addition and initialization of the
supported media. So all a PHY driver without any special requirements
has to do in its bus attach method is to call mii_phy_dev_attach()
along with PHY-specific MIIF_* flags, a pointer to its PHY functions
and the add_media set to one. All PHY drivers were updated to take
advantage of mii_phy_dev_attach() as appropriate. Along with these
changes the capability mask was added to the mii_softc structure so
PHY drivers taking advantage of mii_phy_dev_attach() but still
handling media on their own do not need to fiddle with the MII attach
arguments anyway.
- Keep track of the PHY offset in the mii_softc structure. This is done
for compatibility with NetBSD/OpenBSD.
- Keep track of the PHY's OUI, model and revision in the mii_softc
structure. Several PHY drivers require this information also after
attaching and previously had to wrap their own softc around mii_softc.
NetBSD/OpenBSD also keep track of the model and revision on their
mii_softc structure. All PHY drivers were updated to take advantage
as appropriate.
- Convert the mebers of the MII data structure to unsigned where
appropriate. This is partly inspired by NetBSD/OpenBSD.
- According to IEEE 802.3-2002 the bits actually have to be reversed
when mapping an OUI to the MII ID registers. All PHY drivers and
miidevs where changed as necessary. Actually this now again allows to
largely share miidevs with NetBSD, which fixed this problem already
9 years ago. Consequently miidevs was synced as far as possible.
- Add MIIF_NOMANPAUSE and mii_phy_flowstatus() calls to drivers that
weren't explicitly converted to support flow control before. It's
unclear whether flow control actually works with these but typically
it should and their net behavior should be more correct with these
changes in place than without if the MAC driver sets MIIF_DOPAUSE.
Obtained from: NetBSD (partially)
Reviewed by: yongari (earlier version), silence on arch@ and net@
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.
For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.
Approved by: re (kib)
MFC after: 6 weeks
o Header file cleanup.
o bus_dma(9) conversion.
- Removed all consumers of vtophys(9) and converted to use
bus_dma(9).
- 64bit DMA support was disabled because DP83821 is not capable
of handling the DMA request. 64bit DMA request on DP83820
requires different descriptor structures and it's hard to
dynamically change descriptor format at run time so I disabled
it. Note, this is the same behavior as previous one but
previously nge(4) didn't explicitly disable 64bit mode on
DP83820.
- Added Tx/Rx descriptor ring alignment requirements(8 bytes
alignment).
- Limit maximum number of Tx DMA segments to 16. In fact,
controller does not seem to have limitations on number of Tx
DMA segments but 16 should be enough for most cases and
m_collapse(9) will handle highly fragmented frames without
consuming a lot of CPU cycles.
- Added Rx buffer alignment requirements(8 bytes alignment). This
means driver should fixup received frames to align on 16bits
boundary on strict-alignment architectures.
- Nuked driver private data structure in descriptor ring.
- Added endianness support code in Tx/Rx descriptor access.
o Prefer faster memory mapped register access to I/O mapped access.
Added fall-back mechanism to use alternative register access.
The hardware supports both memory and I/O mapped access.
o Added suspend/resume methods but it wasn't tested as controller I
have does not support PCI PME.
o Removed swap argument in nge_read_eeprom() since endianness
should be handled after reading EEPROM.
o Implemented experimental 802.3x full-duplex flow-control. ATM
it was commented out but will be activated after we have generic
flow-control framework in mii(4) layer.
o Rearranged promiscuous mode settings and simplified logic.
o Always disable Rx filter prior to changing Rx filter functions as
indicated in DP83820/DP83821 datasheet.
o Added an explicit DELAY in timeout loop of nge_reset().
o Added a sysctl variable dev.nge.%d.int_holdoff to control
interrupt moderation. Valid ranges are 1 to 255(default 1) in
units of 100us. The actual delivery of interrupt would be delayed
based on the sysctl value. The interface has to be brought down
and up again before a change takes effect. With proper tuning
value, users do not need to resort to polling(4) anymore.
o Added ALTQ(4) support.
o Added missing IFCAP_VLAN_HWCSUM as nge(4) can offload Tx/Rx
checksum calculation on VLAN tagged frames as well as VLAN tag
insertion/stripping. Also add IFCAP_VLAN_MTU capability as nge(4)
can handle VLAN tagged oversized frames.
o Fixed media header length for VLAN.
o Rearranged nge_detach routine such that it's now used for general
clean-up routine.
o Enabled MWI.
o Accessing EEPROM takes very long time so read 6 bytes ethernet
address with one call instead of 3 separate accesses.
o Don't set if_mtu in device attach, it's already set in
ether_ifattach().
o Don't do any special things for TBI interface. Remove TBI
specific media handling in the driver and have gentbi(4) handle
it. Add glue code to read/write TBI PHY registers in miibus
method. This change removes a lot of PHY handling code in driver
and now its functionality is handled by mii(4).
o Alignment fixup code is now applied only for strict-alignment
architectures. Previously the code was applied for all
architectures except i386. With this change amd64 will get
instant Rx performance boost.
o When driver fails to allocate a new mbuf, update if_qdrops so
users can see what was wrong in Rx path.
o Added a workaround for a hardware bug which resulted in short
VLAN tagged frames(e.g. ARP) was rejected as if runt frame was
received. With this workaround nge(4) now accepts the short VLAN
tagged frame and nge(4) can take full advantage of hardware VLAN
tag stripping. I have no idea how this bug wasn't known so far,
without the workaround nge(4) may never work on VLAN
environments.
o Fixed Rx checksum offload logic such that it now honors active
interface capability configured with ifconfig(8).
o In nge_start()/nge_txencap(), always leave at least one free
descriptor as indicated in datasheet. Without this the hardware
would be confused with ring descriptor structure(e.g. no clue
for the end of descriptor ring).
o Removed dead-code that checks interrupts on PHY hardware. The
code was designed to detect link state changes but it was
disabled as driving nge_tick clock would break auto-negotiation
timer. This code is no longer needed as nge(4) now uses mii(4)
and link state change handling is done with mii callback.
o Rearranged ethernet address programming logic such that it works
on strict-alignment architectures.
o Added IFCAP_VLAN_HWTAGGING/IFCAP_VLAN_HWCSUM handler in
nge_ioctl() such that the functionality is configurable with
ifconfig(8). DP83820/DP83821 can do checksum offload for VLAN
tagged frames so enable Tx/Rx checksum offload for VLAN
interfaces.
o Simplified IFCAP_POLLING selection logic in nge_ioctl().
o Fixed module unload panic when bpf listeners are active.
o Tx/Rx descriptor ring address uses 64bit DMA address for
readability. High address part of DMA would be 0 as nge(4)
disabled 64bit DMA transfers so it's ok for DP83821.
o Removed volatile keyword in softc as bus_dmamap_sync(9) should
take care of this.
o Removed extra driver private structures in descriptor ring. These
extra elements are not part of descriptor structure. Embedding
private driver structure into descriptor ring is not good idea
as its size may be different on 32bit/64bit architectures.
o Added miibus_linkchg method handler to catch link state changes.
o Removed unneeded nge_ifmedia in softc. All TBI access is handled
in gentbi(4). There is no difference between TBI and non-TBI case
now.
o Removed "gigabit link up" message handling in nge_tick. Link
state change notification is already performed by mii(4) and
checking link state by accessing PHY registers in periodic timer
handler of driver is wrong. All link state and speed/duplex
monitoring should be handled in PHY driver.
o Use our own timer for watchdog instead of if_watchdog/if_timer
interface.
o Added hardware MAC statistics counter, users canget current MAC
statistics from dev.nge.%d.stats sysctl node(%d is unit number of
a device).
o Removed unused macros, NGE_LASTDESC, NGE_MODE, NGE_OWNDESC,
NGE_RXBYTES.
o Increased number of Tx/Rx descriptors from 128 to 256. From my
experience on gigabit ethernet controllers, number of descriptors
should be 256 or higher to get an optimal performance on gigabit
link.
o Increased jumbo frame length to 9022 bytes to cope with other
gigabit ethernet drivers. Experimentation shows no problems with
9022 bytes.
o Removed unused member variables in softc.
o Switched from bus_space_{read|write}_4 to bus_{read|write}_4.
o Added support for WOL.
If these drivers are setting M_VLANTAG because they are stripping the
layer 2 802.1Q headers, then they need to be re-inserting them so any
bpf(4) peers can properly decode them.
It should be noted that this is compiled tested only.
MFC after: 3 weeks
m_pkthdr.ether_vlan. The presence of the M_VLANTAG flag on the mbuf
signifies the presence and validity of its content.
Drivers that support hardware VLAN tag stripping fill in the received
VLAN tag (containing both vlan and priority information) into the
ether_vtag mbuf packet header field:
m->m_pkthdr.ether_vtag = vlan_id; /* ntohs()? */
m->m_flags |= M_VLANTAG;
to mark the packet m with the specified VLAN tag.
On output the driver should check the mbuf for the M_VLANTAG flag to
see if a VLAN tag is present and valid:
if (m->m_flags & M_VLANTAG) {
... = m->m_pkthdr.ether_vtag; /* htons()? */
... pass tag to hardware ...
}
VLAN tags are stored in host byte order. Byte swapping may be necessary.
(Note: This driver conversion was mechanic and did not add or remove any
byte swapping in the drivers.)
Remove zone_mtag_vlan UMA zone and MTAG_VLAN definition. No more tag
memory allocation have to be done.
Reviewed by: thompsa, yar
Sponsored by: TCP/IP Optimization Fundraise 2005
if_watchdog, etc., or in functions used only in these methods.
In all other functions in the driver use device_printf().
- Use __func__ instead of typing function name.
Submitted by: Alex Lyashkov <umka sevcity.net>
case if memory allocation failed.
- Remove fourth argument from VLAN_INPUT_TAG(), that was used
incorrectly in almost all drivers. Indicate failure with
mbuf value of NULL.
In collaboration with: yongari, ru, sam
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.
- Use device_printf() and if_printf() and remove nge_unit.
- Use callout_init_mtx() and remove nge_tick_locked() as nge_tick() is now
always called with the driver lock held.
- Use M_ZERO to contigmalloc() when allocating nge_ldata. It was possible
for the random garbage to be used in certain cases otherwise.
- Cleanup attach error handling including no longer leaking nge_ldata.
- Add locking to the ifmedia callouts.
- Lock accesses to if_hwassist and if_capenable in nge_ioctl().
Submitted by: Yuriy N. Shkandybin jura at networks dot ru (1, 3, 4)
Tested by: Yuriy N. Shkandybin jura at networks dot ru
MFC after: 3 days
opt_device_polling.h
- Include opt_device_polling.h into appropriate files.
- Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that
can be compiled as loadable modules.
Reviewed by: bde
o Axe poll in trap.
o Axe IFF_POLLING flag from if_flags.
o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.
o Make registration and deregistration from polling in a
functional way, insted of next tick/interrupt.
o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.
Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. The are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enabled/disables polling.
A message that kern.polling.enable is deprecated is printed.
Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If there is no, then unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns.This is important to protect from spurious
interrupts.
Reviewed by: ru, sam, jhb
could get an interrupt after we free the ifp, and the interrupt
handler depended on the ifp being still alive, this could, in theory,
cause a crash. Eliminate this possibility by moving the if_free to
after the bus_teardown_intr() call.
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.
Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.
Reviewed by: pjd, bz
MFC after: 7 days
over iteration of their multicast address lists when synchronizing the
hardware address filter with the network stack-maintained list.
Problem reported by: Ed Maste (emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.
This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.
Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.
Reviewed by: sobomax, sam