to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
don't give RX path more priority than TX path.
Also remove infinite loop in interrupt handler and limit number of
iteration to 32. This change addresses system load fluctuations
under high network load.
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.
(reporting IFM_LOOP based on BMCR_LOOP is left in place though as
it might provide useful for debugging). For most mii(4) drivers it
was unclear whether the PHYs driven by them actually support
loopback or not. Moreover, typically loopback mode also needs to
be activated on the MAC, which none of the Ethernet drivers using
mii(4) implements. Given that loopback media has no real use (and
obviously hardly had a chance to actually work) besides for driver
development (which just loopback mode should be sufficient for
though, i.e one doesn't necessary need support for loopback media)
support for it is just dropped as both NetBSD and OpenBSD already
did quite some time ago.
- Let mii_phy_add_media() also announce the support of IFM_NONE.
- Restructure the PHY entry points to use a structure of entry points
instead of discrete function pointers, and extend this to include
a "reset" entry point. Make sure any PHY-specific reset routine is
always used, and provide one for lxtphy(4) which disables MII
interrupts (as is done for a few other PHYs we have drivers for).
This includes changing NIC drivers which previously just called the
generic mii_phy_reset() to now actually call the PHY-specific reset
routine, which might be crucial in some cases. While at it, the
redundant checks in these NIC drivers for mii->mii_instance not being
zero before calling the reset routines were removed because as soon
as one PHY driver attaches mii->mii_instance is incremented and we
hardly can end up in their media change callbacks etc if no PHY driver
has attached as mii_attach() would have failed in that case and not
attach a miibus(4) instance.
Consequently, NIC drivers now no longer should call mii_phy_reset()
directly, so it was removed from EXPORT_SYMS.
- Add a mii_phy_dev_attach() as a companion helper to mii_phy_dev_probe().
The purpose of that function is to perform the common steps to attach
a PHY driver instance and to hook it up to the miibus(4) instance and to
optionally also handle the probing, addition and initialization of the
supported media. So all a PHY driver without any special requirements
has to do in its bus attach method is to call mii_phy_dev_attach()
along with PHY-specific MIIF_* flags, a pointer to its PHY functions
and the add_media set to one. All PHY drivers were updated to take
advantage of mii_phy_dev_attach() as appropriate. Along with these
changes the capability mask was added to the mii_softc structure so
PHY drivers taking advantage of mii_phy_dev_attach() but still
handling media on their own do not need to fiddle with the MII attach
arguments anyway.
- Keep track of the PHY offset in the mii_softc structure. This is done
for compatibility with NetBSD/OpenBSD.
- Keep track of the PHY's OUI, model and revision in the mii_softc
structure. Several PHY drivers require this information also after
attaching and previously had to wrap their own softc around mii_softc.
NetBSD/OpenBSD also keep track of the model and revision on their
mii_softc structure. All PHY drivers were updated to take advantage
as appropriate.
- Convert the mebers of the MII data structure to unsigned where
appropriate. This is partly inspired by NetBSD/OpenBSD.
- According to IEEE 802.3-2002 the bits actually have to be reversed
when mapping an OUI to the MII ID registers. All PHY drivers and
miidevs where changed as necessary. Actually this now again allows to
largely share miidevs with NetBSD, which fixed this problem already
9 years ago. Consequently miidevs was synced as far as possible.
- Add MIIF_NOMANPAUSE and mii_phy_flowstatus() calls to drivers that
weren't explicitly converted to support flow control before. It's
unclear whether flow control actually works with these but typically
it should and their net behavior should be more correct with these
changes in place than without if the MAC driver sets MIIF_DOPAUSE.
Obtained from: NetBSD (partially)
Reviewed by: yongari (earlier version), silence on arch@ and net@
StarFire controller does not require controller reinitialization to
program perfect filters. While here, make driver immediately exit
from interrupt/polling handler if driver reinitialized controller.
PR: kern/87506
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs. This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.
For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.
Approved by: re (kib)
MFC after: 6 weeks
CPU for too long period than necessary. Additively, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations a new generic mechanism can be
implemented in proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.
In order to implement such mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).
Bump __FreeBSD_version in order to signal such situation.
Reviewed by: emaste
Sponsored by: Sandvine Incorporated
checksum offoload by downloading AIC-6915 firmware. Changes are
o Header file cleanup.
o Simplified probe logic.
o s/u_int{8,16,32}_t/uint{8,16,32}_t/g
o K&R -> ANSI C.
o In register access function, added support both memory mapped and
IO space register acccess. The function will dynamically detect
which method would be choosed.
o sf_setperf() was modified to support strict-alignment
architectures.
o Use SF_MII_DATAPORT instead of hardcoded value 0xffff.
o Added link state/speed, duplex changes handling task q. The task q
is also responsible for flow control settings.
o Always hornor link up/down state reported by mii layers. The link
state information is used in sf_start() to determine whether we
got a valid link.
o Added experimental flow-control setup. It was commented out but
will be activated once we have flow-cotrol infrastructure in mii
layer.
o Simplify IFF_UP/IFCAP_POLLING and IFF_PROMISC handling logic. Rx
filter always honors promiscuous mode.
o Implemented suspend/resume methods.
o Reorganized Rx filter routine so promiscuous mode changes doesn't
require interface re-initialization.
o Reimplemnted driver probe routine such that it looks for matching
device from supported hardware list table. This change will help to
add newer hardware revision to the driver.
o Use ETHER_ADDR_LEN instead of hardcoded value.
o Prefer memory space register mapping over I/O space as the hardware
requires lots of register access to get various consumer/producer
index. Failing to get memory space mapping, sf(4) falls back to I/O
space mapping. Use of memory space register mapping requires
somewhat large memory space(512K), though.
o Switch to simpler bus_{read,write}_{1,2,4}.
o Use PCIR_BAR macro to get BARs.
o Program PCI cache line size if the cache line size was set to 0
and enable PCI MWI.
o Add a new sysctl node 'dev.sf.N.stats' that shows various MAC
counters for Rx/Tx statistics.
o Add a sysctl node to configure interrupt moderation timer. The
timer defers interrupts generation until time specified in timer
control register is expired. The value in the timer register is in
units of 102.4us. The allowable range for the timer is 0 - 31
(0 ~ 3.276ms).
The default value is 1(102.4us). Users can change the timer value
with dev.sf.N.int_mod sysctl(8) variable/loader(8) tunable.
o bus_dma(9) conversion
- Enable 64bit DMA addressing.
- Enable 64bit descriptor format support.
- Apply descriptor ring alignment requirements(256 bytes alignment).
- Apply Rx buffer address alignment requirements(4 bytes alignment).
- Apply 4GB boundary restrictions(Tx/Rx ring and its completion ring
should live in the same 4GB address space.)
- Set number of allowable number of DMA segments to 16. In fact,
AIC-6915 doesn't have a limit for number of DMA segments but it
would be waste of Tx descriptor resource if we allow more than 16.
- Rx/Tx side bus_dmamap_load_mbuf_sg(9) support.
- Added alignment fixup code for strict-alignment architectures.
- Added endianness support code in Tx/Rx descriptor access.
With these changes sf(4) should work on all platforms.
o Don't set if_mtu in device attach, it's handled in ether_ifattach.
o Use our own callout to drive watchdog timer.
o Enable VLAN oversized frames and announce sf(4)'s VLAN capability
to upper layer.
o In sf_detach(), remove mtx_initialized KASSERT as it's not possible
to get there without initialzing the mutex. Also mark that we're
about to detaching so active bpf listeners do not panic the system.
o To reduce PCI register access cycles, Rx completion ring is
directly scanned instead of reading consumer/producer index
registers. In theory, Tx completion ring also can be directly
scanned. However the completion ring is composed of two types
completion(1 for Tx done and 1 and DMA done). So reading producer
index via register access would be more safer way to detect the
ring wrap-around.
o In sf_rxeof(), don't use m_devget(9) to align recevied frames. The
alignment is required only for strict-alignment architectures and
now the alignment is handled by sf_fixup_rx() if required. The
removal of the copy operation in fast path should increase Rx
performance a lot on non-strict-alignemnt architectures such as
i386 and amd64.
o In sf_newbuf(), don't set descriptor valid bit as sf(4) is
programmed to run with normal mode. In normal mode, the valid bit
have no meaning. The valid bit should be used only when the
hardware uses polling(prefetch) mode. The end of descriptor queue
bit could be used if needed, but sf(4) relys on auto-wrapping of
hardware on 256 descriptor queue entries so both valid and
descriptor end bit are not used anymore.
o Don't disable generation of Tx DMA completion as said in datasheet
and use the Tx DMA completion entry instead of relying on Tx done
completion entry. Also added additional Tx completion entry type
check in Tx completion handler.
o Don't blindly reset watchdog timer in sf_txeof(). sf(4) now unarm
the the watchdog only if there are no active Tx descriptors in Tx
queue.
o Don't manually update various counters in driver, instead, use
built-in MAC statistic registers to update them. The statistic
registers are updated in every second.
o Modified Tx underrun handlers to increase the threshold value
in units of 256 bytes. Previously it used to increase 16 bytes
at a time which seems to take too long to stabalize whenever Tx
underrun occurrs.
o In interrupt handler, additional check for the interrupt is
performed such that interrupts only for this device is allowed to
process descriptor rings. Because reading SF_ISR register clears
all interrtups, nuke writing to a SF_ISR register.
o Tx underrun is abonormal condition and SF_ISR_ABNORMALINTR includes
the interrupt. So there is no need to inspect the Tx underrun again
in main interrupt loop.
o Don't blindly reinitialize hardware for abnormal interrupt
condition. sf(4) reintializes the hardware only when it encounters
DMA error which requires an explicit hardware reinitialization.
o Fix a long standing bug that incorrectly clears MAC statistic
registers in sf_init_locked.
o Added strict-alignment safe way of ethernet address reprogramming
as IF_LLADDR may return unaligned address.
o Move sf_reset() to sf_init_locked in order to always reset the
hardware to a known state prior to configuring hardware.
o Set default Rx DMA, Tx DMA paramters as shown in datasheet.
o Enable PCI busmaster logic and autopadding for VLAN frames.
o Rework sf_encap.
- Previously sf(4) used to type 0 of Tx descriptor with padding
enabled to store driver private data. Emebedding private data
structures into descriptors is bad idea as the structure size
would be different between 64bit and 32bit architectures. The
type 0 descriptor allows fixed number of DMA segments in
a descriptor format and provides relatively simple interface to
manage multi-fragmented frames.
However, it wastes lots of Tx descriptors as not all frames are
fragmented as the number of allowable segments in a descriptor.
- To overcome the limitation of type 0 descriptor, switch to type
2 descriptor which allows 64bit DMA addressing and can handle
unliumited number of fragmented DMA segments. The drawback of
type 2 descriptor is in its complexity in managing descriptors
as driver should handle the end of Tx ring manually.
- Manually set Tx desciptor queue end mark and record number of
used descriptors to reclaim used descriptors in sf_txeof().
o Rework sf_start.
- Honor link up/down state before attempting transmission.
- Because sf(4) uses only one of two Tx queues, use low priority
queue instead of high one. This will remove one shift operation
in each Tx kick command.
- Cache last produder index into softc such that subsequenet Tx
operation doesn't need to access producer index register.
o Rewrote sf_stats_update to include all available MAC statistic
counters.
o Employ AIC-6915 firmware from Adaptec and implement firmware
download routine and TCP/UDP checksum offload.
Partial checksum offload support was commented out due to the
possibility of firmware bug in RxGFP.
The firmware can strip VLAN tag in Rx path but the lack of firmware
assistance of VLAN tag insertion in transmit side made it useless
on FreeBSD. Unlike checksum offload, FreeBSD requires both Tx/Rx
hardware VLAN assistance capability. The firmware may also detect
wakeup frame and can wake system up from states other than D0.
However, the lack of wakeup support form D3cold state keep me from
adding WOL capability. Also detecting WOL frame requires firmware
support but it's not yet known to me whether the firmware can
process the WOL frame.
o Changed *_ADDR_HIADDR to *_ADDR_HI to match other definitions of
registers.
o Added definitioan to interrupt moderation related constants.
o Redefined SF_INTRS to include Tx DMA done and DMA errors. Removed
Tx done as it's not needed anymore.
o Added definition for Rx/Tx DMA high priority threshold.
o Nuked unused marco SF_IDX_LO, SF_IDX_HI.
o Added complete MAC statistic register definition.
o Modified sf_stats structure to hold all MAC statistic regiters.
o Nuke various driver private padding data in Tx/Rx descriptor
definition. sf(4) no longer requires private padding. Also remove
unused padding related definitions. This greatly simplifies
descriptor manipulation on 64bit architectures.
o Becase we no longer pad driver private data into descriptor,
remove deprecated/not-applicable comments for padding.
o Redefine Rx/Tx desciptor status. sf(4) doesn't use bit fileds
anymore to support endianness.
Tested by: bruffer (initial version)
be wrong but I couldn't find a way to make it work. In addition, the
number of TxGFP instruction does not match the firmware image size,
so I guess something was wrong when Adaptec generated the TxGFP
firmware from their DDK.
According to datasheet, normally, the first GFP instruction would be
opcode C, WaitForStartOfFrame, to synchronize checksumming with
incoming frame. But the first instruction in TxGFP firmware was
opcode 1, BrToImmIfTrue, so it could not process checksum correctly,
I guess. Checking for RxGFP firmware also indicates the first
instruction should be opcode C. Since the number of instructions in
TxGFP firmware lacks exactly one instruction, I prepended the opcode
C to TxGFP firmware image. With this change, the resulting image size
perfectly matches with the nummber of instructions and Tx checksum
offload seems to work without problems.
if_ioctl, if_watchdog, etc, or in functions that are used by
these methods only. In all other cases use device_printf().
This also fixes several panics, when if_printf() is called before
softc->ifp was initialized.
Submitted by: Alex Lyashkov <umka sevcity.net>
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.
opt_device_polling.h
- Include opt_device_polling.h into appropriate files.
- Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that
can be compiled as loadable modules.
Reviewed by: bde
o Axe poll in trap.
o Axe IFF_POLLING flag from if_flags.
o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.
o Make registration and deregistration from polling in a
functional way, insted of next tick/interrupt.
o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.
Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. The are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enabled/disables polling.
A message that kern.polling.enable is deprecated is printed.
Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If there is no, then unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns.This is important to protect from spurious
interrupts.
Reviewed by: ru, sam, jhb
- Add locked variants of start, init, and ifmedia_upd.
- Use callout_* instead of timeout/untimeout.
- Don't recurse on the driver lock.
- Fixup locking in ioctl.
- Lock the driver lock in the ifmedia handlers rather than across
ifmedia_ioctl().
Tested by: brueffer
MFC after: 3 days
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.
Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.
Reviewed by: pjd, bz
MFC after: 7 days
over iteration of their multicast address lists when synchronizing the
hardware address filter with the network stack-maintained list.
Problem reported by: Ed Maste (emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.
This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.
Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.
Reviewed by: sobomax, sam