r200526:
Use PCIR_BAR instead of hard-coded value.
r200527:
Fix typo in register definition.
r200529:
Clear VGE_TXDESC_Q bit for transmitted frames. The VGE_TXDESC_Q bit
seems to work like a tag that indicates 'not list end' of queued
frames. Without having a VGE_TXDESC_Q bit indicates 'list end'. So
the last frame of multiple queued frames has no VGE_TXDESC_Q bit.
The hardware has peculiar behavior for VGE_TXDESC_Q bit handling.
If the VGE_TXDESC_Q bit of descriptor was set the controller would
fetch next descriptor. However if next descriptor's OWN bit was
cleared but VGE_TXDESC_Q was set, it could confuse controller.
Clearing VGE_TXDESC_Q bit for transmitted frames ensure correct
behavior.
r200531:
Use ANSI function definations.
r200532:
Remove unnecessary return statement.
r200533:
s/u_intXX_t/uintXX_t/g
r200536:
style(9).
Overhaul bus_dma(9) usage and fix various things.
o Separate TX/RX buffer DMA tag from TX/RX descriptor ring DMA tag.
o Separate RX buffer DMA tag from common buffer DMA tag. RX DMA
tag has different restriction compared to TX DMA tag.
o Add 40bit DMA address support.
o Adjust TX/RX descriptor ring alignment to 64 bytes from 256
bytes as documented in datasheet.
o Added check to ensure TX/RX ring reside within a 4GB boundary.
Since TX/RX ring shares the same high address register they
should have the same high address.
o TX/RX side bus_dmamap_load_mbuf_sg(9) support.
o Add lock assertion to vge_setmulti().
o Add RX spare DMA map to recover from DMA map load failure.
o Add optimized RX buffer handler, vge_discard_rxbuf which is
activated when vge(4) sees bad frames.
o Don't blindly update VGE_RXDESC_RESIDUECNT register. Datasheet
says the register should be updated only when number of
available RX descriptors are multiple of 4.
o Use __NO_STRICT_ALIGNMENT instead of defining VGE_FIXUP_RX which
is only set for i386 architecture. Previously vge(4) also
performed expensive copy operation to align IP header on amd64.
This change should give RX performance boost on amd64
architecture.
o Don't reinitialize controller if driver is already running. This
should reduce number of link state flipping.
o Since vge(4) drops a driver lock before passing received frame
to upper layer, make sure vge(4) is still running after
re-acquiring driver lock.
o Add second argument count to vge_rxeof(). The argument will
limit number of packets could be processed in RX handler.
o Rearrange vge_rxeof() not to allocate RX buffer if received
frame was bad packet.
o Removed if_printf that prints DMA map failure. This type of
message shouldn't be used in fast path of driver.
o Reduce number of allowed TX buffer fragments to 6 from 7. A TX
descriptor allows 7 fragments of a frame. However the CMZ field
of descriptor has just 3bits and the controller wants to see
fragment + 1 in the field. So if we have 7 fragments the field
value would be 0 which seems to cause unexpected results under
certain conditions. This change should fix occasional TX hang
observed on vge(4).
o Simplify vge_stat_locked() and add number of available TX
descriptor check.
o vge(4) controllers lack padding short frames. Make sure to fill
zero for the padded bytes. This closes unintended information
disclosure.
o Don't set VGE_TDCTL_JUMBO flag. Datasheet is not clear whether
this bit should be set by driver or write-back status bit after
transmission. At least vendor's driver does not set this bit so
remove it. Without this bit vge(4) still can send jumbo frames.
o Don't start driver when vge(4) know there are not enough RX
buffers.
o Remove volatile keyword in RX descriptor structure. This should
be handled by bus_dma(9).
o Collapse two 16bits member of TX/RX descriptor into single 32bits
member.
o Reduce number of RX descriptors to 252 from 256. The
VGE_RXDESCNUM is 16bits register but only lower 8bits are valid.
So the maximum number of RX descriptors would be 255. However
the number of should be multiple of 4 as controller wants to
update 4 RX descriptors at a time. This limits the maximum
number of RX descriptor to be 252.
PR: kern/141276, kern/141414
Partial merge r198987:
Use device_printf() and if_printf() instead of printf() with an explicit
unit number and remove 'unit' members from softc.
Partial merge r199414:
Use the bus_*() routines rather than bus_space_*() for register operations.
r199543:
Several fixes to this driver:
- Overhaul the locking to avoid recursion and add missing locking in a few
places.
- Don't schedule a task to call vge_start() from contexts that are safe to
call vge_start() directly. Just invoke the routine directly instead
(this is what all of the other NIC drivers I am familiar with do). Note
that vge(4) does not use an interrupt filter handler which is the primary
reason some other drivers use tasks.
- Add a new private timer to drive the watchdog timer instead of using
if_watchdog and if_timer.
- Fixup detach by calling ether_ifdetach() before stopping the interface.
r200422:
Remove driver lock assertion in MII register access. This change
was made in r199543 to remove MTX_RECURSE. These routines can be
called in device attach phase(e.g. mii_phy_probe()) so checking
assertion here is not right as caller does not hold a driver lock.
Modify the experimental server so that it uses VOP_ACCESSX().
This is necessary in order to enable NFSv4 ACL support. The
argument to nfsvno_accchk() was changed to an accmode_t and
the function nfsrv_aclaccess() was no longer needed and,
therefore, deleted.
Reviewed by: trasz
Set the locally-assigned bit in the randomly generated Ethernet address
if we end up having to generate one.
PR: kern/133239
Discussed with: yongari
Approved by: ed (mentor, implicit)
---snip---
Prevent paging pressure from draining arc too much
- always drain arc if above arc_c_max - never drain arc if arc is below
arc_c_max
---snip---
after r198460 but was missed.
Note: that 800108 should have been 800501 with that but as there is no
functional problem here, it'll just stay as is. [1]
This will make pkg_add -r use packages-8-stable for stable/8 rather
than packages-8.0-release.
Reported by: Paride Legovini (pl ninthfloor.org) on stable@,
(pluknet gmail.com), jhb
Discussed with: rwatson [1]
Use ALLOW_NEW_SOURCES and BLOCK_OLD_SOURCES to signal a join or leave
with SSM MLDv2 by default.
This is current practice and complies with RFC 4604, as well as being
required by production IPv6 networks in Japan.
The behaviour may be disabled by setting the net.inet6.mld.use_allow
sysctl/tunable to 0.
Requested by: Hideki Yamamoto, dikshie
r200088:
Add workaround to overcome hardware limitation which allows only a
single outstanding DMA read operation. Most controllers targeted to
client with PCIe bus interface(e.g. BCM5761) may have this
limitation. All controllers for servers does not have this
limitation.
Collapsing mbuf chains to reduce number of memory reads before
transmitting was most effective way to workaround this. I got about
940Mbps from 850Mbps with mbuf collapsing on BCM5761. However it
takes a lot of CPU cycles to collapse mbuf chains so add tunable to
control the number of allowed TX buffers before collapsing. The
default value is 0 which effectively disables the forced collapsing.
For most cases 2 would yield best performance(about 930Mbps)
without much sacrificing CPU cycles.
Note the collapsing is only activated when the controller is on
PCIe bus and the frame does not need TSO operation. TSO does not
seem to suffer from the hardware limitation because the payload
size is much bigger than normal IP datagram.
Thanks to davidch@ who told me the limitation of client controllers
and actually gave possible workarounds to mitigate the limitation.
r200227:
Remove PHY isolate/power down code in bge_stop(). The isolation
handler in brgphy(4) does not exist and brgphy(4) just resets the
PHY and returns EINVAL as it has no isolation handler. I also agree
on Marius's opinion that stop handler of every NIC driver seems to
be the wrong place for implementing PHY isolate/power down.
If we need PHY isolate/power down it should be implemented in
brgphy(4) and users should administratively down the PHY.
r200228:
Don't access jumbo frame related registers if controller lacks the
feature. These registers are reserved on controllers that have no
support for jumbo frame.
Only BCM5700 has mini ring so do not poke mini ring related
registers if controller is not BCM5700.
r200246:
Partially revert r200228. For mini RCB case, bge(4) still have to
disable mini ring withtout regard to mini ring support.
r200264:
Create sysctl node(dev.bge.%d.focred_collapse) instead of
hw.bge.forced_collapse. hw.bge.forced_collapse affects all bge(4)
controllers on system which may not desirable behavior of the
sysctl node. Also allow the sysctl node could be modified at any
time.
r201446:
Fix regression introduced in r198318. BCM5754/BCM5754M uses the
same ASIC ID of BCM5758 such that r198318 incorecctly enabled TSO
on BCM5754.BCM5754M controllers. BCM5754/BCM5754M needs a special
firmware to enable TSO and bge(4) does not support firmware based
TSO.
r199670:
Fix two long standing bugs on bge(4). Most pre BCM5755 controllers
have a DMA bug when buffer address crosses a multiple of the 4GB
boundary(e.g. 4GB, 8GB, 12GB etc). Limit DMA address to be within
4GB address for these controllers. The second DMA bug limits DMA
address to be within 40bit address space. This bug applies to
BCM5714 and BCM5715 and 5708(bce(4) controller). This is not
actually a MAC controller bug but an issue with the embedded PCIe
to PCI-X bridge in the device. So for BCM5714/BCM5715 controllers
also limit the DMA address to be within 40bit address space.
Special thanks to davidch@ who gave me detailed errata information.
I think this change will fix long standing bge(4) instability
issues on systems with more than 4GB memory.
r199671:
Implement TSO for BCM5755 or newer controllers. Some controllers
seem to require a special firmware to use TSO. But the firmware is
not available to FreeBSD and Linux claims that the TSO performed by
the firmware is slower than hardware based TSO. Moreover the
firmware based TSO has one known bug which can't handle TSO if
ethernet header + IP/TCP header is greater than 80 bytes. The
workaround for the TSO bug exist but it seems it's too expensive
than not using TSO at all. Some hardwares also have the TSO bug so
limit the TSO to the controllers that are not affected TSO issues
(e.g. 5755 or higher).
While I'm here set VLAN tag bit to all descriptors that belengs to
a frame instead of the first descriptor of a frame. The datasheet
is not clear how to handle VLAN tag bit but it worked either way in
my testing. This makes it simplify TSO configuration a little bit.
Big thanks to davidch@ who sent me detailed TSO information.
Without this I was not able to implement it.
r199674:
Add missing function prototype in r199671.
r199679:
Reduce status block size DMAed by controller. bge(4) uses single
Tx/Rx/Rx return ring such that large part of status block was not
used at all. All bge(4) controllers except BCM5700 AX/BX has a
feature to control the size of status block. So use minimum status
block size allowed in controller. This reduces number of DMAed
status block size to 32 bytes from 80 bytes.
r199761:
BGE_FLAG_40BIT_BUG should be set before creating DMA tags.
r199807:
Make sure one shot MSI is enabled.
r199808:
Fix typo which inversed the logic which in turn disabled MSI.
r199667:
Cache Rx producer/Tx consumer index as soon as we know status block
update and then clear status block. Previously it used to access
these index without synchronization which may cause problems when
bounce buffers are used. Also add missing bus_dmamap_sync(9) in
polling handler. Since we now update status block in driver, adjust
bus_dmamap_sync(9) for status block.
r199668:
For MSI case, interrupt is not shared and we don't need to force
PCI flush to get correct status block update. Add an optimized
interrupt handler that is activated for MSI case. Actual interrupt
handling is done by taskqueue such that the handler does not
require driver lock for Rx path. The MSI capable bge(4) controllers
automatically disables further interrupt once it enters interrupt
state so we don't need PIO access to disable interrupt in interrupt
handler.
r199663:
Due to newly added PCIe capabilities fallback code for finding the
PCIe capability did not work right on recent controllers. Remove
FreeBSD 6.x support code.
r199664:
Use capability pointer to access PCIe registers rather than
directly access them at fixed address. While I'm here don't touch
other bits of PCIe device control register except max payload size.
r199665:
Controller does not write Rx descriptors, remove BUS_DMASYNC_PREREAD.
r199666:
Rearrange bge_start_locked to see we can send more frames by
checking IFF_DRV_RUNNING and IFF_DRV_OACTIVE flags. Also if we
have less than 16 free send BDs set IFF_DRV_OACTIVE and try it
later. Previously bge(4) used to reserve 16 free send BDs after
loading dma maps but hardware just need one reserved send BD. If
prouder index has the same value of consumer index it means the Tx
queue is empty.
While I'm here check IFQ_DRV_IS_EMPTY first to save one lock
operation.
r199065:
Correct disabling checksum offloading for BCM5700 B0.
r199115:
Add missing bus_dmamap_sync(9) before issuing kick command.
r199116:
Zero out Tx/Rx descriptors before using them. Also add missing
bus_dmamap_sync(9) after Tx descriptor initialization.
r199153:
Controller does not update Tx descriptors(send BDs) after sending
frames so remove unnecessary BUS_DMASYNC_PREREAD and
BUS_DMASYNC_POSTREAD of bus_dmamap_sync(9).
r199661:
Remove extra white space.
r199662:
Fix typo introduced in r199011.
r198967:
Correct MSI mode register bits.
r199009:
bge(4) already switched to use UMA backed page allocator and local
memory allocator for jumbo frame was removed long time ago. Remove
no more used macros.
r199010:
Do bus_dmamap_sync call only if frame size is greater than
standard buffer size. If controller is not capable of handling
jumbo frame, interface MTU couldn't be larger than standard MTU
which in turn the received should be fit in standard buffer. This
fixes bus_dmamap_sync call for jumbo ring is called even if
interface is configured to use standard MTU.
Also if total frame size could be fit into standard buffer don't
use jumbo buffers.
r199011:
Reimplement Rx buffer allocation to handle dma map load failure.
Introduce two spare dma maps for standard buffer and jumbo buffer
respectively. If loading a dma map failed reuse previously loaded
dma map. This should fix unloaded dma map is used in case of dma
map load failure. Also don't blindly unload dma map and defer
dma map sync and unloading operation until we know dma map for new
buffer is successfully loaded. This change saves unnecessary dma
load/unload operation. Previously bge(4) tried to reuse mbuf
with unloaded dma map which is really bad thing in bus_dma(9)
perspective.
While I'm here update if_iqdrops if we can't allocate Rx buffers.
r199014:
Fix I mssied in r199011. Rx ring index also should be updated.
If we fill Rx ring full instead of half we can simplify this logic
but this requires more experimentation.
r199020:
Tell upper layer we support long frames. ether_ifattach()
initializes it to ETHER_HDR_LEN so we have to override it after
calling ether_ifattch().
While I'm here remove setting if_mtu value, it's initialized in
ether_ifattach().
r199035:
Don't count input errors twice, we always read input errors from
MAC in bge_tick. Previously it used to show more number of input
errors. I noticed actual input errors were less than 8% even for
64 bytes UDP frames generated by netperf.
Since we always access BGE_RXLP_LOCSTAT_IFIN_DROPS register in
bge_tick, remove useless code protected by #ifdef notyet.
r199036:
Count number of inbound packets which were chosen to be discarded
as input errors. Also count out of receive BDs as input errors.
r199054:
Partially revert r199035.
Revision 1.158 says only lower ten bits of
BGE_RXLP_LOCSTAT_IFIN_DROPS register is valid. For BCM5761 case it
seems the controller maintains 16bits value for the register.
However 16bits are still too small to count all dropped packets
happened in a second. To get a correct counter we have to read the
register in bge_rxeof() which would be too expensive.
r198923:
Use correct dma tag for jumbo buffer.
r198924:
Covert bge_newbuf_std to use bus_dmamap_load_mbuf_sg(9). Note,
bge_newbuf_std still has a bug for handling dma map load failure
under high network load. Just reusing mbuf is not enough as driver
already unloaded the dma map of the mbuf. Graceful recovery needs
more work.
Ideally we can just update dma address part of a Rx descriptor
because the controller never overwrite the Rx descriptor. This
requires some Rx initialization code changes and it would be done
later after fixing other incorrect bus_dma(9) usages.
r198927:
Remove common DMA tag used for TX/RX mbufs and create Tx DMA tag
and Rx DMA tag separately. Previously it used a common mbuf DMA tag
for both Tx and Rx path but Rx buffer(standard ring case) should
have a single DMA segment and maximum buffer size of the segment
should be less than or equal to MCLBYTES. This change also make it
possible to add TSO with minor changes.
r198928:
Make bge_newbuf_std()/bge_newbuf_jumbo() returns actual error code
for buffer allocation. If driver know we are out of Rx buffers let
controller stop. This should fix panic when interface is run even
if it had no configured Rx buffers.
Support the tablet in (at least) the Toshiba Portege M200 Tablet PC.
This device only appears on the ACPI bus, so isn't caught by the current
entry for it in the uart(4) ISA attachment.
PR: kern/140172
Reviewed by: jhb, marcel
Approved by: ed (mentor, implicit)
- Try pre-allocating all FIBs upfront. Previously we tried pre-allocating
128 FIBs first and allocated more later if necessary. Remove now unused
definitions from the header file[1].
- Force sequential bus scanning. It seems parallel scanning is in fact
slower and causes more harm than good[1]. Adjust a comment to reflect that.
Consolidate the route message generation code for when address
aliases were added or deleted. The announced route entry for
an address alias is no longer empty because this empty route
entry was causing some route daemon to fail and exit abnormally.
Multiple IPv6 addresses of the same prefix can be installed on the
same interface. The first address will install the prefix route into
the kernel routing table and that prefix will be marked as on-link.
Without RADIX_MPATH enabled, the other address aliases of the same
prefix will update the prefix reference count but no other routes
will be installed. Consequently the prefixes associated with these
addresses would not be marked as on-link. As such, incoming packets
destined to these address aliases will fail the ND6 on-link check
on input. This patch fixes the above problem by searching the kernel
routing table and try to find an on-link prefix on the given interface.
r201282
-------
The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.
r201543
-------
The IFA_RTSELF address flag marks a loopback route has been installed
for the interface address. This marker is necessary to properly support
PPP types of links where multiple links can have the same local end
IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which
was combined into the route flag bits during prefix installation in
IPv6. This inclusion causing the prefix route to be unusable. This
patch fixes this bug by excluding the IFA_RTSELF flag during route
installation.
PR: ports/141342, kern/141134
Change vlan interfaces to cope more usefully with the parent interface being
renamed. Previously the vlan interfaces would lose their configuration as if
the parent interface had been physically removed. Now vlan interfaces ignore
rename events.
- Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being
renamed. This flag can be checked in ifnet departure/arrival event
handlers to treat rename events differently.
- Change the ifnet departure event handler in the if_vlan(4) driver to
ignore departure events due to a trunk interface being renamed.
- Rename the __tcpi_(snd|rcv)_mss fields of the tcp_info structure to remove
the leading underscores since they are now implemented.
- Implement the tcpi_rto and tcpi_last_data_recv fields in the tcp_info
structure.
As soon as geom_raid3 reports it's own stripe as sector size, report largest
underlying provider's stripe, multiplied by number of data disks in array,
due to transformation done, as array stripe.
Make sure the multicast forwarding cache entry's stall queue is properly
initialized before trying to insert an entry into it.
PR: kern/142052
Reviewed by: bms
File flags handling fixes for ext2fs:
- Disallow setting of flags not supported by ext2fs.
- Map EXT2_APPEND_FL to SF_APPEND.
- Map EXT2_IMMUTABLE_FL to SF_IMMUTABLE.
- Map EXT2_NODUMP_FL to UF_NODUMP.
Note that ext2fs doesn't support user settable append and immutable flags.
EXT2_NODUMP_FL is an user settable flag also on Linux.
PR: kern/122047
Approved by: trasz (mentor)
Apply fix for Solaris bug 6764159: restore_object() makes a call
that can block while having a tx open but not yet committed
(onnv revision 7994)
Submitted by: mm
Approved by: pjd
Obtained from: OpenSolaris
Print backspaces after echoing an EOF.
Applications like shells expect EOF to give no graphical output, while
our implementation prints ^D by default (tunable with stty echoctl).
Make the new implementation behave like the old TTY code. Print two
backspaces afterwards.
I totally forgot to MFC this, because the 8.0 freeze took a little
longer than I expected.
Reminded by: koitsu