#defines. This also has the advantage that it makes the names more
compact, iand also allows us to correct the non-uniform naming of
the PCIM_LINK_* defines, making them all consistent amongst themselves.
This is a mostly mechanical rename:
s/PCIR_EXPRESS_/PCIER_/g
s/PCIM_EXP_/PCIEM_/g
s/PCIM_LINK_/PCIEM_LINK_/g
When this is MFC'd, #defines will be added for the old names to assist
out-of-tree drivers.
Discussed with: jhb
MFC after: 1 week
the interface is brought up. Without this, the boot time interrupt
round-robin assignment does not think the allocated interrupt resources
are active and leaves them assigned to CPU 0.
While here, add descriptive tags to each interrupt handler when MSI-X
is used.
Reviewed by: np
MFC after: 1 week
evicted from the syncache but a later syncache_expand succeeds because
of syncookies. The TOE driver has to resort to more direct means to
install its hooks in the socket in this case.
the TOE driver reports that an active open failed. toe_connect_failed is
supposed to handle this but it should be provided the inpcb instead of the
tcpcb which may no longer be around.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.
- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.
Build-tested with make universe.
30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE
Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe
Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe
Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
Allow LRO to work on IPv6 as well.
Fix the module Makefile to at least properly inlcude opt_inet6.h
and allow builds without INET or INET6.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
Significantly update tcp_lro for mostly two things:
1) introduce basic support for IPv6 without extension headers.
2) try hard to also get the incremental checksum updates right,
especially also in the IPv4 case for the IP and TCP header.
Move variables around for better locality, factor things out into
functions, allow checksum updates to be compiled out, ...
Leave a few comments on further things to look at in the future,
though that is not the full list.
Update drivers with appropriate #includes as needed for IPv6 data
type in LRO.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days
If an IPv6 packet has extension headers the kernel needs to deal with it
itself. For the rest it can set various CSUM_XXX flags and the driver
will act on them.
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
Changes since 7.8.0 (from the official changelog):
- Fixed sporadic interrupt generation for associated CQ when processing
a local invalidate work request
- Changes to core scheduling to avoid starving requests from the host
under heavy RDMA Read Request load (e.g. packets to the wire)
- Programmed the tp tx resource limiter in function of the traffic (only
affects iWarp)
- Increased the egress NIC gather list length from 36 to 46 entries
MFC after: 1 week
sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.
Inspired by: rwatson
reading. (This was already done for writing to a sysctl). This
requires all SYSCTL setups to specify a type. Most of them are now
checked at compile-time.
Remove SYSCTL_*X* sysctl additions as the print being in hex should be
controlled by the -x flag to sysctl(8).
Succested by: bde
condition in proc_rwmem() and to (2) simplify the implementation of the
cxgb driver's vm_fault_hold_user_pages(). Specifically, in proc_rwmem()
the requested read or write could fail because the targeted page could be
reclaimed between the calls to vm_fault() and vm_page_hold().
In collaboration with: kib@
MFC after: 6 weeks
There is no need to use an atomic operation at structure initialization
time.
Note that the file changed is not connected to the build at this time.
Reviewed by: jhb (general issue)
Approved by: np
MFC after: 2 weeks
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing
NUL byte. This behaviour was preserved, though it should not be
necessary.
Reviewed by: phk (original patch)
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.
Requested by: nwhitehorn
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing NUL
byte. This behaviour was preserved, though it should not be necessary.
Reviewed by: phk
10G cards. 1G cards are x4 only.
- Use constants from pcireg.h for reading the current link width.
- Use pci_set_max_read_req() rather than implementing it by hand.
Reviewed by: np
MFC after: 1 week
- Run the adapter's tick at 1Hz and remove link state checks from it.
Instead, have each port check its link state. Delay the check so that
it takes place slightly after the driver is notified of a change in
link state. This is a cheap way to debounce these notifications if
many are received in rapid succession. POLL_LINK_1ST_TIME flag can
also be eliminated as a side effect of these changes.
- Do not reset the PHY when link goes down.
- Clear port's link_fault flag if the PHY indicates link is down.
- get_link_status_r should leave speed and duplex alone when link is down.
MFC after: 1 month
The T3 ASIC can provide an incoming packet's timestamp instead of its RSS hash.
The timestamp is just a counter running off the card's clock. With a 175MHz
clock an increment represents ~5.7ns and the 32 bit value wraps around in ~25s.
# sysctl -d dev.cxgbc.0.pkt_timestamp
dev.cxgbc.0.pkt_timestamp: provide packet timestamp instead of connection hash
# sysctl -d dev.cxgbc.0.core_clock
dev.cxgbc.0.core_clock: core clock frequency (in KHz)
# sysctl dev.cxgbc.0.core_clock
dev.cxgbc.0.core_clock: 175000
L2/3/4 headers and can drop or steer packets as instructed. Filtering
based on src ip, dst ip, src port, dst port, 802.1q, udp/tcp, and mac
addr is possible. Add support in cxgbtool to program these filters.
Some simple examples:
Drop all tcp/80 traffic coming from the subnet specified.
# cxgbtool cxgb2 filter 0 sip 192.168.1.0/24 dport 80 type tcp action drop
Steer all incoming UDP traffic to qset 0.
# cxgbtool cxgb2 filter 1 type udp queue 0 action pass
Steer all tcp traffic from 192.168.1.1 to qset 1.
# cxgbtool cxgb2 filter 2 sip 192.168.1.1 type tcp queue 1 action pass
Drop fragments.
# cxgbtool cxgb2 filter 3 type frag action drop
List all filters.
# cxgbtool cxgb2 filter list
index SIP DIP sport dport VLAN PRI P/MAC type Q
0 192.168.1.0/24 0.0.0.0 * 80 0 0/1 */* tcp -
1 0.0.0.0/0 0.0.0.0 * * 0 0/1 */* udp 0
2 192.168.1.1/32 0.0.0.0 * * 0 0/1 */* tcp 1
3 0.0.0.0/0 0.0.0.0 * * 0 0/1 */* frag -
16367 0.0.0.0/0 0.0.0.0 * * 0 0/1 */* * *
MFC after: 2 weeks
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.
MFC after: 1 month
- Only the tunnelq (TXQ_ETH) requires a buf_ring, an ifq, and the watchdog/timer
callouts. Do not allocate these for the other tx queues.
- Use 16k jumbo clusters only on offload capable cards by default.
- Do not allocate a full tx ring for the offload queue if the card is not
offload capable.
- Slightly better freelist size calculation.
- Fix nmbjumbo4 typo, remove unneeded global variables.
MFC after: 3 days
been around for a long time now (7.1-ish or even earlier); assume
they are present. These includes MSI, TSO, LRO, VLAN, INTR_FILTERS,
FIRMWARE, etc.
Also, eliminate some dead code and clean up in other places as part
of this quick once-over.
MFC after: 1 week
race as it could already have been tx'd and freed by that time. Place
the bpf tap just _before_ writing the gen bit.
This fixes a panic when running tcpdump on a cxgb interface.
- introduce drbr_needs_enqueue that returns whether the interface/br needs
an enqueue operation: returns true if altq is enabled or there are
already packets in the ring (as we need to maintain packet order)
- update all drbr consumers
- fix drbr_flush
- avoid using the driver queue (IFQ_DRV_*) in the altq case as the
multiqueue consumer does not provide enough protection, serialize altq
interaction with the main queue lock
- make drbr_dequeue_cond work with altq
Discussed with: kmacy, yongari, jfv
MFC after: 4 weeks
the leading underscores since they are now implemented.
- Implement the tcpi_rto and tcpi_last_data_recv fields in the tcp_info
structure.
Reviewed by: rwatson
MFC after: 2 weeks
- support for the new Gen-2, BT, and LP-CR cards.
- T3 firmware 7.7.0
- shared "common code" updates.
Approved by: gnn (mentor)
Obtained from: Chelsio
MFC after: 1 month
all pertinent statatistics for the subsystem. These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.
Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored. This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.
The following modules are affected by this change:
if_bridge
if_cxgb
if_gif
ip_mroute
ipdivert
pf
In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack. However, that change is too agressive for
this point in the release cycle.
Reviewed by: bz
Approved by: re (kib)
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.
Reviewed by: bz
Approved by: re (vimage blanket)
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
check missed this because cxgb's TOM is currently commented out of the build
system.
Submitted by: Navdeep Parhar <np at FreeBSD dot org>
Approved by: re (kensmith), kensmith (mentor temporarily unavailable)
the TCP syncache. This returns struct tcpopt to being private within the TCP
implementation, thus allowing it to be modified without ABI concerns.
The patch breaks the ABI. Bump __FreeBSD_version to 800103 accordingly. The cxgb
driver is the only TOE consumer affected by this change, and needs to be
recompiled along with the kernel.
Suggested by: rwatson
Reviewed by: rwatson, kmacy
Approved by: re (kensmith), kensmith (mentor temporarily unavailable)
- build ifmedia list based on phy->caps, not string comparisons.
- rebuild media list when a transceiver change is detected.
- return EOPNOTSUPP instead of ENXIO in cxgb_media_status.
Approved by: gnn (mentor)
MFC after: 2 weeks.
a pointer to an ifaddr matching the passed socket address, returns a
boolean indicating whether one was present. In the (near) future,
ifa_ifwithaddr() will return a referenced ifaddr rather than a raw
ifaddr pointer, and the new wrapper will allow callers that care only
about the boolean condition to avoid having to free that reference.
MFC after: 3 weeks