r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport. It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).
It is mostly a verbatim code lift from netdump(4). Netdump(4) remains
the only consumer (until the rest of this patch series lands).
The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c. The
separation is not perfect and future improvement is welcome. Supporting
INET6 is a long-term goal.
Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.
The only functional change here is the mbuf allocation / tracking. Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time. If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.
No other functional change intended.
Reviewed by: markj (earlier version)
Some discussion with: emaste, jhb
Objection from: marius
Differential Revision: https://reviews.freebsd.org/D21421
Changelist:
- Turn tx_rings and rx_rings arrays into arrays of pointers to kring
structs. This patch includes fixes for ixv, ixl, ix, re, cxgbe, iflib,
vtnet and ptnet drivers to cope with the change.
- Generalize the nm_config() callback to accept a struct containing many
parameters.
- Introduce NKR_FAKERING to support buffers sharing (used for netmap
pipes)
- Improved API for external VALE modules.
- Various bug fixes and improvements to the netmap memory allocator,
including support for externally (userspace) allocated memory.
- Refactoring of netmap pipes: now linked rings share the same netmap
buffers, with a separate set of kring pointers (rhead, rcur, rtail).
Buffer swapping does not need to happen anymore.
- Large refactoring of the control API towards an extensible solution;
the goal is to allow the addition of more commands and extension of
existing ones (with new options) without the need of hacks or the
risk of running out of configuration space.
A new NIOCCTRL ioctl has been added to handle all the requests of the
new control API, which cover all the functionalities so far supported.
The netmap API bumps from 11 to 12 with this patch. Full backward
compatibility is provided for the old control command (NIOCREGIF), by
means of a new netmap_legacy module. Many parts of the old netmap.h
header has now been moved to netmap_legacy.h (included by netmap.h).
Approved by: hrs (mentor)
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Initially, only tag files that use BSD 4-Clause "Original" license.
RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D13133
This is an RTL8168 chip, which we already support so all we have to do is add
the vendor ID.
PR: 212876
Submitted by: Tobias Kortkamp <t@tobik.me>
MFC after: 3 days
No functional change, only trivial cases are done in this sweep,
Drivers that can get further enhancements will be done independently.
Discussed in: freebsd-current
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167. It has been a compat shim ever
since. It's time for the compat shim to go.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: sephe
Differential Revision: https://reviews.freebsd.org/D5131
accident with RTL8168G and later chips when the interface actually was
brought up. This is due to the fact that with these MAC variants, RXDV
gate needs be disabled for WOL to work. So do just that in re_setwol()
when IFCAP_WOL is requested.
Reported and tested by: dhw
MFC after: 3 days
before their initial configuration is done, it turns out that r281337
has the inverse effect on some older chips. Moreover, as with newer
chips before, two chips seemingly identical according to their MAC
revisions may behave differently in this regard, with most working
but a few not, making changes extremely hard to test.
Closer inspection of the corresponding Linux code suggests that RX
and TX should only be enabled after their initial configuration with
RTL8168G and later chips, i. e. RTL8106E{,US}, RTL8107E, as well as
RTL8168{EP,G,GU,H}, so limit the new code path to these. [1]
- Distinguish between RTL8168H and RTL8107E, with the latter being the
10/100-Mbit/s-only variant of the former.
- For MAC variants that can only do Fast Ethernet at a maximum, ensure
that we don't advertise Gigabit Ethernet speed.
- In re_stop(), do the inverse of re_init_locked() and enable RXDV
gate on RTL8168G and later chips again, matching what Linux does.
PR: 203422 [1]
MFC after: 1 week
This commit contains large contributions from Giuseppe Lettieri and
Stefano Garzarella, is partly supported by grants from Verisign and Cisco,
and brings in the following:
- fix zerocopy monitor ports and introduce copying monitor ports
(the latter are lower performance but give access to all traffic
in parallel with the application)
- exclusive open mode, useful to implement solutions that recover
from crashes of the main netmap client (suggested by Patrick Kelsey)
- revised memory allocator in preparation for the 'passthrough mode'
(ptnetmap) recently presented at bsdcan. ptnetmap is described in
S. Garzarella, G. Lettieri, L. Rizzo;
Virtual device passthrough for high speed VM networking,
ACM/IEEE ANCS 2015, Oakland (CA) May 2015
http://info.iet.unipi.it/~luigi/research.html
- fix rx CRC handing on ixl
- add module dependencies for netmap when building drivers as modules
- minor simplifications to device-specific routines (*txsync, *rxsync)
- general code cleanup (remove unused variables, introduce macros
to access rings and remove duplicate code,
Applications do not need to be recompiled, unless of course
they want to use the new features (monitors and exclusive open).
Those willing to try this code on stable/10 can just update the
sys/dev/netmap/*, sys/net/netmap* with the version in HEAD
and apply the small patches to individual device drivers.
MFC after: 1 month
Sponsored by: (partly) Verisign, Cisco
after setting up interrupt moderation but before turning interrupts on.
This matches what Realtek's r8168 Linux driver does as of version 8.039.00
and fixes problems with certain incarnations of certain MAC revisions
like the interface requiring an extra up/down-cycle after boot to start
working or DMA configuration not being adhered to.
PR: 193743, 197535
MFC after: 1 week
In particular, don't check the value of the bus_dma map against NULL
to determine if either bus_dmamem_alloc() or bus_dmamap_load() succeeded.
Instead, assume that bus_dmamap_load() succeeeded (and thus that
bus_dmamap_unload() should be called) if the bus address for a resource
is non-zero, and assume that bus_dmamem_alloc() succeeded (and thus
that bus_dmamem_free() should be called) if the virtual address for a
resource is not NULL.
In many cases these bugs could result in leaks when a driver was detached.
Reviewed by: yongari
MFC after: 2 weeks
Previously only TX IP checksum offloading was disabled but it's
reported that TX checksum offloading for UDP datagrams with IP
options also generates corrupted frames. Reporter's controller is
RTL8168CP but I guess RTL8168C also have the same issue since it
shall share the same core.
Reported and tested by: tuexen
driver as version 8.037.00 for RTL8168{E-VL,EP,F,G,GU} and RTL8111B. This
makes reception of packets work with the RTL8168G (HW rev. 0x4c000000) in
my Shuttle DS47.
- Consistently use RL_MSI_MESSAGES.
In joint forces with: yongari
MFC after: 5 days
This includes the following:
- use separate memory regions for VALE ports
- locking fixes
- some simplifications in the NIC-specific routines
- performance improvements for the VALE switch
- some new features in the pkt-gen test program
- documentation updates
There are small API changes that require programs to be recompiled
(NETMAP_API has been bumped so you will detect old binaries at runtime).
In particular:
- struct netmap_slot now is 16 bytes to support an extra pointer,
which may save one data copy when using VALE ports or VMs;
- the struct netmap_if has two extra fields;
MFC after: 3 days
RTL8168GU has two variants(GMII and MII) but it uses the same chip
revision id. Driver checks PCI device id of controller and
sets internal capability flag(i.e. jumbo frame and link speed down
in WOL).
H/W donated by: RealTek Semiconductor Corp.
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
them, please let me know if not). Most of these are of the form:
static const struct bzzt_type {
[...list of members...]
} const bzzt_devs[] = {
[...list of initializers...]
};
The second const is unnecessary, as arrays cannot be modified anyway,
and if the elements are const, the whole thing is const automatically
(e.g. it is placed in .rodata).
I have verified this does not change the binary output of a full kernel
build (except for build timestamps embedded in the object files).
Reviewed by: yongari, marius
MFC after: 1 week
#defines. This also has the advantage that it makes the names more
compact, iand also allows us to correct the non-uniform naming of
the PCIM_LINK_* defines, making them all consistent amongst themselves.
This is a mostly mechanical rename:
s/PCIR_EXPRESS_/PCIER_/g
s/PCIM_EXP_/PCIEM_/g
s/PCIM_LINK_/PCIEM_LINK_/g
When this is MFC'd, #defines will be added for the old names to assist
out-of-tree drivers.
Discussed with: jhb
MFC after: 1 week
extract a link status of PHY when parent driver is re(4).
RGEPHY_MII_SSR register does not seem to report correct PHY status
on some integrated PHYs used with re(4).
Unfortunately, RealTek PHYs have no additional information to
differentiate integrated PHYs from external ones so relying on PHY
model number is not enough to know that. However, it seems
RGEPHY_MII_SSR register exists for external RealTek PHYs so
checking parent driver would be good indication to know which PHY
was used. In other words, for non-re(4) controllers, the PHY is
external one and its revision number is greater than or equal to 2.
This change fixes intermittent link UP/DOWN messages reported on
RTL8169 controller.
Also, mii_attach(9) is tried after setting interface name since
rgephy(4) have to know parent driver name.
PR: kern/165509
USERSPACE:
1. add support for devices with different number of rx and tx queues;
2. add better support for zero-copy operation, adding an extra field
to the netmap ring to indicate how many buffers we have already processed
but not yet released (with help from Eddie Kohler);
3. The two changes above unfortunately require an API change, so while
at it add a version field and some spares to the ioctl() argument
to help detect mismatches.
4. update the manual page for the two changes above;
5. update sample applications in tools/tools/netmap
KERNEL:
1. simplify the internal structures moving the global wait queues
to the 'struct netmap_adapter';
2. simplify the functions that map kring<->nic ring indexes
3. normalize device-specific code, helps mainteinance;
4. start exploring the impact of micro-optimizations (prefetch etc.)
in the ixgbe driver.
Use 'legacy' descriptors on the tx ring and prefetch slots gives
about 20% speedup at 900 MHz. Another 7-10% would come from removing
the explict calls to bus_dmamap* in the core (they are effectively
NOPs in this case, but it takes expensive load of the per-buffer
dma maps to figure out that they are all NULL.
Rx performance not investigated.
I am postponing the MFC so i can import a few more improvements
before merging.
RTL810x family , RTL8139 has different register map for Config
registers.
While here, follow the lead of re(4) in WOL configuration.
- Disable WOL_UCAST and WOL_MCAST capabilities by default.
- Config5 register write does not need to unlock EEPROM access
on RTL8139 family but unlocking EEPROM access does not affect
its operation and make it consistent with re(4).
Reported by: Matt Renzelmann mjr <> cs dot wisc dot edu
On my hardware, "em" in netmap mode does about 1.388 Mpps
on one card (on an Asus motherboard), and 1.1 Mpps on another
card (PCIe bus). Both seem to be NIC-limited, because
i have the same rate even with the CPU running at 150 MHz.
On the "re" driver the tx throughput is around 420-450 Kpps
on various (8111C and the like) chipsets. On the Rx side
performance seems much better, and i can receive the full
load generated by the "em" cards.
"igb" is untested as i don't have the hardware.