77 Commits

Author SHA1 Message Date
Mateusz Guzik
068dbf361a virtio: clean up empty lines in .c and .h files 2020-09-01 21:31:26 +00:00
Vincenzo Maffione
ef6fdb3312 if_vtnet: let vtnet_rx_vq_intr() and vtnet_rxq_tq_intr() share code
Since the two functions are similar, introduce a common function
(vtnet_rx_vq_process()) to share common code.
This also improves locking, by ensuring vrxs_rescheduled is accessed
under the RXQ lock, and taskqueue_enqueue() is not called under the
lock (therefore avoiding a spurious duplicate lock warning).
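
The locking rule described above can be sketched as follows. This is a minimal illustration, not the driver's code: `fake_lock`, `rx_process()` and `task_enqueue()` are hypothetical stand-ins for the RXQ mutex, vtnet_rx_vq_process() and taskqueue_enqueue().

```c
/* Hypothetical stand-in for the RXQ mutex; it only tracks whether it is held. */
struct fake_lock {
	int held;
};

struct rxq {
	struct fake_lock lock;
	int more_work;			/* set when packets remain after processing */
	int rescheduled;		/* statistics counter, protected by lock */
	int enqueued;			/* times the deferred task was queued */
	int enqueued_while_locked;	/* should stay 0 */
};

void lock_acquire(struct fake_lock *l) { l->held = 1; }
void lock_release(struct fake_lock *l) { l->held = 0; }

/* Stand-in for taskqueue_enqueue(); records whether the lock was held. */
void
task_enqueue(struct rxq *q)
{
	q->enqueued++;
	if (q->lock.held)
		q->enqueued_while_locked++;
}

/* Common body shared by the interrupt handler and the taskqueue handler. */
void
rx_process(struct rxq *q)
{
	int resched;

	lock_acquire(&q->lock);
	/* ... packet processing would happen here ... */
	resched = q->more_work;
	if (resched)
		q->rescheduled++;	/* counter updated under the lock */
	lock_release(&q->lock);

	if (resched)
		task_enqueue(q);	/* deferred work queued after unlocking */
}

/* Both entry points are thin wrappers around the shared body. */
void rx_intr(struct rxq *q) { rx_process(q); }
void rx_task(struct rxq *q) { rx_process(q); }
```

The decision to reschedule is captured in a local variable while the lock is held, so the enqueue can happen after the unlock without re-reading shared state.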

Reported by:	jrtc27
MFC after:	2 weeks
2020-06-15 19:46:34 +00:00
Jessica Clarke
576b099a5f vtnet: Fix regression introduced in r361944
For legacy devices that don't support MrgRxBuf (such as bhyve pre-r358180),
r361944 failed to update the receive handler to account for the additional
padding introduced by the unused num_buffers field that is now always present
in struct vtnet_rx_header. Thus, calculate the padding dynamically based on
vtnet_hdr_size.

PR:		247242
Reported by:	thj
Tested by:	thj
2020-06-14 22:39:34 +00:00
Vincenzo Maffione
16f224b5f8 netmap: vtnet: fix races in vtnet_netmap_reg()
The nm_register callback needs to call nm_set_native_flags()
or nm_clear_native_flags() once the device has been stopped.
However, in the current implementation this is not true,
as the device is stopped by vtnet_init_locked(). This causes
race conditions where the driver crashes as soon as it
dequeues netmap buffers assuming they are mbufs (or the other
way around).
To fix the issue, we extend vtnet_init_locked() with a second
argument that, if not zero, will set/clear the netmap flags.
This results in a huge simplification of the nm_register
callback itself.
Also, use netmap_reset() to check if a ring is going to be
re-initialized in netmap mode.

MFC after:	1 week
2020-06-14 20:47:31 +00:00
Vincenzo Maffione
6682323732 netmap: introduce netmap_kring_on()
This function returns NULL if the ring identified by
queue id and direction is not in netmap mode. Otherwise
it returns the corresponding kring.
Use this function to replace vtnet_netmap_queue_on().
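
A rough sketch of such a helper, assuming (as the name suggests) that it yields the kring only when the ring is in netmap mode. The types below are simplified, hypothetical stand-ins and not the real netmap definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the netmap types. */
enum txrx_dir { DIR_RX = 0, DIR_TX = 1 };

struct kring {
	int netmap_on;			/* nonzero while the ring is in netmap mode */
};

struct adapter {
	unsigned int num_rings[2];	/* indexed by enum txrx_dir */
	struct kring *rings[2];		/* one kring array per direction */
};

/*
 * Return the kring for (q, dir) if it exists and is in netmap mode,
 * otherwise NULL.
 */
struct kring *
kring_on(struct adapter *na, unsigned int q, enum txrx_dir dir)
{
	struct kring *kring;

	if (q >= na->num_rings[dir])
		return (NULL);
	kring = &na->rings[dir][q];
	return (kring->netmap_on ? kring : NULL);
}
```

A caller can then test the return value once and use the kring directly, instead of asking "is this queue on?" and looking the ring up separately.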

MFC after:	1 week
2020-06-11 20:35:28 +00:00
Jessica Clarke
8c3988dff9 virtio: Support non-legacy network device and queue
The non-legacy interface always defines num_buffers in the header,
regardless of whether VIRTIO_NET_F_MRG_RXBUF is negotiated, just leaving it unused. We
also need to ensure our virtqueue doesn't filter out VIRTIO_F_VERSION_1
during negotiation, as it supports non-legacy transports just fine. This
fixes network packet transmission on TinyEMU.

Reviewed by:	br, brooks (mentor), jhb (mentor)
Approved by:	br, brooks (mentor), jhb (mentor)
Differential Revision:	https://reviews.freebsd.org/D25132
2020-06-08 21:51:36 +00:00
Vincenzo Maffione
2d769e25b1 netmap: vtnet: add vtnrx_nm_refill index to receive queues
The new index tracks the next netmap slot that is going
to be enqueued into the virtqueue. The index is necessary
to prevent the receive VQ and the netmap rx ring from going
out of sync, considering that we never enqueue N slots, but
at most N-1. This change fixes a bug that causes the VQ
and the netmap ring to go out of sync after N-1 packets
have been received.

MFC after:	1 week
2020-06-03 17:42:17 +00:00
Vincenzo Maffione
f0d8d352c0 netmap: vtnet: call netmap_rx_irq() under VQ lock
The netmap_rx_irq() function normally wakes up user-space threads
waiting for more packets. In this case, it is not necessary to
call it under the driver queue lock. However, if the interface is
attached to a VALE switch, netmap_rx_irq() ends up calling rxsync
on the interface (see netmap_bwrap_intr_notify()). Although
concurrent rxsyncs are serialized through the kring lock
(see nm_kr_tryget()), the lock acquire operation is not blocking.
As a result, it may happen that netmap_rx_irq() is called on
an RX ring while another instance is running, causing the
second call to fail, and received packets stall in the receive VQ.
We fix this issue by calling netmap_rx_irq() under the VQ lock.

MFC after:	1 week
2020-06-03 05:27:29 +00:00
Vincenzo Maffione
1b89d00bd4 netmap: vtnet: honor NM_IRQ_RESCHED
The netmap_rx_irq() function may return NM_IRQ_RESCHED to inform the
driver that more work is pending, and that netmap expects netmap_rx_irq()
to be called again as soon as possible.
This change implements this behaviour in the vtnet driver.
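
One way to picture this contract is the sketch below. It is only an illustration: the enum values, `nm_rxq` and `fake_rx_irq()` are hypothetical stand-ins for netmap's NM_IRQ_* codes, the driver queue and netmap_rx_irq().

```c
/* Hypothetical result codes standing in for netmap's NM_IRQ_* values. */
enum irq_result { IRQ_PASS, IRQ_COMPLETED, IRQ_RESCHED };

/* Hypothetical queue state. */
struct nm_rxq {
	int pending_rounds;	/* rounds of work still outstanding */
	int calls;		/* times fake_rx_irq() ran */
	int rescheduled;	/* times the handler asked to run again */
};

/* Stand-in for netmap_rx_irq(): reports RESCHED while work remains. */
enum irq_result
fake_rx_irq(struct nm_rxq *q)
{
	q->calls++;
	if (q->pending_rounds > 0)
		q->pending_rounds--;
	return (q->pending_rounds > 0 ? IRQ_RESCHED : IRQ_COMPLETED);
}

/*
 * Interrupt handler sketch: returns 1 when the caller should schedule
 * the handler again as soon as possible, 0 otherwise.
 */
int
nm_rx_intr(struct nm_rxq *q)
{
	if (fake_rx_irq(q) == IRQ_RESCHED) {
		q->rescheduled++;
		return (1);
	}
	return (0);
}
```

In a real driver the "run again" request would typically be a taskqueue enqueue rather than a loop in the caller.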

MFC after:	1 week
2020-06-03 05:09:33 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Gleb Smirnoff
e87c494015 Although most of the NIC drivers are epoch ready, due to peer pressure
switch over to opt-in instead of opt-out for epoch.

Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks
itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch
when processing its packets.

Now this will create recursive entrance in epoch in >90% network
drivers, but will guarantee safeness of the transition.

Mark several tested drivers as IFF_KNOWSEPOCH.

Reviewed by:		hselasky, jeff, bz, gallatin
Differential Revision:	https://reviews.freebsd.org/D23674
2020-02-24 21:07:30 +00:00
Gleb Smirnoff
6c3e93cb5a Use NET_TASK_INIT() and NET_GROUPTASK_INIT() for drivers that process
incoming packets in taskqueue context.

Reviewed by:	hselasky
Differential Revision:	https://reviews.freebsd.org/D23518
2020-02-11 18:57:07 +00:00
Gleb Smirnoff
629667a148 Pacify gcc.
Reported by:	rlibby
2020-01-11 20:07:30 +00:00
Gleb Smirnoff
ed6cbf4805 Add pfil(9) hook to vtnet(4).
The patch could be simpler, using only the second chunk to
vtnet_rxq_eof(), that passes full mbufs to pfil(9). Packet
filter would m_free() them in case of returning PFIL_DROPPED.

However, we pretend to be a hardware driver, so we first try
to pass a memory buffer via PFIL_MEMPTR feature. This is mostly
done for debugging purposes, so that one can experiment in bhyve
with packet filters utilizing same features as a true driver.
2020-01-10 21:22:03 +00:00
Kristof Provost
29bfe2102d vtnet: Pre-allocate debugnet data immediately
Don't wait until the vtnet_debugnet_init() call happens, because at that
point we might already have allocated something from
vtnet_tx_header_zone.

Some systems showed this panic:

        vtnet0: link state changed to UP
        panic: keg vtnet_tx_hdr initialization after use.
        cpuid = 5
        time = 1578427700
        KDB: stack backtrace:
        db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004db427f0
        vpanic() at vpanic+0x17e/frame 0xfffffe004db42850
        panic() at panic+0x43/frame 0xfffffe004db428b0
        uma_zone_reserve() at uma_zone_reserve+0xf6/frame 0xfffffe004db428f0
        vtnet_debugnet_init() at vtnet_debugnet_init+0x77/frame 0xfffffe004db42930
        debugnet_any_ifnet_update() at debugnet_any_ifnet_update+0x42/frame 0xfffffe004db42980
        do_link_state_change() at do_link_state_change+0x1b3/frame 0xfffffe004db429d0
        taskqueue_run_locked() at taskqueue_run_locked+0x178/frame 0xfffffe004db42a30
        taskqueue_run() at taskqueue_run+0x4d/frame 0xfffffe004db42a50
        ithread_loop() at ithread_loop+0x1d6/frame 0xfffffe004db42ab0
        fork_exit() at fork_exit+0x80/frame 0xfffffe004db42af0
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004db42af0
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
        KDB: enter: panic
        [ thread pid 12 tid 100011 ]
        Stopped at      kdb_enter+0x37: movq    $0,0x1084eb6(%rip)
        db>

Reviewed by:	cem, markj
Differential Revision:	https://reviews.freebsd.org/D23073
2020-01-08 10:06:32 +00:00
Gleb Smirnoff
7dce56596f Convert to if_foreach_llmaddr() KPI. 2019-10-21 17:59:02 +00:00
Vincenzo Maffione
f8bc74e2f4 tap: add support for virtio-net offloads
This patch is part of an effort to make bhyve networking (in particular TCP)
faster. The key strategy to enhance TCP throughput is to let the whole packet
datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet
overhead is amortized over a large number of bytes.
This capability is supported in the guest by means of the vtnet(4) driver,
which is able to handle TSO/LRO packets leveraging the virtio-net header
(see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf).
A bhyve VM exchanges packets with the host through a network backend,
which can be vale(4) or if_tap(4).
While vale(4) supports TSO/LRO packets, if_tap(4) does not.
This patch extends if_tap(4) with the ability to understand the virtio-net
header, so that a tapX interface can process TSO/LRO packets.
A couple of ioctl commands have been added to configure and probe the
virtio-net header. Once the virtio-net header is set, the tapX interface
acquires all the IFCAP capabilities necessary for TSO/LRO.

Reviewed by:	kevans
Differential Revision:	https://reviews.freebsd.org/D21263
2019-10-18 21:53:27 +00:00
Conrad Meyer
7790c8c199 Split out a more generic debugnet(4) from netdump(4)
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport.  It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).

It is mostly a verbatim code lift from netdump(4).  Netdump(4) remains
the only consumer (until the rest of this patch series lands).

The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c.  UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c.  The
separation is not perfect and future improvement is welcome.  Supporting
INET6 is a long-term goal.

Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry.  I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.

The only functional change here is the mbuf allocation / tracking.  Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time.  If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark.  Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.

No other functional change intended.

Reviewed by:	markj (earlier version)
Some discussion with:	emaste, jhb
Objection from:	marius
Differential Revision:	https://reviews.freebsd.org/D21421
2019-10-17 16:23:03 +00:00
Conrad Meyer
0f6040f03e virtio(4): Add PNP match metadata for virtio devices
Register MODULE_PNP_INFO for virtio devices using the newbus PNP information
provided by the previous commit.  Matching can be quite simple; existing
probe routines only matched on bus (implicit) and device_type.  The same
matching criteria are retained exactly, but are now also available to
devmatch(8).

Reviewed by:	bryanv, markj; imp (earlier version)
Differential Revision:	https://reviews.freebsd.org/D20407
2019-06-04 02:37:11 +00:00
Michael Tuexen
132ea9f2ad Remove non-functional SCTP checksum offload support for virtio.
Checksum offloading for SCTP is not currently specified for virtio.
If the hypervisor announces checksum offloading support, it means TCP
and UDP checksum offload. If an SCTP packet is sent and the host announced
checksum offload support, the hypervisor inserts the IP checksum (16-bit)
at the correct offset, but this is not the right checksum, which is a CRC32c.
This results in all outgoing packets having the wrong checksum and therefore
breaking SCTP based communications.
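
To make the mismatch concrete, here is a small sketch of the two algorithms side by side: the 16-bit one's-complement Internet checksum (RFC 1071) used by TCP and UDP, and the CRC32c that SCTP expects. The function names are illustrative and the bitwise CRC is written for clarity, not speed.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement Internet checksum over big-endian 16-bit words. */
uint16_t
inet_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82f63b78. */
uint32_t
crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xffffffffu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
	}
	return (~crc);
}
```

The two functions differ in width and in construction, so a 16-bit sum written where SCTP expects a CRC32c cannot validate at the receiver.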

This patch removes SCTP checksum offloading support from the virtio
network interface.

Thanks to Felix Weinrank for making me aware of the issue.

Reviewed by:		bryanv@
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D20147
2019-05-07 20:28:12 +00:00
Vincenzo Maffione
93ef29690e vtnet: fix typo in vtnet_free_taskqueues
Because of a typo, the code was mistakenly resetting the
vtnrx_vq pointer rather than vtntx_tq.

Reviewed by:	bryanv
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D19015
2019-01-29 14:31:41 +00:00
Vincenzo Maffione
2e42b74a6f vtnet: fix netmap support
netmap(4) support for vtnet(4) was incomplete and had multiple bugs.
This commit fixes those bugs to bring netmap on vtnet to a functional state.

Changelist:
  - handle errors returned by virtqueue_enqueue() properly (they were
    previously ignored)
  - make sure that either netmap or the rest of the kernel, but never both,
    accesses each virtqueue.
  - compute the number of netmap slots for TX and RX separately, according to
    whether indirect descriptors are used or not for a given virtqueue.
  - make sure sglist are freed according to their type (mbufs or netmap
    buffers)
  - add support for multiqueue and netmap host (aka sw) rings.
  - intercept VQ interrupts directly instead of intercepting them in txq_eof
    and rxq_eof. This simplifies the code and makes it easier to make sure
    taskqueues are not running for a VQ while it is in netmap mode.
  - implement vtnet_netmap_config() to cope with changes in the number of queues.

Reviewed by:	bryanv
Approved by:	gnn (mentor)
MFC after:	3 days
Sponsored by:	Sunny Valley Networks
Differential Revision:	https://reviews.freebsd.org/D17916
2018-11-14 15:39:48 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Mark Johnston
c857c7d553 Add netdump support to vtnet(4).
Tested with bhyve.

Reviewed by:	bryanv, julian
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15261
2018-05-06 00:53:52 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Pedro F. Giffuni
26c1d774b5 dev: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the surrounding code could handle NULL values.
2018-01-13 22:30:30 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-27 14:52:40 +00:00
Pedro F. Giffuni
7282444b10 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:36:21 +00:00
Kristof Provost
83699dfd09 vtnet: Support jumbo frames without TSO/GSO
Currently, without the TSO/GSO features enabled, the virtio driver allows at
most 4 scatter-gather segments on the TX path, which rules out 9K jumbo frames.
A 9K jumbo frame needs more than 4 scatter-gather segments, so the driver fails
to send the frame down to the host OS. With TSO/GSO enabled, the maximum is 64
scatter-gather segments and 9K jumbo frames are fine, which means the virtio
driver supports jumbo frames only with TSO/GSO.

Increase VTNET_MIN_TX_SEGS, the limit used in the non-TSO/GSO case, to 32 to
support jumbo frames of up to 64K to the host.
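
The segment arithmetic can be sketched as below. The accounting is an assumption made for illustration (one segment for the virtio-net header plus one per 2048-byte buffer chunk); the driver's real accounting may differ, and `tx_segs_needed()` and `frame_fits()` are hypothetical names.

```c
#include <stddef.h>

/*
 * Segments needed to describe a frame: one for the virtio-net header
 * plus one per buffer chunk. Assumes each chunk is contiguous and holds
 * at most chunk_size bytes.
 */
size_t
tx_segs_needed(size_t frame_len, size_t chunk_size)
{
	return (1 + (frame_len + chunk_size - 1) / chunk_size);
}

/* Nonzero if the frame can be described within max_segs segments. */
int
frame_fits(size_t frame_len, size_t chunk_size, size_t max_segs)
{
	return (tx_segs_needed(frame_len, chunk_size) <= max_segs);
}
```

Under these assumptions a 9018-byte frame needs 6 segments, which exceeds a limit of 4 but fits comfortably within 32.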

Submitted by:	Lohith Bellad <lohithbsd@gmail.com>
Reviewed by:	adrian
Differential Revision:	https://reviews.freebsd.org/D8803
2017-07-29 09:22:48 +00:00
Philip Paeps
e414c66099 vtnet: don't update VLAN filter when parent is not running
Submitted by:	Gerrie Roos <groos -at- xiplink -dot- com>
Reviewed by:	gnn
Sponsored by:	XipLink, Inc.
Differential Revision:	https://reviews.freebsd.org/D9573
2017-02-13 21:44:29 +00:00
Steven Hartland
4be723f63e Fix vtnet hang with max_virtqueue_pairs > VTNET_MAX_QUEUE_PAIRS
Correctly limit npairs passed to vtnet_ctrl_mq_cmd. This ensures that
VQ_ALLOC_INFO_INIT is called with the correct value, preventing the system
from hanging when max_virtqueue_pairs > VTNET_MAX_QUEUE_PAIRS.

Add a new sysctl, requested_vq_pairs, which allows the user to configure
the requested number of virtqueue pairs. The actual value will still take
into account the system limits.
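
A minimal sketch of this kind of clamping, under assumptions that are not taken from the driver: a requested value of 0 means "no preference", and the limits considered are the device maximum, a driver maximum and the CPU count. `vq_pairs_to_use()` is a hypothetical helper.

```c
/*
 * Pick the number of virtqueue pairs to use. The result never exceeds
 * any of the limits and is always at least 1.
 */
int
vq_pairs_to_use(int requested, int device_max, int driver_max, int ncpus)
{
	int n = device_max;

	if (n > driver_max)
		n = driver_max;
	if (n > ncpus)
		n = ncpus;
	if (requested > 0 && requested < n)	/* 0 means no preference */
		n = requested;
	if (n < 1)
		n = 1;
	return (n);
}
```

The point of clamping once, in one place, is that every later consumer (queue allocation and the control command alike) sees the same bounded value.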

Also add missing sysctls for the current tunables so their values can be seen.

PR:		207446
Reported by:	Andy Carrel
MFC after:	3 days
Relnotes:	Yes
Sponsored by:	Multiplay
2016-08-11 21:13:58 +00:00
Kristof Provost
3fcb1aaef1 vtnet: fix panic on unload
Since r276367 added the virtio_mmio support vtnet_modevent() gets called twice.
This resulted in a memory leak during load and a panic on unload.

Count the loads so we only initialise once (just like cxgbe(4)), and only clean
up in the final unload.
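
The load-counting idea can be sketched like this. The names are hypothetical and the counters exist only to make the behaviour observable; real module event handlers have a different signature.

```c
int load_count;		/* how many times the module event fired for load */
int init_calls;		/* times one-time setup actually ran */
int fini_calls;		/* times teardown actually ran */

/* Called once per bus attachment; only the first call initialises. */
int
mod_load(void)
{
	if (load_count++ == 0)
		init_calls++;	/* allocate shared resources here */
	return (0);
}

/* Called once per bus detachment; only the last call cleans up. */
int
mod_unload(void)
{
	if (--load_count == 0)
		fini_calls++;	/* release shared resources here */
	return (0);
}
```

With two loads and two unloads, setup and teardown each run exactly once, which avoids both the leak on load and the double cleanup on unload.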

PR:		209428
Submitted by:	novel@FreeBSD.org
MFC after:	1 week
2016-05-14 06:07:15 +00:00
Marcelo Araujo
804fc8c859 Lower the compiler warning: unused-but-set-variable.
Approved by:		bapt (mentor)
Differential Revision:	D3556
2015-09-03 06:53:17 +00:00
Luigi Rizzo
0fdeab7bc5 add netmap dependency when compiled as a module 2015-07-10 07:13:14 +00:00
Kristof Provost
581e697036 Fix panic when adding vtnet interfaces to a bridge
vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host).  if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).

There are also two issues with the if_bridge error handling.

If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promisc() calls can actually fail, so this is safe.

A second issue is a double free of bif. It's already freed by
bridge_delete_member().

PR:		200210
Differential Revision:	https://reviews.freebsd.org/D2804
Reviewed by:	philip (mentor)
2015-06-13 19:39:21 +00:00
Bryan Venteicher
cab10cc1d1 Fix typo when deregistering the VLAN unconfig event handler
Submitted by:	Masao Uebayashi <uebayasi@tombiinc.com>
MFC after:	3 days
2015-06-13 16:13:31 +00:00
John Baldwin
4dc78216f8 Don't free mbufs when stopping an interface in netmap mode.
Currently if you ifconfig down a vtnet interface while it is being used
via netmap, the kernel panics due to trying to treat the cookie values
in the virtio rings as mbufs to be freed. When netmap is enabled, these
cookie values are pointers to something else.

Note that other netmap-aware drivers don't seem to need this as they
store the mbuf pointers in the software rings that mirror the hardware
descriptor rings, and since netmap doesn't touch those, the software
state always has NULL mbuf pointers causing the loops to free mbufs to
not do anything. However, vtnet reuses the same state area for both
netmap and non-netmap mode, so it needs to explicitly avoid looking at
the rings and treating the cookie values as mbufs if netmap is
enabled.

Differential Revision:	https://reviews.freebsd.org/D2348
Reviewed by:	adrian, bryanv, luigi
MFC after:	1 week
Sponsored by:	Norse Corp, Inc.
2015-04-29 17:48:25 +00:00
Bryan Venteicher
ab4c2818f2 Add softc flag for when the indirect descriptor feature was negotiated
MFC after:	2 weeks
2015-01-01 02:06:00 +00:00
Bryan Venteicher
5b32b2faaa Use the appropriate IPv4 or IPv6 TSO HW assist flag
MFC after:	2 weeks
2015-01-01 02:03:09 +00:00
Andrew Turner
e51f2e72db Attach vtnet to virtio_mmio. Qemu provides this as an option with AArch64.
Sponsored by:	The FreeBSD Foundation
2014-12-29 17:17:01 +00:00
Hans Petter Selasky
c25290420e Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but no longer has any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.

This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.
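
A transmit-side check in this style might look like the sketch below. The `pkt` structure, the HASHTYPE_* values and the fallback to the current CPU are assumptions for illustration; they stand in for the mbuf packet header and the M_HASHTYPE_* macros from "sys/mbuf.h".

```c
#include <stdint.h>

/* Hypothetical values standing in for M_HASHTYPE_NONE / M_HASHTYPE_OPAQUE. */
#define HASHTYPE_NONE	0
#define HASHTYPE_OPAQUE	1

/* Hypothetical, cut-down packet header. */
struct pkt {
	uint32_t flowid;
	uint8_t rsstype;
};

/*
 * Select a transmit queue: trust flowid only when the hash type says it
 * is valid, otherwise fall back to the current CPU.
 */
unsigned int
select_txq(const struct pkt *m, unsigned int nqueues, unsigned int curcpu)
{
	if (m->rsstype != HASHTYPE_NONE)
		return (m->flowid % nqueues);
	return (curcpu % nqueues);
}
```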

"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.

Additional notes:
- The SCTP code changes will be committed as a separate patch.
- Removal of the "M_FLOWID" flag will also be done separately.
- The FreeBSD version has been bumped.

MFC after:	1 month
Sponsored by:	Mellanox Technologies
2014-12-01 11:45:24 +00:00
Bryan Venteicher
9a4dabdc5a Enable LRO by default when available on vtnet interfaces
The prior change to not enable LRO by default has confused several
people. The configurations where LRO is problematic are not the
typical use case for VirtIO, and due to other issues, these often
require checksum offloading to be disabled anyway.

PR:		185864
MFC after:	2 weeks
2014-11-09 20:04:12 +00:00
Gleb Smirnoff
84047b19df - Provide if_get_counter() method for vtnet(4).
- Do not accumulate statistics on every tick.
- Accumulate statistics in vtnet_setup_stat_sysctl()
  and in vtnet_get_counter().

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 19:15:40 +00:00
Gleb Smirnoff
1bffa9511f Use define from if_var.h to access a field inside struct if_data,
that resides in struct ifnet.

Sponsored by:	Nginx, Inc.
2014-08-30 19:55:54 +00:00
Luigi Rizzo
4bf50f18eb Update to the current version of netmap.
Mostly bugfixes or features developed in the past 6 months,
so this is a 10.1 candidate.

Basically no user API changes (some bugfixes in sys/net/netmap_user.h).

In detail:

1. netmap support for virtio-net, including in netmap mode.
  Under bhyve and with a netmap backend [2] we reach over 1Mpps
  with standard APIs (e.g. libpcap), and 5-8 Mpps in netmap mode.

2. (kernel) add support for multiple memory allocators, so we can
  better partition physical and virtual interfaces giving access
  to separate users. The most visible effect is one additional
  argument to the various kernel functions to compute buffer
  addresses. All netmap-supported drivers are affected, but changes
  are mechanical and trivial

3. (kernel) simplify the prototype for *txsync() and *rxsync()
  driver methods. All netmap drivers affected, changes mostly mechanical.

4. add support for netmap-monitor ports. Think of it as a mirroring
  port on a physical switch: a netmap monitor port replicates traffic
  present on the main port. Restrictions apply. Drive carefully.

5. if_lem.c: support for various paravirtualization features,
  experimental and disabled by default.
  Most of these are described in our ANCS'13 paper [1].
  Paravirtualized support in netmap mode is new, and beats the
  numbers in the paper by a large factor (under qemu-kvm,
  we measured guest-host throughput up to 10-12 Mpps).

A lot of refactoring and additional documentation in the files
in sys/dev/netmap, but apart from #2 and #3 above, almost nothing
of this stuff is visible to other kernel parts.

Example programs in tools/tools/netmap have been updated with bugfixes
and to support more of the existing features.

This is meant to go into 10.1 so we plan an MFC before the Aug.22 deadline.

A lot of this code has been contributed by my colleagues at UNIPI,
including Giuseppe Lettieri, Vincenzo Maffione, Stefano Garzarella.

MFC after:	3 days.
2014-08-16 15:00:01 +00:00
Bryan Venteicher
32487a8973 Rework when the Tx queue completion interrupt is enabled
The Tx interrupt is now kept disabled in the common case, only
enabled when the number of free descriptors in the queue falls
below a threshold. Transmitted frames are cleared from the VQ
before subsequent transmit, or in the watchdog timer.
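
The policy can be sketched roughly as follows. This is an illustration with hypothetical names and a simplified queue model (the watchdog path is omitted); it is not the driver's implementation.

```c
/* Hypothetical, simplified transmit queue state. */
struct txq {
	int nfree;		/* free descriptors */
	int completed;		/* descriptors the host has finished with */
	int intr_thresh;	/* enable the interrupt below this many free */
	int intr_enabled;
};

/* Reclaim descriptors for frames that have already been transmitted. */
void
txq_reclaim(struct txq *q)
{
	q->nfree += q->completed;
	q->completed = 0;
}

/* Queue one frame. Returns 0 on success, -1 if the queue is full. */
int
txq_transmit(struct txq *q)
{
	txq_reclaim(q);			/* clean up before each transmit */
	if (q->nfree == 0) {
		q->intr_enabled = 1;	/* need a completion to make progress */
		return (-1);
	}
	q->nfree--;
	/* Keep the interrupt off unless free descriptors are running low. */
	q->intr_enabled = (q->nfree < q->intr_thresh);
	return (0);
}
```

In the common case the queue is cleaned opportunistically on the transmit path, so the completion interrupt only fires when the sender is close to stalling.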

This was a very big performance improvement for an experimental
Netmap bhyve backend.

MFC after:	1 month
2014-07-10 05:36:04 +00:00
Bryan Venteicher
bae486f5d7 Force two byte alignment for all control message headers
The header structure consists of two 1-byte elements, but it must always
be describable by a single SG entry. Note for consistency, specify the
alignment everywhere, even if the structure has the appropriate natural
alignment since it contains a uint16_t.
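
A small sketch of why the alignment matters, assuming that a scatter-gather entry cannot span a page boundary and that pages are 4096 bytes. The attribute syntax is the GCC/Clang form, and `ctrl_hdr` and `crosses_page()` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ	4096u	/* assumed page size for illustration */

/* Two 1-byte fields; alignment forced to 2 bytes. */
struct ctrl_hdr {
	uint8_t class;
	uint8_t cmd;
} __attribute__((aligned(2)));

/* Nonzero if [addr, addr + len) touches more than one page. */
int
crosses_page(uintptr_t addr, size_t len)
{
	return (addr / PAGE_SZ != (addr + len - 1) / PAGE_SZ);
}
```

A 2-byte object at an odd address can straddle two pages (for example at offset 4095), whereas one at an even address never does, so forcing 2-byte alignment keeps the header within a single entry.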

Obtained from:	DragonFlyBSD
MFC after:	1 week
2014-06-16 04:32:27 +00:00
Bryan Venteicher
fd5b395117 Make the feature negotiation code easier to follow
MFC after:	1 week
2014-06-16 04:29:28 +00:00
Bryan Venteicher
add526c613 - Remove two write-only local variables
- Remove unused element in the vtnet_rxq structure

MFC after:	1 week
2014-06-16 04:12:33 +00:00
Luigi Rizzo
c26e5fc2ed make sure ifp->if_transmit returns 0 if a buffer is enqueued.
A similar fix should be applied to vmxnet, ixgbe, igb, i40e.
(some of them previously reported by Michael Tuexen)

Drivers using if_transmit are correct, and so are most of the
other drivers that reassign if_transmit.

Among other things, this bug causes panics when using netmap emulation
on top of generic drivers.

Approved by:	bryanv
MFC after:	3 days
2014-06-04 16:57:05 +00:00