b092fd6c97
This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx and rx), TSO, and RSS if the NIC is capable of performing these operations on inner VXLAN traffic. A VXLAN interface inherits the capabilities of its vxlandev interface if one is specified or of the interface that hosts the vxlanlocal address. If other interfaces will carry traffic for that VXLAN then they must have the same hardware capabilities. On transmit, if_vxlan verifies that the outbound interface has the required capabilities and then translates the CSUM_ flags to their inner equivalents. This tells the hardware ifnet that it needs to operate on the inner frame and not the outer VXLAN headers. An event is generated when a VXLAN ifnet starts. This allows hardware drivers to configure their devices to expect VXLAN traffic on the specified incoming port. On receive, the hardware does RSS and checksum verification on the inner frame. if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is not very clear why it didn't do this already. Future work: Rx: it should be possible to avoid the first trip up the protocol stack to get the frame to if_vxlan just so it can decapsulate and requeue for a second trip up the stack. The hardware NIC driver could directly call an if_vxlan receive routine for VXLAN traffic instead. Rx: LRO. depends on what happens with the previous item. There will have to to be a mechanism to indicate that it's time for if_vxlan to flush its LRO state. Reviewed by: kib@ Relnotes: Yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873
284 lines
7.8 KiB
Groff
284 lines
7.8 KiB
Groff
.\" Copyright (c) 2014 Bryan Venteicher
|
|
.\" All rights reserved.
|
|
.\"
|
|
.\" Redistribution and use in source and binary forms, with or without
|
|
.\" modification, are permitted provided that the following conditions
|
|
.\" are met:
|
|
.\" 1. Redistributions of source code must retain the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer.
|
|
.\" 2. Redistributions in binary form must reproduce the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer in the
|
|
.\" documentation and/or other materials provided with the distribution.
|
|
.\"
|
|
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
|
|
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
|
|
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
.\" SUCH DAMAGE.
|
|
.\"
|
|
.\" $FreeBSD$
|
|
.\"
|
|
.Dd September 17, 2020
|
|
.Dt VXLAN 4
|
|
.Os
|
|
.Sh NAME
|
|
.Nm vxlan
|
|
.Nd "Virtual eXtensible LAN interface"
|
|
.Sh SYNOPSIS
|
|
To compile this driver into the kernel,
|
|
place the following line in your
|
|
kernel configuration file:
|
|
.Bd -ragged -offset indent
|
|
.Cd "device vxlan"
|
|
.Ed
|
|
.Pp
|
|
Alternatively, to load the driver as a
|
|
module at boot time, place the following line in
|
|
.Xr loader.conf 5 :
|
|
.Bd -literal -offset indent
|
|
if_vxlan_load="YES"
|
|
.Ed
|
|
.Sh DESCRIPTION
|
|
The
|
|
.Nm
|
|
driver creates a virtual tunnel endpoint in a
|
|
.Nm
|
|
segment.
|
|
A
|
|
.Nm
|
|
segment is a virtual Layer 2 (Ethernet) network that is overlaid
|
|
in a Layer 3 (IP/UDP) network.
|
|
.Nm
|
|
is analogous to
|
|
.Xr vlan 4
|
|
but is designed to be better suited for large, multiple tenant
|
|
data center environments.
|
|
.Pp
|
|
Each
|
|
.Nm
|
|
interface is created at runtime using interface cloning.
|
|
This is most easily done with the
|
|
.Xr ifconfig 8
|
|
.Cm create
|
|
command or using the
|
|
.Va cloned_interfaces
|
|
variable in
|
|
.Xr rc.conf 5 .
|
|
The interface may be removed with the
|
|
.Xr ifconfig 8
|
|
.Cm destroy
|
|
command.
|
|
.Pp
|
|
The
|
|
.Nm
|
|
driver creates a pseudo Ethernet network interface
|
|
that supports the usual network
|
|
.Xr ioctl 2 Ns s
|
|
and thus can be used with
|
|
.Xr ifconfig 8
|
|
like any other Ethernet interface.
|
|
The
|
|
.Nm
|
|
interface encapsulates the Ethernet frame
|
|
by prepending IP/UDP and
|
|
.Nm
|
|
headers.
|
|
Thus, the encapsulated (inner) frame is able to be transmitted
|
|
over a routed, Layer 3 network to the remote host.
|
|
.Pp
|
|
The
|
|
.Nm
|
|
interface may be configured in either unicast or multicast mode.
|
|
When in unicast mode,
|
|
the interface creates a tunnel to a single remote host,
|
|
and all traffic is transmitted to that host.
|
|
When in multicast mode,
|
|
the interface joins an IP multicast group,
|
|
and receives packets sent to the group address,
|
|
and transmits packets to either the multicast group address,
|
|
or directly to the remote host if there is an appropriate
|
|
forwarding table entry.
|
|
.Pp
|
|
When the
|
|
.Nm
|
|
interface is brought up, a
|
|
.Xr udp 4
|
|
.Xr socket 9
|
|
is created based on the configuration,
|
|
such as the local address for unicast mode or
|
|
the group address for multicast mode,
|
|
and the listening (local) port number.
|
|
Since multiple
|
|
.Nm
|
|
interfaces may be created that either
|
|
use the same local address
|
|
or join the same group address,
|
|
and use the same port,
|
|
the driver may share a socket among multiple interfaces.
|
|
However, each interface within a socket must belong to
|
|
a unique
|
|
.Nm
|
|
segment.
|
|
The analogous
|
|
.Xr vlan 4
|
|
configuration would be a physical interface configured as
|
|
the parent device for multiple VLAN interfaces, each with
|
|
a unique VLAN tag.
|
|
Each
|
|
.Nm
|
|
segment is identified by a 24-bit value in the
|
|
.Nm
|
|
header called the
|
|
.Dq VXLAN Network Identifier ,
|
|
or VNI.
|
|
.Pp
|
|
When configured with the
|
|
.Xr ifconfig 8
|
|
.Cm vxlanlearn
|
|
parameter, the interface dynamically creates forwarding table entries
|
|
from received packets.
|
|
An entry in the forwarding table maps the inner source MAC address
|
|
to the outer remote IP address.
|
|
During transmit, the interface attempts to lookup an entry for
|
|
the encapsulated destination MAC address.
|
|
If an entry is found, the IP address in the entry is used to directly
|
|
transmit the encapsulated frame to the destination.
|
|
Otherwise, when configured in multicast mode,
|
|
the interface must flood the frame to all hosts in the group.
|
|
The maximum number of entries in the table is configurable with the
|
|
.Xr ifconfig 8
|
|
.Cm vxlanmaxaddr
|
|
command.
|
|
Stale entries in the table are periodically pruned.
|
|
The timeout is configurable with the
|
|
.Xr ifconfig 8
|
|
.Cm vxlantimeout
|
|
command.
|
|
The table may be viewed with the
|
|
.Xr sysctl 8
|
|
.Cm net.link.vxlan.N.ftable.dump
|
|
command.
|
|
.Sh MTU
|
|
Since the
|
|
.Nm
|
|
interface encapsulates the Ethernet frame with an IP, UDP, and
|
|
.Nm
|
|
header, the resulting frame may be larger than the MTU of the
|
|
physical network.
|
|
The
|
|
.Nm
|
|
specification recommends the physical network MTU be configured
|
|
to use jumbo frames to accommodate the encapsulated frame size.
|
|
Alternatively, the
|
|
.Xr ifconfig 8
|
|
.Cm mtu
|
|
command may be used to reduce the MTU size on the
|
|
.Nm
|
|
interface to allow the encapsulated frame to fit in the
|
|
current MTU of the physical network.
|
|
.Sh HARDWARE
|
|
The
|
|
.Nm
|
|
driver supports hardware checksum offload (receive and transmit) and TSO on the
|
|
encapsulated traffic over physical interfaces that support these features.
|
|
The
|
|
.Nm
|
|
interface examines the
|
|
.Cm vxlandev
|
|
interface, if one is specified, or the interface hosting the
|
|
.Cm vxlanlocal
|
|
address, and configures its capabilities based on the hardware offload
|
|
capabilities of that physical interface.
|
|
If multiple physical interfaces will transmit or receive traffic for the
|
|
.Nm
|
|
then they all must have the same hardware capabilities.
|
|
The transmit routine of a
|
|
.Nm
|
|
interface may fail with
|
|
.Er ENXIO
|
|
if an outbound physical interface does not support
|
|
an offload that the
|
|
.Nm
|
|
interface is requesting.
|
|
This can happen if there are multiple physical interfaces involved, with
|
|
different hardware capabilities, or an interface capability was disabled after
|
|
the
|
|
.Nm
|
|
interface had already started.
|
|
.Pp
|
|
At present, these devices are capable of generating checksums and performing TSO
|
|
on the inner frames in hardware:
|
|
.Xr cxgbe 4 .
|
|
.Sh EXAMPLES
|
|
Create a
|
|
.Nm
|
|
interface in unicast mode
|
|
with the
|
|
.Cm vxlanlocal
|
|
tunnel address of 192.168.100.1,
|
|
and the
|
|
.Cm vxlanremote
|
|
tunnel address of 192.168.100.2.
|
|
.Bd -literal -offset indent
|
|
ifconfig vxlan create vxlanid 108 vxlanlocal 192.168.100.1 vxlanremote 192.168.100.2
|
|
.Ed
|
|
.Pp
|
|
Create a
|
|
.Nm
|
|
interface in multicast mode,
|
|
with the
|
|
.Cm local
|
|
address of 192.168.10.95,
|
|
and the
|
|
.Cm group
|
|
address of 224.0.2.6.
|
|
The em0 interface will be used to transmit multicast packets.
|
|
.Bd -literal -offset indent
|
|
ifconfig vxlan create vxlanid 42 vxlanlocal 192.168.10.95 vxlangroup 224.0.2.6 vxlandev em0
|
|
.Ed
|
|
.Pp
|
|
Once created, the
|
|
.Nm
|
|
interface can be configured with
|
|
.Xr ifconfig 8 .
|
|
.Pp
|
|
The following when placed in the file
|
|
.Pa /etc/rc.conf
|
|
will cause a vxlan interface called
|
|
.Dq Li vxlan0
|
|
to be created, and will configure the interface in unicast mode.
|
|
.Bd -literal -offset indent
|
|
cloned_interfaces="vxlan0"
|
|
create_args_vxlan0="vxlanid 108 vxlanlocal 192.168.100.1 vxlanremote 192.168.100.2"
|
|
.Ed
|
|
.Sh SEE ALSO
|
|
.Xr inet 4 ,
|
|
.Xr inet6 4 ,
|
|
.Xr vlan 4 ,
|
|
.Xr rc.conf 5 ,
|
|
.Xr ifconfig 8 ,
|
|
.Xr sysctl 8
|
|
.Rs
|
|
.%A "M. Mahalingam"
|
|
.%A "et al"
|
|
.%T "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks"
|
|
.%D August 2014
|
|
.%O "RFC 7348"
|
|
.Re
|
|
.Sh AUTHORS
|
|
.An -nosplit
|
|
The
|
|
.Nm
|
|
driver was written by
|
|
.An Bryan Venteicher Aq bryanv@freebsd.org .
|
|
Support for stateless hardware offloads was added by
|
|
.An Navdeep Parhar Aq np@freebsd.org
|
|
in
|
|
.Fx 13.0 .
|