Konstantin Belousov baacf70137 vxlan: correct interface MTU when using hw offloads
Otherwise it breaks when offloading like checksum or TSO are used,
because second (encapsulated) ip_output() processing passes fragments of
the encapsulated packet down to the hardware interface.

Diagnosed by:	hselasky
Reviewed by:	np
Sponsored by:	Nvidia Networking / Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D29501
2021-03-31 14:38:26 +03:00


.\" Copyright (c) 2014 Bryan Venteicher
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd March 30, 2021
.Dt VXLAN 4
.Os
.Sh NAME
.Nm vxlan
.Nd "Virtual eXtensible LAN interface"
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd "device vxlan"
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
if_vxlan_load="YES"
.Ed
.Sh DESCRIPTION
The
.Nm
driver creates a virtual tunnel endpoint in a
.Nm
segment.
A
.Nm
segment is a virtual Layer 2 (Ethernet) network that is overlaid
on a Layer 3 (IP/UDP) network.
.Nm
is analogous to
.Xr vlan 4
but is designed to be better suited for large, multi-tenant
data center environments.
.Pp
Each
.Nm
interface is created at runtime using interface cloning.
This is most easily done with the
.Xr ifconfig 8
.Cm create
command or using the
.Va cloned_interfaces
variable in
.Xr rc.conf 5 .
The interface may be removed with the
.Xr ifconfig 8
.Cm destroy
command.
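.Pp
As a minimal sketch, assuming the cloned interface was assigned the name
.Dq Li vxlan0 ,
it could be removed with:
.Bd -literal -offset indent
ifconfig vxlan0 destroy
.Ed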
.Pp
The
.Nm
driver creates a pseudo Ethernet network interface
that supports the usual network
.Xr ioctl 2 Ns s
and thus can be used with
.Xr ifconfig 8
like any other Ethernet interface.
The
.Nm
interface encapsulates the Ethernet frame
by prepending IP/UDP and
.Nm
headers.
Thus, the encapsulated (inner) frame can be transmitted
over a routed, Layer 3 network to the remote host.
.Pp
The
.Nm
interface may be configured in either unicast or multicast mode.
When in unicast mode,
the interface creates a tunnel to a single remote host,
and all traffic is transmitted to that host.
When in multicast mode,
the interface joins an IP multicast group
and receives packets sent to the group address.
It transmits packets either to the multicast group address
or directly to the remote host if there is an appropriate
forwarding table entry.
.Pp
When the
.Nm
interface is brought up, a
.Xr udp 4
.Xr socket 9
is created based on the configuration,
such as the local address for unicast mode or
the group address for multicast mode,
and the listening (local) port number.
Since multiple
.Nm
interfaces may be created that either
use the same local address
or join the same group address,
and use the same port,
the driver may share a socket among multiple interfaces.
However, each interface sharing a socket must belong to
a unique
.Nm
segment.
The analogous
.Xr vlan 4
configuration would be a physical interface configured as
the parent device for multiple VLAN interfaces, each with
a unique VLAN tag.
Each
.Nm
segment is identified by a 24-bit value in the
.Nm
header called the
.Dq VXLAN Network Identifier ,
or VNI.
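.Pp
As an illustration, the two commands below would create two interfaces that
use the same local address and the default port, and so may share a socket,
while belonging to different segments.
The addresses and VNI values are placeholders chosen for this sketch.
.Bd -literal -offset indent
ifconfig vxlan create vxlanid 100 vxlanlocal 192.168.100.1 vxlanremote 192.168.100.2
ifconfig vxlan create vxlanid 101 vxlanlocal 192.168.100.1 vxlanremote 192.168.100.2
.Ed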
.Pp
When configured with the
.Xr ifconfig 8
.Cm vxlanlearn
parameter, the interface dynamically creates forwarding table entries
from received packets.
An entry in the forwarding table maps the inner source MAC address
to the outer remote IP address.
During transmit, the interface attempts to look up an entry for
the encapsulated destination MAC address.
If an entry is found, the IP address in the entry is used to directly
transmit the encapsulated frame to the destination.
Otherwise, when configured in multicast mode,
the interface must flood the frame to all hosts in the group.
The maximum number of entries in the table is configurable with the
.Xr ifconfig 8
.Cm vxlanmaxaddr
command.
Stale entries in the table are periodically pruned.
The timeout is configurable with the
.Xr ifconfig 8
.Cm vxlantimeout
command.
The table may be viewed with the
.Xr sysctl 8
.Cm net.link.vxlan.N.ftable.dump
command.
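.Pp
A minimal sketch of these settings, assuming an existing interface named
.Dq Li vxlan0
(so that N is 0 in the
.Xr sysctl 8
name) and limits picked only for illustration:
.Bd -literal -offset indent
ifconfig vxlan0 vxlanlearn vxlanmaxaddr 1000 vxlantimeout 600
sysctl net.link.vxlan.0.ftable.dump
.Ed
.Pp
The first command enables learning, caps the table at 1000 entries, and
sets the timeout to 600 seconds.
The second prints the current forwarding table.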
.Sh MTU
Since the
.Nm
interface encapsulates the Ethernet frame with an IP, UDP, and
.Nm
header, the resulting frame may be larger than the MTU of the
physical network.
The
.Nm
specification recommends the physical network MTU be configured
to use jumbo frames to accommodate the encapsulated frame size.
.Pp
By default, the
.Nm
driver sets its MTU to the usual Ethernet MTU of 1500 bytes, reduced by
the size of the
.Nm
headers prepended to the encapsulated packets.
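.Pp
As a rough worked example, assume the overhead consists of an outer IP
header without options or extension headers (20 bytes for IPv4, 40 bytes
for IPv6), the UDP header (8 bytes), the
.Nm
header (8 bytes), and the inner Ethernet header (14 bytes), which also
travels inside the outer IP packet.
Under that assumption the default MTU would work out to:
.Bd -literal -offset indent
IPv4: 1500 - (20 + 8 + 8 + 14) = 1450
IPv6: 1500 - (40 + 8 + 8 + 14) = 1430
.Ed
.Pp
The value actually in effect can be checked with
.Xr ifconfig 8 .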
.Pp
Alternatively, the
.Xr ifconfig 8
.Cm mtu
command may be used to set a fixed MTU on the
.Nm
interface to allow the encapsulated frame to fit in the
current MTU of the physical network.
Once the
.Cm mtu
command has been used, the system no longer adjusts the
.Nm
interface MTU on routing or address changes.
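.Pp
For instance, assuming an interface named
.Dq Li vxlan0
and an illustrative value of 1400 bytes:
.Bd -literal -offset indent
ifconfig vxlan0 mtu 1400
.Ed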
.Sh HARDWARE
The
.Nm
driver supports hardware checksum offload (receive and transmit) and TSO on the
encapsulated traffic over physical interfaces that support these features.
The
.Nm
interface examines the
.Cm vxlandev
interface, if one is specified, or the interface hosting the
.Cm vxlanlocal
address, and configures its capabilities based on the hardware offload
capabilities of that physical interface.
If multiple physical interfaces will transmit or receive traffic for the
.Nm
interface, they must all have the same hardware capabilities.
The transmit routine of a
.Nm
interface may fail with
.Er ENXIO
if an outbound physical interface does not support
an offload that the
.Nm
interface is requesting.
This can happen if there are multiple physical interfaces involved, with
different hardware capabilities, or an interface capability was disabled after
the
.Nm
interface had already started.
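.Pp
One way to compare capabilities is the
.Fl m
flag of
.Xr ifconfig 8 ,
which lists the capabilities an interface supports.
The sketch below assumes a tunnel interface named
.Dq Li vxlan0
and a physical interface named
.Dq Li cc0 ;
both names are hypothetical.
.Bd -literal -offset indent
ifconfig -m vxlan0
ifconfig -m cc0
.Ed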
.Pp
At present, these devices are capable of generating checksums and performing TSO
on the inner frames in hardware:
.Xr cxgbe 4 .
.Sh EXAMPLES
Create a
.Nm
interface in unicast mode
with the
.Cm vxlanlocal
tunnel address of 192.168.100.1,
and the
.Cm vxlanremote
tunnel address of 192.168.100.2.
.Bd -literal -offset indent
ifconfig vxlan create vxlanid 108 vxlanlocal 192.168.100.1 vxlanremote 192.168.100.2
.Ed
.Pp
Create a
.Nm
interface in multicast mode,
with the
.Cm vxlanlocal
address of 192.168.10.95,
and the
.Cm vxlangroup
address of 224.0.2.6.
The em0 interface will be used to transmit multicast packets.
.Bd -literal -offset indent
ifconfig vxlan create vxlanid 42 vxlanlocal 192.168.10.95 vxlangroup 224.0.2.6 vxlandev em0
.Ed
.Pp
Once created, the
.Nm
interface can be configured with
.Xr ifconfig 8 .
.Pp
The following lines, when placed in the file
.Pa /etc/rc.conf ,
will cause a
.Nm
interface called
.Dq Li vxlan0
to be created, and will configure the interface in unicast mode.
.Bd -literal -offset indent
cloned_interfaces="vxlan0"
create_args_vxlan0="vxlanid 108 vxlanlocal 192.168.100.1 vxlanremote 192.168.100.2"
.Ed
.Sh SEE ALSO
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr vlan 4 ,
.Xr rc.conf 5 ,
.Xr ifconfig 8 ,
.Xr sysctl 8
.Rs
.%A "M. Mahalingam"
.%A "et al"
.%T "Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks"
.%D August 2014
.%O "RFC 7348"
.Re
.Sh AUTHORS
.An -nosplit
The
.Nm
driver was written by
.An Bryan Venteicher Aq bryanv@freebsd.org .
Support for stateless hardware offloads was added by
.An Navdeep Parhar Aq np@freebsd.org
in
.Fx 13.0 .