Commit Graph

205 Commits

Author SHA1 Message Date
Matt Joras
fdbf11746a Allow vlan interfaces to rx through netmap(4).
Normally after receiving a packet, a vlan(4) interface sends the packet
back through its parent interface's rx routine so that it can be
processed as an untagged frame. It does this by using the parent's
ifp->if_input. This is incompatible with netmap(4), which replaces the
vlan(4) interface's if_input with a netmap(4) hook. Fix this by using
the vlan(4) interface's ifp instead of the parent's directly.

Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	rstone
Approved by:	rstone (mentor)
MFC after:	3 days
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D12191
2017-09-13 00:25:09 +00:00
Matt Joras
d148c2a2b1 Rework vlan(4) locking.
Previously the locking of vlan(4) interfaces was not very comprehensive.
Particularly there was very little protection against the destruction of
active vlan(4) interfaces or concurrent modification of a vlan(4)
interface. The former readily produced several different panics.

The changes can be summarized as using two global vlan locks (an
rmlock(9) and an sx(9)) to protect accesses to the if_vlantrunk field of
struct ifnet, in addition to other places where global exclusive access
is required. vlan(4) should now be much more resilient to the destruction
of active interfaces and concurrent calls into the configuration path.

PR:	220980
Reviewed by:	ae, markj, mav, rstone
Approved by:	rstone (mentor)
MFC after:	4 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D11370
2017-08-15 17:52:37 +00:00
Alexander Motin
9bcf3ae4c7 Add parent interface reference counting to if_vlan.
Using plain ifunit() looks like a request for troubles.

MFC after:	1 week
2017-05-23 00:13:27 +00:00
Alexander Motin
59150e9141 Propagate IFCAP_LRO from trunk to vlan interface.
False positive here cost nothing, while false negative may lead to some
confusions.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2017-04-29 08:28:59 +00:00
Alexander Motin
d89baa5aac Allow some control over enabled capabilities for if_vlan.
It improves interoperability with if_bridge, which may need to disable
some capabilities not supported by other members.  IMHO there is still
open question about LRO capability, which may need to be disabled on
physical interface.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2017-04-28 11:00:58 +00:00
Andrey V. Elsukov
070f87e878 Inherit IPv6 checksum offloading flags to vlan interfaces.
if_vlan(4) interfaces inherit IPv4 checksum offloading flags from the
parent when VLAN_HWCSUM and VLAN_HWTAGGING flags are present on the
parent interface. Do the same for IPv6 checksum offloading flags.

Reported by:	Harry Schmalzbauer
Reviewed by:	np, gnn
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10356
2017-04-11 19:23:25 +00:00
Hans Petter Selasky
f3e7afe2d7 Implement kernel support for hardware rate limited sockets.
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API support rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. Upon next ip_output() a new "snd_tag" will be tried allocated.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.

Reviewed by:		wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision:	https://reviews.freebsd.org/D3687
Sponsored by:		Mellanox Technologies
MFC after:		3 months
2017-01-18 13:31:17 +00:00
Bjoern A. Zeeb
89856f7e2d Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
Marcelo Araujo
2ccbbd06d2 Add support to priority code point (PCP) that is an 3-bit field
which refers to IEEE 802.1p class of service and maps to the frame
priority level.

Values in order of priority are: 1 (Background (lowest)),
0 (Best effort (default)), 2 (Excellent effort),
3 (Critical applications), 4 (Video, < 100ms latency),
5 (Video, < 10ms latency), 6 (Internetwork control) and
7 (Network control (highest)).

Example of usage:
root# ifconfig em0.1 create
root# ifconfig em0.1 vlanpcp 3

Note:
The review D801 includes the pf(4) part, but as discussed with kristof,
we won't commit the pf(4) bits for now.
The credits of the original code is from rwatson.

Differential Revision:	https://reviews.freebsd.org/D801
Reviewed by:	gnn, adrian, loos
Discussed with: rwatson, glebius, kristof
Tested by:	many including Matthew Grooms <mgrooms__shrew.net>
Obtained from:	pfSense
Relnotes:	Yes
2016-06-06 09:51:58 +00:00
Pedro F. Giffuni
a4641f4eaa sys/net*: minor spelling fixes.
No functional change.
2016-05-03 18:05:43 +00:00
Alexander V. Chernikov
8ad43f2d0a Move iflladdr_event eventhandler invocation to if_setlladdr.
Suggested by:	glebius
2015-11-14 13:34:03 +00:00
Alexander V. Chernikov
b13c5b5db2 Use lladdr_event to propagate gratiotus arp.
Differential Revision:	https://reviews.freebsd.org/D4019
2015-11-09 10:11:14 +00:00
Gleb Smirnoff
9f7d0f4830 Don't propagate SIOCSIFCAPS from a vlan(4) to its parent. This leads to
quite unexpected result of toggling capabilities on the neighbour vlan(4)
interfaces.

Reviewed by:		melifaro, np
Differential Revision:	https://reviews.freebsd.org/D2310
Sponsored by:		Nginx, Inc.
2015-04-23 13:19:00 +00:00
Gleb Smirnoff
c03044244f Fix couple of fallouts from r280280. The first one is a simple typo,
where counter was incremented on parent, instead of vlan(4) interface.

The second is more complicated. Historically, in our stack the incoming
packets are accounted in drivers, while incoming bytes for Ethernet
drivers are accounted in ether_input_internal(). Thus, it should be
removed from vlan(4) driver.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-03-25 16:01:46 +00:00
Gleb Smirnoff
b1828acf05 Make vlan_config() the signle point of validity checks.
Sponsored by:	Nginx, Inc.
2015-03-20 21:09:03 +00:00
Gleb Smirnoff
f941c31aae In vlan_clone_match_ethervid():
- Use ifunit() instead of going through the interface list ourselves.
- Remove unused parameter.
- Move the most important comment above the function.

Sponsored by:	Nginx, Inc.
2015-03-20 20:42:58 +00:00
Gleb Smirnoff
67975c79be Tiny comment fix. 2015-03-20 14:16:26 +00:00
Gleb Smirnoff
a58ea6b1cf Now, when r272244 introduced counter(9) based counters for all interfaces,
revert the r271538, which did that for vlan(4) only.

No objections:	melifaro
Sponsored by:	Nginx, Inc.
2015-03-20 14:05:17 +00:00
Xin LI
08e5736618 Handle SIOCSIFCAP by propogating the request to the parent interface. This
allows adding an vlan interface into a bridge.

Thanks for William Katsak <wkatsak cs rutgers edu> for testing and fixing
an issue in my previous patch draft.

MFC after:	2 weeks
2015-02-20 18:39:12 +00:00
Hiroki Sato
478e052062 Virtualize net.link.vlan.soft_pad. 2014-10-02 05:56:17 +00:00
Hans Petter Selasky
9fd573c39d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
Gleb Smirnoff
3751dddb3e Mechanically convert to if_inc_counter(). 2014-09-19 10:39:58 +00:00
Gleb Smirnoff
1b7fb1d93f While not too late rename 'ifnet_counter' to 'ift_counter'. One of the
imporant moments that we discussed with Marcel and Anuranjan was that
a converted driver should return false for 'grep ifnet if_driver.c' :)

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-09-18 14:47:13 +00:00
Gleb Smirnoff
277e067a58 While not too late rename if_get_counter_compat() to if_get_counter_default().
The compat counters will go away, but the function will remain in its place,
and in all places where it is going to be called.

Discussed with:	melifaro
2014-09-18 10:01:56 +00:00
Marcelo Araujo
5d99eb5926 Revert r271735. The comment is absolutely correct, we do not support 802.1p priority tagging. I got confused with the packet tagged and packet to be tagged.
Spotted by:	glebius
2014-09-18 05:43:19 +00:00
Marcelo Araujo
397bdf7cd5 Remove old comment, we already do 802.1q tagging.
Phabric:	D797
Reviewed by:	kevlo
Approved by:	kevlo
Sponsored by:	QNAP Systems Inc.
2014-09-18 03:09:34 +00:00
Alexander V. Chernikov
6667db3130 * Fix if_omcast handling
* Convert if_oerrors to pcpu.

Suggested by:	glebius
MFC after:	2 weeks
2014-09-16 21:48:48 +00:00
Hans Petter Selasky
72f3100047 Revert r271504. A new patch to solve this issue will be made.
Suggested by:	adrian @
2014-09-13 20:52:01 +00:00
Alexander V. Chernikov
772b000f02 Switch if_vlan(4) to rmlock.
MFC after:	2 weeks
2014-09-13 18:41:24 +00:00
Alexander V. Chernikov
299153b570 Switch if_vlan(4) to use counter(9) using new
if_get_counter api.
2014-09-13 18:13:08 +00:00
Hans Petter Selasky
eb93b77ae4 Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware is limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2014-09-13 08:26:09 +00:00
Gleb Smirnoff
bf7dcda366 Clean up unused CSUM_FRAGMENT.
Sponsored by:	Nginx, Inc.
2014-09-03 08:30:18 +00:00
Alexander Motin
2d222cb761 Improve locking of multicast addresses in VLAN and LAGG interfaces.
This fixes several scenarios of reproducible panics, cause by races
between multicast address changes and interface destruction.

MFC after:	2 weeks
2014-08-04 00:58:12 +00:00
Rick Macklem
bd9fd667c4 Vlan did not set the value of if_hw_tsomax, so when vlan
was stacked on top of a network interface that set if_hw_tsomax,
tcp_output() would see the default value instead of the value
set by the network interface. This patch modifies vlan so that
it sets if_hw_tsomax to the value of the parent interface.

Reviewed by:	glebius
MFC after:	2 weeks
2014-04-15 21:48:35 +00:00
Gleb Smirnoff
c3322cb91c Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
Gleb Smirnoff
76039bc84f The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-26 17:58:36 +00:00
Hiroki Sato
8b20f6cf8a Return ENETDOWN when the parent interface is down.
MFC after:	1 week
2013-06-16 04:40:02 +00:00
Oleg Bulyzhin
2c5b403e2d Recover missing arp_ifinit() call.
MFC after:	2 weeks
2013-04-18 20:13:33 +00:00
Andre Oppermann
da2299c5c7 Remove unused and unnecessary CSUM_IP_FRAGS checksumming capability.
Checksumming the IP header of fragments is no different from doing
normal IP headers.

Discussed with:	yongari
MFC after:	1 week
2012-11-27 19:31:49 +00:00
Gleb Smirnoff
42a58907c3 Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

  if_clone_simple()
  if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with:		brooks, bz, 1 year ago
2012-10-16 13:37:54 +00:00
Kevin Lo
9823d52705 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
Kevin Lo
a10cee30c9 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
John Baldwin
b90dde2fbc Fix a silly grammar bogon.
Submitted by:	Stephen McKay
2012-08-21 19:07:28 +00:00
John Baldwin
28cc4d37e6 Refine the changes made in r208212 to avoid bogus failures from
if_delmulti() when clearing the configuration for a subinterface when
the parent interface is being detached.  The current code was still
triggering an assertion in if_delmulti() due to the parent interface being
partially detached.  Fix this by not calling if_delmulti() at all if the
parent interface is being detached.  Warn if if_delmulti() fails when the
parent is not being detached (but similar to 208212, still proceed with
tearing down the vlan state).

Tested by:	ae@
MFC after:	1 month
2012-08-20 16:00:33 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
Robert Watson
7983103ae6 Clarify throughout the vlan(4) code the difference between a "tag" (the
802.1q-defined 16-bit VID, CFI, and PCP field in host by order) and a
VLAN ID (VID).  Tags go in packets.  VIDs identify VLANs.

No functional change is intended, so this should be safe to MFC.  Further
cleanup with functional changes will be committed separately (for example,
renaming vlan_tag/vlan_tag_p, which modify the KPI and KBI).

Reviewed by:	bz
Sponsored by:	ADARA Networks, Inc.
MFC after:	3 days
2012-01-12 18:39:37 +00:00
Robert Watson
5a39f779b2 Refine last comment.
Submitted by:	joeld
Sponsored by:	ADARA Networks, Inc.
MFC after:	3 days
2012-01-05 11:42:34 +00:00
Robert Watson
15f6780ef4 Add comment to the VLAN code about its integration with VIMAGE: we see what
the code is doing, we recognise the legitimacy of its goal, but we're not
quite sure it's going about it the right way.  More pondering is clearly
required.

Sponsored by:	ADARA Networks, Inc.
Discussed with:	bz
MFC after:	3 days
2012-01-05 11:24:22 +00:00
Pyun YongHyeon
1ad7a2570d Update if_obytes and if_omcast after successful transmit.
While I'm here update if_oerrors if parent interface of vlan is not
up and running.  Previously it updated collision counter and it was
confusing to interprete it.

PR:		kern/163478
Reviewed by:	glebius, jhb
Tested by:	Joe Holden < lists <> rewt dot org dot uk >
2011-12-29 18:40:58 +00:00
John Baldwin
d9b1d61535 Change the if_vlan driver to use if_transmit for forwarding packets to the
parent interface.  This avoids the overhead of queueing a packet to an IFQ
only to immediately dequeue it again.

Suggested by:	np
Reviewed by:	brooks
MFC after:	1 month
2011-11-28 19:35:08 +00:00