- Make lagg protos a enum.
- When reconfiguring protocol on a lagg, first set it to LAGG_PROTO_NONE,
then drop lock, run the attach routines, and then set it to specific
proto. This removes tons of WITNESS warnings.
- Make lagg protocol attach handlers not failing and allocate memory
with M_WAITOK.
- Virtualize lagg(4) cloner. This change fixes a panic when tearing down
if_lagg(4) interfaces which were cloned in a vnet jail.
Sysctl nodes which are dynamically generated for each cloned interface
(net.link.lagg.N.*) have been removed, and use_flowid and flowid_shift
ifconfig(8) parameters have been added instead. Flags and per-interface
statistics counters are displayed in "ifconfig -v".
- Separate option handling from SIOC[SG]LAGG to SIOC[SG]LAGGOPTS for
backward compatibility with old ifconfig(8).
- Move L2 addr configuration for the primary port to a taskqueue. This fixes
LOR of softc rmlock in iflladdr_event handlers.
- Call if_delmulti_ifma() after LACP_UNLOCK(). This fixes another LOR.
- Fix a panic in lacp_transit_expire().
- Fix a panic in lagg_input() upon shutting down a port.
- Use printb() for boolean flags in ro_opts and actor_state for LACP.
- Fix lladdr configuration which could prevent LACP mode from working.
- Fix LORs when a laggport interface has an IPv6 LLA.
- Virtualize if_epair(4). An if_xname check for both "a" and "b" interfaces
is added to return EEXIST when only "b" interface exists---this can happen
when epair<N>b is moved to a vnet jail and then "ifconfig epair<N> create"
is invoked there.
- Fix a panic which was reproducible by an infinite loop of
"ifconfig epair0 create && ifconfig epair0a destroy".
This was caused by an uninitialized function pointer in
softc->media.
Remove the mtx_sleep() from the kqueue f_event filter.
The filter is called from the network hot path and must not sleep.
The filter runs with the descriptor lock held and does not manipulate the
buffers, so it is not necessary sleep when the hold buffer is in use.
Just ignore the hold buffer contents when it is being copied to user space
(when hold buffer in use is set).
This fix the "Sleeping thread owns a non-sleepable lock" panic when the
userland thread is too busy reading the packets from bpf(4).
PR: 200323
Sponsored by: Rubicon Communications (Netgate)
Remove the sleep from the buffer allocation routine.
The buffer must be allocated (or even changed) before the interface is set
and thus, there is no need to verify if the buffer is in use.
MFC r286142:
Remove two unnecessary sleeps from the hot path in bpf(4).
The first one never triggers because bpf_canfreebuf() can only be true for
zero-copy buffers and zero-copy buffers are not read with read(2).
The second also never triggers, because we check the free buffer before
calling ROTATE_BUFFERS(). If the hold buffer is in use the free buffer
will be NULL and there is nothing else to do besides drop the packet. If
the free buffer isn't NULL the hold buffer _is_ free and it is safe to
rotate the buffers.
Update the comment in ROTATE_BUFFERS macro to match the logic described
here.
While here fix a few typos in comments.
MFC r286243:
Add a KASSERT() to make sure we wont rotate the buffers twice (rotate the
buffers while the hold buffer is in use).
Sponsored by: Rubicon Communications (Netgate)
Do not allocate the buffers at opening of the descriptor, because once
the buffer is allocated we are committed to a particular buffer method
(BPF_BUFMODE_BUFFER in this case).
If we are using zero-copy buffers, the userland program must register its
buffers before set the interface.
If we are using kernel memory buffers, we can allocate the buffer at the
time that the interface is being set.
This fix allows the usage of BIOCSETBUFMODE after r235746.
Update the comments to reflect the recent changes.
Sponsored by: Rubicon Communications (Netgate)
r271524,r273541,r282967,r283009,r283364.
Add support for reading i2c SFP/SFP+ data from NIC driver and
presenting most interesting fields via ifconfig -v.
This version supports Intel ixgbe driver only.
Tested on: Cisco,Intel,Mellanox,ModuleTech,Molex transceivers
* Add new net/sff8436.h containing constants used to access
QSFP+ data via i2c inteface. These constants has been taken
from SFF-8436 "QSFP+ 10 Gbs 4X PLUGGABLE TRANSCEIVER" standard
rev 4.8.
* Add support for printing QSFP+ information from 40G NICs
such as Chelsio T5.
Example:
cxl1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,.....>
ether 00:07:43:28:ad:08
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet 40Gbase-LR4 <full-duplex>
status: active
plugged: QSFP+ 40GBASE-LR4 (MPO Parallel Optic)
vendor: OEM PN: OP-QSFP-40G-LR4 SN: 20140318001 DATE: 2014-03-18
module temperature: 64.06 C voltage: 3.26 Volts
lane 1: RX: 0.47 mW (-3.21 dBm) TX: 2.78 mW (4.46 dBm)
lane 2: RX: 0.20 mW (-6.94 dBm) TX: 2.80 mW (4.47 dBm)
lane 3: RX: 0.18 mW (-7.38 dBm) TX: 2.79 mW (4.47 dBm)
lane 4: RX: 0.90 mW (-0.45 dBm) TX: 2.80 mW (4.48 dBm)
Tested on: Chelsio T5
Tested on: Mellanox/Huawei passive/active cables/transceivers.
Sponsored by: Yandex LLC
Fix group membership of cloned interfaces when one is moved by
if_vmove().
In if_vmove(), if_detach_internal() and if_attach_internal() were
called in series to detach and reattach the interface. When
detaching, if_delgroup() was called and the interface leaves all of
the group membership. And then upon attachment, if_addgroup(ifp,
IFG_ALL) was called and it joined only "all" group again.
This had a problem. Normally, a cloned interface automatically joins
a group whose name is ifc_name of the cloner in addition to "all"
upon creation. However, if_vmove() removed the membership and did
not restore upon attachment.
Approved by: re (gjb)
Fix if_loop so bpfwrite() can use it regardless of the state of
bd_hdrcmplt. As if_loop does not use link-level headers, its behavior
when used by bpfwrite() should be the same regardless of the state of
bd_hdrcmplt. Without this change, libpcap (and other BPF users that
work like it) fail when writing to loopback interfaces.
Approved by: re
vtnet interfaces are always in promiscuous mode (at least if the
VIRTIO_NET_F_CTRL_RX feature is not negotiated with the host). if_promisc() on
a vtnet interface returned ENOTSUP although it has IFF_PROMISC set. This
confused the bridge code. Instead we now accept all enable/disable promiscuous
commands (and always keep IFF_PROMISC set).
There are also two issues with the if_bridge error handling.
If if_promisc() fails it uses bridge_delete_member() to clean up. This tries to
disable promiscuous mode on the interface. That runs into an assert, because
promiscuous mode was never set in the first place. (That's the panic reported in
PR 200210.)
We can only unset promiscuous mode if the interface actually is promiscuous.
This goes against the reference counting done by if_promisc(), but only the
first/last if_promic() calls can actually fail, so this is safe.
A second issue is a double free of bif. It's already freed by
bridge_delete_member().
PR: 200210
- Improve INET/INET6 scope.
- style(9) declarations.
- Make couple of local functions static.
- Even more fixes to !INET and !INET6 kernels.
In collaboration with pluknet
- Toss declarations to fix regular build and NO_INET6 build.
Differential Revision: https://reviews.freebsd.org/D2823
Reviewed by: gnn
In the forwarding case refragment the reassembled packets with the same
size as they arrived in. This allows the sender to determine the optimal
fragment size by Path MTU Discovery.
Roughly based on the OpenBSD work by Alexander Bluhm.
Differential Revision: https://reviews.freebsd.org/D2816
Reviewed by: gnn
Update the pf fragment handling code to closer match recent OpenBSD.
That partially fixes IPv6 fragment handling.
Differential Revision: https://reviews.freebsd.org/D2814
Reviewed by: gnn
- Add vxlan interface
- Use the size of the Ethernet address, not the entire header, when
copying into forwarding entry.
- Prefix all the vxlan ifconfig commands so they are unique
Add new socket ioctls SIOC[SG]TUNFIB to set FIB number of encapsulated
packets on tunnel interfaces. Add support of these ioctls to gre(4),
gif(4) and me(4) interfaces. For incoming packets M_SETFIB() should use
if_fib value from ifnet structure, use proper value in gre(4) and me(4).
Differential Revision: https://reviews.freebsd.org/D2462
Remove in_gif.h and in6_gif.h files. They only contain function
declarations used by gif(4). Instead declare these functions in C files.
Also make some variables static.
MFC r276215:
Extern declarations in C files loses compile-time checking that
the functions' calls match their definitions. Move them to header files.
Overhaul if_gre(4).
Split it into two modules: if_gre(4) for GRE encapsulation and
if_me(4) for minimal encapsulation within IP.
gre(4) changes:
* convert to if_transmit;
* rework locking: protect access to softc with rmlock,
protect from concurrent ioctls with sx lock;
* correct interface accounting for outgoing datagramms (count only payload size);
* implement generic support for using IPv6 as delivery header;
* make implementation conform to the RFC 2784 and partially to RFC 2890;
* add support for GRE checksums - calculate for outgoing datagramms and check
for inconming datagramms;
* add support for sending sequence number in GRE header;
* remove support of cached routes. This fixes problem, when gre(4) doesn't
work at system startup. But this also removes support for having tunnels with
the same addresses for inner and outer header.
* deprecate support for various GREXXX ioctls, that doesn't used in FreeBSD.
Use our standard ioctls for tunnels.
me(4):
* implementation conform to RFC 2004;
* use if_transmit;
* use the same locking model as gre(4);
PR: 164475
MFC r274289 (by bz):
gcc requires variables to be initialised in two places. One of them
is correctly used only under the same conditional though.
For module builds properly check if the kernel supports INET or INET6,
as otherwise various mips kernels without IPv6 support would fail to build.
MFC r274964:
Add ip_gre.h to ObsoleteFiles.inc.
- Virtualize interface cloner for gre(4). This fixes a panic when destroying
a vnet jail which has a gre(4) interface.
- Make net.link.gre.max_nesting vnet-local.
Add an ability accept encapsulated packets from different sources by one
gif(4) interface. Add new option "ignore_source" for gif(4) interface.
When it is enabled, gif's encapcheck function requires match only for
packet's destination address.
Differential Revision: https://reviews.freebsd.org/D2004
Sponsored by: Yandex LLC
Currently there is no easy way to specify net.isr.maxthreads = all cpus. We need
to specify exact number of cpus in loader.conf which get annoying when you have
mix of machines which don't have equal number of total cpus. I propose "-1" as
that value. When loader.conf has net.isr.maxthreads = -1, netisr will use all
available cpus.
Sponsored by: Limelight Networks
Activate write-only optimization if bpf device opened with O_WRONLY.
dhclient opens bpf as write-only to send packets. It never reads received
packets from that descriptor, but processing them in kernel takes time.
Especially much time takes packet timestamping on systems with expensive
timecounter, such as bhyve guest, where network speed dropped in half.
Sponsored by: iXsystems, Inc.
r275358:
Start process of removing the use of the deprecated "M_FLOWID" flag
from the FreeBSD network code. The flag is still kept around in the
"sys/mbuf.h" header file, but does no longer have any users. Instead
the "m_pkthdr.rsstype" field in the mbuf structure is now used to
decide the meaning of the "m_pkthdr.flowid" field. To modify the
"m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX"
macros as defined in the "sys/mbuf.h" header file.
This patch introduces new behaviour in the transmit direction.
Previously network drivers checked if "M_FLOWID" was set in "m_flags"
before using the "m_pkthdr.flowid" field. This check has now now been
replaced by checking if "M_HASHTYPE_GET(m)" is different from
"M_HASHTYPE_NONE". In the future more hashtypes will be added, for
example hashtypes for hardware dedicated flows.
"M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is
valid and has no particular type. This change removes the need for an
"if" statement in TCP transmit code checking for the presence of a
valid flowid value. The "if" statement mentioned above is now a direct
variable assignment which is then later checked by the respective
network drivers like before.
r275483:
Remove M_FLOWID from SCTP code.
r276982:
Remove no longer used "M_FLOWID" flag from mbuf.h and update the netisr
manpage.
Note: The FreeBSD version has been bumped.
Reviewed by: hps, tuexen
Sponsored by: Limelight Networks
Add if_input_default() method, that will be used for if_input
initialization, when no input method specified before if_attach().
This prevents panics when if_input() method called directly e.g.
from bpf(4) code.
PR: 192426
Move the recursion detection code into separate function
gif_check_nesting(). Also make MTAG_GIF definition private to if_gif.c.
MFC r276907:
Restore Ethernet-within-IP Encapsulation support that was broken after
r273087. Move all checks from gif_output() into gif_transmit(). Previously
they were checked always, because if_start always called gif_output.
Now gif_transmit() can be called directly from if_bridge() code and we need
do checks here.
PR: 196646
Overhaul if_gif(4):
o convert to if_transmit;
o use rmlock to protect access to gif_softc;
o use sx lock to protect from concurrent ioctls;
o remove a lot of unneeded and duplicated code;
o remove cached route support (it won't work with concurrent io);
o style fixes.
MFC r273090:
Move memset under ifdef INET6.
MFC r273091:
Add more ifdefs. SIOC*_IN6 are defined only with INET6.
MFC r273121:
Add inet/inet6 to the dependency list. Without them if_gif is useless.
MFC r273209 by bz:
After r273087,r273090,r273091,r273121 changes to gif(4) try to fix
NOIP builds for real.
MFC r273587:
Remove redundant check and m_pullup() call.
Fix some minor TSO issues:
- Improve description of TSO limits.
- Remove a not needed KASSERT()
- Remove some not needed variable casts.
Sponsored by: Mellanox Technologies
MFC r273783:
Add fueword(9) and casueword(9) functions.
MFC note: ia64 is handled like arm, with NO_FUEWORD define.
MFC r273784:
Replace some calls to fuword() by fueword() with proper error checking.
MFC r273785:
Convert kern_umtx.c to use fueword() and casueword().
MFC note: the sys__umtx_lock and sys__umtx_unlock syscalls are not
converted, they are removed from HEAD, and not used. The do_sem2*()
family is not yet merged to stable/10, corresponding chunk will be
merged after do_sem2* are committed.
MFC r273788 (by jkim):
Actually install casuword(9) to fix build.
MFC r273911:
Add type qualifier volatile to the base (userspace) address argument
of fuword(9) and suword(9).
Improve transmit sending offload, TSO, algorithm in general. This
change allows all HCAs from Mellanox Technologies to function properly
when TSO is enabled. See r271946 and r272595 for more details about
this commit.
Sponsored by: Mellanox Technologies
When tunneling interface is going to insert mbuf into netisr queue after stripping
outer header, consider it as new packet and clear the protocols flags.
This fixes problems when IPSEC traffic goes through various tunnels and router
doesn't send ICMP/ICMPv6 errors.
PR: 174602
Sponsored by: Yandex LLC
The SYSCTL data pointers can come from userspace and must not be
directly accessed. Although this will work on some platforms, it can
throw an exception if the pointer is invalid and then panic the kernel.
Add a missing SYSCTL_IN() of "SCTP_BASE_STATS" structure.
Sponsored by: Mellanox Technologies
Free radix mask entries on main radix destroy.
This is temporary commit to be merged to 10.
Other approach (like hash table) should be used
to store different masks.
PR: 194078
Fix a panic caused by doing "ifconfig -am" while a lagg is being destroyed.
The thread that is destroying the lagg has already set sc->sc_psc=NULL when
the "ifconfig -am" thread gets to lacp_req(). It tries to dereference
sc->sc_psc and panics. The solution is for lacp_req() to check the value of
sc->sc_psc. If NULL, harmlessly return an lacp_opreq structure full of
zeros. Full details in GNATS.
PR: 189003
- Count global pf(4) statistics in counter(9).
- Do not count global number of states and of src_nodes,
use uma_zone_get_cur() to obtain values.
- Struct pf_status becomes merely an ioctl API structure,
and moves to netpfil/pf/pf.h with its constants.
- V_pf_status is now of type struct pf_kstatus.
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH
Improve locking of multicast addresses in VLAN and LAGG interfaces.
This fixes several scenarios of reproducible panics, cause by races
between multicast address changes and interface destruction.