Commit Graph

2080 Commits

Author SHA1 Message Date
Jonathan T. Looney
440598dd9e Fix implicit automatic local port selection for IPv6 during connect calls.
When a user creates a TCP socket and tries to connect to the socket without
explicitly binding the socket to a local address, the connect call
implicitly chooses an appropriate local port. When evaluating candidate
local ports, the algorithm checks for conflicts with existing ports by
doing a lookup in the connection hash table.

In this circumstance, both the IPv4 and IPv6 code look for exact matches
in the hash table. However, the IPv4 code goes a step further and checks
whether the proposed 4-tuple will match wildcard (e.g. TCP "listen")
entries. The IPv6 code has no such check.

The missing wildcard check can cause problems when connecting to a local
server. It is possible that the algorithm will choose the same value for
the local port as the foreign port uses. This results in a connection with
identical source and destination addresses and ports. Changing the IPv6
code to align with the IPv4 code's behavior fixes this problem.

Reviewed by:	tuexen
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D27164
2020-11-14 14:50:34 +00:00
Alexander V. Chernikov
d9999ae9ca Fix use-after-free in icmp6_notify_error().
Reported by:	Maxime Villard <max at m00nbsd.net>
Reviewed by:	markj
MFC after:	3 days
2020-10-28 20:22:20 +00:00
Mark Johnston
4caea9b169 icmp6: Count packets dropped due to an invalid hop limit
Pad the icmp6stat structure so that we can add more counters in the
future without breaking compatibility again, last done in r358620.
Annotate the rarely executed error paths with __predict_false while
here.

Reviewed by:	bz, melifaro
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26578
2020-10-19 17:07:19 +00:00
Alexander V. Chernikov
0c325f53f1 Implement flowid calculation for outbound connections to balance
connections over multiple paths.

Multipath routing relies on mbuf flowid data for both transit
 and outbound traffic. Current code fills mbuf flowid from inp_flowid
 for connection-oriented sockets. However, inp_flowid is currently
 not calculated for outbound connections.

This change creates simple hashing functions and starts calculating hashes
 for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the
 system.

Reviewed by:	glebius (previous version),ae
Differential Revision:	https://reviews.freebsd.org/D26523
2020-10-18 17:15:47 +00:00
Richard Scheffenegger
868aabb470 Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow.
This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt)
option IP(V6)_VLAN_PCP, which can be set to -1 (interface
default), or explicitly to any priority between 0 and 7.

Note that for untagged traffic, explicitly adding a
priority will insert a special 801.1Q vlan header with
vlan ID = 0 to carry the priority setting

Reviewed by:	gallatin, rrs
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D26409
2020-10-09 12:06:43 +00:00
Alexander V. Chernikov
fedeb08b6a Introduce scalable route multipath.
This change is based on the nexthop objects landed in D24232.

The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
 relative weights and a dataplane-optimized structure to enable
 efficient nexthop selection.

Simular to the nexthops, nexthop groups are immutable. Dataplane part
 gets compiled during group creation and is basically an array of
 nexthop pointers, compiled w.r.t their weights.

With this change, `rt_nhop` field of `struct rtentry` contains either
 nexthop or nexthop group. They are distinguished by the presense of
 NHF_MULTIPATH flag.
All dataplane lookup functions returns pointer to the nexthop object,
leaving nexhop groups details inside routing subsystem.

User-visible changes:

The change is intended to be backward-compatible: all non-mpath operations
 should work as before with ROUTE_MPATH and net.route.multipath=1.

All routes now comes with weight, default weight is 1, maximum is 2^24-1.

Current maximum multipath group width is statically set to 64.
 This will become sysctl-tunable in the followup changes.

Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1

route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20

netstat -6On

Nexthop groups data

Internet6:
GrpIdx  NhIdx     Weight   Slots                                 Gateway     Netif  Refcnt
1         ------- ------- ------- --------------------------------------- ---------       1
              13      10       1                             2001:db8::2     vlan2
              14      20       2                             2001:db8::3     vlan2

Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default

Tested by:	olivier
Reviewed by:	glebius
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D26449
2020-10-03 10:47:17 +00:00
Alexander V. Chernikov
2259a03020 Rework part of routing code to reduce difference to D26449.
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(),
 as these two can be applied at different entities and at different times.
* Start filling route weight in route change notifications
* Pass flowid to UDP/raw IP route lookups
* Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact
 that rtentry can contain multiple nexthops.

Differential Revision:	https://reviews.freebsd.org/D26497
2020-09-21 20:02:26 +00:00
Alexander V. Chernikov
1440f62266 Remove unused nhop_ref_any() function.
Remove "opt_mpath.h" header where not needed.

No functional changes.
2020-09-20 21:32:52 +00:00
Navdeep Parhar
b092fd6c97 if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS.
This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx
and rx), TSO, and RSS if the NIC is capable of performing these operations on
inner VXLAN traffic.

A VXLAN interface inherits the capabilities of its vxlandev interface if one is
specified or of the interface that hosts the vxlanlocal address. If other
interfaces will carry traffic for that VXLAN then they must have the same
hardware capabilities.

On transmit, if_vxlan verifies that the outbound interface has the required
capabilities and then translates the CSUM_ flags to their inner equivalents.
This tells the hardware ifnet that it needs to operate on the inner frame and
not the outer VXLAN headers.

An event is generated when a VXLAN ifnet starts. This allows hardware drivers to
configure their devices to expect VXLAN traffic on the specified incoming port.

On receive, the hardware does RSS and checksum verification on the inner frame.
if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is
not very clear why it didn't do this already.

Future work:
Rx: it should be possible to avoid the first trip up the protocol stack to get
the frame to if_vxlan just so it can decapsulate and requeue for a second trip
up the stack. The hardware NIC driver could directly call an if_vxlan receive
routine for VXLAN traffic instead.

Rx: LRO. depends on what happens with the previous item. There will have to to
be a mechanism to indicate that it's time for if_vxlan to flush its LRO state.

Reviewed by:	kib@
Relnotes:	Yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25873
2020-09-18 02:37:57 +00:00
Navdeep Parhar
72cc43df17 Add a knob to allow zero UDP checksums for UDP/IPv6 traffic on the given UDP port.
This will be used by some upcoming changes to if_vxlan(4).  RFC 7348 (VXLAN)
says that the UDP checksum "SHOULD be transmitted as zero.  When a packet is
received with a UDP checksum of zero, it MUST be accepted for decapsulation."
But the original IPv6 RFCs did not allow zero UDP checksum.  RFC 6935 attempts
to resolve this.

Reviewed by:	kib@
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25873
2020-09-18 02:21:15 +00:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
Kyle Evans
1e9b8db9b2 ipv6: quit dropping packets looping back on p2p interfaces
To paraphrase the below-referenced PR:

This logic originated in the KAME project, and was even controversial when
it was enabled there by default in 2001. No such equivalent logic exists in
the IPv4 stack, and it turns out that this leads to us dropping valid
traffic when the "point to point" interface is actually a 1:many tun
interface, e.g. with the wireguard userland stack.

Even in the case of true point-to-point links, this logic only avoids
transient looping of packets sent by misconfigured applications or
attackers, which can be subverted by proper route configuration rather than
hardcoded logic in the kernel to drop packets.

In the review, melifaro goes on to note that the kernel can't fix it, so it
perhaps shouldn't try to be 'smart' about it. Additionally, that TTL will
still kick in even with incorrect route configuration.

PR:		247718
Reviewed by:	melifaro, rgrimes
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25567
2020-08-31 01:45:48 +00:00
Alexander V. Chernikov
a624ca3dff Move net/route/shared.h definitions to net/route/route_var.h.
No functional changes.

net/route/shared.h was created in the inital phases of nexthop conversion.
It was intended to serve the same purpose as route_var.h - share definitions
 of functions and structures between the routing subsystem components. At
 that time route_var.h was included by many files external to the routing
 subsystem, which largerly defeats its purpose.

As currently this is not the case anymore and amount of route_var.h includes
 is roughly the same as shared.h, retire the latter in favour of the former.
2020-08-28 22:50:20 +00:00
Alexander V. Chernikov
bec053ffe0 Make net.inet6.ip6.deembed_scopeid behaviour default & remove sysctl.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D25637
2020-08-15 11:37:44 +00:00
Alexander V. Chernikov
2f23f45b20 Simplify dom_<rtattach|rtdetach>.
Remove unused arguments from dom_rtattach/dom_rtdetach functions and make
  them return/accept 'struct rib_head' instead of 'void **'.
Declare inet/inet6 implementations in the relevant _var.h headers similar
  to domifattach / domifdetach.
Add rib_subscribe_internal() function to accept subscriptions to the rnh
  directly.

Differential Revision:	https://reviews.freebsd.org/D26053
2020-08-14 21:29:56 +00:00
Hans Petter Selasky
b453d3d239 Use a static initializer for the multicast free tasks.
This makes the SYSINIT() function updated in r364072 superfluous.

Suggested by:	glebius@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-08-11 08:31:40 +00:00
Alexander V. Chernikov
9a00f6d067 Fix rib_subscribe() waitok flag by performing allocation outside epoch.
Make in6_inithead() use rib_subscribe with waitok to achieve reliable
 subscription allocation.

Reviewed by:	glebius
2020-08-11 07:05:30 +00:00
Bjoern A. Zeeb
f9461246a2 MC: add a note with reference to the discussion and history as-to why we
are where we are now.  The main thing is to try to get rid of the delayed
freeing to avoid blocking on the taskq when shutting down vnets.

X-Timeout:	if you still see this before 14-RELEASE remove it.
2020-08-10 10:58:43 +00:00
Hans Petter Selasky
3689652c65 Make sure the multicast release tasks are properly drained when
destroying a VNET or a network interface.

Else the inm release tasks, both IPv4 and IPv6 may cause a panic
accessing a freed VNET or network interface.

Reviewed by:		jmg@
Discussed with:		bz@
Differential Revision:	https://reviews.freebsd.org/D24914
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-08-10 10:46:08 +00:00
Hans Petter Selasky
a95ef9d38d Use proper prototype for SYSINIT() functions.
Mark the unused argument using the __unused macro.

Discussed with:		kib@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-08-10 10:40:19 +00:00
Bjoern A. Zeeb
a9839c4aee IPV6_PKTINFO support for v4-mapped IPv6 sockets
When using v4-mapped IPv6 sockets with IPV6_PKTINFO we do not
respect the given v4-mapped src address on the IPv4 socket.
Implement the needed functionality. This allows single-socket
UDP applications (such as OpenVPN) to work better on FreeBSD.

Requested by:	Gert Doering (gert greenie.net), pfsense
Tested by:	Gert Doering (gert greenie.net)
Reviewed by:	melifaro
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D24135
2020-08-07 15:13:53 +00:00
Andrey V. Elsukov
ce8875b6c4 Fix typo.
Submitted by:	Evgeniy Khramtsov <evgeniy at khramtsov org>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25932
2020-08-05 10:27:11 +00:00
Mark Johnston
cfae6a92ac Remove an incorrect assertion from in6p_lookup_mcast_ifp().
The socket may be bound to an IPv4-mapped IPv6 address.  However, the
inp address is not relevant to the JOIN_GROUP or LEAVE_GROUP operations.

While here remove an unnecessary check for inp == NULL.

Reported by:	syzbot+d01ab3d5e6c1516a393c@syzkaller.appspotmail.com
Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25888
2020-08-04 15:00:02 +00:00
Mark Johnston
19afc65a7e ip6_output(): Check the return value of in6_getlinkifnet().
If the destination address has an embedded scope ID, make sure that it
corresponds to a valid ifnet before proceeding.  Otherwise a sendto()
with a bogus link-local address can trigger a NULL pointer dereference.

Reported by:	syzkaller
Reviewed by:	ae
Fixes:		r358572
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25887
2020-07-30 17:43:23 +00:00
Alexander V. Chernikov
e1c05fd290 Transition from rtrequest1_fib() to rib_action().
Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
 in6_rtrequest, rtrequest_fib> and their uses and switch to
 to rib_action(). This is part of the new routing KPI.

Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25546
2020-07-21 19:56:13 +00:00
Alexander V. Chernikov
725871230d Temporarly revert r363319 to unbreak the build.
Reported by:	CI
Pointy hat to: melifaro
2020-07-19 10:53:15 +00:00
Alexander V. Chernikov
8cee15d9e4 Transition from rtrequest1_fib() to rib_action().
Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
 in6_rtrequest, rtrequest_fib> and their uses and switch to
to rib_action(). This is part of the new routing KPI.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D25546
2020-07-19 09:29:27 +00:00
Alexander V. Chernikov
4c7ba83f9d Switch inet6 default route subscription to the new rib subscription api.
Old subscription model allowed only single customer.

Switch inet6 to the new subscription api and eliminate the old model.

Differential Revision:	https://reviews.freebsd.org/D25615
2020-07-12 11:24:23 +00:00
Alexander V. Chernikov
eddfb2e86f Fix IPv6 regression introduced by r362900.
PR:		kern/247729
2020-07-03 08:06:26 +00:00
Alexander V. Chernikov
6ad7446c6f Complete conversions from fib<4|6>_lookup_nh_<basic|ext> to fib<4|6>_lookup().
fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
 over pointer validness and requiring on-stack data copying.

With no callers remaining, remove fib[46]_lookup_nh_ functions.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D25445
2020-07-02 21:04:08 +00:00
Mark Johnston
95033af923 Add the SCTP_SUPPORT kernel option.
This is in preparation for enabling a loadable SCTP stack.  Analogous to
IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured
in order to support a loadable SCTP implementation.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-06-18 19:32:34 +00:00
Michael Tuexen
70486b27ae Retire SCTP_SO_LOCK_TESTING.
This was intended to test the locking used in the MacOS X kernel on a
FreeBSD system, to make use of WITNESS and other debugging infrastructure.
This hasn't been used for ages, to take it out to reduce the #ifdef
complexity.

MFC after:		1 week
2020-06-07 14:39:20 +00:00
Ryan Moeller
78a3645fd2 Fix typo in previous commit
Applied the wrong patch

Reported by:	Michael Butler <imb@protected-networks.net>
Approved by:	mav (mentor)
Sponsored by:	iXsystems.com
2020-06-03 17:26:00 +00:00
Ryan Moeller
f057d56c6c scope6: Check for NULL afdata before dereferencing
Narrows the race window with if_detach.

Approved by:	mav (mentor)
MFC after:	3 days
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D25017
2020-06-03 16:57:30 +00:00
Alexander V. Chernikov
da187ddb3d * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this appoach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D25067
2020-06-01 20:49:42 +00:00
Alexander V. Chernikov
e7403d0230 Revert r361704, it accidentally committed merged D25067 and D25070. 2020-06-01 20:40:40 +00:00
Alexander V. Chernikov
79674562b8 * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this appoach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:	ae
Differential Revision: https://reviews.freebsd.org/D25067
2020-06-01 20:32:02 +00:00
Alexander V. Chernikov
a37a5246ca Use fib[46]_lookup() in mtu calculations.
fib[46]_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.

Conversion is straight-forwarded, as the only 2 differences are
requirement of running in network epoch and the need to handle
RTF_GATEWAY case in the caller code.

Differential Revision:	https://reviews.freebsd.org/D24974
2020-05-28 08:00:08 +00:00
Alexander V. Chernikov
1483c1c508 Replace ip6_ouput fib6_lookup_nh_<ext|basic> calls with fib6_lookup().
fib6_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.

Conversion is straight-forwarded, as the only 2 differences are
requirement of running in network epoch and the need to handle
RTF_GATEWAY case in the caller code.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24973
2020-05-28 07:29:44 +00:00
Alexander V. Chernikov
7bfc98af12 Switch gif(4) path verification to fib[46]_check_urfp().
fibX_lookup_nh_ represents pre-epoch generation of fib api,
providing less guarantees over pointer validness and requiring
on-stack data copying.
Use specialized fib[46]_check_urpf() from newer KPI instead,
to allow removal of older KPI.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24978
2020-05-28 07:26:18 +00:00
Alexander V. Chernikov
4d2c2509f2 Move <add|del|change>_route() functions to route_ctl.c in preparation of
multipath control plane changed described in D24141.

Currently route.c contains core routing init/teardown functions, route table
 manipulation functions and various helper functions, resulting in >2KLOC
 file in total. This change moves most of the route table manipulation parts
 to a dedicated file, simplifying planned multipath changes and making
 route.c more manageable.

Differential Revision:	https://reviews.freebsd.org/D24870
2020-05-23 19:06:57 +00:00
Alexander V. Chernikov
2bbab0af6d Use epoch(9) for rtentries to simplify control plane operations.
Currently the only reason of refcounting rtentries is the need to report
 the rtable operation details immediately after the execution.
Delaying rtentry reclamation allows to stop refcounting and simplify the code.
Additionally, this change allows to reimplement rib_lookup_info(), which
 is used by some of the customers to get the matching prefix along
 with nexthops, in more efficient way.

The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to
 nhop_priv to be able to reliably set curvnet even during vnet teardown.
Rest of the reference counting code will be removed in the D24867 .

Differential Revision:	https://reviews.freebsd.org/D24866
2020-05-23 10:21:02 +00:00
Mike Karels
2510235150 Allow TCP to reuse local port with different destinations
Previously, tcp_connect() would bind a local port before connecting,
forcing the local port to be unique across all outgoing TCP connections
for the address family. Instead, choose a local port after selecting
the destination and the local address, requiring only that the tuple
is unique and does not match a wildcard binding.

Reviewed by:	tuexen (rscheff, rrs previous version)
MFC after:	1 month
Sponsored by:	Forcepoint LLC
Differential Revision:	https://reviews.freebsd.org/D24781
2020-05-18 22:53:12 +00:00
Andrew Gallatin
bc74b81991 IPv6: Fix a panic in the nd6 code with unmapped mbufs.
If the neighbor entry for an IPv6 TCP session using unmapped
mbufs times out, IPv6 will send an icmp6 dest. unreachable
message. In doing this, it will try to do a software checksum
on the reflected packet. If this is a TCP session using unmapped
mbufs, then there will be a kernel panic.

To fix this, just free packets with unmapped mbufs, rather
than sending the icmp.

Reviewed by:	np, rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24821
2020-05-12 17:18:44 +00:00
Andrew Gallatin
d7452d89ad IPv6: sync IP_NO_SND_TAG_RL support from IPv4
The IP_NO_SND_TAG_RL flag to ip{,6}_output() means that the packets
being sent should bypass hardware rate limiting. This is typically used
by modern TCP stacks for rexmits.

This support was added to IPv4 in r352657, but never added to IPv6, even
though rack and bbr call ip6_output() with this flag.

Reviewed by:	rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24822
2020-05-12 14:01:12 +00:00
Andrew Gallatin
84af4cc153 Fix the build
Back out the IPv6 portion of r360903, as the stamp_tag param
is apparently not supported in upstream FreeBSD.

Sponsored by:	Netflix
Pointy hat to: gallatin
2020-05-11 21:23:22 +00:00
Andrew Gallatin
6043ac201a Ktls: never skip stamping tags for NIC TLS
The newer RACK and BBR TCP stacks have added a mechanism
to disable hardware packet pacing for TCP retransmits.
This mechanism works by skipping the send-tag stamp
on rate-limited connections when the TCP stack calls
ip_output() with the IP_NO_SND_TAG_RL flag set.

When doing NIC TLS, we must ignore this flag, as
NIC TLS packets must always be stamped.  Failure
to stamp a NIC TLS packet will result in crypto
issues.

Reviewed by:	hselasky, rrs
Sponsored by:	Netflix, Mellanox
2020-05-11 19:17:33 +00:00
Alexander V. Chernikov
9e02229580 Remove now-unused rt_ifp,rt_ifa,rt_gateway,rt_mtu rte fields.
After converting routing subsystem customers to use nexthop objects
 defined in r359823, some fields in struct rtentry became unused.

This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry
 along with the code initializing and updating these fields.

Cleanup of the remaining fields will be addressed by D24669.

This commit also changes the implementation of the RTM_CHANGE handling.
Old implementation tried to perform the whole operation under radix WLOCK,
 resulting in slow performance and hacks like using RTF_RNH_LOCKED flag.
New implementation looks up the route nexthop under radix RLOCK, creates new
 nexthop and tries to update rte nhop pointer. Only last part is done under
 WLOCK.
In the hypothetical scenarious where multiple rtsock clients
 repeatedly issue RTM_CHANGE requests for the same route, route may get
 updated between read and update operation. This is addressed by retrying
 the operation multiple (3) times before returning failure back to the
 caller.

Differential Revision:	https://reviews.freebsd.org/D24666
2020-05-04 14:31:45 +00:00
Gleb Smirnoff
7b6c99d08d Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf
within m_epg namespace.
All edits except the 'struct mbuf' declaration and mb_dupcl() were done
mechanically with sed:

s/->m_ext_pgs.nrdy/->m_epg_nrdy/g
s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g
s/->m_ext_pgs.trail_len/->m_epg_trllen/g
s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g
s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g
s/->m_ext_pgs.flags/->m_epg_flags/g
s/->m_ext_pgs.record_type/->m_epg_record_type/g
s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g
s/->m_ext_pgs.tls/->m_epg_tls/g
s/->m_ext_pgs.so/->m_epg_so/g
s/->m_ext_pgs.seqno/->m_epg_seqno/g
s/->m_ext_pgs.stailq/->m_epg_stailq/g

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:12:56 +00:00
Alexander V. Chernikov
74787ef47b Add nhop to the ifa_rtrequest() callback.
With the upcoming multipath changes described in D24141,
 rt->rt_nhop can potentially point to a nexthop group instead of
 an individual nhop.
To simplify caller handling of such cases, change ifa_rtrequest() callback
 to pass changed nhop directly.

Differential Revision:	https://reviews.freebsd.org/D24604
2020-04-29 19:28:56 +00:00