6616 Commits

Author SHA1 Message Date
tuexen
53e0269fab Fix a copy and paste error introduced in r360878.
Reported-by:		syzbot+a0863e972771f2f0d4b3@syzkaller.appspotmail.com
Reported-by:		syzbot+4481757e967ba83c445a@syzkaller.appspotmail.com
MFC after:		3 days
2020-05-11 22:47:20 +00:00
melifaro
b6b9eece0f Fix NOINET[6] build by using af-independent route lookup function.
Reported by:	rpokala
2020-05-11 20:41:03 +00:00
gallatin
16255485c5 Ktls: never skip stamping tags for NIC TLS
The newer RACK and BBR TCP stacks have added a mechanism
to disable hardware packet pacing for TCP retransmits.
This mechanism works by skipping the send-tag stamp
on rate-limited connections when the TCP stack calls
ip_output() with the IP_NO_SND_TAG_RL flag set.

When doing NIC TLS, we must ignore this flag, as
NIC TLS packets must always be stamped.  Failure
to stamp a NIC TLS packet will result in crypto
issues.

Reviewed by:	hselasky, rrs
Sponsored by:	Netflix, Mellanox
2020-05-11 19:17:33 +00:00
tuexen
c3ef0c2592 Ensure that the SCTP iterator runs with an stcb and inp, which belong to
each other.

Reported by:	syzbot+82d39d14f2f765e38db0@syzkaller.appspotmail.com
MFC after:	3 days
2020-05-10 22:54:30 +00:00
tuexen
c55eeb3529 Remove trailing whitespace. 2020-05-10 17:43:42 +00:00
tuexen
541cb8e134 Ensure that we have a path when starting the T3 RXT timer.
Reported by:	syzbot+f2321629047f89486fa3@syzkaller.appspotmail.com
MFC after:	3 days
2020-05-10 17:19:19 +00:00
tuexen
b293dfd427 Only drop DATA chunk with lower priorities as specified in RFC 7496.
This issue was found by looking at a reproducer generated by syzkaller.

MFC after:		3 days
2020-05-10 10:03:10 +00:00
rrs
68f5e0e8b6 When in the SYN-SENT state bbr and rack will not properly send an ACK but instead start the D-ACK timer. This
causes so_reuseport_lb_test to fail since it slows down how quickly the program runs until the timeout occurs
and fails the test

Sponsored by: Netflix inc.
Differential Revision:	https://reviews.freebsd.org/D24747
2020-05-07 20:29:38 +00:00
rrs
1089697be0 NF has an internal option that changes the tcp_mcopy_m routine slightly (has
a few extra arguments). Recently that changed to only have one arg extra so
that two ifdefs around the call are no longer needed. Lets take out the
extra ifdef and arg.

Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D24736
2020-05-07 10:46:02 +00:00
tuexen
25f7702787 Avoid underflowing a variable, which would result in taking more
data from the stream queues then needed.

Thanks to Timo Voelker for finding this bug and providing a fix.

MFC after:		3 days
2020-05-05 19:54:30 +00:00
tuexen
73feb6f1cf Fix the computation of the numbers of entries of the mapping array to
look at when generating a SACK. This was wrong in case of sequence
numbers wrap arounds.

Thanks to Gwenael FOURRE for reporting the issue for the userland stack:
https://github.com/sctplab/usrsctp/issues/462
MFC after:		3 days
2020-05-05 17:52:44 +00:00
tuexen
5854f227c0 Add net epoch support back, which was taken out by accident in
https://svnweb.freebsd.org/changeset/base/360639

Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24694
2020-05-04 23:05:11 +00:00
rrs
083068553d This fixes two issues found by ankitraheja09@gmail.com
1) When BBR retransmits the syn it was messing up the snd_max
2) When we need to send a RST we might not send it when we should

Reported by:	ankitraheja09@gmail.com
Sponsored by:  Netflix.com
Differential Revision: https://reviews.freebsd.org/D24693
2020-05-04 23:02:58 +00:00
tuexen
9291be5e89 Enter the net epoch before calling the output routine in TCP BBR.
This was only triggered when setting the IPPROTO_TCP level socket
option TCP_DELACK.
This issue was found by runnning an instance of SYZKALLER.
Reviewed by:		rrs
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24690
2020-05-04 22:02:49 +00:00
rrs
3839932408 This commit brings things into sync with the advancements that
have been made in rack and adds a few fixes in BBR. This also
removes any possibility of incorrectly doing OOB data the stacks
do not support it. Should fix the skyzaller crashes seen in the
past. Still to fix is the BBR issue just reported this weekend
with the SYN and on sending a RST. Note that this version of
rack can now do pacing as well.

Sponsored by:Netflix Inc
Differential Revision:https://reviews.freebsd.org/D24576
2020-05-04 20:28:53 +00:00
rrs
cae4c35cb7 Adjust the fb to have a way to ask the underlying stack
if it can support the PRUS option (OOB). And then have
the new function call that to validate and give the
correct error response if needed to the user (rack
and bbr do not support obsoleted OOB data).

Sponsoered by: Netflix Inc.
Differential Revision:	 https://reviews.freebsd.org/D24574
2020-05-04 20:19:57 +00:00
melifaro
cb35a496b4 Remove now-unused rt_ifp,rt_ifa,rt_gateway,rt_mtu rte fields.
After converting routing subsystem customers to use nexthop objects
 defined in r359823, some fields in struct rtentry became unused.

This commit removes rt_ifp, rt_ifa, rt_gateway and rt_mtu from struct rtentry
 along with the code initializing and updating these fields.

Cleanup of the remaining fields will be addressed by D24669.

This commit also changes the implementation of the RTM_CHANGE handling.
Old implementation tried to perform the whole operation under radix WLOCK,
 resulting in slow performance and hacks like using RTF_RNH_LOCKED flag.
New implementation looks up the route nexthop under radix RLOCK, creates new
 nexthop and tries to update rte nhop pointer. Only last part is done under
 WLOCK.
In the hypothetical scenarious where multiple rtsock clients
 repeatedly issue RTM_CHANGE requests for the same route, route may get
 updated between read and update operation. This is addressed by retrying
 the operation multiple (3) times before returning failure back to the
 caller.

Differential Revision:	https://reviews.freebsd.org/D24666
2020-05-04 14:31:45 +00:00
glebius
21bb8736b2 Step 4.2: start divorce of M_EXT and M_EXTPG
They have more differencies than similarities. For now there is lots
of code that would check for M_EXT only and work correctly on M_EXTPG
buffers, so still carry M_EXT bit together with M_EXTPG. However,
prepare some code for explicit check for M_EXTPG.

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:37:16 +00:00
glebius
58b56c4a3d Step 4.1: mechanically rename M_NOMAP to M_EXTPG
Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:21:11 +00:00
glebius
694c46b783 Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf
within m_epg namespace.
All edits except the 'struct mbuf' declaration and mb_dupcl() were done
mechanically with sed:

s/->m_ext_pgs.nrdy/->m_epg_nrdy/g
s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g
s/->m_ext_pgs.trail_len/->m_epg_trllen/g
s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g
s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g
s/->m_ext_pgs.flags/->m_epg_flags/g
s/->m_ext_pgs.record_type/->m_epg_record_type/g
s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g
s/->m_ext_pgs.tls/->m_epg_tls/g
s/->m_ext_pgs.so/->m_epg_so/g
s/->m_ext_pgs.seqno/->m_epg_seqno/g
s/->m_ext_pgs.stailq/->m_epg_stailq/g

Reviewed by:	gallatin
Differential Revision:	https://reviews.freebsd.org/D24598
2020-05-03 00:12:56 +00:00
rscheff
b76a10884a Introduce a lower bound of 2 MSS to TCP Cubic.
Running TCP Cubic together with ECN could end up reducing cwnd down to 1 byte, if the
receiver continously sets the ECE flag, resulting in very poor transmission speeds.

In line with RFC6582 App. B, a lower bound of 2 MSS is introduced, as well as a typecast
to prevent any potential integer overflows during intermediate calculation steps of the
adjusted cwnd.

Reported by:	Cheng Cui
Reviewed by:	tuexen (mentor)
Approved by:	tuexen (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D23353
2020-04-30 11:11:28 +00:00
rscheff
9ff3e71cd0 Prevent premature shrinking of the scaled receive window
which can cause a TCP client to use invalid or stale TCP sequence numbers for ACK packets.

Packets with old sequence numbers are ignored and not used to update the send window size.
This might cause the TCP session to hang indefinitely under some circumstances.

Reported by:	Cui Cheng
Reviewed by:	tuexen (mentor), rgrimes (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D24515
2020-04-29 22:01:33 +00:00
rscheff
f8d5656acd Correctly set up the initial TCP congestion window in all cases,
by not including the SYN bit sequence space in cwnd related calculations.
Snd_und is adjusted explicitly in all cases, outside the cwnd update, instead.

This fixes an off-by-one conformance issue with regular TCP sessions not
using Appropriate Byte Counting (RFC3465), sending one more packet during
the initial window than expected.

PR:		235256
Reviewed by:	tuexen (mentor), rgrimes (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D19000
2020-04-29 21:48:52 +00:00
melifaro
7c1be85b66 Move route_temporal.c and route_var.h to net/route.
Nexthop objects implementation, defined in r359823,
 introduced sys/net/route directory intended to hold all
 routing-related code. Move recently-introduced route_temporal.c and
 private route_var.h header there.

Differential Revision:	https://reviews.freebsd.org/D24597
2020-04-28 19:14:09 +00:00
melifaro
192053a93a Convert rtalloc_mpath_fib() users to the new KPI.
New fib[46]_lookup() functions support multipath transparently.
Given that, switch the last rtalloc_mpath_fib() calls to
 dib4_lookup() and eliminate the function itself.

Note: proper flowid generation (especially for the outbound traffic) is a
 bigger topic and will be handled in a separate review.
This change leaves flowid generation intact.

Differential Revision:	https://reviews.freebsd.org/D24595
2020-04-28 08:06:56 +00:00
melifaro
4a05b1819a Eliminate now-unused parts of old routing KPI.
r360292 switched most of the remaining routing customers to a new KPI,
 leaving a bunch of wrappers for old routing lookup functions unused.

Remove them from the tree as a part of routing cleanup.

Differential Revision:	https://reviews.freebsd.org/D24569
2020-04-28 07:25:34 +00:00
jhb
d223bc14de Initial support for kernel offload of TLS receive.
- Add a new TCP_RXTLS_ENABLE socket option to set the encryption and
  authentication algorithms and keys as well as the initial sequence
  number.

- When reading from a socket using KTLS receive, applications must use
  recvmsg().  Each successful call to recvmsg() will return a single
  TLS record.  A new TCP control message, TLS_GET_RECORD, will contain
  the TLS record header of the decrypted record.  The regular message
  buffer passed to recvmsg() will receive the decrypted payload.  This
  is similar to the interface used by Linux's KTLS RX except that
  Linux does not return the full TLS header in the control message.

- Add plumbing to the TOE KTLS interface to request either transmit
  or receive KTLS sessions.

- When a socket is using receive KTLS, redirect reads from
  soreceive_stream() into soreceive_generic().

- Note that this interface is currently only defined for TLS 1.1 and
  1.2, though I believe we will be able to reuse the same interface
  and structures for 1.3.
2020-04-27 23:17:19 +00:00
jhb
a19547aa99 Add the initial sequence number to the TLS enable socket option.
This will be needed for KTLS RX.

Reviewed by:	gallatin
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24451
2020-04-27 22:31:42 +00:00
rrs
d5486433ca This change does a small prepratory step in getting the
latest rack and bbr in from the NF repo. When those come
in the OOB data handling will be fixed where Skyzaller crashes.

Differential Revision:	https://reviews.freebsd.org/D24575
2020-04-27 16:30:29 +00:00
melifaro
c16a201cb8 Convert debugnet to the new routing KPI.
Introduce new fib[46]_lookup_debugnet() functions serving as a
special interface for the crash-time operations. Underlying
implementation will try to return lookup result if
datastructures are not corrupted, avoding locking.

Convert debugnet to use fib4_lookup_debugnet() and switch it
to use nexthops instead of rtentries.

Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D24555
2020-04-26 18:42:38 +00:00
melifaro
770f0899b4 Fix order of arguments in fib[46]_lookup calls in SCTP.
r360292 introduced the wrong order, resulting in returned
 nhops not being referenced, despite the fact that references
 were requested. That lead to random GPF after using SCTP sockets.

Special defined macro like IPV[46]_SCOPE_GLOBAL will be introduced
 soon to reduce the chance of putting arguments in wrong order.

Reported-by: syzbot+5c813c01096363174684@syzkaller.appspotmail.com
2020-04-26 13:02:42 +00:00
melifaro
fc5c7c80cc Fix LINT build #2 after r360292.
Pointyhat to: melifaro
2020-04-25 11:35:38 +00:00
melifaro
f6b8518330 Fix LINT build broken by r360292. 2020-04-25 10:31:56 +00:00
melifaro
07f4c41e55 Convert route caching to nexthop caching.
This change is build on top of nexthop objects introduced in r359823.

Nexthops are separate datastructures, containing all necessary information
 to perform packet forwarding such as gateway interface and mtu. Nexthops
 are shared among the routes, providing more pre-computed cache-efficient
 data while requiring less memory. Splitting the LPM code and the attached
 data solves multiple long-standing problems in the routing layer,
 drastically reduces the coupling with outher parts of the stack and allows
 to transparently introduce faster lookup algorithms.

Route caching was (re)introduced to minimise (slow) routing lookups, allowing
 for notably better performance for large TCP senders. Caching works by
 acquiring rtentry reference, which is protected by per-rtentry mutex.
 If the routing table is changed (checked by comparing the rtable generation id)
 or link goes down, cache record gets withdrawn.

Nexthops have the same reference counting interface, backed by refcount(9).
This change merely replaces rtentry with the actual forwarding nextop as a
 cached object, which is mostly mechanical. Other moving parts like cache
 cleanup on rtable change remains the same.

Differential Revision:	https://reviews.freebsd.org/D24340
2020-04-25 09:06:11 +00:00
melifaro
6ee7cd6ac4 Unbreak LINT-NOINET[6] builds broken in r360191.
Reported by:	np
2020-04-23 06:55:33 +00:00
tuexen
38c63fc796 Improve input validation when processing AUTH chunks.
Thanks to Natalie Silvanovich from Google for finding and reporting the
issue found by her in the SCTP userland stack.

MFC after:		3 days
X-MFC with:		https://svnweb.freebsd.org/changeset/base/360193
2020-04-22 21:22:33 +00:00
tuexen
890b12148e Improve input validation when processing AUTH chunks.
Thanks to Natalie Silvanovich from Google for finding and reporting the
issue found by her in the SCTP userland stack.

MFC after:		3 days
2020-04-22 12:47:46 +00:00
melifaro
e5ebdbed30 Convert TOE routing lookups to the new routing KPI.
Reviewed by:	np
Differential Revision:	https://reviews.freebsd.org/D24388
2020-04-22 07:53:43 +00:00
rscheff
932b6bd887 revert rS360143 - Correctly set up initial cwnd
due to syzkaller panics found

Reported by:	tuexen
Approved by:	tuexen (mentor)
Sponsored by:	NetApp, Inc.
2020-04-22 00:16:42 +00:00
rscheff
5c4e0af7f0 Correctly set up the initial TCP congestion window
in all cases, by adjust snd_una right after the
connection initialization, to include the one byte
in sequence space occupied by the SYN bit.

This does not change the regular ACK processing,
while making the BYTES_THIS_ACK macro to work properly.

PR:		235256
Reviewed by:	tuexen (mentor), rgrimes (mentor)
Approved by:	tuexen (mentor), rgrimes (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D19000
2020-04-21 13:05:44 +00:00
jtl
4f694d5f49 Avoid calling protocol drain routines more than once per reclamation event.
mb_reclaim() calls the protocol drain routines for each protocol in each
domain. Some protocols exist in more than one domain and share drain
routines. In the case of SCTP, it also uses the same drain routine for
its SOCK_SEQPACKET and SOCK_STREAM entries in the same domain.

On systems with INET, INET6, and SCTP all defined, mb_reclaim() calls
sctp_drain() four times. On systems with INET and INET6 defined,
mb_reclaim() calls tcp_drain() twice. mb_reclaim() is the only in-tree
caller of the pr_drain protocol entry.

Eliminate this duplication by ensuring that each pr_drain routine is only
specified for one protocol entry in one domain.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24418
2020-04-16 20:17:24 +00:00
melifaro
871657339d Add nhop parameter to rti_filter callback.
One of the goals of the new routing KPI defined in r359823 is to
 entirely hide`struct rtentry` from the consumers. It will allow to
 improve routing subsystem internals and deliver more features much faster.
This change is one of the ongoing changes to eliminate direct
 struct rtentry field accesses.

Additionally, with the followup multipath changes, single rtentry can point
 to multiple nexthops.

With that in mind, convert rti_filter callback used when traversing the
 routing table to accept pair (rt, nhop) instead of nexthop.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24440
2020-04-16 17:20:18 +00:00
rscheff
543b449bcd Reduce default TCP delayed ACK timeout to 40ms.
Reviewed by:	kbowling, tuexen
Approved by:	tuexen (mentor)
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D23281
2020-04-16 15:59:23 +00:00
melifaro
f3b7dca5ab Convert IP/IPv6 forwarding, ICMP processing and IP PCB laddr selection to
the new routing KPI.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24245
2020-04-14 23:06:25 +00:00
tuexen
9cf35cacaf Improve the TCP blackhole detection. The principle is to reduce the
MSS in two steps and try each candidate two times. However, if two
candidates are the same (which is the case in TCP/IPv6), this candidate
was tested four times. This patch ensures that each candidate actually
reduced the MSS and is only tested 2 times. This reduces the time window
of missclassifying a temporary outage as an MTU issue.

Reviewed by:		jtl
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24308
2020-04-14 16:35:05 +00:00
gallatin
2de7a790ba KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
While the original implementation of unmapped mbufs was a large
step forward in terms of reducing cache misses by enabling mbufs
to carry more than a single page for sendfile, they are rather
cache unfriendly when accessing the ext_pgs metadata and
data. This is because the ext_pgs part of the mbuf is allocated
separately, and almost guaranteed to be cold in cache.

This change takes advantage of the fact that unmapped mbufs
are never used at the same time as pkthdr mbufs. Given this
fact, we can overlap the ext_pgs metadata with the mbuf
pkthdr, and carry the ext_pgs meta directly in the mbuf itself.
Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array
of pages) directly after the existing m_ext.

In order to be able to carry 5 pages (which is the minimum
required for a 16K TLS record which is not perfectly aligned) on
LP64, I've had to steal ext_arg2. The only user of this in the
xmit path is sendfile, and I've adjusted it to use arg1 when
using unmapped mbufs.

This change is almost entirely mechanical, except that we
change mb_alloc_ext_pgs() to no longer allow allocating
pkthdrs, the change to avoid ext_arg2 as mentioned above,
and the removal of the ext_pgs zone,

This change saves roughly 2% "raw" CPU (~59% -> 57%), or over
3% "scaled" CPU on a Netflix 100% software kTLS workload at
90+ Gb/s on Broadwell Xeons.

In a follow-on commit, I plan to remove some hacks to avoid
access ext_pgs fields of mbufs, since they will now be in
cache.

Many thanks to glebius for helping to make this better in
the Netflix tree.

Reviewed by:	hselasky, jhb, rrs, glebius (early version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24213
2020-04-14 14:46:06 +00:00
melifaro
7c0fa838c1 Plug netmask NULL check during route addition causing kernel panic.
This bug was introduced by the r359823.

Reported by:	hselasky
2020-04-14 13:12:22 +00:00
kp
13140531a8 carp: Widen epoch coverage
Fix panics related to calling code which expects to be running inside
the NET_EPOCH from outside that epoch.

This leads to panics (with INVARIANTS) such as this one:

    panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/if_ether.c:373
    cpuid = 7
    time = 1586095719
    KDB: stack backtrace:
    db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0090819700
    vpanic() at vpanic+0x182/frame 0xfffffe0090819750
    panic() at panic+0x43/frame 0xfffffe00908197b0
    arprequest_internal() at arprequest_internal+0x59e/frame 0xfffffe00908198c0
    arp_announce_ifaddr() at arp_announce_ifaddr+0x20/frame 0xfffffe00908198e0
    carp_master_down_locked() at carp_master_down_locked+0x10d/frame 0xfffffe0090819910
    carp_master_down() at carp_master_down+0x79/frame 0xfffffe0090819940
    softclock_call_cc() at softclock_call_cc+0x13f/frame 0xfffffe00908199f0
    softclock() at softclock+0x7c/frame 0xfffffe0090819a20
    ithread_loop() at ithread_loop+0x279/frame 0xfffffe0090819ab0
    fork_exit() at fork_exit+0x80/frame 0xfffffe0090819af0
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0090819af0
    --- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Widen the NET_EPOCH to cover the relevant (callback / task) code.

Differential Revision:	https://reviews.freebsd.org/D24302
2020-04-12 16:09:21 +00:00
melifaro
4eda536b2e Introduce nexthop objects and new routing KPI.
This is the foundational change for the routing subsytem rearchitecture.
 More details and goals are available in https://reviews.freebsd.org/D24141 .

This patch introduces concept of nexthop objects and new nexthop-based
 routing KPI.

Nexthops are objects, containing all necessary information for performing
 the packet output decision. Output interface, mtu, flags, gw address goes
 there. For most of the cases, these objects will serve the same role as
 the struct rtentry is currently serving.
Typically there will be low tens of such objects for the router even with
 multiple BGP full-views, as these objects will be shared between routing
 entries. This allows to store more information in the nexthop.

New KPI:

struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst,
  uint32_t scopeid, uint32_t flags, uint32_t flowid);
struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6,
  uint32_t scopeid, uint32_t flags, uint32_t flowid);

These 2 function are intended to replace all all flavours of
 <in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions  and the previous
 fib[46]-generation functions.

Upon successful lookup, they return nexthop object which is guaranteed to
 exist within current NET_EPOCH. If longer lifetime is desired, one can
 specify NHR_REF as a flag and get a referenced version of the nexthop.
 Reference semantic closely resembles rtentry one, allowing sed-style conversion.

Additionally, another 2 functions are introduced to support uRPF functionality
 inside variety of our firewalls. Their primary goal is to hide the multipath
 implementation details inside the routing subsystem, greatly simplifying
 firewalls implementation:

int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid,
  uint32_t flags, const struct ifnet *src_if);
int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid,
  uint32_t flags, const struct ifnet *src_if);

All functions have a separate scopeid argument, paving way to eliminating IPv6 scope
 embedding and allowing to support IPv4 link-locals in the future.

Structure changes:
 * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size.
 * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz.

Old KPI:
During the transition state old and new KPI will coexists. As there are another 4-5
 decent-sized conversion patches, it will probably take a couple of weeks.
To support both KPIs, fields not required by the new KPI (most of rtentry) has to be
 kept, resulting in the temporary size increase.
Once conversion is finished, rtentry will notably shrink.

More details:
* architectural overview: https://reviews.freebsd.org/D24141
* list of the next changes: https://reviews.freebsd.org/D24232

Reviewed by:	ae,glebius(initial version)
Differential Revision:	https://reviews.freebsd.org/D24232
2020-04-12 14:30:00 +00:00
tuexen
556103ac7f Revert https://svnweb.freebsd.org/changeset/base/359809
The intended change was
	sp->next.tqe_next = NULL;
	sp->next.tqe_prev = NULL;
which doesn't fix the issue I'm seeing and the committed fix is
not the intended fix due to copy-and-paste.

Thanks a lot to Conrad Meyer for making me aware of the problem.

Reported by:		cem
2020-04-12 09:31:36 +00:00