7082 Commits

Author SHA1 Message Date
Gleb Smirnoff
2144431c11 Remove in_ifaddr_lock acquisiton to access in_ifaddrhead.
An IPv4 address is embedded into an ifaddr which is freed
via epoch. And the in_ifaddrhead is already a CK list. Use
the network epoch to protect against use after free.

Next step would be to CK-ify the in_addr hash and get rid of the...

Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D32434
2021-10-13 10:04:46 -07:00
Marko Zec
bc8b8e106b [fib_algo][dxr] Retire counters which are no longer used
The number of chunks can still be tracked via vmstat -z|fgrep dxr.

MFC after:	3 days
2021-10-09 13:47:10 +02:00
Marko Zec
1549575f22 [fib_algo][dxr] Improve incremental updating strategy
Tracking the number of unused holes in the trie and the range table
was a bad metric based on which full trie and / or range rebuilds
were triggered, which would happen in vain by far too frequently,
particularly with live BGP feeds.

Instead, track the total unused space inside the trie and range table
structures, and trigger rebuilds if the percentage of unused space
exceeds a sysctl-tunable threshold.

MFC after:	3 days
PR:		257965
2021-10-09 13:22:27 +02:00
Michael Tuexen
bd19202c92 sctp: improve KASSERT messages
MFC after:	1 week
2021-10-08 11:33:56 +02:00
Michael Tuexen
3ff3733991 sctp: don't keep being locked on a stream which is removed
Reported by:	syzbot+f5f551e8a3a0302a4914@syzkaller.appspotmail.com
MFC after:	1 week
2021-10-02 00:48:01 +02:00
Randall Stewart
a36230f75e tcp: Make dsack stats available in netstat and also make sure its aware of TLP's.
DSACK accounting has been for quite some time under a NETFLIX_STATS ifdef. Statistics
on DSACKs however are very useful in figuring out how much bad retransmissions you
are doing. This is further complicated, however, by stacks that do TLP. A TLP
when discovering a lost ack in the reverse path will cause the generation
of a DSACK. For this situation we introduce a new dsack-tlp-bytes as well
as the more traditional dsack-bytes and dsack-packets. These will now
all display in netstat -p tcp -s. This also updates all stacks that
are currently built to keep track of these stats.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32158
2021-10-01 10:36:27 -04:00
Michael Tuexen
28ea947078 sctp: provide a specific stream scheduler function for FCFS
A KASSERT in the genric routine does not apply and triggers
incorrectly.

Reported by:	syzbot+8435af157238c6a11430@syzkaller.appspotmail.com
MFC after:	1 week
2021-09-29 02:08:37 +02:00
Michael Tuexen
fa947a3687 sctp: cleanup and adding KASSERT()s, no functional change
MFC after:	1 week
2021-09-28 20:31:12 +02:00
Michael Tuexen
5b53e749a9 sctp: fix usage of stream scheduler functions
sctp_ss_scheduled() should only be called for streams that are
scheduled. So call sctp_ss_remove_from_stream() before it.
This bug was uncovered by the earlier cleanup.

Reported by:	syzbot+bbf739922346659df4b2@syzkaller.appspotmail.com
Reported by:	syzbot+0a0857458f4a7b0507c8@syzkaller.appspotmail.com
Reported by:	syzbot+a0b62c6107b34a04e54d@syzkaller.appspotmail.com
Reported by:	syzbot+0aa0d676429ebcd53299@syzkaller.appspotmail.com
Reported by:	syzbot+104cc0c1d3ccf2921c1d@syzkaller.appspotmail.com
MFC after:	1 week
2021-09-28 05:25:58 +02:00
Michael Tuexen
171633765c sctp: avoid locking an already locked mutex
Reported by:	syzbot+f048680690f2e8d7ddad@syzkaller.appspotmail.com
Reported by:	syzbot+0725c712ba89d123c2e9@syzkaller.appspotmail.com
MFC after:	1 week
2021-09-28 05:17:03 +02:00
Gordon Bergling
d2e616147d sctp: Fix a typo in a comment
- s/assue/assume/

MFC after:	3 days
2021-09-26 15:15:39 +02:00
Marko Zec
43880c511c [fib_algo][dxr] Split unused range chunk list in multiple buckets
Traversing a single list of unused range chunks in search for a block
of optimal size was suboptimal.

The experience with real-world BGP workloads has shown that on average
unused range chunks are tiny, mostly in length from 1 to 4 or 5, when
DXR is configured with K = 20 which is the current default (D16X4R).

Therefore, introduce a limited amount of buckets to accomodate descriptors
of empty blocks of fixed (small) size, so that those can be found in O(1)
time.  If no empty chunks of the requested size can be found in fixed-size
buckets, the search continues in an unsorted list of empty chunks of
variable lengths, which should only happen infrequently.

This change should permit us to manage significantly more empty range
chunks without sacrifying the speed of incremental range table updating.

MFC after:	3 days
2021-09-25 06:29:48 +02:00
Randall Stewart
1ca931a540 tcp: Rack compressed ack path updates the recv window too easily
The compressed ack path of rack is not following proper procedures in updating
the peers window. It should be checking the seq and ack values before updating and
instead it is blindly updating the values. This could in theory get the wrong window
in the connection for some length of time.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32082
2021-09-23 11:43:29 -04:00
Randall Stewart
fd69939e79 tcp: Two bugs in rack one of which can lead to a panic.
In extensive testing in NF we have found two issues inside
the rack stack.

1) An incorrect offset is being generated by the fast send path when a fast send is initiated on
   the end of the socket buffer and before the fast send runs, the sb_compress macro adds data to the trailing socket.
   This fools the fast send code into thinking the sb offset changed and it miscalculates a "updated offset".
   It should only do that when the mbuf in question got smaller.. i.e. an ack was processed. This can lead to
   a panic deref'ing a NULL mbuf if that packet is ever retransmitted. At the best case it leads to invalid data being
   sent to the client which usually terminates the connection. The fix is to have the proper logic (that is in the rsm fast path)
   to make sure we only update the offset when the mbuf shrinks.
2) The other issue is more bothersome. The timestamp check in rack needs to use the msec timestamp when
   comparing the timestamp echo to now. It was using a microsecond timestamp which ends up giving error
   prone results but causes only small harm in trying to identify which send to use in RTT calculations if its a retransmit.

Reviewed by: tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32062
2021-09-23 10:54:23 -04:00
Michael Tuexen
414499b3f9 sctp: Cleanup stream schedulers.
No functional change intended.

MFC after:	1 week
2021-09-23 14:16:56 +02:00
Michael Tuexen
762ae0ec8d sctp: Simplify stream scheduler usage
Callers are getting the stcb send lock, so just KASSERT that.
No need to signal this when calling stream scheduler functions.
No functional change intended.

MFC after:	1 week
2021-09-21 17:13:57 +02:00
Michael Tuexen
0b79a76f84 sctp: improve consistency when calling stream scheduler
Hold always the stcb send lock when calling sctp_ss_init() and
sctp_ss_remove_from_stream().

MFC after:	1 week
2021-09-21 00:54:13 +02:00
Michael Tuexen
34b1efcea1 sctp: use a valid outstream when adding it to the scheduler
Without holding the stcb send lock, the outstreams might get
reallocated if the number of streams are increased.

Reported by:	syzbot+4a5431d7caa666f2c19c@syzkaller.appspotmail.com
Reported by:	syzbot+aa2e3b013a48870e193d@syzkaller.appspotmail.com
Reported by:	syzbot+e4368c3bde07cd2fb29f@syzkaller.appspotmail.com
Reported by:	syzbot+fe2f110e34811ea91690@syzkaller.appspotmail.com
Reported by:	syzbot+ed6e8de942351d0309f4@syzkaller.appspotmail.com
MFC after:	1 week
2021-09-20 15:52:10 +02:00
Marko Zec
2ac039f7be [fib_algo][dxr] Merge adjacent empty range table chunks.
MFC after:	3 days
2021-09-20 06:30:45 +02:00
Michael Tuexen
e19d93b19d sctp: fix FCFS stream scheduler
Reported by:	syzbot+c6793f0f0ce698bce230@syzkaller.appspotmail.com
MFC after:	1 week
2021-09-19 11:56:26 +02:00
Mark Johnston
bf25678226 ktls: Fix error/mode confusion in TCP_*TLS_MODE getsockopt handlers
ktls_get_(rx|tx)_mode() can return an errno value or a TLS mode, so
errors are effectively hidden.  Fix this by using a separate output
parameter.  Convert to the new socket buffer locking macros while here.

Note that the socket buffer lock is not needed to synchronize the
SOLISTENING check here, we can rely on the PCB lock.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31977
2021-09-17 14:19:05 -04:00
Mike Karels
fd0765933c Change lowest address on subnet (host 0) not to broadcast by default.
The address with a host part of all zeros was used as a broadcast long
ago, but the default has been all ones since 4.3BSD and RFC1122.  Until
now, we would broadcast the host zero address as well as the configured
address.  Change to not broadcasting that address by default, but add a
sysctl (net.inet.ip.broadcast_lowest) to re-enable it.  Note that the
correct way to use the zero address for broadcast would be to configure
it as the broadcast address for the network.

See https:/datatracker.ietf.org/doc/draft-schoen-intarea-lowest-address/
and the discussion in https://reviews.freebsd.org/D19316.  Note, Linux
now implements this.

Reviewed by:	rgrimes, tuexen; melifaro (previous version)
MFC after:	1 month
Relnotes:	yes
Differential Revision: https://reviews.freebsd.org/D31861
2021-09-16 19:42:20 -05:00
Marko Zec
eb3148cc4d [fib algo][dxr] Fix division by zero.
A division by zero would occur if DXR would be activated on a vnet
with no IP addresses configured on any interfaces.

PR:		257965
MFC after:	3 days
Reported by:	Raul Munoz
2021-09-16 16:34:05 +02:00
Marko Zec
b51f8bae57 [fib algo][dxr] Optimize trie updating.
Don't rebuild in vain trie parts unaffected by accumulated incremental
RIB updates.

PR:		257965
Tested by:	Konrad Kreciwilk
MFC after:	3 days
2021-09-15 22:42:49 +02:00
Marko Zec
442c8a245e [fib algo][dxr] Fix undefined behavior.
The result of shifting uint32_t by 32 (or more) is undefined: fix it.
2021-09-15 22:42:48 +02:00
Hans Petter Selasky
e3e7d95332 tcp: Avoid division by zero when KERN_TLS is enabled in tcp_account_for_send().
If the "len" variable is non-zero, we can assume that the sum of
"tp->t_snd_rxt_bytes + tp->t_sndbytes" is also non-zero.

It is also assumed that the 64-bit byte counters will never wrap around.

Differential Revision:	https://reviews.freebsd.org/D31959
Reviewed by:	gallatin, rrs and tuexen
Found by:	"I told you so", also called hselasky
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2021-09-15 18:05:31 +02:00
Michael Tuexen
4542164685 sctp: cleanup, no functional change intended
MFC after:	1 week
2021-09-15 10:18:11 +02:00
John Baldwin
c782ea8bb5 Add a switch structure for send tags.
Move the type and function pointers for operations on existing send
tags (modify, query, next, free) out of 'struct ifnet' and into a new
'struct if_snd_tag_sw'.  A pointer to this structure is added to the
generic part of send tags and is initialized by m_snd_tag_init()
(which now accepts a switch structure as a new argument in place of
the type).

Previously, device driver ifnet methods switched on the type to call
type-specific functions.  Now, those type-specific functions are saved
in the switch structure and invoked directly.  In addition, this more
gracefully permits multiple implementations of the same tag within a
driver.  In particular, NIC TLS for future Chelsio adapters will use a
different implementation than the existing NIC TLS support for T6
adapters.

Reviewed by:	gallatin, hselasky, kib (older version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D31572
2021-09-14 11:43:41 -07:00
Mark Johnston
e6c19aa94d sctp: Allow blocking on I/O locks even with non-blocking sockets
There are two flags to request a non-blocking receive on a socket:
MSG_NBIO and MSG_DONTWAIT.  They are handled a bit differently in that
soreceive_generic() and soreceive_stream() will block on the socket I/O
lock when MSG_NBIO is set, but not if MSG_DONTWAIT is set.  In general,
MSG_NBIO seems to mean, "don't block if there is no data to receive" and
MSG_DONTWAIT means "don't go to sleep for any reason".

SCTP's soreceive implementation did not allow blocking on the I/O lock
if either flag is set, but this violates an assumption in
aio_process_sb(), which specifies MSG_NBIO but nonetheless
expects to make progress if data is available to read.  Change
sctp_sorecvmsg() to block on the I/O lock only if MSG_DONTWAIT
is not set.

Reported by:	syzbot+c7d22dbbb9aef509421d@syzkaller.appspotmail.com
Reviewed by:	tuexen
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31915
2021-09-14 09:02:05 -04:00
Michael Tuexen
29545986bd sctp: avoid LOR
Don't lock the inp-info lock while holding an stcb lock.

MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D31921
2021-09-12 21:11:14 +02:00
Michael Tuexen
4181fa2a20 sctp: minor cleanup, no functional change
MFC after:	1 week
2021-09-12 19:21:15 +02:00
Mark Johnston
2d5c48eccd sctp: Tighten up locking around sctp_aloc_assoc()
All callers of sctp_aloc_assoc() mark the PCB as connected after a
successful call (for one-to-one-style sockets).  In all cases this is
done without the PCB lock, so the PCB's flags can be corrupted.  We also
do not atomically check whether a one-to-one-style socket is a listening
socket, which violates various assumptions in solisten_proto().

We need to hold the PCB lock across all of sctp_aloc_assoc() to fix
this.  In order to do that without introducing lock order reversals, we
have to hold the global info lock as well.

So:
- Convert sctp_aloc_assoc() so that the inp and info locks are
  consistently held.  It returns with the association lock held, as
  before.
- Fix an apparent bug where we failed to remove an association from a
  global hash if sctp_add_remote_addr() fails.
- sctp_select_a_tag() is called when initializing an association, and it
  acquires the global info lock.  To avoid lock recursion, push locking
  into its callers.
- Introduce sctp_aloc_assoc_connected(), which atomically checks for a
  listening socket and sets SCTP_PCB_FLAGS_CONNECTED.

There is still one edge case in sctp_process_cookie_new() where we do
not update PCB/socket state correctly.

Reviewed by:	tuexen
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31908
2021-09-11 10:15:21 -04:00
orange30
f5777c123a net: Fix memory leaks upon arp_fillheader() failures
Free memory before return from arprequest_internal().  In in_arpinput(),
if arp_fillheader() fails, it should use goto drop.

Reviewed by:	melifaro, imp, markj
MFC after:	1 week
Pull Request:	https://github.com/freebsd/freebsd-src/pull/534
2021-09-10 09:45:26 -04:00
Michael Tuexen
3ea2cdd45e sctp: add explicit cast, no functional change intended
MFC after:	3 days
2021-09-09 19:13:47 +02:00
Michael Tuexen
0c1a20beb4 sctp: use appropriate argument when freeing association
Reported by:	syzbot+7fe26e26911344e7211d@syzkaller.appspotmail.com
MFC after:	3 days
2021-09-09 18:01:35 +02:00
Mark Johnston
4250aa1188 sctp: Clear assoc socket references when freeing a PCB
This restores behaviour present in the first import of SCTP.  Commit
ceaad40ae729dea2c5d8ffcfdd45bb96fb8969d2 commented this out and commit
62fb761ff28bb184a2543e539dd689fefd5d3246 removed it.  However, once
sctp_inpcb_free() returns, the socket reference is gone no matter what,
so we need to clear it.

Reported by:	syzbot+30dd69297fcbc5f0e10a@syzkaller.appspotmail.com
Reported by:	syzbot+7b2f9d4bcac1c9569291@syzkaller.appspotmail.com
Reported by:	syzbot+ed3e651f7d040af480a6@syzkaller.appspotmail.com
Reviewed by:	tuexen
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31886
2021-09-09 08:33:26 -04:00
Michael Tuexen
58a7bf124c sctp: cleanup timewait handling for vtags
MFC after:	1 week
2021-09-09 01:18:58 +02:00
Mark Johnston
ee4731179c sctp: Fix a lock order reversal in sctp_swap_inpcb_for_listen()
When port reuse is enabled in a one-to-one-style socket, sctp_listen()
may call sctp_swap_inpcb_for_listen() to move the PCB out of the "TCP
pool".  In so doing it will drop the PCB lock, yielding an LOR since we
now hold several socket locks.  Reorder sctp_listen() so that it
performs this operation before beginning the conversion to a listening
socket.  Also modify sctp_swap_inpcb_for_listen() to return with PCB
write-locked, since that's what sctp_listen() expects now.

Reviewed by:	tuexen
Fixes:	bd4a39cc93d9 ("socket: Properly interlock when transitioning to a listening socket")
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31879
2021-09-08 11:41:19 -04:00
Mark Johnston
6e3af6321b sctp: Fix lock recursion in sctp_swap_inpcb_for_listen()
After commit bd4a39cc93d9 we now hold the global inp info lock across
the call to sctp_swap_inpcb_for_listen(), which attempts to acquire it
again.  Since sctp_swap_inpcb_for_listen()'s sole caller is
sctp_listen(), we can simply change it to not try to acquire the lock.

Reported by:	syzbot+a76b19ea2f8e1190c451@syzkaller.appspotmail.com
Reported by:	syzbot+a1b6cef257ad145b7187@syzkaller.appspotmail.com
Reviewed by:	tuexen
Fixes:	bd4a39cc93d9 ("socket: Properly interlock when transitioning to a listening socket")
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31878
2021-09-08 11:41:18 -04:00
Michael Tuexen
aab1d593b2 sctp: minor cleanups, no functional change intended 2021-09-08 15:13:49 +02:00
Alexander V. Chernikov
4b631fc832 routing: fix source address selection rules for IPv4 over IPv6.
Current logic always selects an IFA of the same family from the
 outgoing interfaces. In IPv4 over IPv6 setup there can be just
 single non-127.0.0.1 ifa, attached to the loopback interface.

Create a separate rt_getifa_family() to handle entire ifa selection
 for the IPv4 over IPv6.

Differential Revision: https://reviews.freebsd.org/D31868
MFC after:	1 week
2021-09-07 21:41:05 +00:00
Mark Johnston
c4b44adcf0 sctp: Remove special handling for a listen(2) backlog of 0
... when applied to one-to-one-style sockets.  sctp_listen() cannot be
used to toggle the listening state of such a socket.  See RFC 6458's
description of expected listen(2) semantics for one-to-one- and
one-to-many-style sockets.

Reviewed by:	tuexen
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31774
2021-09-07 17:12:09 -04:00
Mark Johnston
bd4a39cc93 socket: Properly interlock when transitioning to a listening socket
Currently, most protocols implement pru_listen with something like the
following:

	SOCK_LOCK(so);
	error = solisten_proto_check(so);
	if (error) {
		SOCK_UNLOCK(so);
		return (error);
	}
	solisten_proto(so);
	SOCK_UNLOCK(so);

solisten_proto_check() fails if the socket is connected or connecting.
However, the socket lock is not used during I/O, so this pattern is
racy.

The change modifies solisten_proto_check() to additionally acquire
socket buffer locks, and the calling thread holds them until
solisten_proto() or solisten_proto_abort() is called.  Now that the
socket buffer locks are preserved across a listen(2), this change allows
socket I/O paths to properly interlock with listen(2).

This fixes a large number of syzbot reports, only one is listed below
and the rest will be dup'ed to it.

Reported by:	syzbot+9fece8a63c0e27273821@syzkaller.appspotmail.com
Reviewed by:	tuexen, gallatin
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31659
2021-09-07 17:11:43 -04:00
Mark Johnston
f94acf52a4 socket: Rename sb(un)lock() and interlock with listen(2)
In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK().  These operate on a
socket rather than a socket buffer.  Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.

When locking for I/O, return an error if the socket is a listening
socket.  Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.

Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before.  In
particular, add checks to sendfile() and sorflush().

Reviewed by:	tuexen, gallatin
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31657
2021-09-07 15:06:48 -04:00
Mark Johnston
173a7a4ee4 sctp: Fix iterator synchronization in sctp_sendall()
- The SCTP_PCB_FLAGS_SND_ITERATOR_UP check was racy, since two threads
  could observe that the flag is not set and then both set it.  I'm not
  sure if this is actually a problem in practice, i.e., maybe there's no
  problem having multiple sends for a single PCB in the iterator list?
- sctp_sendall() was modifying sctp_flags without the inp lock held.

The change simply acquires the PCB write lock before toggling the flag,
fixing both problems.

Reviewed by:	tuexen
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31813
2021-09-07 11:19:29 -04:00
Mark Johnston
e8e23ec127 sctp: Remove an unused sctp_inpcb field
This appears to be unused in usrsctp as well.  No functional change
intended.

Reviewed by:	tuexen
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31812
2021-09-07 11:19:29 -04:00
Mark Johnston
c17b531bed sctp: Fix races around sctp_inpcb_free()
sctp_close() and sctp_abort() disassociate the PCB from its socket.
As a part of this, they attempt to free the PCB, which may end up
lingering.  Fix some bugs in this area:

- For some reason, sctp_close() and sctp_abort() set
  SCTP_PCB_FLAGS_SOCKET_GONE using an atomic compare-and-set without the
  PCB lock held.  This is racy since sctp_flags is normally updated
  without atomics, using the PCB lock to synchronize.  So, the update
  can be lost, which can cause all sort of races with other SCTP
  components which look for the _GONE flag.  Fix the problem simply by
  acquiring the PCB lock in order to set the flag.  Note that we have to
  drop and re-acquire the lock again in sctp_inpcb_free(), but I don't
  see a good way around that for now.  If it's a real problem, the _GONE
  flag could be split out of sctp_flags and into a dedicated sctp_inpcb
  field.
- In sctp_inpcb_free(), load sctp_socket after acquiring the PCB lock,
  to avoid possible races with parallel sctp_inpcb_free() calls.
- Add an assertion sctp_inpcb_free() to verify that _ALLGONE is not set.

Reviewed by:	tuexen
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31811
2021-09-07 11:19:29 -04:00
Alexander V. Chernikov
936f4a42fa lltable: do not require prefix lookup when checking lle allocation rules.
With the new FIB_ALGO infrastructure, nearly all subsystems use
 fib[46]_lookup() functions, which provides lockless lookups.
A number of places remains that uses old-style lookup functions, that
 still requires RIB read lock to return the result. One of such places
 is arp processing code.
FIB_ALGO implementation makes some tradeoffs, resulting in (relatively)
 prolonged periods of holding RIB_WLOCK. If the lock is held and datapath
 competes for it, the RX ring may get blocked, ending in traffic delays and losses.
As currently arp processing is performed directly in the interrupt handler,
 handling ARP replies triggers the problem descibed above when the amount of
 ARP replies is high.

To be more specific, prior to creating new ARP entry, routing lookup for the entry
 address in interface fib is executed. The following conditions are the verified:

1. If lookup returns an empty result, or the resulting prefix is non-directly-reachable,
 failure is returned. The only exception are host routes w/ gateway==address.
2. If the routing lookup returns different interface and non-host route,
 we want to support the use case of having multiple interfaces with the same prefix.
 In fact, the current code just checks if the returned prefix covers target address
 (always true) and effectively allow allocating ARP entries for any directly-reachable prefix,
 regardless of its interface.

Change the code to perform the following:

1) use fib4_lookup() to get the nexthop, instead of requesting exact prefix.
2) Rewrite first condition check using nexthop flags (1:1 match)
3) Rewrite second condition to check for interface addresses matching target address on
 the input interface.

Differential Revision: https://reviews.freebsd.org/D31824
Reviewed by:	ae
MFC after:	1 week
PR:	257965
2021-09-06 21:03:22 +00:00
Gordon Bergling
631504fb34 Fix a common typo in source code comments
- s/existant/existent/

MFC after:	3 days
2021-09-04 12:56:57 +02:00
Mark Johnston
c98bf2a45e sctp: Always check for a vanishing inpcb when processing COOKIE-ECHO
We previously did this only in the normal case where no association
exists yet.  However, it is not safe to process COOKIE-ECHO even if an
association exists, as sctp_process_cookie_existing() may dereference
the socket pointer.

See also commit 0c7dc84076b64ef74c24f04400d572f75ef61bb4.

Reviewed by:	tuexen
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31755
2021-09-01 10:28:17 -04:00