2024 Commits

Author SHA1 Message Date
Alexander V. Chernikov
aaad3c4fca Convert rtentry field accesses into nhop field accesses.
One of the goals of the new routing KPI defined in r359823 is to entirely
 hide `struct rtentry` from the consumers. This will make it possible to improve
 routing subsystem internals and deliver more features much faster.

This commit is mostly mechanical change to eliminate direct struct rtentry
 field accesses.

The only notable difference is AF_LINK gateway encoding.

AF_LINK gw is used in routing stack for operations with interface routes
 and host loopback routes.
In the former case it indicates _some_ non-NULL gateway, as the interface
 is the same as in rt_ifp in kernel and rtm_ifindex in rtsock reporting.
In the latter case the interface index inside gateway was used by the IPv6
 datapath to verify address scope for link-local interfaces.

Kernel uses struct sockaddr_dl for this type of gateway. This structure
 allows for specifying rich interface data, such as mac address and interface
 name. However, this results in relatively large structure size - 52 bytes.
Routing stack fills in only 2 fields - sdl_index and sdl_type, which reside
 in the first 8 bytes of the structure.

In the new KPI, struct nhop_object tries to be cache-efficient, hence
 embodies gateway address inside the structure. In the AF_LINK case it
 stores a shortened version of the structure - struct sockaddr_dl_short,
 which occupies 16 bytes. After D24340 changes, the data inside AF_LINK
 gateway will not be used in the kernel at all, leaving rtsock as the only
 potential concern.

The difference in rtsock reporting:

(old)
got message of size 240 on Thu Apr 16 03:12:13 2020
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.0.0.0 link#5 255.255.255.0

(new)
got message of size 200 on Sun Apr 19 09:46:32 2020
RTM_ADD: Add Route: len 200, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.0.0.0 link#5 255.255.255.0

Note the 40-byte difference (52-16 + alignment).
However, gateway is still a valid AF_LINK gateway with proper data filled in.
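
The 40-byte figure can be reproduced with a little arithmetic. The sketch below assumes that each sockaddr in an rtsock message is padded to a multiple of 8 bytes (sizeof(long) on LP64); the helper names are illustrative and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of align (align must be a power of two). */
static size_t
roundup_pow2(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((len + align - 1) & ~(align - 1));
}

/* Difference in padded size between an old and a new sockaddr length. */
static size_t
padded_size_diff(size_t old_len, size_t new_len, size_t align)
{
	return (roundup_pow2(old_len, align) - roundup_pow2(new_len, align));
}
```

Under that assumption, the 52-byte sockaddr_dl pads to 56 bytes while the 16-byte sockaddr_dl_short needs no padding, so padded_size_diff(52, 16, 8) accounts for the drop from 240 to 200 bytes.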

It is worth noting that these particular messages (interface routes) are mostly
 ignored by routing daemons:
* bird/quagga/frr uses RTM_NEWADDR and ignores prefix route addition messages.
* quagga/frr ignores routes without gateway

More detailed overview on how rtsock messages are used by the
 routing daemons to reconstruct the kernel view, can be found in D22974.

Differential Revision:	https://reviews.freebsd.org/D24519
2020-04-23 08:04:20 +00:00
Alexander V. Chernikov
d98351e13c Fix lookup key generation in fib6_check_urpf().
The version introduced in r359823 assumed D23051
 had been in tree already. As this is not the case yet,
 revert to sockaddr.
2020-04-19 07:27:12 +00:00
Jonathan T. Looney
5d6e356cb0 Avoid calling protocol drain routines more than once per reclamation event.
mb_reclaim() calls the protocol drain routines for each protocol in each
domain. Some protocols exist in more than one domain and share drain
routines. In the case of SCTP, it also uses the same drain routine for
its SOCK_SEQPACKET and SOCK_STREAM entries in the same domain.

On systems with INET, INET6, and SCTP all defined, mb_reclaim() calls
sctp_drain() four times. On systems with INET and INET6 defined,
mb_reclaim() calls tcp_drain() twice. mb_reclaim() is the only in-tree
caller of the pr_drain protocol entry.

Eliminate this duplication by ensuring that each pr_drain routine is only
specified for one protocol entry in one domain.
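
The duplication described above can be modelled in userspace. This is a toy sketch, not the kernel's protosw layout: struct proto_model, reclaim_model() and sctp_drain_model() are hypothetical names, and only the drain hook is represented.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*drain_fn)(void);

/* Toy stand-in for a protocol switch entry; only the drain hook is modelled. */
struct proto_model {
	const char *name;
	drain_fn    pr_drain;	/* NULL when the entry has no drain routine */
};

static int sctp_drain_calls;

static void
sctp_drain_model(void)
{
	sctp_drain_calls++;
}

/* Call every non-NULL drain hook once, the way a reclaim pass walks the table. */
static void
reclaim_model(const struct proto_model *tbl, size_t n)
{
	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].pr_drain != NULL)
			tbl[i].pr_drain();
}

/* Count drain calls for one reclaim pass, before or after deduplication. */
static int
sctp_drains_per_reclaim(int deduplicated)
{
	drain_fn dup = deduplicated ? NULL : sctp_drain_model;
	const struct proto_model tbl[] = {
		{ "inet  SCTP SOCK_SEQPACKET", sctp_drain_model },
		{ "inet  SCTP SOCK_STREAM",    dup },
		{ "inet6 SCTP SOCK_SEQPACKET", dup },
		{ "inet6 SCTP SOCK_STREAM",    dup },
	};

	sctp_drain_calls = 0;
	reclaim_model(tbl, sizeof(tbl) / sizeof(tbl[0]));
	return (sctp_drain_calls);
}
```

With the hook present on all four entries the model drains four times per pass; leaving it on a single entry brings that down to one.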

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D24418
2020-04-16 20:17:24 +00:00
Alexander V. Chernikov
539642a29d Add nhop parameter to rti_filter callback.
One of the goals of the new routing KPI defined in r359823 is to
 entirely hide `struct rtentry` from the consumers. This will make it possible
 to improve routing subsystem internals and deliver more features much faster.
This change is one of the ongoing changes to eliminate direct
 struct rtentry field accesses.

Additionally, with the followup multipath changes, single rtentry can point
 to multiple nexthops.

With that in mind, convert rti_filter callback used when traversing the
 routing table to accept the pair (rt, nhop) instead of rt alone.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24440
2020-04-16 17:20:18 +00:00
Alexander V. Chernikov
53a4886d5d Convert ip6_forward() to the new routing KPI.
Update ip6_forward() internals to use deembedded IPv6 addresses
 to simplify calls to the new KPI and prepare for the future
 scope-embedding cleanup.

Add in6_get_unicast_scopeid() and in6_set_unicast_scopeid() scopeid
 operation functions tailored for unicast processing.
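
For context, "embedded" scope refers to the KAME convention of storing the zone ID of a link-local address in its second 16-bit word. The sketch below illustrates that convention only; it is not the implementation of in6_get_unicast_scopeid() or in6_set_unicast_scopeid(), and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct in6_addr_model {
	uint8_t b[16];
};

/* True for fe80::/10 link-local unicast addresses. */
static int
is_linklocal(const struct in6_addr_model *a)
{
	return (a->b[0] == 0xfe && (a->b[1] & 0xc0) == 0x80);
}

/* Store the zone ID in bytes 2-3 (the "embedded" form). */
static void
embed_scope(struct in6_addr_model *a, uint16_t zone)
{
	assert(is_linklocal(a));
	a->b[2] = (uint8_t)(zone >> 8);
	a->b[3] = (uint8_t)(zone & 0xff);
}

/* Recover the zone ID and restore the on-wire ("deembedded") form. */
static uint16_t
extract_and_clear_scope(struct in6_addr_model *a)
{
	uint16_t zone = (uint16_t)((a->b[2] << 8) | a->b[3]);

	a->b[2] = 0;
	a->b[3] = 0;
	return (zone);
}
```

Carrying the zone ID as a separate argument, as the new KPI does, means the address bytes can stay in their on-wire form throughout.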

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24334
2020-04-15 12:56:05 +00:00
Alexander V. Chernikov
9ac7c6cfed Convert IP/IPv6 forwarding, ICMP processing and IP PCB laddr selection to
the new routing KPI.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24245
2020-04-14 23:06:25 +00:00
Alexander V. Chernikov
dd4776f0cc Reorganise nd6 notification code to avoid direct rtentry field access.
One of the goals of the new routing KPI defined in r359823 is to entirely hide
 `struct rtentry` from the consumers. Doing so will make it possible to improve
 routing subsystem internals and deliver features more easily. This change is
 one of the ongoing changes to eliminate direct struct rtentry field accesses.

It introduces rtfree_func() wrapper around RTFREE() and reorganises nd6 notification
 code to avoid accessing most of the rtentry fields.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24404
2020-04-14 22:48:33 +00:00
Andrew Gallatin
23feb56348 KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
While the original implementation of unmapped mbufs was a large
step forward in terms of reducing cache misses by enabling mbufs
to carry more than a single page for sendfile, they are rather
cache unfriendly when accessing the ext_pgs metadata and
data. This is because the ext_pgs part of the mbuf is allocated
separately, and almost guaranteed to be cold in cache.

This change takes advantage of the fact that unmapped mbufs
are never used at the same time as pkthdr mbufs. Given this
fact, we can overlap the ext_pgs metadata with the mbuf
pkthdr, and carry the ext_pgs meta directly in the mbuf itself.
Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array
of pages) directly after the existing m_ext.

In order to be able to carry 5 pages (which is the minimum
required for a 16K TLS record which is not perfectly aligned) on
LP64, I've had to steal ext_arg2. The only user of this in the
xmit path is sendfile, and I've adjusted it to use arg1 when
using unmapped mbufs.
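
The five-page requirement follows from simple page arithmetic. The sketch below assumes 4 KiB pages; pages_spanned() and PAGE_SIZE_MODEL are illustrative names, not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_MODEL 4096u	/* assumption: 4 KiB pages */

/* Number of pages touched by len bytes starting at byte offset start_off. */
static size_t
pages_spanned(size_t start_off, size_t len)
{
	assert(len > 0);
	size_t first = start_off / PAGE_SIZE_MODEL;
	size_t last = (start_off + len - 1) / PAGE_SIZE_MODEL;

	return (last - first + 1);
}
```

A 16K record that starts on a page boundary covers four pages, while any other starting offset makes it spill into a fifth.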

This change is almost entirely mechanical, except that we
change mb_alloc_ext_pgs() to no longer allow allocating
pkthdrs, the change to avoid ext_arg2 as mentioned above,
and the removal of the ext_pgs zone.

This change saves roughly 2% "raw" CPU (~59% -> 57%), or over
3% "scaled" CPU on a Netflix 100% software kTLS workload at
90+ Gb/s on Broadwell Xeons.

In a follow-on commit, I plan to remove some hacks to avoid
accessing ext_pgs fields of mbufs, since they will now be in
cache.

Many thanks to glebius for helping to make this better in
the Netflix tree.

Reviewed by:	hselasky, jhb, rrs, glebius (early version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24213
2020-04-14 14:46:06 +00:00
Alexander V. Chernikov
6722086045 Plug netmask NULL check during route addition causing kernel panic.
This bug was introduced by r359823.

Reported by:	hselasky
2020-04-14 13:12:22 +00:00
Alexander V. Chernikov
3133002560 Remove tcp_rtlookup6() function signature.
The function itself was removed in r122922 16 years ago.
2020-04-13 08:26:11 +00:00
Alexander V. Chernikov
a666325282 Introduce nexthop objects and new routing KPI.
This is the foundational change for the routing subsystem rearchitecture.
 More details and goals are available in https://reviews.freebsd.org/D24141 .

This patch introduces concept of nexthop objects and new nexthop-based
 routing KPI.

Nexthops are objects, containing all necessary information for performing
 the packet output decision. Output interface, mtu, flags, gw address goes
 there. For most of the cases, these objects will serve the same role as
 the struct rtentry is currently serving.
Typically there will be low tens of such objects for the router even with
 multiple BGP full-views, as these objects will be shared between routing
 entries. This allows to store more information in the nexthop.
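
The sharing described above can be pictured with a small userspace model. This is a sketch under stated assumptions: struct nhop_model, nhop_get() and install_routes() are hypothetical, a nexthop is identified only by (gateway, interface, MTU), and the table is a fixed-size array rather than whatever the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NHOPS 16

struct nhop_model {
	uint32_t gw;		/* gateway IPv4 address, host byte order */
	uint16_t ifindex;	/* output interface */
	uint16_t mtu;
	uint32_t refcnt;	/* number of routes pointing at this nexthop */
};

struct nhop_table {
	struct nhop_model slots[MAX_NHOPS];
	size_t used;
};

/* Return the shared nexthop for the tuple, creating it on first use. */
static struct nhop_model *
nhop_get(struct nhop_table *t, uint32_t gw, uint16_t ifindex, uint16_t mtu)
{
	struct nhop_model *nh;

	for (size_t i = 0; i < t->used; i++) {
		nh = &t->slots[i];
		if (nh->gw == gw && nh->ifindex == ifindex && nh->mtu == mtu) {
			nh->refcnt++;
			return (nh);
		}
	}
	if (t->used == MAX_NHOPS)
		return (NULL);	/* table full */
	nh = &t->slots[t->used++];
	nh->gw = gw;
	nh->ifindex = ifindex;
	nh->mtu = mtu;
	nh->refcnt = 1;
	return (nh);
}

/* Install n routes alternating between two gateways; return nexthop count. */
static size_t
install_routes(struct nhop_table *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t gw = (i % 2 == 0) ? 0x0a000001 : 0x0a000002;
		struct nhop_model *nh = nhop_get(t, gw, 5, 1500);

		assert(nh != NULL);
	}
	return (t->used);
}
```

In this model a thousand routes through two gateways end up referencing just two nexthop objects, which is why per-nexthop data can grow without a per-route cost.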

New KPI:

struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst,
  uint32_t scopeid, uint32_t flags, uint32_t flowid);
struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6,
  uint32_t scopeid, uint32_t flags, uint32_t flowid);

These 2 functions are intended to replace all flavours of
 <in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions  and the previous
 fib[46]-generation functions.

Upon successful lookup, they return nexthop object which is guaranteed to
 exist within current NET_EPOCH. If longer lifetime is desired, one can
 specify NHR_REF as a flag and get a referenced version of the nexthop.
 Reference semantic closely resembles rtentry one, allowing sed-style conversion.

Additionally, another 2 functions are introduced to support uRPF functionality
 inside variety of our firewalls. Their primary goal is to hide the multipath
 implementation details inside the routing subsystem, greatly simplifying
 firewalls implementation:

int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid,
  uint32_t flags, const struct ifnet *src_if);
int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid,
  uint32_t flags, const struct ifnet *src_if);

All functions have a separate scopeid argument, paving way to eliminating IPv6 scope
 embedding and allowing to support IPv4 link-locals in the future.
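
As a rough illustration of why hiding multipath behind a uRPF call simplifies firewalls, the sketch below models a strict reverse-path check over a group of nexthops. It is not the implementation of fib4_lookup_urpf() or fib6_lookup_urpf(); the types and the function are hypothetical, and only the strict mode (a specific source interface) is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct nhop_model {
	uint16_t ifindex;	/* output interface of this nexthop */
};

/*
 * Strict uRPF over a multipath group: pass (1) if any nexthop toward the
 * packet's source leaves through the interface the packet arrived on.
 */
static int
urpf_check_model(const struct nhop_model *group, size_t n, uint16_t src_ifindex)
{
	assert(group != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (group[i].ifindex == src_ifindex)
			return (1);
	return (0);
}
```

A caller only sees a yes/no answer, so it does not need to know whether the route toward the source has one nexthop or several.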

Structure changes:
 * rtentry gets new 'rt_nhop' pointer, slightly growing the overall size.
 * rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz.

Old KPI:
During the transition state old and new KPI will coexist. As there are another 4-5
 decent-sized conversion patches, it will probably take a couple of weeks.
To support both KPIs, fields not required by the new KPI (most of rtentry) have to be
 kept, resulting in the temporary size increase.
Once conversion is finished, rtentry will notably shrink.

More details:
* architectural overview: https://reviews.freebsd.org/D24141
* list of the next changes: https://reviews.freebsd.org/D24232

Reviewed by:	ae,glebius(initial version)
Differential Revision:	https://reviews.freebsd.org/D24232
2020-04-12 14:30:00 +00:00
Alexander V. Chernikov
c80b717f71 Remove RADIX_MPATH headers, they were unused since r293159.
MFC after:	2 weeks
2020-04-11 07:56:11 +00:00
Alexander V. Chernikov
4684d3cbcb Remove per-AF radix_mpath initialization functions.
Split their functionality by moving random seed allocation
 to SYSINIT and calling (new) generic multipath function from
 standard IPv4/IPv6 RIB init handlers.

Differential Revision:	https://reviews.freebsd.org/D24356
2020-04-11 07:37:08 +00:00
Andrey V. Elsukov
cfad769689 Ignore ND6 neighbor advertisement received for static link-layer entries.
Previously such NA could override manually created LLE.

Reported by:	Martin Beran <martin at mber cz>
Reviewed by:	melifaro
MFC after:	10 days
2020-04-01 02:13:01 +00:00
Mark Johnston
431f2b8712 Use a dedicated taskqueue thread for in6m_release_task().
Interfaces may be detached from a taskqueue_thread task, for example by
prison_complete(), so after r359438, when draining the queue we may end
up deadlocking.

Reported by:	Jenkins via lwhsu
MFC with:	r359438
2020-03-31 02:25:53 +00:00
Mark Johnston
9b1d850be8 Remove the "config" taskqgroup and its KPIs.
Equivalent functionality is already provided by taskqueue(9), just use
that instead.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-03-30 14:24:03 +00:00
Mark Johnston
e02582d1ae Fix synchronization in the IPV6_2292PKTOPTIONS set handler.
The inpcb needs to be locked when we update output packet options.
Otherwise it is possible for the IPV6_2292PKTOPTIONS handler to free
packet option structures while another thread is reading or updating
them.

Note that the option handler is still kind of broken.  For instance it
frees all options before performing privilege checks for individual
options.  However, this can be fixed separately.

Reported by:	syzbot+52eb0fd4ddc119787f9d@syzkaller.appspotmail.com
Reviewed by:	bz, tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D24125
2020-03-19 21:38:52 +00:00
Bjoern A. Zeeb
8483fce695 ip6: retire in6_selectroute_fib() as promised 8 years ago
In r231852 I added in6_selectroute_fib() as a compat function with the
fibnum as an extra argument compared to in6_selectroute() to keep the
KPI stable.
Way too late retire this function again and add the fib to in6_selectroute()
which also only has a single consumer now and was an orphan function before.
2020-03-03 13:48:12 +00:00
Bjoern A. Zeeb
000c42faf3 ip6_output: use new routing KPI when not passed a cached route
Implement the equivalent of r347375 (IPv4) for the IPv6 output path.
In IPv6 we get passed a cached route (and inp) by udp6_output()
depending on whether we acquired a write lock on the INP.
In case we neither bind nor connect a first UDP packet would come in
with a cached route (wlocked) and all further packets would not.
In case we bind and do not connect we never write-lock the inp.

When we do not pass in a cached route, rather than providing the
storage for a route locally and pass it over the old lookup code
and down the stack, use the new route lookup KPI and acquire all
details we need to send the packet.

Compared to the IPv4 code the IPv6 code has a couple of possible
complications: given an option with a routing hdr/caching route there,
and path mtu (ro_pmtu) case which now equally has to deal with the
possibility of having a route which is NULL passed in, and the
fwd_tag in case a firewall changes the next hop (something to
factor out in the future).

Sponsored by:	Netflix
Reviewed by:	glebius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D23886
2020-03-03 11:32:47 +00:00
Bjoern A. Zeeb
5f3e375ed8 in6_fib: return nh_ia in the ext interface as we do for IPv4
Like for IPv4 add nh_ia to the ext interface and return rt_ifa
in order to be used for, e.g., packet/octets accounting purposes.

Reviewed by:	melifaro
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D23873
2020-03-03 09:50:33 +00:00
Bjoern A. Zeeb
f6428cdb1f fib6_rte_to_nh_*: return a link-local gw address with scope embedded
In fib6_rte_to_nh_* when returning a link-local gateway address
currently we do clear the scope. That could be recovered using
the ifp returned as well, but the code in general seems to
expect a link-local address with scope embedded as otherwise
the "dst" (gw) passed to the output routines will not include
scope and not send the packet out (the right interface).

Do not clear the scope when returning a link-local address and
allow packets to go out (the right interface).

Remove the (now) extra scope recovery in the IPv6 fast-fwd code.

Sponsored by:	Netflix
Reviewed by:	melifaro, ae
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23872
2020-03-03 09:45:16 +00:00
Bjoern A. Zeeb
f1db666a61 mld6: initialize oifp to avoid bogus results/panics in edge cases
In certain cases (probably not during normal operation but observed in
the lab during development) ip6_output() could return without error
and ifpp (&oifp) not updated.
Given oifp was never initialized we would take the later branch
as oifp was not NULL, and when calling icmp6_ifstat_inc() we would
panic dereferencing a garbage pointer.
For code stability initialize oifp to NULL before first use to always
have a deterministic value and not rely on a called function to behave
and always and for ever do the work for us as we hope for.
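
The defensive pattern described above looks roughly like this in isolation. The sketch is a userspace model with hypothetical names (output_model(), send_and_count(), struct ifnet_model); it only shows why the out-parameter is given a known value before the call.

```c
#include <assert.h>
#include <stddef.h>

struct ifnet_model {
	unsigned long out_msgs;	/* per-interface statistics counter */
};

static struct ifnet_model em0_model;

/* Hypothetical output routine: may return 0 without storing to *ifpp. */
static int
output_model(int set_ifp, struct ifnet_model **ifpp)
{
	assert(ifpp != NULL);
	if (set_ifp)
		*ifpp = &em0_model;
	return (0);
}

/* Returns 1 if the per-interface counter was updated, 0 if it was skipped. */
static int
send_and_count(int set_ifp)
{
	struct ifnet_model *oifp = NULL;	/* known value even if the callee skips it */
	int error = output_model(set_ifp, &oifp);

	if (error == 0 && oifp != NULL) {
		oifp->out_msgs++;
		return (1);
	}
	return (0);
}
```

Without the NULL initialisation, the path where the callee leaves the pointer untouched would read an indeterminate value and could dereference it.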

MFC after:	3 days
Sponsored by:	Netflix
2020-02-28 11:16:41 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Bjoern A. Zeeb
3db6053160 ip6_output: fix regression introduced in r358167 for ipv6 fragmentation
When moving the calculations for the optlen into the if (opt) block
which deals with possible extension headers I failed to initialise
unfragpartlen to the ipv6 header length if there were no extension
headers present.  Correct that mistake to make IPv6 fragment length
calculations work again.
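
To see why the missing default matters, the sketch below works through the fragment-size arithmetic in userspace. The constants are the fixed IPv6 header (40 bytes) and fragment header (8 bytes); the helper names are illustrative and the code is a simplified model, not the ip6_output() source.

```c
#include <assert.h>
#include <stddef.h>

#define IP6_HDR_LEN	40u	/* fixed IPv6 header */
#define IP6_FRAG_LEN	8u	/* fragment extension header */

/*
 * Length of the unfragmentable part: the IPv6 header plus any extension
 * headers that must precede the fragment header.  exthdr_len is 0 when
 * the packet carries no options.
 */
static size_t
unfrag_part_len(size_t exthdr_len)
{
	return (IP6_HDR_LEN + exthdr_len);
}

/* Payload bytes per fragment, rounded down to a multiple of 8. */
static size_t
frag_payload_len(size_t mtu, size_t unfragpartlen)
{
	assert(mtu > unfragpartlen + IP6_FRAG_LEN);
	return ((mtu - unfragpartlen - IP6_FRAG_LEN) & ~(size_t)7);
}
```

With a 1500-byte MTU and no extension headers, the correct unfragmentable length of 40 gives 1448 payload bytes per fragment. Leaving the length at 0 would give 1488, and the resulting fragment (40 + 8 + 1488 bytes) would no longer fit the MTU.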

Reported by:	hselasky, kp
OKed by:	hselasky, kp
MFC after:	3 days
X-MFC with:	r358167
PR:		244393
2020-02-25 15:03:41 +00:00
Bjoern A. Zeeb
3459050c9a Fix IPv6 checksums when exthdrs are present.
In two places in ip6_output we are doing (delayed) checksum calculations.
The initial logic came from SCTP in r205075,205104 and later I copied
and adjusted it for the TCP|UDP case in r235958.
The problem was that the original SCTP offsets were already wrong for any
case with extension headers present given IPv6 extension headers are not
part of the pseudo checksum calculations.
The later changes do not help in case there is checksum offloading as for
extension headers (incl. fragments) we do currently never offload as we
have no infrastructure to know whether the NIC can handle these cases.

Correct the offsets for delayed checksum calculations and properly handle
mbuf flags.  In addition harmonize the almost identical duplicate code.
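
The point about extension headers not being part of the pseudo checksum can be shown with a small standalone checksum routine. This is a userspace sketch with hypothetical names, operating on a flat buffer rather than mbufs; it sums the IPv6 pseudo-header and the upper-layer data only, starting past the extension headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum of big-endian 16-bit words. */
static uint32_t
sum_words(uint32_t sum, const uint8_t *p, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (i < len)			/* odd trailing byte, zero padded */
		sum += (uint32_t)p[i] << 8;
	return (sum);
}

/* Fold carries and return the one's complement of the 16-bit sum. */
static uint16_t
fold_cksum(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/*
 * Checksum of the upper-layer data in an IPv6 payload.  payload points
 * just past the fixed IPv6 header; the first exthdr_len bytes are
 * extension headers and are skipped, not summed.
 */
static uint16_t
in6_upper_cksum(const uint8_t src[16], const uint8_t dst[16], uint8_t proto,
    const uint8_t *payload, size_t payload_len, size_t exthdr_len)
{
	assert(exthdr_len <= payload_len);
	size_t ulen = payload_len - exthdr_len;	/* upper-layer length only */
	const uint8_t ph_tail[8] = {
		(uint8_t)(ulen >> 24), (uint8_t)(ulen >> 16),
		(uint8_t)(ulen >> 8), (uint8_t)ulen,
		0, 0, 0, proto		/* next header = upper-layer protocol */
	};
	uint32_t sum = 0;

	sum = sum_words(sum, src, 16);
	sum = sum_words(sum, dst, 16);
	sum = sum_words(sum, ph_tail, sizeof(ph_tail));
	sum = sum_words(sum, payload + exthdr_len, ulen);
	return (fold_cksum(sum));
}
```

Both the starting offset and the length fed into the pseudo-header exclude the extension headers, so changing extension header bytes leaves the result unchanged.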

While here eliminate the now unneeded variable hlen and add an always
missing mtod() call in the 1-b and 3 cases after the introduction of
the mb_unmapped_to_ext() calls.

Reported by:	Francis Dupont (fdupont isc.org)
PR:		243675
MFC after:	6 days
Reviewed by:	markj (earlier version), gallatin
Differential Revision:	https://reviews.freebsd.org/D23760
2020-02-24 19:12:20 +00:00
Pawel Biernacki
295a18d184 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (14 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Approved by:	kib (mentor, blanket)
Differential Revision:	https://reviews.freebsd.org/D23639
2020-02-24 10:47:18 +00:00
Bjoern A. Zeeb
a1a6c01e41 ip6_output: improve extension header handling
Move IPv6 source address checks from after extension header handling
to the top of the function. If we do not pass these checks there is
no reason to do a lot of work upfront.

Fold extension header preparations and length calculations together into
a single branch and macro rather than doing them sequentially.
Likewise move extension header concatenation into a single branch block
only doing it if we recorded any extension header length.

Reviewed by:	melifaro (earlier version), markj, gallatin
Sponsored by:	Netflix (partially, originally)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D23740
2020-02-20 10:56:12 +00:00
Michael Tuexen
868b51f234 Epochify SCTP.
2020-02-18 21:25:17 +00:00
Bjoern A. Zeeb
7c1daefe2c ip6_output: update comments.
Clear up some comments and improve panic messages.

No functional changes.

MFC after:	3 days
2020-02-18 11:28:00 +00:00
Hans Petter Selasky
bacb11c9ed Fix kernel panic while trying to read multicast stream.
When VIMAGE is enabled make sure the "m_pkthdr.rcvif" pointer is set
for all mbufs being input by the IGMP/MLD6 code. Else there will be a
NULL-pointer dereference in the netisr code when trying to set the
VNET based on the incoming mbuf. Add an assert to catch this when
queueing mbufs on a netisr to make debugging of similar cases easier.

Found by:	Vladislav V. Prodan
PR:		244002
Reviewed by:	bz@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-02-17 09:46:32 +00:00
Navdeep Parhar
c53c867eb3 Fix NOINET builds.
2020-01-31 02:23:48 +00:00
Gleb Smirnoff
e617b21d2f Enter network epoch when calling in_pcbconnect() for IPv6 mapped to IPv4
UDP sockets.  This was missed in r356983.

Reported by:	https://syzkaller.appspot.com/bug?id=73c7a2e3f0783f9947459065e5c2f25fe8f82f54
2020-01-22 17:06:55 +00:00
Alexander V. Chernikov
34a5582c47 Bring back redirect route expiration.
Redirect (and temporal) route expiration was broken a while ago.
This change brings route expiration back, with unified IPv4/IPv6 handling code.

It introduces net.inet.icmp.redirtimeout sysctl, allowing to set
 an expiration time for redirected routes. It defaults to 10 minutes,
 analogous to net.inet6.icmp6.redirtimeout.

Implementation uses separate file, route_temporal.c, as route.c is already
 bloated with tons of different functions.
Internally, expiration is implemented as a per-rnh callout scheduled when
 route with non-zero rt_expire time is added or rt_expire is changed.
 It does not add any overhead when no temporal routes are present.

Callout traverses entire routing tree under wlock, scheduling expired routes
 for deletion and calculating the next time it needs to be run. The rationale
 for such implementation is the following: typically workloads requiring large
 amount of routes have redirects turned off already, while the systems with
 small amount of routes will not incur large overhead during tree traversal.
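
The traversal described above can be sketched as a single pass that both flags expired routes and computes when it needs to run again. This is a simplified userspace model over an array, not the code in route_temporal.c; struct rt_model and expire_scan() are hypothetical, and locking and actual deletion are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rt_model {
	uint32_t rt_expire;	/* absolute expiry time in seconds; 0 = never */
	int      deleted;	/* set once the route is scheduled for deletion */
};

/*
 * One pass over all routes: flag the expired ones and return the number
 * of seconds until the next pass is needed, or 0 if no temporal routes
 * remain (so no callout has to be rescheduled).
 */
static uint32_t
expire_scan(struct rt_model *rts, size_t n, uint32_t now)
{
	uint32_t next = 0;

	assert(rts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct rt_model *rt = &rts[i];

		if (rt->deleted || rt->rt_expire == 0)
			continue;
		if (rt->rt_expire <= now) {
			rt->deleted = 1;
			continue;
		}
		uint32_t left = rt->rt_expire - now;
		if (next == 0 || left < next)
			next = left;
	}
	return (next);
}
```

The cost of each pass is linear in the number of routes, which matches the rationale above: tables large enough for that to matter usually have redirects disabled and therefore nothing to expire.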

This change also fixes netstat -rn display of route expiration time, which
 has been broken since the conversion from kread() to sysctl.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23075
2020-01-22 13:53:18 +00:00
Gleb Smirnoff
b955545386 Make ip6_output() and ip_output() require network epoch.
All callers that previously may have called into these functions
without network epoch now must enter it.
2020-01-22 05:51:22 +00:00
Gleb Smirnoff
bab98355f9 Add some documenting NET_EPOCH_ASSERTs.
2020-01-22 02:37:47 +00:00
Gleb Smirnoff
f6a2a6b163 Unroll macro that is used just once. Not a functional change.
2020-01-22 02:35:39 +00:00
Alexander V. Chernikov
16c2f24169 Document requirements for the 'struct route' variations.
MFC after:	2 weeks
2020-01-21 12:00:34 +00:00
Gleb Smirnoff
2a4bd982d0 Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
data based on the network epoch.   The macro reverses the argument
order of epoch_call(9) - first function, then its argument. NFC
2020-01-15 06:05:20 +00:00
Gleb Smirnoff
97168be809 Mechanically substitute assertion of in_epoch(net_epoch_preempt) to
NET_EPOCH_ASSERT(). NFC
2020-01-15 05:45:27 +00:00
Michael Tuexen
fe1274ee39 Fix race when accepting TCP connections.
When expanding a SYN-cache entry to a socket/inp a two step approach was
taken:
1) The local address was filled in, then the inp was added to the hash
   table.
2) The remote address was filled in and the inp was relocated in the
   hash table.
Before the epoch changes, a write lock was held when this happens and
the code looking up entries was holding a corresponding read lock.
Since the read lock is gone away after the introduction of the
epochs, the half populated inp was found during lookup.
This resulted in processing TCP segments in the context of the wrong
TCP connection.
This patch changes the above procedure in a way that the inp is fully
populated before inserted into the hash table.

Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@
mailing list and for testing the patch!

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D22971
2020-01-12 17:52:32 +00:00
Bjoern A. Zeeb
c6feea3b89 nd6_rtr: constantly use __func__ for nd6log()
Over time one or two hard coded function names did not match the
actual function anymore.  Consistently use __func__ for nd6log() calls
and re-wrap/re-format some messages for consistency.

MFC after:	2 weeks
2020-01-12 17:41:09 +00:00
Bjoern A. Zeeb
25ebfe3350 nd6_rtr: make nd6_prefix_onlink() static
nd6_prefix_onlink() is not used anywhere outside nd6_rtr.c.  Stop
exporting it and make it file local static.
2020-01-12 16:58:21 +00:00
Bjoern A. Zeeb
e1891232fc in6_mcast: make in6_joingroup_locked() static
in6_joingroup_locked() is only used file-local. No need to export it
hence make it static.
2020-01-11 18:55:12 +00:00
Alexander V. Chernikov
ead85fe415 Add fibnum, family and vnet pointer to each rib head.
Having metadata such as fibnum or vnet in the struct rib_head
 is handy as it eases building functionality in the routing space.
This change is required to properly bring back route redirect support.

Reviewed by:	bz
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D23047
2020-01-09 17:21:00 +00:00
Bjoern A. Zeeb
334fc5822b vnet: virtualise more network stack sysctls.
Virtualise tcp_always_keepalive, TCP and UDP log_in_vain.  All three are
set in the netoptions startup script, which we would love to run for VNETs
as well [1].

While virtualising the log_in_vain sysctls seems pointless at first for as
long as the kernel message buffer is not virtualised, it at least allows
an administrator to debug the base system or an individual jail if needed
without turning the logging on for all jails running on a system.

PR:		243193 [1]
MFC after:	2 weeks
2020-01-08 23:30:26 +00:00
Alexander V. Chernikov
e02d3fe70c Fix rtsock route message generation for interface addresses.
Reviewed by:	olivier
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D22974
2020-01-07 21:16:30 +00:00
Gleb Smirnoff
e00ee1a9f4 In r343631 error code for a packet blocked by a firewall was
changed from EACCES to EPERM.  This change was not intentional,
so fix that.  Return EACCES if a firewall forbids sending.

Noticed by:	ae
2020-01-01 17:32:20 +00:00
Alexander V. Chernikov
bdb214a4a4 Remove useless code from in6_rmx.c
The code in question walks IPv6 tree every 60 seconds and looks into
 the routes with non-zero expiration time (typically, redirected routes).
For each such route it sets RTF_PROBEMTU flag at the expiration time.
No other part of the kernel checks for RTF_PROBEMTU flag.

RTF_PROBEMTU was defined 21 years ago, 30 Jun 1999, as RTF_PROTO1.
RTF_PROTO1 is a de-facto standard indication of a route installed
 by a routing daemon for the last decade.

Reviewed by:	bz, ae
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D22865
2019-12-18 22:10:56 +00:00
Hans Petter Selasky
a4c5668d12 Leave multicast group before reaping and committing state for both
IPv4 and IPv6.

This fixes a regression issue after r349369. When trying to exit a
multicast group before closing the socket, a multicast leave packet
should be sent.

Differential Revision:	https://reviews.freebsd.org/D22848
PR: 242677
Reviewed by:	bz (network)
Tested by:	Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-12-18 12:06:34 +00:00
Bjoern A. Zeeb
74ff87cd16 Update comment.
Update the comment related to SIIT and v4mapped addresses being rejected
by us when coming from the wire given we have supported IPv6-only kernels
for a few years now.
See also draft-itojun-v6ops-v4mapped-harmful.

Suggested by:	melifaro
MFC after:	2 weeks
2019-12-06 16:53:42 +00:00