This function was originally carved out of in6_pcbbind(), which
is in in6_pcb.c. This function also uses KPI private to the PCB
database - in_pcb_lport().
Currently ip6_input() calls in6ifa_ifwithaddr() for
every local packet, in order to check if the target ip
belongs to the local ifa in proper state and increase
its counters.
in6ifa_ifwithaddr() references found ifa.
With epoch changes, both `ip6_input()` and all other current callers
of `in6ifa_ifwithaddr()` do not need this reference
anymore, as epoch provides stability guarantee.
Given that, update `in6ifa_ifwithaddr()` to allow
it to return ifa without referencing it, while preserving
option for getting referenced ifa if so desired.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28648
in6_selectsrc() may call fib6_lookup() in some cases, which requires
epoch. Wrap in6_selectsrc* calls into epoch inside its users.
Mark it as requiring epoch by adding NET_EPOCH_ASSERT().
MFC after: 1 weeek
Differential Revision: https://reviews.freebsd.org/D28647
fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
over pointer validness and requiring on-stack data copying.
With no callers remaining, remove fib[46]_lookup_nh_ functions.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25445
It was broken by r360292 as fib6_lookup() assumes de-embedded addresses
while rtalloc_mpath_fib() requires sockaddr with embedded ones.
New fib6_lookup() transparently supports multipath, hence
remove old RADIX_MPATH condition.
This change is build on top of nexthop objects introduced in r359823.
Nexthops are separate datastructures, containing all necessary information
to perform packet forwarding such as gateway interface and mtu. Nexthops
are shared among the routes, providing more pre-computed cache-efficient
data while requiring less memory. Splitting the LPM code and the attached
data solves multiple long-standing problems in the routing layer,
drastically reduces the coupling with outher parts of the stack and allows
to transparently introduce faster lookup algorithms.
Route caching was (re)introduced to minimise (slow) routing lookups, allowing
for notably better performance for large TCP senders. Caching works by
acquiring rtentry reference, which is protected by per-rtentry mutex.
If the routing table is changed (checked by comparing the rtable generation id)
or link goes down, cache record gets withdrawn.
Nexthops have the same reference counting interface, backed by refcount(9).
This change merely replaces rtentry with the actual forwarding nextop as a
cached object, which is mostly mechanical. Other moving parts like cache
cleanup on rtable change remains the same.
Differential Revision: https://reviews.freebsd.org/D24340
In r231852 I added in6_selectroute_fib() as a compat function with the
fibnum as an extra argument compared to in6_selectroute() to keep the
KPI stable.
Way too late retire this function again and add the fib to in6_selectroute()
which also only has a single consumer now and was an orphan function before.
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
Finish what was started a few years ago and harmonize IPv6 and IPv4
kernel names. We are down to very few places now that it is feasible
to do the change for everything remaining with causing too much disturbance.
Remove "aliases" for IPv6 names which confusingly could indicate
that we are talking about a different data structure or field or
have two fields, one for each address family.
Try to follow common conventions used in FreeBSD.
* Rename sin6p to sin6 as that is how it is spelt in most places.
* Remove "aliases" (#defines) for:
- in6pcb which really is an inpcb and nothing separate
- sotoin6pcb which is sotoinpcb (as per above)
- in6p_sp which is inp_sp
- in6p_flowinfo which is inp_flow
* Try to use ia6 for in6_addr rather than in6p.
* With all these gone also rename the in6p variables to inp as
that is what we call it in most of the network stack including
parts of netinet6.
The reasons behind this cleanup are that we try to further
unify netinet and netinet6 code where possible and that people
will less ignore one or the other protocol family when doing
code changes as they may not have spotted places due to different
names for the same thing.
No functional changes.
Discussed with: tuexen (SCTP changes)
MFC after: 3 months
Sponsored by: Netflix
since r286195.
Do not forget results of route lookup and initialize rt and ifp pointers.
PR: 238098
Submitted by: Masse Nicolas <nicolas.masse at stormshield eu>
MFC after: 1 week
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.
Most of the code was copied from a similar patch for DragonflyBSD.
However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.
Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.
Limitations:
As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).
This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967. Thanks to rwatson@
for the substantive feedback that is included in this commit.
Submitted by: Johannes Lundberg <johalun0@gmail.com>
Obtained from: DragonflyBSD
Relnotes: Yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.
Most of the code was copied from a similar patch for DragonflyBSD.
However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.
Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.
Limitations
As DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).
Submitted by: Johannes Lundberg <johanlun0@gmail.com>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.
Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
sys/netinet6/icmp6.c
Use the interface's FIB for source address selection in ICMPv6 error
responses.
sys/netinet6/in6.c
In in6_newaddrmsg, announce arrival of local addresses on the
interface's FIB only. In in6_lltable_rtcheck, use a per-fib ND6
cache instead of a single cache.
sys/netinet6/in6_src.c
In in6_selectsrc, use the caller's fib instead of the default fib.
In in6_selectsrc_socket, remove a superfluous check.
sys/netinet6/nd6.c
In nd6_lle_event, use the interface's fib for routing socket
messages. In nd6_is_new_addr_neighbor, check all FIBs when trying
to determine whether an address is a neighbor. Also, simplify the
code for point to point interfaces.
sys/netinet6/nd6.h
sys/netinet6/nd6.c
sys/netinet6/nd6_rtr.c
Make defrouter_select fib-aware, and make all of its callers pass in
the interface fib.
sys/netinet6/nd6_nbr.c
When inputting a Neighbor Solicitation packet, consider the
interface fib instead of the default fib for DAD. Output NS and
Neighbor Advertisement packets on the correct fib.
sys/netinet6/nd6_rtr.c
Allow installing the same host route on different interfaces in
different FIBs. If rt_add_addr_allfibs=0, only install or delete
the prefix route on the interface fib.
tests/sys/netinet/fibs_test.sh
Clear some expected failures, but add a skip for the newly revealed
BUG217871.
PR: 196361
Submitted by: Erick Turnquist <jhujhiti@adjectivism.org>
Reported by: Jason Healy <jhealy@logn.net>
Reviewed by: asomers
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9451
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
Currently we don't keep zoneid in in6_ifaddr structure, because there
is still some code, that doesn't properly initialize sin6_scope_id,
but some functions use sa_equal() for addresses comparison. sa_equal()
compares full sockaddr_in6 structures and such comparison will fail.
For now use zero zoneid in in6ifa_ifwithaddr(). It is safe, because
used address is in embedded form. In future we will use zoneid, so mark it
with XXX comment.
Reported by: kp
Tested by: kp
pointer isn't NULL, it is safe, because we are handling IPV6_PKTINFO
socket option in this block of code. Also, use in6ifa_withaddr() instead
of ifa_withaddr().
in6_selectsrc() has 2 class of users: socket-based one (raw/udp/pcb/etc) and
socket-less (ND code). The main reason for that change is inability to
specify non-default FIB for callers w/o socket since (internally) inpcb
is used to determine fib.
As as result, add 2 wrappers for in6_selectsrc() (making in6_selectsrc()
static):
1) in6_selectsrc_socket() for the former class. Embed scope_ambiguous check
along with returning hop limit when needed.
2) in6_selectsrc_addr() for the latter case. Add 'fibnum' argument and
pass IPv6 address w/ explicitly specified scope as separate argument.
Reviewed by: ae (previous version)
in6_selectif().
The main task of in6_selectsrc() is to return IPv6 SAS (along with
output interface used for scope checks). No data-path code uses
route argument for caching. The only users are icmp6 (reflect code),
ND6 ns/na generation code. All this fucntions are control-plane, so
there is no reason to try to 'optimize' something by passing cached
route into to ip6_output(). Given that, simplify code by eliminating
in6_selectsrc() 'struct route_in6' argument. Since in6_selectif() is
used only by in6_selectsrc(), eliminate its 'struct route_in6' argument,
too. While here, reshape rte-related code inside in6_selectif() to
free lookup result immediately after saving all the needed fields.
ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(),
in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api.
Eliminate now-unused ip_rtaddr().
Fix lookup key fib6_lookup_nh_basic() which was lost diring merge.
Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always
return IPv6 destination address with embedded scope. Currently
rw_gateway has it scope embedded, do the same for non-gatewayed
destinations.
Sponsored by: Yandex LLC
o remove disabled code;
o if nexthop address is link-local, use embedded scope zone id to
determine outgoing interface;
o properly fill ro_dst before doing route lookup;
o remove LLE lookup, instead check rt_flags for RTF_GATEWAY bit.
Sponsored by: Yandex LLC
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.
Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149
It affects the IPv6 source address selection algorithm (RFC 6724)
and allows override the last rule ("longest matching prefix") for
choosing among equivalent addresses. The address with `prefer_source'
will be preferred source address.
Obtained from: Yandex LLC
MFC after: 1 month
Sponsored by: Yandex LLC
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
for the set of IPv6 addresses. Now each attempt goes into IPv6 statistics,
even if given rule did not won. Change this and take into account only
those rules, that won. Also add accounting for cases, when algorithm
fails to select an address.
It stops treating the address on the interface as special by source
address selection rule even when the interface is outgoing interface.
This is desired in some situation.
Requested by: hrs
Reviewed by: IHANet folks including hrs
MFC after: 1 week
Simplify the code removing a return from an earlier else case,
not differing from the default function return called now.
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
Reviewed by: gnn (as part of the whole)
MFC After: 3 days