Move the nd_defrouter along with the sysctl handler from nd6.c to
nd6_rtr.c and make the variable file static. Provide (temporary)
new accessor functions for code manipulating nd_defrouter from nd6.c,
and stop exporting functions no longer needed outside nd6_rtr.c.
This also shuffles a few functions around in nd6_rtr.c without
functional changes.
Given all nd_defrouter logic is now in one place we can tidy up the
code, locking and, and other open items.
MFC after: 3 weeks
X-MFC: keep exporting the functions
Sponsored by: Netflix
In preparation for another change factor out various variable cleanups.
These mainly include:
(1) do not assign values to variables during declaration: this makes
the code more readable and does allow for better grouping of
variable declarations,
(2) do not assign values to variables before need; e.g., if a variable
is only used in the 2nd half of a function and we have multiple
return paths before that, then do not set it before it is needed, and
(3) try to avoid assigning the same value multiple times.
MFC after: 3 weeks
Sponsored by: Netflix
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
Currently rinit1() and its IPv6 counterpart
nd6_prefix_onlink_rtrequest() uses dummy null_sdl gateway address
during route insertion and change it afterwards. This behaviour
brings complications to the routing stack and the users of its
upcoming notification system.
This change fixes both rinit1() and nd6_prefix_onlink_rtrequest()
by filling in proper gateway in the beginning. It does not change any
of the userland notifications as in both cases, they happen after
the insertion and fixup process (rt_newaddrmsg_fib() and nd6_rtmsg()).
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20328
When we roam between networks and our link-state goes down, automatically remove
the IPv6-Only flag from the interface. Otherwise we might switch from an
IPv6-only to and IPv4-only network and the flag would stay and we would prevent
IPv4 from working.
While the actual function call to clear the flag is under EXPERIMENTAL,
the eventhandler is not as we might want to re-use it for other
functionality on link-down event (such was re-calculate default routers
for example if there is more than one).
Reviewed by: hrs
Differential Revision: https://reviews.freebsd.org/D19487
All changes are hidden behind the EXPERIMENTAL option and are not compiled
in by default.
Add ND6_IFF_IPV6_ONLY_MANUAL to be able to set the interface into no-IPv4-mode
manually without router advertisement options. This will allow developers to
test software for the appropriate behaviour even on dual-stack networks or
IPv6-Only networks without the option being set in RA messages.
Update ifconfig to allow setting and displaying the flag.
Update the checks for the filters to check for either the automatic or the manual
flag to be set. Add REVARP to the list of filtered IPv4-related protocols and add
an input filter similar to the output filter.
Add a check, when receiving the IPv6-Only RA flag to see if the receiving
interface has any IPv4 configured. If it does, ignore the IPv6-Only flag.
Add a per-VNET global sysctl, which is on by default, to not process the automatic
RA IPv6-Only flag. This way an administrator (if this is compiled in) has control
over the behaviour in case the node still relies on IPv4.
- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.
- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.
Discussed with: jtl, gallatin
This change defines the RA "6" (IPv6-Only) flag which routers
may advertise, kernel logic to check if all routers on a link
have the flag set and accordingly update a per-interface flag.
If all routers agree that it is an IPv6-only link, ether_output_frame(),
based on the interface flag, will filter out all ETHERTYPE_IP/ARP
frames, drop them, and return EAFNOSUPPORT to upper layers.
The change also updates ndp to show the "6" flag, ifconfig to
display the IPV6_ONLY nd6 flag if set, and rtadvd to allow
announcing the flag.
Further changes to tcpdump (contrib code) are availble and will
be upstreamed.
Tested the code (slightly earlier version) with 2 FreeBSD
IPv6 routers, a FreeBSD laptop on ethernet as well as wifi,
and with Win10 and OSX clients (which did not fall over with
the "6" flag set but not understood).
We may also want to (a) implement and RX filter, and (b) over
time enahnce user space to, say, stop dhclient from running
when the interface flag is set. Also we might want to start
IPv6 before IPv4 in the future.
All the code is hidden under the EXPERIMENTAL option and not
compiled by default as the draft is a work-in-progress and
we cannot rely on the fact that IANA will assign the bits
as requested by the draft and hence they may change.
Dear 6man, you have running code.
Discussed with: Bob Hinden, Brian E Carpenter
ip6_calcmtu() only looks at the interface MTU if neither the TCP hostcache
nor the route provides an MTU. Update the routes so they do not provide
stale MTUs.
This fixes UNH IPv6 conformance test cases v6LC_4_1_08 and v6LC_4_1_09,
which use a RA to reduce the link MTU from 1500 to 1280.
Reported and tested by: Farrell Woods <Farrell_Woods@Dell.com>
Reviewed by: dab, melifaro
Discussed with: ae
MFC after: 1 week
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D14257
that had the IPv6 fragmentation header:
o Neighbor Solicitation
o Neighbor Advertisement
o Router Solicitation
o Router Advertisement
o Redirect
Introduce M_FRAGMENTED mbuf flag, and set it after IPv6 fragment reassembly
is completed. Then check the presence of this flag in correspondig ND6
handling routines.
PR: 224247
MFC after: 2 weeks
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
sys/netinet6/icmp6.c
Use the interface's FIB for source address selection in ICMPv6 error
responses.
sys/netinet6/in6.c
In in6_newaddrmsg, announce arrival of local addresses on the
interface's FIB only. In in6_lltable_rtcheck, use a per-fib ND6
cache instead of a single cache.
sys/netinet6/in6_src.c
In in6_selectsrc, use the caller's fib instead of the default fib.
In in6_selectsrc_socket, remove a superfluous check.
sys/netinet6/nd6.c
In nd6_lle_event, use the interface's fib for routing socket
messages. In nd6_is_new_addr_neighbor, check all FIBs when trying
to determine whether an address is a neighbor. Also, simplify the
code for point to point interfaces.
sys/netinet6/nd6.h
sys/netinet6/nd6.c
sys/netinet6/nd6_rtr.c
Make defrouter_select fib-aware, and make all of its callers pass in
the interface fib.
sys/netinet6/nd6_nbr.c
When inputting a Neighbor Solicitation packet, consider the
interface fib instead of the default fib for DAD. Output NS and
Neighbor Advertisement packets on the correct fib.
sys/netinet6/nd6_rtr.c
Allow installing the same host route on different interfaces in
different FIBs. If rt_add_addr_allfibs=0, only install or delete
the prefix route on the interface fib.
tests/sys/netinet/fibs_test.sh
Clear some expected failures, but add a skip for the newly revealed
BUG217871.
PR: 196361
Submitted by: Erick Turnquist <jhujhiti@adjectivism.org>
Reported by: Jason Healy <jhealy@logn.net>
Reviewed by: asomers
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9451
The caller may unlink a prefix before purging referencing addresses. An
identical assertion in nd6_prefix_del() verifies that the addresses are
purged before the prefix is freed.
PR: 215372
X-MFC With: r306829
This change extends the nd6 lock to protect the ND prefix list as well
as the list of advertising routers associated with each prefix. To handle
cases where the nd6 lock must be dropped while iterating over either the
prefix or default router lists, a generation counter is used to track
modifications to the lists. Additionally, a new mutex is used to serialize
prefix on-link/off-link transitions. This mutex must be acquired before
the nd6 lock and is held while updating the routing table in
nd6_prefix_onlink() and nd6_prefix_offlink().
Reviewed by: ae, tuexen (SCTP bits)
Tested by: Jason Wolfe <jason@llnw.com>,
Larry Rosenman <ler@lerctr.org>
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D8125
nd6_prelist_add() sets *newp if and only if it is successful, so there's no
need for code that handles the case where the return value is 0 and
*newp == NULL. Fix some style bugs in nd6_prelist_add() while here.
MFC after: 1 week
This allows some simplification of its callers. No functional change
intended.
Tested by: Larry Rosenman (as part of a larger change)
MFC after: 1 month
This addresses a number of race conditions that can cause crashes as a
result of unsynchronized access to the list.
PR: 206904
Tested by: Larry Rosenman <ler@lerctr.org>,
Kevin Bowling <kevin.bowling@kev009.com>
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D5315
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
with different requirements. In fact, first 3 don't have _any_ requirements
and first 2 does not use radix locking. On the other hand, routing
structure do have these requirements (rnh_gen, multipath, custom
to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.
So, radix code now uses tiny 'struct radix_head' structure along with
internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
Existing consumers still uses the same 'struct radix_node_head' with
slight modifications: they need to pass pointer to (embedded)
'struct radix_head' to all radix callbacks.
Routing code now uses new 'struct rib_head' with different locking macro:
RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
information base).
New net/route_var.h header was added to hold routing subsystem internal
data. 'struct rib_head' was placed there. 'struct rtentry' will also
be moved there soon.
by filter function instead of picking into routing table details in
each consumer.
Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED
user).
This simplifies future nexthops/mulitipath changes and rtrequest1_fib()
locking refactoring.
Actual changes:
Add "rt_chain" field to permit rte grouping while doing batched delete
from routing table (thus growing rte 200->208 on amd64).
Add "rti_filter" / "rti_filterdata" / "rti_spare" fields to rt_addrinfo
to pass filter function to various routing subsystems in standard way.
Convert all rt_expunge() customers to new rt_addinfo-based api and eliminate
rt_expunge().
On receipt of a redirect message, install an interface route for the
redirected destination. On removal of the corresponding Neighbor Cache
entry, remove the interface route.
This requires changes in rtredirect_fib() to cope with an AF_LINK
address for the gateway and with the absence of RTF_GATEWAY.
This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo
Phase 2 test suite.
Unrelated to the above, fix a recursion on the radix node head lock
triggered by the Tahi Redirected to Alternate Router test cases.
When I first wrote this patch in October 2012, all Section 2
(Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE,
and 8-STABLE. cem@ recently rebased the 10.x patch onto head and reported
that it passes Tahi. (Thanks!)
These other test cases also passed in 2012:
* the RTF_MODIFIED case, with IPv4 and IPv6 (using a
RTF_HOST|RTF_GATEWAY route for the destination)
* the redirected-to-self case, with IPv4 and IPv6
* a valid IPv4 redirect
All testing in 2012 was done with WITNESS and INVARIANTS.
Tested by: EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015,
Mark Kelley <mark_kelley@dell.com> in 2012,
TC Telkamp <terence_telkamp@dell.com> in 2012
PR: 152791
Reviewed by: melifaro (current rev), bz (earlier rev)
Approved by: kib (mentor)
MFC after: 1 month
Relnotes: yes
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D3602
* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
to use new rt_foreach_fib() instead of hand-rolling cycles.
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
PF_INET6 in kernel. This fixes various malfunction when the wall time
clock is changed. Bump __FreeBSD_version to 1000041.
- Use clock_gettime(CLOCK_MONOTONIC_FAST) in userland utilities.
MFC after: 1 month