Commit Graph

174 Commits

Author SHA1 Message Date
Justin Hibbits
3d0d5b21c9 IfAPI: Explicitly include <net/if_private.h> in netstack
Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header.  <net/if_var.h> will stop including the
header in the future.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
2023-01-31 15:02:16 -05:00
Alexander V. Chernikov
7b3440fc30 Revert "routing: install prefix and loopback routes using new nhop-based KPI."
Temporarily revert the commit to unblock testing.

This reverts commit a1b59379db.
2022-08-29 16:20:42 +00:00
Alexander V. Chernikov
835a611e68 routing: make IPv6 defrouter code use new nhop-based KPI.
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D36168
2022-08-29 10:08:47 +00:00
Alexander V. Chernikov
a1b59379db routing: install prefix and loopback routes using new nhop-based KPI.
Construct the desired hexthops directly instead of using the
 "translation" layer in form of filling rt_addrinfo data.
Simplify V_rt_add_addr_allfibs handling by using recently-added
 rib_copy_route() to propagate the routes to the non-primary address
 fibs.

MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D36166
2022-08-29 10:07:58 +00:00
Alexander V. Chernikov
8036234c72 netinet6: fix SIOCSPFXFLUSH_IN6 by skipping manually-configured prefixes
Summary:
Currently netinet6/ code allocates IPv6 prefixes (nd_prefix) for
 both manually-assigned addresses and advertised prefixes. As a result,
 prefixes from manually-assigned prefixes can be seen in `ndp -p` list
 and be cleared via `ndp -P`. The latter relies on the SIOCSPFXFLUSH_IN6
 ioctl to clear to prefix list.
The original intent of the SIOCSPFXFLUSH_IN6 was to clear prefixes
 originated from the advertising routers:

```
1998-09-02  JINMEI, Tatuya  <jinmei@isl.rdc.toshiba.co.jp>
	* nd6.c (nd6_ioctl): added 2 new ioctls; SIOCSRTRFLUSH_IN6 and
	SIOCSPFXFLUSH_IN6. The former is to flush all default routers
	in the default router list, and the latter is to flush all the
	prefixes and the addresses derived from them in the prefix list.
```

Restore the intent by marking prefixes derived from the RA messages
with newly-added ndpr_flags.ra_derived flag and skip prefixes not marked
 with such flag during deletion and listing.

Differential Revision: https://reviews.freebsd.org/D36312
MFC after:	2 weeks
2022-08-24 13:59:13 +00:00
Alexander V. Chernikov
9d16275c65 netinet6: simplify defrouter_select_fib()
* factor out underlying llentry check into a separate function and use it consistently
* enter epoch once instead of per-router enter/exit
* don't execute body with fibnum = `RT_ALL_FIBS`

Differential Revision: https://reviews.freebsd.org/D35523
MFC after:	2 weeks
2022-08-12 11:43:37 +00:00
Alexander V. Chernikov
800c68469b routing: add nhop(9) kpi.
Differential Revision: https://reviews.freebsd.org/D35985
MFC after:	1 month
2022-08-01 08:52:26 +00:00
Gordon Bergling
c55ecce1c1 netinet6: Fix a typo in a source code comment
- s/maping/mapping/

MFC after:	3 days
2022-03-28 19:32:10 +02:00
Gleb Smirnoff
d74b7baeb0 ifnet_byindex() actually requires network epoch
Sweep over potentially unsafe calls to ifnet_byindex() and wrap them
in epoch.  Most of the code touched remains unsafe, as the returned
pointer is being used after epoch exit.  Mark that with a comment.

Validate the index argument inside the function, reducing argument
validation requirement from the callers and making V_if_index
private to if.c.

Reviewed by:		melifaro
Differential revision:	https://reviews.freebsd.org/D33263
2021-12-06 09:32:31 -08:00
Alexander V. Chernikov
f8c1b1a929 lltable: fix crash introduced in c541bd368f.
Reported by:	cy
MFC after:	2 weeks
2021-08-22 08:49:18 +00:00
Mark Johnston
c1dd4d642f nd6: Avoid using an uninitialized sockaddr in nd6_prefix_offlink()
Commit 81728a538 ("Split rtinit() into multiple functions.") removed
the initialization of sa6, but not one of its uses.  This meant that we
were passing an uninitialized sockaddr as the address to
lltable_prefix_free().  Remove the variable outright to fix the problem.
The caller is expected to hold a reference on pr.

Fixes:		81728a538 ("Split rtinit() into multiple functions.")
Reported by:	KMSAN
Reviewed by:	donner
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30166
2021-05-12 12:52:06 -04:00
Alexander V. Chernikov
cc3fa1e29f Fix crash with rtadv-originated multipath IPv6 routes.
PR:		253800
Reported by:	Frederic Denis <freebsdml at hecian.net>
MFC after:	immediately
2021-02-24 16:44:10 +00:00
Alexander V. Chernikov
9c4a8d24f0 Fix nd6 rib_action() handling.
rib_action() guarantees valid rc filling IFF it returns without error.
Check rib_action() return code instead of checking rc fields.

PR:		253800
Reported by:	Frederic Denis <freebsdml@hecian.net>
MFC after:	immediately
2021-02-23 22:40:01 +00:00
Alexander V. Chernikov
81728a538d Split rtinit() into multiple functions.
rtinit[1]() is a function used to add or remove interface address prefix routes,
  similar to ifa_maintain_loopback_route().
It was intended to be family-agnostic. There is a problem with this approach
 in reality.

1) IPv6 code does not use it for the ifa routes. There is a separate layer,
  nd6_prelist_(), providing interface for maintaining interface routes. Its part,
  responsible for the actual route table interaction, mimics rtenty() code.

2) rtinit tries to combine multiple actions in the same function: constructing
  proper route attributes and handling iterations over multiple fibs, for the
  non-zero net.add_addr_allfibs use case. It notably increases the code complexity.

3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag
 for p2p connections, host routes and p2p routes are handled in the same way.
 Additionally, mapping IFA flags to RTF flags makes the interface pretty messy.
 It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface
 aliases.

4) rtinit() is the last customer passing non-masked prefixes to rib_action(),
 complicating rib_action() implementation.

5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive"
 ifa messages in certain corner cases.

To address all these points, the following has been done:

* rtinit() has been split into multiple functions:
- Route attribute construction were moved to the per-address-family functions,
 dealing with (2), (3) and (4).
- funnction providing net.add_addr_allfibs handling and route rtsock notificaions
 is the new routing table inteface.
- rtsock ifa notificaion has been moved out as well. resulting set of funcion are only
 responsible for the actual route notifications.

Side effects:
* /32 alias does not result in interface routes (/32 route and "host" route)
* RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses

Differential revision:	https://reviews.freebsd.org/D28186
2021-01-16 22:42:41 +00:00
Alexander V. Chernikov
d68cf57b7f Refactor rt_addrmsg() and rt_routemsg().
Summary:
* Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally.
* Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer.
* Refactor rtsock_routemsg(): avoid accessing rtentry fields directly.
* Simplify in_addprefix() by moving prefix search to a separate  function.

Reviewers: #network

Subscribers: imp, ae, bz

Differential Revision: https://reviews.freebsd.org/D28011
2021-01-07 19:38:19 +00:00
Alexander V. Chernikov
7511a63825 Refactor rib iterator functions.
* Make rib_walk() order of arguments consistent with the rest of RIB api
* Add rib_walk_ext() allowing to exec callback before/after iteration.
* Rename rt_foreach_fib_walk_del -> rib_foreach_table_walk_del
* Rename rt_forach_fib_walk -> rib_foreach_table_walk
* Move rib_foreach_table_walk{_del} to route/route_helpers.c
* Slightly refactor rib_foreach_table_walk{_del} to make the implementation
 consistent and prepare for upcoming iterator optimizations.

Differential Revision:	https://reviews.freebsd.org/D27219
2020-11-22 20:21:10 +00:00
Mark Johnston
4caea9b169 icmp6: Count packets dropped due to an invalid hop limit
Pad the icmp6stat structure so that we can add more counters in the
future without breaking compatibility again, last done in r358620.
Annotate the rarely executed error paths with __predict_false while
here.

Reviewed by:	bz, melifaro
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26578
2020-10-19 17:07:19 +00:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files 2020-09-01 21:19:14 +00:00
Alexander V. Chernikov
e1c05fd290 Transition from rtrequest1_fib() to rib_action().
Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
 in6_rtrequest, rtrequest_fib> and their uses and switch to
 to rib_action(). This is part of the new routing KPI.

Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25546
2020-07-21 19:56:13 +00:00
Alexander V. Chernikov
725871230d Temporarly revert r363319 to unbreak the build.
Reported by:	CI
Pointy hat to: melifaro
2020-07-19 10:53:15 +00:00
Alexander V. Chernikov
8cee15d9e4 Transition from rtrequest1_fib() to rib_action().
Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
 in6_rtrequest, rtrequest_fib> and their uses and switch to
to rib_action(). This is part of the new routing KPI.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D25546
2020-07-19 09:29:27 +00:00
Alexander V. Chernikov
4c7ba83f9d Switch inet6 default route subscription to the new rib subscription api.
Old subscription model allowed only single customer.

Switch inet6 to the new subscription api and eliminate the old model.

Differential Revision:	https://reviews.freebsd.org/D25615
2020-07-12 11:24:23 +00:00
Alexander V. Chernikov
da187ddb3d * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this appoach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:		ae
Differential Revision:	https://reviews.freebsd.org/D25067
2020-06-01 20:49:42 +00:00
Alexander V. Chernikov
e7403d0230 Revert r361704, it accidentally committed merged D25067 and D25070. 2020-06-01 20:40:40 +00:00
Alexander V. Chernikov
79674562b8 * Add rib_<add|del|change>_route() functions to manipulate the routing table.
The main driver for the change is the need to improve notification mechanism.
Currently callers guess the operation data based on the rtentry structure
 returned in case of successful operation result. There are two problems with
 this appoach. First is that it doesn't provide enough information for the
 upcoming multipath changes, where rtentry refers to a new nexthop group,
 and there is no way of guessing which paths were added during the change.
 Second is that some rtentry fields can change during notification and
 protecting from it by requiring customers to unlock rtentry is not desired.

Additionally, as the consumers such as rtsock do know which operation they
 request in advance, making explicit add/change/del versions of the functions
 makes sense, especially given the functions don't share a lot of code.

With that in mind, introduce rib_cmd_info notification structure and
 rib_<add|del|change>_route() functions, with mandatory rib_cmd_info pointer.
 It will be used in upcoming generalized notifications.

* Move definitions of the new functions and some other functions/structures
 used for the routing table manipulation to a separate header file,
 net/route/route_ctl.h. net/route.h is a frequently used file included in
 ~140 places in kernel, and 90% of the users don't need these definitions.

Reviewed by:	ae
Differential Revision: https://reviews.freebsd.org/D25067
2020-06-01 20:32:02 +00:00
Alexander V. Chernikov
2bbab0af6d Use epoch(9) for rtentries to simplify control plane operations.
Currently the only reason of refcounting rtentries is the need to report
 the rtable operation details immediately after the execution.
Delaying rtentry reclamation allows to stop refcounting and simplify the code.
Additionally, this change allows to reimplement rib_lookup_info(), which
 is used by some of the customers to get the matching prefix along
 with nexthops, in more efficient way.

The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to
 nhop_priv to be able to reliably set curvnet even during vnet teardown.
Rest of the reference counting code will be removed in the D24867 .

Differential Revision:	https://reviews.freebsd.org/D24866
2020-05-23 10:21:02 +00:00
Alexander V. Chernikov
e7d8af4f65 Move route_temporal.c and route_var.h to net/route.
Nexthop objects implementation, defined in r359823,
 introduced sys/net/route directory intended to hold all
 routing-related code. Move recently-introduced route_temporal.c and
 private route_var.h header there.

Differential Revision:	https://reviews.freebsd.org/D24597
2020-04-28 19:14:09 +00:00
Alexander V. Chernikov
aaad3c4fca Convert rtentry field accesses into nhop field accesses.
One of the goals of the new routing KPI defined in r359823 is to entirely
 hide`struct rtentry` from the consumers. It will allow to improve routing
 subsystem internals and deliver more features much faster.

This commit is mostly mechanical change to eliminate direct struct rtentry
 field accesses.

The only notable difference is AF_LINK gateway encoding.

AF_LINK gw is used in routing stack for operations with interface routes
 and host loopback routes.
In the former case it indicates _some_ non-NULL gateway, as the interface
 is the same as in rt_ifp in kernel and rtm_ifindex in rtsock reporting.
In the latter case the interface index inside gateway was used by the IPv6
 datapath to verify address scope for link-local interfaces.

Kernel uses struct sockaddr_dl for this type of gateway. This structure
 allows for specifying rich interface data, such as mac address and interface
 name. However, this results in relatively large structure size - 52 bytes.
Routing stack fils in only 2 fields - sdl_index and sdl_type, which reside
 in the first 8 bytes of the structure.

In the new KPI, struct nhop_object tries to be cache-efficient, hence
 embodies gateway address inside the structure. In the AF_LINK case it
 stores stortened version of the structure - struct sockaddr_dl_short,
 which occupies 16 bytes. After D24340 changes, the data inside AF_LINK
 gateway will not be used in the kernel at all, leaving rtsock as the only
 potential concern.

The difference in rtsock reporting:

(old)
got message of size 240 on Thu Apr 16 03:12:13 2020
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.0.0.0 link#5 255.255.255.0

(new)
got message of size 200 on Sun Apr 19 09:46:32 2020
RTM_ADD: Add Route: len 200, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.0.0.0 link#5 255.255.255.0

Note 40 bytes different (52-16 + alignment).
However, gateway is still a valid AF_LINK gateway with proper data filled in.

It is worth noting that these particular messages (interface routes) are mostly
 ignored by routing daemons:
* bird/quagga/frr uses RTM_NEWADDR and ignores prefix route addition messages.
* quagga/frr ignores routes without gateway

More detailed overview on how rtsock messages are used by the
 routing daemons to reconstruct the kernel view, can be found in D22974.

Differential Revision:	https://reviews.freebsd.org/D24519
2020-04-23 08:04:20 +00:00
Alexander V. Chernikov
539642a29d Add nhop parameter to rti_filter callback.
One of the goals of the new routing KPI defined in r359823 is to
 entirely hide`struct rtentry` from the consumers. It will allow to
 improve routing subsystem internals and deliver more features much faster.
This change is one of the ongoing changes to eliminate direct
 struct rtentry field accesses.

Additionally, with the followup multipath changes, single rtentry can point
 to multiple nexthops.

With that in mind, convert rti_filter callback used when traversing the
 routing table to accept pair (rt, nhop) instead of nexthop.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24440
2020-04-16 17:20:18 +00:00
Alexander V. Chernikov
dd4776f0cc Reorganise nd6 notification code to avoid direct rtentry field access.
One of the goals of the new routing KPI defined in r359823 is to entirely hide
 `struct rtentry` from the consumers. Doing so will allow to improve routing
 subsystem internals and deliver features more easily. This change is one of
  the ongoing changes to eliminate direct struct rtentry field accesses.

It introduces rtfree_func() wrapper around RTFREE() and reorganises nd6 notification
 code to avoid accessing most of the rtentry fields.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24404
2020-04-14 22:48:33 +00:00
Bjoern A. Zeeb
c6feea3b89 nd6_rtr: constantly use __func__ for nd6log()
Over time one or two hard coded function names did not match the
actual function anymore.  Consistently use __func__ for nd6log() calls
and re-wrap/re-format some messages for consitency.

MFC after:	2 weeks
2020-01-12 17:41:09 +00:00
Bjoern A. Zeeb
25ebfe3350 nd6_rtr: make nd6_prefix_onlink() static
nd6_prefix_onlink() is not used anywhere outside nd6_rtr.c.  Stop
exporting it and make it file local static.
2020-01-12 16:58:21 +00:00
Alexander V. Chernikov
e02d3fe70c Fix rtsock route message generation for interface addresses.
Reviewed by:	olivier
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D22974
2020-01-07 21:16:30 +00:00
Bjoern A. Zeeb
a4adf6cc65 Fix m_pullup() problem after removing PULLDOWN_TESTs and KAME EXT_*macros.
r354748-354750 replaced the KAME macros with m_pulldown() calls.
Contrary to the rest of the network stack m_len checks before m_pulldown()
were not put in placed (see r354748).
Put these m_len checks in place for now (to go along with the style of the
network stack since the initial commits).  These are not put in for
performance but to avoid an error scenario (even though it also will help
performance at the moment as it avoid allocating an extra mbuf; not because
of the unconditional function call).

The observed error case went like this:
(1) an mbuf with M_EXT arrives and we call m_pullup() unconditionally on it.
(2) m_pullup() will call m_get() unless the requested length is larger than
MHLEN (in which case it'll m_freem() the perfectly fine mbuf) and migrate the
requested length of data and pkthdr into the new mbuf.
(3) If m_get() succeeds, a further m_pullup() call going over MHLEN will fail.
This was observed with failing auto-configuration as an RA packet of
200 bytes exceeded MHLEN and the m_pullup() called from nd6_ra_input()
dropped the mbuf.
(Re-)adding the m_len checks before m_pullup() calls avoids this problems
with mbufs using external storage for now.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-12-01 00:22:04 +00:00
Bjoern A. Zeeb
f77a6dbd1e nd6_rtr: re-sort functions
Resort functions within file in a way that they depend on each other as
that makes it easier to rework various things.
Also allows us to remove file local function declarations.

No functional changes.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-19 20:34:33 +00:00
Bjoern A. Zeeb
808c432f62 nd6: retire defrouter_select(), use _fib() variant.
Burn bridges and replace the last two calls of defrouter_select() with
defrouter_select_fib().  That allows us to retire defrouter_select()
and make it more clear in the calling code that it applies to all FIBs.

Sponsored by:	Netflix
2019-11-16 00:17:35 +00:00
Bjoern A. Zeeb
f592d0c377 nd6_rtr:
Pull in the TAILQ_HEAD() as it is not needed outside nd6_rtr.c.
Rename the TAILQ_HEAD() struct and the nd_defrouter variable from
"nd_" to "nd6_" as they are not part of the RFC 3542 API which uses "ND_".

Ideally I'd like to also rename the struct nd_defrouter {} to "nd6_*"
but given that is used externally there is more work to do.

No functional changes.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-16 00:02:36 +00:00
Bjoern A. Zeeb
a61b5cfbbf netinet6: Remove PULLDOWN_TESTs.
Remove the KAME introduced PULLDOWN_TESTs which did not even
have a compile-time option in sys/conf to turn them on for a
custom kernel build. They made the code a lot harder to read
or more complicated in a few cases.

Convert the IP6_EXTHDR_CHECK() calls into FreeBSD looking code.
Rather than throwing the packet away if it would not fit the
KAME mbuf expectations, convert the macros to m_pullup() calls.
Do not do any extra manual conditional checks upfront as to
whether the m_len would suffice (*), simply let m_pullup() do
its work (incl. an early check).

Remove extra m_pullup() calls where earlier in the function or
the only caller has already done the pullup.

Discussed with:	rwatson (*)
Reviewed by:	ae
MFC after:	8 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D22334
2019-11-15 21:40:40 +00:00
Bjoern A. Zeeb
ad675b3279 nd6 defrouter: consolidate nd_defrouter manipulations in nd6_rtr.c
Move the nd_defrouter along with the sysctl handler from nd6.c to
nd6_rtr.c and make the variable file static.  Provide (temporary)
new accessor functions for code manipulating nd_defrouter from nd6.c,
and stop exporting functions no longer needed outside nd6_rtr.c.
This also shuffles a few functions around in nd6_rtr.c without
functional changes.

Given all nd_defrouter logic is now in one place we can tidy up the
code, locking and, and other open items.

MFC after:	3 weeks
X-MFC:		keep exporting the functions
Sponsored by:	Netflix
2019-11-13 12:05:48 +00:00
Bjoern A. Zeeb
503f4e4736 netinet*: variable cleanup
In preparation for another change factor out various variable cleanups.
These mainly include:
(1) do not assign values to variables during declaration:  this makes
    the code more readable and does allow for better grouping of
    variable declarations,
(2) do not assign values to variables before need; e.g., if a variable
    is only used in the 2nd half of a function and we have multiple
    return paths before that, then do not set it before it is needed, and
(3) try to avoid assigning the same value multiple times.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-07 18:29:51 +00:00
Gleb Smirnoff
751d8d156a Widen network epoch coverage in nd6_prefix_onlink() as
in6ifa_ifpforlinklocal() requires the epoch.

Reported by:	bz
Reviewed by:	bz
2019-11-07 17:00:20 +00:00
Gleb Smirnoff
b8a6e03fac Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by:	gallatin, hselasky, cy, adrian, kristof
Differential Revision:	https://reviews.freebsd.org/D19111
2019-10-07 22:40:05 +00:00
Alexander V. Chernikov
563ab4e400 Fix gateway setup for the interface routes.
Currently rinit1() and its IPv6 counterpart
  nd6_prefix_onlink_rtrequest() uses dummy null_sdl gateway address
  during route insertion and change it afterwards. This behaviour
  brings complications to the routing stack and the users of its
  upcoming notification system.

This change fixes both rinit1() and nd6_prefix_onlink_rtrequest()
  by filling in proper gateway in the beginning. It does not change any
  of the userland notifications as in both cases, they happen after
  the insertion and fixup process (rt_newaddrmsg_fib() and nd6_rtmsg()).

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D20328
2019-05-22 21:20:15 +00:00
Bjoern A. Zeeb
30b450774e Update for IETF draft-ietf-6man-ipv6only-flag.
When we roam between networks and our link-state goes down, automatically remove
the IPv6-Only flag from the interface.  Otherwise we might switch from an
IPv6-only to and IPv4-only network and the flag would stay and we would prevent
IPv4 from working.

While the actual function call to clear the flag is under EXPERIMENTAL,
the eventhandler is not as we might want to re-use it for other
functionality on link-down event (such was re-calculate default routers
for example if there is more than one).

Reviewed by:	hrs
Differential Revision:	https://reviews.freebsd.org/D19487
2019-03-07 23:03:39 +00:00
Bjoern A. Zeeb
21231a7aa6 Update for IETF draft-ietf-6man-ipv6only-flag.
All changes are hidden behind the EXPERIMENTAL option and are not compiled
in by default.

Add ND6_IFF_IPV6_ONLY_MANUAL to be able to set the interface into no-IPv4-mode
manually without router advertisement options.  This will allow developers to
test software for the appropriate behaviour even on dual-stack networks or
IPv6-Only networks without the option being set in RA messages.
Update ifconfig to allow setting and displaying the flag.

Update the checks for the filters to check for either the automatic or the manual
flag to be set.  Add REVARP to the list of filtered IPv4-related protocols and add
an input filter similar to the output filter.

Add a check, when receiving the IPv6-Only RA flag to see if the receiving
interface has any IPv4 configured.  If it does, ignore the IPv6-Only flag.

Add a per-VNET global sysctl, which is on by default, to not process the automatic
RA IPv6-Only flag.  This way an administrator (if this is compiled in) has control
over the behaviour in case the node still relies on IPv4.
2019-03-06 23:31:42 +00:00
Gleb Smirnoff
a68cc38879 Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such
  macros a quite unsafe, e.g. will produce a buggy code if same macro is
  used in embedded scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
  IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
  locking macros to what they actually are - the net_epoch.
  Keeping them as is is very misleading. They all are named FOO_RLOCK(),
  while they no longer have lock semantics. Now they allow recursion and
  what's more important they now no longer guarantee protection against
  their companion WLOCK macros.
  Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with:	jtl, gallatin
2019-01-09 01:11:19 +00:00
Bjoern A. Zeeb
201100c58b Initial implementation of draft-ietf-6man-ipv6only-flag.
This change defines the RA "6" (IPv6-Only) flag which routers
may advertise, kernel logic to check if all routers on a link
have the flag set and accordingly update a per-interface flag.

If all routers agree that it is an IPv6-only link, ether_output_frame(),
based on the interface flag, will filter out all ETHERTYPE_IP/ARP
frames, drop them, and return EAFNOSUPPORT to upper layers.

The change also updates ndp to show the "6" flag, ifconfig to
display the IPV6_ONLY nd6 flag if set, and rtadvd to allow
announcing the flag.

Further changes to tcpdump (contrib code) are availble and will
be upstreamed.

Tested the code (slightly earlier version) with 2 FreeBSD
IPv6 routers, a FreeBSD laptop on ethernet as well as wifi,
and with Win10 and OSX clients (which did not fall over with
the "6" flag set but not understood).

We may also want to (a) implement and RX filter, and (b) over
time enahnce user space to, say, stop dhclient from running
when the interface flag is set.  Also we might want to start
IPv6 before IPv4 in the future.

All the code is hidden under the EXPERIMENTAL option and not
compiled by default as the draft is a work-in-progress and
we cannot rely on the fact that IANA will assign the bits
as requested by the draft and hence they may change.

Dear 6man, you have running code.

Discussed with:	Bob Hinden, Brian E Carpenter
2018-10-30 20:08:48 +00:00
Andrew Turner
5f901c92a8 Use the new VNET_DEFINE_STATIC macro when we are defining static VNET
variables.

Reviewed by:	bz
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D16147
2018-07-24 16:35:52 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Eric van Gyzen
0bbfb20fe5 Update the MTU in affected routes when IPv6 RA changes the MTU
ip6_calcmtu() only looks at the interface MTU if neither the TCP hostcache
nor the route provides an MTU.  Update the routes so they do not provide
stale MTUs.

This fixes UNH IPv6 conformance test cases v6LC_4_1_08 and v6LC_4_1_09,
which use a RA to reduce the link MTU from 1500 to 1280.

Reported and tested by:	Farrell Woods <Farrell_Woods@Dell.com>
Reviewed by:	dab, melifaro
Discussed with:	ae
MFC after:	1 week
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D14257
2018-02-12 19:49:20 +00:00