328 Commits

Author SHA1 Message Date
Justin Hibbits
3d0d5b21c9 IfAPI: Explicitly include <net/if_private.h> in netstack
Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header.  <net/if_var.h> will stop including the
header in the future.

Sponsored by:	Juniper Networks, Inc.
Reviewed by:	glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
2023-01-31 15:02:16 -05:00
Alexander V. Chernikov
6468b6b23e nd6: fix panic in lltable_drop_entry_queue()
nd6_resolve_slow() can be called without mbuf. If the LLE entry
 is not reachable, nd6_resolve_slow() will add this NULL mbuf to
 the holdchain via lltable_append_entry_queue, which will "append"
 NULL to the end of the queue (effectively no-op) and bump la_numhold
 value. When this entry gets freed, the kernel will panic due to the
 inconsistency between the amount of mbufs in the queue and the value
 of la_numhold.

Fix the panic by checking that the mbuf is not NULL prior to inserting it
 into the holdchain.
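
The invariant behind this fix can be sketched in user-space C. This is a minimal sketch, not the kernel code: `struct pkt`, `struct entry`, `entry_append()` and `entry_queue_len()` are hypothetical stand-ins for the mbuf, the LLE and the queue helpers, and the NULL check is placed inside the append helper only for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mbuf and struct llentry. */
struct pkt {
	struct pkt *next;
};

struct entry {
	struct pkt *hold;	/* head of the queue of pending packets */
	int numheld;		/* must always equal the queue length */
};

/* Append p to the hold queue; returns 1 if queued, 0 if p was NULL. */
static int
entry_append(struct entry *e, struct pkt *p)
{
	struct pkt **tail;

	assert(e != NULL);
	if (p == NULL)
		return (0);	/* nothing queued, counter left untouched */
	p->next = NULL;
	for (tail = &e->hold; *tail != NULL; tail = &(*tail)->next)
		;
	*tail = p;
	e->numheld++;
	return (1);
}

/* Count the packets actually linked into the queue. */
static int
entry_queue_len(const struct entry *e)
{
	const struct pkt *p;
	int n = 0;

	for (p = e->hold; p != NULL; p = p->next)
		n++;
	return (n);
}
```

With the check in place, the counter and the real queue length cannot drift apart, which is the inconsistency the commit describes as the cause of the panic.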

Reported by:	kib
MFC after:	3 days
2023-01-15 15:22:42 +00:00
John Baldwin
744bfb2131 Import the WireGuard driver from zx2c4.com.
This commit brings back the driver from FreeBSD commit
f187d6dfbf plus subsequent fixes from
upstream.

Relative to upstream this commit includes a few other small fixes such
as additional INET and INET6 #ifdef's, #include cleanups, and updates
for recent API changes in main.

Reviewed by:	pauamma, gbe, kevans, emaste
Obtained from:	git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36909
2022-10-28 13:36:12 -07:00
Alexander V. Chernikov
177f04d57f routing: constantify @rc in rib_decompose_notification().
Clarify the @rc immutability by explicitly marking @rc const.

MFC after:	2 weeks
2022-08-29 18:12:24 +00:00
Alexander V. Chernikov
6d4f6e4c70 routing: make rib_add_redirect() use new nhop-based KPI
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D36169
2022-08-29 10:23:26 +00:00
Alexander V. Chernikov
8036234c72 netinet6: fix SIOCSPFXFLUSH_IN6 by skipping manually-configured prefixes
Summary:
Currently netinet6/ code allocates IPv6 prefixes (nd_prefix) for
 both manually-assigned addresses and advertised prefixes. As a result,
 prefixes from manually-assigned addresses can be seen in `ndp -p` list
 and be cleared via `ndp -P`. The latter relies on the SIOCSPFXFLUSH_IN6
 ioctl to clear the prefix list.
The original intent of the SIOCSPFXFLUSH_IN6 was to clear prefixes
 originated from the advertising routers:

```
1998-09-02  JINMEI, Tatuya  <jinmei@isl.rdc.toshiba.co.jp>
	* nd6.c (nd6_ioctl): added 2 new ioctls; SIOCSRTRFLUSH_IN6 and
	SIOCSPFXFLUSH_IN6. The former is to flush all default routers
	in the default router list, and the latter is to flush all the
	prefixes and the addresses derived from them in the prefix list.
```

Restore the intent by marking prefixes derived from the RA messages
with newly-added ndpr_flags.ra_derived flag and skip prefixes not marked
 with such flag during deletion and listing.

Differential Revision: https://reviews.freebsd.org/D36312
MFC after:	2 weeks
2022-08-24 13:59:13 +00:00
Alexander V. Chernikov
f998535a66 netinet6: allow ND entries creation for all directly-reachable
destinations.

The current assumption is that kernel-handled rtadv prefixes along with
 the interface address prefixes are the only prefixes considered in
 the ND neighbor eligibility code.
Change this by allowing any non-gateway routes to be eligible. This
 will allow DHCPv6-controlled routes to be correctly handled by
 the ND code.
Refactor nd6_is_new_addr_neighbor() to enable more deterministic
 performance in "found" case and remove non-needed
 V_rt_add_addr_allfibs handling logic.

Reviewed By: kbowling
Differential Revision: https://reviews.freebsd.org/D23695
MFC after:	1 month
2022-08-10 14:19:19 +00:00
Gordon Bergling
cd33039749 inet6(4): Fix a typo in a source code comment
- s/Unreachablity/Unreachability/

MFC after:	3 days
2022-08-07 14:20:52 +02:00
Dimitry Andric
50207b2de9 Adjust function definition in nd6.c to avoid clang 15 warnings
With clang 15, the following -Werror warning is produced:

    sys/netinet6/nd6.c:247:12: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    nd6_destroy()
               ^
                void

This is because nd6_destroy() is declared with a (void) argument list, but
defined with an empty argument list. Make the definition match the
declaration.
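
A minimal sketch of the matching form, using a hypothetical function `demo_destroy()` rather than the real `nd6_destroy()`:

```c
#include <assert.h>

/* Declaration with an explicit (void) parameter list: a real prototype. */
static int demo_destroy(void);

/*
 * The definition repeats (void).  Writing "demo_destroy()" with an empty
 * list here is the form that -Wstrict-prototypes flags.
 */
static int
demo_destroy(void)
{
	return (0);
}
```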

MFC after:	3 days
2022-07-26 21:25:09 +02:00
Arseny Smalyuk
d18b4bec98 netinet6: Fix mbuf leak in NDP
Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d.
It happens because lltable_drop_entry_queue() relies on `la_numheld`
counter when dropping NDP entries (lles). It turned out NDP code never
increased `la_numheld`, so the actual free never happened.

Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.

Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35365
MFC after:	2 weeks
2022-05-31 21:06:14 +00:00
Mark Johnston
990a6d18b0 net: Fix memory leaks in lltable_calc_llheader() error paths
Also convert raw epoch_call() calls to lltable_free_entry() calls, no
functional change intended.  There's no need to asynchronously free the
LLEs in that case to begin with, but we might as well use the lltable
interfaces consistently.

Noticed by code inspection; I believe lltable_calc_llheader() failures
do not generally happen in practice.

Reviewed by:	bz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34832
2022-04-08 11:47:25 -04:00
Gordon Bergling
10e0082fff inet6(4): Fix a few common typos in source code comments
- s/reshedule/reschedule/

MFC after:	3 days
2021-08-28 18:53:59 +02:00
Alexander V. Chernikov
c541bd368f lltable: Add support for "child" LLEs holding encap for IPv4oIPv6 entries.
Currently we use pre-calculated headers inside LLE entries as prepend data
 for `if_output` functions. Using these headers allows saving some
 CPU cycles/memory accesses on the fast path.

However, this approach makes adding L2 header for IPv4 traffic with IPv6
 nexthops more complex, as it is not possible to store multiple
 pre-calculated headers inside lle. Additionally, the solution space is
 limited by the fact that PCB caching saves LLEs in addition to the nexthop.

Thus, add support for creating special "child" LLEs for the purpose of holding
 custom family encaps and store mbufs pending resolution. To simplify handling
 of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE.
 Such LLEs are not visible when iterating LLE table. Their lifecycle is bound
 to the "parent" LLE - it is not possible to delete "child" when parent is alive.
 Furthermore, "child" LLEs are static (RTF_STATIC), avoiding the complex state
 machine used by the standard LLEs.

nd6_lookup() and nd6_resolve() now accept an additional argument, family,
 allowing them to return such child LLEs. This change uses `LLE_SF()` macro which
 packs family and flags in a single int field. This is done to simplify merging
 back to stable/. Once this code lands, most of the cases will be converted to
 use a dedicated `family` parameter.
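
The idea of packing a family and flags into one int can be sketched as follows. This is an illustration only: `pack_sf()`, `sf_family()` and `sf_flags()` are hypothetical helpers, and the 16-bit split is an assumption, not necessarily the layout `LLE_SF()` uses.

```c
#include <assert.h>

/*
 * Assumed layout: the low 16 bits carry the flags, the bits above them
 * carry the address family.
 */
static unsigned int
pack_sf(int family, int flags)
{
	return (((unsigned int)family << 16) | ((unsigned int)flags & 0xFFFFu));
}

/* Recover the family from a packed value. */
static int
sf_family(unsigned int sf)
{
	return ((int)(sf >> 16));
}

/* Recover the flags from a packed value. */
static int
sf_flags(unsigned int sf)
{
	return ((int)(sf & 0xFFFFu));
}
```

Packing lets an existing single-int `flags` parameter carry the extra value without a signature change, which fits the stated goal of simplifying the merge back to stable/.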

Differential Revision: https://reviews.freebsd.org/D31379
MFC after:	2 weeks
2021-08-21 17:34:35 +00:00
Mark Johnston
663428ea17 nd6: Mark several callouts as MPSAFE
The use of Giant here is vestigial and does not provide any useful
synchronization.  Furthermore, non-MPSAFE callouts can cause the
softclock threads to block waiting for long-running newbus operations to
complete.

Reported by:	mav
Reviewed by:	bz
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31470
2021-08-09 13:27:52 -04:00
Alexander V. Chernikov
0b79b007eb [lltable] Restructure nd6 code.
Factor out lltable locking logic from lltable_try_set_entry_addr()
 into a separate lltable_acquire_wlock(), so the latter can be used
 in other parts of the code w/o duplication.

Create nd6_try_set_entry_addr() to avoid code duplication in nd6.c
 and nd6_nbr.c.

Move lle creation logic from nd6_resolve_slow() into a separate
 nd6_get_llentry() to simplify the former.

These changes serve as a pre-requisite for implementing
 RFC8950 (IPv4 prefixes with IPv6 nexthops).

Differential Revision: https://reviews.freebsd.org/D31432
MFC after:	2 weeks
2021-08-07 09:59:11 +00:00
Alexander V. Chernikov
8482aa7748 Use lltable calculated header when sending lle holdchain after successful lle resolution.
Subscribers: imp, ae, bz

Differential Revision: https://reviews.freebsd.org/D31391
2021-08-05 20:44:36 +00:00
Alexander V. Chernikov
f3a3b06121 [lltable] Unify datapath feedback mechanism.
Use newly-created llentry_request_feedback(),
 llentry_mark_used() and llentry_get_hittime() to
 request datapath usage check and fetch the results
 in the same fashion both in IPv4 and IPv6.

While here, simplify llentry_provide_feedback() wrapper
 by eliminating 1 condition check.

MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D31390
2021-08-04 22:52:43 +00:00
Kyle Evans
f187d6dfbf base: remove if_wg(4) and associated utilities, manpage
After lengthy discussions, we've decided that the if_wg(4) driver and
related work is not yet ready to live in the tree.  This driver has
larger security implications than many, and thus will be held to
more scrutiny than other drivers.

Please also see the related message sent to the freebsd-hackers@
and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on
2021/03/16, with the subject line "Removing WireGuard Support From Base"
for additional context.
2021-03-17 09:14:48 -05:00
Kyle Evans
74ae3f3e33 if_wg: import latest fixup work from the wireguard-freebsd project
This is the culmination of about a week of work from three developers to
fix a number of functional and security issues.  This patch consists of
work done by the following folks:

- Jason A. Donenfeld <Jason@zx2c4.com>
- Matt Dunwoodie <ncon@noconroy.net>
- Kyle Evans <kevans@FreeBSD.org>

Notable changes include:
- Packets are now correctly staged for processing once the handshake has
  completed, resulting in less packet loss in the interim.
- Various race conditions have been resolved, particularly w.r.t. socket
  and packet lifetime (panics)
- Various tests have been added to assure correct functionality and
  tooling conformance
- Many security issues have been addressed
- if_wg now maintains jail-friendly semantics: sockets are created in
  the interface's home vnet so that it can act as the sole network
  connection for a jail
- if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0
- if_wg now exports via ioctl a format that is future proof and
  complete.  It is additionally supported by the upstream
  wireguard-tools (which we plan to merge in to base soon)
- if_wg now conforms to the WireGuard protocol and is more closely
  aligned with security auditing guidelines

Note that the driver has been rebased away from using iflib.  iflib
poses a number of challenges for a cloned device trying to operate in a
vnet that are non-trivial to solve and adds complexity to the
implementation for little gain.

The crypto implementation that was previously added to the tree was a
super complex integration of what previously appeared in an old out of
tree Linux module, which has been reduced to crypto.c containing simple
boring reference implementations.  This is part of a near-to-mid term
goal to work with FreeBSD kernel crypto folks and take advantage of or
improve accelerated crypto already offered elsewhere.

There's additional test suite effort underway out-of-tree taking
advantage of the aforementioned jail-friendly semantics to test a number
of real-world topologies, based on netns.sh.

Also note that this is still a work in progress; work going further will
be much smaller in nature.

MFC after:	1 month (maybe)
2021-03-14 23:52:04 -05:00
Kristof Provost
c139b3c19b arp/nd: Cope with late calls to iflladdr_event
When tearing down vnet jails we can move an if_bridge out (as
part of the normal vnet_if_return()). This can, when it's clearing out
its list of member interfaces, change its link layer address.
That sends an iflladdr_event, but at that point we've already freed the
AF_INET/AF_INET6 if_afdata pointers.

In other words: when the iflladdr_event callbacks fire we can't assume
that ifp->if_afdata[AF_INET] will be set.

Reviewed by:	donner@, melifaro@
MFC after:	1 week
Sponsored by:	Orange Business Services
Differential Revision:	https://reviews.freebsd.org/D28860
2021-02-23 13:54:07 +01:00
Randall Stewart
24a8f6d369 When we are about to send down to the driver layer
we need to make sure that the m_nextpkt field is NULL
else the lower layers may do unwanted things.

Reviewed By:  gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D28377
2021-01-27 13:52:44 -05:00
Alexander V. Chernikov
0da3f8c98d Bump amount of queued packets in for unresolved ARP/NDP entries to 16.
Currently default behaviour is to keep only 1 packet per unresolved entry.
Ability to queue more than one packet was added 10 years ago, in r215207,
 though the default value was kept intact.

Things have changed since that time. Systems tend to initiate multiple
 connections at once for a variety of reasons.
For example, the recent kern/252278 bug report describes happy-eyeball DNS
 behaviour sending multiple requests to the DNS server.

The primary driver for upper value for the queue length determination is
 memory consumption. Remote actors should not be able to easily exhaust
 local memory by sending packets to unresolved arp/ND entries.

For now, bump value to 16 packets, to match Darwin implementation.

The proper approach would be to switch the limit to calculate memory
 consumption instead of packet count and limit based on memory.

We should MFC this with a variation of D22447.

Reviewers: #manpages, #network, bz, emaste

Reviewed By: emaste, gbe(doc), jilles(doc)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D28068
2021-01-11 19:51:11 +00:00
Alexander V. Chernikov
d1d941c5b9 Remove RADIX_MPATH config option.
ROUTE_MPATH is the new config option controlling new multipath routing
 implementation. Remove the last pieces of RADIX_MPATH-related code and
 the config option.

Reviewed by:	glebius
Differential Revision:	https://reviews.freebsd.org/D27244
2020-11-29 19:43:33 +00:00
Bjoern A. Zeeb
dd4d5a5ffb IPv6: set ifdisabled in the kernel rather than in rc
Enable ND6_IFF_IFDISABLED when the interface is created in the
kernel before return to user space.

This avoids a race when an interface is created by a program which
also calls ifconfig IF inet6 -ifdisabled and races with the
devd -> /etc/pccard_ether -> .. netif start IF -> ifdisabled
calls (the devd/rc framework disabling IPv6 again after the program
had enabled it already).

In case the global net.inet6.ip6.accept_rtadv was turned on,
we also default to enabling IPv6 on the interfaces, rather than
disabling them.

PR:		248172
Reported by:	Gert Doering (gert greenie.muc.de)
Reviewed by:	glebius (, phk)
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D27324
2020-11-25 20:58:01 +00:00
Alexander V. Chernikov
fedeb08b6a Introduce scalable route multipath.
This change is based on the nexthop objects landed in D24232.

The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
 relative weights and a dataplane-optimized structure to enable
 efficient nexthop selection.

Similar to the nexthops, nexthop groups are immutable. Dataplane part
 gets compiled during group creation and is basically an array of
 nexthop pointers, compiled w.r.t their weights.
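
One way to picture an array of nexthop pointers compiled with respect to weights is the sketch below. It is an assumption-laden illustration, not the kernel algorithm: `struct nhop`, `struct nhgrp`, `MAX_SLOTS`, `nhgrp_compile()` and `nhgrp_select()` are hypothetical names, and reducing the weights by their greatest common divisor is a guess.

```c
#include <assert.h>
#include <stddef.h>

#define	MAX_SLOTS	64	/* hypothetical cap on the compiled array */

struct nhop {			/* hypothetical stand-in for a nexthop */
	int idx;
	unsigned int weight;
};

struct nhgrp {
	const struct nhop *slots[MAX_SLOTS];
	size_t nslots;
};

static unsigned int
gcd_u(unsigned int a, unsigned int b)
{
	while (b != 0) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return (a);
}

/*
 * Give each nexthop weight/gcd slots.  Returns 0 on success, -1 if there
 * is nothing to compile or the slots would not fit.
 */
static int
nhgrp_compile(struct nhgrp *g, const struct nhop *nh, size_t n)
{
	unsigned int cnt, d = 0;
	size_t i, k;

	for (i = 0; i < n; i++)
		d = gcd_u(d, nh[i].weight);
	if (d == 0)
		return (-1);	/* no nexthops, or all weights are zero */
	g->nslots = 0;
	for (i = 0; i < n; i++) {
		cnt = nh[i].weight / d;
		if (cnt > MAX_SLOTS - g->nslots)
			return (-1);
		for (k = 0; k < cnt; k++)
			g->slots[g->nslots++] = &nh[i];
	}
	return (0);
}

/* Dataplane side: a single modulo and an array load. */
static const struct nhop *
nhgrp_select(const struct nhgrp *g, unsigned int flowid)
{
	assert(g->nslots > 0);
	return (g->slots[flowid % g->nslots]);
}
```

Under these assumptions, weights 10 and 20 compile to one and two slots respectively, so lookups stay cheap because the weighting is paid for once, at group creation.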

With this change, `rt_nhop` field of `struct rtentry` contains either
 nexthop or nexthop group. They are distinguished by the presence of
 NHF_MULTIPATH flag.
All dataplane lookup functions return a pointer to the nexthop object,
leaving nexthop group details inside routing subsystem.

User-visible changes:

The change is intended to be backward-compatible: all non-mpath operations
 should work as before with ROUTE_MPATH and net.route.multipath=1.

All routes now come with a weight; the default weight is 1, the maximum is 2^24-1.

Current maximum multipath group width is statically set to 64.
 This will become sysctl-tunable in the followup changes.

Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1

route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20

netstat -6On

Nexthop groups data

Internet6:
GrpIdx  NhIdx     Weight   Slots                                 Gateway     Netif  Refcnt
1         ------- ------- ------- --------------------------------------- ---------       1
              13      10       1                             2001:db8::2     vlan2
              14      20       2                             2001:db8::3     vlan2

Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default

Tested by:	olivier
Reviewed by:	glebius
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D26449
2020-10-03 10:47:17 +00:00
Alexander V. Chernikov
2259a03020 Rework part of routing code to reduce difference to D26449.
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(),
 as these two can be applied at different entities and at different times.
* Start filling route weight in route change notifications
* Pass flowid to UDP/raw IP route lookups
* Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact
 that rtentry can contain multiple nexthops.

Differential Revision:	https://reviews.freebsd.org/D26497
2020-09-21 20:02:26 +00:00
Mateusz Guzik
662c13053f net: clean up empty lines in .c and .h files
2020-09-01 21:19:14 +00:00
Alexander V. Chernikov
e1c05fd290 Transition from rtrequest1_fib() to rib_action().
Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
 in6_rtrequest, rtrequest_fib> and their uses and switch
 to rib_action(). This is part of the new routing KPI.

Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25546
2020-07-21 19:56:13 +00:00
Alexander V. Chernikov
725871230d Temporarily revert r363319 to unbreak the build.
Reported by:	CI
Pointy hat to: melifaro
2020-07-19 10:53:15 +00:00
Alexander V. Chernikov
8cee15d9e4 Transition from rtrequest1_fib() to rib_action().
Remove all variations of rtrequest <rtrequest1_fib, rtrequest_fib,
 in6_rtrequest, rtrequest_fib> and their uses and switch
to rib_action(). This is part of the new routing KPI.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D25546
2020-07-19 09:29:27 +00:00
Alexander V. Chernikov
4c7ba83f9d Switch inet6 default route subscription to the new rib subscription api.
Old subscription model allowed only single customer.

Switch inet6 to the new subscription api and eliminate the old model.

Differential Revision:	https://reviews.freebsd.org/D25615
2020-07-12 11:24:23 +00:00
Alexander V. Chernikov
2bbab0af6d Use epoch(9) for rtentries to simplify control plane operations.
Currently the only reason of refcounting rtentries is the need to report
 the rtable operation details immediately after the execution.
Delaying rtentry reclamation allows us to stop refcounting and simplify the code.
Additionally, this change allows to reimplement rib_lookup_info(), which
 is used by some of the customers to get the matching prefix along
 with nexthops, in a more efficient way.

The change keeps per-vnet rtzone uma zone. It adds nh_vnet field to
 nhop_priv to be able to reliably set curvnet even during vnet teardown.
The rest of the reference counting code will be removed in D24867.

Differential Revision:	https://reviews.freebsd.org/D24866
2020-05-23 10:21:02 +00:00
Andrew Gallatin
bc74b81991 IPv6: Fix a panic in the nd6 code with unmapped mbufs.
If the neighbor entry for an IPv6 TCP session using unmapped
mbufs times out, IPv6 will send an icmp6 dest. unreachable
message. In doing this, it will try to do a software checksum
on the reflected packet. If this is a TCP session using unmapped
mbufs, then there will be a kernel panic.

To fix this, just free packets with unmapped mbufs, rather
than sending the icmp.

Reviewed by:	np, rrs
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24821
2020-05-12 17:18:44 +00:00
Alexander V. Chernikov
74787ef47b Add nhop to the ifa_rtrequest() callback.
With the upcoming multipath changes described in D24141,
 rt->rt_nhop can potentially point to a nexthop group instead of
 an individual nhop.
To simplify caller handling of such cases, change ifa_rtrequest() callback
 to pass changed nhop directly.

Differential Revision:	https://reviews.freebsd.org/D24604
2020-04-29 19:28:56 +00:00
Alexander V. Chernikov
e7d8af4f65 Move route_temporal.c and route_var.h to net/route.
Nexthop objects implementation, defined in r359823,
 introduced sys/net/route directory intended to hold all
 routing-related code. Move recently-introduced route_temporal.c and
 private route_var.h header there.

Differential Revision:	https://reviews.freebsd.org/D24597
2020-04-28 19:14:09 +00:00
Alexander V. Chernikov
fe6da72759 Move struct rtentry definition to nhop_var.h.
One of the goals of the new routing KPI defined in r359823
 is to entirely hide `struct rtentry` from the consumers.
It will allow to improve routing subsystem internals and deliver
 features much faster.

This is one of the last changes, effectively moving struct rtentry
 definition to a net/route_var.h header, internal to the routing subsystem.

Differential Revision:	https://reviews.freebsd.org/D24580
2020-04-28 18:42:30 +00:00
Alexander V. Chernikov
aaad3c4fca Convert rtentry field accesses into nhop field accesses.
One of the goals of the new routing KPI defined in r359823 is to entirely
 hide `struct rtentry` from the consumers. It will allow to improve routing
 subsystem internals and deliver more features much faster.

This commit is mostly mechanical change to eliminate direct struct rtentry
 field accesses.

The only notable difference is AF_LINK gateway encoding.

AF_LINK gw is used in routing stack for operations with interface routes
 and host loopback routes.
In the former case it indicates _some_ non-NULL gateway, as the interface
 is the same as in rt_ifp in kernel and rtm_ifindex in rtsock reporting.
In the latter case the interface index inside gateway was used by the IPv6
 datapath to verify address scope for link-local interfaces.

Kernel uses struct sockaddr_dl for this type of gateway. This structure
 allows for specifying rich interface data, such as mac address and interface
 name. However, this results in relatively large structure size - 52 bytes.
Routing stack fills in only 2 fields - sdl_index and sdl_type, which reside
 in the first 8 bytes of the structure.

In the new KPI, struct nhop_object tries to be cache-efficient, hence
 embodies gateway address inside the structure. In the AF_LINK case it
 stores a shortened version of the structure - struct sockaddr_dl_short,
 which occupies 16 bytes. After D24340 changes, the data inside AF_LINK
 gateway will not be used in the kernel at all, leaving rtsock as the only
 potential concern.

The difference in rtsock reporting:

(old)
got message of size 240 on Thu Apr 16 03:12:13 2020
RTM_ADD: Add Route: len 240, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.0.0.0 link#5 255.255.255.0

(new)
got message of size 200 on Sun Apr 19 09:46:32 2020
RTM_ADD: Add Route: len 200, pid: 0, seq 0, errno 0, flags:<UP,DONE,PINNED>
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK>
 10.0.0.0 link#5 255.255.255.0

Note the 40-byte difference (52-16 + alignment).
However, gateway is still a valid AF_LINK gateway with proper data filled in.
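
The "52-16 + alignment" arithmetic can be checked with a short sketch. It assumes that each sockaddr in the message is padded to a multiple of 8 bytes, which is an assumption about the platform rather than something stated above; `roundup_pow2()` and `gw_size_delta()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of align (a power of two). */
static size_t
roundup_pow2(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((len + align - 1) & ~(align - 1));
}

/* Bytes saved when a padded gateway shrinks from old_len to new_len. */
static size_t
gw_size_delta(size_t old_len, size_t new_len, size_t align)
{
	return (roundup_pow2(old_len, align) - roundup_pow2(new_len, align));
}
```

Under that assumption the 52-byte gateway occupies 56 bytes and the 16-byte one occupies 16, so `gw_size_delta(52, 16, 8)` gives 40, matching the drop from 240 to 200 in the two messages.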

It is worth noting that these particular messages (interface routes) are mostly
 ignored by routing daemons:
* bird/quagga/frr uses RTM_NEWADDR and ignores prefix route addition messages.
* quagga/frr ignores routes without gateway

More detailed overview on how rtsock messages are used by the
 routing daemons to reconstruct the kernel view, can be found in D22974.

Differential Revision:	https://reviews.freebsd.org/D24519
2020-04-23 08:04:20 +00:00
Alexander V. Chernikov
539642a29d Add nhop parameter to rti_filter callback.
One of the goals of the new routing KPI defined in r359823 is to
 entirely hide `struct rtentry` from the consumers. It will allow to
 improve routing subsystem internals and deliver more features much faster.
This change is one of the ongoing changes to eliminate direct
 struct rtentry field accesses.

Additionally, with the followup multipath changes, single rtentry can point
 to multiple nexthops.

With that in mind, convert rti_filter callback used when traversing the
 routing table to accept pair (rt, nhop) instead of nexthop.

Reviewed by:	ae
Differential Revision:	https://reviews.freebsd.org/D24440
2020-04-16 17:20:18 +00:00
Bjoern A. Zeeb
3c5018ca10 nd6: sysctl
Move the SYSCTL_DECL to the top of the file.  Move the sysctl function
before SYSCTL_PROC so that we don't need an extra function declaration in
the middle of the file.

No functional changes.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-19 21:08:18 +00:00
Bjoern A. Zeeb
6db6527385 nd6: make nd6_timer_ch static
nd6_timer_ch is only used in file local context.  There is no need to
export it, so make it static.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-19 20:54:17 +00:00
Bjoern A. Zeeb
808c432f62 nd6: retire defrouter_select(), use _fib() variant.
Burn bridges and replace the last two calls of defrouter_select() with
defrouter_select_fib().  That allows us to retire defrouter_select()
and make it more clear in the calling code that it applies to all FIBs.

Sponsored by:	Netflix
2019-11-16 00:17:35 +00:00
Bjoern A. Zeeb
e20b5bc485 nd6: simplify code
We are taking the same actions in both cases of the branch inside the block.
Simplify that code as the extra branch is not needed.

MFC after:	3 weeks
Sponsored by:	Netflix
2019-11-15 13:45:38 +00:00
Bjoern A. Zeeb
d64df9a2b2 nd6: make nd6_alloc() file static
nd6_alloc() is a function used only locally.  Make it static and no
longer export it.  Keeps the KPI smaller.

Sponsored by:	Netflix
2019-11-13 13:53:17 +00:00
Bjoern A. Zeeb
ad675b3279 nd6 defrouter: consolidate nd_defrouter manipulations in nd6_rtr.c
Move the nd_defrouter along with the sysctl handler from nd6.c to
nd6_rtr.c and make the variable file static.  Provide (temporary)
new accessor functions for code manipulating nd_defrouter from nd6.c,
and stop exporting functions no longer needed outside nd6_rtr.c.
This also shuffles a few functions around in nd6_rtr.c without
functional changes.

Given all nd_defrouter logic is now in one place we can tidy up the
code, locking, and other open items.

MFC after:	3 weeks
X-MFC:		keep exporting the functions
Sponsored by:	Netflix
2019-11-13 12:05:48 +00:00
Gleb Smirnoff
d6dbfed81e In nd6_timer() enter the network epoch earlier. The defrouter_del() may
call into leaf functions that require epoch.  Since the function is already
run in non-sleepable context, it should be safe to cover it whole with epoch.

Reported by:	syzkaller
2019-11-04 17:35:37 +00:00
Gleb Smirnoff
ef2e580e56 Don't cover in6_ifattach() with network epoch, as it may call into
network drivers ioctls, that may sleep.

PR:		241223
2019-10-13 04:25:16 +00:00
Gleb Smirnoff
b8a6e03fac Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.

However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.

Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.

On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().

This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.

This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.

Reviewed by:	gallatin, hselasky, cy, adrian, kristof
Differential Revision:	https://reviews.freebsd.org/D19111
2019-10-07 22:40:05 +00:00
Conrad Meyer
e2e050c8ef Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions.  The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended).  Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed.  __FreeBSD_version has been bumped.
2019-05-20 00:38:23 +00:00
Mark Johnston
ca1163bd5f Do not perform DAD on stf(4) interfaces.
stf(4) interfaces are not multicast-capable so they can't perform DAD.
They also did not set IFF_DRV_RUNNING when an address was assigned, so
the logic in nd6_timer() would periodically flag such an address as
tentative, resulting in interface flapping.

Fix the problem by setting IFF_DRV_RUNNING when an address is assigned,
and do some related cleanup:
- In in6if_do_dad(), remove a redundant check for !UP || !RUNNING.
  There is only one caller in the tree, and it only looks at whether
  the return value is non-zero.
- Have in6if_do_dad() return false if the interface is not
  multicast-capable.
- Set ND6_IFF_NO_DAD when an address is assigned to an stf(4) interface
  and the interface goes UP as a result. Note that this is not
  sufficient to fix the problem because the new address is marked as
  tentative and DAD is started before in6_ifattach() is called.
  However, setting no_dad is formally correct.
- Change nd6_timer() to not flag addresses as tentative if no_dad is
  set.

This is based on a patch from Viktor Dukhovni.

Reported by:	Viktor Dukhovni <ietf-dane@dukhovni.org>
Reviewed by:	ae
MFC after:	3 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19751
2019-03-30 18:00:44 +00:00
Bjoern A. Zeeb
30b450774e Update for IETF draft-ietf-6man-ipv6only-flag.
When we roam between networks and our link-state goes down, automatically remove
the IPv6-Only flag from the interface.  Otherwise we might switch from an
IPv6-only to an IPv4-only network and the flag would stay and we would prevent
IPv4 from working.

While the actual function call to clear the flag is under EXPERIMENTAL,
the eventhandler is not as we might want to re-use it for other
functionality on link-down event (such as re-calculating default routers,
for example, if there is more than one).

Reviewed by:	hrs
Differential Revision:	https://reviews.freebsd.org/D19487
2019-03-07 23:03:39 +00:00