This change is based on the nexthop objects landed in D24232.
The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
relative weights and a dataplane-optimized structure to enable
efficient nexthop selection.
Simular to the nexthops, nexthop groups are immutable. Dataplane part
gets compiled during group creation and is basically an array of
nexthop pointers, compiled w.r.t their weights.
With this change, `rt_nhop` field of `struct rtentry` contains either
nexthop or nexthop group. They are distinguished by the presense of
NHF_MULTIPATH flag.
All dataplane lookup functions returns pointer to the nexthop object,
leaving nexhop groups details inside routing subsystem.
User-visible changes:
The change is intended to be backward-compatible: all non-mpath operations
should work as before with ROUTE_MPATH and net.route.multipath=1.
All routes now comes with weight, default weight is 1, maximum is 2^24-1.
Current maximum multipath group width is statically set to 64.
This will become sysctl-tunable in the followup changes.
Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1
route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20
netstat -6On
Nexthop groups data
Internet6:
GrpIdx NhIdx Weight Slots Gateway Netif Refcnt
1 ------- ------- ------- --------------------------------------- --------- 1
13 10 1 2001:db8::2 vlan2
14 20 2 2001:db8::3 vlan2
Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default
Tested by: olivier
Reviewed by: glebius
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D26449
This is the foundational change for the routing subsytem rearchitecture.
More details and goals are available in https://reviews.freebsd.org/D24141 .
This patch introduces concept of nexthop objects and new nexthop-based
routing KPI.
Nexthops are objects, containing all necessary information for performing
the packet output decision. Output interface, mtu, flags, gw address goes
there. For most of the cases, these objects will serve the same role as
the struct rtentry is currently serving.
Typically there will be low tens of such objects for the router even with
multiple BGP full-views, as these objects will be shared between routing
entries. This allows to store more information in the nexthop.
New KPI:
struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst,
uint32_t scopeid, uint32_t flags, uint32_t flowid);
struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6,
uint32_t scopeid, uint32_t flags, uint32_t flowid);
These 2 function are intended to replace all all flavours of
<in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous
fib[46]-generation functions.
Upon successful lookup, they return nexthop object which is guaranteed to
exist within current NET_EPOCH. If longer lifetime is desired, one can
specify NHR_REF as a flag and get a referenced version of the nexthop.
Reference semantic closely resembles rtentry one, allowing sed-style conversion.
Additionally, another 2 functions are introduced to support uRPF functionality
inside variety of our firewalls. Their primary goal is to hide the multipath
implementation details inside the routing subsystem, greatly simplifying
firewalls implementation:
int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid,
uint32_t flags, const struct ifnet *src_if);
int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid,
uint32_t flags, const struct ifnet *src_if);
All functions have a separate scopeid argument, paving way to eliminating IPv6 scope
embedding and allowing to support IPv4 link-locals in the future.
Structure changes:
* rtentry gets new 'rt_nhop' pointer, slightly growing the overall size.
* rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz.
Old KPI:
During the transition state old and new KPI will coexists. As there are another 4-5
decent-sized conversion patches, it will probably take a couple of weeks.
To support both KPIs, fields not required by the new KPI (most of rtentry) has to be
kept, resulting in the temporary size increase.
Once conversion is finished, rtentry will notably shrink.
More details:
* architectural overview: https://reviews.freebsd.org/D24141
* list of the next changes: https://reviews.freebsd.org/D24232
Reviewed by: ae,glebius(initial version)
Differential Revision: https://reviews.freebsd.org/D24232
flowtable anymore (as flowtable was never considered to be useful in
the forwarding path).
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D11448
This is so that 'make depend' is not a required build step in these
files.
DPSRCS is overall unneeded. DPSRCS already contains SRCS, so anything
which can safely be in SRCS should be. DPSRCS is mostly just a way to
generate files that should not be linked into the final PROG/LIB. For
headers and grammars it is safe for them to be in SRCS since they will
be excluded during linking and installation.
The only remaining uses of DPSRCS are for generating .c or .o files that
must be built before 'make depend' can run 'mkdep' on the SRCS c files
list. A semi-proper example is in tests/sys/kern/acct/Makefile where a
checked-in .c file has an #include on a generated .c file. The
generated .c file should not be linked into the final PROG though since
it is #include'd. The more proper way here is just to build/link it in
though without DPSRCS. Another example is in sys/modules/linux/Makefile
where a shell script runs to parse a DPSRCS .o file that should not be
linked into the module. Beyond those, the need for DPSRCS is largely
unneeded, redundant, and forces 'make depend' to be ran. Generally,
these Makefiles should avoid the need for DPSRCS and define proper
dependencies for their files as well.
An example of an improper usage and why this matters is in usr.bin/netstat.
nl_defs.h was only in DPSRCS and so was not generated during 'make all',
but only during 'make depend'. The files including it lacked proper
depenencies on it, which forced running 'make depend' to workaround that
bug. The 'make depend' target should mostly be used for incremental build
help, not to produce a working build. This specific example was broken in
the meta build until r287905 since it does not run 'make depend'.
The gnu/lib/libreadline/readline case is fine since bsd.lib.mk has 'OBJS:
SRCS:M*.h' when there is no .depend file.
Sponsored by: EMC / Isilon Storage Division
MFC after: 1 week
Obtained from: Phil Shafer <phil@juniper.net>
Ported to -current by: alfred@ (mostly), Kim Shrier
Formatting: marcel@
Sponsored by: Juniper Networks, Inc.
AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.
Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
IPX was a network transport protocol in Novell's NetWare network operating
system from late 80s and then 90s. The NetWare itself switched to TCP/IP
as default transport in 1998. Later, in this century the Novell Open
Enterprise Server became successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.
Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
- ip_output() and ip_output6() simply call flowtable_lookup(),
passing mbuf and address family. That's the only code under
#ifdef FLOWTABLE in the protocols code now.
o Revamp statistics gathering and export.
- Remove hand made pcpu stats, and utilize counter(9).
- Snapshot of statistics is available via 'netstat -rs'.
- All sysctls are moved into net.flowtable namespace, since
spreading them over net.inet isn't correct.
o Properly separate at compile time INET and INET6 parts.
o General cleanup.
- Remove chain of multiple flowtables. We simply have one for
IPv4 and one for IPv6.
- Flowtables are allocated in flowtable.c, symbols are static.
- With proper argument to SYSINIT() we no longer need flowtable_ready.
- Hash salt doesn't need to be per-VNET.
- Removed rudimentary debugging, which use quite useless in dtrace era.
The runtime behavior of flowtable shouldn't be changed by this commit.
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
__uint16_t, we can partially undo r228668.
Note the remark "Work around a clang false positive with format string
warnings and ntohs macros (see LLVM PR 11313)" was actually incorrect.
Before r232745, on some arches, the ntohs() macros did in fact return
int, not uint16_t, so clang was right in warning about the %hu format
string.
MFC after: 2 weeks
get rid of testing explicitly for clang (using ${CC:T:Mclang}) in
individual Makefiles.
Instead, use the following extra macros, for use with clang:
- NO_WERROR.clang (disables -Werror)
- NO_WCAST_ALIGN.clang (disables -Wcast-align)
- NO_WFORMAT.clang (disables -Wformat and friends)
- CLANG_NO_IAS (disables integrated assembler)
- CLANG_OPT_SMALL (adds flags for extra small size optimizations)
As a side effect, this enables setting CC/CXX/CPP in src.conf instead of
make.conf! For clang, use the following:
CC=clang
CXX=clang++
CPP=clang-cpp
MFC after: 2 weeks
control over the result of buildworld and installworld; this especially
helps packaging systems such as nanobsd
Reviewed by: various (posted to arch)
MFC after: 1 month
doesn't use the default CFLAGS which contain -fno-strict-aliasing.
Until the code is cleaned up, just add -fno-strict-aliasing to the
CFLAGS of these for the tinderboxes' sake, allowing the rest of the
tree to have -Werror enabled again.
including to printf(). Using uintmax_t is also robust to further
extensions in both the C language and the bitwidth of kernel counters.
Tested on: i386 amd64 ia64
with FAST_IPSEC rather than the KAME IPSEC stack.
Note that the output of "netstat -s -p ipsec" differs depending on which
stack is compiled into the kernel since they each keep different stats.
This delta also adds the "esp", "ah", and "ipcomp" protocol stats, which
are also available when the kernel is compiled with the FAST_IPSEC stack
(e.g. "netstat -s -p esp").
Submitted by: Matt Titus <titus at nttmcl dot com>
MFC after: 3 days
a -B option which causes bpf peers to be printed. This option can be
used in conjunction with -I if information about specific interfaces
is desired. This is similar to what NetBSD added to their version of
netstat.
$ netstat -B
Pid Netif Flags Recv Drop Match Sblen Hblen Command
1137 lo0 p--s-- 0 0 0 0 0 tcpdump
205 sis0 -ifs-l 37331 0 1 0 0 dhclient
$
$ netstat -I lo0 -B
Pid Netif Flags Recv Drop Match Sblen Hblen Command
1174 lo0 p--s-- 0 0 0 0 0 tcpdump
$
-Add bpf.c which stores all the code for retrieving and parsing bpf
related statistics.
-Modify main.c to add support for the -B option and hook it into the
program logic.
-Add bpf.c to the build.
-Document this new functionality in the man page and bump the revision
date.
-Add prototype for bpf_stats function.
with a number of positive benefits:
- Start using UMA(9) statistics for mbufs and clusters, which avoids
using the mbuf allocator statistics which suffer from races under
load on SMP. This should eliminate "negative" mbuf counts in
netstat -mb.
- We are now able to track cached (free) mbufs and clusters and count
it towards memory allocated by the network stack.
- We are now also able to track memory allocated to mbuf tags since
libmemstat(3) can also query malloc(9). We don't print this except
as part of the total (for now - #if 0).
- We are now able to track mbuf/cluster/packet allocation failures,
although they are not currently printed (#if 0).
- Don't print out sfbuf statistics when running on a kernel core, as
currently that code is able only to query sysctl for statistics.
MFC after: 1 week
Without this change, when running netstat with a kernel without
INET6 built in, you will get a complain at the end of "netstat -s"
output.
X-MFC: NO_INET6 was called "NOINET6" on RELENG_5
printf format warnings for inet6.c (pluralies() was implicit int, but
the context requires a "char *").
Added WARNS?=2 to the Makefile so that such errors don't come back.
Added NO_WERROR?= to the Makefile because I haven't checked that setting
WARNS doesn't uncover more bugs except on i386's.
of the recent WARNS commits. The idea is:
1) FreeBSD id tags should follow vendor tags.
2) Vendor tags should not be compiled (though copyrights probably should).
3) There should be no blank line between including cdefs and __FBSDIF.
kernel IPv6 multicast routing support.
pim6 dense mode daemon
pim6 sparse mode daemon
netstat support of IPv6 multicast routing statistics
Merging to the current and testing with other existing multicast routers
is done by Tatsuya Jinmei <jinmei@kame.net>, who writes and maintainances
the base code in KAME distribution.
Make world check and kernel build check was also successful.