Commit Graph

266 Commits

Author SHA1 Message Date
Luiz Otavio O Souza
8f1c8ade60 Fix the typos and style(9) in comment.
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (Netgate)
2016-12-08 18:18:48 +00:00
Andrey V. Elsukov
abe95d87ee Replace rw_init/rw_destroy with corresponding macros.
Obtained from:	Yandex LLC
2016-10-06 14:42:06 +00:00
Bjoern A. Zeeb
89856f7e2d Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather larger
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along (or before) the higher layer protocol is shutdown.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enahnce helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by:		re (hrs)
Obtained from:		projects/vnet
Reviewed by:		gnn, jhb
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D6747
2016-06-21 13:48:49 +00:00
Bjoern A. Zeeb
80ae8d609a Provide a public interface to rt_flushifroutes which takes the address
family as an argument as well.
This will be used to cleanup individual protocols during VNET teardown.

Obtained from:	projects/vnet
Sponsored by:	The FreeBSD Foundation
2016-06-06 12:49:47 +00:00
George V. Neville-Neil
6d76822688 This change re-adds L2 caching for TCP and UDP, as originally added in D4306
but removed due to other changes in the system. Restore the llentry pointer
to the "struct route", and use it to cache the L2 lookup (ARP or ND6) as
appropriate.

Submitted by:	Mike Karels
Differential Revision:	https://reviews.freebsd.org/D6262
2016-06-02 17:51:29 +00:00
Bjoern A. Zeeb
4f321dbd1c Fix compile errors after r297225:
- properly V_irtualise variable access unbreaking VIMAGE kernels.
- remove the volatile from the function return type to make architecture
  using gcc happy [-Wreturn-type]
  "type qualifiers ignored on function return type"
  I am not entirely happy with this solution putting the u_int there
  but it will do for now.
2016-03-24 11:40:10 +00:00
George V. Neville-Neil
84cc0778d0 FreeBSD previously provided route caching for TCP (and UDP). Re-add
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.

Submitted by:	Mike Karels
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D4306
2016-03-24 07:54:56 +00:00
Bjoern A. Zeeb
a5243af262 Code duplication but rib_head is special. Not found an easy way to go
back and harmize the use cases among RIB, IPFW, PF yet but it's also not
the scope of this work.   Prevents instant panics on teardown and frees
the FIB bits again.

Sponsored by:	The FreeBSD Foundation
2016-02-03 21:56:51 +00:00
Alexander V. Chernikov
94017572ab Fix flowtable part missed in r294706. 2016-01-25 09:31:32 +00:00
Alexander V. Chernikov
61eee0e202 MFP r287070,r287073: split radix implementation and route table structure.
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
  with different requirements. In fact, first 3 don't have _any_ requirements
  and first 2 does not use radix locking. On the other hand, routing
  structure do have these requirements (rnh_gen, multipath, custom
  to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.

So, radix code now uses tiny 'struct radix_head' structure along with
  internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
  Existing consumers still uses the same 'struct radix_node_head' with
  slight modifications: they need to pass pointer to (embedded)
  'struct radix_head' to all radix callbacks.

Routing code now uses new 'struct rib_head' with different locking macro:
  RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
  information base).

New net/route_var.h header was added to hold routing subsystem internal
  data. 'struct rib_head' was placed there. 'struct rtentry' will also
  be moved there soon.
2016-01-25 06:33:15 +00:00
Alexander V. Chernikov
fcbfdb37a1 Fix panic in IP redirect. Panic was introduced in r293466.
Found by:	Yamagi Burmeister <lists at yamagi.org>>
2016-01-14 16:31:00 +00:00
Alexander V. Chernikov
10e0e23528 Remove now-unused wrappers for various routing functions. 2016-01-14 08:54:44 +00:00
Alexander V. Chernikov
0eb64f4e44 Remove RTF_RNH_LOCKED support from rtalloc1_fib().
Last caller using it was eliminated in r293471.

Sponsored by:	Yandex LLC
2016-01-13 14:32:48 +00:00
Alexander V. Chernikov
f2b2e77a41 (Temporarily) remove route_redirect_event eventhandler.
Such handler should pass different set of variables, instead
  of directly providing 2 locked route entries.
Given that it hasn't been really used since at least 2012, remove
  current code.
Will re-add it after finishing most major routing-related changes.

Discussed with:	np
2016-01-09 06:26:40 +00:00
Alexander V. Chernikov
16703ea811 Please Coverity by removing unneccessary check (rt_key() is always set).
Coverity CID:	1347797
2016-01-09 05:39:06 +00:00
Alexander V. Chernikov
048738b546 Do more fine-grained locking in rtrequest1_fib().
Last consumer using RTF_RNH_LOCKED flag was eliminated in r291643.
Restrict passing RTF_RNH_LOCKED to rtrequest1_fib() and do better
  locking for RTM_ADD / RTM_DELETE cases.
2016-01-08 16:25:11 +00:00
Alexander V. Chernikov
9a1b64d5a0 Add rib_lookup_info() to provide API for retrieving individual route
entries data in unified format.

There are control plane functions that require information other than
  just next-hop data (e.g. individual rtentry fields like flags or
  prefix/mask). Given that the goal is to avoid rte reference/refcounting,
  re-use rt_addrinfo structure to store most rte fields. If caller wants
  to retrieve key/mask or gateway (which are sockaddrs and are allocated
  separately), it needs to provide sufficient-sized sockaddrs structures
  w/ ther pointers saved in passed rt_addrinfo.

Convert:
  * lltable new records checks (in_lltable_rtcheck(),
    nd6_is_new_addr_neighbor().
  * rtsock pre-add/change route check.
  * IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because
     1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should
       not be multiple host routes for such hosts 2) if we have multiple
       routes we should inspect them (which is not done). 3) the entire idea
       of abusing KRT as storage for ND proxy seems odd. Userland programs
       should be used for that purpose).
2016-01-04 15:03:20 +00:00
Alexander V. Chernikov
6af272d88e Fix PINNED routes handling.
Before r291643, adding new interface prefix had the following logic:
try_add:
  EEXIST && (PINNED) {
    try_del(w/o PINNED flag)
    if (OK)
      try_add(PINNED)
}

In r291643, deletion was performed w/ PINNED flag held which leaded
  to new interface prefixes (like ::1) overriding older ones.
  Fix this by requesting deletion w/o RTF_PINNED.

PR:		kern/205285
Submitted by:	Fabian Keil <fk at fabiankeil.de>
2015-12-13 16:37:01 +00:00
Alexander V. Chernikov
4b3dc89847 Move RTF_PINNED handling to generic route code.
This eliminates last RTF_RNH_LOCKED rtrequest1_fib() user.
2015-12-02 08:17:31 +00:00
Enji Cooper
af5c99e53f Fix LINT-NOIP kernels after r291467
rn is only used if INET or INET6 are defined

Sponsored by: EMC / Isilon Storage Division
2015-12-01 05:59:53 +00:00
Alexander V. Chernikov
674e0823c1 Move flowtable rte checks to separate function. 2015-11-30 05:59:22 +00:00
Alexander V. Chernikov
e8b0643eee Add new rt_foreach_fib_walk_del() function for deleting route entries
by filter function instead of picking into routing table details in
  each consumer.
Remove now-unused rt_expunge() (eliminating last external RTF_RNH_LOCKED
 user).
This simplifies future nexthops/mulitipath changes and rtrequest1_fib()
  locking refactoring.

Actual changes:
Add "rt_chain" field to permit rte grouping while doing batched delete
  from routing table (thus growing rte 200->208 on amd64).
Add "rti_filter" /  "rti_filterdata" / "rti_spare" fields to rt_addrinfo
  to pass filter function to various routing subsystems in standard way.
Convert all rt_expunge() customers to new rt_addinfo-based api and eliminate
  rt_expunge().
2015-11-30 05:51:14 +00:00
Alexander V. Chernikov
e4790abf19 Pass provided af instead of AF_UNSPEC to setwa_f callback. 2015-11-14 18:16:17 +00:00
Bryan Drewery
2780ba06c7 Avoid passing an uninitialized 'i'. Currently nothing was depending on it
anyhow.

Coverity CID:	1331562
2015-10-29 18:58:18 +00:00
Alexander V. Chernikov
f221bcaa06 Remove several compat functions from pre-fib era. 2015-10-17 17:26:44 +00:00
Eric van Gyzen
17a036563d Fix the handling of IPv6 On-Link Redirects.
On receipt of a redirect message, install an interface route for the
redirected destination.  On removal of the corresponding Neighbor Cache
entry, remove the interface route.

This requires changes in rtredirect_fib() to cope with an AF_LINK
address for the gateway and with the absence of RTF_GATEWAY.

This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo
Phase 2 test suite.

Unrelated to the above, fix a recursion on the radix node head lock
triggered by the Tahi Redirected to Alternate Router test cases.

When I first wrote this patch in October 2012, all Section 2
(Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE,
and 8-STABLE.  cem@ recently rebased the 10.x patch onto head and reported
that it passes Tahi.  (Thanks!)

These other test cases also passed in 2012:

* the RTF_MODIFIED case, with IPv4 and IPv6 (using a
  RTF_HOST|RTF_GATEWAY route for the destination)

* the redirected-to-self case, with IPv4 and IPv6

* a valid IPv4 redirect

All testing in 2012 was done with WITNESS and INVARIANTS.

Tested by:    EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015,
              Mark Kelley <mark_kelley@dell.com> in 2012,
              TC Telkamp <terence_telkamp@dell.com> in 2012
PR:           152791
Reviewed by:  melifaro (current rev), bz (earlier rev)
Approved by:  kib (mentor)
MFC after:    1 month
Relnotes:     yes
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D3602
2015-09-14 19:17:25 +00:00
Alexander V. Chernikov
441f9243df Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).

Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3464
2015-09-05 05:33:20 +00:00
Alexander V. Chernikov
2caee4be35 Rename rt_foreach_fib() to rt_foreach_fib_walk().
Suggested by:	julian
2015-08-10 20:50:31 +00:00
Alexander V. Chernikov
4bdf0b6a9a MFP r274295:
* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
  to use new rt_foreach_fib() instead of hand-rolling cycles.
2015-08-08 18:14:59 +00:00
Luiz Otavio O Souza
8b15f615e0 Follow r256586 and rename the kernel version of the Free() macro to
R_Free().  This matches the other macros and reduces the chances to clash
with other headers.

This also fixes the build of radix.c outside of the kernel environment.

Reviewed by:	glebius
2015-07-30 02:09:03 +00:00
Marcelo Araujo
546afaf83d Remove duplicate header entry. 2015-04-16 02:44:37 +00:00
Alexander V. Chernikov
7f948f12f6 Finish r274175: do control plane MTU tracking.
Update route MTU in case of ifnet MTU change.
Add new RTF_FIXEDMTU to track explicitly specified MTU.

Old behavior:
ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU.
User has to manually update all routes.
ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU.
However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu
gets updated.

New behavior:
ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs gets updated
with new MTU unless RTF_FIXEDMTU flag set on them.
ifconfig em0 mtu 9000->1500 -> all routes in all fibs gets updated with new
MTU unless RTF_FIXEDMTU flag set on them AND rt_mtu is less than ifp mtu.

route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag.
route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag.

PR:		194238
MFC after:	1 month
CR:		D1125
2014-11-17 01:05:29 +00:00
Alexander V. Chernikov
ac2cf5d37e Revert r274585: rte lock is properly destroyed in uma dtor callback.
Pointed by:	glebius
2014-11-16 18:15:23 +00:00
Alexander V. Chernikov
3cb04899de Make witness happy: destroy rte lock before free.
MFC after:	2 weeks
2014-11-16 14:56:31 +00:00
Alexander V. Chernikov
57c3556b58 Fix build.
Pointy hat to:	melifaro
2014-11-06 17:50:35 +00:00
Alexander V. Chernikov
146a181f28 Finish r274118: remove useless fields from struct domain.
Sponsored by:	Yandex LLC
2014-11-06 14:39:04 +00:00
Alexander V. Chernikov
1a75e3b20f Make checks for rt_mtu generic:
Some virtual if drivers has (ab)used ifa ifa_rtrequest hook to enforce
route MTU to be not bigger that interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currrently have 3 places where MTU is altered:

1) route addition.
 In this case domain overrides radix _addroute callback (in[6]_addroute)
 and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
 In this case, there are no explicit per-domain calls, but one can
 override rte by setting ifa_rtrequest hook to domain handler
 (inet6 does this).

3) ifconfig ifaceX mtu YYYY
 In this case, we have no callbacks, but ip[6]_output performes runtime
 checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
 control plane, not in runtime part, and properly deal with increased
 interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-doman MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
 if_mtu denotes L3 interface MTU, e.g. maximum trasmitted _packet_ size.
 However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
 for some cases, so we need an abstract way to know maximum MTU size
 for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
  user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
  use this functions on new non-inserted rte.

More changes will follow soon.

MFC after:	1 month
Sponsored by:	Yandex LLC
2014-11-06 13:13:09 +00:00
Hiroki Sato
ee0bd4b909 Make net.add_addr_allfibs vnet-local. 2014-09-21 03:48:20 +00:00
Alan Somers
4f8585e021 Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and
ifa_ifwithdstaddr. For the sake of backwards compatibility, the new
arguments were added to new functions named ifa_ifwithnet_fib and
ifa_ifwithdstaddr_fib, while the old functions became wrappers around the
new ones that passed RT_ALL_FIBS for the fib argument. However, the
backwards compatibility is not desired for FreeBSD 11, because there are
numerous other incompatible changes to the ifnet(9) API. We therefore
decided to remove it from head but leave it in place for stable/9 and
stable/10. In addition, this commit adds the fib argument to
ifa_ifwithbroadaddr for consistency's sake.

sys/sys/param.h
	Increment __FreeBSD_version

sys/net/if.c
sys/net/if_var.h
sys/net/route.c
	Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib
	versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute.

sys/net/route.c
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_options.c
sys/netinet/ip_output.c
sys/netinet6/nd6.c
	Fixup calls of modified functions.

share/man/man9/ifnet.9
	Document changed API.

CR:		https://reviews.freebsd.org/D458
MFC after:	Never
Sponsored by:	Spectra Logic
2014-09-11 20:21:03 +00:00
Hans Petter Selasky
af3b2549c4 Pull in r267961 and r267973 again. Fix for issues reported will follow. 2014-06-28 03:56:17 +00:00
Glen Barber
37a107a407 Revert r267961, r267973:
These changes prevent sysctl(8) from returning proper output,
such as:

 1) no output from sysctl(8)
 2) erroneously returning ENOMEM with tools like truss(1)
    or uname(1)
 truss: can not get etype: Cannot allocate memory
2014-06-27 22:05:21 +00:00
Hans Petter Selasky
3da1cf1e88 Extend the meaning of the CTLFLAG_TUN flag to automatically check if
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.

Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.

MFC after:	2 weeks
Sponsored by:	Mellanox Technologies
2014-06-27 16:33:43 +00:00
Alan Somers
2f308a343f Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr()  The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
	Add legacy-compatible functions as described above.  Ensure legacy
	behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
	Call with _fib() functions if we must use a specific fib, or the
	legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
	Improve the udp_dontroute test.  The bug that this test exercises is
	that ifa_ifwithnet() will return the wrong address, if multiple
	interfaces have addresses on the same subnet but with different
	fibs.  The previous version of the test only considered one possible
	failure mode: that ifa_ifwithnet_fib() might fail to find any
	suitable address at all.  The new version also checks whether
	ifa_ifwithnet_fib() finds the correct address by checking where the
	ARP request goes.

Reported by:	bz, hrs
Reviewed by:	hrs
MFC after:	1 week
X-MFC-with:	264905
Sponsored by:	Spectra Logic
2014-05-29 21:03:49 +00:00
Alexander V. Chernikov
972ed56a33 Remove additional fib checks from rtalloc1_fib.
It looks like current consumers are either unaware
of MRT (and uses RT_DEFAULT_FIB implicitly) or
know what thay are doing, In latter case they
will be either hit by KASSERT or ESCRH will be returned
due to NULL rnh.
2014-05-03 16:38:05 +00:00
Alexander V. Chernikov
b980262e63 Pass radix head ptr along with rte to rtexpunge().
Rename rtexpunge to rt_expunge().
2014-05-03 16:28:54 +00:00
Alexander V. Chernikov
0fb9298db9 Move rt_setmetrics() from rtsock.c to route.c.
All rtsock-initiated rte creation/modification are now
performed in route.c holding radix tree write lock.
This reduces the need for per-rte mutex.

Sponsored by:	Yandex LLC
MFC after:	1 month
2014-04-29 19:14:42 +00:00
Alexander V. Chernikov
a713ee5cf7 Do not use senderr() in rtrequest1_fib_change().
Suggested by:	glebius
MFC after:	4 weeks
2014-04-29 12:52:36 +00:00
Alexander V. Chernikov
f59c6cb0fc Remove useless `register' declarations.
MFC after:	1 month
2014-04-26 22:42:21 +00:00
Alexander V. Chernikov
c77462dd64 Decouple RTM_CHANGE from RTM_GET handling in rtsock.c:route_output().
RTM_CHANGE is now handled inside route.c:rtrequest1_fib() as it should be.
Note change change handler is a separate function rtrequest1_fib_change().

MFC after:	1 month
2014-04-26 21:03:41 +00:00
Alexander V. Chernikov
36d55f0f9d Unify sa_equal() macro usage.
MFC after:	2 weeks
2014-04-26 14:52:03 +00:00