1504 Commits

Author SHA1 Message Date
Alexander V. Chernikov
73d770287d Do more fine-grained lltable locking: use table runtime lock as rarely
as we can.
2014-11-23 15:38:06 +00:00
Alexander V. Chernikov
9479029b1f * Add lltable llt_hash callback
* Move lltable items insertions/deletions to generic llt code.
2014-11-23 12:15:28 +00:00
Alexander V. Chernikov
7c066c18db Use less-invasive approach for IF_AFDATA lock: convert into 2 locks:
use rwlock accessible via external functions
    (IF_AFDATA_CFG_* -> if_afdata_cfg_*()) for all control plane tasks
  use rmlock (IF_AFDATA_RUN_*) for fast-path lookups.
2014-11-22 19:53:36 +00:00
Alexander V. Chernikov
27688dfe1d Temporarily revert r274774. 2014-11-22 17:57:54 +00:00
Alexander V. Chernikov
4194b42144 Another r274774 fix. 2014-11-21 23:37:14 +00:00
Alexander V. Chernikov
86b94cffe4 Finish r274774: add more headers/fix build for non-debug case. 2014-11-21 23:36:21 +00:00
Alexander V. Chernikov
9883e41b4b Switch IF_AFDATA lock to rmlock 2014-11-21 02:28:56 +00:00
Alexander V. Chernikov
4d56c133fb Sync to HEAD@r274766 2014-11-21 01:22:33 +00:00
Alexander V. Chernikov
f9723c7705 Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp
we need to return.
Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags
fields for all relevant functions.
2014-11-20 22:41:59 +00:00
Alexander V. Chernikov
7f948f12f6 Finish r274175: do control plane MTU tracking.
Update route MTU in case of ifnet MTU change.
Add new RTF_FIXEDMTU to track explicitly specified MTU.

Old behavior:
ifconfig em0 mtu 1500->9000 -> all routes traversing em0 do not change MTU.
User has to manually update all routes.
ifconfig em0 mtu 9000->1500 -> all routes traversing em0 do not change MTU.
However, if ip[6]_output finds route with rt_mtu > interface mtu, rt_mtu
gets updated.

New behavior:
ifconfig em0 mtu 1500->9000 -> all interface routes in all fibs get updated
with new MTU unless RTF_FIXEDMTU flag is set on them.
ifconfig em0 mtu 9000->1500 -> all routes in all fibs get updated with new
MTU unless RTF_FIXEDMTU flag is set on them AND rt_mtu is less than ifp mtu.

route add ... -mtu XXX automatically sets RTF_FIXEDMTU flag.
route change .. -mtu 0 automatically removes RTF_FIXEDMTU flag.

PR:		194238
MFC after:	1 month
CR:		D1125
2014-11-17 01:05:29 +00:00
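The "new behavior" rule from commit 7f948f12f6 above can be sketched as a pure function. This is a minimal userland illustration, not the kernel code: the function name, the `SK_RTF_FIXEDMTU` constant, and the parameter layout are assumptions made for the example, and the sketch ignores the distinction the message draws between "interface routes" and "all routes".

```c
#include <assert.h>

/* Hypothetical stand-in for the RTF_FIXEDMTU route flag (value is illustrative). */
#define SK_RTF_FIXEDMTU 0x1

/*
 * Return the MTU a route should carry after the interface MTU changes
 * from old_ifmtu to new_ifmtu, following the rules in the commit message:
 *  - routes without the fixed-MTU flag always follow the interface;
 *  - flagged routes are left alone when the interface MTU grows;
 *  - flagged routes are left alone on shrink only if rt_mtu is already
 *    below the new interface MTU, otherwise they are lowered to it.
 */
unsigned long
sk_route_mtu_after_change(unsigned long rt_mtu, int rt_flags,
    unsigned long old_ifmtu, unsigned long new_ifmtu)
{
	assert(new_ifmtu > 0);

	if ((rt_flags & SK_RTF_FIXEDMTU) == 0)
		return (new_ifmtu);	/* not pinned: track the interface */
	if (new_ifmtu > old_ifmtu)
		return (rt_mtu);	/* pinned, interface MTU increased */
	if (rt_mtu < new_ifmtu)
		return (rt_mtu);	/* pinned and still fits */
	return (new_ifmtu);		/* pinned but too large: lower it */
}
```

For instance, a route pinned at 4000 bytes would drop to 1500 when the interface goes from 9000 to 1500, while a route pinned at 1400 would stay untouched.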
Alexander V. Chernikov
df629abf3e Rework LLE code locking:
* struct llentry is now basically split into 2 pieces:
  all fields within 64 bytes (amd64) are now protected by both
  ifdata lock AND lle lock, i.e. you require both locks to be held
  exclusively for modification. All data necessary for fast path
  operations is kept here. Some fields were added:
  - r_l3addr - makes lookup key live within first 64 bytes.
  - r_flags - flags, containing pre-compiled decision whether given
    lle contains usable data or not. Currently the only flag is RLLE_VALID.
  - r_len - prepend data len, currently unused
  - r_kick - used to provide feedback to control plane (see below).
  All other fields are protected by lle lock.
* Add simple state machine for ARP to handle "about to expire" case:
  Current model (for the fast path) is the following:
  - rlock afdata
  - find / rlock rte
  - runlock afdata
  - see if "expire time" is approaching
    (time_uptime + la->la_preempt > la->la_expire)
  - if true, call arprequest() and decrease la_preempt
  - store MAC and runlock rte
  New model (data plane):
  - rlock afdata
  - find rte
  - check if it can be used using r_* fields only
  - if true, store MAC
  - if r_kick field != 0 set it to 0.
  - runlock afdata
  New model (control plane):
  - schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries)
    seconds instead of V_arpt_keep.
  - on first timer invocation change state from ARP_LLINFO_REACHABLE
    to ARP_LLINFO_VERIFY, set r_kick to 1 and schedule next call in
    V_arpt_rexmit (defaults to 1 sec).
  - on subsequent timer invocations in ARP_LLINFO_VERIFY state, check
    the r_kick value: reschedule if not changed, and send arprequest()
    if set to zero (i.e. entry was used).
* Convert IPv4 path to use new single-lock approach. IPv6 bits to follow.
* Slow down in_arpinput(): now valid reply will (in most cases) require
  acquiring afdata WLOCK twice. This is requirement for storing changed
  lle data. This change will be slightly optimized in future.
* Provide explicit hash link/unlink functions for both ipv4/ipv6 code.
  This will probably be moved to generic lle code once we have per-AF
  hashing callback inside lltable.
* Perform lle unlink on deletion immediately instead of delaying it to
  the timer routine.
* Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the
  presence of lle reference used for safe callout calls.
2014-11-16 20:12:49 +00:00
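The r_kick handshake between data plane and control plane described in commit df629abf3e above can be modelled in a few lines. This is a hedged userland sketch: the `sk_` types and function names are hypothetical, locking is omitted, and what happens after the ARP request is sent (retries, expiry) is not covered by the message and is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified mirror of the two states named in the commit. */
enum sk_state { SK_REACHABLE, SK_VERIFY };

/* What the timer asks its caller to do next. */
enum sk_action { SK_RESCHEDULE, SK_SEND_ARPREQUEST };

struct sk_lle {
	enum sk_state	state;
	int		r_kick;	/* set by timer, cleared by fast path */
};

/* Data plane: the entry was used to send a packet. */
void
sk_fastpath_use(struct sk_lle *lle)
{
	assert(lle != NULL);
	if (lle->r_kick != 0)
		lle->r_kick = 0;	/* tell the control plane "still in use" */
}

/* Control plane: one timer invocation. */
enum sk_action
sk_timer(struct sk_lle *lle)
{
	assert(lle != NULL);
	if (lle->state == SK_REACHABLE) {
		/* First invocation: start watching for traffic. */
		lle->state = SK_VERIFY;
		lle->r_kick = 1;
		return (SK_RESCHEDULE);
	}
	/* SK_VERIFY: r_kick unchanged means the entry was not used. */
	if (lle->r_kick != 0)
		return (SK_RESCHEDULE);
	return (SK_SEND_ARPREQUEST);	/* entry was used: refresh it */
}
```

The point of the design is that the fast path only ever writes one small field, so the "about to expire" decision moves entirely into the timer.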
Alexander V. Chernikov
b4b1367ae4 * Move lle creation/deletion from lla_lookup to separate functions:
  lla_lookup(LLE_CREATE) -> lla_create
  lla_lookup(LLE_DELETE) -> lla_delete
  Assume lla_create to return LLE_EXCLUSIVE lock for lle.
* Rework lla_rt_output to perform all lle changes under afdata WLOCK.
* Change arp_ifscrub() to acquire afdata WLOCK, the same as arp_ifinit().
2014-11-15 18:54:07 +00:00
Andrey V. Elsukov
794a349c6f We don't return sp pointer, thus NULL assignment isn't needed.
And reference to sp will be freed at the end.

MFC after:	1 week
Sponsored by:	Yandex LLC
2014-11-12 22:58:52 +00:00
Alexander V. Chernikov
670e8b3b8c Kill custom in_matroute() radix matching function removing one rte mutex lock.
Initially in_matroute() and in_clsroute() in their current state were introduced by
r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them
in route table, setting RTPRF_OURS flag and some expire time. After that, either
GC came or RTPRF_OURS got removed on first-packet. It was a good solution
in those days (and probably another decade after that) to keep TCP metrics.
However, after moving metrics to TCP hostcache in r122922, most of in_rmx
functionality became unused. It might have been used for flushing icmp-originated
routes before rte mutexes/refcounting, but I'm not sure about that.

So it looks like it is nearly impossible to make GC do its work nowadays:

in_rtkill() ignores non-RTPRF_OURS routes.
route can only become RTPRF_OURS after dropping last reference via rtfree()
which calls in_clsroute(), which, in turn, ignores UP and non-RTF_DYNAMIC routes.

Dynamic routes can still be installed via received redirect, but they
have default lifetime (no specific rt_expire) and no one has another trie walker
to call RTFREE() on them.

So, the changelist:
* remove custom rnh_match / rnh_close matching function.
* remove all GC functions
* partially revert r256695 (proto3 is no longer used inside kernel,
  it is not possible to use rt_expire from user point of view, proto3 support
  is not complete)
* Finish r241884 (similar to this commit) and remove remaining IPv6 parts

MFC after:	1 month
2014-11-11 02:52:40 +00:00
Andrey V. Elsukov
002c24396d Add sa6_checkzone_ifp() function. It checks correctness of struct
sockaddr_in6, usually obtained from the user level through ioctl.
It initializes sin6_scope_id using given interface.

Sponsored by:	Yandex LLC
2014-11-10 16:12:51 +00:00
Alexander V. Chernikov
e0c0711e01 * Make nd6_dad_duplicated() constant.
* Simplify refcounting by using nd6_dad_add() / nd6_dad_del().

Reviewed by:	ae
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-11-10 16:01:39 +00:00
Andrey V. Elsukov
06fec20791 Remove link-local multicast routes remnants from in6_purgeaddr.
Also merge in6_purgeaddr_mc with in6_purgeaddr.

Sponsored by:	Yandex LLC
2014-11-10 16:01:31 +00:00
Gleb Smirnoff
e6abaf91f4 Consistently use if_link.
Reviewed by:	ae, melifaro
2014-11-10 15:56:30 +00:00
Andrey V. Elsukov
45d1880a36 For now handle only multicast addresses, we still use routes to
LLA unicasts.

Sponsored by:	Yandex LLC
2014-11-10 10:59:08 +00:00
Alexander V. Chernikov
f7bab8d0dd Switch route radix to dual-lock model:
use rmlock for data path access, and config rwlock
for control plane processing. Route table changes require
both locks held.
2014-11-10 00:07:06 +00:00
Andrey V. Elsukov
ea455de91d Use embedded scope zone id to determine outgoing interface for link-local
and node-local addresses.
2014-11-09 22:54:40 +00:00
Alexander V. Chernikov
36f34ac70b Fix nd6_output_flush() prototype.
Remove 'net/route_internal.h' header from stf.
2014-11-09 22:16:50 +00:00
Alexander V. Chernikov
603eaf792b Remove faith(4) and faithd(8) from base. It looks like industry
has chosen different (and more traditional) stateless/stateful
NAT64 as translation mechanism. Last non-trivial commits to both
faith(4) and faithd(8) happened more than 12 years ago, so I assume
it is time to drop RFC3142 in FreeBSD.

No objections from:	net@
2014-11-09 21:33:01 +00:00
Alexander V. Chernikov
d0f9fca40d Remove forgotten arguments. 2014-11-09 16:57:31 +00:00
Alexander V. Chernikov
033074c440 Replace 'struct route *' if_output() argument with 'struct nhop_info *'.
Leave 'struct route' as is for legacy routing api users.
Remove most of rtalloc_ign*-derived functions.
2014-11-09 16:33:04 +00:00
Alexander V. Chernikov
9c9bde01d1 Remove unused 'struct route *' argument from nd6_output_flush(). 2014-11-09 16:20:27 +00:00
Alexander V. Chernikov
55e5eda676 Separate radix and routing: use different structures for route and
for other customers.

Introduce new 'struct rib_head' for routing purposes and make
all routing api use it.
2014-11-09 00:36:39 +00:00
Andrey V. Elsukov
3e88eb903b Remove ip6_getdstifaddr() and all functions to work with auxiliary data.
It isn't safe to keep unreferenced ifaddrs. Use in6ifa_ifwithaddr() to
determine ifaddr corresponding to destination address. Since currently
we keep addresses with embedded scope zone, in6ifa_ifwithaddr is called
with zero zoneid and marked with XXX.

Also remove route and lle lookups from ip6_input. Use in6ifa_ifwithaddr()
instead.

Sponsored by:	Yandex LLC
2014-11-08 19:38:34 +00:00
Alexander V. Chernikov
a9413f6ca0 Sync to HEAD@r274297. 2014-11-08 18:13:35 +00:00
Alexander V. Chernikov
1398ffe5bc Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
to use new rt_foreach_fib() instead of hand-rolling cycles.
2014-11-08 16:38:15 +00:00
Alexander V. Chernikov
3939f50c88 Finish r274290#2: remove unused IPv6 code. 2014-11-08 16:31:11 +00:00
Alexander V. Chernikov
22b08fd8b7 Split radix implementation and system route table structure:
use new 'struct radix_head' for radix.
2014-11-07 22:52:02 +00:00
Andrey V. Elsukov
f325335caf Overhaul if_gre(4).
Split it into two modules: if_gre(4) for GRE encapsulation and
if_me(4) for minimal encapsulation within IP.

gre(4) changes:
* convert to if_transmit;
* rework locking: protect access to softc with rmlock,
  protect from concurrent ioctls with sx lock;
* correct interface accounting for outgoing datagrams (count only payload size);
* implement generic support for using IPv6 as delivery header;
* make implementation conform to the RFC 2784 and partially to RFC 2890;
* add support for GRE checksums - calculate for outgoing datagrams and check
  for incoming datagrams;
* add support for sending sequence number in GRE header;
* remove support of cached routes. This fixes problem, when gre(4) doesn't
  work at system startup. But this also removes support for having tunnels with
  the same addresses for inner and outer header.
* deprecate support for various GREXXX ioctls, that are not used in FreeBSD.
  Use our standard ioctls for tunnels.

me(4):
* implementation conforms to RFC 2004;
* use if_transmit;
* use the same locking model as gre(4);

PR:		164475
Differential Revision:	D1023
No objections from:	net@
Relnotes:	yes
Sponsored by:	Yandex LLC
2014-11-07 19:13:19 +00:00
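The GRE checksum mentioned in commit f325335caf above is, per RFC 2784, the standard Internet checksum (RFC 1071) taken over the GRE header and payload with the checksum field zeroed first. The following is a minimal userland sketch of that checksum only; the function name is hypothetical and this is not the if_gre(4) implementation, which works on mbuf chains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement Internet checksum (RFC 1071) over a flat buffer.
 * Bytes are combined as big-endian 16-bit words; an odd trailing byte
 * is padded with a zero byte. The result is returned as a host-order
 * integer holding the 16-bit value to store in network byte order.
 */
uint16_t
sk_inet_checksum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;	/* odd trailing byte */

	/* Fold carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return ((uint16_t)~sum);
}
```

A receiver can verify a packet by running the same function over the data with the transmitted checksum left in place: a result of zero means the checksum matches.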
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Gleb Smirnoff
428cf06b31 Remove VNET_SYSCTL_ARG(). The generic sysctl(9) code handles that.
Reviewed by:	ae
Sponsored by:	Nginx, Inc.
2014-11-07 08:58:05 +00:00
Alexander V. Chernikov
064b1bdb2d Convert lle rtchecks to use new routing API.
For inet/ case, this involves reverting r225947
which seems to be a pretty strange commit and should
be reverted in HEAD as well.
2014-11-06 23:35:22 +00:00
Alexander V. Chernikov
146a181f28 Finish r274118: remove useless fields from struct domain.
Sponsored by:	Yandex LLC
2014-11-06 14:39:04 +00:00
Alexander V. Chernikov
1a75e3b20f Make checks for rt_mtu generic:
Some virtual if drivers have (ab)used ifa ifa_rtrequest hook to enforce
route MTU to be not bigger than interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currently have 3 places where MTU is altered:

1) route addition.
 In this case domain overrides radix _addroute callback (in[6]_addroute)
 and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
 In this case, there are no explicit per-domain calls, but one can
 override rte by setting ifa_rtrequest hook to domain handler
 (inet6 does this).

3) ifconfig ifaceX mtu YYYY
 In this case, we have no callbacks, but ip[6]_output performs runtime
 checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
 control plane, not in runtime part, and properly deal with increased
 interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-domain MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
 if_mtu denotes L3 interface MTU, e.g. maximum transmitted _packet_ size.
 However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
 for some cases, so we need an abstract way to know maximum MTU size
 for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
  user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
  use these functions on new non-inserted rte.

More changes will follow soon.

MFC after:	1 month
Sponsored by:	Yandex LLC
2014-11-06 13:13:09 +00:00
Alexander V. Chernikov
9f25cbe45e Remove old hack abusing domattach from NFS code.
According to IANA RPC uaddr registry, there are no AFs
except IPv4 and IPv6, so it's not worth being too abstract here.

Remove ne_rtable[AF_MAX+1] and use explicit per-AF radix tries.
Use own initialization without relying on domattach code.

While I admit that this was one of the rare places in kernel
networking code which really was capable of doing multi-AF
without any AF-dependent code, it is not possible anymore to
rely on dom* code.

While here, change terrifying "Invalid radix node head, rn:" message,
to different non-understandable "netcred already exists for given addr/mask",
but less terrifying. Since we know that rn_addaddr() returns NULL if
the same record already exists, we should provide more friendly error.

MFC after:	1 month
2014-11-05 00:58:01 +00:00
Alexander V. Chernikov
69b74805d5 Convert gif and stf to use new routing api. 2014-11-04 18:48:13 +00:00
Alexander V. Chernikov
5c9ef37854 Sync to HEAD@r274095. 2014-11-04 18:22:33 +00:00
Alexander V. Chernikov
8c3cfe0be0 Hide 'struct rtentry' and all its macros inside new header:
net/route_internal.h
The goal is to make it opaque for all code except route/rtsock and
proto domain _rmx.
2014-11-04 17:28:13 +00:00
Alexander V. Chernikov
a9ac00b76b Convert in6p_lookup_mcast_ifp() to use new routing api.
* Add special fib6_lookup_nh_ifp() to return rt_ifp
  instead of rt_ifa->ifa_ifp for that.
2014-11-04 17:05:24 +00:00
Alexander V. Chernikov
257480b8ab Convert netinet6/ to use new routing API.
* Remove &ifpp from ip6_output() in favor of ri->ri_nh_info
* Provide different wrappers to in6_selectsrc:
  Currently it is used by 2 different types of customers:
  - socket-based ones, which are all unsure about the provided
    address scope and
  - in-kernel ones (ND code mostly), which don't have
    any sockets, options, credentials, etc.
  So, we provide two different wrappers to in6_selectsrc()
  returning the selected source.
* Make different versions of selectroute():
  Currently selectroute() is used in two scenarios:
  - SAS, via in6_selectsrc() -> in6_selectif() -> selectroute()
  - output, via in6_output -> wrapper -> selectroute()
  Provide different versions for each customer:
  - fib6_lookup_nh_basic()-based in6_selectif() which is
    capable of returning interface only, without MTU/NHOP/L2
    calculations
  - full-blown fib6_selectroute() with cached route/multipath/
    MTU/L2
* Stop using routing table for link-local address lookups
* Add in6_ifawithifp_lla() to make for-us check faster for link-local
* Add in6_splitscope / in6_setllascope for faster embed/deembed scopes
2014-11-04 15:39:56 +00:00
Hiroki Sato
da1304cb42 Fix a bug which prevented ND6_IFF_IFDISABLED flag from clearing when
the newly-added IPv6 address was /128.

PR:	188032
2014-11-02 21:58:31 +00:00
Andrey V. Elsukov
94a43496c2 Remove redundant code.
if_detach already did these steps. Also, we no longer keep routes to link-local
addresses.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-10-30 12:44:46 +00:00
Andrey V. Elsukov
3c268b3afc Move ifq drain into in6m_purge().
Suggested by:	bms
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-10-30 11:34:07 +00:00
Andrey V. Elsukov
8ff1eae10d Fix mbuf leak in IPv6 multicast code.
When a multicast capable interface goes away, it leaves multicast groups,
which generates MLD reports, but MLD code does deferred send and
MLD reports are queued in the in6_multi's in6m_scq ifq. The problem is
that in6_multi structures are freed when interface leaves multicast groups
and thread that does deferred send will not take these queued packets.

PR:		194577
MFC after:	1 week
Sponsored by:	Yandex LLC
2014-10-30 10:59:57 +00:00
Andrey V. Elsukov
c56173a626 Do not automatically install routes to link-local and interface-local multicast
addresses.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2014-10-27 16:15:15 +00:00
Andrey V. Elsukov
8e4bdfa2db Remove unused function.
Sponsored by:	Yandex LLC
2014-10-27 10:34:09 +00:00