Commit Graph

303 Commits

Author SHA1 Message Date
Alexander V. Chernikov
9479029b1f * Add lltable llt_hash callback
* Move lltable items insertions/deletions to generic llt code.
2014-11-23 12:15:28 +00:00
Alexander V. Chernikov
7c066c18db Use less-invasive approach for IF_AFDATA lock: convert into 2 locks:
use rwlock accessible via external functions
    (IF_AFDATA_CFG_* -> if_afdata_cfg_*()) for all control plane tasks
  use rmlock (IF_AFDATA_RUN_*) for fast-path lookups.
2014-11-22 19:53:36 +00:00
Alexander V. Chernikov
27688dfe1d Temporarily revert r274774. 2014-11-22 17:57:54 +00:00
Alexander V. Chernikov
8f465f6690 Convert &in_ifaddr_lock to dual-locking model:
use rwlock accessible via external functions
   (IN_IFADDR_CFG_* -> in_ifaddr_cfg_*()) for all control plane tasks
  use rmlock (IN_IFADDR_RUN_*) for fast-path lookups.
2014-11-22 16:27:51 +00:00
Alexander V. Chernikov
86b94cffe4 Finish r274774: add more headers/fix build for non-debug case. 2014-11-21 23:36:21 +00:00
Alexander V. Chernikov
9883e41b4b Switch IF_AFDATA lock to rmlock 2014-11-21 02:28:56 +00:00
Alexander V. Chernikov
f9723c7705 Simplify API: use new NHOP_LOOKUP_AIFP flag to select what ifp
we need to return.
Rename fib[64]_lookup_nh_basic to fib[64]_lookup_nh, add flags
fields for all relevant functions.
2014-11-20 22:41:59 +00:00
Alexander V. Chernikov
df629abf3e Rework LLE code locking:
* struct llentry is now basically split into 2 pieces:
  all fields within 64 bytes (amd64) are now protected by both
  ifdata lock AND lle lock, e.g. you require both locks to be held
  exclusively for modification. All data necessary for fast path
  operations is kept here. Some fields were added:
  - r_l3addr - makes lookup key liev within first 64 bytes.
  - r_flags - flags, containing pre-compiled decision whether given
    lle contains usable data or not. Current the only flag is RLLE_VALID.
  - r_len - prepend data len, currently unused
  - r_kick - used to provide feedback to control plane (see below).
  All other fields are protected by lle lock.
* Add simple state machine for ARP to handle "about to expire" case:
  Current model (for the fast path) is the following:
  - rlock afdata
  - find / rlock rte
  - runlock afdata
  - see if "expire time" is approaching
    (time_uptime + la->la_preempt > la->la_expire)
  - if true, call arprequest() and decrease la_preempt
  - store MAC and runlock rte
  New model (data plane):
  - rlock afdata
  - find rte
  - check if it can be used using r_* fields only
  - if true, store MAC
  - if r_kick field != 0 set it to 0.
  - runlock afdata
  New mode (control plane):
  - schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries)
    seconds instead of V_arpt_keep.
  - on first timer invocation change state from ARP_LLINFO_REACHABLE
    to ARP_LLINFO_VERIFY, sets r_kick to 1 and shedules next call in
    V_arpt_rexmit (default to 1 sec).
  - on subsequent timer invocations in ARP_LLINFO_VERIFY state, checks
    for r_kick value: reschedule if not changed, and send arprequest()
    if set to zero (e.g. entry was used).
* Convert IPv4 path to use new single-lock approach. IPv6 bits to follow.
* Slow down in_arpinput(): now valid reply will (in most cases) require
  acquiring afdata WLOCK twice. This is requirement for storing changed
  lle data. This change will be slightly optimized in future.
* Provide explicit hash link/unlink functions for both ipv4/ipv6 code.
  This will probably be moved to generic lle code once we have per-AF
  hashing callback inside lltable.
* Perform lle unlink on deletion immediately instead of delaying it to
  the timer routine.
* Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the
  presence of lle reference used for safe callout calls.
2014-11-16 20:12:49 +00:00
Alexander V. Chernikov
b4b1367ae4 * Move lle creation/deletion from lla_lookup to separate functions:
lla_lookup(LLE_CREATE) -> lla_create
  lla_lookup(LLE_DELETE) -> lla_delete
  Assume lla_create to return LLE_EXCLUSIVE lock for lle.
* Rework lla_rt_output to perform all lle changes under afdata WLOCK.
* change arp_ifscrub() ackquire afdata WLOCK, the same as arp_ifinit().
2014-11-15 18:54:07 +00:00
Alexander V. Chernikov
a9413f6ca0 Sync to HEAD@r274297. 2014-11-08 18:13:35 +00:00
Gleb Smirnoff
6df8a71067 Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.
Sponsored by:	Nginx, Inc.
2014-11-07 09:39:05 +00:00
Alexander V. Chernikov
064b1bdb2d Convert lle rtchecks to use new routing API.
For inet/ case, this involves reverting r225947
which seem to be pretty strange commit and should
be reverted in HEAD ad well.
2014-11-06 23:35:22 +00:00
Alexander V. Chernikov
8c3cfe0be0 Hide 'struct rtentry' and all its macro inside new header:
net/route_internal.h
The goal is to make its opaque for all code except route/rtsock and
proto domain _rmx.
2014-11-04 17:28:13 +00:00
Hiroki Sato
cc45ae406d Add a change missing in r271916. 2014-09-21 04:38:50 +00:00
Xin LI
a7f77a3950 Restore historical behavior of in_control, which, when no matching address
is found, the first usable address is returned for legacy ioctls like
SIOCGIFBRDADDR, SIOCGIFDSTADDR, SIOCGIFNETMASK and SIOCGIFADDR.

While there also fix a subtle issue that a caller from a jail asking for
INADDR_ANY may get the first IP of the host that do not belong to the jail.

Submitted by:	glebius
Differential Revision: https://reviews.freebsd.org/D667
2014-08-22 19:08:12 +00:00
Steven Hartland
5af464bbe0 Ensure that IP's added to CARP always use the CARP MAC
Previously there was a race condition between the address addition
and associating it with the CARP which resulted in the interface
MAC, instead of the CARP MAC, being used for a brief amount of time.

This caused "is using my IP address" warnings as well as data being
sent to the wrong machine due to incorrect ARP entries being recorded
by other devices on the network.
2014-07-31 16:43:56 +00:00
Steven Hartland
d34165f759 Only check error if one could have been generated 2014-07-31 09:18:29 +00:00
Gleb Smirnoff
9753faf553 Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric
2014-07-29 15:01:29 +00:00
Alan Somers
7278b62aee Fix a panic when removing an IP address from an interface, if the same address
exists on another interface.  The panic was introduced by change 264887, which
changed the fibnum parameter in the call to rtalloc1_fib() in
ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS.  The solution
is to use the interface fib in that call.  For the majority of users, that will
be equivalent to the legacy behavior.

PR:		kern/189089
Reported by:	neel
Reviewed by:	neel
MFC after:	3 weeks
X-MFC with:	264887
Sponsored by:	Spectra Logic
2014-04-29 14:46:45 +00:00
Alan Somers
0cfee0c223 Fix subnet and default routes on different FIBs on the same subnet.
These two bugs are closely related.  The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
	Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr.  Those
	functions will only return an address whose interface fib equals the
	argument.

sys/net/route.c
	Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
	arguments.

sys/netinet/in.c
	Update in_addprefix to consider the interface fib when adding
	prefixes.  This will prevent it from not adding a subnet route when
	one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
	Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
	In some cases it there wasn't a clear specific fib number to use.
	In others, I was unable to test those functions so I chose
	RT_DEFAULT_FIB to minimize divergence from current behavior.  I will
	fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
	Revert r263738.  The udp_dontroute test was right all along.
	However, bugs kern/187550 and kern/187553 cancelled each other out
	when it came to this test.  Because of kern/187553, ifa_ifwithnet
	searched the default fib instead of the requested one, but because
	of kern/187550, there was an applicable subnet route on the default
	fib.  The new test added in r263738 doesn't work right, however.  I
	can verify with dtrace that ifa_ifwithnet returned the wrong address
	before I applied this commit, but route(8) miraculously found the
	correct interface to use anyway.  I don't know how.

	Clear expected failure messages for kern/187550 and kern/187552.

PR:		kern/187550
PR:		kern/187552
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic
2014-04-24 23:56:56 +00:00
Alan Somers
0489b8916e Fix host and network routes for new interfaces when net.add_addr_allfibs=0
sys/net/route.c
	In rtinit1, use the interface fib instead of the process fib.  The
	latter wasn't very useful because ifconfig(8) is usually invoked
	with the default process fib.  Changing ifconfig(8) to use setfib(2)
	would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
	Clear the expected ATF failure

sys/net/if.c
	Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
	Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
	in_scrubprefix.  Pass it the interface fib.

PR:		kern/187549
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-04-24 17:23:16 +00:00
Kevin Lo
e06e816f67 Add support for UDP-Lite protocol (RFC 3828) to IPv4 and IPv6 stacks.
Tested with vlc and a test suite [1].

[1] http://www.erg.abdn.ac.uk/~gerrit/udp-lite/files/udplite_linux.tar.gz

Reviewed by:	jhb, glebius, adrian
2014-04-07 01:53:03 +00:00
Alan Somers
743c072a09 Correct ARP update handling when the routes for network interfaces are
restricted to a single FIB in a multifib system.

Restricting an interface's routes to the FIB to which it is assigned (by
setting net.add_addr_allfibs=0) causes ARP updates to fail with "arpresolve:
can't allocate llinfo for x.x.x.x".  This is due to the ARP update code hard
coding it's lookup for existing routing entries to FIB 0.

sys/netinet/in.c:
	When dealing with RTM_ADD (add route) requests for an interface, use
	the interface's assigned FIB instead of the default (FIB 0).

sys/netinet/if_ether.c:
	In arpresolve(), enhance error message generated when an
	lla_lookup() fails so that the interface causing the error is
	visible in logs.

tests/sys/netinet/fibs_test.sh
	Clear ATF expected error.

PR:		kern/167947
Submitted by:	Nikolay Denev <ndenev@gmail.com> (previous version)
Reviewed by:	melifaro
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corporation
2014-03-26 22:46:03 +00:00
Alexander V. Chernikov
a49b317c41 Fix refcount leak on netinet ifa.
Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-01-16 12:35:18 +00:00
Alexander V. Chernikov
d375edc9b5 Simplify inet alias handling code: if we're adding/removing alias which
has the same prefix as some other alias on the same interface, use
newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg().

This eliminates the following rtsock messages:

Pinned RTM_ADD for prefix (for alias addition).
Pinned RTM_DELETE for prefix (for alias withdrawal).

Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24):

before commit, addition:

  got message of size 116 on Fri Jan 10 14:13:15 2014
  RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

  got message of size 192 on Fri Jan 10 14:13:15 2014
  RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
  locks:  inits:
  sockaddrs: <DST,GATEWAY,NETMASK>
   10.0.0.0 10.0.0.2 (255) ffff ffff ff

after commit, addition:

  got message of size 116 on Fri Jan 10 13:56:26 2014
  RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255

before commit, wihdrawal:

  got message of size 192 on Fri Jan 10 13:58:59 2014
  RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
  locks:  inits:
  sockaddrs: <DST,GATEWAY,NETMASK>
   10.0.0.0 10.0.0.2 (255) ffff ffff ff

  got message of size 116 on Fri Jan 10 13:58:59 2014
  RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

adter commit, withdrawal:

  got message of size 116 on Fri Jan 10 14:14:11 2014
  RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
  sockaddrs: <NETMASK,IFP,IFA,BRD>
   255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong
(and requires some hacks to keep prefix in route table on RTM_DELETE).

I've tested this change with quagga (no change) and bird (*).

bird alias handling is already broken in *BSD sysdep code, so nothing
changes here, too.

I'm going to MFC this change if there will be no complains about behavior
change.

While here, fix some style(9) bugs introduced by r260488
(pointed by glebius and bde).

Sponsored by:	Yandex LLC
MFC after:	4 weeks
2014-01-10 12:13:55 +00:00
Andrey V. Elsukov
e2d14d9317 Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
LLE_CREATE flag.

MFC after:	1 week
2014-01-03 02:32:05 +00:00
Gleb Smirnoff
9706c950a2 Fix couple of bugs from r257692 related to scan of address list on
an interface:
- in in_control() skip over not AF_INET addresses.
- in in_aifaddr_ioctl() and in_difaddr_ioctl() do correct check
  of address family, w/o accessing memory beyond struct ifaddr.

Sponsored by:	Nginx, Inc.
2013-12-29 22:20:06 +00:00
Gleb Smirnoff
c1f7c3f500 In r257692 I intentionally deleted code that handled P2P interfaces
with equal addresses on both sides. It appeared that OpenVPN uses
such configutations.

Submitted by:	trociny
2013-11-17 15:14:07 +00:00
Gleb Smirnoff
555036b5f6 Remove never used ioctls that originate from KAME. The proof
of their zero usage was exp-run from misc/183538.
2013-11-11 05:39:42 +00:00
Gleb Smirnoff
77b89ad837 Provide compat layer for OSIOCAIFADDR. 2013-11-06 19:46:20 +00:00
Gleb Smirnoff
821b5caf7a Fix my braino in r257692. For SIOCG*ADDR we don't need exact match on
specified address, actually in most cases the address isn't specified.

Reported by:	peter
2013-11-06 08:36:08 +00:00
Nathan Whitehorn
6224cd89c0 Fix build on GCC. 2013-11-06 01:14:00 +00:00
Gleb Smirnoff
f7a39160c2 Rewrite in_control(), so that it is comprehendable without getting mad.
o Provide separate functions for SIOCAIFADDR and for SIOCDIFADDR, with
  clear code flow from beginning to the end. After that the rest of
  in_control() gets very small and clear.
o Provide sx(9) lock to protect against parallel ioctl() invocations.
o Reimplement logic from r201282, that tried to keep localhost route in
  table when multiple P2P interfaces with same local address are created
  and deleted.

Discussed with:		pluknet, melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2013-11-05 07:44:15 +00:00
Gleb Smirnoff
b1b9dcae46 Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented at incorrect manual page arp(8). Since new-ARP commit,
the tunable have been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with tunable turned off is not fully correct,
and has no advantages comparing to normal behavior.
2013-11-05 07:32:09 +00:00
Gleb Smirnoff
237bf7f773 Cleanup in_ifscrub(), which is just an entry to in_scrubprefix(). 2013-11-01 10:18:41 +00:00
Gleb Smirnoff
c3322cb91c Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-28 07:29:16 +00:00
Gleb Smirnoff
4675896098 Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 10:31:42 +00:00
Andrey V. Elsukov
5b7cb97c2b Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.
2013-07-09 09:50:15 +00:00
Oleg Bulyzhin
1571132f14 Plug static llentry leak (ipv4 & ipv6 were affected).
PR:		kern/172985
MFC after:	1 month
2013-04-21 21:28:38 +00:00
Gleb Smirnoff
9711a168b9 Retire struct sockaddr_inarp.
Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.

New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.

The structure declaration is left for compatibility with
third party software, but in tree code no longer use it.

Reviewed by:	ru, andre, net@
2013-01-31 08:55:21 +00:00
Peter Wemm
8a1163e82f Temporarily revert rev 244678. This is causing loopback problems with
the lo (loopback) interfaces.
2013-01-03 10:21:28 +00:00
Gleb Smirnoff
468e45f3bd The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify
all interested parties in case if interface flag IFF_UP has changed.

  However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.

  This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.

P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected instead
of pretended to be supported.
2012-12-25 13:01:58 +00:00
Gleb Smirnoff
3e6c8b5366 Minor style(9) changes:
- Remove declaration in initializer.
- Add empty line between logical blocks.
2012-12-24 21:35:48 +00:00
Randall Stewart
7db496de2c Though I disagree, I conceed to jhb & Rui. Note
that we still have a problem with this whole structure of
locks and in_input.c [it does not lock which it should not, but
this *can* lead to crashes]. (I have seen it in our SQA
testbed.. besides the one with a refcnt issue that I will
have SQA work on next week ;-)
2012-08-19 11:54:02 +00:00
Randall Stewart
9424879158 Ok jhb, lets move the ifa_free() down to the bottom to
assure that *all* tables and such are removed before
we start to free. This won't protect the Hash in ip_input.c
but in theory should protect any other uses that *do* use locks.

MFC after:	1 week (or more)
2012-08-17 05:51:46 +00:00
Randall Stewart
184749821f Its never a good idea to double free the same
address.

MFC after:	1 week (after the other commits ahead of this gets MFC'd)
2012-08-16 17:55:16 +00:00
Gleb Smirnoff
ea53792942 Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
  disestablish them.
  - This allows us to simplify the arptimer() and make it
    race safe.
o Consistently use ifp->if_afdata_lock to lock access to
  linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
  is attached to the hash.
  - Use LLE_LINKED to avoid double unlinking via consequent
    calls to llentry_free().
  - Mark lle with LLE_DELETED via |= operation istead of =,
    so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
  consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR:		kern/165863
Submitted by:	Andrey Zonov <andrey zonov.org>
Submitted by:	Ryan Stone <rysto32 gmail.com>
Submitted by:	Eric van Gyzen <eric_van_gyzen dell.com>
2012-08-02 13:57:49 +00:00
Gleb Smirnoff
b9aee262e5 Some more whitespace cleanup. 2012-08-01 09:00:26 +00:00
Gleb Smirnoff
ea50c13ebe Some style(9) and whitespace changes.
Together with:	Andrey Zonov <andrey zonov.org>
2012-07-31 11:31:12 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
Gleb Smirnoff
90b357f6ec M_DONTWAIT is a flag from historical mbuf(9)
allocator, not malloc(9) or uma(9) flag.
2012-04-10 06:52:39 +00:00
Kip Macy
a93cda789a When using flowtable llentrys can outlive the interface with which they're associated
at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.

Move the free pointer in to the llentry itself and update the initalization sites.

MFC after:	2 weeks
2012-02-23 18:21:37 +00:00
Bjoern A. Zeeb
81d5d46b3c Add multi-FIB IPv6 support to the core network stack supplementing
the original IPv4 implementation from r178888:

- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
  the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
  where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
  prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
  ease MFCs.  Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
  currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
  are locked to the default FIB.  Individual IPv6 addresses will only
  appear in the default FIB, however redirect information and prefixes
  of connected subnets are automatically propagated to all FIBs by
  default (mimicking IPv4 behavior as closely as possible).

Sponsored by:	Cisco Systems, Inc.
2012-02-03 13:08:44 +00:00
Gleb Smirnoff
56cf9dc1f6 Drop support for SIOCSIFADDR, SIOCSIFNETMASK, SIOCSIFBRDADDR, SIOCSIFDSTADDR
ioctl commands.

PR:		163524
Reviewed by:	net
2012-01-16 09:53:24 +00:00
John Baldwin
137f91e80f Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by:	bz
MFC after:	2 weeks
2012-01-05 19:00:36 +00:00
John Baldwin
0f188ebb1d Use a helper variable to wrap a long line. 2012-01-04 13:29:26 +00:00
John Baldwin
56d6e1292d In the handling of the SIOC[DG]LIFADDR icotls in in_lifaddr_ioctl(), add
missing interface address list locking and grab a reference on the
matching interface address after dropping the lock while it is used to
avoid a potential use after free.

Reviewed by:	bz
MFC after:	1 week
2012-01-04 13:26:56 +00:00
John Baldwin
0823c29b81 Fix the SIOC[DG]LIFADDR ioctls in in_lifaddr_ioctl() to work with IPv4
interface address rather than IPv6.

Submitted by:	hrs
Reviewed by:	bz
MFC after:	1 week
2012-01-04 13:23:51 +00:00
Gleb Smirnoff
7121247312 Provide ABI compatibility shim to enable configuring of addresses
with ifconfig(8) prior to r228571.

Requested by:	brooks
2011-12-21 12:39:08 +00:00
Gleb Smirnoff
92ed4e1a24 Since size of struct in_aliasreq has just been changed in r228571,
and thus ifconfig(8) needs recompile, it is a good chance to make
parameter checks on SIOCAIFADDR arguments more strict.
2011-12-16 13:30:17 +00:00
Gleb Smirnoff
08b68b0e4c A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR:		kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by:	bz
Submitted by:	bz [1]
2011-12-16 12:16:56 +00:00
Gleb Smirnoff
55174c34ef Belatedly catch up with r151555. in_scrubprefix() also needs this fix. We
should compare not only addresses, but their masks, too, when searching
for matching prefix.
2011-12-13 06:56:43 +00:00
Gleb Smirnoff
f769e5b0fa Fix a very special case when SIOCAIFADDR supplies mask of 0.0.0.0,
don't overwrite the mask with autoguessing based on classes.
2011-12-06 20:55:20 +00:00
Gleb Smirnoff
89b9325530 Fix one more fallout from r227791: do not overwrite trimmed sa_len
on the ia_sockmask when doing SIOCSIFNETMASK.

Reported by:	Stefan Bethke <stb lassitu.de>, gonzo
Pointy hat to:	glebius
2011-11-28 13:30:14 +00:00
Gleb Smirnoff
c6e5c71116 Remove superfluous check: SIOCAIFADDR must have ifra_addr supplied. 2011-11-24 22:46:11 +00:00
Gleb Smirnoff
bd47ae58a6 Fix stupid typo in r227830.
PR:		162806
Pointy hat to:	glebius
2011-11-24 22:43:48 +00:00
Gleb Smirnoff
e278f44bb5 style(9) nit 2011-11-22 19:39:27 +00:00
Gleb Smirnoff
bbaa3f944e Fix SIOCDIFADDR semantics: if no address is specified, then delete first one. 2011-11-22 19:37:57 +00:00
Gleb Smirnoff
cf00e5c6b7 This check isn't needed now, sanity checking done in the beginning.
Missed it in last commit.
2011-11-21 20:07:12 +00:00
Gleb Smirnoff
6d00fd9c2d Historically in_control() did not check sockaddrs supplied with
structs ifreq/in_aliasreq and there've been several panics due
to that problem. All these panics were fixed just a couple of
lines above the panicing code.

Take a more general approach: sanity check sockaddrs supplied
with SIOCAIFADDR and SIOCSIF*ADDR at the beggining of the
function and drop all checks below.

One check is now disabled due to strange code in ifconfig(8)
that I've removed recently. I'm going to enable it with next
__FreeBSD_version bump.

Historically in_ifinit() was able to recover from an error
and restore old address. Nowadays this feature isn't working
for all error cases, but for some of them. I suppose no software
relies on this behavior, so I'd like to remove it, since this
simplifies code a lot.

Also, move if_scrub() earlier in the in_ifinit(). It is more
correct to wipe routes before removing address from local
address list, and interface address list.

Silence from:	bz, brooks, andre, rwatson, 3 weeks
2011-11-21 14:10:13 +00:00
Qing Li
b3664a14cc Exclude host routes when checking for prefix coverage on multiple
interfaces. A host route has a NULL mask so check for that condition.
I have also been told by developers who customize the packet output
path with direct manipulation of the route entry (or the outgoing
interface to be specific). This patch checks for the route mask
explicitly to make sure custom code will not panic.

PR:		kern/161805
MFC after:	3 days
2011-10-25 04:06:29 +00:00
Gleb Smirnoff
53883e0c24 Add support for IPv4 /31 prefixes, as described in RFC3021.
To run a /31 network, participating hosts MUST drop support
for directed broadcasts, and treat the first and last addresses
on subnet as unicast. The broadcast address for the prefix
should be the link local broadcast address, INADDR_BROADCAST.
2011-10-15 18:41:25 +00:00
Gleb Smirnoff
b365d954cc Remove last remnants of classful addressing:
- Remove ia_net, ia_netmask, ia_netbroadcast from struct in_ifaddr.
- Remove net.inet.ip.subnetsarelocal, I bet no one need it in 2011.
- fix bug when we were not forwarding to a host which matches classful
  net address. For example router having 192.168.x.y/16 network attached,
  would not forward traffic to 192.168.*.0, which are legal IPs in
  CIDR world.
- For compatibility, leave autoguessing of mask based on class.

Reviewed by:	andre, bz, rwatson
2011-10-15 16:28:06 +00:00
Gleb Smirnoff
a0b5928b29 De-spl(9). 2011-10-13 13:30:41 +00:00
Qing Li
15d2521975 All indirect routes will fail the rtcheck, except for a special host
route where the destination IP and the gateway IP is the same. This
special case handling is only meant for backward compatibility reason.
The last commit introduced a bug in the route check logic, where a
valid special case is treated as an error. This patch fixes that bug
along with some code cleanup.

Suggested by:	gleb
Reviewed by:	kmacy, discussed with gleb
MFC after:	1 day
2011-10-10 17:41:11 +00:00
Qing Li
6703e7ea10 Do not try removing an ARP entry associated with a given interface
address if that interface does not support ARP. Otherwise the
system will generate error messages unnecessarily due to the missing
entry.

PR:		kern/159602
Submitted by:	pluknet
MFC after:	3 days
2011-10-07 22:22:19 +00:00
Qing Li
41b210c6f6 Remove the reference held on the loopback route when the interface
address is being deleted. Only the last reference holder deletes the
loopback route. All other delete operations just clear the IFA_RTSELF
flag.

PR:		kern/159601
Submitted by:	pluknet
Reviewed by:	discussed on net@
MFC after:	3 days
2011-10-07 18:01:34 +00:00
Qing Li
db92413e6a A system may have multiple physical interfaces, all of which are on the
same prefix. Since a single route entry is installed for the prefix
(without RADIX_MPATH), incoming packets on the interfaces that are not
associated with the prefix route may trigger an error message about
unable to allocation LLE entry, and fails L2. This patch makes sure a
valid route is present in the system, and allow the aforementioned
condition to exist and treats as valid.

Reviewed by:	bz
MFC after:	5 days
2011-10-03 19:51:18 +00:00
Qing Li
6cf8e3300e This patch allows ARP to work properly in the presence of
self-referencing routes. This patch is a rework of r223862.

Reviewed by:	bz, zec
MFC after:	5 days
2011-10-03 19:06:55 +00:00
Qing Li
1184509858 When an interface address route is removed from the system, another
route with the same prefix is searched for as a replacement. The
current code did not bypass routes that have non-operational
interfaces. This patch fixes that bug and will find a replacement
route with an active interface.

PR:		kern/159603
Submitted by:	pluknet, ambrisko at ambrisko dot com
Reviewed by:	discussed on net@
Approved by:	re (bz)
MFC after:	3 days
2011-08-28 00:14:40 +00:00
Kevin Lo
7236660627 If RTF_HOST flag is specified, then we are interested in destination
address.

PR:		kern/159600
Submitted by:	Svatopluk Kraus <onwahe at gmail dot com>
Approved by:	re (hrs)
2011-08-10 06:17:06 +00:00
Marko Zec
13e255fab7 Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address.  This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after:	3 days
2011-07-08 09:38:33 +00:00
Qing Li
92322284cd Supply the LLE_STATIC flag bit to in_ifscurb() when scrubbing interface
address so that proper clean up will take place in the routing code.
This patch fixes the bootp panic on startup problem. Also, added more
error handling and logging code in function in_scrubprefix().

MFC after:	5 days
2011-05-29 02:21:35 +00:00
Qing Li
5b84dc789a The statically configured (permanent) ARP entries are removed when an
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.

Reviewed by:	delphij
MFC after:	5 days
2011-05-20 19:12:20 +00:00
Sergey Kandaurov
79d514355c Reference ifaddr object before unlocking as it can be freed
from another context at the moment of later access.

PR:		kern/155555
Submitted by:	Andrew Boyer <aboyer att averesystems.com>
Approved by:	avg (mentor)
MFC after:	2 weeks
2011-03-21 14:19:40 +00:00
Gleb Smirnoff
a98c06f1c8 Use time_uptime instead of non-monotonic time_second to drive ARP
timeouts.

Suggested by:	bde
2010-11-30 15:57:00 +00:00
Dimitry Andric
3e288e6238 After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
Dimitry Andric
31c6a0037e Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
George V. Neville-Neil
e162ea60d4 Add a queue to hold packets while we await an ARP reply.
When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry.  This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.

This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out.  The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.

Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..

Reviewed by:	bz, rpaulo
MFC after: 3 weeks
2010-11-12 22:03:02 +00:00
Bjoern A. Zeeb
12112cf676 MfP4 CH182763 (original version):
Make it harder to exploit certain in_control() related races between the
intiial lookup at the beginning and the time we will remove the entry
from the lists by re-checking that entry is still in the list before
trying to remove it.

(*) It is believed that with the current code and locking strategy we
    cannot completely fix all race.

Reported by:	Nima Misaghian (nima_misa hotmail.com) on net@ 20100817
Tested by:	Nima Misaghian (nima_misa hotmail.com) (original version)
PR:		kern/146250
Submitted by:	Mikolaj Golub (to.my.trociny gmail.com) (different version)
MFC after:	1 week
2010-10-16 19:53:22 +00:00
Bjoern A. Zeeb
42db1b87d6 In case of RADIX_MPATH do not leak the IN_IFADDR read lock on
early return.

MFC after:	3 days
2010-09-04 16:06:01 +00:00
Will Andrews
54bfbd5153 Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
Qing Li
0ed6142b31 This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after:	3 days
2010-05-25 20:42:35 +00:00
Bjoern A. Zeeb
82cea7e6f3 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
Bjoern A. Zeeb
becba438d2 Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by:		qingli (earlier version)
MFC after:		10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
			Christian Kratzer (ck cksoft.de),
			Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-11 16:04:08 +00:00
Qing Li
c7ea0aa648 One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after:	5 days
2010-03-09 01:11:45 +00:00
Qing Li
d577d18a00 Some of the existing ppp and vpn related scripts create and set
the IP addresses of the tunnel end points to the same value. In
these cases the loopback route is not installed for the local
end.

Verified by:	avg
MFC after:	5 days
2010-02-02 20:38:30 +00:00
Qing Li
646c800540 Ensure an address is removed from the interface address
list when the installation of that address fails.

PR:		139559
2010-01-08 17:49:24 +00:00
Qing Li
ccbb9c359d Consolidate the route message generation code for when address
aliases were added or deleted. The announced route entry for
an address alias is no longer empty because this empty route
entry was causing some route daemon to fail and exit abnormally.

MFC after:	5 days
2009-12-30 22:13:01 +00:00
Qing Li
c7ab66020f The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after:	5 days
2009-12-30 21:35:34 +00:00