use rwlock accessible via external functions
(IF_AFDATA_CFG_* -> if_afdata_cfg_*()) for all control plane tasks
use rmlock (IF_AFDATA_RUN_*) for fast-path lookups.
use rwlock accessible via external functions
(IN_IFADDR_CFG_* -> in_ifaddr_cfg_*()) for all control plane tasks
use rmlock (IN_IFADDR_RUN_*) for fast-path lookups.
* struct llentry is now basically split into 2 pieces:
all fields within 64 bytes (amd64) are now protected by both
afdata lock AND lle lock, i.e. you require both locks to be held
exclusively for modification. All data necessary for fast path
operations is kept here. Some fields were added:
- r_l3addr - makes lookup key live within first 64 bytes.
- r_flags - flags, containing pre-compiled decision whether given
lle contains usable data or not. Currently the only flag is RLLE_VALID.
- r_len - prepend data len, currently unused
- r_kick - used to provide feedback to control plane (see below).
All other fields are protected by lle lock.
* Add simple state machine for ARP to handle "about to expire" case:
Current model (for the fast path) is the following:
- rlock afdata
- find / rlock rte
- runlock afdata
- see if "expire time" is approaching
(time_uptime + la->la_preempt > la->la_expire)
- if true, call arprequest() and decrease la_preempt
- store MAC and runlock rte
New model (data plane):
- rlock afdata
- find rte
- check if it can be used using r_* fields only
- if true, store MAC
- if r_kick field != 0 set it to 0.
- runlock afdata
New model (control plane):
- schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries)
seconds instead of V_arpt_keep.
- on first timer invocation, change state from ARP_LLINFO_REACHABLE
to ARP_LLINFO_VERIFY, set r_kick to 1 and schedule the next call in
V_arpt_rexmit (defaults to 1 sec).
- on subsequent timer invocations in ARP_LLINFO_VERIFY state, check
the r_kick value: reschedule if not changed, and send arprequest()
if set to zero (i.e. the entry was used).
* Convert IPv4 path to use new single-lock approach. IPv6 bits to follow.
* Slow down in_arpinput(): now a valid reply will (in most cases) require
acquiring afdata WLOCK twice. This is a requirement for storing changed
lle data. This change will be slightly optimized in future.
* Provide explicit hash link/unlink functions for both ipv4/ipv6 code.
This will probably be moved to generic lle code once we have per-AF
hashing callback inside lltable.
* Perform lle unlink on deletion immediately instead of delaying it to
the timer routine.
* Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the
presence of lle reference used for safe callout calls.
lla_lookup(LLE_CREATE) -> lla_create
lla_lookup(LLE_DELETE) -> lla_delete
Assume lla_create to return LLE_EXCLUSIVE lock for lle.
* Rework lla_rt_output to perform all lle changes under afdata WLOCK.
* Change arp_ifscrub() to acquire afdata WLOCK, the same as arp_ifinit().
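
The "about to expire" handshake between the data plane and the control
plane described above can be sketched as a small userland model. This is
only an illustration, not the kernel code: struct lle_model,
fastpath_use() and timer_tick() are hypothetical names, locking is left
out, and arprequest() is replaced by a counter. What happens after the
request is sent is not described above, so the model simply stays in the
VERIFY state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland model of the two states named above. */
enum arp_state { ARP_LLINFO_REACHABLE, ARP_LLINFO_VERIFY };

struct lle_model {
	enum arp_state	state;
	int		r_kick;		/* set by control plane, cleared by data plane */
	int		requests_sent;	/* counts simulated arprequest() calls */
};

/* Data plane: the entry is used; clear r_kick to report that. */
static void
fastpath_use(struct lle_model *lle)
{
	if (lle->r_kick != 0)
		lle->r_kick = 0;
}

/* Control plane: one timer invocation. Returns true if a request was sent. */
static bool
timer_tick(struct lle_model *lle)
{
	if (lle->state == ARP_LLINFO_REACHABLE) {
		/* First invocation: start verifying and arm the kick flag. */
		lle->state = ARP_LLINFO_VERIFY;
		lle->r_kick = 1;
		return (false);
	}
	assert(lle->state == ARP_LLINFO_VERIFY);
	if (lle->r_kick != 0)
		return (false);		/* not used since last tick: reschedule only */
	lle->requests_sent++;		/* stands in for arprequest() */
	return (true);
}
```

In this model an idle entry never triggers a request; only an entry that
the fast path touched after the kick flag was armed does.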
is found, the first usable address is returned for legacy ioctls like
SIOCGIFBRDADDR, SIOCGIFDSTADDR, SIOCGIFNETMASK and SIOCGIFADDR.
While there, also fix a subtle issue where a caller from a jail asking for
INADDR_ANY may get the first IP of the host, which does not belong to the jail.
Submitted by: glebius
Differential Revision: https://reviews.freebsd.org/D667
Previously there was a race condition between the address addition
and associating it with the CARP which resulted in the interface
MAC, instead of the CARP MAC, being used for a brief amount of time.
This caused "is using my IP address" warnings as well as data being
sent to the wrong machine due to incorrect ARP entries being recorded
by other devices on the network.
exists on another interface. The panic was introduced by change 264887, which
changed the fibnum parameter in the call to rtalloc1_fib() in
ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS. The solution
is to use the interface fib in that call. For the majority of users, that will
be equivalent to the legacy behavior.
PR: kern/189089
Reported by: neel
Reviewed by: neel
MFC after: 3 weeks
X-MFC with: 264887
Sponsored by: Spectra Logic
These two bugs are closely related. The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.
sys/net/if_var.h
sys/net/if.c
Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr. Those
functions will only return an address whose interface fib equals the
argument.
sys/net/route.c
Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
arguments.
sys/netinet/in.c
Update in_addprefix to consider the interface fib when adding
prefixes. This will prevent it from not adding a subnet route when
one already exists on a different fib.
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
In some cases there wasn't a clear specific fib number to use.
In others, I was unable to test those functions so I chose
RT_DEFAULT_FIB to minimize divergence from current behavior. I will
fix some of the latter changes along with PR kern/187553.
tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
Revert r263738. The udp_dontroute test was right all along.
However, bugs kern/187550 and kern/187553 cancelled each other out
when it came to this test. Because of kern/187553, ifa_ifwithnet
searched the default fib instead of the requested one, but because
of kern/187550, there was an applicable subnet route on the default
fib. The new test added in r263738 doesn't work right, however. I
can verify with dtrace that ifa_ifwithnet returned the wrong address
before I applied this commit, but route(8) miraculously found the
correct interface to use anyway. I don't know how.
Clear expected failure messages for kern/187550 and kern/187552.
PR: kern/187550
PR: kern/187552
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic
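
The effect of the new fib argument can be sketched with a small userland
model. This is an assumption-laden illustration, not the kernel
implementation: struct ifaddr_model and model_ifwithnet() are
hypothetical, addresses are plain host-order integers, and the
"prefer the most specific mask" tie-break is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an interface address and its interface fib. */
struct ifaddr_model {
	uint32_t	addr;	/* host byte order */
	uint32_t	mask;	/* contiguous netmask, host byte order */
	unsigned	fib;	/* fib of the owning interface */
};

/*
 * Return the entry whose network contains dst and whose interface fib
 * equals the requested fib, or NULL if there is none.
 */
static const struct ifaddr_model *
model_ifwithnet(const struct ifaddr_model *tbl, size_t n, uint32_t dst,
    unsigned fib)
{
	const struct ifaddr_model *best = NULL;

	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].fib != fib)
			continue;	/* address lives on another fib */
		if ((dst & tbl[i].mask) != (tbl[i].addr & tbl[i].mask))
			continue;	/* dst is not on this network */
		/* Assumption: a numerically larger mask is more specific. */
		if (best == NULL || tbl[i].mask > best->mask)
			best = &tbl[i];
	}
	return (best);
}
```

With two interfaces carrying the same subnet on different fibs, the
lookup now returns the address that belongs to the requested fib instead
of whichever one happens to come first.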
sys/net/route.c
In rtinit1, use the interface fib instead of the process fib. The
latter wasn't very useful because ifconfig(8) is usually invoked
with the default process fib. Changing ifconfig(8) to use setfib(2)
would be redundant, because it already sets the interface fib.
tests/sys/netinet/fibs_test.sh
Clear the expected ATF failure
sys/net/if.c
Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib
sys/netinet/in.c
sys/net/if_var.h
Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
in_scrubprefix. Pass it the interface fib.
PR: kern/187549
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation
restricted to a single FIB in a multifib system.
Restricting an interface's routes to the FIB to which it is assigned (by
setting net.add_addr_allfibs=0) causes ARP updates to fail with "arpresolve:
can't allocate llinfo for x.x.x.x". This is due to the ARP update code hard
coding its lookup for existing routing entries to FIB 0.
sys/netinet/in.c:
When dealing with RTM_ADD (add route) requests for an interface, use
the interface's assigned FIB instead of the default (FIB 0).
sys/netinet/if_ether.c:
In arpresolve(), enhance error message generated when an
lla_lookup() fails so that the interface causing the error is
visible in logs.
tests/sys/netinet/fibs_test.sh
Clear ATF expected error.
PR: kern/167947
Submitted by: Nikolay Denev <ndenev@gmail.com> (previous version)
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation
has the same prefix as some other alias on the same interface, use
newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg().
This eliminates the following rtsock messages:
Pinned RTM_ADD for prefix (for alias addition).
Pinned RTM_DELETE for prefix (for alias withdrawal).
Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24):
before commit, addition:
got message of size 116 on Fri Jan 10 14:13:15 2014
RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255
got message of size 192 on Fri Jan 10 14:13:15 2014
RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.0.0.0 10.0.0.2 (255) ffff ffff ff
after commit, addition:
got message of size 116 on Fri Jan 10 13:56:26 2014
RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255
before commit, withdrawal:
got message of size 192 on Fri Jan 10 13:58:59 2014
RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
locks: inits:
sockaddrs: <DST,GATEWAY,NETMASK>
10.0.0.0 10.0.0.2 (255) ffff ffff ff
got message of size 116 on Fri Jan 10 13:58:59 2014
RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255
after commit, withdrawal:
got message of size 116 on Fri Jan 10 14:14:11 2014
RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
sockaddrs: <NETMASK,IFP,IFA,BRD>
255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255
Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong
(and requires some hacks to keep prefix in route table on RTM_DELETE).
I've tested this change with quagga (no change) and bird (*).
bird alias handling is already broken in *BSD sysdep code, so nothing
changes here, too.
I'm going to MFC this change if there are no complaints about the behavior
change.
While here, fix some style(9) bugs introduced by r260488
(pointed out by glebius and bde).
Sponsored by: Yandex LLC
MFC after: 4 weeks
an interface:
- in in_control() skip over non-AF_INET addresses.
- in in_aifaddr_ioctl() and in_difaddr_ioctl() do correct check
of address family, w/o accessing memory beyond struct ifaddr.
Sponsored by: Nginx, Inc.
o Provide separate functions for SIOCAIFADDR and for SIOCDIFADDR, with
clear code flow from beginning to the end. After that the rest of
in_control() gets very small and clear.
o Provide sx(9) lock to protect against parallel ioctl() invocations.
o Reimplement logic from r201282, that tried to keep localhost route in
table when multiple P2P interfaces with same local address are created
and deleted.
Discussed with: pluknet, melifaro
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented in the wrong manual page, arp(8). Since the new-ARP commit,
the tunable has been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with the tunable turned off is not fully correct,
and has no advantages compared to normal behavior.
Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.
New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.
The structure declaration is left for compatibility with
third party software, but in-tree code no longer uses it.
Reviewed by: ru, andre, net@
all interested parties in case the interface flag IFF_UP has changed.
However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.
This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.
P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected rather
than pretending it is supported.
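
The "did the driver raise IFF_UP?" check described above can be sketched
as follows. This is a minimal userland model under stated assumptions:
struct ifnet_model, M_IFF_UP, driver_ioctl_sets_up() and model_set_addr()
are hypothetical names, and the real if_up() handler is replaced by a
counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	M_IFF_UP	0x1	/* illustrative value, not taken from the kernel */

struct ifnet_model {
	int	if_flags;
	int	if_up_calls;	/* counts simulated if_up() invocations */
	int	(*if_ioctl)(struct ifnet_model *);
};

/* Hypothetical driver handler that raises IFF_UP as a side effect. */
static int
driver_ioctl_sets_up(struct ifnet_model *ifp)
{
	ifp->if_flags |= M_IFF_UP;
	return (0);
}

/* Protocol side: call the driver, then notice a 0 -> 1 flag transition. */
static int
model_set_addr(struct ifnet_model *ifp)
{
	bool was_up;
	int error;

	assert(ifp->if_ioctl != NULL);
	was_up = (ifp->if_flags & M_IFF_UP) != 0;
	error = ifp->if_ioctl(ifp);
	if (error == 0 && !was_up && (ifp->if_flags & M_IFF_UP) != 0)
		ifp->if_up_calls++;	/* stands in for if_up(ifp) */
	return (error);
}
```

The handler runs only on the transition, so configuring a second address
on an interface that is already up does not trigger it again.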
that we still have a problem with this whole structure of
locks and in_input.c [it does not lock which it should not, but
this *can* lead to crashes]. (I have seen it in our SQA
testbed.. besides the one with a refcnt issue that I will
have SQA work on next week ;-)
assure that *all* tables and such are removed before
we start to free. This won't protect the Hash in ip_input.c
but in theory should protect any other uses that *do* use locks.
MFC after: 1 week (or more)
llentry_free() and arptimer():
o Use callout_init_rw() for lle timeout, this allows us safely
disestablish them.
- This allows us to simplify the arptimer() and make it
race safe.
o Consistently use ifp->if_afdata_lock to lock access to
linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
is attached to the hash.
- Use LLE_LINKED to avoid double unlinking via consecutive
calls to llentry_free().
- Mark lle with LLE_DELETED via |= operation instead of =,
so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
consistent and provide more informative KASSERTs.
The patch is a collaborative work of all submitters and myself.
PR: kern/165863
Submitted by: Andrey Zonov <andrey zonov.org>
Submitted by: Ryan Stone <rysto32 gmail.com>
Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>
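
The two flag-handling points above (guarding the unlink with LLE_LINKED
and setting LLE_DELETED with |=) can be illustrated with a small model.
The flag values, struct lle_flags_model, model_mark_deleted() and
model_unlink() are all hypothetical; the sketch only shows the bit
manipulation, not the real locking or hash handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the real definitions may differ. */
#define	M_LLE_STATIC	0x01
#define	M_LLE_DELETED	0x02
#define	M_LLE_LINKED	0x04

struct lle_flags_model {
	unsigned	flags;
	int		unlink_count;	/* how many times we really unlinked */
};

/* OR the flag in, so that previously set flags survive. */
static void
model_mark_deleted(struct lle_flags_model *lle)
{
	lle->flags |= M_LLE_DELETED;
}

/* Unlink only if still linked. Returns true if this call did the unlink. */
static bool
model_unlink(struct lle_flags_model *lle)
{
	if ((lle->flags & M_LLE_LINKED) == 0)
		return (false);		/* already unlinked: nothing to do */
	lle->flags &= ~M_LLE_LINKED;
	lle->unlink_count++;
	assert(lle->unlink_count == 1);	/* a second unlink would be a bug */
	return (true);
}
```

Had the deleted mark been written with plain assignment, the
M_LLE_LINKED bit would be lost and the guard in model_unlink() could no
longer tell whether the entry is still on the hash.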
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.
- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP is in the
works and will follow soon.
Build-tested with make universe.
30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE
Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe
Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe
Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.
Move the free pointer into the llentry itself and update the initialization sites.
MFC after: 2 weeks
the original IPv4 implementation from r178888:
- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
ease MFCs. Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical and style).
- Convert to rtinit() where applicable and only use custom loops where
currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
are locked to the default FIB. Individual IPv6 addresses will only
appear in the default FIB, however redirect information and prefixes
of connected subnets are automatically propagated to all FIBs by
default (mimicking IPv4 behavior as closely as possible).
Sponsored by: Cisco Systems, Inc.
missing interface address list locking and grab a reference on the
matching interface address after dropping the lock while it is used to
avoid a potential use after free.
Reviewed by: bz
MFC after: 1 week
from scratch, copying needed functionality from the old implementation
on demand, with a thorough review of all code. The main change is that
the interface layer has been removed from CARP. Now redundant addresses
are configured directly on the interfaces they run on.
The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.
ifconfig(8) semantics have changed too: one no longer needs
to clone a carpXX interface, but should directly configure a vhid
on an Ethernet interface.
To supply vhid data from the kernel to an application, the getifaddrs(3)
function has been changed to pass ifam_data with each address. [1]
The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.
Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!
PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]
structs ifreq/in_aliasreq and there've been several panics due
to that problem. All these panics were fixed just a couple of
lines above the panicking code.
Take a more general approach: sanity check sockaddrs supplied
with SIOCAIFADDR and SIOCSIF*ADDR at the beginning of the
function and drop all checks below.
One check is now disabled due to strange code in ifconfig(8)
that I've removed recently. I'm going to enable it with next
__FreeBSD_version bump.
Historically in_ifinit() was able to recover from an error
and restore the old address. Nowadays this feature works only for
some error cases, not for all of them. I suppose no software
relies on this behavior, so I'd like to remove it, since this
simplifies code a lot.
Also, move if_scrub() earlier in the in_ifinit(). It is more
correct to wipe routes before removing address from local
address list, and interface address list.
Silence from: bz, brooks, andre, rwatson, 3 weeks
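
The "check once, up front" approach described above can be sketched as a
single validation helper. The exact checks are not spelled out in the
text; checking the family and the length is an assumption of this
sketch, and struct sockaddr_in_model, M_AF_INET and
model_check_sockaddr() are hypothetical names modelled on a BSD-style
sockaddr with a length field.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define	M_AF_INET	2	/* illustrative value */

/* Hypothetical BSD-style IPv4 sockaddr with an explicit length field. */
struct sockaddr_in_model {
	uint8_t		sin_len;
	uint8_t		sin_family;
	uint16_t	sin_port;
	uint32_t	sin_addr;
	uint8_t		sin_zero[8];
};

/*
 * Validate a caller-supplied sockaddr once, before any code looks at
 * its contents. Returns 0 if usable, EINVAL otherwise.
 */
static int
model_check_sockaddr(const struct sockaddr_in_model *sin)
{
	assert(sin != NULL);
	if (sin->sin_family != M_AF_INET)
		return (EINVAL);
	if (sin->sin_len != sizeof(*sin))
		return (EINVAL);
	return (0);
}
```

Once every sockaddr passed in has gone through a helper like this at the
top of the function, the scattered per-case checks further down become
redundant.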
interfaces. A host route has a NULL mask so check for that condition.
I have also been told that some developers customize the packet output
path with direct manipulation of the route entry (or the outgoing
interface, to be specific). This patch checks for the route mask
explicitly to make sure custom code will not panic.
PR: kern/161805
MFC after: 3 days
To run a /31 network, participating hosts MUST drop support
for directed broadcasts, and treat the first and last addresses
on subnet as unicast. The broadcast address for the prefix
should be the link local broadcast address, INADDR_BROADCAST.
- Remove ia_net, ia_netmask, ia_netbroadcast from struct in_ifaddr.
- Remove net.inet.ip.subnetsarelocal, I bet no one needs it in 2011.
- Fix a bug where we were not forwarding to a host which matches a classful
net address. For example, a router with a 192.168.x.y/16 network attached
would not forward traffic to 192.168.*.0, which are legal IPs in the
CIDR world.
- For compatibility, leave autoguessing of mask based on class.
Reviewed by: andre, bz, rwatson
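
The /31 rule stated above (no directed broadcast; the prefix broadcast
address is INADDR_BROADCAST) can be expressed as a small helper. This is
a sketch in host byte order; model_broadcast() and
MODEL_INADDR_BROADCAST are hypothetical names, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define	MODEL_INADDR_BROADCAST	0xffffffffu

/*
 * Compute the broadcast address for addr/mask (host byte order).
 * On a /31 both addresses are unicast, so fall back to the
 * link-local broadcast address.
 */
static uint32_t
model_broadcast(uint32_t addr, uint32_t mask)
{
	uint32_t host = ~mask;

	/* The mask must be contiguous: its complement is 2^k - 1. */
	assert((host & (host + 1u)) == 0);
	if (mask == 0xfffffffeu)
		return (MODEL_INADDR_BROADCAST);
	return (addr | host);
}
```

For 10.0.0.0/31 this yields 255.255.255.255 for either host, while a /24
such as 192.168.1.5/24 still gets the usual 192.168.1.255.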
route where the destination IP and the gateway IP are the same. This
special case handling is only meant for backward compatibility reasons.
The last commit introduced a bug in the route check logic, where a
valid special case is treated as an error. This patch fixes that bug
along with some code cleanup.
Suggested by: gleb
Reviewed by: kmacy, discussed with gleb
MFC after: 1 day
address if that interface does not support ARP. Otherwise the
system will generate error messages unnecessarily due to the missing
entry.
PR: kern/159602
Submitted by: pluknet
MFC after: 3 days
address is being deleted. Only the last reference holder deletes the
loopback route. All other delete operations just clear the IFA_RTSELF
flag.
PR: kern/159601
Submitted by: pluknet
Reviewed by: discussed on net@
MFC after: 3 days
same prefix. Since a single route entry is installed for the prefix
(without RADIX_MPATH), incoming packets on the interfaces that are not
associated with the prefix route may trigger an error message about
being unable to allocate an LLE entry, and L2 fails. This patch makes sure a
valid route is present in the system, allows the aforementioned
condition to exist and treats it as valid.
Reviewed by: bz
MFC after: 5 days
route with the same prefix is searched for as a replacement. The
current code did not bypass routes that have non-operational
interfaces. This patch fixes that bug and will find a replacement
route with an active interface.
PR: kern/159603
Submitted by: pluknet, ambrisko at ambrisko dot com
Reviewed by: discussed on net@
Approved by: re (bz)
MFC after: 3 days
same as the host address. This already works fine for INET6 and ND6.
While here, remove two function pointers from struct lltable which are
only initialized but never used.
MFC after: 3 days
address so that proper clean up will take place in the routing code.
This patch fixes the bootp panic on startup problem. Also, added more
error handling and logging code in function in_scrubprefix().
MFC after: 5 days
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.
Reviewed by: delphij
MFC after: 5 days
from another context at the moment of later access.
PR: kern/155555
Submitted by: Andrew Boyer <aboyer att averesystems.com>
Approved by: avg (mentor)
MFC after: 2 weeks
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.
Changes reverted:
------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines
Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.
------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines
Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry. This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.
This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out. The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.
Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added.
Reviewed by: bz, rpaulo
MFC after: 3 weeks
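
The per-entry hold queue bounded by maxhold can be modelled as below.
The text above only says that up to maxhold packets are queued; dropping
the oldest packet when the limit is reached is an assumption of this
sketch, and struct pkt, struct hold_queue and hold_enqueue() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
	int		id;
	struct pkt	*next;
};

struct hold_queue {
	struct pkt	*head;		/* oldest held packet */
	int		numheld;
	int		dropped;
};

/*
 * Queue p while resolution is pending. If the queue already holds
 * maxhold packets, the oldest one is removed first (assumption).
 * Returns the dropped packet, or NULL if nothing was dropped.
 */
static struct pkt *
hold_enqueue(struct hold_queue *q, struct pkt *p, int maxhold)
{
	struct pkt *victim = NULL, **tail;

	assert(maxhold >= 1);
	if (q->numheld >= maxhold) {
		victim = q->head;
		q->head = victim->next;
		victim->next = NULL;
		q->numheld--;
		q->dropped++;
	}
	p->next = NULL;
	/* Walk to the end so packets leave in arrival order. */
	for (tail = &q->head; *tail != NULL; tail = &(*tail)->next)
		;
	*tail = p;
	q->numheld++;
	return (victim);
}
```

With maxhold set to 1 the model degenerates to the historical behavior
of holding a single packet per entry.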
Make it harder to exploit certain in_control() related races between the
initial lookup at the beginning and the time we remove the entry
from the lists by re-checking that entry is still in the list before
trying to remove it.
(*) It is believed that with the current code and locking strategy we
cannot completely fix all races.
Reported by: Nima Misaghian (nima_misa hotmail.com) on net@ 20100817
Tested by: Nima Misaghian (nima_misa hotmail.com) (original version)
PR: kern/146250
Submitted by: Mikolaj Golub (to.my.trociny gmail.com) (different version)
MFC after: 1 week
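
The "re-check before removing" idea above can be sketched on a singly
linked list. This is only a model: struct node and
model_remove_if_present() are hypothetical, and the locking that would
surround the re-check in real code is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node	*next;
};

/*
 * Remove target only if it is still reachable from *head. Returns
 * false if someone else already removed it between the initial
 * lookup and this call.
 */
static bool
model_remove_if_present(struct node **head, struct node *target)
{
	assert(head != NULL);
	for (struct node **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == target) {
			*pp = target->next;
			target->next = NULL;
			return (true);
		}
	}
	return (false);
}
```

A caller that gets false back knows the entry was already taken off the
list and must not unlink or free it a second time.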
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.
Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.
This commit requires IP6PROTOSPACER, introduced in r211115.
Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks
"Whitespace" churn after the VIMAGE/VNET whirls.
Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.
Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOBALS (before r185088) and try to
reduce the diff to stable/7 and earlier as much as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.
This also removes some header file pollution for putatively
static global variables.
Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.
Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days
prevented the link-layer entry from being freed.
In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.
In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.
In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().
In if_llatbl.c, when freeing entire tables, make sure to remove the
reference as well in case we cancel a pending callout.
Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.
The other advantage of ECMP is for failover. The issue with the
current code is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix and the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.
Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.
MFC after: 5 days
the IP addresses of the tunnel end points to the same value. In
these cases the loopback route is not installed for the local
end.
Verified by: avg
MFC after: 5 days
aliases were added or deleted. The announced route entry for
an address alias is no longer empty because this empty route
entry was causing some route daemon to fail and exit abnormally.
MFC after: 5 days
IFF_POINTOPOINT link types. The reason was that the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations because multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.
MFC after: 5 days
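
The reference counting of the loopback route described above (first
instantiation creates the entry, last removal deletes it) can be
sketched as follows. struct lo_route, lo_route_add() and lo_route_del()
are hypothetical names; the real route installation is reduced to a
boolean.

```c
#include <assert.h>
#include <stdbool.h>

struct lo_route {
	int	refs;		/* number of links sharing the local address */
	bool	installed;	/* stands in for the route being in the table */
};

/* First reference installs the route; later ones only bump the count. */
static void
lo_route_add(struct lo_route *r)
{
	if (r->refs++ == 0)
		r->installed = true;
}

/* Only the last reference holder removes the route. */
static void
lo_route_del(struct lo_route *r)
{
	assert(r->refs > 0);
	if (--r->refs == 0)
		r->installed = false;
}
```

Two links sharing one local address therefore keep the route installed
until both have gone away.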