LLE structure is mostly unchanged during its lifecycle.
To be more specific, there are 2 things relevant for fast path
lookup code:
1) link-level address change. Since r286722, these updates are performed
under AFDATA WLOCK.
2) Some sort of feedback indicating that this particular entry is used so
we re-send arp request to perform reachability verification instead of
expiring entry. The only signal that is needed from fast path is something
like binary yes/no.
The latter is solved by the following changes:
1) introduce special r_skip_req field which is read lockless by fast path,
but updated under (new) req_mutex mutex. If this field is non-zero, then
fast path will acquire lock and set it back to 0.
2) introduce simple state machine: incomplete->reachable<->verify->deleted.
Before that we implicitely had incomplete->reachable->deleted state machine,
with V_arpt_keep between "reachable" and "deleted". Verification was performed
in runtime 5 seconds before V_arpt_keep expire.
This is changed to "change state to verify 5 seconds before V_arpt_keep,
set r_skip_req to non-zero value and check it every second". If the value
is zero - then send arp verification probe.
These changes do not introduce any signifficant control plane overhead:
typically lle callout timer would fire 1 time more each V_arpt_keep (1200s)
for used lles and up to arp_maxtries (5) for dead lles.
As a result, all packets towards "reachable" lle are handled by fast path without
acquiring lle read lock.
Additional "req_mutex" is needed because callout / arpresolve_slow() or eventhandler
might keep LLE lock for signifficant amount of time, which might not be feasible
for fast path locking (e.g. having rmlock as ether AFDATA or lltable own lock).
Differential Revision: https://reviews.freebsd.org/D3688
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped. Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.
MFC after: 1 Month with associated callout change.
down state.
Regression appeared in r287789, where the "prefix has no corresponding
installed route" case was forgotten. Additionally, lltable_delete_addr()
was called with incorrect byte order (default is network for lltable code).
While here, improve comments on given cases and byte order.
PR: 203573
Submitted by: phk
instead of old "ignore-and-return 0" in r287789. This broke arp -da /
ndp -cn behavior (they exit on rtsock command failure). Fix this by
translating LLE_IFADDR to RTM_PINNED flag, passing it to userland and
making arp/ndp ignore these entries in batched delete.
MFC after: 2 weeks
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
if old ifp fib differes from new one, that the caller
is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3573
Before that, the logic besides lle_create() was the following:
return existing if found, create if not. This behaviour was error-prone
since we had to deal with 'sudden' static<>dynamic lle changes.
This commit fixes bunch of different issues like:
- refcount leak when lle is converted to static.
Simple check case:
console 1:
while true;
do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`;
do arp -s $i 00:22:44:66:88:00 ; arp -d $i;
done;
done
console 2:
ping -f any-dead-host-in-L2
console 3:
# watch for memory consumption:
vmstat -m | awk '$1~/lltable/{print$2}'
- possible problems in arptimer() / nd6_timer() when dropping/reacquiring
lock.
New logic explicitly handles use-or-create cases in every lla_create
user. Basically, most of the changes are purely mechanical. However,
we explicitly avoid using existing lle's for interface/static LLE records.
* While here, call lle_event handlers on all real table lle change.
* Create lltable_free_entry() calling existing per-lltable
lle_free_t callback for entry deletion
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.
struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]
Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).
Sponsored by: Yandex LLC
* Split lltable_init() into lltable_allocate_htbl() (alloc
hash table with default callbacks) and lltable_link() (
links any lltable to the list).
* Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field.
* Move lltable setup to separate functions in in[6]_domifattach.
differences between projects/routing and HEAD.
This commit tries to keep code logic the same while changing underlying
code to use unified callbacks.
* Add llt_foreach_entry method to traverse all entries in given llt
* Add llt_dump_entry method to export particular lle entry in sysctl/rtsock
format (code is not indented properly to minimize diff). Will be fixed
in the next commits.
* Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle.
* Add llt_fill_sa_entry method to export address in the lle to sockaddr
format.
* Add llt_hash method to use in generic hash table support code.
* Add llt_free_entry method which is used in llt_prefix_free code.
* Prepare for fine-grained locking by separating lle unlink and deletion in
lltable_free() and lltable_prefix_free().
* Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable'
access by external callers.
* Remove @llt agrument from lle_free() lle callback since it was unused.
* Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting.
* Switch to per-af hashing code.
* Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to
in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method.
Update description from these functions.
* Use unified lltable_free_entry() function instead of per-af one.
Reviewed by: ae
* Move lle creation/deletion from lla_lookup to separate functions:
lla_lookup(LLE_CREATE) -> lla_create
lla_lookup(LLE_DELETE) -> lla_delete
lla_create now returns with LLE_EXCLUSIVE lock for lle.
* Provide typedefs for new/existing lltable callbacks.
Reviewed by: ae
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.
Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.
Commented upon by hiren and sbruno
See Phabricator D1777 for more details.
Commented upon by hiren and sbruno
Reviewed by: adrian, jhb and bz
Sponsored by: Netflix Inc.
* Make most of lltable_* methods 'normal' functions instead of inline
* Add lltable_get_<af|ifp>() functions to access given lltable fields
* Temporarily resurrect nd6_lookup() function
use llt_fill_sa_entry() llt method to store lle address in sa.
* Eliminate L3_ADDR macro and either reference IPv4/IPv6 address
directly from lle or use newly-created llt_fill_sa_entry().
* Do not store sockaddr inside arp/ndp lle anymore.
and explicit calls to RTENTRY_FREE_LOCKED()
* Use lltable_prefix_free() in arp_ifscrub to be consistent with nd6.
* Rename <lltable_|llt>_delete function to _delete_addr() to note that
this function is used to external callers. Make this function maintain
its own locking.
* Use lookup/unlink/clear call chain from internal callers instead of
delete_addr.
* Fix LLE_DELETED flag handling
cleanup including unlinking/freeing
* Relax locking in lltable_prefix_free_af/lltable_free
* Do not pass @llt to lle free callback: it is always NULL now.
* Unify arptimer/nd6_llinfo_timer: explicitly unlock lle avoiding
unlock/lock sequinces
* Do not pass unlocked lle to nd6_ns_output(): add nd6_llinfo_get_holdsrc()
to retrieve preferred source address from lle hold queue and pass it
instead of lle.
* Finally, make nd6_create() create and return unlocked lle
* Separate defrtr handling code from nd6_free():
use nd6_check_del_defrtr() to check if we need to keep entry instead of
performing GC,
use nd6_check_recalc_defrtr() to perform actual recalc on lle removal.
* Move isRouter handling from nd6_cache_lladdr() to separate
nd6_check_router()
* Add initial code to maintain lle runtime flags in sync.
does actual new lle creation without extensive locking and existing
lle search.
Move lle updating code from gigantic in_arpinput() to arp_update_llle()
and some other functions.
IPv6 changes to follow.
use rwlock accessible via external functions
(IF_AFDATA_CFG_* -> if_afdata_cfg_*()) for all control plane tasks
use rmlock (IF_AFDATA_RUN_*) for fast-path lookups.
use rwlock accessible via external functions
(IN_IFADDR_CFG_* -> in_ifaddr_cfg_*()) for all control plane tasks
use rmlock (IN_IFADDR_RUN_*) for fast-path lookups.
* struct llentry is now basically split into 2 pieces:
all fields within 64 bytes (amd64) are now protected by both
ifdata lock AND lle lock, e.g. you require both locks to be held
exclusively for modification. All data necessary for fast path
operations is kept here. Some fields were added:
- r_l3addr - makes lookup key liev within first 64 bytes.
- r_flags - flags, containing pre-compiled decision whether given
lle contains usable data or not. Current the only flag is RLLE_VALID.
- r_len - prepend data len, currently unused
- r_kick - used to provide feedback to control plane (see below).
All other fields are protected by lle lock.
* Add simple state machine for ARP to handle "about to expire" case:
Current model (for the fast path) is the following:
- rlock afdata
- find / rlock rte
- runlock afdata
- see if "expire time" is approaching
(time_uptime + la->la_preempt > la->la_expire)
- if true, call arprequest() and decrease la_preempt
- store MAC and runlock rte
New model (data plane):
- rlock afdata
- find rte
- check if it can be used using r_* fields only
- if true, store MAC
- if r_kick field != 0 set it to 0.
- runlock afdata
New mode (control plane):
- schedule arptimer to be called in (V_arpt_keep - V_arp_maxtries)
seconds instead of V_arpt_keep.
- on first timer invocation change state from ARP_LLINFO_REACHABLE
to ARP_LLINFO_VERIFY, sets r_kick to 1 and shedules next call in
V_arpt_rexmit (default to 1 sec).
- on subsequent timer invocations in ARP_LLINFO_VERIFY state, checks
for r_kick value: reschedule if not changed, and send arprequest()
if set to zero (e.g. entry was used).
* Convert IPv4 path to use new single-lock approach. IPv6 bits to follow.
* Slow down in_arpinput(): now valid reply will (in most cases) require
acquiring afdata WLOCK twice. This is requirement for storing changed
lle data. This change will be slightly optimized in future.
* Provide explicit hash link/unlink functions for both ipv4/ipv6 code.
This will probably be moved to generic lle code once we have per-AF
hashing callback inside lltable.
* Perform lle unlink on deletion immediately instead of delaying it to
the timer routine.
* Make r244183 more explicit: use new LLE_CALLOUTREF flag to indicate the
presence of lle reference used for safe callout calls.
lla_lookup(LLE_CREATE) -> lla_create
lla_lookup(LLE_DELETE) -> lla_delete
Assume lla_create to return LLE_EXCLUSIVE lock for lle.
* Rework lla_rt_output to perform all lle changes under afdata WLOCK.
* change arp_ifscrub() ackquire afdata WLOCK, the same as arp_ifinit().
is found, the first usable address is returned for legacy ioctls like
SIOCGIFBRDADDR, SIOCGIFDSTADDR, SIOCGIFNETMASK and SIOCGIFADDR.
While there also fix a subtle issue that a caller from a jail asking for
INADDR_ANY may get the first IP of the host that do not belong to the jail.
Submitted by: glebius
Differential Revision: https://reviews.freebsd.org/D667
Previously there was a race condition between the address addition
and associating it with the CARP which resulted in the interface
MAC, instead of the CARP MAC, being used for a brief amount of time.
This caused "is using my IP address" warnings as well as data being
sent to the wrong machine due to incorrect ARP entries being recorded
by other devices on the network.
exists on another interface. The panic was introduced by change 264887, which
changed the fibnum parameter in the call to rtalloc1_fib() in
ifa_switch_loopback_route() from RT_DEFAULT_FIB to RT_ALL_FIBS. The solution
is to use the interface fib in that call. For the majority of users, that will
be equivalent to the legacy behavior.
PR: kern/189089
Reported by: neel
Reviewed by: neel
MFC after: 3 weeks
X-MFC with: 264887
Sponsored by: Spectra Logic