Before that, the logic besides lle_create() was the following:
return existing if found, create if not. This behaviour was error-prone
since we had to deal with 'sudden' static<>dynamic lle changes.
This commit fixes bunch of different issues like:
- refcount leak when lle is converted to static.
Simple check case:
console 1:
while true;
do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`;
do arp -s $i 00:22:44:66:88:00 ; arp -d $i;
done;
done
console 2:
ping -f any-dead-host-in-L2
console 3:
# watch for memory consumption:
vmstat -m | awk '$1~/lltable/{print$2}'
- possible problems in arptimer() / nd6_timer() when dropping/reacquiring
lock.
New logic explicitly handles use-or-create cases in every lla_create
user. Basically, most of the changes are purely mechanical. However,
we explicitly avoid using existing lle's for interface/static LLE records.
* While here, call lle_event handlers on all real table lle change.
* Create lltable_free_entry() calling existing per-lltable
lle_free_t callback for entry deletion
This change isolates the most common case (e.g. successful lookup)
from more complicates scenarios. It also (tries to) make code
more simple by avoiding retry: cycle.
The actual goal is to prepare code to the upcoming change that will
allow LL address retrieval without acquiring LLE lock at all.
Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D3383
separate bunch of functions. The goal is to isolate actual lle
updates to permit more fine-grained locking.
Do all lle link-level update under AFDATA wlock.
Sponsored by: Yandex LLC
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.
struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]
Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).
Sponsored by: Yandex LLC
* Split lltable_init() into lltable_allocate_htbl() (alloc
hash table with default callbacks) and lltable_link() (
links any lltable to the list).
* Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field.
* Move lltable setup to separate functions in in[6]_domifattach.
differences between projects/routing and HEAD.
This commit tries to keep code logic the same while changing underlying
code to use unified callbacks.
* Add llt_foreach_entry method to traverse all entries in given llt
* Add llt_dump_entry method to export particular lle entry in sysctl/rtsock
format (code is not indented properly to minimize diff). Will be fixed
in the next commits.
* Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle.
* Add llt_fill_sa_entry method to export address in the lle to sockaddr
format.
* Add llt_hash method to use in generic hash table support code.
* Add llt_free_entry method which is used in llt_prefix_free code.
* Prepare for fine-grained locking by separating lle unlink and deletion in
lltable_free() and lltable_prefix_free().
* Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable'
access by external callers.
* Remove @llt agrument from lle_free() lle callback since it was unused.
* Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting.
* Switch to per-af hashing code.
* Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to
in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method.
Update description from these functions.
* Use unified lltable_free_entry() function instead of per-af one.
Reviewed by: ae
This fixes a panic during 'sysctl -a' on VIMAGE kernels.
The tcp_reass_zone variable is not VNET_DEFINE() so we can not mark it as a VNET
variable (with CTLFLAG_VNET).
* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
to use new rt_foreach_fib() instead of hand-rolling cycles.
* Move lle creation/deletion from lla_lookup to separate functions:
lla_lookup(LLE_CREATE) -> lla_create
lla_lookup(LLE_DELETE) -> lla_delete
lla_create now returns with LLE_EXCLUSIVE lock for lle.
* Provide typedefs for new/existing lltable callbacks.
Reviewed by: ae
Do not pass 'dst' sockaddr to ip[6]_mloopback:
- We have explicit check for AF_INET in ip_output()
- We assume ip header inside passed mbuf in ip_mloopback
- We assume ip6 header inside passed mbuf in ip6_mloopback
Avoid too strict INP_INFO_RLOCK_ASSERT checks due to
tcp_notify() being called from in6_pcbnotify().
Reported by: Larry Rosenman <ler@lerctr.org>
Submitted by: markj, jch
- The existing TCP INP_INFO lock continues to protect the global inpcb list
stability during full list traversal (e.g. tcp_pcblist()).
- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
and free) and inpcb global counters.
It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.
PR: 183659
Differential Revision: https://reviews.freebsd.org/D2599
Reviewed by: jhb, adrian
Tested by: adrian, nitroboost-gmail.com
Sponsored by: Verisign, Inc.
When firewalls force a reloop of packets and the caller supplied a route the reference to the route might be reduced twice creating issues.
This is especially the scenario when a packet is looped because of operation in the firewall but the new route lookup gives a down route.
Differential Revision: https://reviews.freebsd.org/D3037
Reviewed by: gnn
Approved by: gnn(mentor)
ip_output has a big chunk of code used to handle special cases with pfil consumers which also forces a reloop on it.
Gather all this code together to make it readable and properly handle the reloop cases.
Some of the issues identified:
M_IP_NEXTHOP is not handled properly in existing code.
route reference leaking is possible with in FIB number change
route flags checking is not consistent in the function
Differential Revision: https://reviews.freebsd.org/D3022
Reviewed by: gnn
Approved by: gnn(mentor)
MFC after: 4 weeks
non-inline urgent data and introduce an mbuf exhaustion attack vector
similar to FreeBSD-SA-15:15.tcp, but not requiring VNETs.
Address the issue described in FreeBSD-SA-15:15.tcp.
Reviewed by: glebius
Approved by: so
Approved by: jmallett (mentor)
Security: FreeBSD-SA-15:15.tcp
Sponsored by: Norse Corp, Inc.
ip_encap already has inspected mbuf's data, at least an IP header.
And it is safe to use mtod() and do direct access to needed fields.
Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that
mbuf has a packet header.
Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also
remove "martian filters" checks. According to RFC 4213 it is enough to
verify that the source address is the address of the encapsulator, as
configured on the decapsulator.
Reviewed by: melifaro
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.
Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149
1) We were not handling (or sending) the IN_PROGRESS case if
the other side (or our side) was not able to reset (awaiting more data).
2) We would improperly send a stream-reset when we should not. Not
waiting until the TSN had been assigned when data was inqueue.
Reviewed by: tuexen
lock on the INP before calling the tunnel protocol, else a LOR
may occur (it does with SCTP for sure). Instead we must acquire a
ref count and release the lock, taking care to allow for the case
where the UDP socket has gone away and *not* unlocking since the
refcnt decrement on the inp will do the unlock in that case.
Reviewed by: tuexen
MFC after: 3 weeks
a set of differentiated services, set IPTOS_PREC_* macros using
IPTOS_DSCP_* macro definitions.
While here, add IPTOS_DSCP_VA macro according to RFC 5865.
Differential Revision: https://reviews.freebsd.org/D3119
Reviewed by: gnn
scaling code does not use an uninitialized timestamp echo reply value
from the stack when timestamps are not enabled.
Differential Revision: https://reviews.freebsd.org/D3060
Reviewed by: hiren
Approved by: jmallett (mentor)
MFC after: 3 days
Sponsored by: Norse Corp, Inc.
apparently neither clang nor gcc complain about this.
But clang intis the var to NULL correctly while gcc on at least mips does not.
Correct the undefined behavior by initializing the variable properly.
PR: 201371
Differential Revision: https://reviews.freebsd.org/D3036
Reviewed by: gnn
Approved by: gnn(mentor)
ip_forward() does a route lookup for testing this packet can be sent to a known destination,
it also can do another route lookup if it detects that an ICMP redirect is needed,
it forgets all of this and handovers to ip_output() to do the same lookup yet again.
This optimisation just does one route lookup during the forwarding path and handovers that to be considered by ip_output().
Differential Revision: https://reviews.freebsd.org/D2964
Approved by: ae, gnn(mentor)
MFC after: 1 week
condition.
If you send a 0-length packet, but there is data is the socket buffer, and
neither the rexmt or persist timer is already set, then activate the persist
timer.
PR: 192599
Differential Revision: D2946
Submitted by: jlott at averesystems dot com
Reviewed by: jhb, jch, gnn, hiren
Tested by: jlott at averesystems dot com, jch
MFC after: 2 weeks