The disgusting macro INP_WLOCK_RECHECK may early-return. In
tcp_default_ctloutput() the TCP_CCALGOOPT case allocates memory before invoking
this macro, which may leak memory.
Add a _CLEANUP variant that takes a code argument to perform variable cleanup
in the early return path. Use it to free the 'pbuf' allocated in
tcp_default_ctloutput().
I am not especially happy with this macro, but I reckon it's not any worse than
INP_WLOCK_RECHECK already was.
Reported by: Coverity
CID: 1350286
Sponsored by: EMC / Isilon Storage Division
tp->snd_wnd. This can happen, for example, when the remote side responds to
a window probe by ACKing the one byte it contains.
Differential Revision: https://reviews.freebsd.org/D5625
Reviewed by: hiren
Obtained from: Juniper Networks (earlier version)
MFC after: 2 weeks
Sponsored by: Juniper Networks
It allows implementing loadable kernel modules with new actions and
without needing to modify kernel headers and ipfw(8). The module
registers its action handler and keyword string, that will be used
as action name. Using generic syntax user can add rules with this
action. Also ipfw(8) can be easily modified to extend basic syntax
for external actions, that become a part base system.
Sample modules will coming soon.
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
containing an INIT chunk. These need to be handled in case the peer
does not support SCTP and returns an ICMP messages indicating destination
unreachable, protocol unreachable.
MFC after: 1 week
the outer IP header, the ICMP header, the inner IP header and the
first n bytes are stored in contgous memory. The ctlinput functions
currently rely on this for n = 8. This fixes a bug in case the inner IP
header had options.
While there, remove the options from the outer header and provide a
way to increase n to allow improved ICMP handling for SCTP. This will
be added in another commit.
MFC after: 1 week
It does not cause any real issues because the variable is overwritten
only when the packet is forwarded (and the variable is not used anymore).
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications (Netgate)
is required to check the verification tag. However, this
requires the verification tag to be not 0. Enforce this.
For packets with a verification tag of 0, we need to
check it it contains an INIT chunk and use the initiate
tag for the validation. This will be a separate commit,
since it touches also other code.
MFC after: 1 week
It looks like as with the safety belt of DELAY() fastened (*) we can
completely tear down and free all memory for TCP (after r281599).
(*) in theory a few ticks should be good enough to make sure the timers
are all really gone. Could we use a better matric here and check a
tcbcb count as an optimization?
PR: 164763
Reviewed by: gnn, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5734
The tcp_inpcb (pcbinfo) zone should be safe to destroy.
PR: 164763
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5732
We attach the "counter" to the tcpcbs. Thus don't free the
TCP Fastopen zone before the tcpcbs are gone, as otherwise
the zone won't be empty.
With that it should be safe to destroy the "tfo" zone without
leaking the memory.
PR: 164763
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5731
While there is no dependency interaction, stopping the timer before
freeing the rest of the resources seems more natural and avoids it
being scheduled an extra time when it is no longer needed.
Reviewed by: gnn, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5733
No need to keep type stability on raw sockets zone.
We've also been running with a KASSERT since r222488 to make sure the
ipi_count is 0 on destroy.
PR: 164763
Reviewed by: gnn
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5735
adds the new I-Data (Interleaved Data) message. This allows a user
to be able to have complete freedom from Head Of Line blocking that
was previously there due to the in-ability to send multiple large
messages without the TSN's being in sequence. The code as been
tested with Michaels various packet drill scripts as well as
inter-networking between the IETF's location in Argentina and Germany.
This is kinda critical to the performance when the CPU is slow and
network bandwidth is high, e.g. in the hypervisor.
Reviewed by: rrs, gallatin, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5765
And factor out tcp_lro_rx_done, which deduplicates the same logic with
netinet/tcp_lro.c
Reviewed by: gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5725
So that callers could react accordingly.
Reviewed by: gallatin (no objection)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5695
- properly V_irtualise variable access unbreaking VIMAGE kernels.
- remove the volatile from the function return type to make architecture
using gcc happy [-Wreturn-type]
"type qualifiers ignored on function return type"
I am not entirely happy with this solution putting the u_int there
but it will do for now.
route caching for TCP, with some improvements. In particular, invalidate
the route cache if a new route is added, which might be a better match.
The cache is automatically invalidated if the old route is deleted.
Submitted by: Mike Karels
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D4306
Furthermore, there is no reason this needs to be a 64-bit integer
for the forseeable future.
Also, there is an inconsistency between to_flags and the mask in
tcp_addoptions(). Before r195654, to_flags was a u_long and the mask in
tcp_addoptions() was a u_int. r195654 changed to_flags to be a u_int64_t
but left the mask in tcp_addoptions() as a u_int, meaning that these
variables will only be the same width on platforms with 64-bit integers.
Convert both to_flags and the mask in tcp_addoptions() to be explicitly
32-bit variables. This may save a few cycles on 32-bit platforms, and
avoids unnecessarily mixing types.
Differential Revision: https://reviews.freebsd.org/D5584
Reviewed by: hiren
MFC after: 2 weeks
Sponsored by: Juniper Networks
struct tcpstat, because the structure can be zeroed out by netstat(1) -z,
and of course running connection counts shouldn't be touched.
Place running connection counts into separate array, and provide
separate read-only sysctl oid for it.
stack is not compliant with RFC 7323, which requires that TCP stacks send
a timestamp option on all packets (except, optionally, RSTs) after the
session is established.
This patch adds that support. It also adds a TCP signature option to the
packet, if appropriate.
PR: 206047
Differential Revision: https://reviews.freebsd.org/D4808
Reviewed by: hiren
MFC after: 2 weeks
Sponsored by: Juniper Networks
- Reorder variables by size
- Move initializer closer to where it is used
- Remove unneeded variable
Differential Revision: https://reviews.freebsd.org/D4808
Reviewed by: hiren
MFC after: 2 weeks
Sponsored by: Juniper Networks
for output and drop; connect didn't always fire a user probe
some probes were missing in fastpath
Submitted by: Hannes Mehnert
Sponsored by: REMS, EPSRC
Differential Revision: https://reviews.freebsd.org/D5525
included in loader.conf. It also fixes it so that no matter if some one incorrectly
specifies a load order, the lists and such will be initialized on demand at that
time so no one can make that mistake.
Reviewed by: hiren
Differential Revision: D5189
ACK aggregation limit is append count based, while the TCP data segment
aggregation limit is length based. Unless the network driver sets these
two limits, it's an NO-OP.
Reviewed by: adrian, gallatin (previous version), hselasky (previous version)
Approved by: adrian (mentor)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5185
Fix a panic that occurs when a vnet interface is unavailable at the time the
vnet jail referencing said interface is stopped.
Sponsored by: FIS Global, Inc.
the sign bit doesn't cause an overflow. The overflow manifests itself
as a sorting index wrap around in the middle of the sorted array,
which is not a problem for the LRO code, but might be a problem for
the logic inside qsort().
Reviewed by: gnn @
Sponsored by: Mellanox Technologies
Differential Revision: https://reviews.freebsd.org/D5239
back and harmize the use cases among RIB, IPFW, PF yet but it's also not
the scope of this work. Prevents instant panics on teardown and frees
the FIB bits again.
Sponsored by: The FreeBSD Foundation
new addresses during restart. If this is not done, restart doesn't
work when the local socket is IPv4 only and the peer uses
IPv4 and IPv6 addresses.
MFC after: 3 days.
o Return back the buf[TCP_CA_NAME_MAX] for TCP_CONGESTION,
for TCP_CCALGOOPT use dynamically allocated *pbuf.
o For SOPT_SET TCP_CONGESTION do NULL terminating of string
taking from userland.
o For SOPT_SET TCP_CONGESTION do the search for the algorithm
keeping the inpcb lock.
o For SOPT_GET TCP_CONGESTION first strlcpy() the name
holding the inpcb lock into temporary buffer, then copyout.
Together with: lstewart
60 seconds, respectively. Turn them into sysctls that can be tuned live. The
default values of 5 seconds and 60 seconds have been retained.
Submitted by: Jason Wolfe (j at nitrology dot com)
Reviewed by: gnn, rrs, hiren, bz
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D5024
There are number of radix consumers in kernel land (pf,ipfw,nfs,route)
with different requirements. In fact, first 3 don't have _any_ requirements
and first 2 does not use radix locking. On the other hand, routing
structure do have these requirements (rnh_gen, multipath, custom
to-be-added control plane functions, different locking).
Additionally, radix should not known anything about its consumers internals.
So, radix code now uses tiny 'struct radix_head' structure along with
internal 'struct radix_mask_head' instead of 'struct radix_node_head'.
Existing consumers still uses the same 'struct radix_node_head' with
slight modifications: they need to pass pointer to (embedded)
'struct radix_head' to all radix callbacks.
Routing code now uses new 'struct rib_head' with different locking macro:
RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing
information base).
New net/route_var.h header was added to hold routing subsystem internal
data. 'struct rib_head' was placed there. 'struct rtentry' will also
be moved there soon.
Saw all the printfs already.
Note: not sure the atomics are needed but without them, the condition
would never trigger, and we'd still see panics (which could have been
due to the insert race). Will work my way backwards in case this stays
stable.
Sponsored by: The FreeBSD Foundation
easier. Note: this is currently not in a usable state as certain
teardown parts are not called and the DOMAIN rework is missing.
More to come soon and find its way to head.
Obtained from: P4 //depot/user/bz/vimage/...
Sponsored by: The FreeBSD Foundation
control algorithm options. The argument is variable length and is opaque
to TCP, forwarded directly to the algorithm's ctl_output method.
Provide new includes directory netinet/cc, where algorithm specific
headers can be installed.
The new API doesn't yet have any in tree consumers.
The original code written by lstewart.
Reviewed by: rrs, emax
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D711
Recover the vertical space.
Sponsored by: The FreeBSD Foundation
MFC After: 3 days
Obtained from: p4 CH=180830
Reviewed by: gnn, hiren
Differential Revision: https://reviews.freebsd.org/D4898
- Add optimizing LRO wrapper which pre-sorts all incoming packets
according to the hash type and flowid. This prevents exhaustion of
the LRO entries due to too many connections at the same time.
Testing using a larger number of higher bandwidth TCP connections
showed that the incoming ACK packet aggregation rate increased from
~1.3:1 to almost 3:1. Another test showed that for a number of TCP
connections greater than 16 per hardware receive ring, where 8 TCP
connections was the LRO active entry limit, there was a significant
improvement in throughput due to being able to fully aggregate more
than 8 TCP stream. For very few very high bandwidth TCP streams, the
optimizing LRO wrapper will add CPU usage instead of reducing CPU
usage. This is expected. Network drivers which want to use the
optimizing LRO wrapper needs to call "tcp_lro_queue_mbuf()" instead
of "tcp_lro_rx()" and "tcp_lro_flush_all()" instead of
"tcp_lro_flush()". Further the LRO control structure must be
initialized using "tcp_lro_init_args()" passing a non-zero number
into the "lro_mbufs" argument.
- Make LRO statistics 64-bit. Previously 32-bit integers were used for
statistics which can be prone to wrap-around. Fix this while at it
and update all SYSCTL's which expose LRO statistics.
- Ensure all data is freed when destroying a LRO control structures,
especially leftover LRO entries.
- Reduce number of memory allocations needed when setting up a LRO
control structure by precomputing the total amount of memory needed.
- Add own memory allocation counter for LRO.
- Bump the FreeBSD version to force recompilation of all KLDs due to
change of the LRO control structure size.
Sponsored by: Mellanox Technologies
Reviewed by: gallatin, sbruno, rrs, gnn, transport
Tested by: Netflix
Differential Revision: https://reviews.freebsd.org/D4914
(RFC 2385/TCP-MD5) kernel option.
If a tcpcb has TF_NOOPT flag, then tcp_addoptions() is not called,
and to.to_signature is an uninitialized stack variable. The value
is later used as write offset, which leads to writing to random
address.
Submitted by: rstone, jtl
Security: SA-16:05.tcp
Move actual rte selection process from rtalloc_mpath_fib()
to the rt_path_selectrte() function. Add public
rt_mpath_select() to use in fibX_lookup_ functions.
The only piece of information that is required is rt_flags subset.
In particular, if_loop() requires RTF_REJECT and RTF_BLACKHOLE flags
to check if this particular mbuf needs to be dropped (and what
error should be returned).
Note that if_loop() will always return EHOSTUNREACH for "reject" routes
regardless of RTF_HOST flag existence. This is due to upcoming routing
changes where RTF_HOST value won't be available as lookup result.
All other functions require RTF_GATEWAY flag to check if they need
to return EHOSTUNREACH instead of EHOSTDOWN error.
There are 11 places where non-zero 'struct route' is passed to if_output().
For most of the callers (forwarding, bpf, arp) does not care about exact
error value. In fact, the only place where this result is propagated
is ip_output(). (ip6_output() passes NULL route to nd6_output_ifp()).
Given that, add 3 new 'struct route' flags (RT_REJECT, RT_BLACKHOLE and
RT_IS_GW) and inline function (rt_update_ro_flags()) to copy necessary
rte flags to ro_flags. Call this function in ip_output() after looking up/
verifying rte.
Reviewed by: ae
Such handler should pass different set of variables, instead
of directly providing 2 locked route entries.
Given that it hasn't been really used since at least 2012, remove
current code.
Will re-add it after finishing most major routing-related changes.
Discussed with: np
and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned
up after T/TCP removal. After all permutations over the years the result is
that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum
protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if
timestamps are in action, or is equal to t_maxopd otherwise. That's a very
rough estimate of MSS reduced by options length. Throughout the code it
was used in places, where preciseness was not important, like cwnd or
ssthresh calculations.
With this change:
- t_maxopd goes away.
- t_maxseg now stores MSS not adjusted by options.
- new function tcp_maxseg() is provided, that calculates MSS reduced by
options length. The functions gives a better estimate, since it takes
into account SACK state as well.
Reviewed by: jtl
Differential Revision: https://reviews.freebsd.org/D3593
entries data in unified format.
There are control plane functions that require information other than
just next-hop data (e.g. individual rtentry fields like flags or
prefix/mask). Given that the goal is to avoid rte reference/refcounting,
re-use rt_addrinfo structure to store most rte fields. If caller wants
to retrieve key/mask or gateway (which are sockaddrs and are allocated
separately), it needs to provide sufficient-sized sockaddrs structures
w/ ther pointers saved in passed rt_addrinfo.
Convert:
* lltable new records checks (in_lltable_rtcheck(),
nd6_is_new_addr_neighbor().
* rtsock pre-add/change route check.
* IPv6 NS ND-proxy check (RADIX_MPATH code was eliminated because
1) we don't support RTF_ANNOUNCE ND-proxy for networks and there should
not be multiple host routes for such hosts 2) if we have multiple
routes we should inspect them (which is not done). 3) the entire idea
of abusing KRT as storage for ND proxy seems odd. Userland programs
should be used for that purpose).
Add if_requestencap() interface method which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).
Reshape 'struct route' to be able to pass additional data (with is length)
to prepend to mbuf.
These two changes permits routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.
ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
record. Interface link address change are handled by re-calculating
headers for all lles based on if_lladdr event. After these changes,
arpresolve()/nd6_resolve() returns full pre-calculated header for
supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which ether
returs error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.
BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of there have ther header "complete". The only
difference is that interface source mac has to be filled by OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).
Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.
Differential Revision: https://reviews.freebsd.org/D4102
cperciva's libmd implementation is 5-30% faster
The same was done for SHA256 previously in r263218
cperciva's implementation was lacking SHA-384 which I implemented, validated against OpenSSL and the NIST documentation
Extend sbin/md5 to create sha384(1)
Chase dependancies on sys/crypto/sha2/sha2.{c,h} and replace them with sha512{c.c,.h}
Reviewed by: cperciva, des, delphij
Approved by: secteam, bapt (mentor)
MFC after: 2 weeks
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D3929
send_queue and the socket is closed. This results in strange
race conditions for the application.
While there, remove a stray character.
MFC after: 3 days
TFO is disabled by default in the kernel build. See the top comment
in sys/netinet/tcp_fastopen.c for implementation particulars.
Reviewed by: gnn, jch, stas
MFC after: 3 days
Sponsored by: Verisign, Inc.
Differential Revision: https://reviews.freebsd.org/D4350
creation will print extra lines on the console. We are generally not
interested in this (repeated) information for each VNET. Thus only
print it for the default VNET. Virtual interfaces on the base system
will remain printing information, but e.g. each loopback in each vnet
will no longer cause a "bpf attached" line.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D4531
on vnet enabled jail shutdown. Call the provided cleanup
routines for IP versions 4 and 6 to plug these leaks.
Sponsored by: The FreeBSD Foundation
MFC atfer: 2 weeks
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D4530
- Use SDT_PROBE<N>() instead of SDT_PROBE(). This has no functional effect
at the moment, but will be needed for some future changes.
- Don't hardcode the module component of the probe identifier. This is
set automatically by the SDT framework.
MFC after: 1 week
If source of ARP request didn't pass the routing check
(e.g. not in directly connected network), be polite and
still answer the request instead of dropping frame.
Reported by: quadro at irc@rusnet
to do is to clean up the timer handling using the async-drain.
Other optimizations may be coming to go with this. Whats here
will allow differnet tcp implementations (one included).
Reviewed by: jtl, hiren, transports
Sponsored by: Netflix Inc.
Differential Revision: D4055
When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.
This results is slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.
We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.
Upon receiving these events each protocol trigger the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce
This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.
The new behavour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link
Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.
PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111
This routine checks that there are no locks held for an inp,
without having any lock on the inp. This breaks if the inp
goes away when it is called. This happens on stress tests
on a RPi B+.
MFC after: 3 days
ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(),
in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api.
Eliminate now-unused ip_rtaddr().
Fix lookup key fib6_lookup_nh_basic() which was lost diring merge.
Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always
return IPv6 destination address with embedded scope. Currently
rw_gateway has it scope embedded, do the same for non-gatewayed
destinations.
Sponsored by: Yandex LLC
other end till it reaches predetermined threshold which is 3 for us right now.
Once that happens, we trigger fast-retransmit to do loss recovery.
Main problem with the current implementation is that we don't honor SACK
information well to detect whether an incoming ack is a dupack or not. RFC6675
has latest recommendations for that. According to it, dupack is a segment that
arrives carrying a SACK block that identifies previously unknown information
between snd_una and snd_max even if it carries new data, changes the advertised
window, or moves the cumulative acknowledgment point.
With the prevalence of Selective ACK (SACK) these days, improper handling can
lead to delayed loss recovery.
With the fix, new behavior looks like following:
0) th_ack < snd_una --> ignore
Old acks are ignored.
1) th_ack == snd_una, !sack_changed --> ignore
Acks with SACK enabled but without any new SACK info in them are ignored.
2) th_ack == snd_una, window == old_window --> increment
Increment on a good dupack.
3) th_ack == snd_una, window != old_window, sack_changed --> increment
When SACK enabled, it's okay to have advertized window changed if the ack has
new SACK info.
4) th_ack > snd_una --> reset to 0
Reset to 0 when left edge moves.
5) th_ack > snd_una, sack_changed --> increment
Increment if left edge moves but there is new SACK info.
Here, sack_changed is the indicator that incoming ack has previously unknown
SACK info in it.
Note: This fix is not fully compliant to RFC6675. That may require a few
changes to current implementation in order to keep per-sackhole dupack counter
and change to the way we mark/handle sack holes.
PR: 203663
Reviewed by: jtl
MFC after: 3 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D4225
Vast majority of rtalloc(9) users require only basic info from
route table (e.g. "does the rtentry interface match with the interface
I have?". "what is the MTU?", "Give me the IPv4 source address to use",
etc..).
Instead of hand-rolling lookups, checking if rtentry is up, valid,
dealing with IPv6 mtu, finding "address" ifp (almost never done right),
provide easy-to-use API hiding all the complexity and returning the
needed info into small on-stack structure.
This change also helps hiding route subsystem internals (locking, direct
rtentry accesses).
Additionaly, using this API improves lookup performance since rtentry is not
locked.
(This is safe, since all the rtentry changes happens under both radix WLOCK
and rtentry WLOCK).
Sponsored by: Yandex LLC
* When processing a cookie, use the number of
streams announced in the INIT-ACK.
* When sending an INIT-ACK for an existing
association, use the value from the association,
not from the end-point.
MFC after: 1 week