When allocating the IPv6 fragement packet queue entry we do checks
against counters and if we pass we increment one of the counters
to claim the spot. Right after that we have two cases (malloc and MAC)
which can both fail in which case we free the entry but never released
our claim on the counter. In theory this can lead to not accepting new
fragments after a long time, especially if it would be MAC "refusing"
them.
Rather than immediately subtracting the value in the error case, only
increment it after these two cases so we can no longer leak it.
MFC after: 3 weeks
Sponsored by: Netflix
When we receive the packet with the first fragmented part (fragoff=0)
we remember the length of the unfragmentable part and the next header
(and should probably also remember ECN) as meta-data on the reassembly
queue.
Someone replying this packet so far could change these 2 (3) values.
While changing the next header seems more severe, for a full size
fragmented UDP packet, for example, adding an extension header to the
unfragmentable part would go unnoticed (as the framented part would be
considered an exact duplicate) but make reassembly fail.
So do not allow updating the meta-data after we have seen the first
fragmented part anymore.
The frag6_20 test case is added which failed before triggering an
ICMPv6 "param prob" due to the check for each queued fragment for
a max-size violation if a fragoff=0 packet was received.
MFC after: 3 weeks
Sponsored by: Netflix
While the comment was updated in r350746, the code was not.
RFC8200 says that unless fragment overlaps are exact (same fragment
twice) not only the current fragment but the entire reassembly queue
for this packet must be silently discarded, which we now do if
fragment offset and fragment length do not match.
Obtained from: jtl
MFC after: 3 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16850
Similar to the system global counter also export the per-VNET counter
"frag6_nfragpackets" detailing the current number of fragment packets
in this VNET's reassembly queues.
The read-only counter is helpful for in-VNET statistical monitoring and
for test-cases.
MFC after: 3 weeks
Sponsored by: Netflix
In case the first fragmented part (off=0) arrives we check for the
maximum packet size for each fragmented part we already queued with the
addition of the unfragmentable part from the first one.
For one we do not have to enter the loop at all if this is the first
fragmented part to arrive, and we can skip the check.
Should we encounter an error case we send an ICMPv6 message for any
fragment exceeding the maximum length limit. While dequeueing the
original packet and freeing it, statistics were not updated and leaked
both the reassembly queue count for the fragment and the global
fragment count. Found by code inspection and confirmed by tightening
test cases checking more statistical and system counters.
While here properly wrap a line.
MFC after: 3 weeks
Sponsored by: Netflix
When we are checking for the maximum reassembled packet size of the
fragmentable part and run into the error case (packet too big),
we are leaking the packet queue enntry if this was a first fragment
to arrive.
Properly cleanup, removing the queue entry from the bucket, decrementing
counters, and freeing the memory.
MFC after: 3 weeks
Sponsored by: Netflix
Per sepcification the upper layer header needs to be within the first
fragment. The check was not done so far and there is an open review for
related work, so just leave a note as to where to put it.
Move the extraction of frag offset up to this as it is needed to determine
whether this is a first fragment or not.
MFC after: 3 weeks
Sponsored by: Netflix
Check whether we are accepting more fragments (based on global limits)
before doing expensive operations of calculating the hash and taking the
bucket lock. This slightly increases a "race" between check time and
incrementing counters (which is already there) possibly allowing a few
more fragments than the maximum limits. However, when under attack,
we rather save this CPU time for other packets/work.
MFC after: 3 weeks
Sponsored by: Netflix
Rather than walking the mbuf chain manually use m_last() which doing
exactly that for us.
Defer initializing srcifp for longer as there are multiple exit paths
out of the function which do not need it set. Initialize before taking
the lock though.
Rename the mtx lock to match the type better.
MFC after: 3 weeks
Sponsored by: Netflix
The IP6_REASS_MBUF() macro did some pointer gynmastics to end up with the
same type as it gets in [*(cast **)&]. Spelling it out instead saves all
this and makes the code more readable and less obfuscated directly using
the structure field.
MFC after: 3 weeks
Sponsored by: Netflix
Add some ASCII relation of how the bits plug together. The terminology
difference of "fragmented packets" and "fragment packets" is subtle.
While here clear up more whitespace and comments.
No functional change.
MFC after: 3 weeks
Sponsored by: Netflix
Remove the KAME custom circular queue for fragments and fragmented packets
and replace them with a standard TAILQ.
This make the code a lot more understandable and maintainable and removes
further hand-rolled code from the the tree using a standard interface instead.
Hide the still public structures under #ifdef _KERNEL as there is no
use for them in user space.
The naming is a bit confusing now as struct ip6q and the ip6q[] buckets
array are not the same anymore; sadly struct ip6q is also used by the
MAC framework and we cannot rename it.
Submitted by: jtl (initally)
MFC after: 3 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16847 (jtl's original)
When shutting down a VNET we did not cleanup the fragmentation hashes.
This has multiple problems: (1) leak memory but also (2) leak on the
global counters, which might eventually lead to a problem on a system
starting and stopping a lot of vnets and dealing with a lot of IPv6
fragments that the counters/limits would be exhausted and processing
would no longer take place.
Unfortunately we do not have a useable variable to indicate when
per-VNET initialization of frag6 has happened (or when destroy happened)
so introduce a boolean to flag this. This is needed here as well as
it was in r353635 for ip_reass.c in order to avoid tripping over the
already destroyed locks if interfaces go away after the frag6 destroy.
While splitting things up convert the TRY_LOCK to a LOCK operation in
now frag6_drain_one(). The try-lock was derived from a manual hand-rolled
implementation and carried forward all the time. We no longer can afford
not to get the lock as that would mean we would continue to leak memory.
Assert that all the buckets are empty before destroying to lock to
ensure long-term stability of a clean shutdown.
Reported by: hselasky
Reviewed by: hselasky
MFC after: 3 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D22054
Add a read-only sysctl exporting the global number of fragments
(base system and all vnets). This is helpful to (a) know how many
fragments are currently being processed, (b) if there are possible
leaks, (c) if vnet teardown is not working correctly, and lastly
(d) it can be used as part of test-suits to ensure (a) to (c).
MFC after: 3 weeks
Sponsored by: Netflix
partial fragmented packets before a network interface is detached.
When sending IPv4 or IPv6 fragmented packets and a fragment is lost
before the network device is freed, the mbuf making up the fragment
will remain in the temporary hashed fragment list and cause a panic
when it times out due to accessing a freed network interface
structure.
1) Make sure the m_pkthdr.rcvif always points to a valid network
interface. Else the rcvif field should be set to NULL.
2) Use the rcvif of the last received fragment as m_pkthdr.rcvif for
the fully defragged packet, instead of the first received fragment.
Panic backtrace for IPv6:
panic()
icmp6_reflect() # tries to access rcvif->if_afdata[AF_INET6]->xxx
icmp6_error()
frag6_freef()
frag6_slowtimo()
pfslowtimo()
softclock_call_cc()
softclock()
ithread_loop()
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D19622
MFC after: 1 week
Sponsored by: Mellanox Technologies
the epoch section towards return statement. Since entering epoch
is cheap, it is easier to cover the whole function with epoch,
rather than try to properly maintain its state.
called in syscall context, so it must enter epoch itself. This
changeset originates from early version of the patch, and somehow
slipped to the final version.
Reported by: pho
Acquire the inp lock before checking whether the socket is already bound,
and around updates to the inp_vflag field.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21867
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
contains Hop-by-Hop options, the mbuf chain is potentially changed in
ip6_hopopts_input(), called by ip6_input_hbh().
This can happen, because of the the use of IP6_EXTHDR_CHECK, which might
call m_pullup().
So provide the updated pointer back to the called of ip6_input_hbh() to
avoid using a freed mbuf chain in`ip6_input()`.
Reviewed by: markj@
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D21664
KTLS adds support for in-kernel framing and encryption of Transport
Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports
offload of TLS for transmitted data. Key negotation must still be
performed in userland. Once completed, transmit session keys for a
connection are provided to the kernel via a new TCP_TXTLS_ENABLE
socket option. All subsequent data transmitted on the socket is
placed into TLS frames and encrypted using the supplied keys.
Any data written to a KTLS-enabled socket via write(2), aio_write(2),
or sendfile(2) is assumed to be application data and is encoded in TLS
frames with an application data type. Individual records can be sent
with a custom type (e.g. handshake messages) via sendmsg(2) with a new
control message (TLS_SET_RECORD_TYPE) specifying the record type.
At present, rekeying is not supported though the in-kernel framework
should support rekeying.
KTLS makes use of the recently added unmapped mbufs to store TLS
frames in the socket buffer. Each TLS frame is described by a single
ext_pgs mbuf. The ext_pgs structure contains the header of the TLS
record (and trailer for encrypted records) as well as references to
the associated TLS session.
KTLS supports two primary methods of encrypting TLS frames: software
TLS and ifnet TLS.
Software TLS marks mbufs holding socket data as not ready via
M_NOTREADY similar to sendfile(2) when TLS framing information is
added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then
called to schedule TLS frames for encryption. In the case of
sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving
the mbufs marked M_NOTREADY until encryption is completed. For other
writes (vn_sendfile when pages are available, write(2), etc.), the
PRUS_NOTREADY is set when invoking pru_send() along with invoking
ktls_enqueue().
A pool of worker threads (the "KTLS" kernel process) encrypts TLS
frames queued via ktls_enqueue(). Each TLS frame is temporarily
mapped using the direct map and passed to a software encryption
backend to perform the actual encryption.
(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if
someone wished to make this work on architectures without a direct
map.)
KTLS supports pluggable software encryption backends. Internally,
Netflix uses proprietary pure-software backends. This commit includes
a simple backend in a new ktls_ocf.ko module that uses the kernel's
OpenCrypto framework to provide AES-GCM encryption of TLS frames. As
a result, software TLS is now a bit of a misnomer as it can make use
of hardware crypto accelerators.
Once software encryption has finished, the TLS frame mbufs are marked
ready via pru_ready(). At this point, the encrypted data appears as
regular payload to the TCP stack stored in unmapped mbufs.
ifnet TLS permits a NIC to offload the TLS encryption and TCP
segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS)
is allocated on the interface a socket is routed over and associated
with a TLS session. TLS records for a TLS session using ifnet TLS are
not marked M_NOTREADY but are passed down the stack unencrypted. The
ip_output_send() and ip6_output_send() helper functions that apply
send tags to outbound IP packets verify that the send tag of the TLS
record matches the outbound interface. If so, the packet is tagged
with the TLS send tag and sent to the interface. The NIC device
driver must recognize packets with the TLS send tag and schedule them
for TLS encryption and TCP segmentation. If the the outbound
interface does not match the interface in the TLS send tag, the packet
is dropped. In addition, a task is scheduled to refresh the TLS send
tag for the TLS session. If a new TLS send tag cannot be allocated,
the connection is dropped. If a new TLS send tag is allocated,
however, subsequent packets will be tagged with the correct TLS send
tag. (This latter case has been tested by configuring both ports of a
Chelsio T6 in a lagg and failing over from one port to another. As
the connections migrated to the new port, new TLS send tags were
allocated for the new port and connections resumed without being
dropped.)
ifnet TLS can be enabled and disabled on supported network interfaces
via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported
across both vlan devices and lagg interfaces using failover, lacp with
flowid enabled, or lacp with flowid enabled.
Applications may request the current KTLS mode of a connection via a
new TCP_TXTLS_MODE socket option. They can also use this socket
option to toggle between software and ifnet TLS modes.
In addition, a testing tool is available in tools/tools/switch_tls.
This is modeled on tcpdrop and uses similar syntax. However, instead
of dropping connections, -s is used to force KTLS connections to
switch to software TLS and -i is used to switch to ifnet TLS.
Various sysctls and counters are available under the kern.ipc.tls
sysctl node. The kern.ipc.tls.enable node must be set to true to
enable KTLS (it is off by default). The use of unmapped mbufs must
also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.
KTLS is enabled via the KERN_TLS kernel option.
This patch is the culmination of years of work by several folks
including Scott Long and Randall Stewart for the original design and
implementation; Drew Gallatin for several optimizations including the
use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records
awaiting software encryption, and pluggable software crypto backends;
and John Baldwin for modifications to support hardware TLS offload.
Reviewed by: gallatin, hselasky, rrs
Obtained from: Netflix
Sponsored by: Netflix, Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21277
Move ip6asfrag and the accompanying IP6_REASS_MBUF macro from
ip6_var.h into frag6.c as they are not used outside frag6.c.
Sadly struct ip6q is all over the mac framework so we have to
leave it public.
This reduces the public KPI space.
MFC after: 3 months
X-MFC: possibly MFC the #define only to stable branches
Sponsored by: Netflix
Consitently put () around return values.
Do not assign variables at the time of variable declaration.
Sort variables. Rename ia to ia6, remove/reuse some variables used only
once or twice for temporary calculations.
No functional changes intended.
MFC after: 3 months
Sponsored by: Netflix
Cleanup some comments (start with upper case, ends in punctuation,
use width and do not consume vertical space). Update comments to
RFC8200. Some whitespace changes.
No functional changes.
MFC after: 3 months
Sponsored by: Netflix
Previously the ICMPv6 input path incorrectly handled cases where an
MLDv2 listener query packet was internally fragmented across multiple
mbufs.
admbugs: 921
Submitted by: jtl
Reported by: CJD of Apple
Approved by: so
MFC after: 0 minutes
Security: CVE-2019-5608
The hash buckets array is called ip6q. The data structure ip6q is a
description of different object, the one the array holds these days
(since r337776). To clear some of this confusion, rename the array
to ip6qb.
When iterating over all buckets or addressing them directly, we
use at least the variables i, hash, and bucket. To keep the
terminology consistent use the variable name "bucket" and always
make it an uint32_t and not sometimes an int.
No functional behaviour changes intended.
MFC after: 3 months
Sponsored by: Netflix
Re-order functions within the file in preparation for an upcoming
code simplification.
No functional changes.
MFC after: 3 months
Sponsored by: Netflix
Bring back systm.h after r350532 and banish errno.h, time.h, and
machine/atomic.h.
Reported by: bde (Thank you!)
Pointyhat to: bz
MFC after: 12 weeks
X-MFC: with r350532
Sponsored by: Netflix
Removing the prototype from the header and making the function static
in r350533 makes architectures using gcc complain "function declaration
isn't a prototype". Add the missing void given the function has no
arguments.
Reported by: the CI machinery
Pointyhat to: bz
MFC after: 3 months
X-MFC with: r350533
Sponsored by: Netflix
Rename M_FTABLE to M_FRAG6 as the former sounds very much like the former
"flowtable" rather than anything to do with fragments and reassembly.
While here, let malloc( , .. | M_ZERO) do the zeroing rather than calling
bzero() ourselves.
MFC after: 3 months
Sponsored by: Netflix
Remove all the #if 0 and #if notyet blocks of dead code which have been
there for at least 18 years from what I can see.
No functional changes.
MFC after: 3 months
Sponsored by: Netflix
Move the sysctls and the related variables only used in frag6.c
into the file and out of in6_proto.c. That way everything belonging
together is in one place.
Sort the variables into global and per-vnet scopes and make
them static. No longer export the (helper) function
frag6_set_bucketsize() now also file-local only.
Should be no functional changes, only reduced public KPI/KBI surface.
MFC after: 3 months
Sponsored by: Netflix
Sort includes and remove duplicate kernel.h as well as the unneeded
systm.h.
Hide the mac framework incude behind #fidef MAC.
MFC after: 3 months
Sponsored by: Netflix
Finish what was started a few years ago and harmonize IPv6 and IPv4
kernel names. We are down to very few places now that it is feasible
to do the change for everything remaining with causing too much disturbance.
Remove "aliases" for IPv6 names which confusingly could indicate
that we are talking about a different data structure or field or
have two fields, one for each address family.
Try to follow common conventions used in FreeBSD.
* Rename sin6p to sin6 as that is how it is spelt in most places.
* Remove "aliases" (#defines) for:
- in6pcb which really is an inpcb and nothing separate
- sotoin6pcb which is sotoinpcb (as per above)
- in6p_sp which is inp_sp
- in6p_flowinfo which is inp_flow
* Try to use ia6 for in6_addr rather than in6p.
* With all these gone also rename the in6p variables to inp as
that is what we call it in most of the network stack including
parts of netinet6.
The reasons behind this cleanup are that we try to further
unify netinet and netinet6 code where possible and that people
will less ignore one or the other protocol family when doing
code changes as they may not have spotted places due to different
names for the same thing.
No functional changes.
Discussed with: tuexen (SCTP changes)
MFC after: 3 months
Sponsored by: Netflix
Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages. It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.
For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer. This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers). It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused. To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.
Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.
NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability. This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.
If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output. If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.
Submitted by: gallatin (earlier version)
Reviewed by: gallatin, hselasky, rrs
Discussed with: ae, kp (firewalls)
Relnotes: yes
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20616
instead of a linear array.
The multicast memberships for the inpcb structure are protected by a
non-sleepable lock, INP_WLOCK(), which needs to be dropped when
calling the underlying possibly sleeping if_ioctl() method. When using
a linear array to keep track of multicast memberships, the computed
memory location of the multicast filter may suddenly change, due to
concurrent insertion or removal of elements in the linear array. This
in turn leads to various invalid memory access issues and kernel
panics.
To avoid this problem, put all multicast memberships on a STAILQ based
list. Then the memory location of the IPv4 and IPv6 multicast filters
become fixed during their lifetime and use after free and memory leak
issues are easier to track, for example by: vmstat -m | grep multi
All list manipulation has been factored into inline functions
including some macros, to easily allow for a future hash-list
implementation, if needed.
This patch has been tested by pho@ .
Differential Revision: https://reviews.freebsd.org/D20080
Reviewed by: markj @
MFC after: 1 week
Sponsored by: Mellanox Technologies
- Perform ifp mismatch checks (to determine if a send tag is allocated
for a different ifp than the one the packet is being output on), in
ip_output() and ip6_output(). This avoids sending packets with send
tags to ifnet drivers that don't support send tags.
Since we are now checking for ifp mismatches before invoking
if_output, we can now try to allocate a new tag before invoking
if_output sending the original packet on the new tag if allocation
succeeds.
To avoid code duplication for the fragment and unfragmented cases,
add ip_output_send() and ip6_output_send() as wrappers around
if_output and nd6_output_ifp, respectively. All of the logic for
setting send tags and dealing with send tag-related errors is done
in these wrapper functions.
For pseudo interfaces that wrap other network interfaces (vlan and
lagg), wrapper send tags are now allocated so that ip*_output see
the wrapper ifp as the ifp in the send tag. The if_transmit
routines rewrite the send tags after performing an ifp mismatch
check. If an ifp mismatch is detected, the transmit routines fail
with EAGAIN.
- To provide clearer life cycle management of send tags, especially
in the presence of vlan and lagg wrapper tags, add a reference count
to send tags managed via m_snd_tag_ref() and m_snd_tag_rele().
Provide a helper function (m_snd_tag_init()) for use by drivers
supporting send tags. m_snd_tag_init() takes care of the if_ref
on the ifp meaning that code alloating send tags via if_snd_tag_alloc
no longer has to manage that manually. Similarly, m_snd_tag_rele
drops the refcount on the ifp after invoking if_snd_tag_free when
the last reference to a send tag is dropped.
This also closes use after free races if there are pending packets in
driver tx rings after the socket is closed (e.g. from tcpdrop).
In order for m_free to work reliably, add a new CSUM_SND_TAG flag in
csum_flags to indicate 'snd_tag' is set (rather than 'rcvif').
Drivers now also check this flag instead of checking snd_tag against
NULL. This avoids false positive matches when a forwarded packet
has a non-NULL rcvif that was treated as a send tag.
- cxgbe was relying on snd_tag_free being called when the inp was
detached so that it could kick the firmware to flush any pending
work on the flow. This is because the driver doesn't require ACK
messages from the firmware for every request, but instead does a
kind of manual interrupt coalescing by only setting a flag to
request a completion on a subset of requests. If all of the
in-flight requests don't have the flag when the tag is detached from
the inp, the flow might never return the credits. The current
snd_tag_free command issues a flush command to force the credits to
return. However, the credit return is what also frees the mbufs,
and since those mbufs now hold references on the tag, this meant
that snd_tag_free would never be called.
To fix, explicitly drop the mbuf's reference on the snd tag when the
mbuf is queued in the firmware work queue. This means that once the
inp's reference on the tag goes away and all in-flight mbufs have
been queued to the firmware, tag's refcount will drop to zero and
snd_tag_free will kick in and send the flush request. Note that we
need to avoid doing this in the middle of ethofld_tx(), so the
driver grabs a temporary reference on the tag around that loop to
defer the free to the end of the function in case it sends the last
mbuf to the queue after the inp has dropped its reference on the
tag.
- mlx5 preallocates send tags and was using the ifp pointer even when
the send tag wasn't in use. Explicitly use the ifp from other data
structures instead.
- Sprinkle some assertions in various places to assert that received
packets don't have a send tag, and that other places that overwrite
rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.
Reviewed by: gallatin, hselasky, rgrimes, ae
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20117
since r286195.
Do not forget results of route lookup and initialize rt and ifp pointers.
PR: 238098
Submitted by: Masse Nicolas <nicolas.masse at stormshield eu>
MFC after: 1 week
Currently rinit1() and its IPv6 counterpart
nd6_prefix_onlink_rtrequest() uses dummy null_sdl gateway address
during route insertion and change it afterwards. This behaviour
brings complications to the routing stack and the users of its
upcoming notification system.
This change fixes both rinit1() and nd6_prefix_onlink_rtrequest()
by filling in proper gateway in the beginning. It does not change any
of the userland notifications as in both cases, they happen after
the insertion and fixup process (rt_newaddrmsg_fib() and nd6_rtmsg()).
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20328