freebsd-nq/sys/net
John Baldwin fb3bc59600 Restructure mbuf send tags to provide stronger guarantees.
- Perform ifp mismatch checks (to determine if a send tag is allocated
  for a different ifp than the one the packet is being output on), in
  ip_output() and ip6_output().  This avoids sending packets with send
  tags to ifnet drivers that don't support send tags.

  Since we are now checking for ifp mismatches before invoking
  if_output, we can now try to allocate a new tag before invoking
  if_output sending the original packet on the new tag if allocation
  succeeds.

  To avoid code duplication for the fragment and unfragmented cases,
  add ip_output_send() and ip6_output_send() as wrappers around
  if_output and nd6_output_ifp, respectively.  All of the logic for
  setting send tags and dealing with send tag-related errors is done
  in these wrapper functions.

  For pseudo interfaces that wrap other network interfaces (vlan and
  lagg), wrapper send tags are now allocated so that ip*_output see
  the wrapper ifp as the ifp in the send tag.  The if_transmit
  routines rewrite the send tags after performing an ifp mismatch
  check.  If an ifp mismatch is detected, the transmit routines fail
  with EAGAIN.

- To provide clearer life cycle management of send tags, especially
  in the presence of vlan and lagg wrapper tags, add a reference count
  to send tags managed via m_snd_tag_ref() and m_snd_tag_rele().
  Provide a helper function (m_snd_tag_init()) for use by drivers
  supporting send tags.  m_snd_tag_init() takes care of the if_ref
  on the ifp meaning that code alloating send tags via if_snd_tag_alloc
  no longer has to manage that manually.  Similarly, m_snd_tag_rele
  drops the refcount on the ifp after invoking if_snd_tag_free when
  the last reference to a send tag is dropped.

  This also closes use after free races if there are pending packets in
  driver tx rings after the socket is closed (e.g. from tcpdrop).

  In order for m_free to work reliably, add a new CSUM_SND_TAG flag in
  csum_flags to indicate 'snd_tag' is set (rather than 'rcvif').
  Drivers now also check this flag instead of checking snd_tag against
  NULL.  This avoids false positive matches when a forwarded packet
  has a non-NULL rcvif that was treated as a send tag.

- cxgbe was relying on snd_tag_free being called when the inp was
  detached so that it could kick the firmware to flush any pending
  work on the flow.  This is because the driver doesn't require ACK
  messages from the firmware for every request, but instead does a
  kind of manual interrupt coalescing by only setting a flag to
  request a completion on a subset of requests.  If all of the
  in-flight requests don't have the flag when the tag is detached from
  the inp, the flow might never return the credits.  The current
  snd_tag_free command issues a flush command to force the credits to
  return.  However, the credit return is what also frees the mbufs,
  and since those mbufs now hold references on the tag, this meant
  that snd_tag_free would never be called.

  To fix, explicitly drop the mbuf's reference on the snd tag when the
  mbuf is queued in the firmware work queue.  This means that once the
  inp's reference on the tag goes away and all in-flight mbufs have
  been queued to the firmware, tag's refcount will drop to zero and
  snd_tag_free will kick in and send the flush request.  Note that we
  need to avoid doing this in the middle of ethofld_tx(), so the
  driver grabs a temporary reference on the tag around that loop to
  defer the free to the end of the function in case it sends the last
  mbuf to the queue after the inp has dropped its reference on the
  tag.

- mlx5 preallocates send tags and was using the ifp pointer even when
  the send tag wasn't in use.  Explicitly use the ifp from other data
  structures instead.

- Sprinkle some assertions in various places to assert that received
  packets don't have a send tag, and that other places that overwrite
  rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.

Reviewed by:	gallatin, hselasky, rgrimes, ae
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20117
2019-05-24 22:30:40 +00:00
..
altq Reduce the time it takes the kernel to install a new PF config containing a large number of queues 2019-02-11 05:17:31 +00:00
bpf_buffer.c Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
bpf_buffer.h
bpf_filter.c
bpf_jitter.c
bpf_jitter.h
bpf_zerocopy.c
bpf_zerocopy.h
bpf.c Restructure mbuf send tags to provide stronger guarantees. 2019-05-24 22:30:40 +00:00
bpf.h Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
bpfdesc.h Rework locking in BPF code to remove rwlock from fast path. 2019-05-13 13:45:28 +00:00
bridgestp.c bridge: Fix panic if the STP root is removed 2019-03-15 11:21:20 +00:00
bridgestp.h
dlt.h
ethernet.h Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
firewire.h
ieee8023ad_lacp.c Select lacp egress ports based on NUMA domain 2019-05-03 14:43:21 +00:00
ieee8023ad_lacp.h Select lacp egress ports based on NUMA domain 2019-05-03 14:43:21 +00:00
ieee_oui.h net: adjust randomized address bits 2019-04-17 17:18:43 +00:00
if_arp.h Improve ARP logging. 2019-03-09 01:12:59 +00:00
if_bridge.c net: adjust randomized address bits 2019-04-17 17:18:43 +00:00
if_bridgevar.h
if_clone.c Use the new VNET_DEFINE_STATIC macro when we are defining static VNET 2018-07-24 16:35:52 +00:00
if_clone.h Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
if_dead.c
if_debug.c
if_disc.c Use the new VNET_DEFINE_STATIC macro when we are defining static VNET 2018-07-24 16:35:52 +00:00
if_dl.h
if_edsc.c Use the new VNET_DEFINE_STATIC macro when we are defining static VNET 2018-07-24 16:35:52 +00:00
if_enc.c New pfil(9) KPI together with newborn pfil API and control utility. 2019-01-31 23:01:03 +00:00
if_enc.h
if_epair.c Use the new VNET_DEFINE_STATIC macro when we are defining static VNET 2018-07-24 16:35:52 +00:00
if_ethersubr.c Restructure mbuf send tags to provide stronger guarantees. 2019-05-24 22:30:40 +00:00
if_fwsubr.c
if_gif.c Add handling for appearing/disappearing of ingress addresses to if_gif(4). 2018-10-21 18:06:15 +00:00
if_gif.h Add handling for appearing/disappearing of ingress addresses to if_gif(4). 2018-10-21 18:06:15 +00:00
if_gre.c Add GRE-in-UDP encapsulation support as defined in RFC8086. 2019-04-24 09:05:45 +00:00
if_gre.h Add GRE-in-UDP encapsulation support as defined in RFC8086. 2019-04-24 09:05:45 +00:00
if_ipsec.c Allow configuration of several ipsec interfaces with the same tunnel 2018-11-16 14:21:57 +00:00
if_ipsec.h
if_lagg.c Restructure mbuf send tags to provide stronger guarantees. 2019-05-24 22:30:40 +00:00
if_lagg.h Select lacp egress ports based on NUMA domain 2019-05-03 14:43:21 +00:00
if_llatbl.c Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
if_llatbl.h Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
if_llc.h
if_loop.c Use the new VNET_DEFINE_STATIC macro when we are defining static VNET 2018-07-24 16:35:52 +00:00
if_me.c Add the check that current VNET is ready and access to srchash is allowed. 2018-10-23 13:11:45 +00:00
if_media.c
if_media.h if_media: Add new 2.5G/5G/25G/40G/50G/100G/200G/400G media types 2018-08-22 18:19:56 +00:00
if_mib.c
if_mib.h
if_pflog.h
if_pfsync.h
if_sppp.h
if_spppfr.c
if_spppsubr.c Replace read_random(9) with more appropriate arc4rand(9) KPIs 2019-04-04 01:02:50 +00:00
if_stf.c Do not perform DAD on stf(4) interfaces. 2019-03-30 18:00:44 +00:00
if_tap.h tun/tap: merge and rename to tuntap 2019-05-08 02:32:11 +00:00
if_tun.h
if_tuntap.c Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
if_types.h
if_var.h Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
if_vlan_var.h Extract eventfilter declarations to sys/_eventfilter.h 2019-05-20 00:38:23 +00:00
if_vlan.c Restructure mbuf send tags to provide stronger guarantees. 2019-05-24 22:30:40 +00:00
if_vxlan.c net: adjust randomized address bits 2019-04-17 17:18:43 +00:00
if_vxlan.h
if.c Restructure mbuf send tags to provide stronger guarantees. 2019-05-24 22:30:40 +00:00
if.h Plug routing sysctl leaks. 2018-11-26 13:42:18 +00:00
ifdi_if.m
iflib_clone.c - Remove the unused ifc_link_irq and ifc_mtx_name members of struct iflib_ctx. 2019-05-06 20:56:41 +00:00
iflib_private.h Use busdma unconditionally in iflib 2018-11-27 20:01:05 +00:00
iflib.c iflib: use default ntxd and nrxd when user value is not power of 2 2019-05-10 00:41:42 +00:00
iflib.h - Remove the unused ifc_link_irq and ifc_mtx_name members of struct iflib_ctx. 2019-05-06 20:56:41 +00:00
ifq.h
mp_ring.c - Merge r338254 from cxgbe(4): 2019-05-09 11:34:46 +00:00
mp_ring.h mp_ring: avoid items offset difference between iflib and mp_ring 2019-01-03 23:06:05 +00:00
mppc.h
mppcc.c
mppcd.c
netisr_internal.h
netisr.c Restructure mbuf send tags to provide stronger guarantees. 2019-05-24 22:30:40 +00:00
netisr.h
netmap_legacy.h netmap: add support for multiple host rings 2019-03-18 12:22:23 +00:00
netmap_user.h netmap: add support for multiple host rings 2019-03-18 12:22:23 +00:00
netmap_virt.h netmap: align codebase to the current upstream (760279cfb2730a585) 2018-12-05 11:57:16 +00:00
netmap.h netmap: add support for multiple host rings 2019-03-18 12:22:23 +00:00
paravirt.h
pfil.c Most Ethernet drivers that potentially can run a pfil(9) hook with 2019-03-10 17:20:09 +00:00
pfil.h Most Ethernet drivers that potentially can run a pfil(9) hook with 2019-03-10 17:20:09 +00:00
pfkeyv2.h
pfvar.h pf :Use counter(9) in pf tables. 2019-03-15 11:08:44 +00:00
ppp_defs.h
radix_mpath.c Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9). 2018-06-16 08:26:23 +00:00
radix_mpath.h
radix.c Switch RIB and RADIX_NODE_HEAD lock from rwlock(9) to rmlock(9). 2018-06-16 08:26:23 +00:00
radix.h Fix typo. 2018-06-16 19:21:09 +00:00
raw_cb.c
raw_cb.h
raw_usrreq.c
rndis.h
route_var.h Existense of PCB route caching doesn't allow us to use new fast route 2019-05-08 23:39:24 +00:00
route.c Fix gateway setup for the interface routes. 2019-05-22 21:20:15 +00:00
route.h Existense of PCB route caching doesn't allow us to use new fast route 2019-05-08 23:39:24 +00:00
rss_config.c
rss_config.h
rtsock.c When sending a routing message, don't allow the user to set the 2019-04-14 10:18:14 +00:00
sff8436.h
sff8472.h
slcompress.c
slcompress.h
toeplitz.c
toeplitz.h
vnet.c With more excessive use of modules, more kernel parts working with 2018-10-30 20:45:15 +00:00
vnet.h Don't mark module data as static on RISC-V. 2018-09-12 08:05:33 +00:00