freebsd-dev/sys/netinet
Christian S.J. Peron  b244c8ad14

Over the past couple of years, there have been a number of reports linking
the use of divert sockets to deadlocks.  A number of lock order reversals
(LORs) have been reported between divert and several other network
subsystems, including IPSEC, pfil, multicast and ipfw.  Other deadlocks
could occur because of recursive entry into the IP stack.  This change
should take care of most, if not all, of these issues.

A summary of the changes follows:

- We disallow multicast operations on divert sockets.  It doesn't really make
  semantic sense to allow them, since multicast parameters are typically set
  on multicast endpoints.

  NOTE: As part of this change, we disallow multicast options on any socket
  that IS a divert socket OR is NOT of type SOCK_RAW or SOCK_DGRAM.

- We check whether any socket options have been set on the socket.  If so
  (which is very uncommon and probably doesn't make sense to support), we
  duplicate the mbuf carrying the options.

- We then drop the INP/INFO locks across the call to ip_output().  Because we
  no longer support multicast operations on divert sockets and have
  duplicated any socket options, the reference to the pcb no longer needs to
  stay coherent across that call.

- Finally, we replaced the direct call to ip_input() with netisr queuing.
  This should remove the recursive entry into the IP stack from divert.
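
The output-path rules in the list above can be illustrated with a small
userland model.  This is a minimal sketch under stated assumptions, not the
code in ip_divert.c.  Every model_* name is hypothetical:
model_locks_held stands in for the INP/INFO locks, a malloc/memcpy copy
stands in for duplicating the options mbuf, and model_ip_output() stands in
for ip_output().  Divert sockets are themselves SOCK_RAW sockets (opened
with IPPROTO_DIVERT), so a type check alone would not exclude them.  That
is why the gate tests for divert separately.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>   /* SOCK_RAW, SOCK_DGRAM, SOCK_STREAM */

/* Hypothetical stand-in for socket/pcb state; not a kernel structure. */
struct model_pcb {
    int so_type;                    /* SOCK_RAW, SOCK_DGRAM, ... */
    bool is_divert;                 /* true for a divert socket */
    const unsigned char *options;   /* options set on the socket, usually NULL */
    size_t options_len;
};

/* Models the INP/INFO locks: true while "held". */
static bool model_locks_held;

/* Multicast option gate: refuse divert sockets and anything that is
 * not SOCK_RAW or SOCK_DGRAM. */
static int model_mcast_option_allowed(const struct model_pcb *pcb)
{
    if (pcb->is_divert ||
        (pcb->so_type != SOCK_RAW && pcb->so_type != SOCK_DGRAM))
        return EOPNOTSUPP;
    return 0;
}

/* Stand-in for ip_output(): it sees only a private copy of the options,
 * never the pcb, and must be entered with the locks dropped. */
static int model_ip_output(const unsigned char *opts, size_t len)
{
    assert(!model_locks_held);
    assert((opts == NULL) == (len == 0));
    return 0;
}

static int model_divert_output(const struct model_pcb *pcb)
{
    unsigned char *opts_copy = NULL;
    size_t len = 0;
    int error;

    model_locks_held = true;                 /* take the INP/INFO locks */
    if (pcb->options != NULL && pcb->options_len > 0) {
        /* Uncommon case: copy the options so they stay valid
         * after the locks are released. */
        opts_copy = malloc(pcb->options_len);
        if (opts_copy == NULL) {
            model_locks_held = false;
            return ENOBUFS;
        }
        memcpy(opts_copy, pcb->options, pcb->options_len);
        len = pcb->options_len;
    }
    model_locks_held = false;                /* drop the locks before output */

    error = model_ip_output(opts_copy, len);
    free(opts_copy);
    return error;
}
```

The assertions in model_ip_output() encode the invariant the change relies
on.  Output is entered with the locks dropped and works only on its own copy
of the options, so nothing it does can depend on the pcb staying consistent.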

By dropping the locks over the call to ip_output() we eliminate all the lock
ordering issues above.  By switching over to netisr on the inbound path,
we can no longer recursively enter the ip_input() code via divert.
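
The following userland model shows why queuing removes the recursion.  Each
"packet" is an integer counting how many more times it will be diverted back
into the input path.  The model_* names are hypothetical.
model_netisr_drain() merely stands in for the netisr thread and does not
mimic the kernel's netisr API.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QLEN 16

/* Fixed-size FIFO standing in for the netisr input queue. */
static int model_queue[MODEL_QLEN];
static size_t model_q_head, model_q_tail;

/* Current and maximum nesting depth of the input routine. */
static int model_depth, model_max_depth;

static void model_enter(void)
{
    model_depth++;
    if (model_depth > model_max_depth)
        model_max_depth = model_depth;
}

static int model_enqueue(int pkt)
{
    if (model_q_tail - model_q_head == MODEL_QLEN)
        return -1;                            /* queue full: drop */
    model_queue[model_q_tail++ % MODEL_QLEN] = pkt;
    return 0;
}

/* Old style: divert re-injects by calling the input routine directly. */
static void model_ip_input_direct(int pkt)
{
    model_enter();
    if (pkt > 0)
        model_ip_input_direct(pkt - 1);       /* recursive entry */
    model_depth--;
}

/* New style: divert re-injects by queuing; the current call returns first. */
static void model_ip_input_queued(int pkt)
{
    assert(model_depth == 0);                 /* never entered recursively */
    model_enter();
    if (pkt > 0)
        (void)model_enqueue(pkt - 1);         /* deferred re-injection */
    model_depth--;
}

/* Stand-in for the netisr thread: drain the queue one packet at a time. */
static void model_netisr_drain(void)
{
    while (model_q_head != model_q_tail)
        model_ip_input_queued(model_queue[model_q_head++ % MODEL_QLEN]);
}
```

With direct dispatch, a packet diverted five times nests six calls deep.
With queuing, the input routine is never re-entered.  The cost is that each
re-injected packet is deferred to a later pass, or dropped if the fixed-size
queue is full.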

I have tested this change by using the following command:

ipfwpcap -r 8000 - | tcpdump -r - -nn -v

This should exercise both the input and the re-injection (outbound) paths,
which closely resemble the workload performed by natd(8).  Additionally, I
have run some OSPF daemons, which rely heavily on raw sockets and multicast.

Approved by:	re@ (kensmith)
MFC after:	1 month
LOR:		163
LOR:		181
LOR:		202
LOR:		203
Discussed with:	julian, andre et al (on freebsd-net)
In collaboration with:	bms [1], rwatson [2]

[1] bms helped out with the multicast decisions
[2] rwatson submitted the original netisr patches and came up with some
    of the original ideas on how to combat this issue.
2007-08-06 22:06:36 +00:00
libalias             2007-04-30 20:26:11 +00:00  o Kill EOLWS while I'm here.
accf_data.c
accf_http.c
icmp6.h              2007-05-17 21:20:24 +00:00  - Disabled responding to NI queries from a global address by default as
icmp_var.h           2007-07-19 22:34:25 +00:00  Attempt to improve feature parity between UDPv4 and UDPv6 by merging
if_atm.c             2005-08-26 15:27:18 +00:00  Add newline to debuging printf.
if_atm.h
if_ether.c           2007-05-10 15:58:48 +00:00  Move universally to ANSI C function declarations, with relatively
if_ether.h
igmp_var.h           2007-06-12 16:24:56 +00:00  Import rewrite of IPv4 socket multicast layer to support source-specific
igmp.c               2006-12-04 00:41:48 +00:00  Improve style(9) conformance of igmp.c.
igmp.h               2007-06-15 18:59:10 +00:00  Stub out imported IGMPv3 definitions which clash with those of
in_cksum.c           2007-05-10 15:58:48 +00:00  Move universally to ANSI C function declarations, with relatively
in_gif.c             2007-05-10 15:58:48 +00:00  Move universally to ANSI C function declarations, with relatively
in_gif.h
in_mcast.c           2007-08-06 22:06:36 +00:00  Over the past couple of years, there have been a number of reports relating
in_pcb.c             2007-07-03 12:13:45 +00:00  Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
in_pcb.h             2007-08-06 14:26:03 +00:00  Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
in_proto.c           2007-07-03 12:13:45 +00:00  Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
in_rmx.c             2007-05-10 15:58:48 +00:00  Move universally to ANSI C function declarations, with relatively
in_systm.h
in_var.h             2007-06-12 16:24:56 +00:00  Import rewrite of IPv4 socket multicast layer to support source-specific
in.c                 2007-06-17 00:31:24 +00:00  Simplification to quiet a gcc4.2 warning. Just by setting match.s_addr
in.h                 2007-06-12 16:24:56 +00:00  Import rewrite of IPv4 socket multicast layer to support source-specific
ip6.h                2005-07-20 10:30:52 +00:00  move RFC3542 related definitions into ip6.h.
ip_carp.c            2007-07-28 07:31:30 +00:00  Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove
ip_carp.h            2006-12-01 18:37:41 +00:00  Make sure that carp_header is 36 bytes long
ip_divert.c          2007-08-06 22:06:36 +00:00  Over the past couple of years, there have been a number of reports relating
ip_divert.h
ip_dummynet.c        2007-08-06 14:26:03 +00:00  Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
ip_dummynet.h        2007-06-17 00:33:34 +00:00  Replace incorrect local OFFSET_OF macro with the correct and generic
ip_ecn.c             2007-05-10 15:58:48 +00:00  Move universally to ANSI C function declarations, with relatively
ip_ecn.h
ip_encap.c           2007-05-10 15:58:48 +00:00  Move universally to ANSI C function declarations, with relatively
ip_encap.h
ip_fastfwd.c         2007-03-18 23:05:20 +00:00  In IPv4 fast forwarding path, send ICMP unreachable messages for
ip_fw2.c             2007-08-06 14:26:03 +00:00  Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
ip_fw_pfil.c         2006-12-29 21:59:17 +00:00  Summer of Code 2005: improve libalias - part 2 of 2
ip_fw.h              2007-05-04 11:15:41 +00:00  Add support for filtering on Routing Header Type 0 and
ip_gre.c             2006-06-02 19:59:33 +00:00  Fix the following bpf(4) race condition which can result in a panic:
ip_gre.h             2006-01-21 10:44:34 +00:00  Fix stack corruptions on amd64.
ip_icmp.c            2007-07-19 22:34:25 +00:00  Attempt to improve feature parity between UDPv4 and UDPv6 by merging
ip_icmp.h            2005-05-04 13:09:19 +00:00  Pass icmp_error() the MTU argument directly instead of
ip_id.c              2007-05-11 11:05:30 +00:00  Minor white space and style cleanups.
ip_input.c           2007-08-05 16:16:15 +00:00  Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL.
ip_ipsec.c           2007-08-05 16:16:15 +00:00  Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL.
ip_ipsec.h           2007-08-05 16:16:15 +00:00  Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL.
ip_mroute.c          2007-08-06 14:26:03 +00:00  Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
ip_mroute.h          2007-02-08 23:05:08 +00:00  Store the cached route in vifp in the normal send_packet() case.
ip_options.c         2007-05-11 10:48:30 +00:00  Normalize style a bit: reduce pseudo-randomness of comment layout and
ip_options.h         2007-05-11 10:48:30 +00:00  Normalize style a bit: reduce pseudo-randomness of comment layout and
ip_output.c          2007-07-03 12:13:45 +00:00  Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
ip_var.h             2007-06-12 16:24:56 +00:00  Import rewrite of IPv4 socket multicast layer to support source-specific
ip.h                 2007-05-11 11:00:48 +00:00  White space and style cleanup.
ipprotosw.h
pim_var.h            2005-08-10 07:10:02 +00:00  Remove public declarations of variables that were forgotten when they were
pim.h
raw_ip.c             2007-07-03 12:13:45 +00:00  Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
sctp_asconf.c        2007-07-24 20:06:02 +00:00  - take out a needless panic under invariants for sctp_output.c
sctp_asconf.h        2007-07-24 20:06:02 +00:00  - take out a needless panic under invariants for sctp_output.c
sctp_auth.c          2007-06-16 00:33:47 +00:00  - Matthew's changes to get inlines out, plus a few of my own
sctp_auth.h          2007-06-09 13:46:57 +00:00  - fix send_failed notification contents
sctp_bsd_addr.c      2007-06-17 23:43:37 +00:00  - The packet log needs to copy all of the buffer not to the end.
sctp_bsd_addr.h      2007-05-29 09:29:03 +00:00  - Fixes so we won't try to start a timer when we
sctp_cc_functions.c  2007-07-17 20:58:26 +00:00  - added pre-checks to the bindx call.
sctp_cc_functions.h  2007-07-14 09:36:28 +00:00  - Modular congestion control, with RFC2581 being the default.
sctp_constants.h     2007-08-06 15:46:46 +00:00  - change number assignments for SHA225-512 (match artisync
sctp_crc32.c         2007-05-08 17:01:12 +00:00  - Copyright change, cisco's silly tool wants it to say:
sctp_crc32.h         2007-05-08 17:01:12 +00:00  - Copyright change, cisco's silly tool wants it to say:
sctp_header.h        2007-06-12 11:21:00 +00:00  - Restructure so bindx functions are not done inline to socket option
sctp_indata.c        2007-07-21 21:41:32 +00:00  - remove duplicate code from sctp_asconf.c
sctp_indata.h        2007-06-22 13:50:56 +00:00  - Fix stream reset so it limits the number of streams that can be listed
sctp_input.c         2007-08-06 15:46:46 +00:00  - change number assignments for SHA225-512 (match artisync
sctp_input.h         2007-07-02 19:22:22 +00:00  - Consolidate the code that free's chunks to actually also
sctp_lock_bsd.h      2007-06-14 22:59:04 +00:00  - Fix so ifn's are properly deleted when the ref count goes to 0.
sctp_os_bsd.h        2007-07-21 21:41:32 +00:00  - remove duplicate code from sctp_asconf.c
sctp_os.h            2007-07-14 09:36:28 +00:00  - Modular congestion control, with RFC2581 being the default.
sctp_output.c        2007-07-24 20:06:02 +00:00  - take out a needless panic under invariants for sctp_output.c
sctp_output.h        2007-06-01 11:19:54 +00:00  - Take out the broken table-id concept. Panda Routers have a M-VRF
sctp_pcb.c           2007-07-24 20:06:02 +00:00  - take out a needless panic under invariants for sctp_output.c
sctp_pcb.h           2007-07-24 20:06:02 +00:00  - take out a needless panic under invariants for sctp_output.c
sctp_peeloff.c       2007-07-17 20:58:26 +00:00  - added pre-checks to the bindx call.
sctp_peeloff.h       2007-05-08 17:01:12 +00:00  - Copyright change, cisco's silly tool wants it to say:
sctp_structs.h       2007-07-24 20:06:02 +00:00  - take out a needless panic under invariants for sctp_output.c
sctp_sysctl.c        2007-07-14 09:36:28 +00:00  - Modular congestion control, with RFC2581 being the default.
sctp_sysctl.h        2007-07-14 09:36:28 +00:00  - Modular congestion control, with RFC2581 being the default.
sctp_timer.c         2007-07-24 20:06:02 +00:00  - take out a needless panic under invariants for sctp_output.c
sctp_timer.h         2007-07-14 09:36:28 +00:00  - Modular congestion control, with RFC2581 being the default.
sctp_uio.h           2007-08-06 15:46:46 +00:00  - change number assignments for SHA225-512 (match artisync
sctp_usrreq.c        2007-08-06 15:46:46 +00:00  - change number assignments for SHA225-512 (match artisync
sctp_var.h           2007-07-17 20:58:26 +00:00  - added pre-checks to the bindx call.
sctp.h               2007-07-17 20:58:26 +00:00  - added pre-checks to the bindx call.
sctputil.c           2007-08-06 15:46:46 +00:00  - change number assignments for SHA225-512 (match artisync
sctputil.h           2007-07-24 20:06:02 +00:00  - take out a needless panic under invariants for sctp_output.c
tcp_debug.c          2007-05-07 14:05:23 +00:00  Rather than selectively zeroing fields in the tcp_debug structure
tcp_debug.h          2007-03-24 22:15:02 +00:00  o Use a define for a buffer size.
tcp_fsm.h            2007-07-30 11:06:42 +00:00  Make tcpstates[] static, and make sure TCPSTATES is defined before
tcp_hostcache.c      2007-06-08 13:43:28 +00:00  Replace a constant with an already defined symbolic name for it.
tcp_input.c          2007-07-30 11:06:42 +00:00  Make tcpstates[] static, and make sure TCPSTATES is defined before
tcp_output.c         2007-07-03 12:13:45 +00:00  Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
tcp_reass.c          2007-05-13 22:16:13 +00:00  Complete the (mechanical) move of the TCP reassembly and timewait
tcp_sack.c           2007-05-11 11:21:43 +00:00  Coalesce two identical UCB licenses into a single license instance with
tcp_seq.h            2006-06-18 14:24:12 +00:00  Remove T/TCP RFC1644 Connection Count comparison macros. They are no longer
tcp_subr.c           2007-07-31 22:11:55 +00:00  Change TCPTV_MIN to be independent of HZ. While it was documented to
tcp_syncache.c       2007-08-06 14:26:03 +00:00  Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
tcp_syncache.h       2007-07-27 00:57:06 +00:00  Export the contents of the syncache to netstat.
tcp_timer.c          2007-06-09 17:49:39 +00:00  Handle a race condition on >2 core machines in tcp_timer() when
tcp_timer.h          2007-07-31 22:11:55 +00:00  Change TCPTV_MIN to be independent of HZ. While it was documented to
tcp_timewait.c       2007-06-04 18:25:08 +00:00  Despite several examples in the kernel, the third argument of
tcp_usrreq.c         2007-07-30 11:06:42 +00:00  Make tcpstates[] static, and make sure TCPSTATES is defined before
tcp_var.h            2007-07-28 12:20:39 +00:00  Provide a sysctl to toggle reporting of TCP debug logging:
tcp.h                2007-05-25 21:28:49 +00:00  The printf %b list in PRINT_TH_FLAGS has to be in octal numbering.
tcpip.h
udp_usrreq.c         2007-07-10 09:30:46 +00:00  Further cleanup of UDPv4:
udp_var.h            2007-07-10 09:30:46 +00:00  Further cleanup of UDPv4:
udp.h                2007-02-20 10:13:11 +00:00  Gratuitous UDP restyling toward style(9) in 7.x.