freebsd-skq/sys/netinet
Luigi Rizzo de240d1013 merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

  1. introduce an IPFW_UH_LOCK to arbitrate requests from
     the upper half of the kernel. Some operations, such as 'ipfw show',
     can be done holding this lock in read mode, whereas insert and
     delete require IPFW_UH_WLOCK.

  2. introduce a mapping structure to keep rules together. This replaces
     the 'next' chain currently used in ipfw rules. At the moment
     the map is a simple array (sorted by rule number and then rule_id),
     so we can find a rule quickly instead of having to scan the list.
     This reduces many expensive lookups from O(N) to O(log N).

  3. when an expensive operation (such as insert or delete) is requested
     by userland, we grab IPFW_UH_WLOCK, build a new copy of the map
     without blocking the bottom half of the kernel, then acquire
     IPFW_WLOCK and quickly update the pointers to the map and related info.
     After dropping IPFW_WLOCK we can continue the cleanup protected
     by IPFW_UH_LOCK alone. So userland still pays O(N), but the kernel
     side is blocked for only O(1).

  4. do not pass pointers to rules through dummynet, netgraph, divert, etc.,
     but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
     We validate the slot index (into the array of #2) against chain_id,
     and if it is still valid do an O(1) dereference; otherwise, we find
     the rule in O(log N) through <rulenum, rule_id>.
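A minimal C sketch of the lookup described in items 2 and 4 follows. It assumes a simplified rule that carries only a rule number and a rule_id. The names struct rule, struct chain, struct rule_ref, find_rule_index() and resolve_ref() are hypothetical and are not the ipfw kernel's own. A slot that is stale or out of range falls back to a binary search on <rulenum, rule_id>.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified rule: only the fields used for lookup. */
struct rule {
    uint16_t rulenum;   /* user-visible rule number */
    uint32_t id;        /* unique id, orders rules sharing a rulenum */
};

/* Hypothetical chain: map[] is sorted by (rulenum, id), and id is
 * assumed to change every time the map is replaced. */
struct chain {
    struct rule **map;
    int n_rules;
    uint32_t id;
};

/* Hypothetical reference carried with a packet instead of a rule pointer. */
struct rule_ref {
    uint32_t slot;      /* index into chain->map when the ref was taken */
    uint32_t chain_id;  /* chain->id at that time */
    uint16_t rulenum;
    uint32_t rule_id;
};

/* Binary search: index of the first rule with (rulenum, id) >= (num, id),
 * i.e. a lower bound, in O(log N). Returns n_rules if there is none. */
static int
find_rule_index(const struct chain *ch, uint16_t num, uint32_t id)
{
    int lo = 0, hi = ch->n_rules;           /* search in [lo, hi) */

    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        const struct rule *r = ch->map[mid];

        if (r->rulenum < num || (r->rulenum == num && r->id < id))
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* O(1) if the map has not been replaced since the ref was taken,
 * otherwise O(log N) through <rulenum, rule_id>. */
static int
resolve_ref(const struct chain *ch, const struct rule_ref *ref)
{
    if (ref->chain_id == ch->id && ref->slot < (uint32_t)ch->n_rules)
        return (int)ref->slot;
    return find_rule_index(ch, ref->rulenum, ref->rule_id);
}
```

If the referenced rule has been deleted in the meantime, the lower-bound search lands on the rule that now follows it. That is one plausible policy for this sketch, not necessarily the one ipfw uses.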

None of the above changes the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t.

Operation costs now are as follows:

  Function                                Old     Now       Planned
-------------------------------------------------------------------
  + skipto X, non cached                  O(N)    O(log N)
  + skipto X, cached                      O(1)    O(1)
XXX dynamic rule lookup                   O(1)    O(log N)  O(1)
  + skipto tablearg                       O(N)    O(1)
  + reinject, non cached                  O(N)    O(log N)
  + reinject, cached                      O(1)    O(1)
  + kernel blocked during setsockopt()    O(N)    O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup, and this will
be fixed in a day or two, without changing the userland/kernel ABI.

Supported by: Valeria Paoli
MFC after:	1 month
2009-12-22 19:01:47 +00:00
ipfw merge code from ipfw3-head to reduce contention on the ipfw lock 2009-12-22 19:01:47 +00:00
libalias Move inet_aton() (specular to inet_ntoa(), already present in libkern) 2009-11-12 00:46:28 +00:00
accf_data.c Rework socket upcalls to close some races with setup/teardown of upcalls. 2009-06-01 21:17:03 +00:00
accf_dns.c Rework socket upcalls to close some races with setup/teardown of upcalls. 2009-06-01 21:17:03 +00:00
accf_http.c Rework socket upcalls to close some races with setup/teardown of upcalls. 2009-06-01 21:17:03 +00:00
icmp6.h Many network stack subsystems use a single global data structure to hold 2009-08-02 19:43:32 +00:00
icmp_var.h Many network stack subsystems use a single global data structure to hold 2009-08-02 19:43:32 +00:00
if_atm.c
if_atm.h
if_ether.c Use the correct option name in the preprocessor command to enable 2009-10-23 18:27:34 +00:00
if_ether.h Add arp_update_event. This replaces route_arp_update_event, which 2009-09-08 21:17:17 +00:00
igmp_var.h Update stats in struct igmpstat using two new macros: 2009-04-12 13:41:13 +00:00
igmp.c Merge the remainder of kern_vimage.c and vimage.h into vnet.c and 2009-08-01 19:26:27 +00:00
igmp.h These are no longer referenced in the tree, so can be safely removed. 2009-06-10 18:12:15 +00:00
in_cksum.c
in_gif.c Many network stack subsystems use a single global data structure to hold 2009-08-02 19:43:32 +00:00
in_gif.h
in_mcast.c Correct a comment. 2009-11-19 13:21:37 +00:00
in_pcb.c Previously local end of point-to-point interface is not reachable 2009-09-14 22:19:47 +00:00
in_pcb.h Add padding to struct inpcb, missed during our padding sweep earlier in 2009-08-02 22:47:08 +00:00
in_proto.c Get SCTP working in combination with VIMAGE. 2009-09-19 14:02:16 +00:00
in_rmx.c Merge the remainder of kern_vimage.c and vimage.h into vnet.c and 2009-08-01 19:26:27 +00:00
in_systm.h Use uint32_t instead of n_long and n_time, and uint16_t instead of n_short. 2009-02-13 15:14:43 +00:00
in_var.h Remove unused VNET_SET() and related macros; only VNET_GET() is 2009-07-16 21:13:04 +00:00
in.c Use the correct option name in the preprocessor command to enable 2009-10-23 18:27:34 +00:00
in.h Add new sockopt names for ipfw and dummynet. 2009-12-02 10:36:41 +00:00
ip6.h Start removing IPv6 Type 0 Routing header code. 2009-03-03 13:12:12 +00:00
ip_carp.c Until this moment carp(4) used a strange logging priority. It used debug 2009-12-02 13:24:21 +00:00
ip_carp.h Update stats in struct carpstats using two new macros: CARPSTATS_ADD() 2009-04-12 14:19:37 +00:00
ip_divert.c Start splitting ip_fw2.c and ip_fw.h into smaller components. 2009-12-15 16:15:14 +00:00
ip_divert.h Introduce a div_destroy() function which takes over per-vnet cleanup tasks 2009-08-24 10:06:02 +00:00
ip_dummynet.h merge code from ipfw3-head to reduce contention on the ipfw lock 2009-12-22 19:01:47 +00:00
ip_ecn.c
ip_ecn.h
ip_encap.c
ip_encap.h
ip_fastfwd.c Virtualize the pfil hooks so that different jails may choose different 2009-10-11 05:59:43 +00:00
ip_fw.h merge code from ipfw3-head to reduce contention on the ipfw lock 2009-12-22 19:01:47 +00:00
ip_gre.c
ip_gre.h
ip_icmp.c Compare pointer to NULL rather than 0. 2009-10-13 20:29:14 +00:00
ip_icmp.h Use uint32_t instead of n_long and n_time, and uint16_t instead of n_short. 2009-02-13 15:14:43 +00:00
ip_id.c
ip_input.c Correct spelling typo in ip_input comment. 2009-10-24 09:18:26 +00:00
ip_ipsec.c Remove ifdefed out part of code, which seems to have originated a decade ago 2009-11-09 19:53:34 +00:00
ip_ipsec.h Remove ifdefed out part of code, which seems to have originated a decade ago 2009-11-09 19:53:34 +00:00
ip_mroute.c In expire_mfc(), add an assert on the multicast forwarding cache mutex. 2009-09-13 01:00:24 +00:00
ip_mroute.h Switch cmd argument to u_long. This matches what if_ethersubr.c does and 2009-06-21 10:29:31 +00:00
ip_options.c Merge the remainder of kern_vimage.c and vimage.h into vnet.c and 2009-08-01 19:26:27 +00:00
ip_options.h Add function ip_checkrouteralert(), which will be used 2009-03-04 02:51:22 +00:00
ip_output.c Remove ifdefed out part of code, which seems to have originated a decade ago 2009-11-09 19:53:34 +00:00
ip_var.h Virtualize the pfil hooks so that different jails may choose different 2009-10-11 05:59:43 +00:00
ip.h Use uint32_t instead of n_long and n_time, and uint16_t instead of n_short. 2009-02-13 15:14:43 +00:00
pim_var.h Update stats in struct pimstat using two new macros: PIMSTAT_ADD() 2009-04-12 14:06:26 +00:00
pim.h
raw_ip.c Throughout the network stack we have a few places of 2009-12-13 13:57:32 +00:00
sctp_asconf.c Do not start the iterator when there are no associations. 2009-11-17 13:11:23 +00:00
sctp_asconf.h
sctp_auth.c Always use LIST_EMPTY instead of sometimes SCTP_LIST_EMPTY, 2009-11-17 20:56:14 +00:00
sctp_auth.h
sctp_bsd_addr.c Fix a race condition where a mutex was destroyed while sleeping on it. 2009-10-11 12:23:56 +00:00
sctp_bsd_addr.h
sctp_cc_functions.c Bugfix: Use formula from section 7.2.3 of RFC 4960. Reported by Martin Becke. 2009-10-27 18:17:07 +00:00
sctp_cc_functions.h
sctp_constants.h Use the default stack size for the iterator thread. 2009-11-27 17:25:19 +00:00
sctp_crc32.c repository sync to multi-OS repo ... spacing change 2009-05-07 16:43:49 +00:00
sctp_crc32.h This commit fixes the issue with alias_sctp.c. No 2009-02-14 11:34:57 +00:00
sctp_header.h Add the add-stream capability. Still needs more 2009-02-20 15:03:54 +00:00
sctp_indata.c This fixes a bug where the value set by SCTP_PARTIAL_DELIVERY_POINT 2009-08-24 11:46:40 +00:00
sctp_indata.h
sctp_input.c Fix a bug where the system panics when a SHUTDOWN is received with an 2009-11-18 12:17:06 +00:00
sctp_input.h
sctp_lock_bsd.h
sctp_os_bsd.h Always use LIST_EMPTY instead of sometimes SCTP_LIST_EMPTY, 2009-11-17 20:56:14 +00:00
sctp_os.h
sctp_output.c Get rid of unused fields addr_over which is never really used, 2009-11-17 23:03:38 +00:00
sctp_output.h Fix the add stream feature of strm-reset to really work: 2009-02-27 20:54:45 +00:00
sctp_pcb.c Always use LIST_EMPTY instead of sometimes SCTP_LIST_EMPTY, 2009-11-17 20:56:14 +00:00
sctp_pcb.h Support for VNET in SCTP (hopefully) 2009-09-17 15:11:12 +00:00
sctp_peeloff.c
sctp_peeloff.h
sctp_structs.h Get rid of unused fields addr_over which is never really used, 2009-11-17 23:03:38 +00:00
sctp_sysctl.c Get SCTP working in combination with VIMAGE. 2009-09-19 14:02:16 +00:00
sctp_sysctl.h Fix a bug where wrong initialization value 2009-07-28 15:07:41 +00:00
sctp_timer.c Support for VNET in SCTP (hopefully) 2009-09-17 15:11:12 +00:00
sctp_timer.h
sctp_uio.h Get rid of unused field. This will also be deleted 2009-07-27 12:09:32 +00:00
sctp_usrreq.c Always use LIST_EMPTY instead of sometimes SCTP_LIST_EMPTY, 2009-11-17 20:56:14 +00:00
sctp_var.h Fixes several PR-SCTP related bugs. 2009-03-14 13:42:13 +00:00
sctp.h Changes to the NR-Sack code so that: 2009-06-17 12:34:56 +00:00
sctputil.c Get rid of unused fields addr_over which is never really used, 2009-11-17 23:03:38 +00:00
sctputil.h * Fix a bug where PR-SCTP settings are ignored when using implicit 2009-08-15 21:10:52 +00:00
tcp_debug.c Remove the "The option TCPDEBUG requires option INET." requirement. 2009-06-10 10:39:41 +00:00
tcp_debug.h Use uint32_t instead of n_long and n_time, and uint16_t instead of n_short. 2009-02-13 15:14:43 +00:00
tcp_fsm.h
tcp_hostcache.c Merge the remainder of kern_vimage.c and vimage.h into vnet.c and 2009-08-01 19:26:27 +00:00
tcp_hostcache.h
tcp_input.c Remove tcp_input lock statistics; these are intended for debugging only 2009-10-06 20:35:41 +00:00
tcp_lro.c
tcp_lro.h
tcp_offload.c Merge the remainder of kern_vimage.c and vimage.h into vnet.c and 2009-08-01 19:26:27 +00:00
tcp_offload.h Replace struct tcpopt with a proxy toeopt struct in the TOE driver interface to 2009-07-13 11:51:02 +00:00
tcp_output.c Several years ago a feature was added to TCP that caused soreceive() to 2009-11-06 16:55:05 +00:00
tcp_reass.c Merge the remainder of kern_vimage.c and vimage.h into vnet.c and 2009-08-01 19:26:27 +00:00
tcp_sack.c Merge the remainder of kern_vimage.c and vimage.h into vnet.c and 2009-08-01 19:26:27 +00:00
tcp_seq.h
tcp_subr.c Add the ability to see TCP timers via netstat -x. This can be a useful 2009-09-16 05:33:15 +00:00
tcp_syncache.c Merge the remainder of kern_vimage.c and vimage.h into vnet.c and 2009-08-01 19:26:27 +00:00
tcp_syncache.h Replace struct tcpopt with a proxy toeopt struct in the TOE driver interface to 2009-07-13 11:51:02 +00:00
tcp_timer.c Add the ability to see TCP timers via netstat -x. This can be a useful 2009-09-16 05:33:15 +00:00
tcp_timer.h Add the ability to see TCP timers via netstat -x. This can be a useful 2009-09-16 05:33:15 +00:00
tcp_timewait.c Fix signed comparison bug when ticks goes negative after 24 days of 2009-08-20 22:53:28 +00:00
tcp_usrreq.c - Rename the __tcpi_(snd|rcv)_mss fields of the tcp_info structure to remove 2009-12-22 15:47:40 +00:00
tcp_var.h Add the ability to see TCP timers via netstat -x. This can be a useful 2009-09-16 05:33:15 +00:00
tcp.h - Rename the __tcpi_(snd|rcv)_mss fields of the tcp_info structure to remove 2009-12-22 15:47:40 +00:00
tcpip.h
toedev.h
udp_usrreq.c Many network stack subsystems use a single global data structure to hold 2009-08-02 19:43:32 +00:00
udp_var.h Many network stack subsystems use a single global data structure to hold 2009-08-02 19:43:32 +00:00
udp.h Added support for NAT-Traversal (RFC 3948) in IPsec stack. 2009-06-12 15:44:35 +00:00