freebsd-nq/sys
Robert Watson 623dce13c6 Update TCP for infrastructural changes to the socket/pcb refcount model,
pru_abort(), pru_detach(), and in_pcbdetach():

- Universally support and enforce the invariant that so_pcb is
  never NULL, converting dozens of unnecessary NULL checks into
  assertions, and eliminating dozens of unnecessary error handling
  cases in protocol code.

- In some cases, eliminate unnecessary pcbinfo locking, as it is no
  longer required to ensure so_pcb != NULL.  For example, the receive
  code no longer requires the pcbinfo lock, and the send code only
  requires it if building a new connection on an otherwise unconnected
  socket triggered via sendto() with an address.  This should
  significantly reduce tcbinfo lock contention in the receive and send
  cases.

- In order to support the invariant that so_pcb != NULL, it is now
  necessary for the TCP code to not discard the tcpcb any time a
  connection is dropped, but instead leave the tcpcb until the socket
  is shut down.  This case is handled by setting INP_DROPPED, to
  substitute for using a NULL so_pcb to indicate that the connection
  has been dropped.  This requires the inpcb lock, but not the pcbinfo
  lock.

- Unlike all other protocols in the tree, TCP may need to retain access
  to the socket after the file descriptor has been closed.  Set
  SS_PROTOREF in tcp_detach() in order to prevent the socket from being
  freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether
  or not it needs to free the socket when the connection finally does
  close.  The typical case where this occurs is if close() is called on
  a TCP socket before all sent data in the send socket buffer has been
  transmitted or acknowledged.  If INP_SOCKREF is found when the
  connection is dropped, we release the inpcb, tcpcb, and socket instead
  of flagging INP_DROPPED.

- Abort and detach protocol switch methods no longer return failures,
  nor attempt to free sockets, as the socket layer does this.

- Annotate the existence of a long-standing race in the TCP timer code,
  in which timers are stopped but not drained when the socket is freed,
  as waiting for drain may lead to deadlocks, or have to occur in a
  context where waiting is not permitted.  This race has been handled
  by testing to see if the tcpcb pointer in the inpcb is NULL (and vice
  versa), which is not normally permitted, but may be true if an inpcb
  and tcpcb have been freed.  Add a counter to test how often this race
  has actually occurred, and a large comment for each instance where
  we compare potentially freed memory with NULL.  This will have to be
  fixed in the near future, but requires us to further address how to
  handle the timer shutdown issue.

- Several TCP calls no longer potentially free the passed inpcb/tcpcb,
  so no longer need to return a pointer to indicate whether the argument
  passed in is still valid.

- Un-macroize debugging and locking setup for various protocol switch
  methods for TCP, as it led to more obscurity, and as locking becomes
  more customized to the methods, offers less benefit.

- Assert copyright on tcp_usrreq.c due to significant modifications that
  have been made as part of this work.
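
The interaction between INP_DROPPED, INP_SOCKREF and SS_PROTOREF described
above can be sketched as a small userland model.  This is an illustration
only, not the kernel code: the model_* names, the flag values, and the
data_pending argument are assumptions made for the example, and all locking
is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Flag values are placeholders for this model, not the kernel's values. */
#define INP_DROPPED 0x1 /* connection dropped; pcb kept until detach */
#define INP_SOCKREF 0x2 /* protocol owns the remaining socket reference */
#define SS_PROTOREF 0x1 /* socket must not be freed by the socket layer */

struct model_socket;

struct model_inpcb {
	int flags;
	struct model_socket *socket;
};

struct model_socket {
	int state;
	struct model_inpcb *pcb; /* invariant: never NULL while the socket exists */
};

int model_frees; /* number of socket/pcb pairs released so far */

/* Allocate a socket together with its pcb, so that so->pcb is never NULL. */
struct model_socket *
model_attach(void)
{
	struct model_socket *so = calloc(1, sizeof(*so));
	struct model_inpcb *inp = calloc(1, sizeof(*inp));

	if (so == NULL || inp == NULL) {
		free(so);
		free(inp);
		return (NULL);
	}
	so->pcb = inp;
	inp->socket = so;
	return (so);
}

static void
model_free(struct model_socket *so)
{
	free(so->pcb);
	free(so);
	model_frees++;
}

/*
 * Connection drop.  If the descriptor is already closed (INP_SOCKREF),
 * release everything; otherwise only flag the pcb as dropped.
 * Returns true if the socket and pcb were released.
 */
bool
model_drop(struct model_socket *so)
{
	assert(so->pcb != NULL); /* an assertion, not an error path */
	if (so->pcb->flags & INP_SOCKREF) {
		model_free(so);
		return (true);
	}
	so->pcb->flags |= INP_DROPPED;
	return (false);
}

/*
 * Detach on close().  If the connection still has work to do (modelled
 * here by data_pending), keep the socket alive on behalf of the protocol.
 * Returns true if the socket and pcb were released.
 */
bool
model_detach(struct model_socket *so, bool data_pending)
{
	assert(so->pcb != NULL);
	if ((so->pcb->flags & INP_DROPPED) || !data_pending) {
		model_free(so);
		return (true);
	}
	so->state |= SS_PROTOREF;
	so->pcb->flags |= INP_SOCKREF;
	return (false);
}
```

In this model, a drop followed by a detach frees the pair at detach time,
while a detach with data still pending defers the free until the later drop.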

These changes significantly modify the memory management and connection
logic of our TCP implementation, and are (as such) High Risk Changes,
and likely to contain serious bugs.  Please report problems to the
current@ mailing list ASAP, ideally with simple test cases, and
optionally, packet traces.

MFC after:	3 months
2006-04-01 16:36:36 +00:00
alpha Use the read_cycle_count() function recently added for cpu_ticks() for 2006-03-28 21:20:12 +00:00
amd64 Add kbdmux(4) to GENERIC on amd64 2006-03-31 23:04:48 +00:00
arm Implement pmap_object_init_pt() the way it is on sparc64/alpha, by doing 2006-03-26 22:03:43 +00:00
boot Remove the USB keyboard hack now that KBDMUX is enabled by default. Allow 2006-03-31 21:36:17 +00:00
bsm Update src/sys/bsm for OpenBSM 1.0 alpha 5 changes: 2006-03-04 16:54:21 +00:00
cam Add reference to PR to TOSHIBA TransMemory quirk entry. 2006-03-18 21:13:14 +00:00
coda
compat Annotate uses of fgetsock() with indications that they should rely 2006-04-01 15:25:01 +00:00
conf Add the MacIO attachment for scc(4). 2006-04-01 04:53:08 +00:00
contrib Loopback pf_norm.c rev. 1.106 from OpenBSD: 2006-03-25 21:15:25 +00:00
crypto
ddb Clean up the way we handle auxiliary commands for a given ddb command 2006-03-07 22:17:06 +00:00
dev Fix some of the previous changes 'better'. 2006-04-01 07:12:18 +00:00
doc
fs - Add a bogus vhold/vdrop around vgone() in devfs_revoke. Without this 2006-03-31 23:37:29 +00:00
gdb add support for copying console messages to a remote gdb 2006-03-23 23:06:14 +00:00
geom Revert previous change, as I fixed MD5(9). 2006-03-30 18:50:00 +00:00
gnu Update a DB_SET to DB_FUNC I missed yesterday. 2006-03-08 15:47:48 +00:00
i4b
i386 Add kbdmux(4) to GENERIC 2006-03-31 19:03:37 +00:00
ia64
isa
isofs/cd9660 When encountering a ISO_SUSP_CFLAG_ROOT element in Rock Ridge 2006-03-13 22:32:33 +00:00
kern Change protocol switch method pru_detach() so that it returns void 2006-04-01 15:42:02 +00:00
libkern
modules Build the scc(4) module with EBus and SBus attachments for sparc64 2006-04-01 04:54:47 +00:00
net In raw and raw-derived socket types, maintain and enforce invariant that 2006-04-01 15:55:44 +00:00
net80211 implement set(IEEE80211_IOC_STA_STATS) for hostapd; for 2006-03-27 05:22:35 +00:00
netatalk Change protocol switch method pru_detach() so that it returns void 2006-04-01 15:42:02 +00:00
netatm Change protocol switch method pru_detach() so that it returns void 2006-04-01 15:42:02 +00:00
netgraph Change protocol switch method pru_detach() so that it returns void 2006-04-01 15:42:02 +00:00
netinet Update TCP for infrastructural changes to the socket/pcb refcount model, 2006-04-01 16:36:36 +00:00
netinet6 Update in_pcb-derived basic socket types following changes to 2006-04-01 16:20:54 +00:00
netipsec Change protocol switch method pru_detach() so that it returns void 2006-04-01 15:42:02 +00:00
netipx Change protocol switch method pru_detach() so that it returns void 2006-04-01 15:42:02 +00:00
netkey In raw and raw-derived socket types, maintain and enforce invariant that 2006-04-01 15:55:44 +00:00
netnatm Change protocol switch method pru_detach() so that it returns void 2006-04-01 15:42:02 +00:00
netncp
netsmb Retire NETSMBCRYPTO as a kernel option and make its functionality 2006-03-05 22:52:17 +00:00
nfs
nfs4client
nfsclient - Busy the filesystem in nfs_statfs to prevent us from creating a new 2006-04-01 01:15:23 +00:00
nfsserver - Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs(). 2006-03-31 03:54:20 +00:00
opencrypto Fix memory leak which occurs when crypto.ko module is unloaded. 2006-03-28 08:33:30 +00:00
pc98 Don't allow userland to set hardware watch points on kernel memory at all. 2006-03-14 16:13:55 +00:00
pccard
pci Add support for RTL8111B chip, that can be found on some mainboards, 2006-03-22 07:33:03 +00:00
posix4
powerpc Add a dummy implementation of bus_space_map(). 2006-03-31 01:39:50 +00:00
rpc
security Don't call vn_finished_write() if vn_start_write() failed. 2006-03-19 20:43:07 +00:00
sparc64 Add scc(4). 2006-03-30 18:40:25 +00:00
sys Change protocol switch method pru_detach() so that it returns void 2006-04-01 15:42:02 +00:00
tools
ufs - Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs(). 2006-03-31 03:54:20 +00:00
vm MFP4: Support for profiling dynamically loaded objects. 2006-03-26 12:20:54 +00:00
Makefile Reimplementation of world/kernel build options. For details, see: 2006-03-17 18:54:44 +00:00