freebsd-skq

Author	SHA1	Message	Date
mohans	0e65e2a5a1	Certain (bad) values of sack blocks can end up corrupting the sack scoreboard. Make the checks in tcp_sack_doack() more robust to prevent this. Submitted by: Raja Mukerji (raja@mukerji.com) Reviewed by: Mohan Srinivasan	2006-04-05 00:11:04 +00:00
glebius	cea41af9a0	Add a tunable, net.inet.tcp.maxtcptw, that allows setting a limit on the tcptw zone independently of the limit on the socket zone.	2006-04-04 14:31:37 +00:00
rwatson	2e3d21db7b	Before dereferencing intotw() when INP_TIMEWAIT, check for inp_ppcb being NULL. We currently do allow this to happen, but may want to remove that possibility in the future. This case can occur when a socket is left open after TCP wraps up, and the timewait state is recycled. This will be cleaned up in the future. Found by: Kazuaki Oda <kaakun at highway dot ne dot jp> MFC after: 3 months	2006-04-04 12:26:07 +00:00
rwatson	56cba4038a	In TCP notify routines, check inpcb for INP_TIMEWAIT and INP_DROPPED. The INP_DROPPED check replaces the current NULL checks; the INP_TIMEWAIT checks appear to have always been required, but not been there, which is/was a bug. This avoids unconditional casting of in_ppcb to a tcpcb, when it may be a twtcb, which may have resulted in obscure ICMP-related panics in earlier releases. MFC after: 3 months	2006-04-03 14:07:50 +00:00
rwatson	d67aff8ec4	Change inp_ppcb from caddr_t to void *, fix/remove associated casts. Consistently use intotw() to cast inp_ppcb pointers to struct tcptw * pointers. Consistently use intotcpcb() to cast inp_ppcb pointers to struct tcpcb * pointers. Don't assign tp to the result of intotcpcb() during variable declaration at the top of functions, as that is before the asserts relating to locking have been performed. Do this later in the function after appropriate assertions have run to allow that operation to be considered safe. MFC after: 3 months	2006-04-03 13:33:55 +00:00
rwatson	4586157b3a	Style tweaks: convert to ANSI from K&R function prototypes. MFC after: 3 months	2006-04-03 12:59:27 +00:00
rwatson	cf774d5382	Update comment on tcp_close() for new world order. MFC after: 3 months	2006-04-03 12:52:13 +00:00
rwatson	206bd5674e	Clarify comment on handling of non-timewait TCP states in tcp_usr_detach(). MFC after: 3 months	2006-04-03 12:43:56 +00:00
rwatson	34473d63e2	Fix up locking surrounding tcp_drop sysctl: in the new world order, we don't free inpcbs until after the socket is closed, so we always need to unlock an inpcb after calling tcp_drop() on it. MFC after: 3 months	2006-04-03 11:57:12 +00:00
rwatson	2ff901e7be	After checking for SO_ISDISCONNECTED in tcp_usr_accept(), return immediately rather than jumping to the normal output handling, which assumes we've pulled out the inpcb, which hasn't happened at this point (and isn't necessary). Return ECONNABORTED instead of EINVAL when the inpcb has entered INP_TIMEWAIT or INP_DROPPED, as this is the documented error value. This may correct the panic seen by Ganbold. MFC after: 1 month Reported by: Ganbold <ganbold at micom dot mng dot net>	2006-04-03 09:52:55 +00:00
rwatson	c8b4c281fa	Correct incorrect assertion in div_bind(): inp must not be NULL here. Reported by: tegge MFC after: 3 months	2006-04-03 09:01:17 +00:00
rwatson	cce79b77fe	During reformulation of tcp_usr_detach(), the call to initiate TCP disconnect for fully connected sockets was dropped, meaning that if the socket was closed while the connection was alive, it would be leaked. Structure tcp_usr_detach() so that there are two clear parts: initiating disconnect, and reclaiming state, and reintroduce the tcp_disconnect() call in the first part. MFC after: 3 months	2006-04-02 16:42:51 +00:00
rwatson	ace109901c	Properly handle an edge case previously not handled correctly: a socket can have a tcp connection that has entered time wait attached to it, in the event that shutdown() is called on the socket and the FINs properly exchange before close(). In this case we don't detach or free the inpcb, just leave the tcptw detached and freed, but we must release the inpcb lock (which we didn't previously). MFC after: 3 months	2006-04-01 23:53:25 +00:00
rwatson	5078a28ae8	Update TCP for infrastructural changes to the socket/pcb refcount model, pru_abort(), pru_detach(), and in_pcbdetach(): - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code. - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, the receive code no longer requires the pcbinfo lock, and the send code only requires it if building a new connection on an otherwise unconnected socket triggered via sendto() with an address. This should significantly reduce tcbinfo lock contention in the receive and send cases. - In order to support the invariant that so_pcb != NULL, it is now necessary for the TCP code to not discard the tcpcb any time a connection is dropped, but instead leave the tcpcb until the socket is shutdown. This case is handled by setting INP_DROPPED, to substitute for using a NULL so_pcb to indicate that the connection has been dropped. This requires the inpcb lock, but not the pcbinfo lock. - Unlike all other protocols in the tree, TCP may need to retain access to the socket after the file descriptor has been closed. Set SS_PROTOREF in tcp_detach() in order to prevent the socket from being freed, and add a flag, INP_SOCKREF, so that the TCP code knows whether or not it needs to free the socket when the connection finally does close. The typical case where this occurs is if close() is called on a TCP socket before all sent data in the send socket buffer has been transmitted or acknowledged. If INP_SOCKREF is found when the connection is dropped, we release the inpcb, tcpcb, and socket instead of flagging INP_DROPPED. - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this. 
- Annotate the existence of a long-standing race in the TCP timer code, in which timers are stopped but not drained when the socket is freed, as waiting for drain may lead to deadlocks, or have to occur in a context where waiting is not permitted. This race has been handled by testing to see if the tcpcb pointer in the inpcb is NULL (and vice versa), which is not normally permitted, but may be true if an inpcb and tcpcb have been freed. Add a counter to test how often this race has actually occurred, and a large comment for each instance where we compare potentially freed memory with NULL. This will have to be fixed in the near future, but requires us to further address how to handle the timer shutdown issue. - Several TCP calls no longer potentially free the passed inpcb/tcpcb, so no longer need to return a pointer to indicate whether the argument passed in is still valid. - Un-macroize debugging and locking setup for various protocol switch methods for TCP, as it led to more obscurity, and as locking becomes more customized to the methods, offers less benefit. - Assert copyright on tcp_usrreq.c due to significant modifications that have been made as part of this work. These changes significantly modify the memory management and connection logic of our TCP implementation, and are (as such) High Risk Changes, and likely to contain serious bugs. Please report problems to the current@ mailing list ASAP, ideally with simple test cases, and optionally, packet traces. MFC after: 3 months	2006-04-01 16:36:36 +00:00
rwatson	a7c2bca553	Update in_pcb-derived basic socket types following changes to pru_abort(), pru_detach(), and in_pcbdetach(): - Universally support and enforce the invariant that so_pcb is never NULL, converting dozens of unnecessary NULL checks into assertions, and eliminating dozens of unnecessary error handling cases in protocol code. - In some cases, eliminate unnecessary pcbinfo locking, as it is no longer required to ensure so_pcb != NULL. For example, in protocol shutdown methods, and in raw IP send. - Abort and detach protocol switch methods no longer return failures, nor attempt to free sockets, as the socket layer does this. - Invoke in_pcbfree() after in_pcbdetach() in order to free the detached in_pcb structure for a socket. MFC after: 3 months	2006-04-01 16:20:54 +00:00
rwatson	71cc03392b	Break out in_pcbdetach() into two functions: - in_pcbdetach(), which removes the link between an inpcb and its socket. - in_pcbfree(), which frees a detached pcb. Unlike the previous in_pcbdetach(), neither of these functions will attempt to conditionally free the socket, as they are responsible only for managing in_pcb memory. Mirror these changes into in6_pcbdetach() by breaking it into in6_pcbdetach() and in6_pcbfree(). While here, eliminate undesired checks for NULL inpcb pointers in sockets, as we will now have as an invariant that sockets will always have valid so_pcb pointers. MFC after: 3 months	2006-04-01 16:04:42 +00:00
rwatson	5479e5d692	Change protocol switch method pru_detach() so that it returns void rather than an error. Detaches do not "fail", they either occur or the protocol flags SS_PROTOREF to take ownership of the socket. soclose() no longer looks at so_pcb to see if it's NULL, relying entirely on the protocol to decide whether it's time to free the socket or not using SS_PROTOREF. so_pcb is now entirely owned and managed by the protocol code. Likewise, no longer test so_pcb in other socket functions, such as soreceive(), which have no business digging into protocol internals. Protocol detach routines no longer try to free the socket on detach, this is performed in the socket code if the protocol permits it. In rts_detach(), no longer test for rp != NULL in detach, and likewise in other protocols that don't permit a NULL so_pcb, reduce the incidence of testing for it during detach. netinet and netinet6 are not fully updated to this change, which will be in an upcoming commit. In their current state they may leak memory or panic. MFC after: 3 months	2006-04-01 15:42:02 +00:00
rwatson	8622e776f9	Change protocol switch pru_abort() API so that it returns void rather than an int, as an error here is not meaningful. Modify soabort() to unconditionally free the socket on the return of pru_abort(), and modify most protocols to no longer conditionally free the socket, since the caller will do this. This commit likely leaves parts of netinet and netinet6 in a situation where they may panic or leak memory, as they are not fully updated by this commit. This will be corrected shortly in followup commits to these components. MFC after: 3 months	2006-04-01 15:15:05 +00:00
rwatson	a3688cc84e	Define two new inpcb flags in the inp_vflag field, which for whatever reason, seems to be where new flags are getting defined: INP_DROPPED - The protocol has terminated this connection and the socket is not reusable: when the socket code enters the protocol, an error is immediately returned. This will substitute for NULLing the so_pcb socket field, helping to implement the invariant that all valid sockets have valid pcb's in TCP. INP_SOCKREF - The protocol has become the owner of the socket reference, and will need to free it when freeing the pcb, which will be used when a TCP socket is closed but still has queued data. MFC after: 1 month	2006-03-26 11:30:31 +00:00
rwatson	864627f033	Minor style tweak: tab after #define, not space. MFC after: 1 month	2006-03-26 11:26:12 +00:00
rwatson	46492ab660	Explicitly assert socket pointer is non-NULL in tcp_input() so as to provide better debugging information. Prefer explicit comparison to NULL for tcpcb pointers rather than treating them as booleans. MFC after: 1 month	2006-03-26 01:33:41 +00:00
glebius	aca7253de4	o Introduce carp_multicast_cleanup(), which removes and frees multicast addresses from carp interface. [1] o Rewrite carpdetach(), so that it does the following things: [1] - Stops callouts. - Decrements carp_suppress_preempt, if needed. - Downs interface and sets CARP state to INIT. - Calls carp_multicast_cleanup(). - Detaches softc from carp_if and if we are the last frees the carp_if. o Use new carpdetach() in carp_clone_destroy(). o In carp_ifdetach() acquire the carp_if lock and cleanup all interfaces hanging on carp_if. [1] o Make carp_ifdetach() static and use EVENT(9) to call it from if_detach(). [2] o In carp_setrun() exit if the softc doesn't have a valid pointer to parent. [1] Obtained from: OpenBSD [1] Submitted by: Dan Lukes <dan obluda.cz> [2] PR: kern/82908 [2]	2006-03-21 14:29:48 +00:00
keramida	5b2b6f7af7	Add descriptions for the sysctls: net.inet.icmp.drop_redirect net.inet.icmp.log_redirect net.inet.icmp.icmplim net.inet.icmp.icmplim_output Approved & text by: andre	2006-03-20 21:44:12 +00:00
dwmalone	2dd230f5c3	Make net.inet.ip.portrange.reservedhigh and net.inet.ip.portrange.reservedlow apply to IPv6 as well as IPv4. We could have made new sysctls for IPv6, but that potentially makes things complicated for mapped addresses. This seems like the least confusing option and least likely to cause obscure problems in the future. This change makes the mac_portacl module useful with IPv6 apps. Reviewed by: ume MFC after: 1 month	2006-03-19 11:48:48 +00:00
rwatson	053507bd40	Change soabort() from returning int to returning void, since all consumers ignore the return value, soabort() is required to succeed, and protocols produce errors here to report multiple freeing of the pcb, which we hope to eliminate.	2006-03-16 07:03:14 +00:00
thompsa	4c1aecad94	Further refine the bridge hack in the arp code. Only do the special arp handling for interfaces which are actually in the bridge group, ignore all others. MFC after: 3 days	2006-03-07 21:40:44 +00:00
glebius	3c6ea150e2	- Do not leak read lock in IP_FW_TABLE_GETSIZE case of ipfw_ctl(). - Acquire read (not write) lock in case of IP_FW_TABLE_LIST. In collaboration with: ru	2006-03-03 12:10:59 +00:00
andre	8bb537fa79	Rework TCP window scaling (RFC1323) to properly scale the send window right from the beginning and partly clean up the differences in handling between SYN_SENT and SYN_RCVD (syncache). Further changes to this code to come. This is a first incremental step to a general overhaul and streamlining of the TCP code. PR: kern/15095 PR: kern/92690 (partly) Reviewed by: qingli (and tested with ANVL) Sponsored by: TCP/IP Optimization Fundraise 2005	2006-02-28 23:05:59 +00:00
qingli	2460d00021	This patch fixes the problem where the current TCP code can not handle simultaneous open. Both the bug and the patch were verified using the ANVL test suite. PR: kern/74935 Submitted by: qingli (before I became committer) Reviewed by: andre MFC after: 5 days	2006-02-23 21:14:34 +00:00
ume	b365fc827b	Obey opt_inet6.h in kernel build directory. Reported by: Peter Losher <plosher-keyword-freebsd.a36e57__at__plosh.net> MFC after: 3 days	2006-02-20 12:30:32 +00:00
andre	a8296c7972	Remove unneeded includes and provide more accurate description to others. Submitted by: garys PR: kern/86437	2006-02-18 17:05:00 +00:00
andre	9e25820325	Add missing TH_PUSH to the TH_FLAGS enumeration. Submitted by: Andre Albsmeier <Andre.Albsmeier-at-siemens.com> PR: kern/85203	2006-02-18 16:50:08 +00:00
andre	e83c574f87	Have TCP Inflight disable itself if the RTT is below a certain threshold. Inflight doesn't make sense on a LAN as it has trouble figuring out the maximal bandwidth because of the coarse tick granularity. The sysctl net.inet.tcp.inflight.rttthresh specifies the threshold in milliseconds below which inflight will disengage. It defaults to 10ms. Tested by: Joao Barros <joao.barros-at-gmail.com>, Rich Murphey <rich-at-whiteoaklabs.com> Sponsored by: TCP/IP Optimization Fundraise 2005	2006-02-16 19:38:07 +00:00
andre	5c75fa42eb	In in_pcbconnect_setup() reduce code duplication and use ip_rtaddr() to find the outgoing interface for this connection. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 2 weeks	2006-02-16 15:45:28 +00:00
andre	201a4f9400	Make sysctl_msec_to_ticks(SYSCTL_HANDLER_ARGS) generally available instead of being private to tcp_timer.c. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-02-16 15:40:36 +00:00
ru	a77fb4360b	When sending a packet from dummynet, indicate that we're forwarding it so that ip_id etc. don't get overwritten. This fixes forwarding of fragmented IP packets through a dummynet pipe -- fragments came out with modified and different(!) ip_id's, making it impossible to reassemble a datagram at the receiver side. Submitted by: Alexander Karptsov (reworked by me) MFC after: 3 days	2006-02-14 06:36:39 +00:00
qingli	83f9969904	Set the M_ZERO flag when calling uma_zalloc() to allocate a syncache entry. Reviewed by: andre, glebius MFC after: 3 days	2006-02-09 21:29:02 +00:00
qingli	5df75b3ab3	Redo the previous fix by setting the UMA_ZONE_ZINIT bit in the syncache zone, eliminating the need to call bzero() after each syncache entry allocation. Suggested by: glebius Reviewed by: andre MFC after: 3 days	2006-02-08 23:32:57 +00:00
qingli	403a98127d	Fix a crash caused by the memory of the newly allocated syncache entry in syncache_lookup() not being cleared, which may leave an arbitrary, bogus rtentry pointer that later gets free'd. Reviewed by: andre MFC after: 3 days	2006-02-07 19:59:46 +00:00
oleg	ce4f4426d2	Fix five-year-old bug in ip_reass(): if we are using 'full' (i.e. including pseudo header) hardware rx checksum offloading, ip_reass() fails to calculate the TCP/UDP checksum for the reassembled packet correctly. This should also fix the recent 'NFS over UDP over bge' issue exposed by if_bge.c rev. 1.123 Reviewed by: sam (earlier version), bde Approved by: glebius (mentor) MFC after: 2 weeks	2006-02-07 11:48:10 +00:00
ume	4185f9e81f	Never select the PCB that has INP_IPV6 flag and is bound to :: if we have another PCB which is bound to 0.0.0.0. If a PCB has the INP_IPV6 flag, then we set its cost higher than IPv4 only PCBs. Submitted by: Keiichi SHIMA <keiichi__at__iijlab.net> Obtained from: KAME MFC after: 1 week	2006-02-04 07:59:17 +00:00
glebius	2a85d3311b	Dropping the lock in the transmit_event() is not safe, because we store some pipe pointers on stack. If user reconfigures dummynet in the interlock gap, we can work with freed pipes after relock. To fix this, we decided not to send packets in transmit_event(), but fill a queue. At the end of dummynet() and dummynet_io(), after the lock is dropped, if there is something in the queue we run dummynet_send() to process the queue. In collaboration with: ru	2006-02-03 11:38:19 +00:00
glebius	34c591bb1c	Axe unused function.	2006-02-03 10:42:28 +00:00
csjp	c8f0963c9e	Use PFIL_HOOKED macros in if_bridge and pass the right argument to rw_assert. This un-breaks the build. Submitted by: Kostik Belousov Pointy hat to: csjp	2006-02-02 16:41:20 +00:00
csjp	31292a14b6	Somewhat re-factor the read/write locking mechanism associated with the packet filtering mechanisms to use the new rwlock(9) locking API: - Drop the variables stored in the pfil_head structure which were specific to conditions and the home rolled read/write locking mechanism. - Drop some includes which were used for condition variables - Drop the inline functions, and convert them to macros. Also, move these macros into pfil.h - Move pfil list locking macros into pfil.h as well - Rename ph_busy_count to ph_nhooks. This variable will represent the number of IN/OUT hooks registered with the pfil head structure - Define PFIL_HOOKED macro which evaluates to true if there are any hooks to be run by pfil_run_hooks - In the IP/IP6 stacks, change the ph_busy_count comparison to use the new PFIL_HOOKED macro. - Drop optimization in pfil_run_hooks which checks to see if there are any hooks to be run, and returns if not. This check is already performed by the IP stacks when they call: if (!PFIL_HOOKED(ph)) goto skip_hooks; - Drop in assertion which makes sure that the number of hooks never drops below 0 for good measure. This in theory should never happen, and if it does then there are problems somewhere - Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep - Drop variables which support home rolled read/write locking mechanism from the IPFW firewall chain structure. - Swap out the read/write firewall chain lock internal to use the rwlock(9) API instead of our home rolled version - Convert the inlined functions to macros Reviewed by: mlaier, andre, glebius Thanks to: jhb for the new locking API	2006-02-02 03:13:16 +00:00
andre	2013a67745	Move the IPSEC related code blocks to their own file to unclutter and significantly improve the readability of ip_input() and ip_output() again. The resulting IPSEC hooks in ip_input() and ip_output() may be used later on for making IPSEC loadable. This move is mostly mechanical and should preserve current IPSEC behaviour as-is. Nothing shall prevent improvements in the way IPSEC interacts with the IPv4 stack. Discussed with: bz, gnn, rwatson; (earlier version)	2006-02-01 13:55:03 +00:00
ru	bb523fe1d2	Brain-o (use standard int types now).	2006-02-01 06:15:37 +00:00
ru	3dd767ffd0	Fix multicast routing on 64-bit platforms. Tested on: amd64 MFC after: 3 days	2006-01-31 22:39:35 +00:00
thompsa	f4270dbad6	Now that the bridge also processes Ethernet frames as itself, two arp replies will be sent if there is an address on the bridge. Exclude the bridge from the special arp handling. This has been tested with all combinations of addresses on the bridge and members. Pointed out by: Michal Mertl	2006-01-31 21:29:41 +00:00
glebius	aecf4a6244	Add some initial locking to gif(4). It doesn't cover the whole driver; however, IPv4-in-IPv4 tunnels are now stable on SMP. Details: - Add per-softc mutex. - Hold the mutex on output. The main problem was the rtentry, placed in softc. It could be freed by ip_output(). Meanwhile, another thread being in in_gif_output() can read and write this rtentry. Reported by: many Tested by: Alexander Shiryaev <aixp mail.ru>	2006-01-30 08:39:09 +00:00

2492 Commits