freebsd-skq

Author	SHA1	Message	Date
Paul Saab	7d5ed1ceea	Fixes a bug in SACK causing us to send data beyond the receive window. Found by: Pawel Worach and Daniel Hartmeier Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com	2004-11-29 18:47:27 +00:00
Robert Watson	2be3bf2244	Assert the inpcb lock in tcp_xmit_timer() as it performs read-modify-write of various time/rtt-related fields in the tcpcb.	2004-11-28 11:06:22 +00:00
Robert Watson	18ad5842c5	Expand coverage of the receive socket buffer lock when handling urgent pointer updates: test available space while holding the socket buffer mutex, and continue to hold until the pointer update has been performed. MFC after: 2 weeks	2004-11-28 11:01:31 +00:00
Robert Watson	c8443a1dc0	Do export the advertised receive window via the tcpi_rcv_space field of struct tcp_info.	2004-11-27 20:20:11 +00:00
Robert Watson	b8af5dfa81	Implement parts of the TCP_INFO socket option as found in Linux 2.6. This socket option allows processes to query a TCP socket for some low level transmission details, such as the current send, bandwidth, and congestion windows. Linux provides a 'struct tcpinfo' structure containing various variables, rather than separate socket options; this makes the API somewhat fragile as it makes it difficult to add new entries of interest as requirements and implementation evolve. As such, I've included a large pad at the end of the structure. Right now, relatively few of the Linux API fields are filled in, and some contain no logical equivalent on FreeBSD. I've included __'d entries in the structure to make it easier to figure out what is and isn't omitted. This API/ABI should be considered unstable for the time being.	2004-11-26 18:58:46 +00:00
Mike Silbersack	6a220ed80a	Fix a problem where our TCP stack would ignore RST packets if the receive window was 0 bytes in size. This may have been the cause of unsolved "connection not closing" reports over the years. Thanks to Michiel Boland for providing the fix and providing a concise test program for the problem. Submitted by: Michiel Boland MFC after: 2 weeks	2004-11-25 19:04:20 +00:00
Robert Watson	de30ea131f	In tcp_reass(), assert the inpcb lock on the passed tcpcb, since the contents of the tcpcb are read and modified in volume. In tcp_input(), replace th comparison with 0 with a comparison with NULL. At the 'findpcb', 'dropafterack', and 'dropwithreset' labels in tcp_input(), assert 'headlocked'. Try to improve consistency between various assertions regarding headlocked to be more informative. MFC after: 2 weeks	2004-11-23 23:41:20 +00:00
Robert Watson	cce83ffb5a	tcp_timewait() performs multiple non-atomic reads on the tcptw structure, so assert the inpcb lock associated with the tcptw. Also assert the tcbinfo lock, as tcp_timewait() may call tcp_twclose() or tcp_timer_2msl_reset(), which require it. Since tcp_timewait() is already called with that lock from tcp_input(), this doesn't change current locking, merely documents reasons for it. In tcp_twstart(), assert the tcbinfo lock, as tcp_timer_2msl_reset() is called, which requires that lock. In tcp_twclose(), assert the tcbinfo lock, as tcp_timer_2msl_stop() is called, which requires that lock. Document the locking strategy for the time wait queues in tcp_timer.c, which consists of protecting the time wait queues in the same manner as the tcbinfo structure (using the tcbinfo lock). In tcp_timer_2msl_reset(), assert the tcbinfo lock, as the time wait queues are modified. In tcp_timer_2msl_stop(), assert the tcbinfo lock, as the time wait queues may be modified. In tcp_timer_2msl_tw(), assert the tcbinfo lock, as the time wait queues may be modified. MFC after: 2 weeks	2004-11-23 17:21:30 +00:00
Robert Watson	b42ff86e73	De-spl tcp_slowtimo; tcp_maxidle assignment is subject to possible but unlikely races that could be corrected by having tcp_keepcnt and tcp_keepintvl modifications go through handler functions via sysctl, but probably is not worth doing. Updates to multiple sysctls within evaluation of a single addition are unlikely. Annotate that tcp_canceltimers() is currently unused. De-spl tcp_timer_delack(). De-spl tcp_timer_2msl(). MFC after: 2 weeks	2004-11-23 16:45:07 +00:00
Robert Watson	7258e91f0f	Assert the inpcb lock in tcp_twstart(), which both performs read-modify-write on the tcpcb and calls into tcp_close() and tcp_twrespond(). Annotate that tcp_twrecycleable() requires the inpcb lock because it does a series of non-atomic reads of the tcpcb, but is currently called without the inpcb lock by the caller. This is a bug. Assert the inpcb lock in tcp_twclose() as it performs a read-modify-write of the timewait structure/inpcb, and calls in_pcbdetach() which requires the lock. Assert the inpcb lock in tcp_twrespond(), as it performs multiple non-atomic reads of the tcptw and inpcb structures, as well as calling mac_create_mbuf_from_inpcb(), tcpip_fillheaders(), which require the inpcb lock. MFC after: 2 weeks	2004-11-23 16:23:13 +00:00
Robert Watson	8263bab34d	Assert inpcb lock in tcp_quench(), tcp_drop_syn_sent(), tcp_mtudisc(), and tcp_drop(), due to read-modify-write of TCP state variables. MFC after: 2 weeks	2004-11-23 16:06:15 +00:00
Robert Watson	8438db0f59	Assert the tcbinfo write lock in tcp_new_isn(), as the tcbinfo lock protects access to the ISN state variables. Acquire the tcbinfo write lock in tcp_isn_tick() to synchronize timer-driven isn bumping. Staticize internal ISN variables since they're not used outside of tcp_subr.c. MFC after: 2 weeks	2004-11-23 15:59:43 +00:00
Robert Watson	ca127a3e80	Remove "Unlocked read" annotations associated with previously unlocked use of socket buffer fields in the TCP input code. These references are now protected by use of the receive socket buffer lock. MFC after: 1 week	2004-11-22 13:16:27 +00:00
Robert Watson	98734750b4	s/send/sent/ in comment describing TCPS_SYN_RECEIVED.	2004-11-21 14:38:04 +00:00
Gleb Smirnoff	c1384b5ae2	- Since divert protocol is not connection oriented, remove SS_ISCONNECTED flag from divert sockets. - Remove div_disconnect() method, since it shouldn't be called now. - Remove div_abort() method. It was never called directly, since protocol doesn't have listen queue. It was called only from div_disconnect(), which is removed now. Reviewed by: rwatson, maxim Approved by: julian (mentor) MT5 after: 1 week MT4 after: 1 month	2004-11-18 13:49:18 +00:00
Max Laier	9a6a6eeba2	Fix host route addition for more than one address to a loopback interface after allowing more than one address with the same prefix. Reported by: Vladimir Grebenschikov <vova NO fbsd SPAM ru> Submitted by: ru (also NetBSD rev. 1.83) Pointyhat to: mlaier	2004-11-17 23:14:03 +00:00
Max Laier	81d96ce8a4	Merge copyright notices. Requested by: njl	2004-11-13 17:05:40 +00:00
Gleb Smirnoff	ea0bd57615	Fix ng_ksocket(4) operation as a divert socket, which is pretty useful and has been broken twice: - in the beginning of div_output() replace KASSERT with assignment, as it was in rev. 1.83. [1] [to be MFCed] - refactor changes introduced in rev. 1.100: do not prepend a new tag unconditionally. Before doing this check whether we have one. [2] A small note for all hacking in this area: when divert socket is not a real userland, but ng_ksocket(4), we receive _the same_ mbufs, that we transmitted to socket. These mbufs have rcvif, the tags we've put on them. And we should treat them correctly. Discussed with: mlaier [1] Silence from: green [2] Reviewed by: maxim Approved by: julian (mentor) MFC after: 1 week	2004-11-12 22:17:42 +00:00
Max Laier	48321abefe	Change the way we automatically add prefix routes when adding a new address. This makes it possible to have more than one address with the same prefix. The first address added is used for the route. On deletion of an address with IFA_ROUTE set, we try to find a "fallback" address and hand over the route if possible. I plan to MFC this in 4 weeks, hence I keep the - now obsolete - argument to in_ifscrub as it must be considered KAPI as it is not static in in.c. I will clean this after the MFC. Discussed on: arch, net Tested by: many testers of the CARP patches Nits from: ru, Andrea Campi <andrea+freebsd_arch webcom it> Obtained from: WIDE via OpenBSD MFC after: 1 month	2004-11-12 20:53:51 +00:00
Poul-Henning Kamp	e21e4c19c9	Add missing '=' Spotted by: obrien	2004-11-11 19:02:01 +00:00
Andre Oppermann	5e7b233055	Fix a double-free in the 'hlen > m->m_len' sanity check. Bug report by: <james@towardex.com> MFC after: 2 weeks	2004-11-09 09:40:32 +00:00
SUZUKI Shinsuke	3d54848fc2	Support TCP-MD5 (IPv4) in KAME-IPSEC, too. MFC after: 3 weeks	2004-11-08 18:49:51 +00:00
Poul-Henning Kamp	756d52a195	Initialize struct pr_userreqs in new/sparse style and fill in common default elements in net_init_domain(). This makes it possible to grep these structures and see any bogosities.	2004-11-08 14:44:54 +00:00
Robert Watson	d6915262af	Do some re-sorting of TCP pcbinfo locking and assertions: make sure to retain the pcbinfo lock until we're done using a pcb in the in-bound path, as the pcbinfo lock acts as a pseudo-reference to prevent the pcb from potentially being recycled. Clean up assertions and make sure to assert that the pcbinfo is locked at the head of code subsections where it is needed. Free the mbuf at the end of tcp_input after releasing any held locks to reduce the time the locks are held. MFC after: 3 weeks	2004-11-07 19:19:35 +00:00
Andre Oppermann	e9a4cd2426	Fix a double-free in the 'm->m_len < sizeof (struct ip)' sanity check. Bug report by: <james@towardex.com> MFC after: 2 weeks	2004-11-06 10:47:36 +00:00
Poul-Henning Kamp	c83c1318f5	Hide udp_in6 behind #ifdef INET6	2004-11-04 07:14:03 +00:00
Bruce M Simpson	38f061057b	When performing IP fast forwarding, immediately drop traffic which is destined for a blackhole route. This also means that blackhole routes do not need to be bound to lo(4) or disc(4) interfaces for the net.inet.ip.fastforwarding=1 case. Submitted by: james at towardex dot com Sponsored by: eXtensible Open Router Project <URL:http://www.xorp.org/> MFC after: 3 weeks	2004-11-04 02:14:38 +00:00
Robert Watson	d4b509bd7f	Until this change, the UDP input code used global variables udp_in, udp_in6, and udp_ip6 to pass socket address state between udp_input(), udp_append(), and soappendaddr_locked(). While fine in the default configuration, when running with multiple netisrs or direct ithread dispatch, this can result in races wherein user processes using recvmsg() get back the wrong source IP/port. To correct this and related races: - Eliminate udp_ip6, which is believed to be generated but then never used. Eliminate ip_2_ip6_hdr() as it is now unneeded. - Eliminate setting, testing, and existence of 'init' status fields for the IPv6 structures. While with multiple UDP delivery this could lead to amortization of IPv4 -> IPv6 conversion when delivering an IPv4 UDP packet to an IPv6 socket, it added substantial complexity and side effects. - Move global structures into the stack, declaring udp_in in udp_input(), and udp_in6 in udp_append() to be used if a conversion is required. Pass &udp_in into udp_append(). - Re-annotate comments to reflect updates. With this change, UDP appears to operate correctly in the presence of substantial inbound processing parallelism. This solution avoids introducing additional synchronization, but does increase the potential stack depth. Discovered by: kris (Bug Magnet) MFC after: 3 weeks	2004-11-04 01:25:23 +00:00
Andre Oppermann	c94c54e4df	Remove RFC1644 T/TCP support from the TCP side of the network stack. A complete rationale and discussion is given in this message and the resulting discussion: http://docs.freebsd.org/cgi/mid.cgi?4177C8AD.6060706 Note that this commit removes only the functional part of T/TCP from the tcp_* related functions in the kernel. Other features introduced with RFC1644 are left intact (socket layer changes, sendmsg(2) on connection oriented protocols) and are meant to be reused by a simpler and less intrusive reimplementation of the previous T/TCP functionality. Discussed on: -arch	2004-11-02 22:22:22 +00:00
Robert Watson	ab5c14d828	Correct a bug in TCP SACK that could result in wedging of the TCP stack under high load: only set function state to loop and continue sending if there is no data left to send. RELENG_5_3 candidate. Feet provided: Peter Losher <Peter underscore Losher at isc dot org> Diagnosed by: Daniel Hartmeier <daniel at benzedrine dot cx> Submitted by: mohan <mohans at yahoo-inc dot com>	2004-10-30 12:02:50 +00:00
Robert Watson	c427483381	Add a matching tunable for net.inet.tcp.sack.enable sysctl.	2004-10-26 08:59:09 +00:00
Bruce M Simpson	d6fa5d2806	Check that rt_mask(rt) is non-NULL before dereferencing it, in the RTM_ADD case, thus avoiding a panic. Submitted by: Iasen Kostov	2004-10-26 03:31:58 +00:00
Andre Oppermann	84bb6a2e75	IPDIVERT is a module now; tell the other parts of the kernel about it. IPDIVERT depends on IPFIREWALL being loaded or compiled into the kernel.	2004-10-25 20:02:34 +00:00
Ruslan Ermilov	a35d88931c	For variables that are only checked with defined(), don't provide any fake value.	2004-10-24 15:33:08 +00:00
Andre Oppermann	cd109b0d82	Shave 40 unused bytes from struct tcpcb.	2004-10-22 19:55:04 +00:00
Andre Oppermann	21dcc96f4a	When printing the initialization string and IPDIVERT is not compiled into the kernel refer to it as "loadable" instead of "disabled".	2004-10-22 19:18:06 +00:00
Andre Oppermann	24fc79b0a4	Refuse to unload the ipdivert module unless the 'force' flag is given to kldunload. Reflect the fact that IPDIVERT is a loadable module in the divert(4) and ipfw(8) man pages.	2004-10-22 19:12:01 +00:00
Andre Oppermann	57bbe2e1ab	Destroy the UMA zone on unload.	2004-10-19 22:51:20 +00:00
Andre Oppermann	2de1a9eb6e	Slightly extend the locking during unload to fully cover the protocol deregistration. This does not entirely close the race but narrows the even previously extremely small chance of a race some more.	2004-10-19 22:08:13 +00:00
Robert Watson	279128e295	Annotate a newly introduced race present due to the unloading of protocols: it is possible for sockets to be created and attached to the divert protocol between the test for sockets present and successful unload of the registration handler. We will need to explore more mature APIs for unregistering the protocol and then draining consumers, or an atomic test-and-unregister mechanism.	2004-10-19 21:35:42 +00:00
Andre Oppermann	72584fd2c0	Convert IPDIVERT into a loadable module. This makes use of the dynamic loadability of protocols. The call to divert_packet() is done through a function pointer. All semantics of IPDIVERT remain intact. If IPDIVERT is not loaded ipfw will refuse to install divert rules and natd will complain about 'protocol not supported'. Once it is loaded both will work and accept rules and open the divert socket. The module can only be unloaded if no divert sockets are open. It does not close any divert sockets when an unload is requested but will return EBUSY instead.	2004-10-19 21:14:57 +00:00
Andre Oppermann	969bb53e80	Properly declare the "net.inet" sysctl subtree.	2004-10-19 21:06:14 +00:00
Andre Oppermann	539be79a9d	Pre-emptively define IPPROTO_SPACER to 32767, the same value as PROTO_SPACER to document that this value is globally assigned for a special purpose and may not be reused within the IPPROTO number space.	2004-10-19 20:59:01 +00:00
Andre Oppermann	dff3237ee5	Make use of the PROTO_SPACER functionality for dynamically loadable protocols in inetsw[] and define initially eight spacer slots. Remove conflicting declaration 'struct pr_usrreqs nousrreqs'. It is now declared and initialized in kern/uipc_domain.c.	2004-10-19 15:58:22 +00:00
Andre Oppermann	de38924dc0	Support for dynamically loadable and unloadable IP protocols in the ipmux. With pr_proto_register() it has become possible to dynamically load protocols within the PF_INET domain. However the PF_INET domain has a second important structure called ip_protox[] that is derived from the 'struct protosw inetsw[]' and takes care of the de-multiplexing of the various protocols that ride on top of IP packets. The functions ipproto_[un]register() allow the ip_protox[] array mux to be adjusted dynamically in a consistent and easy way. To register a protocol within ip_protox[] the existence of a corresponding and matching protocol definition in inetsw[] is required. The function does not allow overwriting an already registered protocol. The unregister function simply replaces the mux slot with the default index pointer to IPPROTO_RAW as it was previously.	2004-10-19 15:45:57 +00:00
Andre Oppermann	1cf15713ed	Add a macro for the destruction of INP_INFO_LOCK's used by loadable modules.	2004-10-19 14:34:13 +00:00
Andre Oppermann	de1c2ac4bf	Make comments more clear. Change the order of one if() statement to check the more likely variable first.	2004-10-19 14:31:56 +00:00
Robert Watson	81158452be	Push acquisition of the accept mutex out of sofree() into the caller (sorele()/sotryfree()): - This permits the caller to acquire the accept mutex before the socket mutex, avoiding sofree() having to drop the socket mutex and re-order, which could lead to races permitting more than one thread to enter sofree() after a socket is ready to be free'd. - This also covers clearing of the so_pcb weak socket reference from the protocol to the socket, preventing races in clearing and evaluation of the reference such that sofree() might be called more than once on the same socket. This appears to close a race I was able to easily trigger by repeatedly opening and resetting TCP connections to a host, in which the tcp_close() code called as a result of the RST raced with the close() of the accepted socket in the user process resulting in simultaneous attempts to de-allocate the same socket. The new locking increases the overhead for operations that may potentially free the socket, so we will want to revise the synchronization strategy here as we normalize the reference counting model for sockets. The use of the accept mutex in freeing of sockets that are not listen sockets is primarily motivated by the potential need to remove the socket from the incomplete connection queue on its parent (listen) socket, so cleaning up the reference model here may allow us to substantially weaken the synchronization requirements. RELENG_5_3 candidate. MFC after: 3 days Reviewed by: dwhite Discussed with: gnn, dwhite, green Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de> Reported by: Vlad <marchenko at gmail dot com>	2004-10-18 22:19:43 +00:00
Robert Watson	6b8e5a9862	Don't release the udbinfo lock until after the last use of UDP inpcb in udp_input(), since the udbinfo lock is used to prevent removal of the inpcb while in use (i.e., as a form of reference count) in the in-bound path. RELENG_5 candidate.	2004-10-12 20:03:56 +00:00
Robert Watson	00fcf9d12d	Modify the thrilling "%D is using my IP address %s!" message so that it isn't printed if the IP address in question is '0.0.0.0', which is used by nodes performing DHCP lookup, and so constitute a false positive as a report of misconfiguration.	2004-10-12 17:10:40 +00:00
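
Illustration for de38924dc0 (ip_protox[] registration). The following is a minimal user-space model of the behaviour that commit message describes, not the kernel code: registration needs a matching inetsw[] entry, refuses to overwrite an occupied slot, and unregistration points the slot back at the IPPROTO_RAW entry. The class name ProtoMux, its methods, the specific errno values returned, and the use of 254 as a stand-in protocol number are all assumptions made for this sketch.

```python
import errno

IPPROTO_RAW = 255   # catch-all raw IP protocol number
IPPROTO_MAX = 256   # size of the demultiplexing table


class ProtoMux:
    """Toy model of inetsw[] plus the ip_protox[] lookup table (hypothetical)."""

    def __init__(self, static_protocols):
        # inetsw is modelled as a list of protocol numbers; the list index
        # stands in for the slot of the matching 'struct protosw'.
        self.inetsw = [IPPROTO_RAW] + list(static_protocols)
        self.raw_slot = 0
        # Every protocol number initially resolves to the raw handler.
        self.ip_protox = [self.raw_slot] * IPPROTO_MAX
        for slot, proto in enumerate(self.inetsw):
            self.ip_protox[proto] = slot

    def lookup(self, proto):
        """Return the inetsw slot that would handle an incoming packet."""
        return self.ip_protox[proto]

    def register(self, proto):
        """Model of ipproto_register(); the errno choices are assumptions."""
        if not 0 < proto < IPPROTO_MAX:
            return errno.EPROTONOSUPPORT
        if self.ip_protox[proto] != self.raw_slot:
            return errno.EEXIST          # never overwrite a registered protocol
        if proto not in self.inetsw:
            return errno.EPFNOSUPPORT    # no matching definition in inetsw
        self.ip_protox[proto] = self.inetsw.index(proto)
        return 0

    def unregister(self, proto):
        """Model of ipproto_unregister(): fall back to the raw slot."""
        if not 0 < proto < IPPROTO_MAX:
            return errno.EPROTONOSUPPORT
        if self.ip_protox[proto] == self.raw_slot:
            return errno.ENOENT          # nothing was registered here
        self.ip_protox[proto] = self.raw_slot
        return 0


if __name__ == "__main__":
    mux = ProtoMux([6, 17])              # TCP and UDP compiled in statically
    mux.inetsw.append(254)               # stands in for pr_proto_register()
    print(mux.register(254), mux.lookup(254))
    print(mux.unregister(254), mux.lookup(254))
```

In this model a loadable protocol first has to appear in inetsw (the role the commit message assigns to pr_proto_register()) before register() will accept it, which mirrors the ordering the message describes.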

2129 Commits