freebsd-skq

Author	SHA1	Message	Date
rwatson	05369787fc	Coalesce two identical UCB licenses into a single license instance with one set of copyright years. White space and comment cleanup. Export $FreeBSD$ via __FBSDID.	2007-05-11 11:21:43 +00:00
rwatson	185e8c8867	Minor white space and style cleanups.	2007-05-11 11:05:30 +00:00
rwatson	cf57d2e44c	White space and style cleanup.	2007-05-11 11:00:48 +00:00
rwatson	777066ed3b	Minor white space/style normalization.	2007-05-11 10:50:31 +00:00
rwatson	073fe95b38	Normalize style a bit: reduce pseudo-randomness of comment layout and white space. Remove 'register'.	2007-05-11 10:48:30 +00:00
rwatson	47d37a80be	Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr protocol entry points using functions named proto_getsockaddr and proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr. While it's true that sockaddrs are allocated and set, the net effect is to retrieve (get) the socket address or peer address from a socket, not set it, so align names to that intent.	2007-05-11 10:20:51 +00:00
rwatson	46a4c44c3b	Remove unneeded wrappers for in_setsockaddr() and in_setpeeraddr(), which used to exist so pcbinfo locks could be acquired, but are no longer required as a result of socket/pcb reference model refinements.	2007-05-11 09:54:53 +00:00
andre	f6d9987afe	Fix an incorrect replace of a timer reference made during the TCP timer rewrite in rev. 1.132. This unmasked yet another bug that causes certain connections to get indefinately stuck in LAST_ACK state.	2007-05-10 23:11:29 +00:00
rwatson	a25f94b5ae	Move universally to ANSI C function declarations, with relatively consistent style(9)-ish layout.	2007-05-10 15:58:48 +00:00
rrs	8531fb6bb2	Two major items here: - All printf that was surrounded by #ifdef SCTP_DEBUG moves to a macro that does all of this. This removes all printfs from the code and makes the code more portable and easier to read. - Static Analysis (cisco) - found a few bugs, but mostly we add checks for NULL pointers and such to make the tool happy. We now pass the Cisco SA tools checks except for where it does not understand tailq/lists. We still need to look at the coverity tools output too (this is like the cisco SA tool) and see if it wants us to fix any other items. Hopefully this will be the last major churn in the code other than bug fixes.	2007-05-09 13:30:06 +00:00
maxim	716f8a4756	o Fix style(9) bugs introduced in the last commit. Pointed out by: bde	2007-05-09 11:39:46 +00:00
maxim	cc6f7c0d99	o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build. PR: kern/112517 Submitted by: vd	2007-05-09 06:09:40 +00:00
rrs	ffa53534cf	- Copyright change, cisco's silly tool wants it to say: "Copyright (c) 2001-2007, by Cisco Systems," instead of *Copyright (c) 2001-2007, Cisco Systems," - Also fix a few straglers that were still in 2006.	2007-05-08 17:01:12 +00:00
rrs	8b8ad155c5	- Get rid of the sctp_inpcb_free() "magic numbers", now they are sensible defines that tell what you are directing the function to do.	2007-05-08 15:53:03 +00:00
rrs	4945992787	- Static analyisis fixes for cisco's commit (this is equivilant to the coverity tool.. may even be the same one.. not sure). - A bug in the way sctp_abort() and friends were setting the IP_CLOSE flag.. and NOT passing the last argument as a (,1)... so that things would get freed..	2007-05-08 14:32:53 +00:00
rrs	532412f6c4	- More macros for OS compatabilty - PR-SCTP would ignore FWD-TSN's above a rwnd's worth of TSN's (1 byte msgs).. this left the peer hopelessly out of sync.. or an attacker. So now we abort the assoc. - New IFN hash, also rename hashes to match addr/ifn now that the vrf has multiple. - Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default as defined in the Socket API ID. - Export MTU information via sysctl. - Vrf's need table id's. This is default for BSD, but may be other things later when BSD fully supports VRFs. - Additional stream reset bug (caught by cisco dev-test). - Additional validations for the address in sending a message (socket api). -------- and ----- - Fix association notifications not to give the active open side false notifications. - Fix so sendfile and SENDALL will work properly (missing flag to say socket sender is done). - Fix Bug that prevented COOKIES from being retransmitted. - Break out connectx into helper sub-models so that iox routines can reuse the helpers. - When an address is added during system init (non-dynamic mode) make sure that the "defer use" flag is not set. its compiling on XR now :-D Reviewed by: gnn	2007-05-08 00:21:05 +00:00
rwatson	b5347099c7	Rather than selectively zeroing fields in the tcp_debug structure throughout tcp_trace(), zero the entire structure up front. Minor style fixes.	2007-05-07 14:05:23 +00:00
rwatson	d225f06201	Since udp_peeraddr() and udp_sockaddr() directly wrap in_setpeeraddr() and in_setsockaddr(), containing only stale comments on why they exist, remove them and initialize the protosw for UDP to directly reference in_setpeeraddr() and in_setsockaddr().	2007-05-07 13:51:24 +00:00
rwatson	f91ad81f4a	Minor style tweaks.	2007-05-07 13:47:39 +00:00
rwatson	bcbab327dc	When setting up timewait state for a TCP connection, don't hold the socket lock over a crhold() of so_cred: so_cred is constant after socket creation, so doesn't require locking to read.	2007-05-07 13:04:25 +00:00
andre	eac02dc237	Remove unused requested_s_scale from struct tcpcb.	2007-05-06 16:04:36 +00:00
andre	3d2c0e7f91	Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of a decdicated sack_enable int for this bool. Change all users accordingly.	2007-05-06 15:56:31 +00:00
andre	197f220d34	o Remove redundant tcp reassembly check in header prediction code o Rearrange code to make intent in TCPS_SYN_SENT case more clear o Assorted style cleanup o Comment clarification for tcp_dropwithreset()	2007-05-06 15:41:06 +00:00
andre	daeac43899	Reorder the TCP header prediction test to check for the most volatile values first to spend less time on a fallback to normal processing.	2007-05-06 15:23:51 +00:00
andre	568b21767b	Remove the defunct remains of the TCPS_TIME_WAIT cases from tcp_do_segment and change it to a void function. We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late late segments arriving for such a connection is handled directly in the TW code.	2007-05-06 15:16:05 +00:00
andre	540d02c0b1	Fix two comments.	2007-05-06 13:38:25 +00:00
rrs	f64462a8f2	Two bugs: - Locks were not being unlocked when an invalid size chunk is sent in. - When a notification comes in, we cannot use it to look up the fragment interleave stream information since its not on a stream.	2007-05-06 00:01:17 +00:00
rwatson	25b2d1a7d4	Add global mutex tcp_debug_mtx, which will protect global TCP debugging state tcp_debug, tcp_debx. Acquire and drop as required in tcp_trace(). Move to ANSI C function header, correct prototype types so that short TCP state is no longer promoted to int unnecessarily. Add comments. MFC after: 3 weeks	2007-05-04 23:43:18 +00:00
rwatson	a717b2a83a	Tweak comment at end of tcp_input() when calling into tcp_do_segment(): the pcbinfo lock will be released as well, not just the pcb lock.	2007-05-04 17:45:52 +00:00
rrs	16c8515221	Fixes a missing unlock in the one-2-one hash table, if it was full and a collision occured, then we would leave a inp locked. Also fixes a missing inp unlock if IPSEC was on and it failed during the attach. Bug found by Weongyo Jeong.	2007-05-04 15:19:10 +00:00
bz	ab603b3a9c	Add support for filtering on Routing Header Type 0 and Mobile IPv6 Routing Header Type 2 in addition to filter on the non-differentiated presence of any Routing Header. MFC after: 3 weeks	2007-05-04 11:15:41 +00:00
rwatson	20848234d9	sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags on each socket buffer with the socket buffer's mutex. This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing. This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization. While here, fix two historic bugs: (1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovere by Isilon). (2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam). SCTP portion of this patch submitted by rrs.	2007-05-03 14:42:42 +00:00
rrs	803b9be8be	- Somehow the disable fragment option got lost. We could set/clear it but would not do it. Now we will. - Moved to latest socket api for extended sndrcv info struct. - Moved to support all new levels of fragment interleave (0-2). - Codenomicon security test updates - length checks and such. - Bug in stream reset (2 actually). - setpeerprimary could unlock a null pointer, fixed. - Added a flag in the pcb so netstat can see if we are listening easier. Obtained from: (some of the Listen changes from Weongyo Jeong)	2007-05-02 12:50:13 +00:00
rwatson	a9656f2df2	Remove unused pcbinfo arguments to in_setsockaddr() and in_setpeeraddr().	2007-05-01 16:31:02 +00:00
rwatson	c27ef03414	Rename some fields of struct inpcbinfo to have the ipi_ prefix, consistent with the naming of other structure field members, and reducing improper grep matches. Clean up and comment structure fields in structure definition.	2007-04-30 23:12:05 +00:00
maxim	9428c924dd	o Kill EOLWS while I'm here.	2007-04-30 20:26:11 +00:00
maxim	3f7c014f10	o Fix strtoul() error conditions check. PR: kern/108211 Submitted by: Yong Tang MFC after: 2 weeks	2007-04-30 20:22:11 +00:00
andre	27e460915a	o Fix INP lock leak in the minttl case o Remove indirection in the decision of unlocking inp o Further annotation of locking in tcp_input()	2007-04-23 19:41:47 +00:00
rrs	e06dca92fe	Fixes cut and paste bug using wrong pointer reference.	2007-04-23 00:51:49 +00:00
rrs	8374d51a34	Moves the PCB features and flags from sctp_pcb.h to sctp.h so that netstat can access and display these values.	2007-04-22 12:12:38 +00:00
rrs	44fd758bd5	- Somehow the disable fragment option got lost. We could set/clear it but would not do it. Now we will. - Moved to latest socket api for extended sndrcv info struct. - Moved to support all new levels of fragment interleave.	2007-04-22 11:06:27 +00:00
andre	0af88c9154	o Remove unncessary TOF_SIGLEN flag from struct tcpopt o Correctly set to->to_signature in tcp_dooptions() o Update comments	2007-04-20 15:28:01 +00:00
andre	a08a16ab40	Add more KASSERT's.	2007-04-20 15:21:29 +00:00
andre	853a532b7f	o Remove unused and redundant TCP option definitions o Replace usage of MAX_TCPOPTLEN with the correctly constructed and derived MAX_TCPOPTLEN	2007-04-20 15:08:09 +00:00
andre	edec8cf92b	Remove bogus check for accept queue length and associated failure handling from the incoming SYN handling section of tcp_input(). Enforcement of the accept queue limits is done by sonewconn() after the 3WHS is completed. It is not necessary to have an earlier check before a connection request enters the SYN cache awaiting the full handshake. It rather limits the effectiveness of the syncache by preventing legit and illegit connections from entering it and having them shaken out before we hit the real limit which may have vanished by then. Change return value of syncache_add() to void. No status communication is required.	2007-04-20 14:34:54 +00:00
andre	707a700619	Simplifly syncache_expand() and clarify its semantics. Zero is returned when the ACK is invalid and doesn't belong to any registered connection, either in syncache or through SYN cookies. True but a NULL struct socket is returned when the 3WHS completed but the socket could not be created due to insufficient resources or limits reached. For both cases an RST is sent back in tcp_input(). A logic error leading to a panic is fixed where syncache_expand() would free the mbuf on socket allocation failure but tcp_input() later supplies it to tcp_dropwithreset() to issue a RST to the peer. Reported by: kris (the panic)	2007-04-20 13:51:34 +00:00
andre	e7ec21637e	Only update TCP timestamp on SYN duplication if it is present on current SYN in syncache_add(). Otherwise disable timestamps.	2007-04-20 13:36:48 +00:00
andre	fa8779e96a	o Plug memory leak in syncache_add() on MAC label allocation failure. o Simplify code flow with 'done' goto label. o Remove mbuf argument from syncache_respond(). It doesn't make use of it.	2007-04-20 13:30:08 +00:00
rrs	e8a77bd927	- More work on making send lock contention. - Removed free-oqueue cache. - Fix counter for sq entries - Increased the amount of information retained on ASOC_TSN logging on the association. - Made it so with the ASOC_TSN logging on sending or recieving an abort we dump the log. - Went through and added invariant's around some panic's that needed them. - decrements went to atomic_subtact_int instead of add -1 - Removed residual count increment that threw off a strm oq count. - Tracks and complaints if we don't have a LAST fragment and clean up the sp structure. - Track a new stat that counts number of abandoned msgs that happen if you close without reading. - Fix lookup of frag point to be aware of a 0 assoc-id. Reviewed by: gnn	2007-04-19 11:28:43 +00:00
andre	456ea0a9e5	Make tcp_twrespond() use tcp_addoptions() instead of a home grown version.	2007-04-18 18:14:39 +00:00
andre	2573926e35	When we run into the syncache entry limits syncache_add() tries to free the oldest entry in the current bucket row. The global entry limit may be smaller than the bucket rows and their limit combined however. Thus only try to free a syncache entry if we found one in this bucket row. Reported by: kris	2007-04-17 15:25:14 +00:00
rwatson	3d8cf5fc2b	Shorten text string for ip_fw2 dynamic rules zone by removing the word "zone", which is generally not present in zone names. This reduces the incidence of line-wrapping in "vmstat -z " using 80-column displays. MFC after: 3 days	2007-04-17 09:28:36 +00:00
rwatson	e781516999	Remove unused variable tcbinfo_mtx.	2007-04-15 21:03:23 +00:00
rrs	577cabcb8a	Fix stupid syntax error - Pointy hat to me :-(	2007-04-15 13:03:14 +00:00
rrs	0eed90f74a	- Add more comments to sctps_stats struture in sctp_uio.h - Fix bug that prevented EEOR mode from working and simplified the can_we_split code in the process. - Reduce lock contention for the tcb_send_lock. I did this especially for EEOR mode, still need to look at why I need a lock when removing from the tailq and the ->next is NOT null. A lock fixes it but it implies a bug yet exists. - Activated Andre's proposed changes to better use the mbuf infrastructure. - Fixed places that were not using the aloc macro's to take advantage of the per assoc cache. - Adds ifdef fix so any logging will enable stat_logging to get the right data structures in place (suggested by Max Laier).	2007-04-15 11:58:26 +00:00
mlaier	550a575853	Fix a typeo - unbreak the build.	2007-04-14 18:27:34 +00:00
rrs	fb6f6fd9a1	- fix source address selection when picking an acceptable address - name change of prefered -> preferred - CMT fast recover code added. - Comment fixes in CMT. - We were not giving a reason of cant_start_asoc per socket api if we failed to get init/or/cookie to bring up an assoc. Change so we don't just give a generic "comm lost" but look at actual states of dying assoc. - change "crc32" arguments to "crc32c" to silence strict/noisy compiler warnings when crc32() is also declared - A few minor tweaks to get the portable stuff truely portable for sctp6_usrreq.c :-D - one-2-one style vrf match problem. - window recovery would leave chks marked for retran during window probes on the sent queue. This would then cause an out-of-order problem and assure that the flight size "problem" would occur. - Solves a flight size logging issue that caused rwnd overruns, flight size off as well as false retransmissions.g - Macroize the up and down of flight size. - Fix a ECNE bug in its counting. - The strict_sacks options was causing aborts when window probing was active, fix to make strict sacks a bit smarter about what the next unsent TSN is. - Fixes a one-2-one wakeup bug found by Martin Kulas. - If-defed out form, Andre's copy routines pending his commit of at least m_last().. need to adjust for 6.2 as well.. since m_last won't exist. Reviewed by: gnn	2007-04-14 09:44:09 +00:00
ru	69c130ff3d	Make "struct tcp_timer" visible only to the kernel, and unbreak world.	2007-04-11 14:08:42 +00:00
andre	bd6041301a	Change the TCP timer system from using the callout system five times directly to a merged model where only one callout, the next to fire, is registered. Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout. The single new callout is a mutex callout on inpcb simplifying the locking a bit. tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions. Reviewed by: rwatson (earlier version)	2007-04-11 09:45:16 +00:00
rwatson	3e9709c551	Add a new privilege, PRIV_NETINET_REUSEPORT, which will replace superuser checks to see whether bind() can reuse a port/address combination while it's already in use (for some definition of use).	2007-04-10 15:58:38 +00:00
piso	591e129226	Prevent the usage of an uninitialized variable: do not accept StartMediaTx message before an OpnRcvChnAck message was received. Reviewed by: glebius Approved by: glebius (mentor) MFC after: 3 days Found with: Coverity Prevent(tm) CID: 498	2007-04-07 09:52:36 +00:00
piso	9e43cc8d03	Silence Coverity about an unused variable. Reviewed by: glebius Approved by: glebius (mentor) MFC after: 3 days CID: 538	2007-04-07 09:47:39 +00:00
andre	82cdcabbd1	Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add some further INP_INFO_WLOCK_ASSERT() while there.	2007-04-04 18:30:16 +00:00
andre	32bf13d188	Move last tcpcb initialization for the inbound connection case from tcp_input() to syncache_socket() where it belongs and the majority of it already happens. The "tp->snd_up = tp->snd_una" is removed as it is done with the tcp_sendseqinit() macro a few lines earlier.	2007-04-04 16:13:45 +00:00
andre	890976965d	Some local and style(9) cleanups.	2007-04-04 15:30:31 +00:00
andre	3c03d012a2	Retire unused TCP_SACK_DEBUG.	2007-04-04 14:44:15 +00:00
andre	37b70e01e6	In tcp_dooptions() skip over SACK options if it is a SYN segment.	2007-04-04 14:39:49 +00:00
kan	ecc1614c2b	Include string.h for non-kernel builds to get proper memcpy prototype.	2007-04-04 03:16:59 +00:00
kan	3ee45e3f2e	Include string.h for non-kernel builds to get proper strcpy, strlen prototypes.	2007-04-04 03:14:15 +00:00
kan	8aa4f2f59c	Do not assign result of (char ) cast to u_char variable.	2007-04-04 03:10:42 +00:00
julian	cd89273202	Since we switched to using monatomically increasing timestamps, they have been reported back to the userland as being in 1970. Add boot time to the timestamp to give the time in the scale of the 'current' real timescale. Not perfect if you change the time a lot but good enough to keep all the rules correct relative to each other correct in terms of time relative to "now".	2007-04-03 22:45:50 +00:00
rrs	ad3d567017	- fixed several places where we did not release INP locks. - fixed a refcount bug in the new ifa structures. - use vrf's from default stcb or inp whenever possible. - Address limits raised to account for a full IP fragmented packet (1000 addresses). - flight size correcting updated to include one message only and to handle case where the peer does not cumack the next segment aka lists 1/1 in sack blocks.. - Various bad init/init-ack handling could cause a panic since we tried to unlock the destroyed mutex. Fixes so we properly exit when we need to destroy an assoc. (Found by Cisco DevTest team :D) - name rename in src-addr-selection from pass to sifa. - route structure typedef'd to allow different platforms and updated into sctp_os_bsd file. - Max retransmissions a chunk can be made added. Reviewed by: gnn	2007-04-03 11:15:32 +00:00
rrs	9afebb96fc	- Found bug in min split point bundling which caused incorrect, non-bundlable fragmentation. - Added min residual to better control split points for both how big a msg must be as well as how much needs to be left over. - With our new algo in place, we need to implicitly set "end of msg" on the sp-> structure otherwise we end up with "hung" associations. - Room reserved up front in IP header by pushing IP header to back of mbuf. - Fix so FR's peg count of retransmissions needed. - Fix so an unlucky chunk that never gets across will kill the assoc via the kill timer and send an abort too. - Fix bug in sctp_input which can result in a crash. - Do not strip off IP options anymore. - Clean up sctp_calculate_rto(). - Get rid of unused sysctl. - Fixed so we discard all M-Cast - Fixed so port check done AFTER checksum - Fixed bug in fragmentation code that prevented us from fragmenting a small complete message when we needed to. - Window probes were not marked back to unsent and flight adjusted when a sack came in with no window change or accepting of the probe data. We now fix this with having a mark on the net and the chunk so we can clear it out when the sack arrives forcing it to retran just like it was "new" this improves the handling of window probes, which were dropped by the receiver. - Tighten AUTH protocol error checks during INIT/INIT-ACK exchange	2007-03-31 11:47:30 +00:00
bms	44f999134f	Fix a bug in IPv4 address configuration exposed by refcounting. * Join the IPv4 all-hosts multicast group 224.0.0.1 once only; that is, when an IPv4 address is first configured on an interface. * Do not join it for subsequent IPv4 addresses as this violates IGMP. * Be sure to leave the group when all IPv4 addresses have been removed from the interface. * Add two DIAGNOSTIC printfs related to the issue. Further care and attention is needed in this area; it is suggested that netinet's attachment to the ifnet structure be compartmentalized and non-implicit. Bug found by: andre MFC after: 1 month	2007-03-29 21:39:22 +00:00
andre	b1c6fabadd	When blackholing do a 'dropunlock' in the new world order to prevent the INP_INFO_LOCK from leaking. Reported by: ache Found by: rwatson	2007-03-28 12:58:13 +00:00
rwatson	5723c5ca71	Remove stale comment about not enabling inpcb and inpcbinfo lock assertions when IPv6 is enabled. MFC after: 3 days	2007-03-28 00:50:20 +00:00
andre	40f7cba48d	In tcp_sack_doack() remove too tight KASSERT() added in last revision. This function may be called without any TCP SACK option blocks present. Protect iteration over SACK option blocks by checking for SACK options present flag first. Bug reported by: wkoszek, keramida, Nicolas Blais	2007-03-25 23:27:26 +00:00
rwatson	124744263c	Replace a comment about RSVP/mrouting with a different but similar comment explaining that some more locking is needed. The routing pieces are done, but there is an interlocking issue between optionally compiled code and mandatory code. Spotted by: kris	2007-03-25 21:49:50 +00:00
maxim	d789919b3a	o Use a define for a buffer size. Prodded by: db o Add missed vars for TCPDEBUG in tcp_do_segment(). Prodded by: tinderbox	2007-03-24 22:15:02 +00:00
andre	acede79cba	Split tcp_input() into its two functional parts: o tcp_input() now handles TCP segment sanity checks and preparations including the INPCB lookup and syncache. o tcp_do_segment() handles all data and ACK processing and is IPv4/v6 agnostic. Change all KASSERT() messages to ("%s: ", __func__). The changes in this commit are primarily of mechanical nature and no functional changes besides the function split are made. Discussed with: rwatson	2007-03-23 20:16:50 +00:00
andre	04c6de816f	Tidy up some code to conform better to surroundings and style(9), 0 = NULL and space/tab.	2007-03-23 19:11:22 +00:00
andre	8057f89694	Bring SACK option handling in tcp_dooptions() in line with all other options and ajust users accordingly.	2007-03-23 18:33:21 +00:00
bms	508d98cfce	Purge two redundant case labels.	2007-03-23 09:43:36 +00:00
glebius	d983ff428e	Remove global list of all llinfo_arp entries and use a callout per instance expiry of the ARP entries. Since we no longer abuse the IPv4 radix head lock, we can now enter arp_rtrequest() with a lock held on an arbitrary rt_entry. Reviewed by: bms	2007-03-22 10:37:53 +00:00
andre	8aefbb6179	ANSIfy function declarations and remove register keywords for variables. Consistently apply style to all function declarations.	2007-03-21 19:37:55 +00:00
andre	844ebe3715	Match up SYSCTL declarations in style.	2007-03-21 19:34:12 +00:00
andre	46edec2346	Subtract optlen in the maximum length check for TSO and finally avoid slightly oversized TSO mbuf chains. Submitted by: kmacy	2007-03-21 19:04:07 +00:00
andre	8bf685df51	Tidy up IPFIREWALL_FORWARD sections and comments.	2007-03-21 18:56:03 +00:00
andre	f79b08ac22	Update and clarify comments in first section of tcp_input().	2007-03-21 18:52:58 +00:00
andre	9b59a4229e	Tidy up the ACCEPTCONN section of tcp_input(), ajust comments and remove old dead T/TCP code.	2007-03-21 18:49:43 +00:00
andre	b4b4be32dc	Tidy up tcp_log_in_vain and blackhole.	2007-03-21 18:36:49 +00:00
andre	878e882d88	Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default it doesn't impede normal operation negatively and is only a few lines of code. It's close relatives blackhole and log_in_vain aren't options either.	2007-03-21 18:25:28 +00:00
andre	77fcda08c7	Remove tcp_minmssoverload DoS detection logic. The problem it tried to protect us from wasn't really there and it only bloats the code. Should the problem surface in the future we can simply resurrect it from cvs history.	2007-03-21 18:05:54 +00:00
bms	a243534dde	Increase default size of raw IP send and receive buffers to the same as udp_sendspace, to avoid a situation where jumbograms (datagrams > 9KB) are unnecessarily fragmented. A common use case for this is OSPF link-state database synchronization during adjacency bringup on a high speed network with a large MTU. It is not possible to auto-tune this setting until a socket is bound to a given interface, and because the laddr part of the inpcb tuple may be overridden, it makes no sense to do so. Applications may request a larger socket buffer size by using the SO_SENDBUF and SO_RECVBUF socket options. Certain applications such as Quagga ospfd do not probe for interface MTU and therefore do not increase SO_SENDBUF in this use case. XORP is not affected by this problem as it preemptively uses SO_SENDBUF and SO_RECVBUF to account for any possible additional latency in XRL IPC. PR: kern/108375 Requested by: Vladimir Ivanov MFC after: 1 week	2007-03-20 13:15:20 +00:00
rrs	eecb0a8aa7	- window update sacks sent incorrectly after shutdown which caused extra abort from peer. - RTT time calculation was not being done in express sack handling since it refered to an unused variable (rto_pending). Removed variable. - socket buffer high water access macro-ized.	2007-03-20 10:23:11 +00:00
bms	4ffc004901	Implement reference counting for ifmultiaddr, in_multi, and in6_multi structures. Detect when ifnet instances are detached from the network stack and perform appropriate cleanup to prevent memory leaks. This has been implemented in such a way as to be backwards ABI compatible. Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti() is unable to detect interface removal by design, as it performs searches on structures which are removed with the interface. With this architectural change, the panics FreeBSD users have experienced with carp and pfsync should be resolved. Obtained from: p4 branch bms_netdev Reviewed by: andre Sponsored by: Garance A Drosehn Idea from: NetBSD MFC after: 1 month	2007-03-20 00:36:10 +00:00
andre	9ddf9fc256	Match up SYSCTL declaration style.	2007-03-19 19:00:51 +00:00
andre	d5afb20001	Match up SYSCTL_INT declarations in style.	2007-03-19 18:42:27 +00:00
andre	515a501549	Maintain a pointer and offset pair into the socket buffer mbuf chain to avoid traversal of the entire socket buffer for larger offsets on stream sockets. Adjust tcp_output() make use of it. Tested by: gallatin	2007-03-19 18:35:13 +00:00
rrs	56faff0a74	Adds a hash table to speed local address lookup on a per VRF basis (BSD has only one VRF currently). Hash table is sized to 16 but may need to be adjusted for machines with large numbers of addresses. Reviewed by: gnn	2007-03-19 11:11:16 +00:00
rrs	af970e3016	- errno -> becomes error in sctp_output.c and sctputil.c - SB_CLEAR macro defined and used for sb clearing. - Fix for CMT express_sack_handling did not do proper pseudo-cumack updates. - Get rid of extraneous function that was never used ip_2_ip6_hdr() - Fixed source address selection bug (initialization problem). - Source address selection debug added.	2007-03-19 06:53:02 +00:00
bms	b7e4c99744	In IPv4 fast forwarding path, send ICMP unreachable messages for routes which have RTF_REJECT set and a zero expiry timer. PR: kern/109246 MFC after: 10 days Submitted by: Ingo Flaschberger	2007-03-18 23:05:20 +00:00
andre	a31ec7e01c	Unbreak IPv6 after consolidation of TCP options insertion. Submitted by: tegge	2007-03-17 11:52:54 +00:00
kmacy	cda4ca63d4	Fix the most obvious of the bugs introduced by recent syncache changes - *ip is not initialized in the case of inet6 connection, but ip->ip_len is being changed anyway Now the question is, why does it think an ipv4 connection is an ipv6 connection? xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.	2007-03-17 06:40:09 +00:00
rwatson	f229d1a42e	Remove unused and #if 0'd net.inet.tcp.tcp_rttdflt sysctl.	2007-03-16 13:42:26 +00:00
andre	da1f496220	Consolidate insertion of TCP options into a segment from within tcp_output() and syncache_respond() into its own generic function tcp_addoptions(). tcp_addoptions() is alignment agnostic and does optimal packing in all cases. In struct tcpopt rename to_requested_s_scale to just to_wscale. Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled." Reviewed by: silby, mohans, julian Sponsored by: TCP/IP Optimization Fundraise 2005	2007-03-15 15:59:28 +00:00
rrs	bd8786ed77	- Sysctl's move to seperate file - moved away from ifn/ifa access to sctp_ifa/sctp_ifn built and managed by the add-ip code. - cleaned up add-ip code to use the iterator - made iterator be a thread, which enables auto-asconf now. - rewrote and cleaned up source address selection (also made it use new structures). - Fixed a couple of memory leaks. - DACK now settable as to how many packets to delay as well as time. - connectx() to latest socket API, new associd arg. - Fixed issue with revoking and loosing potential to send when we inflate the flight size. We now inflate the cwnd too and deflate it later when the revoked chunk is sent or acked. - Got rid of some temp debug code - src addr selection moved to a common file (sctp_output.c) - Support for simple VRF's (we have support for multi-vfr via compile switch that is scrubbed from BSD but we won't need multi-vrf until we first get VRF :-D) - Rest of mib work for address information now done - Limit number of addresses in INIT/INIT-ACK to a #def (30). Reviewed by: gnn	2007-03-15 11:27:14 +00:00
bms	74244a8ee1	Diff reduction with NetBSD; use IN_LOCAL_GROUP() to check if an address is within the locally scoped multicast range 224.0.0.0/24.	2007-03-15 08:44:22 +00:00
bms	428a375b20	Fix IP_SENDSRCADDR semantics. * To use this option with a UDP socket, it must be bound to a local port, and INADDR_ANY, to disallow possible collisions with existing udp inpcbs bound to the same port on other interfaces at send time. * If the socket is bound to INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be rejected as it is ambiguous. * If the socket is bound to an address other than INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be disallowed by in_pcbbind_setup(). Reviewed by: silence on -net Tested with: src/tools/regression/netinet/ipbroadcast MFC after: 4 days	2007-03-08 15:26:54 +00:00
qingli	fb4a7a64bd	This patch is provided to fix a couple of deployment issues observed in the field. In one situation, one end of the TCP connection sends a back-to-back RST packet, with delayed ack, the last_ack_sent variable has not been update yet. When tcp_insecure_rst is turned off, the code treats the RST as invalid because last_ack_sent instead of rcv_nxt is compared against th_seq. Apparently there is some kind of firewall that sits in between the two ends and that RST packet is the only RST packet received. With short lived HTTP connections, the symptom is a large accumulation of connections over a short period of time . The +/-(1) factor is to take care of implementations out there that generate RST packets with these types of sequence numbers. This behavior has also been observed in live environments. Reviewed by: silby, Mike Karels MFC after: 1 week	2007-03-07 23:21:59 +00:00
bms	5ab23e568d	Purge an out-of-date comment.	2007-03-04 16:32:19 +00:00
bms	3562ed06e7	Fix undirected broadcast sends for the case where SO_DONTROUTE has also been set at the socket layer, in our somewhat convoluted IPv4 source selection logic in ip_output(). IP_ONESBCAST is actually a special case of SO_DONTROUTE, as 255.255.255.255 must always be delivered on a local link with a TTL of 1. If IP_ONESBCAST has been set at the socket layer, also perform destination interface lookup for point-to-point interfaces based on the destination address of the link; previously it was not possible to use the option with such interfaces; also, the destination/broadcast address fields map to the same field within struct ifnet, which doesn't help matters. One more valid fix going forward for these issues is to treat 255.255.255.255 as a destination in its own right in the forwarding trie. Other implementations do this. It fits with the use of multiple paths, though it then becomes necessary to specify interface preference. This hack will eventually go away when that comes to pass. Reviewed by: andre MFC after: 1 week	2007-03-01 13:29:30 +00:00
andre	ccd57f9789	Prevent TSO mbuf chain from overflowing a few bytes by subtracting the TCP options size before the TSO total length calculation. Bug found by: kmacy	2007-03-01 13:12:09 +00:00
mohans	2010e1d527	In the SYN_SENT case, Initialize the snd_wnd before the call to tcp_mss(). The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.	2007-02-28 20:48:00 +00:00
bms	84976a55f2	Style: Move declaration of subsystem mutex to where other mutexes are in this file, and use macros for dealing with it.	2007-02-28 20:02:24 +00:00
glebius	5c82cce393	Add EHOSTDOWN and ENETUNREACH to the list of soft errors, that shouldn't be returned up to the caller. PR: 100172 Submitted by: "Andrew - Supernews" <andrew supernews.net> Reviewed by: rwatson, bms	2007-02-28 12:47:49 +00:00
glebius	481c8d8b0b	Toss the code, that handles errors from ip_output(), to make it more readable: - Merge two embedded if() into one. - Introduce switch() block to handle different kinds of errors. Reviewed by: rwatson, bms	2007-02-28 12:41:49 +00:00
bms	833c0dc8bd	Add INADDR_ALLRPTS_GROUP define for 224.0.0.22 for future IGMPv3 support. Obtained from: OpenSolaris	2007-02-27 14:45:37 +00:00
mohans	384aeb29f6	Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigate potential issues where the peer does not close, potentially leaving thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl fast_finwait2_recycle, which is disabled by default. Reviewed by: gnn, silby.	2007-02-26 22:25:21 +00:00
bms	2f11af3e83	Unlock a mutex which should be unlocked before returning. MFC after: 1 week	2007-02-25 14:22:03 +00:00
bms	3e83ac6653	Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel. It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko, if and only if IPv6 support is enabled for loadable modules. Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).	2007-02-24 11:38:47 +00:00
rwatson	b659d84f71	Rename two identically named log_in_vain variables: tcp_input.c's static log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to udp_log_in_vain. MFC after: 1 week	2007-02-20 10:20:03 +00:00
rwatson	5c6ebcbcaf	Gratuitous UDP restyling toward style(9) in 7.x.	2007-02-20 10:13:11 +00:00
rwatson	2d9a4ed3b7	#ifdef INET6 printing of inpcb IPv6 addresses in DDB. Patch committed with minor adjustments. Submitted by: Florian C. Smeets <flo at kasimir dot com>	2007-02-18 08:57:23 +00:00
rwatson	6a5d54ffd2	Add "show inpcb", "show tcpcb" DDB commands, which should come in handy for debugging sblock and other network panics.	2007-02-17 21:02:38 +00:00
rwatson	939d61a17e	Remove unused inp6_ifindex field from inpcb, as well as unused macro shortcut for it.	2007-02-16 14:09:24 +00:00
rwatson	9ff22a0b6e	Remove unused in6p_ip6_hlim macro shortcut for non-present inp_depend6.inp6_hlim field in the inpcb.	2007-02-16 13:56:06 +00:00
rrs	e176cc33f5	- Copyright updates (aka 2007) - ZONE get now also take a type cast so it does the cast like mtod does. - New macro SCTP_LIST_EMPTY, which in bsd is just LIST_EMPTY - Removal of const in some of the static hmac functions (not needed) - Store length changes to allow for new fields in auth - Auth code updated to current draft (this should be the RFC version we think). - use uint8_t instead of u_char in LOOPBACK address comparison - Some u_int32_t converted to uint32_t (in crc code) - A bug was found in the mib counts for ordered/unordered count, this was fixed (was referencing a freed mbuf). - SCTP_ASOCLOG_OF_TSNS added (code will probably disappear after my testing completes. It allows us to keep a small log on each assoc of the last 40 TSN's in/out and stream assignment. It is NOT in options and so is only good for private builds. - Some CMT changes in prep for Jana fixing his problem with reneging when CMT is enabled (Concurrent Multipath Transfer = CMT). - Some missing mib stats added. - Correction to number of open assoc's count in mib - Correction to os_bsd.h to get right sha2 macros - Add of special AUTH_04 flags so you can compile the code with the old format (in case the peer does not yet support the latest auth code). - Nonce sum was incorrectly being set in when ecn_nonce was NOT on. - LOR in listen with implicit bind found and fixed. - Moved away from using mbuf's for socket options to using just data pointers. The mbufs were used to harmonize NetBSD code since both Net and Open used this method. We have decided to move away from that and more conform to FreeBSD style (which makes more sense). - Very very nasty bug found in some of my "debug" code. The cookie_how collision case tracking had an endless loop in it if you got a second retransmission of a cookie collision case. This would lock up a CPU .. ugly.. - auth function goes to using size_t instead of int which conforms to socketapi better - Found the nasty bug that happens after 9 days of testing.. you get the data chunk, deliver it and due to the reference to a ch-> that every now and then has been deleted (depending on the postion in the mbuf) you have an invalid ch->ch.flags.. and thus you don't advance the stream sequence number.. so you block the stream permanently. The fix is to make local variables of these guys and set them up before you have any chance of trimming the mbuf. - style fix in sctp_util.h, not sure how this got bad maybe in the last patch? (aka it may not be in the real source). - Found interesting bug when using the extended snd/rcv info where we would get an error on receiving with this. Thats because it was NOT padded to the same size as the snd_rcv info. We increase (add the pad) so the two structs are the same size in sctp_uio.h - In sctp_usrreq.c one of the most common things we did for socket options was to cast the pointer and validate the size. This as been macro-ized to help make the code more readable. - in sctputil.c two things, the socketapi class found a missing flag type (the next msg is a notification) and a missing scope recovery was also fixed. Reviewed by: gnn	2007-02-12 23:24:31 +00:00
bms	7fe6b5282a	Use MAXTTL. Obtained from: NetBSD	2007-02-10 23:15:28 +00:00
bms	17b1800be2	If the rendezvous point for a group is not specified, do not send IGMPMSG_WHOLEPKT notifications to the userland PIM routing daemon, as an optimization to mitigate the effects of high multicast forwarding load. This is an experimental change, therefore it must be explicitly enabled by setting the sysctl/tunable net.inet.pim.squelch_wholepkt to a non-zero value. The tunable may be set from the loader or from within the kernel environment when loading ip_mroute.ko as a module. Submitted by: edrt <edrt at citiz.net> See also: http://mailman.icsi.berkeley.edu/pipermail/xorp-users/2005-June/000639.html	2007-02-10 14:48:42 +00:00
bms	42ce6b97cf	Build PIM by default as part of the IPv4 multicast forwarding path. Make PIM dynamically loadable by using encap_attach_func(). PIM may now be loaded into a GENERIC kernel. Tested with: ports/net/pimdd && tcpreplay && wireshark Reviewed by: Pavlin Radoslavov	2007-02-10 13:59:13 +00:00
bms	ecb2edb6fa	Store the cached route in vifp in the normal send_packet() case. The VIFF_TUNNEL case no longer exists, therefore this field is free to use, and its use eliminates a static data member.	2007-02-08 23:05:08 +00:00
bms	51ca9a4740	Nuke the token bucket filter code. Attempting to request rate limiting by the token bucket filter will result in EINVAL being returned. If you want to rate-limit traffic in future, use ALTQ or dummynet; this isn't a general purpose QoS engine. Preserve the now unused fields in struct vif so as to avoid having to recompile netstat(1) and other tools. Reviewed by: Pavlin Radslavov, Bill Fenner	2007-02-08 22:58:01 +00:00
bms	6fc869225c	eliminate redundant macro MC_SEND()	2007-02-07 20:36:33 +00:00
bms	61cc2fad7d	Remove support for IPIP tunnels in IPv4 multicast forwarding. XORP has never used them; with mrouted, their functionality may be replaced by explicitly configuring gif(4) instances and specifying them with the 'phyint' keyword. Bump __FreeBSD_version to 700030, and update UPDATING. A doc update is forthcoming. Discussed on: net Reviewed by: fenner MFC after: 3 months	2007-02-07 16:04:13 +00:00
bms	7925e63ddf	When fast-forwarding is enabled, do not forward directed IPv4 broadcasts to locally attached broadcast networks. Note well: This relies on the layer 2 route cloning behaviour in BSD. PR: 98799 Tested by: Dmitry Sergienko MFC after: 1 week	2007-02-05 00:15:40 +00:00
alc	d7099516c9	Include opt_ipdivert.h so that the message announcing ipfw correctly describes the state of IPDIVERT.	2007-02-03 22:11:53 +00:00
bms	929d8d99d7	In fast forwarding path, defer processing of 169.254.0.0/16 to ip_input(). See RFC 3927 section 2.7.	2007-02-03 06:46:48 +00:00
bms	b6b883252e	In regular forwarding path, reject packets destined for 169.254.0.0/16 link-local addresses. See RFC 3927 section 2.7.	2007-02-03 06:45:51 +00:00
bms	4341e5c6f6	Comply with RFC 3927, by forcing ARP replies which contain a source address within the link-local IPv4 prefix 169.254.0.0/16, to be broadcast at link layer. Reviewed by: fenner MFC after: 2 weeks	2007-02-02 20:31:44 +00:00
bms	e9ca37568e	Expose smoothed RTT and RTT variance measurements to userland via socket option TCP_INFO. Note that the units used in the original Linux API are in microseconds, so use a 64-bit mantissa to convert FreeBSD's internal measurements from struct tcpcb from ticks.	2007-02-02 18:34:18 +00:00
glebius	325d4d7fda	Since rev. 1.94 of netinet/in.c, the netinet layer frees all its multicast memberships, when interface is detached. Thus, when an underlying interface is detached, we do not need to free our multicast memberships. Reviewed by: bms	2007-02-02 09:39:09 +00:00
andre	25c4be862e	Auto sizing TCP socket buffers. Normally the socket buffers are static (either derived from global defaults or set with setsockopt) and do not adapt to real network conditions. Two things happen: a) your socket buffers are too small and you can't reach the full potential of the network between both hosts; b) your socket buffers are too big and you waste a lot of kernel memory for data just sitting around. With automatic TCP send and receive socket buffers we can start with a small buffer and quickly grow it in parallel with the TCP congestion window to match real network conditions. FreeBSD has a default 32K send socket buffer. This supports a maximal transfer rate of only slightly more than 2Mbit/s on a 100ms RTT trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send buffer auto scaling and the default values below it supports 20Mbit/s at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or 1000%. For the receive side it looks slightly better with a default of 64K buffer size. New sysctls are: net.inet.tcp.sendbuf_auto=1 (enabled) net.inet.tcp.sendbuf_inc=8192 (8K, step size) net.inet.tcp.sendbuf_max=262144 (256K, growth limit) net.inet.tcp.recvbuf_auto=1 (enabled) net.inet.tcp.recvbuf_inc=16384 (16K, step size) net.inet.tcp.recvbuf_max=262144 (256K, growth limit) Tested by: many (on HEAD and RELENG_6) Approved by: re MFC after: 1 month	2007-02-01 18:32:13 +00:00
andre	347b7a6a1e	Change the way the advertized TCP window scaling is computed. Instead of upper-bounding it to the size of the initial socket buffer lower-bound it to the smallest MSS we accept. Ideally we'd use the actual MSS information here but it is not available yet. For socket buffer auto sizing to be effective we need room to grow the receive window. The window scale shift is determined at connection setup and can't be changed afterwards. The previous, original, method effectively just did a power of two roundup of the socket buffer size at connection setup severely limiting the headroom for larger socket buffers. Tested by: many (as part of the socket buffer auto sizing patch) MFC after: 1 month	2007-02-01 17:39:18 +00:00
bms	cad0bb8b16	Import macros IN_LINKLOCAL(), IN_PRIVATE(), IN_LOCAL_GROUP(), IN_ANY_LOCAL(). This is not a functional change. IN_LINKLOCAL() tests if an address falls within the IPv4 link-local prefix. IN_PRIVATE() tests if an address falls within an RFC 1918 private prefix. IN_LOCAL_GROUP() tests if an address falls within the statically assigned link-local multicast scope specified in RFC 2365. IN_ANY_LOCAL() tests for either of IN_LINKLOCAL() or IN_LOCAL_GROUP(). As with the existing macros in the FreeBSD netinet stack, comparisons are performed in host-byte order. See also: RFC 1918, RFC 2365, RFC 3927 Obtained from: NetBSD (dyoung@) MFC after: 2 weeks	2007-01-31 14:34:47 +00:00
glebius	34079a8c02	Make it possible that carpdetach() unlocks on return. Then, in carp_clone_destroy() we are on a safe side, we don't need to unlock the cif, that can me already non-existent at this point. Reported by: Anton Yuzhaninov <citrin rambler-co.ru>	2007-01-25 18:03:40 +00:00
glebius	b7c8a97d9b	Spacing.	2007-01-25 17:58:16 +00:00
rrs	1b181171ae	- most all includes (#include <>) migrate to the sctp_os_bsd.h file - Finally all splxx() are removed - Count error fixed in mapping array which might cause a wrong cumack generation. - Invariants around panic for case D + printf when no invariants. - one-to-one model race condition fixed by using a pre-formed connection and then completing the work so accept won't happen on a non-formed association. - Some additional paranoia checks in sctp_output. - Locks that were missing in the accept code. Approved by: gnn	2007-01-18 09:58:43 +00:00
rrs	094d70fac7	- Macroizes the V6ONLY flag check. - Added a short time wait (not used yet) constant - Corrected the type of the crc32c table (it was unsigned long and really is a uint32_t - Got rid of the user of MHeaders until they are truely needed by lower layers. - Fixed an initialization problem in the readq structure (ordering was off). - Found yet another collision bug when the random number generator returns two numbers on one side (during a collision) that are the same. Also added some tracking of cookies that will go away when we know that we have the last collision bug gone. - Fixed an init bug for book_size_scale, that was causing Early FR code to run when it should not. - Fixed a flight size tracking bug that was associated with Early FR but due to above bug also effected all FR's - Fixed it so Max Burst also will apply to Fast Retransmit. - Fixed a bug in the temporary logging code that allowed a static log array overflow - hashinit_flags is now used. - Two last mcopym's were converted to the macro sctp_m_copym that has always been used by all other places - macro sctp_m_copym was converted to upper case. - We now validate sinfo_flags on input (we did not before). - Fixed a bug that prevented a user from sending data and immediately shuting down with one send operation. - Moved to use hashdestroy instead of free() in our macros. - Fixed an init problem in our timed_wait vtag where we did not fully initialize our time-wait blocks. - Timer stops were re-positioned. - A pcb cleanup method was added, however this probably will not be used in BSD.. unless we make module loadable protocols - I think this fixes the mysterious timer bug.. it was a ordering of locks problem in the way we did timers. It now conforms to the timeout(9) manual (except for the _drain part, we had to do this a different way due to locks). - Fixed error return code so we get either CONNREUSED or CONNRESET depending on where one is in progression - Purged an unused clone macro. - Fixed a read erro code issue where we were NOT getting the proper error when the connection was reset. - Purged an unused clone macro. - Fixed a read erro code issue where we were NOT getting the proper error when the connection was reset. Approved by: gnn	2007-01-15 15:12:10 +00:00
maxim	3251ecf31a	o Increment requests counter right before send out an ARP query actually. Otherwise the code could lead to the spurious EHOSTDOWN errors. PR: kern/107807 Submitted by: Dmitrij Tejblum MFC after: 1 month	2007-01-14 18:44:17 +00:00
imp	3610b784ca	Marking this as __packed was needed to get the alignment and offset of members right. However, it also said it was aligned(1), which meant that gcc generated really bad code. Mark this as aligned(4). This makes things a little faster on arm (a couple percent), but also saves about 30k on the size of the kernel for arm. I talked about doing this with bde, but didn't check with him before the commit, so I'm hesitant say 'reviewed by: bde'.	2007-01-12 07:23:31 +00:00
julian	0d4210347a	Remove two lines that somehow snuck back in after testing. ip is now an argument to the function ipfw_log()	2007-01-09 21:03:07 +00:00
maxim	4a9a2f072d	o One more typo in the comment. PR: kern/107609 Submitted by: Dr. Markus Waldeck	2007-01-06 13:12:24 +00:00
piso	e5ae3d604d	Prevent adding a rule with a nat action in case IPFIREWALL_NAT was not defined. Reviewed: luigi	2007-01-05 12:15:31 +00:00
piso	2ccef57014	Wrap ipfw nat support in a new kernel config option named "IPFIREWALL_NAT": this way nat is turned off by default and POLA is preserved. Reviewed by: rwatson	2007-01-03 11:12:54 +00:00
julian	fa85787a84	Remove a bunch of dependencies in the IP header being the first thing in the mbuf. First moves toward being able to cope better with having layer 2 (or other encapsulation data) before the IP header in the packet being examined. More commits to come to round out this functionality. This commit should have no practical effect but clears the way for what is coming. Revirewed by: luigi, yar MFC After: 2 weeks	2007-01-02 19:57:31 +00:00
imp	51b8a53652	Fix typo in comment. Submitted by: remko	2007-01-01 00:35:34 +00:00
imp	426d73c762	Add comment about udp checksums being off in BSD 4.2 compatibility mode. Submitted by: Dr. Markus Waldeck PR: kern/106657	2006-12-31 21:34:53 +00:00
jhb	4d2a36e295	Whitespace fix and remove an extra cast.	2006-12-30 17:53:28 +00:00
piso	0db606a3b1	Summer of Code 2005: improve libalias - part 2 of 2 With the second (and last) part of my previous Summer of Code work, we get: -ipfw's in kernel nat -redirect_* and LSNAT support General information about nat syntax and some examples are available in the ipfw (8) man page. The redirect and LSNAT syntax are identical to natd, so please refer to natd (8) man page. To enable in kernel nat in rc.conf, two options were added: o firewall_nat_enable: equivalent to natd_enable o firewall_nat_interface: equivalent to natd_interface Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet to continue being checked by the firewall ruleset after being (de)aliased. NOTA BENE: due to some problems with libalias architecture, in kernel nat won't work with TSO enabled nic, thus you have to disable TSO via ifconfig (ifconfig foo0 -tso). Approved by: glebius (mentor)	2006-12-29 21:59:17 +00:00
rrs	d392a291a2	a) macro-ization of all mbuf and random number access plus timers. This makes the code more portable and able to change out the mbuf or timer system used more easily ;-) b) removal of all use of pkt-hdr's until only the places we need them (before ip_output routines). c) remove a bunch of code not needed due to <b> aka worrying about pkthdr's :-) d) There was one last reorder problem it looks where if a restart occur's and we release and relock (at the point where we setup our alias vtag) we would end up possibly getting the wrong TSN in place. The code that fixed the TSN's just needed to be shifted around BEFORE the release of the lock.. also code that set the state (since this also could contribute). Approved by: gnn	2006-12-29 20:21:42 +00:00
jhb	9adb288460	Some whitespace nits and remove a few casts.	2006-12-29 14:58:18 +00:00
piso	9ddea13291	o made in kernel libalias mpsafe o fixed a comment o made in kernel libalias a bit less verbose (disabled automatic logging everytime a new link is added or deleted) Approved by: glebius (mentor)	2006-12-15 12:50:06 +00:00
rrs	3de80805ff	1) Fixes on a number of different collision case LOR's. 2) Fix all "magic numbers" to be constants. 3) A collision case that would generate two associations to the same peer due to a missing lock is fixed. 4) Added tracking of where timers are stopped. Approved by: gnn	2006-12-14 17:02:55 +00:00
csjp	7aaca1dfe1	Fix LOR between the syncache and inpcb locks when MAC is present in the kernel. This LOR snuck in with some of the recent syncache changes. To fix this, the inpcb handling was changed: - Hang a MAC label off the syncache object - When the syncache entry is initially created, we pickup the PCB lock is held because we extract information from it while initializing the syncache entry. While we do this, copy the MAC label associated with the PCB and use it for the syncache entry. - When the packet is transmitted, copy the label from the syncache entry to the mbuf so it can be processed by security policies which analyze mbuf labels. This change required that the MAC framework be extended to support the label copy operations from the PCB to the syncache entry, and then from the syncache entry to the mbuf. These functions really should be referencing the syncache structure instead of the label. However, due to some of the complexities associated with exposing this syncache structure we operate directly on it's label pointer. This should be OK since we aren't making any access control decisions within this code directly, we are merely allocating and copying label storage so we can properly initialize mbuf labels for any packets the syncache code might create. This also has a nice side effect of caching. Prior to this change, the PCB would be looked up/locked for each packet transmitted. Now the label is cached at the time the syncache entry is initialized. Submitted by: andre [1] Discussed with: rwatson [1] andre submitted the tcp_syncache.c changes	2006-12-13 06:00:57 +00:00
bz	d1e7a81aa4	In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument. This is the "+ one more change" missed in the original commit. Noticed by: tinderbox Pointy hat to: me (#1)	2006-12-12 17:44:46 +00:00
bz	297206ec2a	MFp4: 92972, 98913 + one more change In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument.	2006-12-12 12:17:58 +00:00
bms	1589529aca	Back out revision 1.264. Fixing the IP accounting issue, if we plan to do so, needs to be better thought out; the 'fix' introduces a hash lookup and a possible kernel panic. Reported by: Mark Tinguely	2006-12-10 13:44:00 +00:00
rwatson	9e0e43febb	Improve style(9) conformance of igmp.c.	2006-12-04 00:41:48 +00:00
imp	575f8fc940	Make sure that carp_header is 36 bytes long	2006-12-01 18:37:41 +00:00
piso	8e272a902a	Make libalias.conf parsing a bit smarter. This closes PR kern/106112. While here, add mbuf's #includes i forgot in the previous commit. Approved by: gleb	2006-12-01 16:34:53 +00:00
piso	e2023fe5e9	Remove m_megapullup from ng_nat and put it under libalias. Approved by: gleb	2006-12-01 16:27:11 +00:00
rwatson	12d8083335	Consistently use #ifdef INET6 rather than mixing and matching with #if defined(INET6). Don't comment the end of short #ifdef blocks. Comment cleanup. Line wrap.	2006-11-30 10:54:54 +00:00
sam	c380f600ad	Change error codes returned by protocol operations when an inpcb is marked INP_DROPPED or INP_TIMEWAIT: o return ECONNRESET instead of EINVAL for close, disconnect, shutdown, rcvd, rcvoob, and send operations o return ECONNABORTED instead of EINVAL for accept These changes should reduce confusion in applications since EINVAL is normally interpreted to mean an invalid file descriptor. This change does not conflict with POSIX or other standards I checked. The return of EINVAL has always been possible but rare; it's become more common with recent changes to the socket/inpcb handling and with finer-grained locking and preemption. Note: there are other instances of EINVAL for this state that were left unchanged; they should be reviewed. Reviewed by: rwatson, andre, ru MFC after: 1 month	2006-11-22 17:16:54 +00:00
bz	63dab0caf6	Add SCTP as a known upper layer protocol over v6. We are not yet aware of the protocol internals but this way SCTP traffic over v6 will not be discarded. Reported by: Peter Lei via rrs Tested by: Peter Lei <peterlei cisco.com>	2006-11-13 19:07:32 +00:00
rrs	fb5651e047	In a true restart case, the send_lock was not being aquired. This meant that when we cleanup the outbound we may have one in transit to be added with the old sequence number. This is bad since then we loose a message :( Also the report_outbound needed to have the right lock when its called which it did not.. I added the lock with of course a flag since we want to have the lock before we call it in the restart case. This also fixed the FIX ME case where, in the cookie collision case, we mark for retransmit any that were bundled with the cookie that was dropped. This also means changes to the output routine so we can assure getting the COOKIE-ACK sent BEFORE we retransmit the Data. Approved by: gnn	2006-11-11 22:44:12 +00:00
rrs	e0c50feae8	Turns out we would reset the TSN seq counter during a colliding INIT. This if fine except when we have data outstanding... we basically reset it to the previous value it was.. so then we end up assigning the same TSN to two different data chunks. This patch: 1) Finds a missing lock for when we change the stream numbers during COOKIE and INIT-ACK processing.. we were NOT locking the send_buffer.. which COULD cause problems (found by inspection looking for <2>) 2) Fixes a case during a colliding INIT where we incorrectly reset the sending Sequence thus in some cases duplicately assigning a TSN. 3) Additional enhancments to logging so we can see strm/tsn in the receiver AND new tracking to watch what the sender is doing with TSN and STRM seq's. Approved by: gnn	2006-11-11 15:59:01 +00:00
rrs	463be3f37d	This patch fixes a LOR that happens during INIT-ACK collision. We were calling select_a_tag() inside sctp_send_initate_ack(). During collision cases we have a stcb and thus a SCTP_LOCK. When we call select_a_tag it (below it) locks the INFO lock. We now 1) pre-select the nonce-tie-tags in sctputil.c during setup of a tcb. 2) In the other case where we have to select tags, we unlock after incr the ref cnt (so assoc won't go away0 and then do the tag selection followed by a relock and decr the refcnt. Approved by: gnn	2006-11-10 13:34:55 +00:00
rrs	877f5726fa	Fixes an issue with handling of stream reset. When a reset comes in we need to calculate the length and therefore the number of listed streams (if any) based on the TLV type. Otherwise if we get a retran we could in theory panic by sending a notification to a user with a incorrect list and thus no memory listing the streams. Found in IOS by devtest :-) Approved by: gnn	2006-11-09 21:01:07 +00:00
rrs	1bedc49b68	-Fixes first of all the getcred on IPv6 and V4. The copy's were incorrect and so was the locking. -A bug was also found that would create a race and panic when an abort arrived on a socket being read from. -Also fix the reader to get MSG_TRUNC when a partial delivery is aborted. -Also addresses a couple of coverity caught error path memory leaks and a couple of other valid complaints Approved by: gnn	2006-11-08 00:21:13 +00:00
marcus	42daf70603	Fix TFTP NAT support by making sure the appropriate fingerprinting checks are done. Reviewed by: piso	2006-11-07 21:06:48 +00:00
rwatson	572da55a43	Convert three new suser(9) calls introduced between when the priv(9) patch was prepared and committed to priv(9) calls. Add XXX comments as, in each case, the semantics appear to differ from the TCP/UDP versions of the calls with respect to jail, and because cr_canseecred() is not used to validate the query. Obtained from: TrustedBSD Project	2006-11-06 14:54:06 +00:00
rrs	9da66947c4	This changes tracks down the EEOR->NonEEOR mode failure to wakeup on close of the sender. It basically moves the return (when the asoc has a reader/writer) further down and gets the wakeup and assoc appending (of the PD-API event) moved up before the return. It also moves the flag set right before the return so we can assure only once adding the PD-API events. Approved by: gnn	2006-11-06 14:34:21 +00:00
rwatson	10d0d9cf47	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
ru	cd38181372	Revert previous commit, and instead make the expression in rev. 1.2 match the style of this file. OK'ed by: rrs	2006-11-05 14:36:59 +00:00
rrs	20dc61d3a4	Tons of fixes to get all the 64bit issues removed. This also moves two 16 bit int's to become 32 bit values so we do not have to use atomic_add_16. Most of the changes are %p, casts and other various nasty's that were in the orignal code base. With this commit my machine will now do a build universe.. however I as yet have not tested on a 64bit machine .. it may not work :-(	2006-11-05 13:25:18 +00:00
ru	c5d5272202	Fix pointer arithmetic to be 64-bit friendly.	2006-11-04 08:45:50 +00:00
ru	20b5c61adb	Remove bogus casts that Randall for some reason didn't borrow from my supplied patch.	2006-11-04 08:19:01 +00:00
jb	eea3251554	Remove a bogus cast in an attempt to fix the tinderbox builds on lots of arches.	2006-11-04 05:39:39 +00:00
rrs	9fe01d5c9e	More 64 bit pointer fun. %p changed in multiple prints the mtod() was also fixed.	2006-11-03 23:04:34 +00:00
rrs	7e7f92316b	Fix two of the 64bit errors on the printfs.	2006-11-03 21:19:54 +00:00
rrs	cb4eb8bad1	Somehow I missed this one. The sys/cdef.h was out of order with respect to the FSBID..	2006-11-03 19:48:56 +00:00
rrs	d203a1d908	Opps... in my fix up of all the $FreeBSD:$-> $FreeBSD$ I inserted a few to the new files.. but I falied to add the #include <sys/cdef.h> Which causes a compile error.. sorry about that... got it now :-) Approved by:gnn	2006-11-03 17:21:53 +00:00
rrs	3d3e3f2242	Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but it is also the works of Peter Lei and Michael Tuexen. They both are my two key other developers working on the project.. and they need ata-boy's too: ** peterlei@cisco.com tuexen@fh-muenster.de ** I did do a make sysent which updated the syscall's and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0 So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP. I will see about comitting some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into lib's.. I will talk to George about this :-) There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a MAC but thats another beast too.. If you have a mac and want to use SCTP contact Michael he maintains a web site with a loadable module with this code :-) Reviewed by: gnn Approved by: gnn	2006-11-03 15:23:16 +00:00
oleg	e33bbcbd95	- Use non-recursive mutex. MTX_RECURSE is unnecessary since rev. 1.70 - Pay respect to net.isr.direct: use netisr_dispatch() instead of ip_input() Reviewed by: glebius, rwatson - purge_flow_set(): - Do not leak memory while purging queues which are not bound to pipe. - style(9) cleanup MFC after: 2 months	2006-10-29 12:09:24 +00:00
oleg	849676e291	- Convert net.inet.ip.dummynet.curr_time net.inet.ip.dummynet.searches net.inet.ip.dummynet.search_steps to SYSCTL_LONG nodes. It will prevent frequent wrap around on 64bit archs. - Implement simple mechanics for dummynet(4) internal time correction. Under certain circumstances (system high load, dummynet lock contention, etc) dummynet's tick counter can be significantly slower than it should be. (I've observed up to 25% difference on one of my production servers). Since this counter used for packet scheduling, it's accuracy is vital for precise bandwidth limitation. Introduce new sysctl nodes: net.inet.ip.dummynet. tick_lost - number of ticks coalesced by taskqueue thread. tick_adjustment - number of time corrections done. tick_diff - adjusted vs non-adjusted tick counter difference tick_delta - last vs 'standard' tick differnece (usec). tick_delta_sum - accumulated (and not corrected yet) time difference (usec). Reviewed by: glebius MFC after: 2 month	2006-10-27 13:05:37 +00:00
oleg	e08889579d	Use separate thread for servicing dummynet(4). Utilize taskqueue(9) API. Submitted by: glebius MFC after: 2 month	2006-10-27 11:16:58 +00:00
oleg	b588f35e74	style(9) cleanup. MFC after: 2 month	2006-10-27 10:52:32 +00:00
rwatson	7beaaf5cd2	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
julian	8f92fe04c3	revert last change.. premature.. need to wait until if_ethersubr.c uses pfil to get to ipfw.	2006-10-21 00:16:31 +00:00
julian	9a86c1b09e	Move some variables to a more likely place and remove "temporary" stuff that is not needed any more.	2006-10-20 19:32:08 +00:00
maxim	3194151ff3	o Do not do args->f_id.addr_type == 6 when there is IS_IP6_FLOW_ID() exactly for that.	2006-10-11 12:14:28 +00:00
maxim	69b55f4fa1	o Kill a nit in the comment.	2006-10-11 12:00:53 +00:00
maxim	dd217a5254	o Extend not very informative ipfw(4) message 'drop session, too many entries' by src:port and dst:port pairs. IPv6 part is non-functional as ``limit'' does not support IPv6 flows. PR: kern/103967 Submitted by: based on Bruce Campbell patch MFC after: 1 month	2006-10-11 11:52:34 +00:00
ru	da24a0b645	Merge the rest of my changes.	2006-10-11 07:11:56 +00:00
piso	862e1ed026	Various mdoc and grammar fixes. Approved by: glebius Reviewed by: glebius, ru	2006-10-08 13:53:45 +00:00
bz	af0ae0b158	Set scope on MC address so IPv6 carp advertisement will not get dropped in ip6_output. In case this fails handle the error directly and log it[1]. In addition permit CARP over v6 in ip_fw2. PR: kern/98622 Similar patch by: suz Discussed with: glebius [1] Tested by: Paul.Dekkers surfnet.nl, Philippe.Pegon crc.u-strasbg.fr MFC after: 3 days	2006-10-07 10:19:58 +00:00
glebius	c0d4b0e19f	Save space on stack moving token ring stuff to its own hack block.	2006-10-04 11:08:14 +00:00
glebius	ce5c7dcc2b	Style rev. 1.152.	2006-10-04 10:59:21 +00:00
andre	a5d21a05c1	Remove stone-aged and irrelevant "#ifndef notdef".	2006-09-29 16:44:45 +00:00
bms	b7f17de1eb	Nits. Submitted by: ru	2006-09-29 16:16:41 +00:00
bms	686e54733a	Push removal of mrouted down to the rest of the tree.	2006-09-29 15:45:11 +00:00
maxim	9952e77633	o Convert w/spaces to tabs in the previous commit.	2006-09-29 06:46:31 +00:00
silby	f7318c4798	Rather than autoscaling the number of TIME_WAIT sockets to maxsockets / 5, scale it to min(ephemeral port range / 2, maxsockets / 5) so that people with large gobs of memory and/or large maxsockets settings will not exhaust their entire ephemeral port range with sockets in the TIME_WAIT state during periods of heavy load. Those who wish to tweak the size of the TIME_WAIT zone can still do so with net.inet.tcp.maxtcptw. Reviewed by: glebius, ru	2006-09-29 06:24:26 +00:00
andre	5eae8ee0ac	When tcp_output() receives an error upon sending a packet it reverts parts of its internal state to ignore the failed send and try again a bit later. If the error is EPERM the packet got blocked by the local firewall and the revert may cause the session to get stuck and retry indefinitely. This way we treat it like a packet loss and let the retransmit timer and timeouts do their work over time. The correct behavior is to drop a connection that gets an EPERM error. However this _may_ introduce some POLA problems and a two commit approach was chosen. Discussed with: glebius PR: kern/25986 PR: kern/102653	2006-09-28 18:02:46 +00:00
andre	3119a75f5a	When doing TSO correctly do the check to prevent a maximum sized IP packet from overflowing.	2006-09-28 13:59:26 +00:00
bms	03e16bd83a	Fix the IPv4 multicast routing detach path. On interface detach whilst the MROUTER is running, the system would panic as described in the PR. The fix in the PR is a good start, however, the other state associated with the multicast forwarding cache has to be freed in order to avoid leaking memory and other possible panics. More care and attention is needed in this area. PR: kern/82882 MFC after: 1 week	2006-09-28 12:21:08 +00:00
bms	b8c88cc531	The IPv4 code should clean up multicast group state when an interface goes away. Without this change, it leaks in_multi (and often ether_multi state) if many clonable interfaces are created and destroyed in quick succession. The concept of this fix is borrowed from KAME. Detailed information about this behaviour, as well as test cases, are available in the PR. PR: kern/78227 MFC after: 1 week	2006-09-28 10:04:07 +00:00
piso	f925eb73aa	Compilation.	2006-09-27 02:08:44 +00:00
piso	5582e56d9d	Summer of Code 2005: improve libalias - part 1 of 2 With the first part of my previous Summer of Code work, we get: -made libalias modular: -support for 'particular' protocols (like ftp/irc/etcetc) is no more hardcoded inside libalias, but it's available through external modules loadable at runtime -modules are available both in kernel (/boot/kernel/alias_.ko) and user land (/lib/libalias_) -protocols/applications modularized are: cuseeme, ftp, irc, nbt, pptp, skinny and smedia -added logging support for kernel side -cleanup After a buildworld, do a 'mergemaster -i' to install the file libalias.conf in /etc or manually copy it. During startup (and after every HUP signal) user land applications running the new libalias will try to read a file in /etc called libalias.conf: that file contains the list of modules to load. User land applications affected by this commit are ppp and natd: if libalias.conf is present in /etc you won't notice any difference. The only kernel land bit affected by this commit is ng_nat: if you are using ng_nat, and it doesn't correctly handle ftp/irc/etcetc sessions anymore, remember to kldload the correspondent module (i.e. kldload alias_ftp). General information and details about the inner working are available in the libalias man page under the section 'MODULAR ARCHITECTURE (AND ipfw(4) SUPPORT)'. NOTA BENE: this commit affects _ONLY_ libalias, ipfw in-kernel nat support will be part of the next libalias-related commit. Approved by: glebius Reviewed by: glebius, ru	2006-09-26 23:26:53 +00:00
jmg	9a6d92f6df	fix calculating to_tsecr... This prevents the rtt calculations from going all wonky...	2006-09-26 01:21:46 +00:00
bms	e5ef61bea6	Fix an incompatibility between CARP and IPv4 multicast routing, whereby the VRRPv2 advertisements will originate from the wrong source address. This only affects kernels compiled with MROUTING and after the MRT_INIT ioctl() has been issued. Set imo_multicast_vif in carp's softc to the invalid value -1 after it is zeroed by softc allocation, to stop the ip_output() path looking up the incorrect source address thinking a vif is set. PR: kern/100532 Submitted by: Bohus Plucinsky MFC after: 1 week	2006-09-25 11:53:54 +00:00
bms	d9989d37d6	Spleling Submitted by: pjd	2006-09-25 11:48:07 +00:00
bms	27d7e8be08	Account for output IP datagrams on the ifaddr where they originated from, not the first ifaddr on the ifp. This is similar to what NetBSD does. PR: kern/72936 Submitted by: alfred Reviewed by: andre	2006-09-25 10:11:16 +00:00
jmg	7766e0a26b	if min is greater than max, prefer max over min... I managed to get a retransmit timer that was going to take 19 days to trigger... Reviewed by: silby	2006-09-25 07:22:39 +00:00
jmg	be23d3d5d0	now that we don't automagicly increase the MTU of host routes, when we copy the loopback interface, copy it's mtu also.. This means that we again have large mtu support for local ip addresses...	2006-09-23 19:24:10 +00:00
bms	f6e5dafda9	Always set the IP version in the TCP input path, to preserve the header field for possible later IPSEC SPD lookup, even when the kernel is built without 'options INET6'. PR: kern/57760 MFC after: 1 week Submitted by: Joachim Schueth	2006-09-23 16:26:31 +00:00
andre	34f4a99b52	Make tcp_usr_send() free the passed mbufs on error in all cases as the comment to it claims. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-17 13:39:35 +00:00
jhay	7748ccf222	Handle a list of IPv6 src and dst addresses correctly, eg. ipfw add allow ip6 from any to 2000::/16,2002::/16 PR: 102422 (part 3) Submitted by: Andrey V. Elsukov <bu7cher at yandex dot ru> MFC after: 5 days	2006-09-16 10:27:05 +00:00
andre	e168227ed2	When doing TSO subtract hdrlen from TCP_MAXWIN to prevent ip->ip_len from wrapping when we generate a maximally sized packet for later segmentation. Noticed by: gallatin Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-15 16:08:09 +00:00
ache	ce4ed61a69	Add missing #ifdef INET6 (can't be compiled)	2006-09-14 10:22:35 +00:00
andre	fc2dc7ce4e	Remove unessary includes and follow common ordering style.	2006-09-13 13:21:17 +00:00
andre	b859d7a1c9	Rewrite of TCP syncookies to remove locking requirements and to enhance functionality: - Remove a rwlock aquisition/release per generated syncookie. Locking is now integrated with the bucket row locking of syncache itself and syncookies no longer add any additional lock overhead. - Syncookie secrets are different for and stored per syncache buck row. Secrets expire after 16 seconds and are reseeded on-demand. - The computational overhead for syncookie generation and verification is one MD5 hash computation as before. - Syncache can be turned off and run with syncookies only by setting the sysctl net.inet.tcp.syncookies_only=1. This implementation extends the orginal idea and first implementation of FreeBSD by using not only the initial sequence number field to store information but also the timestamp field if present. This way we can keep track of the entire state we need to know to recreate the session in its original form. Almost all TCP speakers implement RFC1323 timestamps these days. For those that do not we still have to live with the known shortcomings of the ISN only SYN cookies. The use of the timestamp field causes the timestamps to be randomized if syncookies are enabled. The idea of SYN cookies is to encode and include all necessary information about the connection setup state within the SYN-ACK we send back and thus to get along without keeping any local state until the ACK to the SYN-ACK arrives (if ever). Everything we need to know should be available from the information we encoded in the SYN-ACK. A detailed description of the inner working of the syncookies mechanism is included in the comments in tcp_syncache.c. Reviewed by: silby (slightly earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-13 13:08:27 +00:00
csjp	63e89c05d2	Introduce a new entry point, mac_create_mbuf_from_firewall. This entry point exists to allow the mandatory access control policy to properly initialize mbufs generated by the firewall. An example where this might happen is keep alive packets, or ICMP error packets in response to other packets. This takes care of kernel panics associated with un-initialize mbuf labels when the firewall generates packets. [1] I modified this patch from it's original version, the initial patch introduced a number of entry points which were programmatically equivalent. So I introduced only one. Instead, we should leverage mac_create_mbuf_netlayer() which is used for similar situations, an example being icmp_error() This will minimize the impact associated with the MFC Submitted by: mlaier [1] MFC after: 1 week This is a RELENG_6 candidate	2006-09-12 04:25:13 +00:00
andre	c4015db86a	Fix a NULL pointer dereference of ro->ro_rt->rt_flags by checking for the validity of ro->ro_rt first. This prevents crashing on any non-normally routed IP packet. Coverity CID: 162 (incorrectly, it was re-introduced by previous commit)	2006-09-11 19:56:10 +00:00
jmg	6d7956e801	make use of the host route's mtu for processing. This means we can now support a network w/ split mtu's by assigning each host route the correct mtu. an aspiring programmer could write a daemon to probe hosts and find out if they support a larger mtu.	2006-09-10 17:49:09 +00:00
glebius	9dc28ac0b4	Add a sysctl net.inet.tcp.nolocaltimewait that allows to suppress creating a compress TIME WAIT states, if both connection endpoints are local. Default is off.	2006-09-08 13:09:15 +00:00
ru	0d416557b8	Back when we had T/TCP support, we used to apply different timeouts for TCP and T/TCP connections in the TIME_WAIT state, and we had two separate timed wait queues for them. Now that is has gone, the timeout is always 2*MSL again, and there is no reason to keep two queues (the first was unused anyway!). Also, reimplement the remaining queue using a TAILQ (it was technically impossible before, with two queues).	2006-09-07 13:06:00 +00:00
andre	c9b5882a6e	Second step of TSO (TCP segmentation offload) support in our network stack. TSO is only used if we are in a pure bulk sending state. The presence of TCP-MD5, SACK retransmits, SACK advertizements, IPSEC and IP options prevent using TSO. With TSO the TCP header is the same (except for the sequence number) for all generated packets. This makes it impossible to transmit any options which vary per generated segment or packet. The length of TSO bursts is limited to TCP_MAXWIN. The sysctl net.inet.tcp.tso globally controls the use of TSO and is enabled. TSO enabled sends originating from tcp_output() have the CSUM_TCP and CSUM_TSO flags set, m_pkthdr.csum_data filled with the header pseudo-checksum and m_pkthdr.tso_segsz set to the segment size (net payload size, not counting IP+TCP headers or TCP options). IPv6 currently lacks a pseudo-header checksum function and thus doesn't support TSO yet. Tested by: Jack Vogel <jfvogel-at-gmail.com> Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-07 12:53:01 +00:00
ru	8f497b5c52	Remove a microoptimization for i386 that was a micropessimization for amd64.	2006-09-07 09:49:08 +00:00
andre	cb05913fd2	First step of TSO (TCP segmentation offload) support in our network stack. o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6 o add CSUM_TSO flag to mbuf pkthdr csum_flags field o add tso_segsz field to mbuf pkthdr o enhance ip_output() packet length check to allow for large TSO packets o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities o adjust all callers of tcp_maxmtu[46]() accordingly Discussed on: -current, -net Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-06 21:51:59 +00:00
andre	b095262f18	Check inp_flags instead of inp_vflag for INP_ONESBCAST flag. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-06 19:04:36 +00:00
andre	f044a1949b	Fix the socket option IP_ONESBCAST by giving it its own case in ip_output() and skip over the normal IP processing. Add a supporting function ifa_ifwithbroadaddr() to verify and validate the supplied subnet broadcast address. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-06 17:12:10 +00:00
glebius	d907e70991	o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremely bad under high load. For example with 40k sockets and 25k tcptw entries, connect() syscall can run for seconds. Debugging showed that it iterates the cycle millions times and purges thousands of tcptw entries at a time. Besides practical unusability this change is architecturally wrong. First, in_pcblookup_local() is used in connect() and bind() syscalls. No stale entries purging shouldn't be done here. Second, it is a layering violation. o Return back the tcptw purging cycle to tcp_timer_2msl_tw(), that was removed in rev. 1.78 by rwatson. The commit log of this revision tells nothing about the reason cycle was removed. Now we need this cycle, since major cleaner of stale tcptw structures is removed. o Disable probably necessary, but now unused tcp_twrecycleable() function. Reviewed by: ru	2006-09-06 13:56:35 +00:00
glebius	f0b3d43b52	Finally fix rev. 1.256 Pointy hat to: glebius	2006-09-05 14:00:59 +00:00
glebius	697f63c7e3	Remove extra parenthesis in last commit. Nitpicked by: ru	2006-09-05 12:22:54 +00:00
glebius	70efc78561	- Make net.inet.tcp.maxtcptw modifiable at run time. - If net.inet.tcp.maxtcptw was ever set explicitly, do not change it if kern.ipc.maxsockets is changed.	2006-09-05 12:08:47 +00:00
thomas	b6505e0884	Fix typo in comment.	2006-09-04 08:32:17 +00:00
jhay	a1d5a9ece3	Recognise IPv6 PIM packets. MFC after: 1 week	2006-08-31 16:56:45 +00:00
mohans	a1c6033ca7	Fix for a bug that causes the computation of "len" in tcp_output() to get messed up, resulting in an inconsistency between the TCP state and so_snd.	2006-08-26 17:53:19 +00:00

... 3 4 5 6 7 ...

3020 Commits