freebsd-skq

Author	SHA1	Message	Date
Robert Watson	f2565d68a4	Move universally to ANSI C function declarations, with relatively consistent style(9)-ish layout.	2007-05-10 15:58:48 +00:00
Randall Stewart	ad81507eed	Two major items here: - All printf that was surrounded by #ifdef SCTP_DEBUG moves to a macro that does all of this. This removes all printfs from the code and makes the code more portable and easier to read. - Static Analysis (cisco) - found a few bugs, but mostly we add checks for NULL pointers and such to make the tool happy. We now pass the Cisco SA tools checks except for where it does not understand tailq/lists. We still need to look at the coverity tools output too (this is like the cisco SA tool) and see if it wants us to fix any other items. Hopefully this will be the last major churn in the code other than bug fixes.	2007-05-09 13:30:06 +00:00
Maxim Konovalov	d30d90dc80	o Fix style(9) bugs introduced in the last commit. Pointed out by: bde	2007-05-09 11:39:46 +00:00
Maxim Konovalov	10fe523e99	o Unbreak "options TCPDEBUG" && "nooptions INET6" kernel build. PR: kern/112517 Submitted by: vd	2007-05-09 06:09:40 +00:00
Randall Stewart	b100636770	- Copyright change, cisco's silly tool wants it to say: "Copyright (c) 2001-2007, by Cisco Systems," instead of "Copyright (c) 2001-2007, Cisco Systems," - Also fix a few stragglers that were still in 2006.	2007-05-08 17:01:12 +00:00
Randall Stewart	b0552ae214	- Get rid of the sctp_inpcb_free() "magic numbers", now they are sensible defines that tell what you are directing the function to do.	2007-05-08 15:53:03 +00:00
Randall Stewart	6e55db5445	- Static analysis fixes for cisco's commit (this is equivalent to the coverity tool.. may even be the same one.. not sure). - A bug in the way sctp_abort() and friends were setting the IP_CLOSE flag.. and NOT passing the last argument as a (,1)... so that things would get freed..	2007-05-08 14:32:53 +00:00
Randall Stewart	17205ecc85	- More macros for OS compatibility - PR-SCTP would ignore FWD-TSN's above a rwnd's worth of TSN's (1 byte msgs).. this left the peer hopelessly out of sync.. or an attacker. So now we abort the assoc. - New IFN hash, also rename hashes to match addr/ifn now that the vrf has multiple. - Do not enable SCTP_PCB_FLAGS_RECVDATAIOEVNT per default as defined in the Socket API ID. - Export MTU information via sysctl. - Vrf's need table id's. This is default for BSD, but may be other things later when BSD fully supports VRFs. - Additional stream reset bug (caught by cisco dev-test). - Additional validations for the address in sending a message (socket api). -------- and ----- - Fix association notifications not to give the active open side false notifications. - Fix so sendfile and SENDALL will work properly (missing flag to say socket sender is done). - Fix Bug that prevented COOKIES from being retransmitted. - Break out connectx into helper sub-models so that iox routines can reuse the helpers. - When an address is added during system init (non-dynamic mode) make sure that the "defer use" flag is not set. It's compiling on XR now :-D Reviewed by: gnn	2007-05-08 00:21:05 +00:00
Robert Watson	9df79d84c1	Rather than selectively zeroing fields in the tcp_debug structure throughout tcp_trace(), zero the entire structure up front. Minor style fixes.	2007-05-07 14:05:23 +00:00
Robert Watson	6db851a281	Since udp_peeraddr() and udp_sockaddr() directly wrap in_setpeeraddr() and in_setsockaddr(), containing only stale comments on why they exist, remove them and initialize the protosw for UDP to directly reference in_setpeeraddr() and in_setsockaddr().	2007-05-07 13:51:24 +00:00
Robert Watson	af1ee11d54	Minor style tweaks.	2007-05-07 13:47:39 +00:00
Robert Watson	434a0d24dd	When setting up timewait state for a TCP connection, don't hold the socket lock over a crhold() of so_cred: so_cred is constant after socket creation, so doesn't require locking to read.	2007-05-07 13:04:25 +00:00
Andre Oppermann	1a5537409f	Remove unused requested_s_scale from struct tcpcb.	2007-05-06 16:04:36 +00:00
Andre Oppermann	3529149e9a	Use existing TF_SACK_PERMIT flag in struct tcpcb t_flags field instead of a dedicated sack_enable int for this bool. Change all users accordingly.	2007-05-06 15:56:31 +00:00
Andre Oppermann	0ca3f933eb	o Remove redundant tcp reassembly check in header prediction code o Rearrange code to make intent in TCPS_SYN_SENT case more clear o Assorted style cleanup o Comment clarification for tcp_dropwithreset()	2007-05-06 15:41:06 +00:00
Andre Oppermann	c5ad39b910	Reorder the TCP header prediction test to check for the most volatile values first to spend less time on a fallback to normal processing.	2007-05-06 15:23:51 +00:00
Andre Oppermann	679d9708b6	Remove the defunct remains of the TCPS_TIME_WAIT cases from tcp_do_segment and change it to a void function. We use a compressed structure for TCPS_TIME_WAIT to save memory. Any late late segments arriving for such a connection is handled directly in the TW code.	2007-05-06 15:16:05 +00:00
Andre Oppermann	37ba9d112a	Fix two comments.	2007-05-06 13:38:25 +00:00
Randall Stewart	6114cd961a	Two bugs: - Locks were not being unlocked when an invalid size chunk is sent in. - When a notification comes in, we cannot use it to look up the fragment interleave stream information since its not on a stream.	2007-05-06 00:01:17 +00:00
Robert Watson	6087c3c29e	Add global mutex tcp_debug_mtx, which will protect global TCP debugging state tcp_debug, tcp_debx. Acquire and drop as required in tcp_trace(). Move to ANSI C function header, correct prototype types so that short TCP state is no longer promoted to int unnecessarily. Add comments. MFC after: 3 weeks	2007-05-04 23:43:18 +00:00
Robert Watson	1cd6eadfbb	Tweak comment at end of tcp_input() when calling into tcp_do_segment(): the pcbinfo lock will be released as well, not just the pcb lock.	2007-05-04 17:45:52 +00:00
Randall Stewart	1bb552e88d	Fixes a missing unlock in the one-2-one hash table, if it was full and a collision occurred, then we would leave an inp locked. Also fixes a missing inp unlock if IPSEC was on and it failed during the attach. Bug found by Weongyo Jeong.	2007-05-04 15:19:10 +00:00
Bjoern A. Zeeb	7a92401aea	Add support for filtering on Routing Header Type 0 and Mobile IPv6 Routing Header Type 2 in addition to filter on the non-differentiated presence of any Routing Header. MFC after: 3 weeks	2007-05-04 11:15:41 +00:00
Robert Watson	7abab91135	sblock() implements a sleep lock by interlocking SB_WANT and SB_LOCK flags on each socket buffer with the socket buffer's mutex. This sleep lock is used to serialize I/O on sockets in order to prevent I/O interlacing. This change replaces the custom sleep lock with an sx(9) lock, which results in marginally better performance, better handling of contention during simultaneous socket I/O across multiple threads, and a cleaner separation between the different layers of locking in socket buffers. Specifically, the socket buffer mutex is now solely responsible for serializing simultaneous operation on the socket buffer data structure, and not for I/O serialization. While here, fix two historic bugs: (1) a bug allowing I/O to be occasionally interlaced during long I/O operations (discovered by Isilon). (2) a bug in which failed non-blocking acquisition of the socket buffer I/O serialization lock might be ignored (discovered by sam). SCTP portion of this patch submitted by rrs.	2007-05-03 14:42:42 +00:00
Randall Stewart	d06c82f169	- Somehow the disable fragment option got lost. We could set/clear it but would not do it. Now we will. - Moved to latest socket api for extended sndrcv info struct. - Moved to support all new levels of fragment interleave (0-2). - Codenomicon security test updates - length checks and such. - Bug in stream reset (2 actually). - setpeerprimary could unlock a null pointer, fixed. - Added a flag in the pcb so netstat can see if we are listening easier. Obtained from: (some of the Listen changes from Weongyo Jeong)	2007-05-02 12:50:13 +00:00
Robert Watson	84ca8aa609	Remove unused pcbinfo arguments to in_setsockaddr() and in_setpeeraddr().	2007-05-01 16:31:02 +00:00
Robert Watson	712fc218a0	Rename some fields of struct inpcbinfo to have the ipi_ prefix, consistent with the naming of other structure field members, and reducing improper grep matches. Clean up and comment structure fields in structure definition.	2007-04-30 23:12:05 +00:00
Maxim Konovalov	1e2f57057d	o Kill EOLWS while I'm here.	2007-04-30 20:26:11 +00:00
Maxim Konovalov	38ec733c53	o Fix strtoul() error conditions check. PR: kern/108211 Submitted by: Yong Tang MFC after: 2 weeks	2007-04-30 20:22:11 +00:00
Andre Oppermann	9fa198bead	o Fix INP lock leak in the minttl case o Remove indirection in the decision of unlocking inp o Further annotation of locking in tcp_input()	2007-04-23 19:41:47 +00:00
Randall Stewart	ee7f985774	Fixes cut and paste bug using wrong pointer reference.	2007-04-23 00:51:49 +00:00
Randall Stewart	58967d8d46	Moves the PCB features and flags from sctp_pcb.h to sctp.h so that netstat can access and display these values.	2007-04-22 12:12:38 +00:00
Randall Stewart	9a6142d8cd	- Somehow the disable fragment option got lost. We could set/clear it but would not do it. Now we will. - Moved to latest socket api for extended sndrcv info struct. - Moved to support all new levels of fragment interleave.	2007-04-22 11:06:27 +00:00
Andre Oppermann	df47e4377b	o Remove unnecessary TOF_SIGLEN flag from struct tcpopt o Correctly set to->to_signature in tcp_dooptions() o Update comments	2007-04-20 15:28:01 +00:00
Andre Oppermann	7824d002c0	Add more KASSERT's.	2007-04-20 15:21:29 +00:00
Andre Oppermann	0d957bba48	o Remove unused and redundant TCP option definitions o Replace usage of MAX_TCPOPTLEN with the correctly constructed and derived MAX_TCPOPTLEN	2007-04-20 15:08:09 +00:00
Andre Oppermann	4d6e713043	Remove bogus check for accept queue length and associated failure handling from the incoming SYN handling section of tcp_input(). Enforcement of the accept queue limits is done by sonewconn() after the 3WHS is completed. It is not necessary to have an earlier check before a connection request enters the SYN cache awaiting the full handshake. It rather limits the effectiveness of the syncache by preventing legit and illegit connections from entering it and having them shaken out before we hit the real limit which may have vanished by then. Change return value of syncache_add() to void. No status communication is required.	2007-04-20 14:34:54 +00:00
Andre Oppermann	e207f80039	Simplify syncache_expand() and clarify its semantics. Zero is returned when the ACK is invalid and doesn't belong to any registered connection, either in syncache or through SYN cookies. True but a NULL struct socket is returned when the 3WHS completed but the socket could not be created due to insufficient resources or limits reached. For both cases an RST is sent back in tcp_input(). A logic error leading to a panic is fixed where syncache_expand() would free the mbuf on socket allocation failure but tcp_input() later supplies it to tcp_dropwithreset() to issue a RST to the peer. Reported by: kris (the panic)	2007-04-20 13:51:34 +00:00
Andre Oppermann	0a5df51410	Only update TCP timestamp on SYN duplication if it is present on current SYN in syncache_add(). Otherwise disable timestamps.	2007-04-20 13:36:48 +00:00
Andre Oppermann	c73f70b728	o Plug memory leak in syncache_add() on MAC label allocation failure. o Simplify code flow with 'done' goto label. o Remove mbuf argument from syncache_respond(). It doesn't make use of it.	2007-04-20 13:30:08 +00:00
Randall Stewart	f1f73e5718	- More work on making send lock contention. - Removed free-oqueue cache. - Fix counter for sq entries - Increased the amount of information retained on ASOC_TSN logging on the association. - Made it so with the ASOC_TSN logging on sending or receiving an abort we dump the log. - Went through and added invariant's around some panic's that needed them. - decrements went to atomic_subtract_int instead of add -1 - Removed residual count increment that threw off a strm oq count. - Tracks and complaints if we don't have a LAST fragment and clean up the sp structure. - Track a new stat that counts number of abandoned msgs that happen if you close without reading. - Fix lookup of frag point to be aware of a 0 assoc-id. Reviewed by: gnn	2007-04-19 11:28:43 +00:00
Andre Oppermann	bbf4e1cb47	Make tcp_twrespond() use tcp_addoptions() instead of a home grown version.	2007-04-18 18:14:39 +00:00
Andre Oppermann	9eab54debf	When we run into the syncache entry limits syncache_add() tries to free the oldest entry in the current bucket row. The global entry limit may be smaller than the bucket rows and their limit combined however. Thus only try to free a syncache entry if we found one in this bucket row. Reported by: kris	2007-04-17 15:25:14 +00:00
Robert Watson	c9791cfb3e	Shorten text string for ip_fw2 dynamic rules zone by removing the word "zone", which is generally not present in zone names. This reduces the incidence of line-wrapping in "vmstat -z " using 80-column displays. MFC after: 3 days	2007-04-17 09:28:36 +00:00
Robert Watson	215c8d75b8	Remove unused variable tcbinfo_mtx.	2007-04-15 21:03:23 +00:00
Randall Stewart	f1d6e6dc71	Fix stupid syntax error - Pointy hat to me :-(	2007-04-15 13:03:14 +00:00
Randall Stewart	478d3f0901	- Add more comments to sctps_stats structure in sctp_uio.h - Fix bug that prevented EEOR mode from working and simplified the can_we_split code in the process. - Reduce lock contention for the tcb_send_lock. I did this especially for EEOR mode, still need to look at why I need a lock when removing from the tailq and the ->next is NOT null. A lock fixes it but it implies a bug yet exists. - Activated Andre's proposed changes to better use the mbuf infrastructure. - Fixed places that were not using the aloc macro's to take advantage of the per assoc cache. - Adds ifdef fix so any logging will enable stat_logging to get the right data structures in place (suggested by Max Laier).	2007-04-15 11:58:26 +00:00
Max Laier	d0cf96b407	Fix a typo - unbreak the build.	2007-04-14 18:27:34 +00:00
Randall Stewart	c105859eee	- fix source address selection when picking an acceptable address - name change of prefered -> preferred - CMT fast recover code added. - Comment fixes in CMT. - We were not giving a reason of cant_start_asoc per socket api if we failed to get init/or/cookie to bring up an assoc. Change so we don't just give a generic "comm lost" but look at actual states of dying assoc. - change "crc32" arguments to "crc32c" to silence strict/noisy compiler warnings when crc32() is also declared - A few minor tweaks to get the portable stuff truly portable for sctp6_usrreq.c :-D - one-2-one style vrf match problem. - window recovery would leave chks marked for retran during window probes on the sent queue. This would then cause an out-of-order problem and assure that the flight size "problem" would occur. - Solves a flight size logging issue that caused rwnd overruns, flight size off as well as false retransmissions. - Macroize the up and down of flight size. - Fix a ECNE bug in its counting. - The strict_sacks options was causing aborts when window probing was active, fix to make strict sacks a bit smarter about what the next unsent TSN is. - Fixes a one-2-one wakeup bug found by Martin Kulas. - If-defed out form, Andre's copy routines pending his commit of at least m_last().. need to adjust for 6.2 as well.. since m_last won't exist. Reviewed by: gnn	2007-04-14 09:44:09 +00:00
Ruslan Ermilov	7480de4305	Make "struct tcp_timer" visible only to the kernel, and unbreak world.	2007-04-11 14:08:42 +00:00
Andre Oppermann	b8152ba793	Change the TCP timer system from using the callout system five times directly to a merged model where only one callout, the next to fire, is registered. Instead of callout_reset(9) and callout_stop(9) the new function tcp_timer_activate() is used which then internally manages the callout. The single new callout is a mutex callout on inpcb simplifying the locking a bit. tcp_timer() is the called function which handles all race conditions in one place and then dispatches the individual timer functions. Reviewed by: rwatson (earlier version)	2007-04-11 09:45:16 +00:00
Robert Watson	6493245ded	Add a new privilege, PRIV_NETINET_REUSEPORT, which will replace superuser checks to see whether bind() can reuse a port/address combination while it's already in use (for some definition of use).	2007-04-10 15:58:38 +00:00
Paolo Pisati	c326cd0e62	Prevent the usage of an uninitialized variable: do not accept StartMediaTx message before an OpnRcvChnAck message was received. Reviewed by: glebius Approved by: glebius (mentor) MFC after: 3 days Found with: Coverity Prevent(tm) CID: 498	2007-04-07 09:52:36 +00:00
Paolo Pisati	f4296f2246	Silence Coverity about an unused variable. Reviewed by: glebius Approved by: glebius (mentor) MFC after: 3 days CID: 538	2007-04-07 09:47:39 +00:00
Andre Oppermann	995a77176f	Add INP_INFO_UNLOCK_ASSERT() and use it in tcp_input(). Also add some further INP_INFO_WLOCK_ASSERT() while there.	2007-04-04 18:30:16 +00:00
Andre Oppermann	0c38fd0a7a	Move last tcpcb initialization for the inbound connection case from tcp_input() to syncache_socket() where it belongs and the majority of it already happens. The "tp->snd_up = tp->snd_una" is removed as it is done with the tcp_sendseqinit() macro a few lines earlier.	2007-04-04 16:13:45 +00:00
Andre Oppermann	beaa515e95	Some local and style(9) cleanups.	2007-04-04 15:30:31 +00:00
Andre Oppermann	5dd9dfefd6	Retire unused TCP_SACK_DEBUG.	2007-04-04 14:44:15 +00:00
Andre Oppermann	b728e90260	In tcp_dooptions() skip over SACK options if it is a SYN segment.	2007-04-04 14:39:49 +00:00
Alexander Kabaev	edb2e5dca3	Include string.h for non-kernel builds to get proper memcpy prototype.	2007-04-04 03:16:59 +00:00
Alexander Kabaev	d8164209b3	Include string.h for non-kernel builds to get proper strcpy, strlen prototypes.	2007-04-04 03:14:15 +00:00
Alexander Kabaev	9160afee7c	Do not assign result of (char ) cast to u_char variable.	2007-04-04 03:10:42 +00:00
Julian Elischer	1bd69ee131	Since we switched to using monotonically increasing timestamps, they have been reported back to the userland as being in 1970. Add boot time to the timestamp to give the time in the scale of the 'current' real timescale. Not perfect if you change the time a lot but good enough to keep all the rules correct relative to each other in terms of time relative to "now".	2007-04-03 22:45:50 +00:00
Randall Stewart	bff64a4db3	- fixed several places where we did not release INP locks. - fixed a refcount bug in the new ifa structures. - use vrf's from default stcb or inp whenever possible. - Address limits raised to account for a full IP fragmented packet (1000 addresses). - flight size correcting updated to include one message only and to handle case where the peer does not cumack the next segment aka lists 1/1 in sack blocks.. - Various bad init/init-ack handling could cause a panic since we tried to unlock the destroyed mutex. Fixes so we properly exit when we need to destroy an assoc. (Found by Cisco DevTest team :D) - name rename in src-addr-selection from pass to sifa. - route structure typedef'd to allow different platforms and updated into sctp_os_bsd file. - Max retransmissions a chunk can be made added. Reviewed by: gnn	2007-04-03 11:15:32 +00:00
Randall Stewart	5e54f665f0	- Found bug in min split point bundling which caused incorrect, non-bundlable fragmentation. - Added min residual to better control split points for both how big a msg must be as well as how much needs to be left over. - With our new algo in place, we need to implicitly set "end of msg" on the sp-> structure otherwise we end up with "hung" associations. - Room reserved up front in IP header by pushing IP header to back of mbuf. - Fix so FR's peg count of retransmissions needed. - Fix so an unlucky chunk that never gets across will kill the assoc via the kill timer and send an abort too. - Fix bug in sctp_input which can result in a crash. - Do not strip off IP options anymore. - Clean up sctp_calculate_rto(). - Get rid of unused sysctl. - Fixed so we discard all M-Cast - Fixed so port check done AFTER checksum - Fixed bug in fragmentation code that prevented us from fragmenting a small complete message when we needed to. - Window probes were not marked back to unsent and flight adjusted when a sack came in with no window change or accepting of the probe data. We now fix this with having a mark on the net and the chunk so we can clear it out when the sack arrives forcing it to retran just like it was "new" this improves the handling of window probes, which were dropped by the receiver. - Tighten AUTH protocol error checks during INIT/INIT-ACK exchange	2007-03-31 11:47:30 +00:00
Bruce M Simpson	f7e083af90	Fix a bug in IPv4 address configuration exposed by refcounting. * Join the IPv4 all-hosts multicast group 224.0.0.1 once only; that is, when an IPv4 address is first configured on an interface. * Do not join it for subsequent IPv4 addresses as this violates IGMP. * Be sure to leave the group when all IPv4 addresses have been removed from the interface. * Add two DIAGNOSTIC printfs related to the issue. Further care and attention is needed in this area; it is suggested that netinet's attachment to the ifnet structure be compartmentalized and non-implicit. Bug found by: andre MFC after: 1 month	2007-03-29 21:39:22 +00:00
Andre Oppermann	1929eae1cc	When blackholing do a 'dropunlock' in the new world order to prevent the INP_INFO_LOCK from leaking. Reported by: ache Found by: rwatson	2007-03-28 12:58:13 +00:00
Robert Watson	77c78838f0	Remove stale comment about not enabling inpcb and inpcbinfo lock assertions when IPv6 is enabled. MFC after: 3 days	2007-03-28 00:50:20 +00:00
Andre Oppermann	07b64b901a	In tcp_sack_doack() remove too tight KASSERT() added in last revision. This function may be called without any TCP SACK option blocks present. Protect iteration over SACK option blocks by checking for SACK options present flag first. Bug reported by: wkoszek, keramida, Nicolas Blais	2007-03-25 23:27:26 +00:00
Robert Watson	30916a2d1d	Replace a comment about RSVP/mrouting with a different but similar comment explaining that some more locking is needed. The routing pieces are done, but there is an interlocking issue between optionally compiled code and mandatory code. Spotted by: kris	2007-03-25 21:49:50 +00:00
Maxim Konovalov	14739780bd	o Use a define for a buffer size. Prodded by: db o Add missed vars for TCPDEBUG in tcp_do_segment(). Prodded by: tinderbox	2007-03-24 22:15:02 +00:00
Andre Oppermann	302ce8d690	Split tcp_input() into its two functional parts: o tcp_input() now handles TCP segment sanity checks and preparations including the INPCB lookup and syncache. o tcp_do_segment() handles all data and ACK processing and is IPv4/v6 agnostic. Change all KASSERT() messages to ("%s: ", __func__). The changes in this commit are primarily of mechanical nature and no functional changes besides the function split are made. Discussed with: rwatson	2007-03-23 20:16:50 +00:00
Andre Oppermann	4dfdffe9e2	Tidy up some code to conform better to surroundings and style(9), 0 = NULL and space/tab.	2007-03-23 19:11:22 +00:00
Andre Oppermann	fc30a25199	Bring SACK option handling in tcp_dooptions() in line with all other options and adjust users accordingly.	2007-03-23 18:33:21 +00:00
Bruce M Simpson	73ec8173eb	Purge two redundant case labels.	2007-03-23 09:43:36 +00:00
Gleb Smirnoff	1daaa65d3f	Remove global list of all llinfo_arp entries and use a callout per instance expiry of the ARP entries. Since we no longer abuse the IPv4 radix head lock, we can now enter arp_rtrequest() with a lock held on an arbitrary rt_entry. Reviewed by: bms	2007-03-22 10:37:53 +00:00
Andre Oppermann	ad3f9ab320	ANSIfy function declarations and remove register keywords for variables. Consistently apply style to all function declarations.	2007-03-21 19:37:55 +00:00
Andre Oppermann	f7608d9e7f	Match up SYSCTL declarations in style.	2007-03-21 19:34:12 +00:00
Andre Oppermann	eec9d82d8e	Subtract optlen in the maximum length check for TSO and finally avoid slightly oversized TSO mbuf chains. Submitted by: kmacy	2007-03-21 19:04:07 +00:00
Andre Oppermann	b10fbdeafa	Tidy up IPFIREWALL_FORWARD sections and comments.	2007-03-21 18:56:03 +00:00
Andre Oppermann	794235b737	Update and clarify comments in first section of tcp_input().	2007-03-21 18:52:58 +00:00
Andre Oppermann	db33b3e6a7	Tidy up the ACCEPTCONN section of tcp_input(), adjust comments and remove old dead T/TCP code.	2007-03-21 18:49:43 +00:00
Andre Oppermann	574b696407	Tidy up tcp_log_in_vain and blackhole.	2007-03-21 18:36:49 +00:00
Andre Oppermann	85c497918c	Make TCP_DROP_SYNFIN a standard part of TCP. Disabled by default it doesn't impede normal operation negatively and is only a few lines of code. Its close relatives blackhole and log_in_vain aren't options either.	2007-03-21 18:25:28 +00:00
Andre Oppermann	e406f5a1c9	Remove tcp_minmssoverload DoS detection logic. The problem it tried to protect us from wasn't really there and it only bloats the code. Should the problem surface in the future we can simply resurrect it from cvs history.	2007-03-21 18:05:54 +00:00
Bruce M Simpson	c7547d1aaf	Increase default size of raw IP send and receive buffers to the same as udp_sendspace, to avoid a situation where jumbograms (datagrams > 9KB) are unnecessarily fragmented. A common use case for this is OSPF link-state database synchronization during adjacency bringup on a high speed network with a large MTU. It is not possible to auto-tune this setting until a socket is bound to a given interface, and because the laddr part of the inpcb tuple may be overridden, it makes no sense to do so. Applications may request a larger socket buffer size by using the SO_SENDBUF and SO_RECVBUF socket options. Certain applications such as Quagga ospfd do not probe for interface MTU and therefore do not increase SO_SENDBUF in this use case. XORP is not affected by this problem as it preemptively uses SO_SENDBUF and SO_RECVBUF to account for any possible additional latency in XRL IPC. PR: kern/108375 Requested by: Vladimir Ivanov MFC after: 1 week	2007-03-20 13:15:20 +00:00
Randall Stewart	62c1ff9c48	- window update sacks sent incorrectly after shutdown which caused extra abort from peer. - RTT time calculation was not being done in express sack handling since it referred to an unused variable (rto_pending). Removed variable. - socket buffer high water access macro-ized.	2007-03-20 10:23:11 +00:00
Bruce M Simpson	ec002fee99	Implement reference counting for ifmultiaddr, in_multi, and in6_multi structures. Detect when ifnet instances are detached from the network stack and perform appropriate cleanup to prevent memory leaks. This has been implemented in such a way as to be backwards ABI compatible. Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti() is unable to detect interface removal by design, as it performs searches on structures which are removed with the interface. With this architectural change, the panics FreeBSD users have experienced with carp and pfsync should be resolved. Obtained from: p4 branch bms_netdev Reviewed by: andre Sponsored by: Garance A Drosehn Idea from: NetBSD MFC after: 1 month	2007-03-20 00:36:10 +00:00
Andre Oppermann	6489fe6553	Match up SYSCTL declaration style.	2007-03-19 19:00:51 +00:00
Andre Oppermann	8b8ed7a78e	Match up SYSCTL_INT declarations in style.	2007-03-19 18:42:27 +00:00
Andre Oppermann	4e02375908	Maintain a pointer and offset pair into the socket buffer mbuf chain to avoid traversal of the entire socket buffer for larger offsets on stream sockets. Adjust tcp_output() make use of it. Tested by: gallatin	2007-03-19 18:35:13 +00:00
Randall Stewart	6a27c37636	Adds a hash table to speed local address lookup on a per VRF basis (BSD has only one VRF currently). Hash table is sized to 16 but may need to be adjusted for machines with large numbers of addresses. Reviewed by: gnn	2007-03-19 11:11:16 +00:00
Randall Stewart	132dea7d5a	- errno -> becomes error in sctp_output.c and sctputil.c - SB_CLEAR macro defined and used for sb clearing. - Fix for CMT express_sack_handling did not do proper pseudo-cumack updates. - Get rid of extraneous function that was never used ip_2_ip6_hdr() - Fixed source address selection bug (initialization problem). - Source address selection debug added.	2007-03-19 06:53:02 +00:00
Bruce M Simpson	27f8eaaf03	In IPv4 fast forwarding path, send ICMP unreachable messages for routes which have RTF_REJECT set and a zero expiry timer. PR: kern/109246 MFC after: 10 days Submitted by: Ingo Flaschberger	2007-03-18 23:05:20 +00:00
Andre Oppermann	9daba64ed5	Unbreak IPv6 after consolidation of TCP options insertion. Submitted by: tegge	2007-03-17 11:52:54 +00:00
Kip Macy	9ad2c608c2	Fix the most obvious of the bugs introduced by recent syncache changes - *ip is not initialized in the case of inet6 connection, but ip->ip_len is being changed anyway Now the question is, why does it think an ipv4 connection is an ipv6 connection? xemacs still doesn't work over X11 forwarding, but the kernel no longer panics.	2007-03-17 06:40:09 +00:00
Robert Watson	8d0d6d112f	Remove unused and #if 0'd net.inet.tcp.tcp_rttdflt sysctl.	2007-03-16 13:42:26 +00:00
Andre Oppermann	02a1a64357	Consolidate insertion of TCP options into a segment from within tcp_output() and syncache_respond() into its own generic function tcp_addoptions(). tcp_addoptions() is alignment agnostic and does optimal packing in all cases. In struct tcpopt rename to_requested_s_scale to just to_wscale. Add a comment with quote from RFC1323: "The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment itself is never scaled." Reviewed by: silby, mohans, julian Sponsored by: TCP/IP Optimization Fundraise 2005	2007-03-15 15:59:28 +00:00
Randall Stewart	42551e993f	- Sysctl's move to separate file - moved away from ifn/ifa access to sctp_ifa/sctp_ifn built and managed by the add-ip code. - cleaned up add-ip code to use the iterator - made iterator be a thread, which enables auto-asconf now. - rewrote and cleaned up source address selection (also made it use new structures). - Fixed a couple of memory leaks. - DACK now settable as to how many packets to delay as well as time. - connectx() to latest socket API, new associd arg. - Fixed issue with revoking and losing potential to send when we inflate the flight size. We now inflate the cwnd too and deflate it later when the revoked chunk is sent or acked. - Got rid of some temp debug code - src addr selection moved to a common file (sctp_output.c) - Support for simple VRF's (we have support for multi-vrf via compile switch that is scrubbed from BSD but we won't need multi-vrf until we first get VRF :-D) - Rest of mib work for address information now done - Limit number of addresses in INIT/INIT-ACK to a #def (30). Reviewed by: gnn	2007-03-15 11:27:14 +00:00
Bruce M Simpson	5c51891ef7	Diff reduction with NetBSD; use IN_LOCAL_GROUP() to check if an address is within the locally scoped multicast range 224.0.0.0/24.	2007-03-15 08:44:22 +00:00
Bruce M Simpson	1b7f038498	Fix IP_SENDSRCADDR semantics. * To use this option with a UDP socket, it must be bound to a local port, and INADDR_ANY, to disallow possible collisions with existing udp inpcbs bound to the same port on other interfaces at send time. * If the socket is bound to INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be rejected as it is ambiguous. * If the socket is bound to an address other than INADDR_ANY, specifying IP_SENDSRCADDR with INADDR_ANY will be disallowed by in_pcbbind_setup(). Reviewed by: silence on -net Tested with: src/tools/regression/netinet/ipbroadcast MFC after: 4 days	2007-03-08 15:26:54 +00:00
Qing Li	95ad8418dc	This patch is provided to fix a couple of deployment issues observed in the field. In one situation, one end of the TCP connection sends a back-to-back RST packet; with delayed ack, the last_ack_sent variable has not been updated yet. When tcp_insecure_rst is turned off, the code treats the RST as invalid because last_ack_sent instead of rcv_nxt is compared against th_seq. Apparently there is some kind of firewall that sits in between the two ends and that RST packet is the only RST packet received. With short lived HTTP connections, the symptom is a large accumulation of connections over a short period of time. The +/-(1) factor is to take care of implementations out there that generate RST packets with these types of sequence numbers. This behavior has also been observed in live environments. Reviewed by: silby, Mike Karels MFC after: 1 week	2007-03-07 23:21:59 +00:00
Bruce M Simpson	44c4d7b2cb	Purge an out-of-date comment.	2007-03-04 16:32:19 +00:00
Bruce M Simpson	a3fd02d88b	Fix undirected broadcast sends for the case where SO_DONTROUTE has also been set at the socket layer, in our somewhat convoluted IPv4 source selection logic in ip_output(). IP_ONESBCAST is actually a special case of SO_DONTROUTE, as 255.255.255.255 must always be delivered on a local link with a TTL of 1. If IP_ONESBCAST has been set at the socket layer, also perform destination interface lookup for point-to-point interfaces based on the destination address of the link; previously it was not possible to use the option with such interfaces; also, the destination/broadcast address fields map to the same field within struct ifnet, which doesn't help matters. One more valid fix going forward for these issues is to treat 255.255.255.255 as a destination in its own right in the forwarding trie. Other implementations do this. It fits with the use of multiple paths, though it then becomes necessary to specify interface preference. This hack will eventually go away when that comes to pass. Reviewed by: andre MFC after: 1 week	2007-03-01 13:29:30 +00:00
Andre Oppermann	6aa5b62315	Prevent TSO mbuf chain from overflowing a few bytes by subtracting the TCP options size before the TSO total length calculation. Bug found by: kmacy	2007-03-01 13:12:09 +00:00
Mohan Srinivasan	4a32dc299f	In the SYN_SENT case, Initialize the snd_wnd before the call to tcp_mss(). The TCP hostcache logic in tcp_mss() depends on the snd_wnd being initialized.	2007-02-28 20:48:00 +00:00
Bruce M Simpson	85e0793497	Style: Move declaration of subsystem mutex to where other mutexes are in this file, and use macros for dealing with it.	2007-02-28 20:02:24 +00:00
Gleb Smirnoff	8bec3467b1	Add EHOSTDOWN and ENETUNREACH to the list of soft errors, that shouldn't be returned up to the caller. PR: 100172 Submitted by: "Andrew - Supernews" <andrew supernews.net> Reviewed by: rwatson, bms	2007-02-28 12:47:49 +00:00
Gleb Smirnoff	72757d9a53	Toss the code, that handles errors from ip_output(), to make it more readable: - Merge two embedded if() into one. - Introduce switch() block to handle different kinds of errors. Reviewed by: rwatson, bms	2007-02-28 12:41:49 +00:00
Bruce M Simpson	ad3b9f70ed	Add INADDR_ALLRPTS_GROUP define for 224.0.0.22 for future IGMPv3 support. Obtained from: OpenSolaris	2007-02-27 14:45:37 +00:00
Mohan Srinivasan	7c72af8770	Reap FIN_WAIT_2 connections marked SOCANTRCVMORE faster. This mitigates potential issues where the peer does not close, potentially leaving thousands of connections in FIN_WAIT_2. This is controlled by a new sysctl fast_finwait2_recycle, which is disabled by default. Reviewed by: gnn, silby.	2007-02-26 22:25:21 +00:00
Bruce M Simpson	410052125e	Unlock a mutex which should be unlocked before returning. MFC after: 1 week	2007-02-25 14:22:03 +00:00
Bruce M Simpson	6be2e366d6	Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel. It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko, if and only if IPv6 support is enabled for loadable modules. Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).	2007-02-24 11:38:47 +00:00
Robert Watson	afdb42748d	Rename two identically named log_in_vain variables: tcp_input.c's static log_in_vain to tcp_log_in_vain, and udp_usrreq's global log_in_vain to udp_log_in_vain. MFC after: 1 week	2007-02-20 10:20:03 +00:00
Robert Watson	3329b23659	Gratuitous UDP restyling toward style(9) in 7.x.	2007-02-20 10:13:11 +00:00
Robert Watson	03dc38a48b	#ifdef INET6 printing of inpcb IPv6 addresses in DDB. Patch committed with minor adjustments. Submitted by: Florian C. Smeets <flo at kasimir dot com>	2007-02-18 08:57:23 +00:00
Robert Watson	497057eeea	Add "show inpcb", "show tcpcb" DDB commands, which should come in handy for debugging sblock and other network panics.	2007-02-17 21:02:38 +00:00
Robert Watson	8ca5b13f2f	Remove unused inp6_ifindex field from inpcb, as well as unused macro shortcut for it.	2007-02-16 14:09:24 +00:00
Robert Watson	1f9b46facf	Remove unused in6p_ip6_hlim macro shortcut for non-present inp_depend6.inp6_hlim field in the inpcb.	2007-02-16 13:56:06 +00:00
Randall Stewart	f42a358a6f	- Copyright updates (aka 2007) - ZONE get now also take a type cast so it does the cast like mtod does. - New macro SCTP_LIST_EMPTY, which in bsd is just LIST_EMPTY - Removal of const in some of the static hmac functions (not needed) - Store length changes to allow for new fields in auth - Auth code updated to current draft (this should be the RFC version we think). - use uint8_t instead of u_char in LOOPBACK address comparison - Some u_int32_t converted to uint32_t (in crc code) - A bug was found in the mib counts for ordered/unordered count, this was fixed (was referencing a freed mbuf). - SCTP_ASOCLOG_OF_TSNS added (code will probably disappear after my testing completes. It allows us to keep a small log on each assoc of the last 40 TSN's in/out and stream assignment. It is NOT in options and so is only good for private builds. - Some CMT changes in prep for Jana fixing his problem with reneging when CMT is enabled (Concurrent Multipath Transfer = CMT). - Some missing mib stats added. - Correction to number of open assoc's count in mib - Correction to os_bsd.h to get right sha2 macros - Add of special AUTH_04 flags so you can compile the code with the old format (in case the peer does not yet support the latest auth code). - Nonce sum was incorrectly being set in when ecn_nonce was NOT on. - LOR in listen with implicit bind found and fixed. - Moved away from using mbuf's for socket options to using just data pointers. The mbufs were used to harmonize NetBSD code since both Net and Open used this method. We have decided to move away from that and more conform to FreeBSD style (which makes more sense). - Very very nasty bug found in some of my "debug" code. The cookie_how collision case tracking had an endless loop in it if you got a second retransmission of a cookie collision case. This would lock up a CPU .. ugly.. 
- auth function goes to using size_t instead of int which conforms to socketapi better - Found the nasty bug that happens after 9 days of testing.. you get the data chunk, deliver it and due to the reference to a ch-> that every now and then has been deleted (depending on the position in the mbuf) you have an invalid ch->ch.flags.. and thus you don't advance the stream sequence number.. so you block the stream permanently. The fix is to make local variables of these guys and set them up before you have any chance of trimming the mbuf. - style fix in sctp_util.h, not sure how this got bad maybe in the last patch? (aka it may not be in the real source). - Found interesting bug when using the extended snd/rcv info where we would get an error on receiving with this. That's because it was NOT padded to the same size as the snd_rcv info. We increase (add the pad) so the two structs are the same size in sctp_uio.h - In sctp_usrreq.c one of the most common things we did for socket options was to cast the pointer and validate the size. This has been macro-ized to help make the code more readable. - in sctputil.c two things, the socketapi class found a missing flag type (the next msg is a notification) and a missing scope recovery was also fixed. Reviewed by: gnn	2007-02-12 23:24:31 +00:00
Bruce M Simpson	79760c6bdf	Use MAXTTL. Obtained from: NetBSD	2007-02-10 23:15:28 +00:00
Bruce M Simpson	7a90229b61	If the rendezvous point for a group is not specified, do not send IGMPMSG_WHOLEPKT notifications to the userland PIM routing daemon, as an optimization to mitigate the effects of high multicast forwarding load. This is an experimental change, therefore it must be explicitly enabled by setting the sysctl/tunable net.inet.pim.squelch_wholepkt to a non-zero value. The tunable may be set from the loader or from within the kernel environment when loading ip_mroute.ko as a module. Submitted by: edrt <edrt at citiz.net> See also: http://mailman.icsi.berkeley.edu/pipermail/xorp-users/2005-June/000639.html	2007-02-10 14:48:42 +00:00
Bruce M Simpson	0948f0a28f	Build PIM by default as part of the IPv4 multicast forwarding path. Make PIM dynamically loadable by using encap_attach_func(). PIM may now be loaded into a GENERIC kernel. Tested with: ports/net/pimdd && tcpreplay && wireshark Reviewed by: Pavlin Radoslavov	2007-02-10 13:59:13 +00:00
Bruce M Simpson	f2bf119ead	Store the cached route in vifp in the normal send_packet() case. The VIFF_TUNNEL case no longer exists, therefore this field is free to use, and its use eliminates a static data member.	2007-02-08 23:05:08 +00:00
Bruce M Simpson	162c78d481	Nuke the token bucket filter code. Attempting to request rate limiting by the token bucket filter will result in EINVAL being returned. If you want to rate-limit traffic in future, use ALTQ or dummynet; this isn't a general purpose QoS engine. Preserve the now unused fields in struct vif so as to avoid having to recompile netstat(1) and other tools. Reviewed by: Pavlin Radoslavov, Bill Fenner	2007-02-08 22:58:01 +00:00
Bruce M Simpson	aab7b273bf	eliminate redundant macro MC_SEND()	2007-02-07 20:36:33 +00:00
Bruce M Simpson	78cb087e34	Remove support for IPIP tunnels in IPv4 multicast forwarding. XORP has never used them; with mrouted, their functionality may be replaced by explicitly configuring gif(4) instances and specifying them with the 'phyint' keyword. Bump __FreeBSD_version to 700030, and update UPDATING. A doc update is forthcoming. Discussed on: net Reviewed by: fenner MFC after: 3 months	2007-02-07 16:04:13 +00:00
Bruce M Simpson	64e740a352	When fast-forwarding is enabled, do not forward directed IPv4 broadcasts to locally attached broadcast networks. Note well: This relies on the layer 2 route cloning behaviour in BSD. PR: 98799 Tested by: Dmitry Sergienko MFC after: 1 week	2007-02-05 00:15:40 +00:00
Alan Cox	055867a06c	Include opt_ipdivert.h so that the message announcing ipfw correctly describes the state of IPDIVERT.	2007-02-03 22:11:53 +00:00
Bruce M Simpson	d256723b8b	In fast forwarding path, defer processing of 169.254.0.0/16 to ip_input(). See RFC 3927 section 2.7.	2007-02-03 06:46:48 +00:00
Bruce M Simpson	f8429ca2e1	In regular forwarding path, reject packets destined for 169.254.0.0/16 link-local addresses. See RFC 3927 section 2.7.	2007-02-03 06:45:51 +00:00
Bruce M Simpson	d055815799	Comply with RFC 3927, by forcing ARP replies which contain a source address within the link-local IPv4 prefix 169.254.0.0/16, to be broadcast at link layer. Reviewed by: fenner MFC after: 2 weeks	2007-02-02 20:31:44 +00:00
Bruce M Simpson	1baaf8347c	Expose smoothed RTT and RTT variance measurements to userland via socket option TCP_INFO. Note that the units used in the original Linux API are in microseconds, so use a 64-bit mantissa to convert FreeBSD's internal measurements from struct tcpcb from ticks.	2007-02-02 18:34:18 +00:00
Gleb Smirnoff	fbfdcf8735	Since rev. 1.94 of netinet/in.c, the netinet layer frees all its multicast memberships, when interface is detached. Thus, when an underlying interface is detached, we do not need to free our multicast memberships. Reviewed by: bms	2007-02-02 09:39:09 +00:00
Andre Oppermann	6741ecf595	Auto sizing TCP socket buffers. Normally the socket buffers are static (either derived from global defaults or set with setsockopt) and do not adapt to real network conditions. Two things happen: a) your socket buffers are too small and you can't reach the full potential of the network between both hosts; b) your socket buffers are too big and you waste a lot of kernel memory for data just sitting around. With automatic TCP send and receive socket buffers we can start with a small buffer and quickly grow it in parallel with the TCP congestion window to match real network conditions. FreeBSD has a default 32K send socket buffer. This supports a maximal transfer rate of only slightly more than 2Mbit/s on a 100ms RTT trans-continental link. Or at 200ms just above 1Mbit/s. With TCP send buffer auto scaling and the default values below it supports 20Mbit/s at 100ms and 10Mbit/s at 200ms. That's an improvement of factor 10, or 1000%. For the receive side it looks slightly better with a default of 64K buffer size. New sysctls are: net.inet.tcp.sendbuf_auto=1 (enabled) net.inet.tcp.sendbuf_inc=8192 (8K, step size) net.inet.tcp.sendbuf_max=262144 (256K, growth limit) net.inet.tcp.recvbuf_auto=1 (enabled) net.inet.tcp.recvbuf_inc=16384 (16K, step size) net.inet.tcp.recvbuf_max=262144 (256K, growth limit) Tested by: many (on HEAD and RELENG_6) Approved by: re MFC after: 1 month	2007-02-01 18:32:13 +00:00
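The throughput figures quoted in this message follow from the usual window-limited bound: a sender can have at most one socket buffer of data in flight per round trip, so throughput is capped at buffer size divided by RTT. A minimal sketch of that arithmetic, assuming the bound is exactly `buffer / RTT` and ignoring protocol overhead (the function name `max_throughput_mbit` is illustrative and does not appear in the commit):

```python
def max_throughput_mbit(buffer_bytes: int, rtt_ms: int) -> float:
    """Upper bound on throughput, in Mbit/s, when at most one
    buffer of data can be in flight per round-trip time."""
    bits_per_second = buffer_bytes * 8 * 1000 / rtt_ms
    return bits_per_second / 1_000_000


if __name__ == "__main__":
    # 32K is the default send buffer named in the message;
    # 262144 is the sendbuf_max growth limit it introduces.
    for label, size in (("32K default", 32 * 1024), ("256K sendbuf_max", 262144)):
        for rtt in (100, 200):
            rate = max_throughput_mbit(size, rtt)
            print(f"{label:>16} @ {rtt} ms RTT: {rate:.2f} Mbit/s")
```

Under these assumptions the 32K buffer gives roughly 2.6 Mbit/s at 100 ms and 1.3 Mbit/s at 200 ms, and the 256K limit gives roughly 21 and 10.5 Mbit/s, which is consistent with the rounded numbers in the message.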
Andre Oppermann	087b55ea59	Change the way the advertized TCP window scaling is computed. Instead of upper-bounding it to the size of the initial socket buffer lower-bound it to the smallest MSS we accept. Ideally we'd use the actual MSS information here but it is not available yet. For socket buffer auto sizing to be effective we need room to grow the receive window. The window scale shift is determined at connection setup and can't be changed afterwards. The previous, original, method effectively just did a power of two roundup of the socket buffer size at connection setup severely limiting the headroom for larger socket buffers. Tested by: many (as part of the socket buffer auto sizing patch) MFC after: 1 month	2007-02-01 17:39:18 +00:00
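The difference between the two ways of choosing the window scale shift can be sketched as follows. This is an illustration of the description above, not the kernel code: the loop form, the function names, and the example minimum MSS of 216 bytes are assumptions. The constants 65535 (largest unscaled window) and 14 (largest shift allowed by RFC 1323) are standard.

```python
TCP_MAXWIN = 65535       # largest window expressible without scaling
TCP_MAX_WINSHIFT = 14    # largest shift permitted by RFC 1323


def wscale_from_buffer(buf_bytes: int) -> int:
    """Previous approach as described: smallest shift whose scaled
    window covers the socket buffer size at connection setup."""
    shift = 0
    while shift < TCP_MAX_WINSHIFT and (TCP_MAXWIN << shift) < buf_bytes:
        shift += 1
    return shift


def wscale_from_min_mss(min_mss: int) -> int:
    """New approach as described: lower-bound the shift by the
    smallest accepted MSS, independent of the initial buffer."""
    shift = 0
    while shift < TCP_MAX_WINSHIFT and (1 << shift) < min_mss:
        shift += 1
    return shift
```

With a 64K initial receive buffer the first function returns a shift of 1, which caps the advertised window at 131070 bytes for the life of the connection, below the 256K `recvbuf_max` limit. With the hypothetical 216-byte minimum MSS the second function returns 8, leaving room for the buffer to grow.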
Bruce M Simpson	1976bc4af7	Import macros IN_LINKLOCAL(), IN_PRIVATE(), IN_LOCAL_GROUP(), IN_ANY_LOCAL(). This is not a functional change. IN_LINKLOCAL() tests if an address falls within the IPv4 link-local prefix. IN_PRIVATE() tests if an address falls within an RFC 1918 private prefix. IN_LOCAL_GROUP() tests if an address falls within the statically assigned link-local multicast scope specified in RFC 2365. IN_ANY_LOCAL() tests for either of IN_LINKLOCAL() or IN_LOCAL_GROUP(). As with the existing macros in the FreeBSD netinet stack, comparisons are performed in host-byte order. See also: RFC 1918, RFC 2365, RFC 3927 Obtained from: NetBSD (dyoung@) MFC after: 2 weeks	2007-01-31 14:34:47 +00:00
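A rough sketch of what these four tests check, written as Python functions over host-byte-order integers. The function names mirror the macros but are illustrative only, and the masks are derived from the prefixes named in the RFCs and in the log (169.254.0.0/16, the three RFC 1918 blocks, 224.0.0.0/24) rather than copied from the header:

```python
import ipaddress


def host_order(dotted: str) -> int:
    """Convert a dotted-quad string to a host-byte-order integer."""
    return int(ipaddress.IPv4Address(dotted))


def in_linklocal(addr: int) -> bool:
    # 169.254.0.0/16 (RFC 3927)
    return (addr & 0xFFFF0000) == 0xA9FE0000


def in_private(addr: int) -> bool:
    # 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (RFC 1918)
    return ((addr & 0xFF000000) == 0x0A000000
            or (addr & 0xFFF00000) == 0xAC100000
            or (addr & 0xFFFF0000) == 0xC0A80000)


def in_local_group(addr: int) -> bool:
    # 224.0.0.0/24, the link-local multicast scope
    return (addr & 0xFFFFFF00) == 0xE0000000


def in_any_local(addr: int) -> bool:
    return in_linklocal(addr) or in_local_group(addr)
```

For example, `in_local_group(host_order("224.0.0.22"))` is true while `in_local_group(host_order("224.0.1.1"))` is false, since only the lowest 256 multicast addresses fall in the locally scoped range.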
Gleb Smirnoff	3cf0d02480	Make it possible that carpdetach() unlocks on return. Then, in carp_clone_destroy() we are on the safe side, we don't need to unlock the cif, that can be already non-existent at this point. Reported by: Anton Yuzhaninov <citrin rambler-co.ru>	2007-01-25 18:03:40 +00:00
Gleb Smirnoff	62dae1e917	Spacing.	2007-01-25 17:58:16 +00:00
Randall Stewart	93164cf98c	- most all includes (#include <>) migrate to the sctp_os_bsd.h file - Finally all splxx() are removed - Count error fixed in mapping array which might cause a wrong cumack generation. - Invariants around panic for case D + printf when no invariants. - one-to-one model race condition fixed by using a pre-formed connection and then completing the work so accept won't happen on a non-formed association. - Some additional paranoia checks in sctp_output. - Locks that were missing in the accept code. Approved by: gnn	2007-01-18 09:58:43 +00:00
Randall Stewart	44b7479ba2	- Macroizes the V6ONLY flag check. - Added a short time wait (not used yet) constant - Corrected the type of the crc32c table (it was unsigned long and really is a uint32_t) - Got rid of the user of MHeaders until they are truly needed by lower layers. - Fixed an initialization problem in the readq structure (ordering was off). - Found yet another collision bug when the random number generator returns two numbers on one side (during a collision) that are the same. Also added some tracking of cookies that will go away when we know that we have the last collision bug gone. - Fixed an init bug for book_size_scale, that was causing Early FR code to run when it should not. - Fixed a flight size tracking bug that was associated with Early FR but due to above bug also affected all FR's - Fixed it so Max Burst also will apply to Fast Retransmit. - Fixed a bug in the temporary logging code that allowed a static log array overflow - hashinit_flags is now used. - Two last mcopym's were converted to the macro sctp_m_copym that has always been used by all other places - macro sctp_m_copym was converted to upper case. - We now validate sinfo_flags on input (we did not before). - Fixed a bug that prevented a user from sending data and immediately shutting down with one send operation. - Moved to use hashdestroy instead of free() in our macros. - Fixed an init problem in our timed_wait vtag where we did not fully initialize our time-wait blocks. - Timer stops were re-positioned. - A pcb cleanup method was added, however this probably will not be used in BSD.. unless we make module loadable protocols - I think this fixes the mysterious timer bug.. it was an ordering of locks problem in the way we did timers. It now conforms to the timeout(9) manual (except for the _drain part, we had to do this a different way due to locks).
- Fixed error return code so we get either CONNREUSED or CONNRESET depending on where one is in progression - Purged an unused clone macro. - Fixed a read error code issue where we were NOT getting the proper error when the connection was reset. Approved by: gnn	2007-01-15 15:12:10 +00:00
Maxim Konovalov	95ebcabed8	o Increment the requests counter right before actually sending out an ARP query. Otherwise the code could lead to spurious EHOSTDOWN errors. PR: kern/107807 Submitted by: Dmitrij Tejblum MFC after: 1 month	2007-01-14 18:44:17 +00:00
Warner Losh	0befead1e0	Marking this as __packed was needed to get the alignment and offset of members right. However, it also said it was aligned(1), which meant that gcc generated really bad code. Mark this as aligned(4). This makes things a little faster on arm (a couple percent), but also saves about 30k on the size of the kernel for arm. I talked about doing this with bde, but didn't check with him before the commit, so I'm hesitant to say 'reviewed by: bde'.	2007-01-12 07:23:31 +00:00
Julian Elischer	7e170af886	Remove two lines that somehow snuck back in after testing. ip is now an argument to the function ipfw_log()	2007-01-09 21:03:07 +00:00
Maxim Konovalov	8b5b885047	o One more typo in the comment. PR: kern/107609 Submitted by: Dr. Markus Waldeck	2007-01-06 13:12:24 +00:00
Paolo Pisati	3d2fff0d3d	Prevent adding a rule with a nat action in case IPFIREWALL_NAT was not defined. Reviewed: luigi	2007-01-05 12:15:31 +00:00
Paolo Pisati	61c0e134f5	Wrap ipfw nat support in a new kernel config option named "IPFIREWALL_NAT": this way nat is turned off by default and POLA is preserved. Reviewed by: rwatson	2007-01-03 11:12:54 +00:00
Julian Elischer	3b62120e87	Remove a bunch of dependencies in the IP header being the first thing in the mbuf. First moves toward being able to cope better with having layer 2 (or other encapsulation data) before the IP header in the packet being examined. More commits to come to round out this functionality. This commit should have no practical effect but clears the way for what is coming. Reviewed by: luigi, yar MFC After: 2 weeks	2007-01-02 19:57:31 +00:00
Warner Losh	6796a2d434	Fix typo in comment. Submitted by: remko	2007-01-01 00:35:34 +00:00
Warner Losh	74eb3236c7	Add comment about udp checksums being off in BSD 4.2 compatibility mode. Submitted by: Dr. Markus Waldeck PR: kern/106657	2006-12-31 21:34:53 +00:00
John Baldwin	54e3607de6	Whitespace fix and remove an extra cast.	2006-12-30 17:53:28 +00:00
Paolo Pisati	ff2f6fe80f	Summer of Code 2005: improve libalias - part 2 of 2 With the second (and last) part of my previous Summer of Code work, we get: -ipfw's in kernel nat -redirect_* and LSNAT support General information about nat syntax and some examples are available in the ipfw (8) man page. The redirect and LSNAT syntax are identical to natd, so please refer to natd (8) man page. To enable in kernel nat in rc.conf, two options were added: o firewall_nat_enable: equivalent to natd_enable o firewall_nat_interface: equivalent to natd_interface Remember to set net.inet.ip.fw.one_pass to 0, if you want the packet to continue being checked by the firewall ruleset after being (de)aliased. NOTA BENE: due to some problems with libalias architecture, in kernel nat won't work with TSO enabled nic, thus you have to disable TSO via ifconfig (ifconfig foo0 -tso). Approved by: glebius (mentor)	2006-12-29 21:59:17 +00:00
Randall Stewart	139bc87fda	a) macro-ization of all mbuf and random number access plus timers. This makes the code more portable and able to change out the mbuf or timer system used more easily ;-) b) removal of all use of pkt-hdr's until only the places we need them (before ip_output routines). c) remove a bunch of code not needed due to <b> aka worrying about pkthdr's :-) d) There was one last reorder problem, it looks, where if a restart occurs and we release and relock (at the point where we setup our alias vtag) we would end up possibly getting the wrong TSN in place. The code that fixed the TSN's just needed to be shifted around BEFORE the release of the lock.. also code that set the state (since this also could contribute). Approved by: gnn	2006-12-29 20:21:42 +00:00
John Baldwin	08651e1f24	Some whitespace nits and remove a few casts.	2006-12-29 14:58:18 +00:00
Paolo Pisati	ccd57eea11	o made in kernel libalias mpsafe o fixed a comment o made in kernel libalias a bit less verbose (disabled automatic logging every time a new link is added or deleted) Approved by: glebius (mentor)	2006-12-15 12:50:06 +00:00
Randall Stewart	a5d547add3	1) Fixes on a number of different collision case LOR's. 2) Fix all "magic numbers" to be constants. 3) A collision case that would generate two associations to the same peer due to a missing lock is fixed. 4) Added tracking of where timers are stopped. Approved by: gnn	2006-12-14 17:02:55 +00:00
Christian S.J. Peron	826cef3d75	Fix LOR between the syncache and inpcb locks when MAC is present in the kernel. This LOR snuck in with some of the recent syncache changes. To fix this, the inpcb handling was changed: - Hang a MAC label off the syncache object - When the syncache entry is initially created, the PCB lock is held because we extract information from it while initializing the syncache entry. While we do this, copy the MAC label associated with the PCB and use it for the syncache entry. - When the packet is transmitted, copy the label from the syncache entry to the mbuf so it can be processed by security policies which analyze mbuf labels. This change required that the MAC framework be extended to support the label copy operations from the PCB to the syncache entry, and then from the syncache entry to the mbuf. These functions really should be referencing the syncache structure instead of the label. However, due to some of the complexities associated with exposing this syncache structure we operate directly on its label pointer. This should be OK since we aren't making any access control decisions within this code directly, we are merely allocating and copying label storage so we can properly initialize mbuf labels for any packets the syncache code might create. This also has a nice side effect of caching. Prior to this change, the PCB would be looked up/locked for each packet transmitted. Now the label is cached at the time the syncache entry is initialized. Submitted by: andre [1] Discussed with: rwatson [1] andre submitted the tcp_syncache.c changes	2006-12-13 06:00:57 +00:00
Bjoern A. Zeeb	7d32aa0cc9	In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument. This is the "+ one more change" missed in the original commit. Noticed by: tinderbox Pointy hat to: me (#1)	2006-12-12 17:44:46 +00:00
Bjoern A. Zeeb	1d54aa3ba9	MFp4: 92972, 98913 + one more change In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument.	2006-12-12 12:17:58 +00:00
Bruce M Simpson	3dbee59bd4	Back out revision 1.264. Fixing the IP accounting issue, if we plan to do so, needs to be better thought out; the 'fix' introduces a hash lookup and a possible kernel panic. Reported by: Mark Tinguely	2006-12-10 13:44:00 +00:00
Robert Watson	ece4c06484	Improve style(9) conformance of igmp.c.	2006-12-04 00:41:48 +00:00
Warner Losh	850adc0cd7	Make sure that carp_header is 36 bytes long	2006-12-01 18:37:41 +00:00
Paolo Pisati	5910c1c1b9	Make libalias.conf parsing a bit smarter. This closes PR kern/106112. While here, add mbuf's #includes I forgot in the previous commit. Approved by: gleb	2006-12-01 16:34:53 +00:00
Paolo Pisati	e876228edc	Remove m_megapullup from ng_nat and put it under libalias. Approved by: gleb	2006-12-01 16:27:11 +00:00
Robert Watson	e3fd5ffdf1	Consistently use #ifdef INET6 rather than mixing and matching with #if defined(INET6). Don't comment the end of short #ifdef blocks. Comment cleanup. Line wrap.	2006-11-30 10:54:54 +00:00
Sam Leffler	21367f630d	Change error codes returned by protocol operations when an inpcb is marked INP_DROPPED or INP_TIMEWAIT: o return ECONNRESET instead of EINVAL for close, disconnect, shutdown, rcvd, rcvoob, and send operations o return ECONNABORTED instead of EINVAL for accept These changes should reduce confusion in applications since EINVAL is normally interpreted to mean an invalid file descriptor. This change does not conflict with POSIX or other standards I checked. The return of EINVAL has always been possible but rare; it's become more common with recent changes to the socket/inpcb handling and with finer-grained locking and preemption. Note: there are other instances of EINVAL for this state that were left unchanged; they should be reviewed. Reviewed by: rwatson, andre, ru MFC after: 1 month	2006-11-22 17:16:54 +00:00
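The new mapping from operation to error code, as listed in the message above, can be summarised in a small lookup. This is only a sketch of the described behaviour: the function name `dropped_pcb_errno` and the string operation names are hypothetical, and the fallback to `EINVAL` reflects the note that other instances were left unchanged.

```python
import errno

# Operations the message says now return ECONNRESET instead of EINVAL.
_RESET_OPS = frozenset(
    {"close", "disconnect", "shutdown", "rcvd", "rcvoob", "send"}
)


def dropped_pcb_errno(op: str) -> int:
    """Error returned for an operation on an inpcb marked
    INP_DROPPED or INP_TIMEWAIT, per the commit description."""
    if op == "accept":
        return errno.ECONNABORTED
    if op in _RESET_OPS:
        return errno.ECONNRESET
    # Other instances were left unchanged and still yield EINVAL.
    return errno.EINVAL
```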
Bjoern A. Zeeb	89e7e7e32a	Add SCTP as a known upper layer protocol over v6. We are not yet aware of the protocol internals but this way SCTP traffic over v6 will not be discarded. Reported by: Peter Lei via rrs Tested by: Peter Lei <peterlei cisco.com>	2006-11-13 19:07:32 +00:00
Randall Stewart	7f34832b95	In a true restart case, the send_lock was not being acquired. This meant that when we cleanup the outbound we may have one in transit to be added with the old sequence number. This is bad since then we lose a message :( Also the report_outbound needed to have the right lock when it's called which it did not.. I added the lock with of course a flag since we want to have the lock before we call it in the restart case. This also fixed the FIX ME case where, in the cookie collision case, we mark for retransmit any that were bundled with the cookie that was dropped. This also means changes to the output routine so we can assure getting the COOKIE-ACK sent BEFORE we retransmit the Data. Approved by: gnn	2006-11-11 22:44:12 +00:00
Randall Stewart	6a91f103b6	Turns out we would reset the TSN seq counter during a colliding INIT. This is fine except when we have data outstanding... we basically reset it to the previous value it was.. so then we end up assigning the same TSN to two different data chunks. This patch: 1) Finds a missing lock for when we change the stream numbers during COOKIE and INIT-ACK processing.. we were NOT locking the send_buffer.. which COULD cause problems (found by inspection looking for <2>) 2) Fixes a case during a colliding INIT where we incorrectly reset the sending Sequence thus in some cases duplicately assigning a TSN. 3) Additional enhancements to logging so we can see strm/tsn in the receiver AND new tracking to watch what the sender is doing with TSN and STRM seq's. Approved by: gnn	2006-11-11 15:59:01 +00:00
Randall Stewart	de0e935b29	This patch fixes a LOR that happens during INIT-ACK collision. We were calling select_a_tag() inside sctp_send_initate_ack(). During collision cases we have a stcb and thus a SCTP_LOCK. When we call select_a_tag it (below it) locks the INFO lock. We now 1) pre-select the nonce-tie-tags in sctputil.c during setup of a tcb. 2) In the other case where we have to select tags, we unlock after incr the ref cnt (so assoc won't go away) and then do the tag selection followed by a relock and decr the refcnt. Approved by: gnn	2006-11-10 13:34:55 +00:00
Randall Stewart	08598d7067	Fixes an issue with handling of stream reset. When a reset comes in we need to calculate the length and therefore the number of listed streams (if any) based on the TLV type. Otherwise if we get a retran we could in theory panic by sending a notification to a user with an incorrect list and thus no memory listing the streams. Found in IOS by devtest :-) Approved by: gnn	2006-11-09 21:01:07 +00:00
Randall Stewart	03b0b02163	-Fixes first of all the getcred on IPv6 and V4. The copy's were incorrect and so was the locking. -A bug was also found that would create a race and panic when an abort arrived on a socket being read from. -Also fix the reader to get MSG_TRUNC when a partial delivery is aborted. -Also addresses a couple of coverity caught error path memory leaks and a couple of other valid complaints Approved by: gnn	2006-11-08 00:21:13 +00:00
Joe Marcus Clarke	1bc3d4c1d1	Fix TFTP NAT support by making sure the appropriate fingerprinting checks are done. Reviewed by: piso	2006-11-07 21:06:48 +00:00
Robert Watson	b96fbb37da	Convert three new suser(9) calls introduced between when the priv(9) patch was prepared and committed to priv(9) calls. Add XXX comments as, in each case, the semantics appear to differ from the TCP/UDP versions of the calls with respect to jail, and because cr_canseecred() is not used to validate the query. Obtained from: TrustedBSD Project	2006-11-06 14:54:06 +00:00
Randall Stewart	f4ad963c9f	This change tracks down the EEOR->NonEEOR mode failure to wakeup on close of the sender. It basically moves the return (when the asoc has a reader/writer) further down and gets the wakeup and assoc appending (of the PD-API event) moved up before the return. It also moves the flag set right before the return so we can assure only once adding the PD-API events. Approved by: gnn	2006-11-06 14:34:21 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Ruslan Ermilov	9274ba8a1f	Revert previous commit, and instead make the expression in rev. 1.2 match the style of this file. OK'ed by: rrs	2006-11-05 14:36:59 +00:00
Randall Stewart	50cec91936	Tons of fixes to get all the 64bit issues removed. This also moves two 16 bit int's to become 32 bit values so we do not have to use atomic_add_16. Most of the changes are %p, casts and other various nasty's that were in the original code base. With this commit my machine will now do a build universe.. however I as yet have not tested on a 64bit machine .. it may not work :-(	2006-11-05 13:25:18 +00:00
Ruslan Ermilov	11acae799a	Fix pointer arithmetic to be 64-bit friendly.	2006-11-04 08:45:50 +00:00
Ruslan Ermilov	e349e6b8a0	Remove bogus casts that Randall for some reason didn't borrow from my supplied patch.	2006-11-04 08:19:01 +00:00
John Birrell	5051417909	Remove a bogus cast in an attempt to fix the tinderbox builds on lots of arches.	2006-11-04 05:39:39 +00:00
Randall Stewart	562a89b562	More 64 bit pointer fun. %p changed in multiple prints the mtod() was also fixed.	2006-11-03 23:04:34 +00:00
Randall Stewart	249820a7d8	Fix two of the 64bit errors on the printfs.	2006-11-03 21:19:54 +00:00
Randall Stewart	cef8ad061a	Somehow I missed this one. The sys/cdefs.h was out of order with respect to the FBSDID..	2006-11-03 19:48:56 +00:00
Randall Stewart	73932c69b6	Oops... in my fix up of all the $FreeBSD:$-> $FreeBSD$ I inserted a few to the new files.. but I failed to add the #include <sys/cdefs.h> Which causes a compile error.. sorry about that... got it now :-) Approved by: gnn	2006-11-03 17:21:53 +00:00
Randall Stewart	f8829a4a40	Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but it is also the work of Peter Lei and Michael Tuexen. They are my two key other developers working on the project.. and they need attaboys too: ** peterlei@cisco.com tuexen@fh-muenster.de ** I did do a make sysent which updated the syscalls and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0 So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP. I will see about committing some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into libs.. I will talk to George about this :-) There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a MAC but that's another beast too.. If you have a mac and want to use SCTP contact Michael he maintains a web site with a loadable module with this code :-) Reviewed by: gnn Approved by: gnn	2006-11-03 15:23:16 +00:00
Oleg Bulyzhin	35da9180dc	- Use non-recursive mutex. MTX_RECURSE is unnecessary since rev. 1.70 - Pay respect to net.isr.direct: use netisr_dispatch() instead of ip_input() Reviewed by: glebius, rwatson - purge_flow_set(): - Do not leak memory while purging queues which are not bound to pipe. - style(9) cleanup MFC after: 2 months	2006-10-29 12:09:24 +00:00
Oleg Bulyzhin	c2df509a1d	- Convert net.inet.ip.dummynet.curr_time net.inet.ip.dummynet.searches net.inet.ip.dummynet.search_steps to SYSCTL_LONG nodes. It will prevent frequent wrap around on 64bit archs. - Implement simple mechanics for dummynet(4) internal time correction. Under certain circumstances (system high load, dummynet lock contention, etc) dummynet's tick counter can be significantly slower than it should be. (I've observed up to 25% difference on one of my production servers). Since this counter is used for packet scheduling, its accuracy is vital for precise bandwidth limitation. Introduce new sysctl nodes: net.inet.ip.dummynet. tick_lost - number of ticks coalesced by taskqueue thread. tick_adjustment - number of time corrections done. tick_diff - adjusted vs non-adjusted tick counter difference tick_delta - last vs 'standard' tick difference (usec). tick_delta_sum - accumulated (and not corrected yet) time difference (usec). Reviewed by: glebius MFC after: 2 month	2006-10-27 13:05:37 +00:00
Oleg Bulyzhin	b2b05096fd	Use separate thread for servicing dummynet(4). Utilize taskqueue(9) API. Submitted by: glebius MFC after: 2 month	2006-10-27 11:16:58 +00:00
Oleg Bulyzhin	c447b19f6e	style(9) cleanup. MFC after: 2 month	2006-10-27 10:52:32 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Julian Elischer	010b65f54a	revert last change.. premature.. need to wait until if_ethersubr.c uses pfil to get to ipfw.	2006-10-21 00:16:31 +00:00
Julian Elischer	3df668cc38	Move some variables to a more likely place and remove "temporary" stuff that is not needed any more.	2006-10-20 19:32:08 +00:00
Maxim Konovalov	428b67b194	o Do not do args->f_id.addr_type == 6 when there is IS_IP6_FLOW_ID() exactly for that.	2006-10-11 12:14:28 +00:00
Maxim Konovalov	f16ccf6814	o Kill a nit in the comment.	2006-10-11 12:00:53 +00:00
Maxim Konovalov	5f197ce41e	o Extend not very informative ipfw(4) message 'drop session, too many entries' by src:port and dst:port pairs. IPv6 part is non-functional as ``limit'' does not support IPv6 flows. PR: kern/103967 Submitted by: based on Bruce Campbell patch MFC after: 1 month	2006-10-11 11:52:34 +00:00
Ruslan Ermilov	cc81ddd9db	Merge the rest of my changes.	2006-10-11 07:11:56 +00:00
Paolo Pisati	f3d9aab351	Various mdoc and grammar fixes. Approved by: glebius Reviewed by: glebius, ru	2006-10-08 13:53:45 +00:00
Bjoern A. Zeeb	7002145d8e	Set scope on MC address so IPv6 carp advertisement will not get dropped in ip6_output. In case this fails handle the error directly and log it[1]. In addition permit CARP over v6 in ip_fw2. PR: kern/98622 Similar patch by: suz Discussed with: glebius [1] Tested by: Paul.Dekkers surfnet.nl, Philippe.Pegon crc.u-strasbg.fr MFC after: 3 days	2006-10-07 10:19:58 +00:00
Gleb Smirnoff	f7a679b200	Save space on stack moving token ring stuff to its own hack block.	2006-10-04 11:08:14 +00:00
Gleb Smirnoff	9b9a52b496	Style rev. 1.152.	2006-10-04 10:59:21 +00:00
Andre Oppermann	6a7c943c59	Remove stone-aged and irrelevant "#ifndef notdef".	2006-09-29 16:44:45 +00:00
Bruce M Simpson	910e1364b6	Nits. Submitted by: ru	2006-09-29 16:16:41 +00:00
Bruce M Simpson	2d20d32344	Push removal of mrouted down to the rest of the tree.	2006-09-29 15:45:11 +00:00
Maxim Konovalov	acc03ac6bb	o Convert w/spaces to tabs in the previous commit.	2006-09-29 06:46:31 +00:00
Mike Silbersack	d4bdcb16cc	Rather than autoscaling the number of TIME_WAIT sockets to maxsockets / 5, scale it to min(ephemeral port range / 2, maxsockets / 5) so that people with large gobs of memory and/or large maxsockets settings will not exhaust their entire ephemeral port range with sockets in the TIME_WAIT state during periods of heavy load. Those who wish to tweak the size of the TIME_WAIT zone can still do so with net.inet.tcp.maxtcptw. Reviewed by: glebius, ru	2006-09-29 06:24:26 +00:00
Andre Oppermann	2c30ec0a1f	When tcp_output() receives an error upon sending a packet it reverts parts of its internal state to ignore the failed send and try again a bit later. If the error is EPERM the packet got blocked by the local firewall and the revert may cause the session to get stuck and retry indefinitely. This way we treat it like a packet loss and let the retransmit timer and timeouts do their work over time. The correct behavior is to drop a connection that gets an EPERM error. However this _may_ introduce some POLA problems and a two commit approach was chosen. Discussed with: glebius PR: kern/25986 PR: kern/102653	2006-09-28 18:02:46 +00:00
Andre Oppermann	6a2257d911	When doing TSO correctly do the check to prevent a maximum sized IP packet from overflowing.	2006-09-28 13:59:26 +00:00
Bruce M Simpson	050596b4a0	Fix the IPv4 multicast routing detach path. On interface detach whilst the MROUTER is running, the system would panic as described in the PR. The fix in the PR is a good start, however, the other state associated with the multicast forwarding cache has to be freed in order to avoid leaking memory and other possible panics. More care and attention is needed in this area. PR: kern/82882 MFC after: 1 week	2006-09-28 12:21:08 +00:00
Bruce M Simpson	d966841427	The IPv4 code should clean up multicast group state when an interface goes away. Without this change, it leaks in_multi (and often ether_multi state) if many clonable interfaces are created and destroyed in quick succession. The concept of this fix is borrowed from KAME. Detailed information about this behaviour, as well as test cases, are available in the PR. PR: kern/78227 MFC after: 1 week	2006-09-28 10:04:07 +00:00
Paolo Pisati	7c00cc76f0	Compilation.	2006-09-27 02:08:44 +00:00
Paolo Pisati	be4f3cd0d9	Summer of Code 2005: improve libalias - part 1 of 2 With the first part of my previous Summer of Code work, we get: -made libalias modular: -support for 'particular' protocols (like ftp/irc/etcetc) is no more hardcoded inside libalias, but it's available through external modules loadable at runtime -modules are available both in kernel (/boot/kernel/alias_.ko) and user land (/lib/libalias_) -protocols/applications modularized are: cuseeme, ftp, irc, nbt, pptp, skinny and smedia -added logging support for kernel side -cleanup After a buildworld, do a 'mergemaster -i' to install the file libalias.conf in /etc or manually copy it. During startup (and after every HUP signal) user land applications running the new libalias will try to read a file in /etc called libalias.conf: that file contains the list of modules to load. User land applications affected by this commit are ppp and natd: if libalias.conf is present in /etc you won't notice any difference. The only kernel land bit affected by this commit is ng_nat: if you are using ng_nat, and it doesn't correctly handle ftp/irc/etcetc sessions anymore, remember to kldload the correspondent module (i.e. kldload alias_ftp). General information and details about the inner working are available in the libalias man page under the section 'MODULAR ARCHITECTURE (AND ipfw(4) SUPPORT)'. NOTA BENE: this commit affects _ONLY_ libalias, ipfw in-kernel nat support will be part of the next libalias-related commit. Approved by: glebius Reviewed by: glebius, ru	2006-09-26 23:26:53 +00:00
John-Mark Gurney	e16fa5ca55	fix calculating to_tsecr... This prevents the rtt calculations from going all wonky...	2006-09-26 01:21:46 +00:00
Bruce M Simpson	13c8384424	Fix an incompatibility between CARP and IPv4 multicast routing, whereby the VRRPv2 advertisements will originate from the wrong source address. This only affects kernels compiled with MROUTING and after the MRT_INIT ioctl() has been issued. Set imo_multicast_vif in carp's softc to the invalid value -1 after it is zeroed by softc allocation, to stop the ip_output() path looking up the incorrect source address thinking a vif is set. PR: kern/100532 Submitted by: Bohus Plucinsky MFC after: 1 week	2006-09-25 11:53:54 +00:00
Bruce M Simpson	e2fd806b36	Spleling Submitted by: pjd	2006-09-25 11:48:07 +00:00
Bruce M Simpson	07ea6709ea	Account for output IP datagrams on the ifaddr where they originated from, not the first ifaddr on the ifp. This is similar to what NetBSD does. PR: kern/72936 Submitted by: alfred Reviewed by: andre	2006-09-25 10:11:16 +00:00
John-Mark Gurney	4dc630cdd2	if min is greater than max, prefer max over min... I managed to get a retransmit timer that was going to take 19 days to trigger... Reviewed by: silby	2006-09-25 07:22:39 +00:00
John-Mark Gurney	402865f637	now that we don't automagically increase the MTU of host routes, when we copy the loopback interface, copy its mtu also.. This means that we again have large mtu support for local ip addresses...	2006-09-23 19:24:10 +00:00
Bruce M Simpson	f1edc3bde5	Always set the IP version in the TCP input path, to preserve the header field for possible later IPSEC SPD lookup, even when the kernel is built without 'options INET6'. PR: kern/57760 MFC after: 1 week Submitted by: Joachim Schueth	2006-09-23 16:26:31 +00:00
Andre Oppermann	7ff0b850a6	Make tcp_usr_send() free the passed mbufs on error in all cases as the comment to it claims. Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-17 13:39:35 +00:00
John Hay	724e825a16	Handle a list of IPv6 src and dst addresses correctly, eg. ipfw add allow ip6 from any to 2000::/16,2002::/16 PR: 102422 (part 3) Submitted by: Andrey V. Elsukov <bu7cher at yandex dot ru> MFC after: 5 days	2006-09-16 10:27:05 +00:00
Andre Oppermann	31ecb34a4e	When doing TSO subtract hdrlen from TCP_MAXWIN to prevent ip->ip_len from wrapping when we generate a maximally sized packet for later segmentation. Noticed by: gallatin Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-15 16:08:09 +00:00
Andrey A. Chernov	239e71c612	Add missing #ifdef INET6 (can't be compiled)	2006-09-14 10:22:35 +00:00
Andre Oppermann	67d828b162	Remove unnecessary includes and follow common ordering style.	2006-09-13 13:21:17 +00:00
Andre Oppermann	bf6d304ab2	Rewrite of TCP syncookies to remove locking requirements and to enhance functionality: - Remove a rwlock acquisition/release per generated syncookie. Locking is now integrated with the bucket row locking of syncache itself and syncookies no longer add any additional lock overhead. - Syncookie secrets are different for and stored per syncache bucket row. Secrets expire after 16 seconds and are reseeded on-demand. - The computational overhead for syncookie generation and verification is one MD5 hash computation as before. - Syncache can be turned off and run with syncookies only by setting the sysctl net.inet.tcp.syncookies_only=1. This implementation extends the original idea and first implementation of FreeBSD by using not only the initial sequence number field to store information but also the timestamp field if present. This way we can keep track of the entire state we need to know to recreate the session in its original form. Almost all TCP speakers implement RFC1323 timestamps these days. For those that do not we still have to live with the known shortcomings of the ISN only SYN cookies. The use of the timestamp field causes the timestamps to be randomized if syncookies are enabled. The idea of SYN cookies is to encode and include all necessary information about the connection setup state within the SYN-ACK we send back and thus to get along without keeping any local state until the ACK to the SYN-ACK arrives (if ever). Everything we need to know should be available from the information we encoded in the SYN-ACK. A detailed description of the inner working of the syncookies mechanism is included in the comments in tcp_syncache.c. Reviewed by: silby (slightly earlier version) Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-13 13:08:27 +00:00
Christian S.J. Peron	d94f2a68f8	Introduce a new entry point, mac_create_mbuf_from_firewall. This entry point exists to allow the mandatory access control policy to properly initialize mbufs generated by the firewall. An example where this might happen is keep alive packets, or ICMP error packets in response to other packets. This takes care of kernel panics associated with uninitialized mbuf labels when the firewall generates packets. [1] I modified this patch from its original version, the initial patch introduced a number of entry points which were programmatically equivalent. So I introduced only one. Instead, we should leverage mac_create_mbuf_netlayer() which is used for similar situations, an example being icmp_error() This will minimize the impact associated with the MFC Submitted by: mlaier [1] MFC after: 1 week This is a RELENG_6 candidate	2006-09-12 04:25:13 +00:00
Andre Oppermann	384a05bfd0	Fix a NULL pointer dereference of ro->ro_rt->rt_flags by checking for the validity of ro->ro_rt first. This prevents crashing on any non-normally routed IP packet. Coverity CID: 162 (incorrectly, it was re-introduced by previous commit)	2006-09-11 19:56:10 +00:00
John-Mark Gurney	3ae2ad088e	make use of the host route's mtu for processing. This means we can now support a network w/ split mtu's by assigning each host route the correct mtu. an aspiring programmer could write a daemon to probe hosts and find out if they support a larger mtu.	2006-09-10 17:49:09 +00:00
Gleb Smirnoff	3e630ef9a9	Add a sysctl net.inet.tcp.nolocaltimewait that allows suppressing the creation of compressed TIME_WAIT states if both connection endpoints are local. Default is off.	2006-09-08 13:09:15 +00:00
Ruslan Ermilov	751dea2935	Back when we had T/TCP support, we used to apply different timeouts for TCP and T/TCP connections in the TIME_WAIT state, and we had two separate timed wait queues for them. Now that it has gone, the timeout is always 2*MSL again, and there is no reason to keep two queues (the first was unused anyway!). Also, reimplement the remaining queue using a TAILQ (it was technically impossible before, with two queues).	2006-09-07 13:06:00 +00:00
Andre Oppermann	b3c0f300fb	Second step of TSO (TCP segmentation offload) support in our network stack. TSO is only used if we are in a pure bulk sending state. The presence of TCP-MD5, SACK retransmits, SACK advertizements, IPSEC and IP options prevent using TSO. With TSO the TCP header is the same (except for the sequence number) for all generated packets. This makes it impossible to transmit any options which vary per generated segment or packet. The length of TSO bursts is limited to TCP_MAXWIN. The sysctl net.inet.tcp.tso globally controls the use of TSO and is enabled. TSO enabled sends originating from tcp_output() have the CSUM_TCP and CSUM_TSO flags set, m_pkthdr.csum_data filled with the header pseudo-checksum and m_pkthdr.tso_segsz set to the segment size (net payload size, not counting IP+TCP headers or TCP options). IPv6 currently lacks a pseudo-header checksum function and thus doesn't support TSO yet. Tested by: Jack Vogel <jfvogel-at-gmail.com> Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-07 12:53:01 +00:00
Ruslan Ermilov	3c89486cc7	Remove a microoptimization for i386 that was a micropessimization for amd64.	2006-09-07 09:49:08 +00:00
Andre Oppermann	233dcce118	First step of TSO (TCP segmentation offload) support in our network stack. o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6 o add CSUM_TSO flag to mbuf pkthdr csum_flags field o add tso_segsz field to mbuf pkthdr o enhance ip_output() packet length check to allow for large TSO packets o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities o adjust all callers of tcp_maxmtu[46]() accordingly Discussed on: -current, -net Sponsored by: TCP/IP Optimization Fundraise 2005	2006-09-06 21:51:59 +00:00
Andre Oppermann	6fbfd5825f	Check inp_flags instead of inp_vflag for INP_ONESBCAST flag. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-06 19:04:36 +00:00
Andre Oppermann	773725a255	Fix the socket option IP_ONESBCAST by giving it its own case in ip_output() and skip over the normal IP processing. Add a supporting function ifa_ifwithbroadaddr() to verify and validate the supplied subnet broadcast address. PR: kern/99558 Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru> Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 days	2006-09-06 17:12:10 +00:00
Gleb Smirnoff	2c857a9be9	o Backout rev. 1.125 of in_pcb.c. It appeared to behave extremely badly under high load. For example with 40k sockets and 25k tcptw entries, connect() syscall can run for seconds. Debugging showed that it iterates the cycle millions of times and purges thousands of tcptw entries at a time. Besides practical unusability this change is architecturally wrong. First, in_pcblookup_local() is used in connect() and bind() syscalls. Purging of stale entries shouldn't be done here. Second, it is a layering violation. o Restore the tcptw purging cycle in tcp_timer_2msl_tw(), which was removed in rev. 1.78 by rwatson. The commit log of this revision tells nothing about the reason the cycle was removed. Now we need this cycle, since the major cleaner of stale tcptw structures is removed. o Disable probably necessary, but now unused tcp_twrecycleable() function. Reviewed by: ru	2006-09-06 13:56:35 +00:00
Gleb Smirnoff	c3e07bf82a	Finally fix rev. 1.256 Pointy hat to: glebius	2006-09-05 14:00:59 +00:00
Gleb Smirnoff	23ebab416c	Remove extra parenthesis in last commit. Nitpicked by: ru	2006-09-05 12:22:54 +00:00
Gleb Smirnoff	1f1f90c3a7	- Make net.inet.tcp.maxtcptw modifiable at run time. - If net.inet.tcp.maxtcptw was ever set explicitly, do not change it if kern.ipc.maxsockets is changed.	2006-09-05 12:08:47 +00:00
Thomas Quinot	d438d81581	Fix typo in comment.	2006-09-04 08:32:17 +00:00
John Hay	1c31b456b9	Recognise IPv6 PIM packets. MFC after: 1 week	2006-08-31 16:56:45 +00:00
Mohan Srinivasan	2374501ca4	Fix for a bug that causes the computation of "len" in tcp_output() to get messed up, resulting in an inconsistency between the TCP state and so_snd.	2006-08-26 17:53:19 +00:00
Julian Elischer	afad78e259	comply with style police Submitted by: ru MFC after: 1 month	2006-08-18 22:36:05 +00:00
Julian Elischer	c487be961a	Allow ipfw to forward to a destination that is specified by a table. for example: fwd tablearg ip from any to table(1) where table 1 has entries of the form: 1.1.1.0/24 10.2.3.4 208.23.2.0/24 router2 This allows trivial implementation of a secondary routing table implemented in the firewall layer. I expect more work (under discussion with Glebius) to follow this to clean up some of the messy parts of ipfw related to tables. Reviewed by: Glebius MFC after: 1 month	2006-08-17 22:49:50 +00:00
Julian Elischer	b7522c27d2	Remove the IPFIREWALL_FORWARD_EXTENDED option and make it on by default as it always was in older versions of FreeBSD. This option is pointless as it is needed in just about every interesting usage of forward that I have ever seen. It doesn't make the system any safer and just wastes huge amounts of developer time when the system doesn't behave as expected when code is moved from 4.x to 6.x or 7.x Reviewed by: glebius MFC after: 1 week	2006-08-17 00:37:03 +00:00
Mohan Srinivasan	464469c713	Fixes an edge case bug in timewait handling where ticks rolling over causing the timewait expiry to be exactly 0 corrupts the timewait queues (and that entry). Reviewed by: silby	2006-08-11 21:15:23 +00:00
Brooks Davis	43bc7a9c62	With exception of the if_name() macro, all definitions in net_osdep.h were unused or already in if_var.h so add if_name() to if_var.h and remove net_osdep.h along with all references to it. Longer term we may want to kill off if_name() entirely since all modern BSDs have if_xname variables rendering it unnecessary.	2006-08-04 21:27:40 +00:00
Oleg Bulyzhin	0e0b1bb57a	Remove useless NULL pointer check: we are using M_WAITOK flag for memory allocation. Submitted by: Andrey Elsukov <bu7cher at yandex dot ru> Approved by: glebius (mentor) MFC after: 1 week	2006-08-04 10:50:51 +00:00
Robert Watson	e850475248	Move soisdisconnected() in tcp_discardcb() to one of its calling contexts, tcp_twstart(), but not to the other, tcp_detach(), as the socket is already being torn down and therefore there are no listeners. This avoids a panic if kqueue state is registered on the socket at close(), and eliminates two XXX comments. There is one case remaining in which tcp_discardcb() reaches up to the socket layer as part of the TCP host cache, which would be good to avoid. Reported by: Goran Gajic <ggajic at afrodita dot rcub dot bg dot ac dot yu>	2006-08-02 16:18:05 +00:00
Oleg Bulyzhin	9b1858ca78	Do not leak memory while flushing rules. Noticed by: yar Approved by: glebius (mentor) MFC after: 1 week	2006-08-02 14:58:51 +00:00


3008 Commits