freebsd-dev

Author	SHA1	Message	Date
Randall Stewart	d6af161a34	- Out with some printfs. - Fix an initialization of last_tsn_used - Fix handling of mapped IPv4 addresses Obtained from: Michael Tuexen and I :-) MFC after: 1 week	2008-07-29 09:06:35 +00:00
Alexander Motin	18f401c664	Some style and assertion fixes to the previous commits hinted by rwatson. There are no functional changes.	2008-07-28 06:57:28 +00:00
Alexander Motin	d185578a78	According to in_pcb.h, protocol binding information has double locking. This allows access to it while traversing the list holding only the global pcbinfo lock.	2008-07-27 20:48:22 +00:00
Alexander Motin	e2ed8f3514	Increase UDBHASHSIZE from 16 to 128 items. The previous value was chosen 10 years ago and is not very effective now. This change gives a speedup of several percent on 1000 L2TP mpd links.	2008-07-26 23:07:34 +00:00
Alexander Motin	0ca3b0967b	According to in_pcb.h, protocol binding information has double locking. This allows access to it while traversing the list holding only the global pcbinfo lock. This relaxed locking noticeably increases receive socket lookup performance.	2008-07-26 21:12:00 +00:00
Alexander Motin	9ed324c9a5	Add hash table lookup for fully connected raw sockets. This gives significant performance improvements when many raw sockets are used. Benchmarks of mpd handling 1000 simultaneous PPTP connections show up to a 50% performance boost. With a higher number of connections the benefit becomes even bigger. PopTop and others should also get some benefits.	2008-07-26 17:32:15 +00:00
Tai-hwa Liang	df9cf830d1	Trying to fix compilation bustage: - removing 'const' qualifier from an input parameter to conform to the type required by rw_assert(); - using in_addr->s_addr to retrieve the 32-bit address value. Observed by: tinderbox	2008-07-22 04:23:57 +00:00
Kip Macy	9d29c635da	make new accessor functions consistent with existing style	2008-07-21 22:11:39 +00:00
Kip Macy	84330faa64	- Switch to INP_WLOCK macro from inp_wlock - calling sodisconnect after tcp_twstart is both gratuitous and unsafe - remove Submitted by: rwatson	2008-07-21 21:22:56 +00:00
Kip Macy	b1f8bd6464	Add versions of tcp_twstart, tcp_close, and tcp_drop that hide the acquisition of the tcbinfo lock. MFC after: 1 week	2008-07-21 02:23:02 +00:00
Kip Macy	409d8ba5c7	add interface for external consumers to syncache_expand - rename syncache_add in a manner consistent with other bits intended for offload	2008-07-21 02:11:06 +00:00
Kip Macy	dd0e6c383a	Add accessor functions for socket fields. MFC after: 1 week	2008-07-21 00:49:34 +00:00
Kip Macy	9378e4377f	add inpcb accessor functions for fields needed by TOE devices	2008-07-21 00:08:34 +00:00
Tom Rhodes	41698ebf5b	Document a few sysctls. Reviewed by: rwatson	2008-07-20 15:29:58 +00:00
Bjoern A. Zeeb	8699ea087e	ia is a pointer, thus use NULL rather than 0 for initialization and in comparisons to make this more obvious. MFC after: 5 days	2008-07-20 12:31:36 +00:00
Kip Macy	b1bc0b2a86	remove unused toedev functions and add comments for rest	2008-07-20 02:02:50 +00:00
David Malone	744eaff7e6	Add an accept filter for TCP based DNS requests. It waits until the whole first request is present before returning from accept.	2008-07-18 14:44:51 +00:00
Robert Watson	3b19fa3597	Eliminate use of the global ripsrc which was being used to pass address information from rip_input() to rip_append(). Instead, pass the source address for an IP datagram to rip_append() using a stack-allocated sockaddr_in, similar to udp_input() and udp_append(). Prior to the move to rwlocks for inpcbinfo, this was not a problem, as use of the global was synchronized using the ripcbinfo mutex, but with read-locking there is the potential for a race during concurrent receive. This problem is not present in the IPv6 raw IP socket code, which already used a stack variable for the address. Spotted by: mav MFC after: 1 week (before inpcbinfo rwlock changes)	2008-07-18 10:47:07 +00:00
Robert Watson	ca528788b8	Fix error in comment. MFC after: 3 weeks	2008-07-16 10:55:50 +00:00
Robert Watson	43cc0bc1df	Merge last of a series of rwlock conversion changes to UDP, which completes the move to a fully parallel UDP transmit path by using global read, rather than write, locking of inpcbinfo in further semi-connected cases: - Add macros to allow try-locking of inpcb and inpcbinfo. - Always acquire an inpcb read lock in udp_output(), which stabilizes the local inpcb address and port bindings in order to determine what further locking is required: - If the inpcb is currently not bound (at all) and we are implicitly connecting, we require inpcbinfo and inpcb write locks, so drop the read lock and re-acquire. - If the inpcb is bound for at least one of the port or address, but an explicit source or destination is requested, trylock the inpcbinfo lock, and if that fails, drop the inpcb lock, lock the global lock, and relock the inpcb lock. - Otherwise, no further locking is required (common case). - Update comments. In practice, this means that the vast majority of consumers of UDP sockets will not acquire any exclusive locks at the socket or UDP levels of the network stack. This leads to a marked performance improvement in several important workloads, including BIND, nsd, and memcached over UDP, as well as significant improvements in pps microbenchmarks. The plan is to MFC all of the rwlock changes to RELENG_7 once they have settled for a few weeks in the tree. Tested by: ps, kris (older revision), bde MFC after: 3 weeks	2008-07-15 15:38:47 +00:00
Rui Paulo	b27227029b	Fix typo in comment. M tcp_output.c	2008-07-15 10:32:35 +00:00
Ermal Luçi	7972c979c5	Fix carp(4) panics that can occur during carp interface configuration. Approved by: mlaier (mentor) Reported by: Scott Ullrich MFC after: 1 week	2008-07-14 20:11:51 +00:00
Robert Watson	3144b7d3d3	Slightly rearrange validation of UDP arguments and jail processing in udp_output() so that argument validation occurs before jail processing. Add additional comments explaining what's going on when we process addresses and binding during udp_output(). MFC after: 3 weeks	2008-07-10 16:20:18 +00:00
Bjoern A. Zeeb	078b704233	Pass the ucred along into in{,6}_pcblookup_local for upcoming prison checks. Reviewed by: rwatson	2008-07-10 13:31:11 +00:00
Bjoern A. Zeeb	cdcb11b92c	For consistency take lport as u_short in in{,6}_pcblookup_local. All callers either pass in an u_short or u_int16_t. Reviewed by: rwatson	2008-07-10 13:23:22 +00:00
Robert Watson	1175d9d56d	Apply the MAC label to an outgoing UDP packet when other inpcb properties are processed, meaning that we avoid the cost of MAC label assignment if we're going to drop the packet due to mbuf exhaustion, etc. MFC after: 3 weeks	2008-07-10 09:45:28 +00:00
Bjoern A. Zeeb	e5cf427baf	For consistency with the rest of the function use the locally cached pointer pcbinfo rather than inp->inp_pcbinfo. MFC after: 3 weeks	2008-07-09 19:03:06 +00:00
Randall Stewart	fc14de76f4	1) Adds the rest of the VIMAGE change macros 2) Adds some __UserSpace__ on some of the common defines that the user space code needs 3) Fixes a bug when we send up data to a user that failed. We need to a) trim off the data chunk headers, if present, and b) make sure the frag bit is communicated properly for the msgs coming off the stream queues... i.e. we see if some of the msg has been taken. Obtained from: jeli contributed the VIMAGE changes on this pass Thanks Julian!	2008-07-09 16:45:30 +00:00
Robert Watson	7b709f8ad4	Provide some initial chicken-scratching annotations of locking for struct inpcb. Prodded by: bz MFC after: 3 days	2008-07-08 17:22:59 +00:00
Robert Watson	ac9ae27991	Allow udp_notify() to accept read, as well as write, locks on the passed inpcb. When directly invoking udp_notify() from udp_ctlinput(), acquire only a read lock; we may still see write locks in udp_notify() as the in_pcbnotifyall() routine is shared with TCP and always uses a write lock on the inpcb being notified. MFC after: 1 month	2008-07-07 12:27:55 +00:00
Robert Watson	c4d585aefe	Add additional udbinfo and inpcb locking assertions to udp_output(); for some code paths, global or inpcb write locks are required, but for other code paths, read locks or no locking at all are sufficient for the data structures. MFC after: 1 month	2008-07-07 12:14:10 +00:00
Robert Watson	948d0fc926	First step towards parallel transmit in UDP: if neither a specific source nor a specific destination address is requested as part of a send on a UDP socket, read lock the inpcb rather than write lock it. This will allow fully parallel transmit down to the IP layer when sending simultaneously from multiple threads on a connected UDP socket. Parallel transmit for more complex cases, such as when sendto(2) is invoked with an address and there's already a local binding, will follow. MFC after: 1 month	2008-07-07 10:56:55 +00:00
Robert Watson	10cc62b7a6	Drop read lock on udbinfo earlier during delivery to the last matching UDP socket for a datagram; the inpcb read lock is sufficient to provide inpcb stability during udp_append(). MFC after: 1 month	2008-07-07 09:26:52 +00:00
Robert Watson	cec9ffee22	Rename raw_append() to rip_append(): the raw_ prefix is generally used for functions in the generic raw socket library (raw_cb.c, raw_usrreq.c), and they are not used for IPv4 raw sockets. MFC after: 3 days	2008-07-05 18:55:03 +00:00
Robert Watson	0ae76120da	Improve approximation of style(9) in raw socket code.	2008-07-05 18:03:39 +00:00
Oleksandr Tymoshenko	06a37c4203	Enqueue de-capsulated packet instead of performing direct dispatch. It's possible to exhaust and garble stack with a packet that contains a couple of hundred nested encapsulation levels. Submitted by: Ming Fu <fming@borderware.com> Reviewed by: rwatson PR: kern/85320	2008-07-04 21:01:30 +00:00
Robert Watson	59dd72d040	Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers. Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acquisition of Giant and any calling context. It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported. Reviewed by: bz MFC after: 1 month X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.	2008-07-04 00:21:38 +00:00
Bjoern A. Zeeb	62ee136457	Remove a bogusly introduced rtalloc_ign() in rev. 1.335/SVN 178029, generating an RTM_MISS for every IP packet forwarded making user space routing daemons unhappy. PR: kern/123621, kern/124540, kern/122338 Reported by: Paul <paul gtcomm.net>, Mike Tancsa <mike sentex.net> on net@ Tested by: Paul and Mike Reviewed by: andre MFC after: 3 days	2008-07-03 12:44:36 +00:00
Robert Watson	5df3e83946	Add soreceive_dgram(9), an optimized socket receive function for use by datagram-only protocols, such as UDP. This version removes use of sblock(), which is not required due to an inability to interlace data improperly with datagrams, as well as avoiding some of the larger loops and state management that don't apply on datagram sockets. This is experimental code, so hook it up only for UDPv4 for testing; if there are problems we may need to revise it or turn it off by default, but it offers significant performance improvements for threaded UDP applications such as BIND9, nsd, and memcached using UDP. Tested by: kris, ps	2008-07-02 23:23:27 +00:00
Robert Watson	119d85f6e0	In udp_append() and udp_input(), make use of read locking on inpcbs rather than write locking: while we need to maintain a valid reference to the inpcb and fix its state, no protocol layer state is modified during an IPv4 UDP receive -- there are only changes at the socket layer, which is separately protected by socket locking. While parallel concurrent receive on a single UDP socket is currently relatively unusual, introducing read locking in the transmit path, allowing concurrent receive and transmit, will significantly improve performance for loads such as BIND, memcached, etc. MFC after: 2 months Tested by: gnn, kris, ps	2008-06-30 18:26:43 +00:00
Oleksandr Tymoshenko	cf77b84879	In case of interface initialization failure remove struct in_ifaddr* from in_ifaddrhashtbl in in_ifinit because error handler in in_control removes entries only for AF_INET addresses. If in_ifinit is called for the cloned interface that has just been created its address family is not AF_INET and therefore LIST_REMOVE is not called for respective LIST_INSERT_HEAD and freed entries remain in in_ifaddrhashtbl and lead to memory corruption. PR: kern/124384	2008-06-24 13:58:28 +00:00
Alexander Motin	48ca67bea6	Partially revert previous commit. DeleteLink() does not delete permanent links so we should be aware of it and try to delete every link only once or we will loop forever.	2008-06-22 11:39:42 +00:00
Alexander Motin	ea29dd9241	Implement UDP transparent proxy support. PR: bin/54274 Submitted by: Nicolai Petri <nicolai@petri.cc>	2008-06-21 20:18:57 +00:00
Alexander Motin	b46d3e21bb	Add support for PORT/EPRT FTP commands in lowercase. Use strncasecmp() instead of huge local implementation to reduce code size. Check space presence after command/code. PR: kern/73034	2008-06-21 16:22:56 +00:00
Stephan Uphoff	606a2669cf	Change incorrect stale cookie detection in syncookie_lookup() that prematurely declared a cookie as expired. Reviewed by: andre@, silby@ Reported by: Yahoo!	2008-06-16 20:08:22 +00:00
Stephan Uphoff	104ac85378	Fix a check in SYN cache expansion (syncache_expand()) to accept packets that arrive in the receive window instead of just on the left edge of the receive window. This is needed for correct behavior when packets are lost or reordered. PR: kern/123950 Reviewed by: andre@, silby@ Reported by: Yahoo!, Wang Jin MFC after: 1 week	2008-06-16 19:56:59 +00:00
Randall Stewart	97a7b90ff3	More prep for Vimage: - only one function to destroy an SCTP stack sctp_finish() - Make it so this function also arranges for any threads created by the image to do a kthread_exit()	2008-06-15 12:31:23 +00:00
Randall Stewart	9b02321796	- Fixes foobar on my part. Some missing virtualization macros from specific logging cases.	2008-06-14 13:24:49 +00:00
Randall Stewart	b3f1ea41fd	- Macro-izes the packed declaration in all headers. - Vimage prep - these are major restructures to move all global variables to be accessed via a macro or two. The variables all go into a single structure. - Asconf address addition tweaks (add_or_del Interfaces) - Fix rwnd calculation to be more conservative. - Support SACK_IMMEDIATE flag to skip delayed sack by demand of peer. - Comment updates in the sack mapping calculations - Invariants panic added. - Pre-support for UDP tunneling (we can do this on MAC but will need added support from UDP to get a "pipe" of UDP packets in). - clear trace buffer sysctl added when local tracing on. Note the majority of this huge patch is all the vimage prep stuff :-)	2008-06-14 07:58:05 +00:00
Jack F Vogel	6c5087a818	Add generic TCP LRO into netinet	2008-06-11 22:12:50 +00:00
Max Laier	1ead26d4e1	Sort IP addresses before hashing them for the signature. Otherwise carp is sensitive to address configuration order. PR: kern/121574 Reported by: Douglas K. Rand, Wouter de Jong Obtained from: OpenBSD (rev 1.114 + fixes) MFC after: 2 weeks	2008-06-02 18:58:07 +00:00
Robert Watson	53640b0e3a	When allocating temporary storage to hold a TCP/IP packet header template, use an M_TEMP malloc(9) allocation rather than an mbuf with mtod(9) and dtom(9). This eliminates the last use of dtom(9) in TCP. MFC after: 3 weeks	2008-06-02 14:20:26 +00:00
Alexander Motin	ef30318ee9	Increase LINK_TABLE_OUT_SIZE from 101 to 4001 like LINK_TABLE_IN_SIZE to reduce performance degradation under heavy outgoing scan/flood. Scalability is now much more important than several kilobytes of RAM. Remove unneeded TCP-specific expiration handling. Before this, connected TCP sessions could never expire. Now connected TCP sessions will expire after 24 hours of inactivity. Simplify HouseKeeping() to avoid several mul/div-s per packet. Taking into account increased LINK_TABLE_OUT_SIZE, precision is still much more than required.	2008-06-01 18:34:58 +00:00
Alexander Motin	efc66711f9	Make m_megapullup() more intelligent: - to increase performance do not reallocate mbuf when possible, - to support up to 16K packets (was 2K max) use mbuf cluster of proper size. This change depends on recent ng_nat and ip_fw_nat changes.	2008-06-01 17:52:40 +00:00
Alexander Motin	1913488d10	PKT_ALIAS_FOUND_HEADER_FRAGMENT result is not an error, so pass that packet. This fixes packet fragmentation handling. Pass the really available buffer size to libalias instead of the MCLBYTES constant. The MCLBYTES constant was used in the belief that m_megapullup() always moves data into a fresh cluster, which is not always the case.	2008-06-01 12:29:23 +00:00
Alexander Motin	aac54f0a70	Fix packet fragmentation support broken by copy/paste error in rev.1.60. ip_id should be u_short, not u_char.	2008-06-01 11:47:04 +00:00
Robert Watson	c28cb4d82f	Read lock rather than write lock TCP inpcbs in monitoring sysctls. In some cases, add explicit inpcb locking rather than relying on the global lock, as we dereference inp_socket, but also allowing us to drop the global lock more quickly. MFC after: 1 week	2008-05-29 14:28:26 +00:00
Robert Watson	9622e84fcf	Employ read locks on UDP inpcbs, rather than write locks, when monitoring UDP connections using sysctls. In some cases, add previously missing locking of inpcbs, as inp_socket is followed, which also allows us to drop global locks more quickly. MFC after: 1 week	2008-05-29 08:27:14 +00:00
Bjoern A. Zeeb	9a38ba8101	Factor out the v4-only vs. the v6-only inp_flags processing in ip6_savecontrol in preparation for udp_append() to no longer need a WLOCK as we will no longer be modifying socket options. Requested by: rwatson Reviewed by: gnn MFC after: 10 days	2008-05-24 15:20:48 +00:00
Robert Watson	22c82719cf	Consistently check IPFW and DUMMYNET privileges in the configuration routines for those modules, rather than in the raw socket code. This allows each privilege check to occur in exactly one place and avoids duplicate checks across layers. MFC after: 3 weeks Sponsored by: nCircle Network Security, Inc.	2008-05-22 08:10:31 +00:00
Randall Stewart	d61374e183	- sctputil.c - If debug is on, the INPKILL timer can deref a freed value. Change so that we save off a type field for display and NULL inp just for good measure. - sctp_output.c - Fix it so in sending to the loopback we use the src address of the inbound INIT. We don't want to do this for non local addresses since otherwise we might be ingress filtered so we need to use the best src address and list the address sent to. Obtained from: time bug - Neil Wilson MFC after: 1 week	2008-05-21 16:51:21 +00:00
Randall Stewart	c54a18d26b	- Adds support for the multi-asconf (From Kozuka-san) - Adds some prepwork (Not all yet) for vimage in particular support the delete the sctppcbinfo.xx structs. There is still a leak in here if it were to be called plus we still need the regrouping (From Me and Michael Tuexen) - Adds support for UDP tunneling. For BSD there is no socket yet setup so its disabled, but major argument changes are in here to encompass the passing of the port number (zero when you don't have a udp tunnel, the default for BSD). Will add some hooks in UDP here shortly (discussed with Robert) that will allow easy tunneling. (Mainly from Peter Lei and Michael Tuexen with some BSD work from me :-D) - Some ease for windows, evidently leave is reserved by their compiler; move label leave: -> out: MFC after: 1 week	2008-05-20 13:47:46 +00:00
Randall Stewart	bfefd19036	- Define changes in sctp.h - Bug in CA that does not get us incrementing the PBA properly which made us more conservative. - comment updated in sctp_input.c - memsets added before we log - added arg to hmac id's MFC after: 2 weeks	2008-05-20 09:51:36 +00:00
George V. Neville-Neil	fff0ededf8	Fix the loopback interface. Cleaning up some code with new macros was a tad too aggressive. PR: kern/123568 Submitted by: Vladimir Ermakov <samflanker at gmail dot com> Obtained from: antoine	2008-05-12 02:44:53 +00:00
Julian Elischer	8b07e49a00	Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). 
The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extend that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). 
These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. If nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility called setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. 
A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being responded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect with the process that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routed accordingly. ipfw has grown 2 new keywords: setfib N ip from any to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. 
I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibility of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the destination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlaier (parts each) Obtained from: Ironport systems/Cisco	2008-05-09 23:03:00 +00:00
John Baldwin	790fce68dd	Always bump tcpstat.tcps_badrst if we get a RST for a connection in the syncache that has an invalid SEQ instead of only doing it when we succeed in mallocing space for the log message. MFC after: 1 week Reviewed by: sam, bz	2008-05-08 22:21:09 +00:00
Kip Macy	8ab7ce7c61	replace spaces added in last change with tabs	2008-05-05 23:13:27 +00:00
Kip Macy	535fbad68f	add rcv_nxt, snd_nxt, and toe offload id to FreeBSD-specific extension fields for tcp_info	2008-05-05 20:13:31 +00:00
Dmitry Morozovsky	03bc210eb9	Fix build, together with a bit of style breakage.	2008-05-02 18:54:36 +00:00
Robert Watson	bcf5b9fa38	Fix a comment typo. MFC after: 3 days	2008-04-29 21:21:15 +00:00
Robert Watson	9ad11dd8a4	With IPv4 raw sockets, read lock rather than write lock the inpcb when receiving or transmitting. With IPv6 raw sockets, read lock rather than write lock the inpcb when receiving. Unfortunately, IPv6 source address selection appears to require a write lock on the inpcb for the time being. MFC after: 3 months	2008-04-21 12:06:41 +00:00
Robert Watson	3656a4fe2e	Read lock, rather than write lock, the inpcb when transmitting with or delivering to an IP divert socket. MFC after: 3 months	2008-04-21 12:03:59 +00:00
Bjoern A. Zeeb	032fae41d4	Revert to rev. 1.161 - switch back to optimized TCP options ordering. A lot of testing has shown that the problem people were seeing was due to invalid padding after the end of option list option, which was corrected in tcp_output.c rev. 1.146. Thanks to: anders@, s3raphi, Matt Reimer Thanks to: Doug Hardie and Randy Rose, John Mayer, Susan Guzzardi Special thanks to: dwhite@ and BitGravity Discussed with: silby MFC after: 1 day	2008-04-20 18:36:59 +00:00
Robert Watson	fdd9b0723e	Teach pf and ipfw to use read locks in inpcbs rather than write locks when reading credential data from sockets. Teach pf to unlock the pcbinfo more quickly once it has acquired an inpcb lock, as the inpcb lock is sufficient to protect the reference. Assert locks, rather than read locks or write locks, on inpcbs in subroutines--this is necessary as the inpcb may be passed down with a write lock from the protocol, or may be passed down with a read lock from the firewall lookup routine, and either is sufficient. MFC after: 3 months	2008-04-20 00:21:54 +00:00
Robert Watson	baa45840d7	In ip_output(), allow a read lock as well as a write lock when asserting a lock on the passed inpcb. MFC after: 3 months	2008-04-19 14:35:17 +00:00
Robert Watson	a69042a5be	When querying the local or foreign address from an IP socket, acquire only a read lock on the inpcb. When an external module requests a read lock, acquire only a read lock. MFC after: 3 months	2008-04-19 14:34:38 +00:00
Kip Macy	73a0d5896e	move tcbinfo lock acquisition in to syncache	2008-04-19 03:39:17 +00:00
Kip Macy	46b0a854cc	move cxgb_lt2.[ch] from NIC to TOE move most offload functionality from NIC to TOE factor out all socket and inpcb direct access factor out access to locking in inpcb, pcbinfo, and sockbuf	2008-04-19 03:22:43 +00:00
George V. Neville-Neil	0327aeb9e3	Add in check for loopback as well, which was missing from the original patch. PR: 120958 Submitted by: James Snow <snow at teardrop.org> MFC after: 2 weeks	2008-04-17 23:24:58 +00:00
Robert Watson	8501a69cc9	Convert pcbinfo and inpcb mutexes to rwlocks, and modify macros to explicitly select write locking for all use of the inpcb mutex. Update some pcbinfo lock assertions to assert locked rather than write-locked, although in practice almost all uses of the pcbinfo rwlock remain exclusive, and all instances of inpcb lock acquisition are exclusive. This change should introduce (ideally) little functional change. However, it lays the groundwork for significantly increased parallelism in the TCP/IP code. MFC after: 3 months Tested by: kris (superset of committed patch)	2008-04-17 21:38:18 +00:00
George V. Neville-Neil	6b9ff6b7a7	Clean up the code that checks the types of address so that it is done by understandable macros. Fix the bug that prevented the system from responding on interfaces with link local addresses assigned. PR: 120958 Submitted by: James Snow <snow at teardrop.org> MFC after: 2 weeks	2008-04-17 12:50:42 +00:00
Randall Stewart	5e2c2d872b	Allow SCTP to compile without INET6. PR: 116816 Obtained from tuexen@fh-muenster.de: MFC after: 2 weeks	2008-04-16 17:24:18 +00:00
Randall Stewart	eadccaccf0	Use the pru_flush infrastructure to avoid a panic PR: 122710 MFC after: 1 week	2008-04-14 18:13:33 +00:00
Randall Stewart	c40e9cf2c1	Protection against errant sender sending a stream seq number out of order with no missing TSN's (a cisco box has this problem which will make a ssn be held forever). MFC after: 1 week	2008-04-14 14:34:29 +00:00
Randall Stewart	2a3eb019db	New logging values.	2008-04-14 14:33:07 +00:00
Randall Stewart	45ccc1a635	1) adds some additional logging 2) changes to use a inqueue_bytes calculated value in max_len calc's. MFC after: 1 week	2008-04-14 14:32:32 +00:00
Qing Li	e440aed958	This patch provides the back end support for equal-cost multi-path (ECMP) for both IPv4 and IPv6. Previously, multipath route insertion was disallowed. For example, route add -net 192.103.54.0/24 10.9.44.1 route add -net 192.103.54.0/24 10.9.44.2 The second route insertion will trigger an error message of "add net 192.103.54.0/24: gateway 10.2.5.2: route already in table" Multiple default routes can also be inserted. Here is the netstat output: default 10.2.5.1 UGS 0 3074 bge0 => default 10.2.5.2 UGS 0 0 bge0 When multipath routes exist, the "route delete" command requires a specific gateway to be specified or else an error message would be displayed. For example, route delete default would fail and trigger the following error message: "route: writing to routing socket: No such process" "delete net default: not in table" On the other hand, route delete default 10.2.5.2 would be successful: "delete net default: gateway 10.2.5.2" One does not have to specify a gateway if there is only a single route for a particular destination. I need to perform more testing on address aliases and multiple interfaces that have the same IP prefixes. This patch as it stands today is not yet ready for prime time. Therefore, the ECMP code fragments are fully guarded by the RADIX_MPATH macro. Include the "options RADIX_MPATH" in the kernel configuration to enable this feature. Reviewed by: robert, sam, gnn, julian, kmacy	2008-04-13 05:45:14 +00:00
Bjoern A. Zeeb	b835b6fe2b	Take the route mtu into account, if available, when sending an ICMP unreach, frag needed. Up to now we only looked at the interface MTU. Make sure to only use the minimum of the two. In case IPSEC is compiled in, loop the mtu through ip_ipsec_mtu() to avoid any further conditional maths. Without this, PMTU was broken in those cases when there was a route with a lower MTU than the MTU of the outgoing interface. PR: kern/122338 Tested by: Mark Cammidge mark peralex.com Reviewed by: silence on net@ MFC after: 2 weeks	2008-04-09 05:17:18 +00:00
Andre Oppermann	3a4018c4e8	Remove TCP options ordering assumptions in tcp_addoptions(). Ordering was changed in rev. 1.161 of tcp_var.h. All option now test for sufficient space in TCP header before getting added. Reported by: Mark Atkinson <atkin901-at-yahoo.com> Tested by: Mark Atkinson <atkin901-at-yahoo.com> MFC after: 1 week	2008-04-07 19:09:23 +00:00
Andre Oppermann	5b2e33eab5	Remove now unnecessary comment.	2008-04-07 18:50:05 +00:00
Andre Oppermann	c343c524e1	Use #defines for TCP options padding after EOL to be consistent. Reviewed by: bz	2008-04-07 18:43:59 +00:00
Robert Watson	7a3244ccb7	Add further TCP inpcb locking assertions to some TCP input code paths. MFC after: 1 month	2008-04-07 12:41:45 +00:00
Robert Watson	f457d58098	In in_pcbnotifyall() and in6_pcbnotify(), use LIST_FOREACH_SAFE() and eliminate unnecessary local variable caching of the list head pointer, making the code a bit easier to read. MFC after: 3 weeks	2008-04-06 21:20:56 +00:00
Ruslan Ermilov	ea26d58729	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.	2008-03-25 09:39:02 +00:00
Kip Macy	e79dd20dd5	change inp_wlock_assert to inp_lock_assert	2008-03-24 20:24:04 +00:00
Kip Macy	8815ab518a	Label inp as unused in the non-INVARIANTS case	2008-03-24 00:29:01 +00:00
Kip Macy	3d5853271e	Insulate inpcb consumers outside the stack from the lock type and offset within the pcb by adding accessor functions. Reviewed by: rwatson MFC after: 3 weeks	2008-03-23 22:34:16 +00:00
Paolo Pisati	63bea44682	Explicitate the newpacket size. Bug pointed out by: many Pointy hat to: me :(	2008-03-19 11:28:13 +00:00
Paolo Pisati	8368edc123	Don't cache ptr to nat rule in case of tablearg argument. Bug spotted by: Dyadchenko Mihail	2008-03-17 23:02:56 +00:00
Paolo Pisati	f6efbc8842	Don't abuse stack space while in kernel land, use heap instead.	2008-03-17 22:08:31 +00:00
Robert Watson	c2877015a1	Fix indentation for a closing brace in in_pcballoc(). MFC after: 3 days	2008-03-17 13:04:56 +00:00
Bjoern A. Zeeb	9e3bdede0f	Correct IPsec behaviour with a 'use' level in SP but no SA available. In that case return and continue processing the packet without IPsec. PR: 121384 MFC after: 5 days Reported by: Cyrus Rahman (crahman gmail.com) Tested by: Cyrus Rahman (crahman gmail.com) [slightly older version]	2008-03-14 16:38:11 +00:00
Paolo Pisati	ab0fcfd00a	-Don't pass down the entire pkt to ProtoAliasIn, ProtoAliasOut, FragmentIn and FragmentOut. -Axe the old PacketAlias API: it has been deprecated since 5.x.	2008-03-12 11:58:29 +00:00
Bjoern A. Zeeb	413deb1262	Padding after EOL option must be zeros according to RFC793 but the NOPs used are 0x01. While we could simply pad with EOLs (which are 0x00), rather use an explicit 0x00 constant there to not confuse people with 'EOL padding'. Put in a comment saying just that. Problem discussed on: src-committers with andre, silby, dwhite as follow up to the rev. 1.161 commit of tcp_var.h. MFC after: 11 days	2008-03-09 13:26:50 +00:00
Paolo Pisati	4741f3a109	MFP4: restrict the utilization of direct pointers to the content of ip packet. These modifications are functionally nop()s thus can be merged with no side effects.	2008-03-06 21:50:41 +00:00
Rui Paulo	1cf6e4f5ff	Change the default port range for outgoing connections by introducing IPPORT_EPHEMERALFIRST and IPPORT_EPHEMERALLAST with values 10000 and 65535 respectively. The rationale behind is that it makes the attacker's life more difficult if he/she wants to guess the ephemeral port range and also lowers the probability of a port collision (described in draft-ietf-tsvwg-port-randomization-01.txt). While there, remove code duplication in in_pcbbind_setup(). Submitted by: Fernando Gont <fernando at gont.com.ar> Approved by: njl (mentor) Reviewed by: silby, bms Discussed on: freebsd-net	2008-03-04 19:16:21 +00:00
Paolo Pisati	31937d2fb0	When unloading kld, don't forget to flush the nat pointers.	2008-03-03 22:32:01 +00:00
Paolo Pisati	2b40ce00a5	Raise a bit ipfw kld priority. Discussed on: net-, ipfw-.	2008-03-03 10:12:46 +00:00
Bjoern A. Zeeb	c3b02504bc	Some "cleanup" of tcp_mss(): - Move the assignment of the socket down before we first need it. No need to do it at the beginning and then drop out the function by one of the returns before using it 100 lines further down. - Use t_maxopd which was assigned the "tcp_mssdflt" for the correct AF already instead of another #ifdef ? : #endif block doing the same. - Remove an unneeded (duplicate) assignment of mss to t_maxseg just before we possibly change mss and re-do the assignment without using t_maxseg in between. Reviewed by: silby No objections: net@ (silence) MFC after: 5 days	2008-03-02 08:40:47 +00:00
Bjoern A. Zeeb	af92e6cf95	Fix indentation (whitespace changes only). MFC after: 6 days	2008-03-01 22:27:15 +00:00
Paolo Pisati	531c890b8a	Move ipfw's nat code into its own kld: ipfw_nat.	2008-02-29 22:27:19 +00:00
David Malone	2b2c3b23d1	Dummynet has a limit of 100 slots queue size (or 1MB, if you give the limit in bytes) hard coded into both the kernel and userland. Make both these limits a sysctl, so it is easy to change the limit. If the userland part of ipfw finds that the sysctls don't exist, it will just fall back to the traditional limits. (100 packets is quite a small limit these days. If you want to test TCP at 100Mbps, 100 packets can only accommodate a BDP of 12ms.) Note these sysctls in the man page and warn against increasing them without thinking first. MFC after: 3 weeks	2008-02-27 13:52:33 +00:00
Paolo Pisati	f94a7fc0b5	Add table/tablearg support to ipfw's nat. MFC After: 1 week	2008-02-24 15:37:45 +00:00
Mike Silbersack	ea346b19cc	Change FreeBSD 7 so that it returns TCP options in the same order that FreeBSD 6 and before did. Doug White and the other bloodhounds at ISC discovered that while FreeBSD 7's ordering of options was more efficient, it caused some cable modem routers to ignore the SYN-ACKs ordered in this fashion. The placement of sackOK after the timestamp option seems to be the critical difference: FreeBSD 6: <mss 1460,nop,wscale 1,nop,nop,timestamp 3512155768 0,sackOK,eol> FreeBSD 7.0: <mss 1460,nop,wscale 3,sackOK,timestamp 1370692577 0> FreeBSD 7.0 + this change: <mss 1460,nop,wscale 3,nop,nop,timestamp 7371813 0,sackOK,eol> MFC after: 1 week	2008-02-24 05:13:20 +00:00
Randall Stewart	7a846e9ad8	Fixes a memory leak when VRF's are in play. Submitted by: Prasad Narasimha (snprasad@cisco.com) Reviewed by: rrs	2008-02-22 15:08:10 +00:00
Randall Stewart	69d5ee4f23	- Takes out stray ifdef code that should not have been present.	2008-02-22 15:06:25 +00:00
Gleb Smirnoff	e60a0104f8	If the vhid already present, return EEXIST instead of non-informative EINVAL.	2008-02-07 13:18:59 +00:00
Gleb Smirnoff	3a2f50140c	Remove unused structure member from struct in_ifadown_arg.	2008-02-07 11:26:52 +00:00
Mike Silbersack	361021cc6e	Replace the random IP ID generation code we obtained from OpenBSD with an algorithm suggested by Amit Klein. The OpenBSD algorithm has a few flaws; see Amit's paper for more information. For a description of how this algorithm works, please see the comments within the code. Note that this commit does not yet enable random IP ID generation by default. There are still some concerns that doing so will adversely affect performance. Reviewed by: rwatson MFC After: 2 weeks	2008-02-06 15:40:30 +00:00
Bjoern A. Zeeb	c26fe973a3	Rather than passing around a cached 'priv', pass in an ucred to ipsec_set_policy and do the privilege check only if needed. Try to assimilate both ip_ctloutput code blocks calling ipsec*_set_policy. Reviewed by: rwatson	2008-02-02 14:11:31 +00:00
Robert Watson	265de5bb62	Correct two problems relating to sorflush(), which is called to flush read socket buffers in shutdown() and close(): - Call socantrcvmore() before sblock() to dislodge any threads that might be sleeping (potentially indefinitely) while holding sblock(), such as a thread blocked in recv(). - Flag the sblock() call as non-interruptible so that a signal delivered to the thread calling sorflush() doesn't cause sblock() to fail. The sblock() is required to ensure that all other socket consumer threads have, in fact, left, and do not enter, the socket buffer until we're done flushing it. To implement the latter, change the 'flags' argument to sblock() to accept two flags, SBL_WAIT and SBL_NOINTR, rather than one M_WAITOK flag. When SBL_NOINTR is set, it forces a non-interruptible sx acquisition, regardless of the setting of the disposition of SB_NOINTR on the socket buffer; without this change it would be possible for another thread to clear SB_NOINTR between when the socket buffer mutex is released and sblock() is invoked. Reviewed by: bz, kmacy Reported by: Jos Backus <jos at catnook dot com>	2008-01-31 08:22:24 +00:00
Randall Stewart	3ca1bceea5	- Fix a comment about prison. - Fix it so the VRF is captured while locks are held. MFC after: 1 week	2008-01-28 10:34:38 +00:00
Randall Stewart	bf949ea2d4	- Change back to using priority 0, which means don't change the priority when running the thread. (This is for the sctp_iterator thread.) MFC after: 1 week	2008-01-28 10:33:41 +00:00
Randall Stewart	257438fb6c	- Fix a bug where the socket may have been closed which could cause a crash in the auth code. Obtained from: Michael Tuexen MFC after: 1 week	2008-01-28 10:31:12 +00:00
Randall Stewart	f36d98069e	- Fixes a comparison wrap issue with sack gap ack blocks that span the 32 bit roll over mark.	2008-01-28 10:25:43 +00:00
Robert Watson	bb5081a7eb	Hide ipfw internal data structures behind IPFW_INTERNAL rather than exposing them to all consumers of ip_fw.h. These structures are used in both ipfw(8) and ipfw(4), but not part of the user<->kernel interface for other applications to use, rather, shared implementation. MFC after: 3 days Reported by: Paul Vixie <paul at vix dot com>	2008-01-25 14:38:27 +00:00
Bjoern A. Zeeb	79ba395267	Replace the last susers calls in netinet6/ with privilege checks. Introduce a new privilege allowing to set certain IP header options (hop-by-hop, routing headers). Leave a few comments to be addressed later. Reviewed by: rwatson (older version, before addressing his comments)	2008-01-24 08:25:59 +00:00
Bjoern A. Zeeb	107d12440a	Differentiate between addifaddr and delifaddr for the privilege check. Reviewed by: rwatson MFC after: 2 weeks	2008-01-24 08:14:38 +00:00
Robert Watson	109058b094	tcp_usrreq.c:1.313 removed tcbinfo locking from tcp_usr_accept(), which while in principle a good idea, opened us up to a race inherent to the syncache's direct insertion of incoming TCP connections into the "completed connection" listen queue, as it transpires that the socket is inserted before the inpcb is fully filled in by syncache_expand(). The bug manifested with the occasional returning of 0.0.0.0:0 in the address returned by the accept() system call, which occurred if accept managed to execute tcp_usr_accept() before syncache_expand() had copied the endpoint addresses into inpcb connection state. Re-add tcbinfo locking around the address copyout, which has the effect of delaying the copy until syncache_expand() has finished running, as it is run while the tcbinfo lock is held. This is undesirable in that it increases contention on tcbinfo further, but a more significant change will be required to how the syncache inserts new sockets in order to fix this and keep more granular locking here. In particular, either more state needs to be passed into sonewconn() so that pru_attach() can fill in the fields before the socket is inserted, or the socket needs to be inserted in the incomplete connection queue until it is actually ready to be used. Reported by: glebius (and kris) Tested by: glebius	2008-01-23 21:15:51 +00:00
Robert Watson	1e8f5ffa35	In tcp_ctloutput(), don't hold the inpcb lock over sooptcopyin(), rather, drop the lock and then re-acquire it, revalidating TCP connection state assumptions when we do so. This avoids a potential lock order reversal (and potential deadlock, although none have been reported) due to the inpcb lock being held over a page fault. MFC after: 1 week PR: 102752 Reviewed by: bz Reported by: Václav Haisman <v dot haisman at sh dot cvut dot cz>	2008-01-18 12:19:50 +00:00
Julian Elischer	b6ae6984e8	Don't duplicate the whole of arpresolve to arpresolve 2 for the sake of two compares against 0. The negative effect of cache flushing is probably more than the gain by not doing the two compares (the value is almost certainly in register or at worst, cache). Note that the uses of m_freem() are in error cases and m_freem() handles NULL anyhow. So fast-path really isn't changed much at all.	2007-12-31 23:48:06 +00:00
Oleg Bulyzhin	5254af0cf1	Workaround p->numbytes overflow, which can result in infinite loop inside dummynet module (prerequisite is using queues with "fat" pipe). PR: kern/113548	2007-12-25 09:36:51 +00:00
Robert Watson	0bffde27b2	When IPSEC fails to allocate policy state for an inpcb, and MAC is in use, free the MAC label on the inpcb before freeing the inpcb. MFC after: 3 days Submitted by: tanyong <tanyong at ercist dot iscas dot ac dot cn>, zhouzhouyi	2007-12-22 10:06:11 +00:00
Ruslan Ermilov	9eb1b6aabb	Fix bugs in the TCP syncache timeout code. including: When system ticks are positive, for entries in the cache bucket, syncache_timer() ran on every tick (doing nothing useful) instead of the supposed 3, 6, 12, and 24 seconds later (when it's time to retransmit SYN,ACK). When ticks are negative, syncache_timer() was scheduled for the too far future (up to ~25 days on systems with HZ=1000), no SYN,ACK retransmits were attempted at all, and syncache entries added in that period that correspond to non-established connections stay there forever. Only HEAD and RELENG_7 are affected. Reviewed by: silby, kmacy (earlier version) Submitted by: Maxim Dounin, ru	2007-12-19 16:56:28 +00:00
Kip Macy	d29a9a83fd	Remove extraneous debug statements. Noticed by: Andrey Chernov	2007-12-19 05:17:40 +00:00
Kip Macy	bc65987ade	Incorporate TCP offload hooks in to core TCP code. - Rename output routines tcp_gen_* -> tcp_output_*. - Rename notification routines that turn in to no-ops in the absence of TOE from tcp_gen_* -> tcp_offload_*. - Fix some minor comment nits. - Add a /* FALLTHROUGH */ Reviewed by: Sam Leffler, Robert Watson, and Mike Silbersack	2007-12-18 22:59:07 +00:00
Randall Stewart	83073fcba3	- sctp-iterator should run at PI_NET priority ...not 0. MFC after: 1 week	2007-12-18 01:24:15 +00:00
Kip Macy	8b5709dfab	incorporate feedback since initial commit - rename tcp_ofld.[ch] to tcp_offload.[ch] - document usage and locking conventions of the functions in the toe_usrreqs function vector - document tcpcb, inpcb, and socket fields used by toe - widen the listen interface into 2 functions - rename DISABLE_TCP_OFFLOAD to TCP_OFFLOAD_DISABLE - shrink conditional compilation to reduce the likelihood of bitrot - replace sc->sc_toepcb checks in tcp_syncache.c with TOEPCB_ISSET	2007-12-17 07:56:27 +00:00
Kip Macy	29910a5a77	widen the routing event interface (arp update, redirect, and eventually pmtu change) into separate functions revert previous commit's changes to arpresolve and add a new interface arpresolve2 which does arp resolution without an mbuf	2007-12-17 07:40:34 +00:00
Kip Macy	58505389d1	Don't panic in arpresolve if we're given a null mbuf. We could insist that the caller just pass in an initialized mbuf even if didn't have any data - but that seems rather contrived.	2007-12-17 04:19:25 +00:00
Kip Macy	bdca760906	Update tod_connect call to reflect updated interface	2007-12-16 07:37:48 +00:00
Kip Macy	b3e761e5c8	Move arp update upcall to always be called for ARP replies - previous invocation would not always get called at the appropriate times	2007-12-16 06:42:33 +00:00
Kip Macy	a9420d282f	Update the toedev's connect interface to reflect the fact that the inpcb doesn't cache the rtentry in HEAD.	2007-12-16 05:30:21 +00:00
Kip Macy	ee939bbf7e	Add socket option for setting and retrieving the congestion control algorithm. The name used is to allow compatibility with Linux.	2007-12-16 03:30:07 +00:00
Kip Macy	9f117e1062	make naming prefixes consistent across tom_info	2007-12-15 20:20:08 +00:00
Kip Macy	0005682030	Fix error in previous commit - the style fix changed flag name without changing references to the flag	2007-12-13 01:24:20 +00:00
Kip Macy	76b262c426	Fix style issues with initial TCP offload commit Requested by: rwatson Submitted by: rwatson	2007-12-12 23:31:49 +00:00
Kip Macy	8e7e854cd6	add interface for allowing consumers to register for ARP updates, redirects, and path MTU changes Reviewed by: silby	2007-12-12 20:53:25 +00:00
Kip Macy	284333d353	Add interface for tcp offload to syncache: - make necessary changes to release offload resources when a syncache entry is removed before connection establishment - disable checks for offloaded connection where insufficient information is available Reviewed by: silby	2007-12-12 20:35:59 +00:00
Kip Macy	620721db82	Add driver independent interface to offload active established TCP connections Reviewed by: silby	2007-12-12 20:21:39 +00:00
Kip Macy	4f1efccf29	Remove spurious timestamp check. RFC 1323 explicitly states that timestamps MAY be transmitted if negotiated.	2007-12-12 06:11:50 +00:00
David Malone	71bd9b9cf9	If we are walking the IPv6 header chain and we hit an IPPROTO_NONE header, then don't try to pullup anything, because there is no next header if we hit IPPROTO_NONE. Set ulp to a non-NULL value so the search for an upper layer header terminates. This is based on Pekka's diagnosis, but I chose a simpler fix. PR: 115261 Submitted by: Pekka Savola <pekkas@netcore.fi> Reviewed by: mlaier MFC after: 2 weeks	2007-12-09 15:35:09 +00:00
Kip Macy	2de2af32a0	Add padding for anticipated functionality - vimage - TOE - multiq - host rtentry caching Rename spare used by 80211 to if_llsoftc Reviewed by: rwatson, gnn MFC after: 1 day	2007-12-07 01:46:13 +00:00
Randall Stewart	41eee5558c	- More fixes for lock misses on the transfer of data to the sent_queue. Sometimes I wonder why any code ever works :-) - Fix the pad of the last mbuf routine, It was working improperly on non-4 byte aligned chunks which could cause memory overruns. MFC after: 1 week	2007-12-07 01:32:14 +00:00
Dag-Erling Smørgrav	6c7faee24f	Simpler version of the previous commit.	2007-12-06 09:31:13 +00:00
Randall Stewart	9c04b2966d	- optimize the initialization of the SB max variables. - Missing lock when sending data and moving it to the outqueue. - If a mbuf alloc fails during moving to outqueue the reassembly of the old mbuf chain was incorrect. - some_taken becomes a counter in sctputil.c instead of a set to 1. - Fix a panic to be only under invariants and have a proper recovery. - msg_flags needed to be set to the value collected not or'd. MFC after: 1 week	2007-12-06 00:22:55 +00:00
Randall Stewart	2aedc03dad	- More fixes for the non-blocking msg send, had the skip of the pre-block test incorrect. - Fix the initial buf calculation to be more friendly, calc is the same but we use different variable to make it easier amongst the different code versions. MFC after: 1 week	2007-12-04 20:20:42 +00:00
Randall Stewart	0e81d2ed7a	- Oops, signedness issue with one of the new var's (this is an issue mainly in apple but with the right -Wall it could affect us too). MFC after: 1 week	2007-12-04 14:47:39 +00:00
Randall Stewart	9f22f50039	- Found a problem in non-blocking sends. When sending, once the locks are all unlocked to do the copy's in, it's possible that other events could then raise the number of bytes outstanding pushing it so not all the message would fit. This would then cause us to send only part of the message. This fix makes it so we keep a "reserved" amount that can be kept in mind when making calculations to send. - rcv msg args with a NULL/NULL for to/tolen will return an error incorrectly for the 1-2-1 model. - We were not doing 0 len return correctly and not setting cantrcv more correctly. Previously we "fixed" this area by taking out the socantrcv since we then could not get the data out. The correct fix is to still flag the socket but allow a by-pass route to continue to read until all data is consumed. MFC after: 1 week	2007-12-04 14:41:48 +00:00
Yaroslav Tykhiy	3affb6fb19	For the sake of convenience, print the name of the network interface IPv4 address duplication was detected on. Idea by: marck	2007-12-04 13:01:12 +00:00
Mike Silbersack	136286a141	Fix SACK negotiation that was broken in rev 1.105. Before this fix, FreeBSD would negotiate SACK on outgoing connections, but would always fail to negotiate it on incoming connections. Discovered by: James Healy and Lawrence Stewart Submitted by: James Healy and Lawrence Stewart MFC after: 3 days	2007-12-04 07:11:13 +00:00
Guido van Rooij	d23d475fb4	Consider the following situation: 1. A packet comes in that is to be forwarded 2. The destination of the packet is rewritten by some firewall code 3. The next link's MTU is too small 4. The packet has the DF bit set Then the current code is such that instead of setting the next link's MTU in the ICMP error, ip_next_mtu() is called and a guess is sent as to which MTU is supposed to be tried next. This is because in this case ip_forward() is called with srcrt set to 1. In that case the ia pointer remains NULL but it is needed to get the MTU of the interface the packet is to be sent out from. Thus, we always set ia to the outgoing interface. MFC after: 2 weeks	2007-12-02 13:00:47 +00:00
Bjoern A. Zeeb	ee763d0d9c	Centralize and correct computation of TCP-MD5 signature offset within the packet (tcp header options field). Reviewed by: tools/regression/netinet/tcpconnect MFC after: 3 days Tested by: Nick Hilliard (see net@)	2007-11-30 23:46:51 +00:00
Bjoern A. Zeeb	beb8b626d1	Move call to tcp_signature_compute() after we adjusted the payload offset in the tcp header. With relevant parts of the tcp header changing after the 'signature' was computed, the signature becomes invalid. Reviewed by: tools/regression/netinet/tcpconnect MFC after: 3 days Tested by: Nick Hilliard (see net@)	2007-11-30 23:41:51 +00:00
Bjoern A. Zeeb	4a411b9fcc	Let opt be an array. Though &opt[0] == opt == &opt, &opt is highly confusing and hard to understand so change it to just opt and remove the extra cast no longer/not needed. Discussed with: rwatson MFC after: 3 days	2007-11-28 13:33:27 +00:00
Bjoern A. Zeeb	abebe6db7a	Correctly get the authentication key for TCP-MD5 from the SA. Submitted by: Nick Hilliard on net@ MFC after: 8 weeks	2007-11-28 13:23:50 +00:00
Robert Watson	2b19cb1b87	More carefully handle various cases in sysctl_drop(), such as unlocking the inpcb when there's an inpcb without associated timewait state, and not unlocking when the inpcb has been freed. This avoids a kernel panic when tcpdrop(8) is run on a socket in the TIMEWAIT state. MFC after: 3 days Reported by: Rako <rako29 at gmail dot com>	2007-11-24 18:43:59 +00:00
John Birrell	962e1ce30f	Fix strict alias warnings.	2007-11-23 23:56:03 +00:00
Bjoern A. Zeeb	9ad0173df1	Make TSO work with IPSEC compiled into the kernel. The lookup hurts a bit for connections but had been there anyway if IPSEC was compiled in. So moving the lookup up a bit gives us TSO support at not extra cost. PR: kern/115586 Tested by: gallatin Discussed with: kmacy MFC after: 2 months	2007-11-21 22:30:14 +00:00
Mike Silbersack	1b67beea13	Comment out the syncache's test which ensures that hosts which negotiate TCP timestamps in the initial SYN packet actually use them in the rest of the connection. Unfortunately, during the 7.0 testing cycle users have already found network devices that violate this constraint. RFC 1323 states 'and may send a TSopt in other segments' rather than 'and MUST send', so we must allow it. Discovered by: Rob Zietlow Tracked down by: Kip Macy PR: bin/118005	2007-11-20 06:56:04 +00:00
Oleg Bulyzhin	8d1e3aed2d	- New sysctl variable: net.inet.ip.dummynet.io_fast If it is set to zero value (default) dummynet module will try to emulate real link as close as possible (bandwidth & latency): packet will not leave pipe faster than it should be on real link with given bandwidth. (This is original behaviour of dummynet which was altered in previous commit) If it is set to non-zero value only bandwidth is enforced: packet's latency can be lower comparing to real link with given bandwidth. - Document recently introduced dummynet(4) sysctl variables. Requested by: luigi, julian MFC after: 3 month	2007-11-17 21:54:57 +00:00
Randall Stewart	81aca91ab6	- Fix a bug in sctp_calc_rwnd() which resulted in wrong rwnd predictions. - Fix a signedness problem that shows up in some 64 bit platforms (macos). MFC after: 1 week	2007-11-10 00:47:14 +00:00
Oleg Bulyzhin	897c0f57d4	1) dummynet_io() declaration has changed. 2) Alter packet flow inside dummynet: allow certain packets to bypass dummynet scheduler. Benefits are: - lower latency: if packet flow does not exceed pipe bandwidth, packets will not be (up to tick) delayed (due to dummynet's scheduler granularity). - lower overhead: if packet avoids dummynet scheduler it shouldn't reenter ip stack later. Such packets can be fastforwarded. - recursion (which can lead to kernel stack exhaustion) eliminated. This fixes a long-standing panic, which can be triggered this way: kldload dummynet sysctl net.inet.ip.fw.one_pass=0 ipfw pipe 1 config bw 0 for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done ping -c 1 localhost 3) Three new sysctl nodes are added: net.inet.ip.dummynet.io_pkt - packets passed to dummynet net.inet.ip.dummynet.io_pkt_fast - packets avoided dummynet scheduler net.inet.ip.dummynet.io_pkt_drop - packets dropped by dummynet P.S. Above comments are true only for layer 3 packets. Layer 2 packet flow is not changed yet. MFC after: 3 month	2007-11-06 23:01:42 +00:00
Oleg Bulyzhin	e793482352	style(9) cleanup. MFC after: 3 month	2007-11-06 22:53:41 +00:00
Randall Stewart	fb8fb8f815	- Change the Time Wait of vtags value to match the cookie-life - Select a tag gains ability to optionally save new tags off in the timewait system. - When looking up associations do not give back a stcb that is in the about-to-be-freed state, and instead continue looking for other candidates. - New function to query to see if value is in time-wait. - Timewait had a time comparison error that caused very few vtags to actually stay in time-wait. - When setting tags in time-wait, we now use the time requested NOT a fixed constant value. - sstat now gets the proper associd when we do the query. - When we process an association, we expect the tag chosen (if we have one from a cookie) to be in time-wait. Before we would NOT allow the assoc up by checking if its good. In theory this should have caused almost all assoc not to come up except for the time-comparison bug above (this bug was hidden by the time comparison bug :-D). - Don't save tags for nonce values in the time-wait cache since these are used only during cookie collisions and do not matter if they are unique or not. MFC after: 1 week	2007-10-30 14:09:24 +00:00
Robert Watson	a13e21f7bc	Continue to move from generic network entry points in the TrustedBSD MAC Framework by moving from mac_mbuf_create_netlayer() to more specific entry points for specific network services: - mac_netinet_firewall_reply() to be used when replying to in-bound TCP segments in pf and ipfw (etc). - Rename mac_netinet_icmp_reply() to mac_netinet_icmp_replyinplace() and add mac_netinet_icmp_reply(), reflecting that in some cases we overwrite a label in place, but in others we apply the label to a new mbuf. Obtained from: TrustedBSD Project	2007-10-28 17:12:48 +00:00
Robert Watson	b9b0dac33b	Move towards more explicit support for various network protocol stacks in the TrustedBSD MAC Framework: - Add mac_atalk.c and add explicit entry point mac_netatalk_aarp_send() for AARP packet labeling, rather than using a generic link layer entry point. - Add mac_inet6.c and add explicit entry point mac_netinet6_nd6_send() for ND6 packet labeling, rather than using a generic link layer entry point. - Add explicit entry point mac_netinet_arp_send() for ARP packet labeling, and mac_netinet_igmp_send() for IGMP packet labeling, rather than using a generic link layer entry point. - Remove previous generic link layer entry point, mac_mbuf_create_linklayer() as it is no longer used. - Add implementations of new entry points to various policies, largely by replicating the existing link layer entry point for them; remove old link layer entry point implementation. - Make MAC_IFNET_LOCK(), MAC_IFNET_UNLOCK(), and mac_ifnet_mtx global to the MAC Framework rather than static to mac_net.c as it is now needed outside of mac_net.c. Obtained from: TrustedBSD Project	2007-10-28 15:55:23 +00:00
Robert Watson	8640764682	Rename 'mac_mbuf_create_from_firewall' to 'mac_netinet_firewall_send' as we move towards netinet as a pseudo-object for the MAC Framework. Rename 'mac_create_mbuf_linklayer' to 'mac_mbuf_create_linklayer' to reflect general object-first ordering preference. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-26 13:18:38 +00:00
Robert Watson	02be6269c3	Normalize TCP syncache-related MAC Framework entry points to match most other entry points in the form mac_<object>_method(). Discussed with: csjp Obtained from: TrustedBSD Project	2007-10-25 14:37:37 +00:00
Robert Watson	30d239bc4c	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer	2007-10-24 19:04:04 +00:00
Julian Elischer	3745c395ec	Rename the kthread_xxx (e.g. kthread_create()) calls to kproc_xxx as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.	2007-10-20 23:23:23 +00:00
Rui Paulo	bf37f5b05f	Remove IPTOS_CE and IPTOS_ECT constants. They were defined in RFC 2481 but later obsoleted by RFC 3168. Discussed on freebsd-net with no objections. Approved by: njl (mentor), rwatson	2007-10-19 12:46:15 +00:00
Mike Silbersack	9b3bc6bf83	Pick the smallest possible TCP window scaling factor that will still allow us to scale up to sb_max, aka kern.ipc.maxsockbuf. We do this because there are broken firewalls that will corrupt the window scale option, leading to the other endpoint believing that our advertised window is unscaled. At scale factors larger than 5 the unscaled window will drop below 1500 bytes, leading to serious problems when traversing these broken firewalls. With the default maxsockbuf of 256K, a scale factor of 3 will be chosen by this algorithm. Those who choose a larger maxsockbuf should watch out for the compatibility problems mentioned above. Reviewed by: andre	2007-10-19 08:53:14 +00:00
Randall Stewart	b201f5360c	- fix sctp_ifn initial refcount issue (prevents deletion) - fix a bug during cookie collision that prevented an association from coming up in a specific restart case. - Fix it so the shutdown-pending flag gets removed (this is more for correctness than needed) when we enter shutdown-sent or shutdown-ack-sent states. - Fix a bug that caused the receiver to sometimes NOT send a SACK when a duplicate TSN arrived. Without this fix it was possible for the association to fall down if the - Deleted primary destination is also stored when SCTP_MOBILITY_BASE. (Previously, it is stored when only SCTP_MOBILITY_FASTHANDOFF) - Fix a locking issue where we might call send_initiate_ack() and incorrectly state the lock held/not held. Also fix it so that when we release the lock the inp cannot be deleted on us. - Add the debug option that can cause the stack to panic instead of aborting an assoc. This does not and should never show up in options but is useful for debugging unexpected aborts. - Add cumack_log sent to track sending cumack information for the debug case where we are running a special log per assoc. - Added extra () around sctp_sbspace macro to avoid compile warnings. MFC after: 1 week	2007-10-16 14:05:51 +00:00
Kevin Lo	976b010645	Spelling fix for interupt -> interrupt	2007-10-12 06:03:46 +00:00
Mike Silbersack	4b421e2daa	Add FBSDID to all files in netinet so that people can more easily include file version information in bug reports. Approved by: re (kensmith)	2007-10-07 20:44:24 +00:00
Mike Silbersack	e31d8aa3da	Improve the debugging message: TCP: [X.X.X.X]:X to [X.X.X.X]:X tcpflags 0x18<PUSH,ACK>; tcp_do_segment: FIN_WAIT_2: Received data after socket was closed, sending RST and removing tcpcb So that it also includes how many bytes of data were received. It now looks like this: TCP: [X.X.X.X]:X to [X.X.X.X]:X tcpflags 0x18<PUSH,ACK>; tcp_do_segment: FIN_WAIT_2: Received X bytes of data after socket was closed, sending RST and removing tcpcb Approved by: re (gnn)	2007-10-07 00:07:27 +00:00
Randall Stewart	8d3b5e7afe	- Fix the one-2-one model to properly do a socantrecv() Approved by: re@freeBSD.org (Ken Smith)	2007-10-06 13:23:42 +00:00
Robert Watson	0fb651b1c4	Disable TCP syncache debug logging by default. While useful in debugging problems with the syncache, it produces a lot of console noise and has led to quite a few false positive bug reports. It can be selectively re-enabled when debugging specific problems by frobbing the same sysctl. Discussed with: silby Approved by: re (gnn)	2007-10-05 22:39:44 +00:00
Randall Stewart	7924093f84	- We should return error = 0 and the upper processing would return a zero length read. Otherwise we don't return the right error indication. Approved by: re@freebsd.org (gnn)	2007-10-04 09:29:33 +00:00
Randall Stewart	d55b0b1b09	- Bug fix managing congestion parameter on immediate retransmission by handover event (fast mobility code) - Fixed problem of mobility code which is caused by remaining parameters in the deleted primary destination. - Add a missing lock. When a peer sends an INIT, and while we are processing it to send an INIT-ACK the socket is closed, we did not hold a lock to keep the socket from going away. Add protection for this case. - Fix so that arwnd always uses the minimal rwnd if the user has set the socket buffer smaller. Found this when the test org decided to see what happens when you set in a rwnd of 10 bytes (which is not allowed per RFC .. 4k is minimum). - Fixes so a cookie-echo ootb will NOT cause an abort to be sent. This was happening in a MPI collision case. - Examined all panics and unless there was no recovery, moved any that were not already to INVARIANTS. Approved by: re@freebsd.org (gnn)	2007-10-01 03:22:29 +00:00
Maxim Konovalov	eeb36ca3d5	o For dynamic rules log a parent rule number. Prefix a log message by 'ipfw: '. PR: kern/115755 Submitted by: sem Approved by: re (gnn) MFC after: 4 weeks	2007-09-29 15:01:41 +00:00
Konstantin Belousov	586b4a0e50	Revert rev. 1.94. After recent tcp backouts, tcp_close() may return NULL. Check the return value of tcp_close() being NULL before dereferencing it in #ifdef TCPDEBUG block. Reviewed by: rwatson Approved by: re (gnn)	2007-09-24 14:46:27 +00:00
Mike Silbersack	e2f2059f68	Two changes: - Reintegrate the ANSI C function declaration change from tcp_timer.c rev 1.92 - Reorganize the tcpcb structure so that it has a single pointer to the "tcp_timer" structure which contains all of the tcp timer callouts. This change means that when the single tcp timer change is reintegrated, tcpcb will not change in size, and therefore the ABI between netstat and the kernel will not change. Neither of these changes should have any functional impact. Reviewed by: bmah, rrs Approved by: re (bmah)	2007-09-24 05:26:24 +00:00
Christian S.J. Peron	bc60490a88	Certain consumers of rtalloc like gif(4) and if_stf(4) lookup the route and once they are done with it, call rtfree(). rtfree() should only be used when we are certain we hold the last reference to the route. This bug results in console messages like the following: rtfree: 0xc40f7000 has 1 refs This patch switches the rtfree() to use RTFREE_LOCKED() instead, which should handle the reference counting on the route better. Approved by: re@ (gnn) Reviewed by: bms Reported by: many via net@ and current@ Tested by: many	2007-09-23 17:50:17 +00:00
Randall Stewart	baf3da661c	- fix (global) address handling in the presence of duplicates, the last interface should own the address, but the current code fumbles the handoff. This fixes that. - move address related debugs to PCB4 and add additional ones to help in debugging address problems. Approved by: re@freebsd.org (K Smith)	2007-09-21 04:19:33 +00:00
Randall Stewart	c99efcf633	- The address lock is changed to a rwlock. This also involves macro changes to have a RLOCK and a WLOCK and placing the correct version within the code. - The INP-INFO lock is changed to a rwlock. - When sctp_shutdown() is called on Mac OS X, the socket lock is held. So call sctp_chunk_output with SCTP_SO_LOCKED and not SCTP_SO_NOT_LOCKED. - Add SCTP_IPI_ADDR_[RW]LOCK and SCTP_IPI_ADDR_[RW]UNLOCK for Mac OS X. - u_int64_t -> uint64_t - add missing addr unlock for error return path Approved by: re@freebsd.org (K Smith)	2007-09-18 15:16:39 +00:00
Randall Stewart	0dc12c958a	- For the 1-to-1 model, fix an off by one error that allowed an extra connection over the backlog (by one) Approved by: re@freebsd.org (B. Mah)	2007-09-16 23:03:38 +00:00
Randall Stewart	3232788ef2	- Get rid of unused constants for sysctl variables. - Fix panic from mutex unlock on freed lock when ASCONF-ACK aborts an assoc - Fix panic from addr lock recursion when ASCONFs are queued in the front states - ASCONFs "queued" in the front states should really be bundled after the COOKIE-ACK, not in front of it - Fix issue with addresses deleted in the front states from being sent with ASCONF(DELETE)-- replaced sctp_asconf_queue_add_sa() with delete specific function - Comment change in sctp.h: the drafts are now RFCs Approved by: re@freebsd.org (B Mah)	2007-09-15 19:07:42 +00:00
Randall Stewart	b27a6b7d73	- DF bit was on for COOKIE-ECHO chunks. This is incorrect and should be OFF letting IP fragment large cookie-echos. - Rename sysctl variable logging to log_level. - Fix description of sysctl variable stats. - Add sysctl variable log to make sctp_log readable via sysctl mechanism (this is by compile switch and targets non KTR platforms or when someone wants to do performance wise tracing). - Removed debug code Approved by: re@freebsd.org (B Mah)	2007-09-13 14:43:54 +00:00
Randall Stewart	04ee05e815	- Incorrect error EAGAIN returned for invalid send on a locked stream (using EEOR mode). Changed to EINVAL (in sctp_output.c) - Static analysis comments added - fix in mobility code to return a value (static analysis found). - sctp6_notify function made visible instead of static (this is needed for Panda). Approved by: re@freebsd.org (B Mah)	2007-09-13 10:36:43 +00:00
Randall Stewart	19cf67115c	- Removed debug code and more C++ style comments in the mobility code in sctp_asconf.c Approved by: re@freebsd.org (B Mah)	2007-09-10 21:01:56 +00:00
Randall Stewart	b7a446b8b7	- Added some comments to tell where the htcp code comes from. - Fix a LOR on Mac OS X: Do not hold an stcb lock when calling soisconnected for a socket which has the SS_INCOMP bit set on so_state. - fix a comment to be non c++ style. Approved by: re@freebsd.org (B Mah)	2007-09-10 17:06:25 +00:00
Ken Smith	a258946554	Make sure that either inp is NULL or we have obtained a lock on it before jumping to dropunlock to avoid a panic. While here move the calls to ipsec4_in_reject() and ipsec6_in_reject() so they are after we obtain the lock on inp. Original patch to avoid panic: pjd Review of locking adjustments: gnn, sam Approved by: re (rwatson)	2007-09-10 14:49:32 +00:00
Robert Watson	f5514f084e	Further UDPv4 cleanup: - Resort includes a bit. - Correct typos and wording problems in comments. - Rename udpcksum to udp_cksum to be consistent with other UDP-related configuration variables. - Remove indirection of udp_notify through local notify variable in udp_ctlinput(), which is presumably due to copying and pasting from TCP, where multiple notify routines exist. Approved by: re (kensmith)	2007-09-10 14:22:15 +00:00
Randall Stewart	851b7298b3	- send call has a reference to uio->uio_resid in the recent send code, but uio may be NULL on sendfile calls. Change to use sndlen variable. - EMSGSIZE is not being returned in non-blocking mode and needs a small tweak to look if the msg would ever fit when returning EWOULDBLOCK. - FWD-TSN has a bug in stream processing which could cause a panic. This is a follow on to the codenomicon fix. - PDAPI level 1 and 2 do not work unless the reader gets his returned buffer full. Fix so we can break out when at level 1 or 2. - Fix fast-handoff features to copy across properly on accepted sockets - Fix sctp_peeloff() system call when no true system call exists to screen arguments for errors. In cases where a real system call exists the system call itself does this. - Fix raddr leak in recent add-ip code change for bundled asconfs (even when non-bundled asconfs are received) - Make sure ipi_addr lock is held when walking global addr list. Need to change this lock type to a rwlock(). - Add don't wake flag on both input and output when the socket is closing. - When deleting an address verify the interface is correct before allowing the delete to process. This protects panda and unnumbered. - Clean up old sysctl stuff and get rid of the old Open/Net BSD structures. - Add a function to watch the ranges in the sysctl sets. - When appending in the reassembly queue, validate that the assoc has not gone to about to be freed. If so (in the middle) abort out. Note this especially effects MAC I think due to the lock/unlock they do (or with LOCK testing in place). - Netstat patch to get rid of warnings. - Make sure that no data gets queued to inactive/unconfirmed destinations. This especially effect CMT but also makes a impact on regular SCTP as well. - During init collision when we detect seq number out of sync we need to treat it like Case C and discard the cookie (no invarient needed here). - Atomic access to the random store. 
- When we declare a vtag good, we need to shove it into the time wait hash to prevent further use. When the tag is put into the assoc hash, we need to remove it from the twait hash (where it will surely be). This prevents duplicate tag assignments. - Move decr-ref count to better protect sysctl out of data. - ltrace error corrections in sctp6_usrreq.c - Add hook for interface up/down to be sent to us. - Make sysctl() exported structures independent of processor architecture. - Fix route and src addr cache clearing for delete address case. - Make sure address marked SCTP_DEL_IP_ADDRESS is never selected as src addr. - in icmp handling fixed so we actually look at the icmp codes to figure out what to do. - Modified mobility code. Reception of DELETE IP ADDRESS for a primary destination and SET PRIMARY for a new primary destination is used for retransmission trigger to the new primary destination. Also, in this case, destination of chunks in send_queue are changed to the new primary destination. - Fix so that we disallow sending by mbuf to ever have EEOR mode set upon it. Approved by: re@freebsd.org (B Mah)	2007-09-08 17:48:46 +00:00
Randall Stewart	ceaad40ae7	- Locking compatibility changes. This involves adding additional flags to many function calls. The flags only get used in BSD when we compile with lock testing. These flags allow Apple to escape the "giant" lock it holds on the socket and have more fine-grained locking in the NKE. It also allows us to test (with witness) the locking used by Apple via a compile switch (manually applied). Approved by: re@freebsd.org(B Mah)	2007-09-08 11:35:11 +00:00
Robert Watson	85d9437250	Back out tcp_timer.c:1.93 and associated changes that reimplemented the many TCP timers as a single timer, but retain the API changes necessary to reintroduce this change. This will back out the source of at least two reported problems: lock leaks in certain timer edge cases, and TCP timers continuing to fire after a connection has closed (a bug previously fixed and then reintroduced with the timer rewrite). In a follow-up commit, some minor restylings and comment changes performed after the TCP timer rewrite will be reapplied, and a further change to allow the TCP timer rewrite to be added back without disturbing the ABI. The new design is believed to be a good thing, but the outstanding issues are leading to significant stability/correctness problems that are holding up 7.0. This patch was generated by silby, but is being committed by proxy due to poor network connectivity for silby this week. Approved by: re (kensmith) Submitted by: silby Tested by: rwatson, kris Problems reported by: peter, kris, others	2007-09-07 09:19:22 +00:00
Brian Feldman	598fa04675	Repair ALTQ-tagging rules in IPFW which got broken in the last PF import. The PF mbuf-tagging support routines changed to link the allocated tags into the provided mbuf themselves, so the left-over m_tag_prepend() was trying to add a bogus (usually NULL) tag. Reviewed by: mlaier Approved by: re	2007-08-29 19:34:28 +00:00
Randall Stewart	2afb3e849f	- During shutdown pending, when the last sack came in and the last message on the send stream was "null" but still there, a state we allow, we could get hung and not clean it up and wait for the shutdown guard timer to clear the association without a graceful close. Fix this so that that we properly clean up. - Added support for Multiple ASCONF per new RFC. We only (so far) accept input of these and cannot yet generate a multi-asconf. - Sysctl'd support for experimental Fast Handover feature. Always disabled unless sysctl or socket option changes to enable. - Error case in add-ip where the peer supports AUTH and ADD-IP but does NOT require AUTH of ASCONF/ASCONF-ACK. We need to ABORT in this case. - According to the Kyoto summit of socket api developers (Solaris, Linux, BSD). We need to have: o non-eeor mode messages be atomic - Fixed o Allow implicit setup of an assoc in 1-2-1 model if using the sctp_**() send calls - Fixed o Get rid of HAVE_XXX declarations - Done o add a sctp_pr_policy in hole in sndrcvinfo structure - Done o add a PR_SCTP_POLICY_VALID type flag - yet to-do in a future patch! - Optimize sctp6 calls to reuse code in sctp_usrreq. Also optimize when we close sending out the data and disabling Nagle. - Change key concatenation order to match the auth RFC - When sending OOTB shutdown_complete always do csum. - Don't send PKT-DROP to a PKT-DROP - For abort chunks just always checksums same for shutdown-complete. - inpcb_free front state had a bug where in queue data could wedge an assoc. We need to just abandon ones in front states (free_assoc). - If a peer sends us a 64k abort, we would try to assemble a response packet which may be larger than 64k. This then would be dropped by IP. Instead make a "minimum" size for us 64k-2k (we want at least 2k for our initack). If we receive such an init discard it early without all the processing. 
- When we peel off we must increment the tcb ref count to keep it from being freed from underneath us. - handling fwd-tsn had bugs that caused memory overwrites when given faulty data, fixed so can't happen and we also stop at the first bad stream no. - Fixed so comm-up generates the adaption indication. - peeloff did not get the hmac params copied. - fix it so we lock the addr list when doing src-addr selection (in future we need to use a multi-reader/one writer lock here) - During lowlevel output, we could end up with a _l_addr set to null if the iterator is calling the output routine. This means we would possibly crash when we gather the MTU info. Fix so we only do the gather where we have a src address cached. - we need to be sure to set abort flag on conn state when we receive an abort. - peeloff could leak a socket. Moved code so the close will find the socket if the peeloff fails (uipc_syscalls.c) Approved by: re@freebsd.org(Ken Smith)	2007-08-27 05:19:48 +00:00
Maxim Konovalov	4a296ec798	o Fix bug I introduced in the previous commit (ipfw set extension): pack a set number correctly. Submitted by: oleg o Plug a memory leak. Submitted by: oleg and Andrey V. Elsukov Approved by: re (kensmith) MFC after: 1 week	2007-08-26 18:38:31 +00:00
Randall Stewart	c4739e2f47	- Fix address add handling to clear cached routes and source addresses when peer acks the add in case the routing table changes. - Fix sctp_lower_sosend to send shutdown chunk for mbuf send case when sndlen = 0 and sinfoflag = SCTP_EOF - Fix sctp_lower_sosend for SCTP_ABORT mbuf send case with null data, So that it does not send the "null" data mbuf out and cause it to get freed twice. - Fix so auto-asconf sysctl actually effect the socket's asconf state. - Do not allow SCTP_AUTO_ASCONF option to be used on subset bound sockets. - Memset bug in sctp_output.c (arguments were reversed) submitted found and reported by Dave Jones (davej@codemonkey.org.uk). - PD-API point needs to be invoked >= not just > to conform to socket api draft this fixes sctp_indata.c in the two places need to be >=. - move M_NOTIFICATION to use M_PROTO5. - PEER_ADDR_PARAMS did not fail properly if you specify an address that is not in the association with a valid assoc_id. This meant you got or set the stcb level values instead of the destination you thought you were going to get/set. Now validate if the stcb is non-null and the net is NULL that the sa_family is set and the address is unspecified otherwise return an error. - The thread based iterator could crash if associations were freed at the exact time it was running. rework the worker thread to use the increment/decrement to prevent this and no longer use the markers that the timer based iterator uses. - Fix the memleak in sctp_add_addr_to_vrf() for the case when it is detected that ifa is already pointing to a ifn. - Fix it so that if someone is so insane that they drop the send window below the minimal add mark, they still can send. - Changed all state for associations to use mask safe macro. - During front states in association freeing in sctp_inpcbfree, we had a locking problem where locks were not in place where they should have been. 
- Free association calls were not testing the return value in sctp_inpcb_free() properly... others should be cast void returns where we don't care about the return value. - If a reference count is held on an assoc, even from the "force free" we should not do the actual free.. but instead let the timer free it. - When we enter sctp_input(), if the SCTP_ASOC_ABOUT_TO_BE_FREED flag is set, we must NOT process the packet but handle it like ootb. This is because while freeing an assoc we release the locks to get all the higher order locks so we can purge all the hash tables. This leaves a hole if a packet comes in just at that point. Now sctp_common_input_processing() will call the ootb code in such a case. - Change MBUF M_NOTIFICATION to use M_PROTO5 (per Sam L). This makes it so we don't have a conflict (I think this is a covertity change). We made this change AFTER some conversation and looking to make sure that M_PROTO5 does not have a problem between SCTP and the 802.11 stuff (which is the only other place its used). - Fixed lock order reversal and missing atomic protection around locked_tcb during association lookup and the 1-2-1 model. - Added debug to source address selection. - V6 output must always do checksum even for loopback. - Remove more locks around inp that are not needed for an atomically added/subtracted ref count. - slight optimization in the way we zero the array in sctp_sack_check() - It was possible to respond to a ABORT() with bad checksum with a PKT-DROP. This lead to a PKT-DROP/ABORT war. Add code to NOT send a PKT-DROP to any ABORT(). - Add an option for local logging (useful for macintosh or when you need better performing during debugging). Note no commands are here to get the log info, you must just use kgdb. - The timer code needs to be aware of if it needs to call sctp_sack_check() to slide the maps and adjust the cum-ack. This is because it may be out of sync cum-ack wise. - Added threshold managment logging. 
- If the user picked just the right size, that just filled the send window minus one mtu, we would enter a forever loop not copying and at the same time not blocking. Change from < to <= solves this. - Sysctl added to control the fragment interleave level which defaults to 1. - My rwnd control was not being used to control the rwnd properly (we did not add and subtract to it :-() this is now fixed so we handle small messages (1 byte etc) better to bring our rwnd down more slowly. Approved by: re@freebsd.org (Bruce Mah)	2007-08-24 00:53:53 +00:00
Randall Stewart	2dad8a55be	- Remove extra comment for 7.0 (no GIANT here). - Remove unneeded WLOCK/UNLOCK of inp for getting TCB lock. - Fix panic that may occur when freeing an assoc that has partial delivery in progress (may dereference null socket pointer when queuing partial delivery aborted notification) - Some spacing and comment fixes. - Fix address add handling to clear cached routes and source addresses when peer acks the add in case the routing table changes. Approved by: re@freebsd.org (Bruce Mah)	2007-08-16 01:51:22 +00:00
Qing Li	8cb5ba02d8	Use the sequence number comparison macro to compare projected_offset against isn_offset to account for wrap around. Reviewed by: gnn, kmacy, silby Submitted by: yusheng.huang@bluecoat.com Approved by: re MFC: 3 days	2007-08-16 01:35:55 +00:00
Christian S.J. Peron	b244c8ad14	Over the past couple of years, there have been a number of reports relating the use of divert sockets to deadlocks. A number of LORs have been reported between divert and a number of other network subsystems including: IPSEC, Pfil, multicast, ipfw and others. Other deadlocks could occur because of recursive entry into the IP stack. This change should take care of most if not all of these issues. A summary of the changes follows: - We disallow multicast operations on divert sockets. It really doesn't make semantic sense to allow this, since typically you would set multicast parameters on multicast end points. NOTE: As a part of this change, we actually disallow multicast options on any socket that IS a divert socket OR IS NOT a SOCK_RAW or SOCK_DGRAM family - We check to see if there are any socket options that have been specified on the socket, and if there are (which is very uncommon and also probably doesn't make sense to support) we duplicate the mbuf carrying the options. - We then drop the INP/INFO locks over the call to ip_output(). It should be noted that since we no longer support multicast operations on divert sockets and we have duplicated any socket options, we no longer need the reference to the pcb to be coherent. - Finally, we replaced the call to ip_input() to use netisr queuing. This should remove the recursive entry into the IP stack from divert. By dropping the locks over the call to ip_output() we eliminate all the lock ordering issues above. By switching over to netisr on the inbound path, we can no longer recursively enter the ip_input() code via divert. I have tested this change by using the following command: ipfwpcap -r 8000 - | tcpdump -r - -nn -v This should exercise the input and re-injection (outbound) path, which is very similar to the work load performed by natd(8). Additionally, I have run some ospf daemons which have a heavy reliance on raw sockets and multicast. 
Approved by: re@ (kensmith) MFC after: 1 month LOR: 163 LOR: 181 LOR: 202 LOR: 203 Discussed with: julian, andre et al (on freebsd-net) In collaboration with: bms [1], rwatson [2] [1] bms helped out with the multicast decisions [2] rwatson submitted the original netisr patches and came up with some of the original ideas on how to combat this issue.	2007-08-06 22:06:36 +00:00
Randall Stewart	63981c2b40	- change number assignments for SHA225-512 (match artisync for bakeoff.. using the next sequential ones) - In cookie processing 1-2-1, we did not increment the stcb refcnt before releasing the tcb lock. We need to do this to keep the tcb from being freed by an abort or ?? unlikely but worth doing. Also get rid of unneeded INP_WLOCK. - extra receive info included the rcvinfo which killed the padding/alignment. We now redefine all the fields properly so they both align properly to 128 bytes. - A peeled off socket would not close without an error due to its misguided idea that sctp_disconnect() was not supported on it. This fixes it so it goes through the proper path. - When an assoc was being deleted after abort (via a timer) a small race condition exists where we might take a packet for the old assoc (since we are waiting for a cleanup timer). This state especially happens in mac. We now add a state in the asoc so these can properly handle the packet as OOTB. Approved by: re@freebsd.org(Ken Smith)	2007-08-06 15:46:46 +00:00
Robert Watson	0bf686c125	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminating quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
Bjoern A. Zeeb	cc977adc71	Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL. Also rename the related functions in a similar way. There are no functional changes. For a packet coming in with IPsec tunnel mode, the default is to only call into the firewall with the "outer" IP header and payload. With this option turned on, in addition to the "outer" parts, the "inner" IP header and payload are passed to the firewall too when going through ip_input() the second time. The option was never only related to a gif(4) tunnel within an IPsec tunnel and thus the name was very misleading. Discussed at: BSDCan 2007 Best new name suggested by: rwatson Reviewed by: rwatson Approved by: re (bmah)	2007-08-05 16:16:15 +00:00
Peter Wemm	c4a184bdc4	Change TCPTV_MIN to be independent of HZ. While it was documented to be in ticks "for algorithm stability" when originally committed, it turns out that it has a significant impact in timing out connections. When we changed HZ from 100 to 1000, this had a big effect on reducing the time before dropping connections. To demonstrate, boot with kern.hz=100. ssh to a box on local ethernet and establish a reliable round-trip-time (ie: type a few commands). Then unplug the ethernet and press a key. Time how long it takes to drop the connection. The old behavior (with hz=100) caused the connection to typically drop between 90 and 110 seconds of getting no response. Now boot with kern.hz=1000 (default). The same test causes the ssh session to drop after just 9-10 seconds. This is a big deal on a wifi connection. With kern.hz=1000, change sysctl net.inet.tcp.rexmit_min from 3 to 30. Note how it behaves the same as when HZ was 100. Also, note that when booting with hz=100, net.inet.tcp.rexmit_min used to be 30. This commit changes TCPTV_MIN to be scaled with hz. rexmit_min should always be about 30. If you set hz to Really Slow(TM), there is a safety feature to prevent a value of 0 being used. This may be revised in the future, but for the time being, it restores the old, pre-hz=1000 behavior, which is significantly less annoying. As a workaround, to avoid rebooting or rebuilding a kernel, you can run "sysctl net.inet.tcp.rexmit_min=30" and add "net.inet.tcp.rexmit_min=30" to /etc/sysctl.conf. This is safe to run from 6.0 onwards. Approved by: re (rwatson) Reviewed by: andre, silby	2007-07-31 22:11:55 +00:00
Dag-Erling Smørgrav	218cbbea9a	Make tcpstates[] static, and make sure TCPSTATES is defined before <netinet/tcp_fsm.h> is included into any compilation unit that needs tcpstates[]. Also remove incorrect extern declarations and TCPDEBUG conditionals. This allows kernels both with and without TCPDEBUG to build, and unbreaks the tinderbox. Approved by: re (rwatson)	2007-07-30 11:06:42 +00:00
Bruce A. Mah	e251d2f4f6	Fix a typo in a log message: s/Reveived/Received/. Approved by: re (rwatson)	2007-07-29 20:13:22 +00:00
Matt Jacob	24face5416	Fix compilation problems- tcpstates is only available if TCPDEBUG is set. Approved by: re (in spirit)	2007-07-29 01:31:33 +00:00
Mike Silbersack	e3020cfd3c	Fix a panic introduced in rev 1.126. Approved by: re (rwatson)	2007-07-28 20:13:40 +00:00
Andre Oppermann	773673c133	Provide a sysctl to toggle reporting of TCP debug logging: sys.net.inet.tcp.log_debug = 1 It defaults to enabled for the moment and is to be turned off for the next release like other diagnostics from development branches. It is important to note that sysctl sys.net.inet.tcp.log_in_vain uses the same logging function as log_debug. Enabling of the former also causes the latter to engage, but not vice versa. Use consistent terminology in tcp log messages: "ignored" means a segment contains invalid flags/information and is dropped without changing state or issuing a reply. "rejected" means a segments contains invalid flags/information but is causing a reply (usually RST) and may cause a state change. Approved by: re (rwatson)	2007-07-28 12:20:39 +00:00
Andre Oppermann	cdaf208d09	o Move setting/resetting logic of syncache timer from macro SYNCACHE_TIMEOUT to new function syncache_timeout(). o Fix inverted timeout callout engagement logic to actually enable the timer for the bucket row. Before SYN\|ACK was not retransmitted. o Simplify SYN\|ACK retransmit timeout backoff calculation. o Improve logging of retransmit and timeout events. o Reset timeout when duplicate SYN arrives. o Add comments. o Rearrange SYN cookie statistics counting. Bug found by: silby Submitted by: silby (different version) Approved by: re (rwatson)	2007-07-28 12:02:05 +00:00
Andre Oppermann	19bc77c549	o Move all detailed checks for RST in LISTEN state from tcp_input() to syncache_rst(). o Fix tests for flag combinations of RST and SYN, ACK, FIN. Before a RST for a connection in syncache did not properly free the entry. o Add more detailed logging. Approved by: re (rwatson)	2007-07-28 11:51:44 +00:00
Robert Watson	c6b2899785	Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove definition of NET_CALLOUT_MPSAFE, which is no longer required now that debug.mpsafenet has been removed. The once over: bz Approved by: re (kensmith)	2007-07-28 07:31:30 +00:00
Mike Silbersack	c325962b47	Export the contents of the syncache to netstat. Approved by: re (kensmith) MFC after: 2 weeks	2007-07-27 00:57:06 +00:00
Andre Oppermann	564aab1fe6	Fix comments in tcp_do_segment(). Approved by: re (kensmith)	2007-07-25 18:48:24 +00:00
Randall Stewart	1b649582bb	- take out a needless panic under invariants for sctp_output.c - Fix addrs's error checking of sctp_sendx(3) when addrcnt is less than SCTP_SMALL_IOVEC_SIZE - re-add back inpcb_bind local address check bypass capability - Fix it so sctp_opt_info is independent of assoc_id position. - Fix cookie life set to use MSEC_TO_TICKS() macro. - asconf changes o More comment changes/clarifications related to the old local address "not" list which is now an explicit restricted list. o Rename some functions for clarity: - sctp_add/del_local_addr_assoc to xxx_local_addr_restricted() - asconf related iterator functions to sctp_asconf_iterator_xxx() o Fix bug when the same address is deleted and added (and removed from the asconf queue) where the ifa is "freed" twice refcount wise, possibly freeing it completely. o Fix bug in output where the first ASCONF would not go out after the last address is changed (e.g. only goes out when retransmitted). o Fix bug where multiple ASCONFs can be bundled in the same packet with the same serial numbers. o Fix asconf stcb iterator to not send ASCONF until after all work queue entries have been processed. o Change behavior so that when the last address is deleted (auto asconf on a bound all endpoint) no action is taken until an address is added; at that time, an ASCONF add+delete is sent (if the assoc is still up). o Fix local address counting so that address scoping is taken into account. o #ifdef SCTP_TIMER_BASED_ASCONF the old timer triggered sending of ASCONF (after an RTO). The default now is to send ASCONF immediately (except for the case of changing/deleting the last usable address). Approved by: re(ken smith)@freebsd.org	2007-07-24 20:06:02 +00:00
Randall Stewart	52be287ebb	- remove duplicate code from sctp_asconf.c - remove duplicate #include <sys/priv.h> that is not under #ifdef FreeBSD version to allow compile on 6.1 - static analysis changes per the cisco SA tool including: o some SA_IGNORE comments o some checks for NULL before unlock. o type corrections int -> size_t - Fix it so sctp_alloc_asoc takes a thread/proc argument. Without this we pass a NULL in to bind on implicit assoc setup and crash :-( Approved by: re@freebsd.org(Ken Smith)	2007-07-21 21:41:32 +00:00
Robert Watson	08af97b790	Attempt to improve feature parity between UDPv4 and UDPv6 by merging UDPv4 features to UDPv6: - Add MAC checks on delivery and MAC labeling on transmit. - Check for (and reject) datagrams with destination port 0. - For multicast delivery, check the source port only if the socket being considered as a destination has been connected. - Implement UDP blackholing based on net.inet.udp.blackhole. - Add a new ICMPv6 unreachable reply rate limiting category for failed delivery attempts and implement rate limiting for UDPv6 (submitted by bz). Approved by: re (kensmith) Reviewed by: bz	2007-07-19 22:34:25 +00:00
Randall Stewart	18e198d3a3	- added pre-checks to the bindx call. - use proper tick gathering macro instead of ticks directly. - Placed reasonable boundaries on sets that a user can do that are converted to ticks from ms. - Fix CMT_PF to always check to be sure CMT is on. - Fix ticks use of CMT_PF. - put back code to allow asconfs to be queued while INITs are in flight and before the assoc is established. - During window probes, an ack'd packet might be left with the window probe mark on it causing it to be retransmitted. Change so that the flight decrease macro clears the window_probe mark. - Additional logging flight size/reading and ASOC LOG. This is only enabled if you manually insert things into opt_sctp.h since its a set of debug code only. - Found an interesting SMP race in the way data was appended which could cause a reader to lose a part of a message, had to reorder when we marked the message was complete to after the data was appended. - bug in ADD-IP for the subset bound socket case when the peer has only one address - fix ASCONF implicit success/error handling case - proper support of jails in Freebsd 6> - copy out the timeval for the 64 bit sparc world on cookie-echo alignment error crashes without this). Approved by: re(Ken Smith)	2007-07-17 20:58:26 +00:00
Randall Stewart	b54d3a6c48	- Modular congestion control, with RFC2581 being the default. - CMT_PF states added (w/sysctl to turn the PF version on) - sctp_input.c had a missing incr of cookie case when the auth was bad. This meant a free was called without an increment to refcnt, added increment like rest of code. - There was a case, unlikely, when the scope of the destination changed (this is a TSNH case). In that case, it would not free the alloc'ed asoc (in sctp_input.c). - When listed addresses found a colliding cookie/Init, then the collided upon tcb was not unlocked in sctp_pcb.c - Add error checking on arguments of sctp_sendx(3) to prevent it from referencing a NULL pointer. - Fix an error return of sctp_sendx(3), it was returning ENOMEM not -1. - Get assoc id was changed to use the sanctified socket api method for getting an assoc id (PEER_ADDR_INFO instead of PEER_ADDR_PARAMS). - Fix it so a peeled off socket will get a proper error return if it tries to send to a different address than it is connected to. - Fix so that select_a_stream can avoid an endless loop that could hang a caller. - time_entered (state set time) was not being set in all cases to the time we went established. Approved by: re(ken smith)	2007-07-14 09:36:28 +00:00
Robert Watson	43bbb6aa10	Further cleanup of UDPv4: - Move udp_sendspace and udp_recvspace global variables and associated sysctls to the top of the file where most other such things are present. - Rename static variable 'blackhole' to 'udp_blackhole' and unstaticize so that we can add blackhole support for UDPv6 using the same MIB variable. - Move udp_append() above udp_input() to match the function order in udp6_usrreq.c. Approved by: re (kensmith)	2007-07-10 09:30:46 +00:00
Bruce M Simpson	d90b8675c2	Fix a regression in IPv4 multicast join path (IP_ADD_MEMBERSHIP). With the in_mcast.c code, if an interface for an IPv4 multicast join was not specified, and a route did not exist for the specified group in the unicast forwarding tables, the join would be rejected with the error EADDRNOTAVAIL. This change restores the old behaviour whereby if no interface is specified, and no route exists for the group destination, the IPv4 address list is walked to find a non-loopback, multicast-capable interface to satisfy the join request. This should resolve problems with starting multicast services during system boot or when a default forwarding entry does not exist. Approved by: re (rwatson)	2007-07-09 10:36:47 +00:00
Robert Watson	bd84d20457	Minor UDPv4 cleanup: capitalize comment, move statistics update after mbuf free to be consistent with other error handling, and release socket buffer lock before freeing mbufs and statistics updates rather than after. Approved by: re (kensmith)	2007-07-07 09:46:34 +00:00
Peter Wemm	477d44c467	Fix a second warning, introduced by my last "fix". I committed the wrong diff from the wrong machine. Pointy hat to: peter Approved by: re (rwatson - blanket, several days ago)	2007-07-05 06:04:46 +00:00
Peter Wemm	9fb5d4c064	Fix cast-qualifiers warning when INET6 is not present Approved by: re (rwatson)	2007-07-05 05:55:57 +00:00
Max Laier	60ee384760	Link pf 4.1 to the build: - move ftp-proxy from libexec to usr.sbin - add tftp-proxy - new altq mtag link Approved by: re (kensmith)	2007-07-03 12:46:08 +00:00
George V. Neville-Neil	b2630c2934	Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC. Approved by: re Sponsored by: Secure Computing	2007-07-03 12:13:45 +00:00
Randall Stewart	5bead43650	- Consolidate the code that free's chunks to actually also call the sctp_free_remote_address() function. - Assure that when we allocate a chunk the whoTo is NULL, also when we free it and place it into the cache we NULL it (that way the consolidation code will always work). - Fix a small race, when a empty data holder is left on the stream out queue, and both sides do a shutdown, the empty data holder would prevent us from sending a SHUTDOWN-ACK and at the same time we never would cleanup the empty holder (since nothing was ever in queue). We now add a utility function that a) cleans up empty holders and b) properly determines if there are still pending data chunks on the stream out wheel. Approved by: re@freebsd.org (Ken Smith)	2007-07-02 19:22:22 +00:00
Robert Watson	02dd4b5cbd	Continue pre-7.0 privilege cleanup: update suser(9) comments to be priv(9) comments. Approved by: re (bmah)	2007-07-02 15:44:30 +00:00
George V. Neville-Neil	0d29af67f2	Fix a dangling netinet6 to netipsec transition for SCTP include files. Approved by: re	2007-07-01 14:18:20 +00:00
George V. Neville-Neil	2cb64cb272	Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files, the rest of the files will follow in a second commit. Reviewed by: bz Approved by: re Supported by: Secure Computing	2007-07-01 11:41:27 +00:00
Randall Stewart	9ceab0faf0	- When a SCTP socket is closed, but the last data SACK is lost, we would incorrectly abort the association instead of retransmitting the SACK. Approved by: re@freebsd.org (Ken Smith)	2007-06-29 15:14:23 +00:00
Randall Stewart	97c76f10a0	- Update bindx address checking to properly screen out address per the socket api, adding port validation. We allow port 0 or the already bound port number and no others. Approved by: re@freebsd.org (Ken Smith)	2007-06-25 19:05:26 +00:00
Randall Stewart	a964e8de4c	- Fix type casts in calling sctp_m_getptr, it expects a int not an unsigned (returned by sizeof) also add cast to comparison check for size bounds. Approved by: re(bmah@freebsd.org)	2007-06-22 14:40:09 +00:00
Randall Stewart	671d309c7c	- Fix stream reset so it limits the number of streams that can be listed - Fix fwd-tsn to use proper accessor so it does not overrun mbufs - Fix stream reset error reporting to actually work (it has always been broken if the peer rejects a stream reset) - Some 64 bit friendly changes Approved by: re(bmah@freebsd.org)	2007-06-22 13:50:56 +00:00
Randall Stewart	ea1fbec59a	- Two more static analysis bugs found by cisco's tool on a subsequent run.	2007-06-18 22:36:52 +00:00

3345 Commits