freebsd-skq

Author	SHA1	Message	Date
csjp	e58c2855d8	Over the past couple of years, there have been a number of reports relating the use of divert sockets to deadlocks. A number of LORs have been reported between divert and a number of other network subsystems including: IPSEC, Pfil, multicast, ipfw and others. Other deadlocks could occur because of recursive entry into the IP stack. This change should take care of most if not all of these issues. A summary of the changes follows: - We disallow multicast operations on divert sockets. It really doesn't make semantic sense to allow this, since typically you would set multicast parameters on multicast end points. NOTE: As a part of this change, we actually disallow multicast options on any socket that IS a divert socket OR IS NOT a SOCK_RAW or SOCK_DGRAM family - We check to see if there are any socket options that have been specified on the socket, and if there were (which is very uncommon and also probably doesn't make sense to support) we duplicate the mbuf carrying the options. - We then drop the INP/INFO locks over the call to ip_output(). It should be noted that since we no longer support multicast operations on divert sockets and we have duplicated any socket options, we no longer need the reference to the pcb to be coherent. - Finally, we replaced the call to ip_input() to use netisr queuing. This should remove the recursive entry into the IP stack from divert. By dropping the locks over the call to ip_output() we eliminate all the lock ordering issues above. By switching over to netisr on the inbound path, we can no longer recursively enter the ip_input() code via divert. I have tested this change by using the following command: ipfwpcap -r 8000 - | tcpdump -r - -nn -v This should exercise the input and re-injection (outbound) path, which is very similar to the work load performed by natd(8). Additionally, I have run some ospf daemons which have a heavy reliance on raw sockets and multicast.
Approved by: re@ (kensmith) MFC after: 1 month LOR: 163 LOR: 181 LOR: 202 LOR: 203 Discussed with: julian, andre et al (on freebsd-net) In collaboration with: bms [1], rwatson [2] [1] bms helped out with the multicast decisions [2] rwatson submitted the original netisr patches and came up with some of the original ideas on how to combat this issue.	2007-08-06 22:06:36 +00:00
rrs	50d4dc714d	- change number assignments for SHA225-512 (match artisync for bakeoff.. using the next sequential ones) - In cookie processing 1-2-1, we did not increment the stcb refcnt before releasing the tcb lock. We need to do this to keep the tcb from being freed by a abort or ?? unlikely but worth doing. Also get rid of unneed INP_WLOCK. - extra receive info included the rcvinfo which killed the padding/alignment. We now redefine all the fields properly so they both align properly both to 128 bytes. - A peeled off socket would not close without an error due to its misguided idea that sctp_disconnect() was not supported on it. This fixes it so it goes through the proper path. - When an assoc was being deleted after abort (via a timer) a small race condition exists where we might take a packet for the old assoc (since we are waiting for a cleanup timer). This state especially happens in mac. We now add a state in the asoc so these can properly handle the packet as OOTB. Approved by: re@freebsd.org(Ken Smith)	2007-08-06 15:46:46 +00:00
rwatson	23574c8673	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)	2007-08-06 14:26:03 +00:00
bz	3793d89229	Rename option IPSEC_FILTERGIF to IPSEC_FILTERTUNNEL. Also rename the related functions in a similar way. There are no functional changes. For a packet coming in with IPsec tunnel mode, the default is to only call into the firewall with the "outer" IP header and payload. With this option turned on, in addition to the "outer" parts, the "inner" IP header and payload are passed to the firewall too when going through ip_input() the second time. The option was never only related to a gif(4) tunnel within an IPsec tunnel and thus the name was very misleading. Discussed at: BSDCan 2007 Best new name suggested by: rwatson Reviewed by: rwatson Approved by: re (bmah)	2007-08-05 16:16:15 +00:00
peter	465b2caeed	Change TCPTV_MIN to be independent of HZ. While it was documented to be in ticks "for algorithm stability" when originally committed, it turns out that it has a significant impact in timing out connections. When we changed HZ from 100 to 1000, this had a big effect on reducing the time before dropping connections. To demonstrate, boot with kern.hz=100. ssh to a box on local ethernet and establish a reliable round-trip-time (ie: type a few commands). Then unplug the ethernet and press a key. Time how long it takes to drop the connection. The old behavior (with hz=100) caused the connection to typically drop between 90 and 110 seconds of getting no response. Now boot with kern.hz=1000 (default). The same test causes the ssh session to drop after just 9-10 seconds. This is a big deal on a wifi connection. With kern.hz=1000, change sysctl net.inet.tcp.rexmit_min from 3 to 30. Note how it behaves the same as when HZ was 100. Also, note that when booting with hz=100, net.inet.tcp.rexmit_min used to be 30. This commit changes TCPTV_MIN to be scaled with hz. rexmit_min should always be about 30. If you set hz to Really Slow(TM), there is a safety feature to prevent a value of 0 being used. This may be revised in the future, but for the time being, it restores the old, pre-hz=1000 behavior, which is significantly less annoying. As a workaround, to avoid rebooting or rebuilding a kernel, you can run "sysctl net.inet.tcp.rexmit_min=30" and add "net.inet.tcp.rexmit_min=30" to /etc/sysctl.conf. This is safe to run from 6.0 onwards. Approved by: re (rwatson) Reviewed by: andre, silby	2007-07-31 22:11:55 +00:00
des	a969e2957b	Make tcpstates[] static, and make sure TCPSTATES is defined before <netinet/tcp_fsm.h> is included into any compilation unit that needs tcpstates[]. Also remove incorrect extern declarations and TCPDEBUG conditionals. This allows kernels both with and without TCPDEBUG to build, and unbreaks the tinderbox. Approved by: re (rwatson)	2007-07-30 11:06:42 +00:00
bmah	b1cc9aa203	Fix a typo in a log message: s/Reveived/Received/. Approved by: re (rwatson)	2007-07-29 20:13:22 +00:00
mjacob	26ba4b61a6	Fix compilation problems- tcpstates is only available if TCPDEBUG is set. Approved by: re (in spirit)	2007-07-29 01:31:33 +00:00
silby	867152dd71	Fix a panic introduced in rev 1.126. Approved by: re (rwatson)	2007-07-28 20:13:40 +00:00
andre	1d3ef28a99	Provide a sysctl to toggle reporting of TCP debug logging: sys.net.inet.tcp.log_debug = 1 It defaults to enabled for the moment and is to be turned off for the next release like other diagnostics from development branches. It is important to note that sysctl sys.net.inet.tcp.log_in_vain uses the same logging function as log_debug. Enabling of the former also causes the latter to engage, but not vice versa. Use consistent terminology in tcp log messages: "ignored" means a segment contains invalid flags/information and is dropped without changing state or issuing a reply. "rejected" means a segment contains invalid flags/information but is causing a reply (usually RST) and may cause a state change. Approved by: re (rwatson)	2007-07-28 12:20:39 +00:00
andre	ff2e8247ee	o Move setting/resetting logic of syncache timer from macro SYNCACHE_TIMEOUT to new function syncache_timeout(). o Fix inverted timeout callout engagement logic to actually enable the timer for the bucket row. Before SYN|ACK was not retransmitted. o Simplify SYN|ACK retransmit timeout backoff calculation. o Improve logging of retransmit and timeout events. o Reset timeout when duplicate SYN arrives. o Add comments. o Rearrange SYN cookie statistics counting. Bug found by: silby Submitted by: silby (different version) Approved by: re (rwatson)	2007-07-28 12:02:05 +00:00
andre	85c8a77bff	o Move all detailed checks for RST in LISTEN state from tcp_input() to syncache_rst(). o Fix tests for flag combinations of RST and SYN, ACK, FIN. Before a RST for a connection in syncache did not properly free the entry. o Add more detailed logging. Approved by: re (rwatson)	2007-07-28 11:51:44 +00:00
rwatson	a62dbe240a	Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove definition of NET_CALLOUT_MPSAFE, which is no longer required now that debug.mpsafenet has been removed. The once over: bz Approved by: re (kensmith)	2007-07-28 07:31:30 +00:00
silby	4c84d1d020	Export the contents of the syncache to netstat. Approved by: re (kensmith) MFC after: 2 weeks	2007-07-27 00:57:06 +00:00
andre	90c73e9aec	Fix comments in tcp_do_segment(). Approved by: re (kensmith)	2007-07-25 18:48:24 +00:00
rrs	1db8ba2474	- take out a needless panic under invariants for sctp_output.c - Fix addrs's error checking of sctp_sendx(3) when addrcnt is less than SCTP_SMALL_IOVEC_SIZE - re-add back inpcb_bind local address check bypass capability - Fix it so sctp_opt_info is independent of assoc_id position. - Fix cookie life set to use MSEC_TO_TICKS() macro. - asconf changes o More comment changes/clarifications related to the old local address "not" list which is now an explicit restricted list. o Rename some functions for clarity: - sctp_add/del_local_addr_assoc to xxx_local_addr_restricted() - asconf related iterator functions to sctp_asconf_iterator_xxx() o Fix bug when the same address is deleted and added (and removed from the asconf queue) where the ifa is "freed" twice refcount wise, possibly freeing it completely. o Fix bug in output where the first ASCONF would not go out after the last address is changed (e.g. only goes out when retransmitted). o Fix bug where multiple ASCONFs can be bundled in the same packet with the same serial numbers. o Fix asconf stcb iterator to not send ASCONF until after all work queue entries have been processed. o Change behavior so that when the last address is deleted (auto asconf on a bound all endpoint) no action is taken until an address is added; at that time, an ASCONF add+delete is sent (if the assoc is still up). o Fix local address counting so that address scoping is taken into account. o #ifdef SCTP_TIMER_BASED_ASCONF the old timer triggered sending of ASCONF (after an RTO). The default now is to send ASCONF immediately (except for the case of changing/deleting the last usable address). Approved by: re(ken smith)@freebsd.org	2007-07-24 20:06:02 +00:00
rrs	1918b8aea1	- remove duplicate code from sctp_asconf.c - remove duplicate #include <sys/priv.h> that is not under #ifdef FreeBSD version to allow compile on 6.1 - static analysis changes per the cisco SA tool including: o some SA_IGNORE comments o some checks for NULL before unlock. o type corrections int -> size_t - Fix it so sctp_alloc_asoc takes a thread/proc argument. Without this we pass a NULL in to bind on implicit assoc setup and crash :-( Approved by: re@freebsd.org(Ken Smith)	2007-07-21 21:41:32 +00:00
rwatson	5fe56c549d	Attempt to improve feature parity between UDPv4 and UDPv6 by merging UDPv4 features to UDPv6: - Add MAC checks on delivery and MAC labeling on transmit. - Check for (and reject) datagrams with destination port 0. - For multicast delivery, check the source port only if the socket being considered as a destination has been connected. - Implement UDP blackholing based on net.inet.udp.blackhole. - Add a new ICMPv6 unreachable reply rate limiting category for failed delivery attempts and implement rate limiting for UDPv6 (submitted by bz). Approved by: re (kensmith) Reviewed by: bz	2007-07-19 22:34:25 +00:00
rrs	baae800484	- added pre-checks to the bindx call. - use proper tick gathering macro instead of ticks directly. - Placed reasonable boundaries on sets that a user can do that are converted to ticks from ms. - Fix CMT_PF to always check to be sure CMT is on. - Fix ticks use of CMT_PF. - put back code to allow asconfs to be queued while INITs are in flight and before the assoc is established. - During window probes, an ack'd packet might be left with the window probe mark on it causing it to be retransmitted. Change so that the flight decrease macro clears the window_probe mark. - Additional logging flight size/reading and ASOC LOG. This is only enabled if you manually insert things into opt_sctp.h since its a set of debug code only. - Found an interesting SMP race in the way data was appended which could cause a reader to lose a part of a message, had to reorder when we marked the message was complete to after the data was appended. - bug in ADD-IP for the subset bound socket case when the peer has only one address - fix ASCONF implicit success/error handling case - proper support of jails in Freebsd 6> - copy out the timeval for the 64 bit sparc world on cookie-echo alignment error crashes without this). Approved by: re(Ken Smith)	2007-07-17 20:58:26 +00:00
rrs	1e9af2c480	- Modular congestion control, with RFC2581 being the default. - CMT_PF states added (w/sysctl to turn the PF version on) - sctp_input.c had a missing incr of cookie case when the auth was bad. This meant a free was called without an increment to refcnt, added increment like rest of code. - There was a case, unlikely, when the scope of the destination changed (this is a TSNH case). In that case, it would not free the alloc'ed asoc (in sctp_input.c). - When listed addresses found a colliding cookie/Init, then the collided upon tcb was not unlocked in sctp_pcb.c - Add error checking on arguments of sctp_sendx(3) to prevent it from referencing a NULL pointer. - Fix an error return of sctp_sendx(3), it was returning ENOMEM not -1. - Get assoc id was changed to use the sanctified socket api method for getting an assoc id (PEER_ADDR_INFO instead of PEER_ADDR_PARAMS). - Fix it so a peeled off socket will get a proper error return if it tries to send to a different address than it is connected to. - Fix so that select_a_stream can avoid an endless loop that could hang a caller. - time_entered (state set time) was not being set in all cases to the time we went established. Approved by: re(ken smith)	2007-07-14 09:36:28 +00:00
rwatson	10c60be23e	Further cleanup of UDPv4: - Move udp_sendspace and udp_recvspace global variables and associated sysctls to the top of the file where most other such things are present. - Rename static variable 'blackhole' to 'udp_blackhole' and unstaticize so that we can add blackhole support for UDPv6 using the same MIB variable. - Move udp_append() above udp_input() to match the function order in udp6_usrreq.c. Approved by: re (kensmith)	2007-07-10 09:30:46 +00:00
bms	73f66e3d09	Fix a regression in IPv4 multicast join path (IP_ADD_MEMBERSHIP). With the in_mcast.c code, if an interface for an IPv4 multicast join was not specified, and a route did not exist for the specified group in the unicast forwarding tables, the join would be rejected with the error EADDRNOTAVAIL. This change restores the old behaviour whereby if no interface is specified, and no route exists for the group destination, the IPv4 address list is walked to find a non-loopback, multicast-capable interface to satisfy the join request. This should resolve problems with starting multicast services during system boot or when a default forwarding entry does not exist. Approved by: re (rwatson)	2007-07-09 10:36:47 +00:00
rwatson	a78856a25a	Minor UDPv4 cleanup: capitalize comment, move statistics update after mbuf free to be consistent with other error handling, and release socket buffer lock before freeing mbufs and statistics updates rather than after. Approved by: re (kensmith)	2007-07-07 09:46:34 +00:00
peter	e680ae87c6	Fix a second warning, introduced by my last "fix". I committed the wrong diff from the wrong machine. Pointy hat to: peter Approved by: re (rwatson - blanket, several days ago)	2007-07-05 06:04:46 +00:00
peter	c4233cd978	Fix cast-qualifiers warning when INET6 is not present Approved by: re (rwatson)	2007-07-05 05:55:57 +00:00
mlaier	83807ec50d	Link pf 4.1 to the build: - move ftp-proxy from libexec to usr.sbin - add tftp-proxy - new altq mtag link Approved by: re (kensmith)	2007-07-03 12:46:08 +00:00
gnn	aeca69ded5	Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC option is now deprecated, as well as the KAME IPsec code. What was FAST_IPSEC is now IPSEC. Approved by: re Sponsored by: Secure Computing	2007-07-03 12:13:45 +00:00
rrs	a400d04306	- Consolidate the code that free's chunks to actually also call the sctp_free_remote_address() function. - Assure that when we allocate a chunk the whoTo is NULL, also when we free it and place it into the cache we NULL it (that way the consolidation code will always work). - Fix a small race, when a empty data holder is left on the stream out queue, and both sides do a shutdown, the empty data holder would prevent us from sending a SHUTDOWN-ACK and at the same time we never would cleanup the empty holder (since nothing was ever in queue). We now add a utility function that a) cleans up empty holders and b) properly determines if there are still pending data chunks on the stream out wheel. Approved by: re@freebsd.org (Ken Smith)	2007-07-02 19:22:22 +00:00
rwatson	bb6f1c3d9b	Continue pre-7.0 privilege cleanup: update suser(9) comments to be priv(9) comments. Approved by: re (bmah)	2007-07-02 15:44:30 +00:00
gnn	c00a304dcf	Fix a dangling netinet6 to netipsec transition for SCTP include files. Approved by: re	2007-07-01 14:18:20 +00:00
gnn	0cd74db89b	Commit IPv6 support for FAST_IPSEC to the tree. This commit includes only the kernel files, the rest of the files will follow in a second commit. Reviewed by: bz Approved by: re Supported by: Secure Computing	2007-07-01 11:41:27 +00:00
rrs	51f099c2d2	- When a SCTP socket is closed, but the last data SACK is lost, we would incorrectly abort the association instead of retransmitting the SACK. Approved by: re@freebsd.org (Ken Smith)	2007-06-29 15:14:23 +00:00
rrs	a6df17b326	- Update bindx address checking to properly screen out address per the socket api, adding port validation. We allow port 0 or the already bound port number and no others. Approved by: re@freebsd.org (Ken Smith)	2007-06-25 19:05:26 +00:00
rrs	aaa3e960bf	- Fix type casts in calling sctp_m_getptr, it expects an int not an unsigned (returned by sizeof) also add cast to comparison check for size bounds. Approved by: re(bmah@freebsd.org)	2007-06-22 14:40:09 +00:00
rrs	cdfbc01471	- Fix stream reset so it limits the number of streams that can be listed - Fix fwd-tsn to use proper accessor so it does not overrun mbufs - Fix stream reset error reporting to actually work (it has always been broken if the peer rejects a stream reset) - Some 64 bit friendly changes Approved by: re(bmah@freebsd.org)	2007-06-22 13:50:56 +00:00
rrs	660ca20248	- Two more static analysis bugs found by cisco's tool on a subsequent run.	2007-06-18 22:36:52 +00:00
rrs	ef68a809b7	- Fixes cstatic issues found by cisco sa tool (missing frees and such on error legs) - align sctp_sockstore to 64 bit boundary ..	2007-06-18 21:59:15 +00:00
maxim	2139af42ea	o Make ipfw set more robust -- now it is possible: - to show a specific set: ipfw set 3 show - to delete rules from the set: ipfw set 9 delete 100 200 300 - to flush the set: ipfw set 4 flush - to reset rules counters in the set: ipfw set 1 zero PR: kern/113388 Submitted by: Andrey V. Elsukov Approved by: re (kensmith) MFC after: 6 weeks	2007-06-18 17:52:37 +00:00
rrs	27754de272	Add additional logging level mask for packet_logging too.	2007-06-18 13:57:37 +00:00
rrs	85dbbe2781	- The packet log needs to copy all of the buffer not to the end.	2007-06-17 23:43:37 +00:00
rrs	ca1ca54cb0	Back out last change to inpcb_free. Turns out we need to hold off freeing if there is data pending ... someone might do send/close. Which means we want the data to go and then close it after startup. Added comments to the code as well to note that this is done for a reason.	2007-06-17 19:27:46 +00:00
mjacob	cd743f496a	Make gcc4.2 happy and zero save_ip for the unlikely (blackhole != 0) codepath.	2007-06-17 04:07:11 +00:00
rrs	a50eb788fa	- For sctp_input/sctp6_input add announcement when a packet arrives (debug) - re-factor the packet drop in sctp_output a bit more, we don't need the trim after all, but the size calc is now corrected. - When an assoc is in the COOKIE-ECHO/COOKIE-WAIT state and the user closes, it should not matter if data is queued, the assoc should be purged. - In error leg a missing free_chunk when iph comes in NULL (should not happen but just in case).	2007-06-17 01:36:02 +00:00
mjacob	f9596e5996	Replace incorrect local OFFSET_OF macro with the correct and generic offsetof macro.	2007-06-17 00:33:34 +00:00
mjacob	49d2064d40	Simplification to quiet a gcc4.2 warning. Just by setting match.s_addr to nonzero you fulfill the same function as the variable 'cmp'. so you might as well zero match and test against it later. Reviewed by: timeout on review request	2007-06-17 00:31:24 +00:00
rrs	84b8595243	- Better handle sending large pkt-drops. We were not trimming the data with m_adj if a large pkt arrived with a bad csum; some systems can't handle you not trimming the tail (think panda :-D)	2007-06-16 14:03:15 +00:00
rrs	a2d7081fdf	- Raise max range of sctp_logging sysctl so panda does not disallow us to turn on logging levels.	2007-06-16 03:28:18 +00:00
rrs	942494315a	- Matthew's changes to get inlines out, plus a few of my own to deal with the VRF inline function -> becomes a macro now. Submitted by: Matthew Jacobs	2007-06-16 00:33:47 +00:00
mjacob	76b267f01e	Garbage collect some debug code that not only no longer could work but in fact probably causes random pointer dereferences. Garbage collect the tp variable too.	2007-06-15 22:54:11 +00:00
rrs	795f6bc14a	Name change SCTP_KTR_SUBSYS -> KTR_SCTP	2007-06-15 20:54:12 +00:00

2935 Commits