freebsd-nq

Author	SHA1	Message	Date
John Baldwin	3b0b2840be	Use queue(3) macros instead of home-rolled versions in several places in the INET6 code. This includes retiring the 'ndpr_next' and 'pfr_next' macros. Submitted by: pluknet (earlier version) Reviewed by: pluknet	2011-12-29 18:25:18 +00:00
Gleb Smirnoff	9de96e891c	Don't fallback to a CARP address in BACKUP state.	2011-12-29 15:59:14 +00:00
Michael Tuexen	60990c0c06	Address issues found by clang. While there, fix also some style issues. MFC after: 3 months.	2011-12-27 10:16:24 +00:00
Gleb Smirnoff	1c435c73a1	Use a better log message for master down event.	2011-12-22 18:48:21 +00:00
Gleb Smirnoff	7121247312	Provide ABI compatibility shim to enable configuring of addresses with ifconfig(8) prior to r228571. Requested by: brooks	2011-12-21 12:39:08 +00:00
Gleb Smirnoff	f08535f872	Restore a feature that was present in 5.x and 6.x, and was cleared in 7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP preemption, while it is running its bulk update. However, reimplement the feature in more elegant manner, that is partially inspired by newer OpenBSD: - Rename term "suppression" to "demotion", to match with OpenBSD. - Keep a global demotion factor, that can be raised by several conditions, for now these are: - interface goes down - carp(4) has problems with ip_output() or ip6_output() - pfsync performs bulk update - Unlike in OpenBSD the demotion factor isn't a counter, but is actual value added to advskew. The adjustment values for particular error conditions are also configurable, and their defaults are maximum advskew value, so a single failure bumps demotion to maximum. This is for POLA compatibility, and should satisfy most users. - Demotion factor is a writable sysctl, so user can do foot shooting, if he desires to.	2011-12-20 13:53:31 +00:00
Michael Tuexen	7215cc1b74	Fix unused parameter warnings. While there, fix some whitespace issues. MFC after: 3 months.	2011-12-17 19:21:40 +00:00
Gleb Smirnoff	92ed4e1a24	Since size of struct in_aliasreq has just been changed in r228571, and thus ifconfig(8) needs recompile, it is a good chance to make parameter checks on SIOCAIFADDR arguments more strict.	2011-12-16 13:30:17 +00:00
Gleb Smirnoff	08b68b0e4c	A major overhaul of the CARP implementation. The ip_carp.c was started from scratch, copying needed functionality from the old implementation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces, they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics have been changed too: now one doesn't need to clone carpXX interface, he/she should directly configure a vhid on an Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(3) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows running a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]	2011-12-16 12:16:56 +00:00
Gleb Smirnoff	55174c34ef	Belatedly catch up with r151555. in_scrubprefix() also needs this fix. We should compare not only addresses, but their masks, too, when searching for matching prefix.	2011-12-13 06:56:43 +00:00
Michael Tuexen	972478a4c0	Fix a bug reported by Irene Ruengeler which resulted in not sending out HEARTBEATs when requested by the user. The HEARTBEATs were only queued, but not actually sent out. MFC after: 2 months.	2011-12-10 10:52:54 +00:00
Gleb Smirnoff	f769e5b0fa	Fix a very special case when SIOCAIFADDR supplies mask of 0.0.0.0, don't overwrite the mask with autoguessing based on classes.	2011-12-06 20:55:20 +00:00
Michael Tuexen	a56569ba55	Remove debug code. MFC after: 1 month.	2011-11-28 20:48:35 +00:00
Gleb Smirnoff	89b9325530	Fix one more fallout from r227791: do not overwrite trimmed sa_len on the ia_sockmask when doing SIOCSIFNETMASK. Reported by: Stefan Bethke <stb lassitu.de>, gonzo Pointy hat to: glebius	2011-11-28 13:30:14 +00:00
Michael Tuexen	70acddf158	Fix a warning reported by arundel@. Fix a bug where the parameter length of a supported address types parameter is set to a wrong value if the kernel is built with either INET or INET6, but not both. MFC after: 3 days.	2011-11-27 17:51:13 +00:00
Lawrence Stewart	a26fef3a21	Plug a TCP reassembly UMA zone leak introduced in r226113 by only using the backup stack queue entry when the zone is exhausted, otherwise we leak a zone allocation each time we plug a hole in the reassembly queue. Reported by: many on freebsd-stable@ (thread: "TCP Reassembly Issues") Tested by: many on freebsd-stable@ (thread: "TCP Reassembly Issues") Reviewed by: bz (very brief sanity check) MFC after: 3 days	2011-11-27 02:32:08 +00:00
Gleb Smirnoff	c6e5c71116	Remove superfluous check: SIOCAIFADDR must have ifra_addr supplied.	2011-11-24 22:46:11 +00:00
Gleb Smirnoff	bd47ae58a6	Fix stupid typo in r227830. PR: 162806 Pointy hat to: glebius	2011-11-24 22:43:48 +00:00
Michael Tuexen	052230f978	Move up the address to the top of the sctp_udencaps structure like in all other structures. This avoids alignment problems. MFC after: 3 months.	2011-11-24 10:58:48 +00:00
Michael Tuexen	ec9925ed78	Move up the address to the top of the sctp_paddrthlds structure like in all other structures. This avoids alignment problems. MFC after: 3 days.	2011-11-24 10:54:30 +00:00
Gleb Smirnoff	e278f44bb5	style(9) nit	2011-11-22 19:39:27 +00:00
Gleb Smirnoff	bbaa3f944e	Fix SIOCDIFADDR semantics: if no address is specified, then delete first one.	2011-11-22 19:37:57 +00:00
Gleb Smirnoff	cf00e5c6b7	This check isn't needed now, sanity checking done in the beginning. Missed it in last commit.	2011-11-21 20:07:12 +00:00
Gleb Smirnoff	6d00fd9c2d	Historically in_control() did not check sockaddrs supplied with structs ifreq/in_aliasreq and there've been several panics due to that problem. All these panics were fixed just a couple of lines above the panicking code. Take a more general approach: sanity check sockaddrs supplied with SIOCAIFADDR and SIOCSIF*ADDR at the beginning of the function and drop all checks below. One check is now disabled due to strange code in ifconfig(8) that I've removed recently. I'm going to enable it with next __FreeBSD_version bump. Historically in_ifinit() was able to recover from an error and restore old address. Nowadays this feature isn't working for all error cases, but for some of them. I suppose no software relies on this behavior, so I'd like to remove it, since this simplifies code a lot. Also, move if_scrub() earlier in the in_ifinit(). It is more correct to wipe routes before removing address from local address list, and interface address list. Silence from: bz, brooks, andre, rwatson, 3 weeks	2011-11-21 14:10:13 +00:00
Gleb Smirnoff	619051718c	Be more informative for "unknown hardware address format" message. Submitted by: Andrzej Tobola <ato iem.pw.edu.pl>	2011-11-21 13:40:35 +00:00
Gleb Smirnoff	c9168718ca	- Reduce severity for all ARP events, that can be triggered from remote machine to LOG_NOTICE. Exception left to "using my IP address". - Fix multicast ARP warning: add newline and also log the bad MAC address. Tested by: Alexander Wittig <wittigal msu.edu>	2011-11-21 12:07:18 +00:00
Michael Tuexen	c9c5805975	Add support for the SCTP_REMOTE_UDP_ENCAPS_PORT socket option. Retire the now unused sctp_udp_tunneling_for_client_enable sysctl variable. MFC after: 3 months.	2011-11-20 15:00:45 +00:00
Michael Tuexen	363114118b	Cleanup comparison of interface names. MFC after: 1 month.	2011-11-18 09:01:08 +00:00
Michael Tuexen	a62e467ac3	Set the MTU of a path to an appropriate value if the interface MTU can't be determined. MFC after: 3 days.	2011-11-15 20:41:50 +00:00
Eitan Adler	3b6dc18ef5	- fix duplicate "a a" in some comments Submitted by: eadler Approved by: simon MFC after: 3 days	2011-11-13 17:06:33 +00:00
Michael Tuexen	eb20220e9b	Don't copy uninitialized memory. Also simplify the comparison of interface names. MFC after: 3 days.	2011-11-13 11:53:18 +00:00
Brooks Davis	4b22573a89	In r191367 the need for if_free_type() was removed and a new member if_alloctype was used to store the original interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free(). MFC after: 1 Month	2011-11-11 22:57:52 +00:00
Eitan Adler	14517324d0	- add a missing "be" and "in" - fix other errors introduced when committing r226436 - add 'function' to a sentence where it makes sense Submitted by: delphij Submitted by: dougb Submitted by: jhb Approved by: dougb Approved by: jhb	2011-11-11 22:27:09 +00:00
Michael Tuexen	dc81ec897e	When loading addresses from INITs, always use the correct local address. MFC after: 3 days.	2011-11-07 22:30:19 +00:00
Ed Schouten	6472ac3d8a	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.	2011-11-07 15:43:11 +00:00
Ed Schouten	d745c852be	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.	2011-11-07 06:44:47 +00:00
Michael Tuexen	7dd1259f73	Initialize all components of the sent COOKIE. MFC after: 3 days.	2011-11-06 20:09:22 +00:00
Mikolaj Golub	fc06cd427e	Cache SO_REUSEPORT socket option in inpcb-layer in order to avoid inp_socket->so_options dereference when we may not acquire the lock on the inpcb. This fixes the crash due to NULL pointer dereference in in_pcbbind_setup() when inp_socket->so_options in a pcb returned by in_pcblookup_local() was checked. Reported by: dave jones <s.dave.jones@gmail.com>, Arnaud Lacombe <lacombar@gmail.com> Suggested by: rwatson Glanced by: rwatson Tested by: dave jones <s.dave.jones@gmail.com>	2011-11-06 10:47:20 +00:00
Mikolaj Golub	ec95b70995	Fix the typo made in r157474. MFC after: 3 days	2011-11-06 09:17:48 +00:00
Bjoern A. Zeeb	8813217a67	Always use the opt_.h options for ipfw.ko, not just when compiled into the kernel. Do not try to build the module in case of no INET support but keep #error calls for now in case we would compile it into the kernel. This should fix an issue where the module would fail to enable IPv6 support from the rc framework, but also other INET and INET6 parts being silently compiled out without giving a warning in the module case. While here garbage collect unneeded opt_.h includes. opt_ipdn.h is not used anywhere but we need to leave the DUMMYNET entry in options for conditional inclusion in kernel so keep the file with the same name. Reported by: pluknet Reviewed by: plunket, jhb MFC After: 3 days	2011-11-04 16:24:19 +00:00
Sergey Kandaurov	ddd0c4a969	Restore sysctl names for tcp_sendspace/tcp_recvspace. They seem to be changed unintentionally in r226437, and there was no mention of renaming in the commit log message. Reported by: Anton Yuzhaninov <citrin citrin ru>	2011-11-02 20:58:47 +00:00
Michael Tuexen	7ffa229018	When adding a new remote address using sctp_add_remote_addr(), return the correct net if requested. MFC after: 3 days.	2011-10-27 22:38:48 +00:00
Michael Tuexen	a0fe4c5b36	Send out control chunks which have no specific destination. MFC after: 3 days.	2011-10-27 22:37:59 +00:00
Qing Li	b3664a14cc	Exclude host routes when checking for prefix coverage on multiple interfaces. A host route has a NULL mask so check for that condition. I have also been told by developers who customize the packet output path with direct manipulation of the route entry (or the outgoing interface to be specific). This patch checks for the route mask explicitly to make sure custom code will not panic. PR: kern/161805 MFC after: 3 days	2011-10-25 04:06:29 +00:00
Ed Schouten	cf05e311ea	Add missing #includes. According to POSIX, these two header files should be able to be included by themselves, not depending on other headers. The <net/if.h> header uses struct sockaddr when __BSD_VISIBLE=1, while <netinet/tcp.h> uses integer datatypes (u_int32_t, u_short, etc). MFC after: 2 months	2011-10-21 12:58:34 +00:00
Bjoern A. Zeeb	fba0cea143	Add syntactic sugar missed in r226437 and then not added either when moving things around in r226448 but desperately needed to always make things compile successfully. MFC after: 1 week	2011-10-17 00:05:31 +00:00
Andre Oppermann	873789cb0f	Move the tcp_sendspace and tcp_recvspace sysctl's from the middle of tcp_usrreq.c to the top of tcp_output.c and tcp_input.c respectively next to the socket buffer autosizing controls. MFC after: 1 week	2011-10-16 20:18:39 +00:00
Andre Oppermann	9ec4a4cca5	Remove the ss_fltsz and ss_fltsz_local sysctl's which have long been superseded by the RFC3390 initial CWND sizing. Also remove the remnants of TCP_METRICS_CWND which used the TCP hostcache to set the initial CWND in a non-RFC compliant way. MFC after: 1 week	2011-10-16 20:06:44 +00:00
Andre Oppermann	e233e2acb3	VNET virtualize tcp_sendspace/tcp_recvspace and change the type to INT. A long is not necessary as the TCP window is limited to 2**30. A larger initial window isn't useful. MFC after: 1 week	2011-10-16 15:08:43 +00:00
Eitan Adler	36daf0495a	- change "is is" to "is" or "it is" - change "the the" to "the" Approved by: lstewart Approved by: sahil (mentor) MFC after: 3 days	2011-10-16 14:30:28 +00:00
Andre Oppermann	c8360ae220	Update the comment and description of tcp_sendspace and tcp_recvspace to better reflect their purpose. MFC after: 1 week	2011-10-16 13:54:46 +00:00
Ed Schouten	3fa417554b	Forward declare mbuf and inpcb. This fixes a compiler warning at WARNS=6 when including the header files as follows: #include <sys/types.h> #include <netinet/in.h> #include <netinet/ip_var.h> #include <netinet/udp.h> #include <netinet/udp_var.h>	2011-10-16 10:58:00 +00:00
Gleb Smirnoff	53883e0c24	Add support for IPv4 /31 prefixes, as described in RFC3021. To run a /31 network, participating hosts MUST drop support for directed broadcasts, and treat the first and last addresses on subnet as unicast. The broadcast address for the prefix should be the link local broadcast address, INADDR_BROADCAST.	2011-10-15 18:41:25 +00:00
Gleb Smirnoff	b365d954cc	Remove last remnants of classful addressing: - Remove ia_net, ia_netmask, ia_netbroadcast from struct in_ifaddr. - Remove net.inet.ip.subnetsarelocal, I bet no one need it in 2011. - fix bug when we were not forwarding to a host which matches classful net address. For example router having 192.168.x.y/16 network attached, would not forward traffic to 192.168.*.0, which are legal IPs in CIDR world. - For compatibility, leave autoguessing of mask based on class. Reviewed by: andre, bz, rwatson	2011-10-15 16:28:06 +00:00
Gleb Smirnoff	2a2e6f0aeb	Never switch directly from INIT to MASTER, since this produces nasty status flaps. PR: kern/161123 Submitted by: Damien Fleuriot <dam my.gd> OpenBSD: ip_carp.c, rev. 1.115	2011-10-14 19:05:26 +00:00
Gleb Smirnoff	a0b5928b29	De-spl(9).	2011-10-13 13:30:41 +00:00
Navdeep Parhar	aa4b09c5c7	Make sure the inp wasn't dropped when rexmt let go of the inp and pcbinfo locks. Reviewed by: andre@ MFC after: 7 days	2011-10-12 19:52:23 +00:00
Michael Tuexen	7906f59a29	Use the most significant 6 bits of the dscp instead of the least significant ones. This has changed in the latest version of the socket API ID and provides backwards compatibility and gets it in sync with the usage of the IP_TOS socket option. MFC after: 3 days.	2011-10-11 13:24:37 +00:00
Qing Li	15d2521975	All indirect routes will fail the rtcheck, except for a special host route where the destination IP and the gateway IP is the same. This special case handling is only meant for backward compatibility reason. The last commit introduced a bug in the route check logic, where a valid special case is treated as an error. This patch fixes that bug along with some code cleanup. Suggested by: gleb Reviewed by: kmacy, discussed with gleb MFC after: 1 day	2011-10-10 17:41:11 +00:00
Michael Tuexen	69c59f8ba2	Get struct sctp_net_route in tune with struct route. struct route was changed in http://svn.freebsd.org/changeset/base/225698 and since then SCTP support was broken. This needs to be MFCed to stable/9 to unbreak SCTP support in 9.0 MFC after: 3 days.	2011-10-10 16:31:18 +00:00
Michael Tuexen	3d2443cc84	When moving an stcb to a new inp and we copy over the list of bound addresses, update the last used address pointer. If not, it might result in a crash if the old inp goes away. MFC after: 3 days.	2011-10-10 12:28:47 +00:00
Michael Tuexen	629749b60c	Update the inp stored in a HB-timer when moving an stcb to a new inp. Use only this stored inp when processing a HB timeout. This fixes a bug which results in a crash. MFC after: 3 days.	2011-10-09 14:12:17 +00:00
Qing Li	6703e7ea10	Do not try removing an ARP entry associated with a given interface address if that interface does not support ARP. Otherwise the system will generate error messages unnecessarily due to the missing entry. PR: kern/159602 Submitted by: pluknet MFC after: 3 days	2011-10-07 22:22:19 +00:00
Qing Li	41b210c6f6	Remove the reference held on the loopback route when the interface address is being deleted. Only the last reference holder deletes the loopback route. All other delete operations just clear the IFA_RTSELF flag. PR: kern/159601 Submitted by: pluknet Reviewed by: discussed on net@ MFC after: 3 days	2011-10-07 18:01:34 +00:00
Andre Oppermann	1593dcd025	Prevent TCP sessions from stalling indefinitely in reassembly when reaching the zone limit of reassembly queue entries. When the zone limit was reached not even the missing segment that would complete the sequence space could be processed preventing the TCP session forever from making any further progress. Solve this deadlock by using a temporary on-stack queue entry for the missing segment followed by an immediate dequeue again by delivering the contiguous sequence space to the socket. Add logging under net.inet.tcp.log_debug for reassembly queue issues. Reviewed by: lsteward (previous version) Tested by: Steven Hartland <killing-at-multiplay.co.uk> MFC after: 3 days	2011-10-07 16:39:03 +00:00
Andre Oppermann	50b1479e65	Add back the IP header length to the total packet length field on raw IP sockets. It was deducted in ip_input() in preparation for protocols interested only in the payload. On raw sockets the IP header should be delivered as it came in from the network except for the byte order swaps in some fields. This brings us in line with all other OS'es that provide raw IP sockets. Reported by: Matthew Cini Sarreo <mcins1-at-gmail.com> MFC after: 3 days	2011-10-07 13:43:01 +00:00
Attilio Rao	4af309c810	For the INP_TIMEWAIT case, there is no valid tcpcb object tied to the inpcb object. Skip the TCP_SIGNATURE check in that case as it is consistent with the output path (no TCP_SIGNATURE for outcoming packets in TIMEWAIT state) and also because for TIMEWAIT state the verify may be less effective. Sponsored by: Sandvine Incorporated Reported by: rwatson No objections by: rwatson MFC after: 3 days	2011-10-06 14:29:38 +00:00
Qing Li	db92413e6a	A system may have multiple physical interfaces, all of which are on the same prefix. Since a single route entry is installed for the prefix (without RADIX_MPATH), incoming packets on the interfaces that are not associated with the prefix route may trigger an error message about being unable to allocate an LLE entry, and fail at L2. This patch makes sure a valid route is present in the system, allows the aforementioned condition to exist, and treats it as valid. Reviewed by: bz MFC after: 5 days	2011-10-03 19:51:18 +00:00
Qing Li	6cf8e3300e	This patch allows ARP to work properly in the presence of self-referencing routes. This patch is a rework of r223862. Reviewed by: bz, zec MFC after: 5 days	2011-10-03 19:06:55 +00:00
Bjoern A. Zeeb	75e54d6017	Unbreak no-ip and no-inet6 module builds with ipfw. For now continue to build the ip_fw_pfil.c hooks and ipfw even in case of no-ip under the assumption that the private L2 hook (which hopefully eventually will be a pfil hook as well) can still be useful. Allow building the module without inet as well. Glanced at by: jhb MFC after: 3 days	2011-09-27 13:27:17 +00:00
Michael Tuexen	87eac1ceb9	Cleanup the iterator code, remove code that is never executed. Approved by: re MFC after: 1 month.	2011-09-19 21:47:20 +00:00
Michael Tuexen	80c79bbe7a	Fix the enabling/disabling of Heartbeats and path MTU discovery when using the SCTP_PEER_ADDR_PARAMS socket option. Approved by: re MFC after: 1 month.	2011-09-17 08:50:29 +00:00
Michael Tuexen	3657c405e3	Fix a typo introduced in http://svn.freebsd.org/changeset/base/225571 Reported by Ilya A. Arkhipov. Approved by: re MFC after: 1 month.	2011-09-15 12:20:52 +00:00
Michael Tuexen	92776dfd5a	Make sure that SCTP rejects broadcast, multicast and wildcard addresses as remote addresses. Approved by: re MFC after: 1 month.	2011-09-15 08:49:54 +00:00
Michael Tuexen	c55b70cef6	Ensure that 1-to-1 style SCTP sockets can only be connected once. Allow implicit setup also for 1-to-1 style sockets as described in the latest version of the socket API ID. Approved by: re MFC after: 1 month	2011-09-14 19:10:13 +00:00
Michael Tuexen	58bdb69150	Fix the handling of the flowlabel and DSCP value in the SCTP_PEER_ADDR_PARAMS socket option. Honor the net.inet6.ip6.auto_flowlabel sysctl setting. Approved by: re (bz) MFC after: 1 month.	2011-09-14 08:15:21 +00:00
John Baldwin	5bb3652f05	Allow the ipfw.ko module built with a kernel to honor any IPFIREWALL_* options defined in the kernel config. This more closely matches the behavior of other modules which inherit configuration settings from the kernel configuration during a kernel + modules build. Reviewed by: luigi Approved by: re (kib) MFC after: 1 week	2011-09-12 21:09:56 +00:00
Michael Tuexen	e4f820b3c6	Improve implementation of the Nagle algorithm for SCTP: Don't delay the final fragment of a fragmented user message. Approved by: re MFC after: 4 weeks	2011-09-09 13:52:37 +00:00
Qing Li	1184509858	When an interface address route is removed from the system, another route with the same prefix is searched for as a replacement. The current code did not bypass routes that have non-operational interfaces. This patch fixes that bug and will find a replacement route with an active interface. PR: kern/159603 Submitted by: pluknet, ambrisko at ambrisko dot com Reviewed by: discussed on net@ Approved by: re (bz) MFC after: 3 days	2011-08-28 00:14:40 +00:00
Bjoern A. Zeeb	b233773bb9	Increase the defaults for the maximum socket buffer limit, and the maximum TCP send and receive buffer limits from 256kB to 2MB. For sb_max_adj we need to add the cast as already used in the sysctl handler to not overflow the type doing the maths. Note that this is just the defaults. They will allow more memory to be consumed per socket/connection if needed but not change the default "idle" memory consumption. All values are still tunable by sysctls. Suggested by: gnn Discussed on: arch (Mar and Aug 2011) MFC after: 3 weeks Approved by: re (kib)	2011-08-25 09:20:13 +00:00
Bjoern A. Zeeb	6f69742441	Fix compilation in case of defined(INET) && defined(IPFIREWALL_FORWARD) but no INET6. Reported by: avg Tested by: avg MFC after: 4 weeks X-MFC with: r225044 Approved by: re (kib)	2011-08-20 18:45:38 +00:00
Bjoern A. Zeeb	8a006adb24	Add support for IPv6 to ipfw fwd: Distinguish IPv4 and IPv6 addresses and optional port numbers in user space to set the option for the correct protocol family. Add support in the kernel for carrying the new IPv6 destination address and port. Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change the address in the IP header. Add support for IPv6 forwarding to a non-local destination. Add a regression test utilizing VIMAGE to check all 20 possible combinations I could think of. Obtained from: David Dolson at Sandvine Incorporated (original version for ipfw fwd IPv6 support) Sponsored by: Sandvine Incorporated PR: bin/117214 MFC after: 4 weeks Approved by: re (kib)	2011-08-20 17:05:11 +00:00
Bjoern A. Zeeb	f76fdd221b	Hide IPv6 next header parsing warnings under the verbose sysctl so people can possibly disable it when their consoles are flooded, or enable it for debugging. MFC after: 2 weeks Approved by: re (kib)	2011-08-20 14:20:36 +00:00
Bjoern A. Zeeb	0c4dbd5af7	After r225032 fix logging in a similar way masking the IPv6 more fragments flag off so that offset == 0 checks work properly. PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks X-MFC with: r225032 Approved by: re (kib)	2011-08-20 13:47:08 +00:00
Bjoern A. Zeeb	49239b28da	If we detect an IPv6 fragment header and it is not the first fragment, then terminate the loop as we will not find any further headers and for short fragments this could otherwise lead to a pullup error discarding the fragment. PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks Approved by: re (kib)	2011-08-20 13:46:19 +00:00
Bjoern A. Zeeb	720fee0674	ipfw internally checks for offset == 0 to determine whether the packet is a/the first fragment or not. For IPv6 we have added the "more fragments" flag as well to be able to determine on whether there will be more as we do not have the fragment header available for logging, while for IPv4 this information can be derived directly from the IPv4 header. This allowed fragmented packets to bypass normal rules as proper masking was not done when checking offset. Split variables to not need masking for IPv6 to avoid further errors. PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks Approved by: re (kib)	2011-08-20 13:17:47 +00:00
Bjoern A. Zeeb	391255b8a4	While not explicitly allowed by RFC 2460, in case there is no translation technology involved (and that section is suggested to be removed by Errata 2843), single packet fragments do not harm. There is another errata under discussion to clarify and allow this. Meanwhile add a sysctl to allow disabling this behaviour again. We will treat single packet fragment (a fragment header added when not needed) as if there was no fragment header. PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) (original version) Tested by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks Approved by: re (kib)	2011-08-20 12:40:17 +00:00
Michael Tuexen	3900c0936f	Fix the handling of [gs]etsockopt() unconnected 1-to-1 style sockets. While there: * Fix a locking issue in setsockopt() of SCTP_CMT_ON_OFF. * Fix a bug in setsockopt() of SCTP_DEFAULT_PRINFO, where the pr_value was ignored. Approved by: re@ MFC after: 2 months.	2011-08-16 21:04:18 +00:00
Michael Tuexen	b10f2dc889	Add support for the spp_dscp field in the SCTP_PEER_ADDR_PARAMS socket option. Backwards compatibility is provided by still supporting the spp_ipv4_tos field. Approved by: re@ MFC after: 2 months.	2011-08-14 20:55:32 +00:00
Kevin Lo	7236660627	If RTF_HOST flag is specified, then we are interested in destination address. PR: kern/159600 Submitted by: Svatopluk Kraus <onwahe at gmail dot com> Approved by: re (hrs)	2011-08-10 06:17:06 +00:00
Michael Tuexen	ca85e9482a	The result of a joint work between rrs@ and myself at the IETF: * Decouple the path supervision using a separate HB timer per path. * Add support for potentially failed state. * Bring back RTO.min to 1 second. * Accept packets on IP-addresses already announced via an ASCONF * While there: do some cleanups. Approved by: re@ MFC after: 2 months.	2011-08-03 20:21:00 +00:00
Gleb Smirnoff	217e3abc03	Add missing break; in r223593. Submitted by: sem Pointy hat to: glebius Approved by: re (kib)	2011-08-01 13:41:38 +00:00
Bjoern A. Zeeb	d9a362862c	Add spares to the network stack for FreeBSD-9: - TCP keep* timers - TCP UTO (adjust from what was there already) - netmap - route caching - user cookie (temporary to allow for the real fix) Slightly re-shuffle struct ifnet moving fields out of the middle of spares and to better align. Discussed with: rwatson (slightly earlier version)	2011-07-17 21:15:20 +00:00
Bjoern A. Zeeb	dceced71fb	Unbreak no-INET kernels after r223839 adding the needed #ifdef INET. MFC after: 4 weeks	2011-07-14 13:44:48 +00:00
Michael Tuexen	1a3b5ce2b9	Don't check for SOCK_DGRAM anymore. Also remove multicast related code which is not necessary anymore.	2011-07-12 20:14:03 +00:00
Michael Tuexen	78d9a31d3a	The socket API only specifies SCTP for SOCK_SEQPACKET and SOCK_STREAM, but not SOCK_DGRAM. So don't register it for SOCK_DGRAM. While there, fix some indentation.	2011-07-12 19:29:29 +00:00
Marko Zec	13e255fab7	Permit ARP to proceed for IPv4 host routes for which the gateway is the same as the host address. This already works fine for INET6 and ND6. While here, remove two function pointers from struct lltable which are only initialized but never used. MFC after: 3 days	2011-07-08 09:38:33 +00:00
Andrey V. Elsukov	4659e09dcb	Add again the checking for log_arp_permanent_modify that was by accident removed in the r186119. PR: kern/154831 MFC after: 1 week	2011-07-07 11:59:51 +00:00
Andre Oppermann	1c6e7fa7f1	Remove the TCP_SORECEIVE_STREAM compile time option. The use of soreceive_stream() for TCP still has to be enabled with the loader tuneable net.inet.tcp.soreceive_stream. Suggested by: trociny and others	2011-07-07 10:37:14 +00:00
Colin Percival	472ea5befb	Remove #ifdef notyet code dating back to 4.3BSD Net/2 (and possibly earlier). I think the benefit of making the code cleaner and easier to understand outweighs the humour of leaving this intact (or possibly changing it to #ifdef not_yet_and_probably_never). MFC after: 2 weeks	2011-07-05 18:49:55 +00:00
Colin Percival	ca7122622b	Don't allow lro->len to exceed 65535, as this will result in overflow when len is inserted back into the synthetic IP packet and cause a multiple of 2^16 bytes of TCP "packet loss". This improves Linux->FreeBSD netperf bandwidth by a factor of 300 in testing on Amazon EC2. Reviewed by: jfv MFC after: 2 weeks	2011-07-05 18:43:54 +00:00
Glen Barber	ff19f85d50	- General grammar and mdoc(7) fixes. [1] [2] - While here, remove a paragraph about userspace operation that has been outdated for some time. [2] PR: 158623 Submitted by: Ben Kudak (kaduk % mit!edu) [1] Reviewed by: glebius [2] MFC after: 1 week	2011-07-04 23:00:26 +00:00
Ermal Luçi	e6c90582c7	pf(4) tags now store the state key but tcp_respond tries to reuse a mbuf as an optimization. This makes pf find the wrong state and cause errors reported with state mismatches. Clear the cached state link on the pf(4) tag to avoid the state mismatches. Approved by: bz	2011-07-04 17:43:04 +00:00
Andrey V. Elsukov	2303570fe8	ARP code reuses mbuf from ARP request to make a reply, but it does not reset rcvif to NULL. Since rcvif is not NULL, ipfw(4) supposes that ARP replies were received on specified interface. Reset rcvif to NULL for ARP replies to fix this issue. PR: kern/131817 Reviewed by: glebius MFC after: 1 month	2011-07-04 05:47:48 +00:00
Michael Tuexen	b845acda75	Add the missing sca_keylength field to the sctp_authkey structure, which is used by the SCTP_AUTH_KEY socket option. MFC after: 1 month.	2011-06-30 16:56:55 +00:00
Andrey V. Elsukov	9527ec6e52	Add new rule actions "call" and "return" to ipfw. They make possible to organize subroutines with rules. The "call" action saves the current rule number in the internal stack and rules processing continues from the first rule with specified number (similar to skipto action). If later a rule with "return" action is encountered, the processing returns to the first rule with number of "call" rule saved in the stack plus one or higher. Submitted by: Vadim Goncharov Discussed by: ipfw@, luigi@	2011-06-29 10:06:58 +00:00
Bjoern A. Zeeb	e0bfbfce79	Update packet filter (pf) code to OpenBSD 4.5. You need to update userland (world and ports) tools to be in sync with the kernel. Submitted by: mlaier Submitted by: eri	2011-06-28 11:57:25 +00:00
Michael Tuexen	3c4401ecab	Add support for SCTP_PR_SCTP_NONE which I forgot to add. This constant is defined in the socket API ID. MFC after: 2 months.	2011-06-27 22:03:33 +00:00
Gleb Smirnoff	812f1d32e7	Add possibility to pass IPv6 packets to a divert(4) socket. Submitted by: sem	2011-06-27 12:21:11 +00:00
Andrey V. Elsukov	0511675327	Export AddLink() function from libalias. It can be used when custom alias address needs to be specified. Add inbound handler to the alias_ftp module. It helps handle active FTP transfer mode for the case with external clients and FTP server behind NAT. Fix passive FTP transfer case for server behind NAT using redirect with external IP address different from NAT ip address. PR: kern/157957 Submitted by: Alexander V. Chernikov	2011-06-22 20:00:27 +00:00
Andrey V. Elsukov	62b6e03adf	Document PKT_ALIAS_SKIP_GLOBAL option. Submitted by: Alexander V. Chernikov	2011-06-22 09:55:28 +00:00
Andrey V. Elsukov	bb3dd40974	Do not use SET_HOST_IPLEN() macro for IPv6 packets. PR: kern/157239 MFC after: 2 weeks	2011-06-21 06:06:47 +00:00
Bjoern A. Zeeb	75497cc5eb	Fix a KASSERT from r212803 to check the correct length also in case of IPsec being compiled in and used. Improve reporting by adding the length fields to the panic message, so that we would have some immediate debugging hints. Discussed with: jhb	2011-06-20 07:07:18 +00:00
Bjoern A. Zeeb	f404863979	Remove a comment left over from before new-arp that is no longer correct. MFC after: 1 week	2011-06-18 13:54:36 +00:00
Michael Tuexen	6037f89c81	Add SCTP_DEFAULT_PRINFO socket option. Fix the SCTP_DEFAULT_SNDINFO socket option: Don't clear the PR SCTP policy when setting sinfo_flags. MFC after: 1 month.	2011-06-16 21:12:36 +00:00
Michael Tuexen	0b064106dd	* Fix the handling of addresses in sctp_sendv(). * Add support for SCTP_SENDV_NOINFO. * Improve the error handling of sctp_sendv() and sctp_recv(). MFC after: 1 month	2011-06-16 15:36:09 +00:00
Michael Tuexen	e2e7c62edc	Add support for the newly added SCTP API. In particular add support for: * SCTP_SNDINFO, SCTP_PRINFO, SCTP_AUTHINFO, SCTP_DSTADDRV4, and SCTP_DSTADDRV6 cmsgs. * SCTP_NXTINFO and SCTP_RCVINFO cmsgs. * SCTP_EVENT, SCTP_RECVRCVINFO, SCTP_RECVNXTINFO and SCTP_DEFAULT_SNDINFO socket option. * Special association ids (SCTP_FUTURE_ASSOC, ...) * sctp_recvv() and sctp_sendv() functions. MFC after: 1 month.	2011-06-15 23:50:27 +00:00
Andrey V. Elsukov	1875bbfe54	Implement "global" mode for ipfw nat. It is similar to natd(8) "globalport" option for multiple NAT instances. If ipfw rule contains "global" keyword instead of nat_number, then for each outgoing packet ipfw_nat looks up translation state in all configured nat instances. If an entry is found, packet aliased according to that entry, otherwise packet is passed unchanged. User can specify "skip_global" option in NAT configuration to exclude an instance from the lookup in global mode. PR: kern/157867 Submitted by: Alexander V. Chernikov (previous version) Tested by: Eugene Grosbein	2011-06-14 13:35:24 +00:00
Andrey V. Elsukov	81a654646e	Sort alias mode flags in the increasing order.	2011-06-14 12:06:38 +00:00
Andrey V. Elsukov	3265f69ce6	Add IPv6 support to the ipfw uid/gid check. Pass an ip_fw_args structure to the check_uidgid() function, since it contains all needed arguments and also a pointer to the mbuf, which makes it possible to use the in_pcblookup_mbuf() function. Since I cannot test it for the non-FreeBSD case, I keep this ifdef unchanged. Tested by: Alexander V. Chernikov MFC after: 3 weeks	2011-06-14 07:20:16 +00:00
John Baldwin	6b7c15e580	Advance the advertised window (rcv_adv) to the currently received data (rcv_nxt) if we are advertising a zero window. This can be true when ACK'ing a window probe whose one byte payload was accepted rather than dropped because the socket's receive buffer was not completely full, but the remaining space was smaller than the window scale. This ensures that window probe ACKs satisfy the assumption made in r221346 and closes a window where rcv_nxt could be greater than rcv_adv. Tested by: trasz, pho, trociny Reviewed by: silby MFC after: 1 week	2011-06-13 15:38:31 +00:00
Bjoern A. Zeeb	ffe8cd7b10	Correct comments and debug logging in ipsec to better match reality. MFC after: 3 days	2011-06-08 03:02:11 +00:00
Andrey V. Elsukov	56e38090a4	Fix indentation.	2011-06-07 06:57:22 +00:00
Andrey V. Elsukov	bd853db48c	Make the behaviour of the libalias-based in-kernel NAT a bit closer to how natd(8) works. natd(8) drops packets only when libalias returns PKT_ALIAS_IGNORED and the "deny_incoming" option is set, but ipfw_nat always dropped packets that were not aliased, even if they should not be aliased and are just passing through. PR: kern/122109, kern/129093, kern/157379 Submitted by: Alexander V. Chernikov (previous version) MFC after: 1 month	2011-06-07 06:42:29 +00:00
Bjoern A. Zeeb	1417604e70	Unbreak kernels with non-default PCBGROUP included but no WITNESS. Rather than including lock.h in in_pcbgroup.c in right order, fix it for all consumers of in_pcb.h by further header file pollution under #ifdef KERNEL. Reported by: Pan Tsu (inyaoo gmail.com)	2011-06-06 21:45:32 +00:00
Robert Watson	52cd27cb58	Implement a CPU-affine TCP and UDP connection lookup data structure, struct inpcbgroup. pcbgroups, or "connection groups", supplement the existing inpcbinfo connection hash table, which when pcbgroups are enabled, might now be thought of more usefully as a per-protocol 4-tuple reservation table. Connections are assigned to connection groups based on a hash of their 4-tuple; wildcard sockets require special handling, and are members of all connection groups. During a connection lookup, a per-connection group lock is employed rather than the global pcbinfo lock. By aligning connection groups with input path processing, connection groups take on an effective CPU affinity, especially when aligned with RSS work placement (see a forthcoming commit for details). This eliminates cache line migration associated with global, protocol-layer data structures in steady state TCP and UDP processing (with the exception of protocol-layer statistics; further commit to follow). Elements of this approach were inspired by Willman, Rixner, and Cox's 2006 USENIX paper, "An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems". However, there are also significant differences: we maintain the inpcb lock, rather than using the connection group lock for per-connection state. Likewise, the focus of this implementation is alignment with NIC packet distribution strategies such as RSS, rather than pure software strategies. Despite that focus, software distribution is supported through the parallel netisr implementation, and works well in configurations where the number of hardware threads is greater than the number of NIC input queues, such as in the RMI XLR threaded MIPS architecture. Another important difference is the continued maintenance of existing hash tables as "reservation tables" -- these are useful both to distinguish the resource allocation aspect of protocol name management and the more common-case lookup aspect. 
In configurations where connection tables are aligned with hardware hashes, it is desirable to use the traditional lookup tables for loopback or encapsulated traffic rather than take the expense of hardware hashes that are hard to implement efficiently in software (such as RSS Toeplitz). Connection group support is enabled by compiling "options PCBGROUP" into your kernel configuration; for the time being, this is an experimental feature, and hence is not enabled by default. Subject to the limited MFCability of change dependencies in inpcb, and its change to the inpcbinfo init function signature, this change in principle could be merged to FreeBSD 8.x. Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-06-06 12:55:02 +00:00
Andrey V. Elsukov	1e587bfa32	Do not return EINVAL when user does `ipfw set N flush` on an empty set. MFC after: 2 weeks	2011-06-06 10:39:38 +00:00
Hiroki Sato	db82af41db	- Implement RDNSS and DNSSL options (RFC 6106, IPv6 Router Advertisement Options for DNS Configuration) into rtadvd(8) and rtsold(8). DNS information received by rtsold(8) will go to resolv.conf(5) by resolvconf(8) script. This is based on work by J.R. Oldroyd (kern/156259) but revised extensively[1]. - rtadvd(8) now supports "noifprefix" to disable gathering on-link prefixes from interfaces when no "addr" is specified[2]. An entry in rtadvd.conf with "noifprefix" + no "addr" generates an RA message with no prefix information option. - rtadvd(8) now supports RTM_IFANNOUNCE message to fix crashes when an interface is added or removed. - Correct bogus ND_OPT_ROUTE_INFO value to one in RFC 4191. Reviewed by: bz[1] PR: kern/156259 [1] PR: bin/152458 [2]	2011-06-06 03:06:43 +00:00
Robert Watson	d3c1f00350	Add _mbuf() variants of various inpcb-related interfaces, including lookup, hash install, etc. For now, these arguments are unused, but as we add RSS support, we will want to use hashes extracted from mbufs, rather than manually calculated hashes of header fields, due to the expense of the software version of Toeplitz (and similar hashes). Add notes that it would be nice to be able to pass mbufs into lookup routines in pf(4), optimising firewall lookup in the same way, but the code structure there doesn't facilitate that currently. (In principle there is no reason this couldn't be MFCed -- the change extends rather than modifies the KBI. However, it won't be useful without other previous possibly less MFCable changes.) Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-06-04 16:33:06 +00:00
Robert Watson	711b3dbd54	IP divert sockets use their inpcbinfo for port reservation, although not for lookup. I missed its call to in_pcbbind() when preparing previous patches, which would lead to a lock assertion failure (although not an actual race condition due to global pcbinfo locks providing required synchronisation -- in this particular case only). This change adds the missing locking of the pcbhash lock. (Existing comments in the ipdivert code question the need for using the global hash to manage the namespace, as really it's a simple port namespace and not an address/port namespace. Also, although in_pcbbind is used to manage reservations, the hash tables aren't used for lookup. It might be a good idea to make them use hashed lookup, or to use a different reservation scheme.) Reviewed by: bz Reported by: Kristof Provost <kristof at sigsegv.be> Sponsored by: Juniper Networks	2011-06-04 16:26:02 +00:00
Robert Watson	b598155a85	Do not leak the pcbinfohash lock in the case where in6_pcbladdr() returns an error during TCP connect(2) on an IPv6 socket. Submitted by: bz Sponsored by: Juniper Networks, Inc.	2011-06-02 10:21:05 +00:00
Andrey V. Elsukov	281d42c371	O_FORWARD_IP is the only action that depends on the result of the dynamic rule lookup. We are doing forwarding in the following cases: o For the simple ipfw fwd rule, e.g. fwd 10.0.0.1 ip from any to any out xmit em0 fwd 127.0.0.1,3128 tcp from any to any 80 in recv em1 o For the dynamic fwd rule, e.g. fwd 192.168.0.1 tcp from any to 10.0.0.3 3333 setup keep-state When this rule triggers it creates a dynamic rule, but this dynamic rule should forward packets only in the forward direction. o And the last case, which did not work before - a simple fwd rule which triggers when some dynamic rule has already been executed. PR: kern/147720, kern/150798 MFC after: 1 month	2011-06-01 19:44:52 +00:00
Andrey V. Elsukov	88eb7833cb	Hide some debug messages under debug macro. MFC after: 1 week	2011-06-01 12:33:05 +00:00
Andrey V. Elsukov	e35a05d3e7	Hide useless warning under debug macro. PR: kern/69963 MFC after: 1 week	2011-06-01 12:05:35 +00:00
Bjoern A. Zeeb	d2025bd0f6	Unbreak NOINET kernels after r222488. Reviewed by: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems! Pointy hat: to myself for missing this during review?	2011-05-30 18:07:35 +00:00
Robert Watson	fa046d8774	Decompose the current single inpcbinfo lock into two locks: - The existing ipi_lock continues to protect the global inpcb list and inpcb counter. This lock is now relegated to a small number of allocation and free operations, and occasional operations that walk all connections (including, awkwardly, certain UDP multicast receive operations -- something to revisit). - A new ipi_hash_lock protects the two inpcbinfo hash tables for looking up connections and bound sockets, manipulated using new INP_HASH_*() macros. This lock, combined with inpcb locks, protects the 4-tuple address space. Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb connection locks, so may be acquired while manipulating a connection on which a lock is already held, avoiding the need to acquire the inpcbinfo lock preemptively when a binding change might later be required. As a result, however, lookup operations necessarily go through a reference acquire while holding the lookup lock, later acquiring an inpcb lock -- if required. A new function in_pcblookup() looks up connections, and accepts flags indicating how to return the inpcb. Due to lock order changes, callers no longer need acquire locks before performing a lookup: the lookup routine will acquire the ipi_hash_lock as needed. In the future, it will also be able to use alternative lookup and locking strategies transparently to callers, such as pcbgroup lookup. New lookup flags are, supplementing the existing INPLOOKUP_WILDCARD flag: INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb Callers must pass exactly one of these flags (for the time being). Some notes: - All protocols are updated to work within the new regime; especially, TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely eliminated, and global hash lock hold times are dramatically reduced compared to previous locking. 
- The TCP syncache still relies on the pcbinfo lock, something that we may want to revisit. - Support for reverting to the FreeBSD 7.x locking strategy in TCP input is no longer available -- hash lookup locks are now held only very briefly during inpcb lookup, rather than for potentially extended periods. However, the pcbinfo ipi_lock will still be acquired if a connection state might change such that a connection is added or removed. - Raw IP sockets continue to use the pcbinfo ipi_lock for protection, due to maintaining their own hash tables. - The interface in6_pcblookup_hash_locked() is maintained, which allows callers to acquire hash locks and perform one or more lookups atomically with 4-tuple allocation: this is required only for TCPv6, as there is no in6_pcbconnect_setup(), which there should be. - UDPv6 locking remains significantly more conservative than UDPv4 locking, which relates to source address selection. This needs attention, as it likely significantly reduces parallelism in this code for multithreaded socket use (such as in BIND). - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking somewhat, as they relied on ipi_lock to stabilise 4-tuple matches, which is no longer sufficient. A second check once the inpcb lock is held should do the trick, keeping the general case from requiring the inpcb lock for every inpcb visited. - This work reminds us that we need to revisit locking of the v4/v6 flags, which may be accessed lock-free both before and after this change. - Right now, a single lock name is used for the pcbhash lock -- this is undesirable, and probably another argument is required to take care of this (or a char array name field in the pcbinfo?). This is not an MFC candidate for 8.x due to its impact on lookup and locking semantics. It's possible some of these issues could be worked around with compatibility wrappers, if necessary. Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-30 09:43:55 +00:00
Andrey V. Elsukov	d832ded1a1	Wrap long line. MFC after: 2 weeks	2011-05-30 05:53:00 +00:00
Andrey V. Elsukov	41b6083752	Add tablearg support for ipfw setfib. PR: kern/156410 MFC after: 2 weeks	2011-05-30 05:37:26 +00:00
Michael Tuexen	14cfa970bf	Get rid of unused functions. MFC after: 1 week.	2011-05-29 18:41:06 +00:00
Qing Li	92322284cd	Supply the LLE_STATIC flag bit to in_ifscrub() when scrubbing an interface address so that proper clean up will take place in the routing code. This patch fixes the bootp panic on startup problem. Also, add more error handling and logging code in function in_scrubprefix(). MFC after: 5 days	2011-05-29 02:21:35 +00:00
Bjoern A. Zeeb	8d5a3ca77b	Add FEATURE() definitions for IPv4 and IPv6 so that we can use feature_present(3) to dynamically decide whether to use one or the other family. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 10 days	2011-05-25 00:34:25 +00:00
Robert Watson	61401ec2de	An inpcb lock is no longer required in in_pcbref() since the move to refcount(9). MFC after: 3 weeks Sponsored by: Juniper Networks, Inc.	2011-05-24 13:08:59 +00:00
Robert Watson	79bdc6e5d3	Continue to refine inpcb reference counting and locking, in preparation for reworking of inpcbinfo locking: (1) Convert inpcb reference counting from manually manipulated integers to the refcount(9) KPI. This allows the refcount to be managed atomically with an inpcb read lock rather than write lock, or even with no inpcb lock at all. As a result, in_pcbref() also no longer requires an inpcb lock, so can be performed solely using the lock used to look up an inpcb. (2) Shift more inpcb freeing activity from the in_pcbrele() context (via in_pcbfree_internal) to the explicit in_pcbfree() context. This means that the inpcb refcount is increasingly used only to maintain memory stability, not actually defer the clean up of inpcb protocol parts. This is desirable as many of those protocol parts required the pcbinfo lock, which we'd like not to acquire in in_pcbrele() contexts. Document this in comments better. (3) Introduce new read-locked and write-locked in_pcbrele() variations, in_pcbrele_rlocked() and in_pcbrele_wlocked(), which allow the inpcb to be properly unlocked as needed. in_pcbrele() is a wrapper around the latter, and should probably go away at some point. This makes it easier to use this weak reference model when holding only a read lock, as will happen in the future. This may well be safe to MFC, but some more KBI analysis is required. Reviewed by: bz MFC after: 3 weeks Sponsored by: Juniper Networks, Inc.	2011-05-23 19:32:02 +00:00
Robert Watson	68e0d7e06a	Move from passing a wildcard boolean to a general set of lookup flags into in_pcb_lport(), in_pcblookup_local(), and in_pcblookup_hash(), and similarly for IPv6 functions. In the future, we would like to support other flags relating to locking strategy. This change doesn't appear to modify the KBI in practice, as callers already passed in INPLOOKUP_WILDCARD rather than a simple boolean. MFC after: 3 weeks Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-23 15:23:18 +00:00
Robert Watson	82a5be494a	A number of quite incremental refinements to struct inpcbinfo's definition: (1) Add a locking guide for inpcbinfo. (2) Annotate inpcbinfo fields with synchronisation information; not all annotations are 100% satisfactory. (3) Reorder inpcbinfo fields so that the lock is at the head of the structure, and close to fields it protects. (4) Sort fields that will eventually be hashlock/pcbgroup-related together even though they remain locked by ipi_lock for now. Reviewed by: bz Sponsored by: Juniper Networks X-MFC after: KBI analysis required	2011-05-23 13:51:57 +00:00
Qing Li	5b84dc789a	The statically configured (permanent) ARP entries are removed when an interface is brought down, even though the interface address is still valid. This patch maintains the permanent ARP entries as long as the interface address (having the same prefix as that of the ARP entries) is valid. Reviewed by: delphij MFC after: 5 days	2011-05-20 19:12:20 +00:00
Michael Tuexen	b7e08865e8	Unbreak INET-less build. Reported by bz@ MFC after: 1 week	2011-05-18 19:49:39 +00:00
Michael Tuexen	4f36da915f	Copy out the mtu when calling getsockopt() with SCTP_GET_PEER_ADDR_INFO. MFC after: 1 week.	2011-05-17 15:57:31 +00:00
Michael Tuexen	c954cac48b	Fix whitespacing. Reported by scf@ MFC after: 1 week.	2011-05-17 15:46:28 +00:00
Michael Tuexen	96f4bcfff2	Fix the source address selection for boundall sockets when sending INITs to a global IPv4 address having only private IPv4 address. Allow the usage of a private address and make sure that no other private address will be used by the association. Initial work was done by rrs@. MFC after: 1 week.	2011-05-14 18:22:14 +00:00
John Baldwin	5891ebd6cd	Oops, fix order of sequence numbers in KASSERT()'s to catch negative receive windows to match the labels in the panic message. Submitted by: trociny	2011-05-14 14:41:40 +00:00
Alexander Motin	bc7d18ae72	Refactor TCP ISN increment logic. Instead of firing callout at 100Hz to keep constant ISN growth rate, do the same directly inside tcp_new_isn(), taking into account how much time (ticks) passed since the last call. On my test systems this decreases idle interrupt rate from 140Hz to 70Hz.	2011-05-09 07:37:47 +00:00
Michael Tuexen	689e6a5fa3	Fix a locking issue showing up on Mac OS X when subscribing to authentication events. DTLS/SCTP renegotiations trigger the bug. MFC after: 2 weeks.	2011-05-08 09:11:59 +00:00
Michael Tuexen	936fc35bb3	Change the name of an internal structure, since the name is used by a structure of the (new) SCTP API. MFC after: 1 week.	2011-05-06 20:40:33 +00:00
Andrey V. Elsukov	318b735cc3	Convert delay parameter back to ms when reporting to user. PR: 156838 MFC after: 1 week	2011-05-06 07:13:34 +00:00
Michael Tuexen	c3d72c80d3	Implement Resource Pooling V2 and an MPTCP like congestion control. Based on a patch received from Martin Becke. MFC after: 2 weeks.	2011-05-04 21:27:05 +00:00
Michael Tuexen	274b0bd51d	Remove code without any effect.	2011-05-03 20:34:02 +00:00
Michael Tuexen	1d663b4658	Add a missing break. This bug was introduced in r221249. MFC after: 1 week	2011-05-03 20:32:21 +00:00
John Baldwin	f701e30d7f	Handle a rare edge case with nearly full TCP receive buffers. If a TCP buffer fills up causing the remote sender to enter into persist mode, but there is still room available in the receive buffer when a window probe arrives (either due to window scaling, or due to the local application very slowly draining data from the receive buffer), then the single byte of data in the window probe is accepted. However, this can cause rcv_nxt to be greater than rcv_adv. This condition will only last until the next ACK packet is pushed out via tcp_output(), and since the previous ACK advertised a zero window, the ACK should be pushed out while the TCP pcb is write-locked. During the window while rcv_nxt is greater than rcv_adv, a few places would compute the remaining receive window via rcv_adv - rcv_nxt. However, this value was then (uint32_t)-1. On a 64 bit machine this could expand to a positive 2^32 - 1 when cast to a long. In particular, when calculating the receive window in tcp_output(), the result would be that the receive window was computed as 2^32 - 1 resulting in advertising a far larger window to the remote peer than actually existed. Fix various places that compute the remaining receive window to either assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the window as full if rcv_nxt is greater than rcv_adv. Reviewed by: bz MFC after: 1 month	2011-05-02 21:05:52 +00:00
Michael Tuexen	ea5eba1157	Some more cleanups related to a kernel without INET. MFC after: 1 week	2011-05-02 15:53:00 +00:00
Bjoern A. Zeeb	29bd2010d4	Fix a mismerge from p4 in that in_localaddr() is not available without INET. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-30 16:30:18 +00:00
Michael Tuexen	d085528d04	Remove some leftover debug code. MFC after: 1 week	2011-04-30 11:22:30 +00:00
Bjoern A. Zeeb	b287c6c70c	Make the TCP code compile without INET. Sort #includes and add #ifdef INETs. Add some comments at #endifs given more nestedness. To make the compiler happy, some default initializations were added in accordance with the style on the files. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-30 11:21:29 +00:00
Michael Tuexen	e6194c2ed4	Improve compilation of SCTP code without INET support. Some bugs were fixed while doing this: * ASCONF-ACK messages might use wrong port number when using IPv6. * Checking for additional addresses takes the correct address into account and also does not do more comparisons than necessary. This patch is based on one received from bz@ who was sponsored by The FreeBSD Foundation and iXsystems. MFC after: 1 week	2011-04-30 11:18:16 +00:00
Bjoern A. Zeeb	79288c112c	Make the UDP code compile without INET. Expose udp_usrreq.c to IPv6 only as well compiling out most functions adding or extending #ifdef INET coverage. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-30 11:17:00 +00:00
Bjoern A. Zeeb	67107f4594	Make the PCB code compile without INET support by adding #ifdef INETs and correcting few #includes. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-30 11:04:34 +00:00
John Baldwin	672dc4aea2	TCP reuses t_rxtshift to determine the backoff timer used for both the persist state and the retransmit timer. However, the code that implements "bad retransmit recovery" only checks t_rxtshift to see if an ACK has been received during the first retransmit timeout window. As a result, if ticks has wrapped over to a negative value and a socket is in the persist state, it can incorrectly treat an ACK from the remote peer as a "bad retransmit recovery" and restore saved values such as snd_ssthresh and snd_cwnd. However, if the socket has never had a retransmit timeout, then these saved values will be zero, so snd_ssthresh and snd_cwnd will be set to 0. If the socket is in fast recovery (this can be caused by excessive duplicate ACKs such as those fixed by 220794), then each ACK that arrives triggers either NewReno or SACK partial ACK handling which clamps snd_cwnd to be no larger than snd_ssthresh. In effect, the socket's send window is permanently stuck at 0 even though the remote peer is advertising a much larger window and pending data is only sent via TCP window probes (so one byte every few seconds). Fix this by adding a new TCP pcb flag (TF_PREVVALID) that indicates that the various snd_*_prev fields in the pcb are valid and only perform "bad retransmit recovery" if this flag is set in the pcb. The flag is set on the first retransmit timeout that occurs and is cleared on subsequent retransmit timeouts or when entering the persist state. Reviewed by: bz MFC after: 2 weeks	2011-04-29 15:40:12 +00:00
Bjoern A. Zeeb	b8e463e644	MfP4 CH=192029: Expose ip_icmp.c to INET6 as well and only export badport_bandlim() along with the two sysctls in the non-INET case. The bandlim types work for all cases I reviewed in IPv6 as well and the sysctls are available as we export net.inet.* from in_proto.c. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:36:35 +00:00
Bjoern A. Zeeb	74e9dcf786	MfP4 CH=192004: Move ip_defttl to raw_ip.c where it is actually used. In an IPv6 only world we do not want to compile ip_input.c in for that and it is a shared default with INET6. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:32:27 +00:00
Bjoern A. Zeeb	a0ae8f04e8	Make various (pseudo) interfaces compile without INET in the kernel adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:30:44 +00:00
Attilio Rao	2903309aca	Add the possibility to verify the MD5 hash of incoming TCP packets. Since this is a costly function, even when compiled in (along with the option TCP_SIGNATURE), it can be disabled via the net.inet.tcp.signature_verify_input sysctl. Sponsored by: Sandvine Incorporated Reviewed by: emaste, bz MFC after: 2 weeks	2011-04-25 17:13:40 +00:00
Bjoern A. Zeeb	acaeca65b3	Be less strict on includes than in r220746. We need in.h for both INET or INET6 as it holds all the IPPROTO_* definitions needed for the SYSCTL_NODE definitions. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 5 days	2011-04-25 16:36:16 +00:00
Gleb Smirnoff	acdef0460e	Use size_t for sopt_valsize. Submitted by: Brandon Gooch <jamesbrandongooch gmail.com>	2011-04-21 08:18:55 +00:00
Bjoern A. Zeeb	00c081e908	MFp4 CH=191760: When compiling out INET we still need the initialization routines as well as the tuning and monitoring sysctls shared with IPv6. Move the two send/recvspace variables up from the middle of the file to ease compiling out the INET only code. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 3 days	2011-04-20 08:03:22 +00:00
Bjoern A. Zeeb	aae49dd304	MFp4 CH=191470: Move the ipport_tick_callout and related functions from ip_input.c to in_pcb.c. The random source port allocation code has been merged and is now local to in_pcb.c only. Use a SYSINIT to get the callout started and no longer depend on initialization from the inet code, which would not work in an IPv6 only setup. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-20 08:00:29 +00:00
Bjoern A. Zeeb	ec4f97277f	MFp4 CH=191466: Move fw_one_pass to where it belongs: it is a property of ipfw, not of ip_input. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 3 days	2011-04-20 07:55:33 +00:00
Gleb Smirnoff	9d0a2ddf69	- Rewrite functions that copyin/out NAT configuration, so that they calculate required memory size dynamically. - Fix races on chain re-lock. - Introduce new field to ip_fw_chain - generation count. Now utilized only in the NAT configuration, but can be utilized wider in ipfw. - Get rid of NAT_BUF_LEN in ip_fw.h PR: kern/143653	2011-04-19 15:06:33 +00:00
Andrey V. Elsukov	e3665201f5	Add sysctl handlers for the net.inet.ip.dummynet.hash_size, .pipe_byte_limit and .pipe_slot_limit oids to prevent setting incorrect values. MFC after: 2 weeks	2011-04-19 11:33:39 +00:00
Andrey V. Elsukov	8ad66025f6	The ipdn_bound_var() function is designed to bound a variable between a specified minimum and maximum. When the specified default value is out of bounds, it does not work as expected and does not limit the variable. Check that the default value is in range and limit it if needed. Also bump the max_hash_size value to 65536 to correspond with the manual page. PR: kern/152887 MFC after: 2 weeks	2011-04-19 11:29:09 +00:00
Andrey V. Elsukov	3ab4af737d	Use M_WAITOK instead of M_WAIT for malloc. Remove unneeded checks. MFC after: 1 week	2011-04-19 05:59:37 +00:00
Gleb Smirnoff	ca47294ddf	LibAliasInit() should allocate memory with M_WAITOK flag. Modify it and its callers.	2011-04-18 20:07:08 +00:00
Gleb Smirnoff	d0e16e0d1e	Pullup up to TCP header length before matching against 'tcpopts'. PR: kern/156180 Reviewed by: luigi	2011-04-18 18:22:10 +00:00
John Baldwin	da84b2e6c5	When checking to see if a window update should be sent to the remote peer, don't force a window update if the window would not actually grow due to window scaling. Specifically, if the window scaling factor is larger than 2 * MSS, then after the local reader has drained 2 * MSS bytes from the socket, a window update can end up advertising the same window. If this happens, the supposed window update actually ends up being a duplicate ACK. This can result in an excessive number of duplicate ACKs when using a higher maximum socket buffer size. Reviewed by: bz MFC after: 1 month	2011-04-18 17:43:16 +00:00
Bjoern A. Zeeb	336d023b2e	Make in_proto.c dependent on either inet or inet6. While it does not provide any functionality for IPv6, it provides the sysctl nodes for net.inet.* that a lot of functionality shared between IPv4 and IPv6 depends on. We cannot change these anymore without breaking a lot of management and tuning. In case of IPv6 only, we compile out everything but the sysctl node declarations. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC After: 5 days	2011-04-17 16:35:16 +00:00
Edward Tomasz Napierala	79bb84fb15	Refactor udp_input(), moving calls to u_tun_func() into udp_append(). Obtained from: Wheel Systems Sp. z o.o. Reviewed by: bz@	2011-04-14 10:40:57 +00:00
Bjoern A. Zeeb	05b9d121aa	The mbuf_frag_size always was and is file local and not queried from base user space tools via kvm. Mark it static. MFC after: 3 days	2011-04-14 09:47:09 +00:00
Sergey Kandaurov	6bed196c35	Staticize malloc types. Approved by: lstewart MFC after: 1 week	2011-04-13 11:28:46 +00:00
Andrey V. Elsukov	9974d151ec	Restore previous behaviour - always match the rule when we are doing tagging, even when the tag already exists. Reported by: Vadim Goncharov MFC after: 1 week	2011-04-12 15:20:34 +00:00
Lawrence Stewart	891b8ed467	Use the full and proper company name for Swinburne University of Technology throughout the source tree. Requested by: Grenville Armitage, Director of CAIA at Swinburne University of Technology MFC after: 3 days	2011-04-12 08:13:18 +00:00
Jack F Vogel	c31aa19c53	Port of the LRO fix from mxge driver to the generic LRO code. Thanks to Andrew Gallatin for the change. MFC after: 7 days	2011-04-07 21:20:26 +00:00
Andrey V. Elsukov	a5620cc6c5	Fill up src_port and dst_port variables for SCTP over IPv4. PR: kern/153415 MFC after: 1 week	2011-03-31 16:30:14 +00:00
Andrey V. Elsukov	5600c92750	Fix malloc types. MFC after: 1 week	2011-03-31 15:11:12 +00:00
Andrey V. Elsukov	3d10d64fd3	Fix a memory leak. Memory that is allocated for schedulers hash table was not freed. PR: kern/156083 MFC after: 1 week	2011-03-31 15:10:41 +00:00
John Baldwin	766282cbe7	Clamp the initial advertised receive window when responding to a SYN/ACK to the maximum allowed window. Growing the window too large would cause an underflow in the calculations in tcp_output() to decide if a window update should be sent which would prevent the persist timer from being started if data was pending and the other end of the connection advertised an initial window size of 0. PR: kern/154006 Submitted by: Stefan `Sec` Zehl sec 42 org Reviewed by: bz MFC after: 1 week	2011-03-30 12:35:39 +00:00
Weongyo Jeong	c45e1b3cad	Covers values if (BYTES_THIS_ACK(tp, th) / tp->t_maxseg) value is from 2.0 to 3.0. Reviewed by: lstewart	2011-03-28 19:03:56 +00:00
Sergey Kandaurov	79d514355c	Reference ifaddr object before unlocking as it can be freed from another context at the moment of later access. PR: kern/155555 Submitted by: Andrew Boyer <aboyer att averesystems.com> Approved by: avg (mentor) MFC after: 2 weeks	2011-03-21 14:19:40 +00:00
Jeff Roberson	e4cd31dd3c	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb	4d457387fe	Properly check for an IPv4 socket after r219579. In some cases, such as udp6_connect() without an earlier bind(2) to an address, v4-mapped sockets allowed and a non-mapped destination address, we can end up here with both v4 and v6 indicated: inp_vflag = (INP_IPV4|INP_IPV6|INP_IPV6PROTO) In that case however laddrp is NULL as the IPv6 path does not pass in a copy currently. Reported by: Pawel Worach (pawel.worach gmail.com) Tested by: Pawel Worach (pawel.worach gmail.com) MFC after: 6 days X-MFC with: r219579	2011-03-19 19:08:54 +00:00
Bjoern A. Zeeb	efc76f729a	Merge the two identical implementations for local port selections from in_pcbbind_setup() and in6_pcbsetport() in a single in_pcb_lport(). MFC after: 2 weeks	2011-03-12 21:46:37 +00:00
Randall Stewart	f79aab1866	Tunes and fixes the new DC-CC to seem to hit the right mix. Still may need some tweaks but it appears to almost not give away too much to an RFC2581 flow, but can really minimize the amount of buffers used in the net. MFC after: 3 months	2011-03-08 11:58:25 +00:00
Randall Stewart	48b6c64938	Adds a new Congestion Control that helps reduce the RTT that a flow will build up in buffers in transit. It is a slight modification to RFC2581 but is more friendly i.e. less aggressive. MFC after: 3 months	2011-03-01 00:37:46 +00:00
Dimitry Andric	cb8750c269	Fix breakage in sys/netinet/sctp_sysctl.c, introduced by r219057. If SCTP_HAS_RTTC is not defined, this file fails to compile. Insert the necessary #ifdefs to make it work. Pointy hat to: rrs	2011-02-26 22:45:40 +00:00
Randall Stewart	299108c5a2	Improvements to CC modules: 1) Add four new points that allow you to get more information to cc algo's 2) Fix the case where user changes module on an existing TCB, in such a case, the initialization module needs to be called on all nets. 3) Move htcp_cc structure to a union that other modules can use. 4) Add 5th point for get/set socket options for cc_module specific options MFC after: 2 months	2011-02-26 15:23:46 +00:00
Michael Tuexen	0191fb6de2	* Fix several bugs where the scaled versions of srtt and rttvar were used incorrectly. * Use appropriate variable names for RTO instead of RTT. MFC after: 3 months.	2011-02-24 22:58:15 +00:00
Michael Tuexen	be1d917696	* Cleanup the code computing the retransmission timeout. * Fix an initialization bug for the scaled variance of the RTO. MFC after: 3 months.	2011-02-24 22:36:40 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Michael Tuexen	f0878bdcc5	Bugfix: Get per vnet sysctl variables and statistics working. MFC after: 3 months.	2011-02-18 20:30:58 +00:00
Bjoern A. Zeeb	1fb51a12f2	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
Sergey Kandaurov	4fd8408ae7	Bump dummynet module version to meet dummynet schedulers' requirements, and thus unbreak loading dummynet.ko via /boot/loader.conf. Reported by: rihad <rihad att mail.ru> on freebsd-net Approved by: kib (mentor)	2011-02-16 15:43:35 +00:00
Randall Stewart	d69e7322cb	Fix a bug reported by Jonathan Leighton in his web-sctp testing at the Univ-of-Del. Basically when a 1-to-1 socket did a socket/bind/send(data)/close. If the timing was right we would dereference a socket that is NULL. MFC after: 1 month	2011-02-13 14:48:11 +00:00
Michael Tuexen	be2a6988a1	Fix several bugs related to stream scheduling. Obtained from: Robin Seggelmann MFC after: 3 months.	2011-02-13 13:53:28 +00:00
Daniel Eischen	9d22191d17	Oops, revert an accidental local change that got added in my last commit (r218627). No damage was done in the last commit, just some duplicated code was added (which is now removed).	2011-02-13 04:44:06 +00:00
Daniel Eischen	f7e6ce6d7a	Allow the SO_SETFIB socket option to select the default (0) routing table. Reviewed by: julian	2011-02-13 00:14:13 +00:00
Michael Tuexen	2678fe1ee9	Remove addresses from endpoint when there are no associations. This fixes a bug reported by brucec@. MFC after: 3 months.	2011-02-10 14:46:37 +00:00
Michael Tuexen	4c97400f86	Fix bugs related to M_FLOWID: * Store the flowid when receiving an SCTP/IPv6 packet. * Store the flowid when receiving an SCTP packet with wrong CRC. * Initialize flowid correctly. * Put test code under INVARIANTS. MFC after: 3 months.	2011-02-07 15:04:23 +00:00
Randall Stewart	f8140f7291	If not set (due to some error Michael is working on fixing) set it for the net. MFC after: 3 months	2011-02-07 08:12:24 +00:00
Randall Stewart	73403d4141	1) Track when flowid does get set. MFC after: 3 months	2011-02-07 08:10:29 +00:00
Randall Stewart	38521fb9b4	1) Use the same scheme Michael and I discussed for selecting a flowid 2) If flowid is not set, arrange so it is stored. 3) If flowid is set by lower layer, use it. MFC after: 3 Months	2011-02-06 13:17:40 +00:00
Luigi Rizzo	9b0456f075	Correct the 'output_time' of packets generated by dummynet. In the dec.2009 rewrite I introduced a bug, using for the computation the arrival time instead of the time the packet has exited from the queue. The bandwidth computation was still correct because it is computed elsewhere, but traffic was sent out in bursts. The bug is also present in RELENG_8 after dec.2009 Thanks to Daikichi Osuga for investigating, finding and fixing the bug with detailed graphs of the behaviour before and after the fix. Submitted by: Daikichi Osuga MFC after: 2 weeks	2011-02-05 23:32:17 +00:00
Michael Tuexen	a4ae38f117	Add support for M_FLOWID.	2011-02-05 19:13:38 +00:00
Randall Stewart	5d40cf5d23	1) Typo correction in comments and one spacing change. 2) Mass update to all copyrights. MFC after: 3 Months	2011-02-05 12:12:51 +00:00
John Baldwin	d28b9e89a9	When turning off TCP_NOPUSH, only call tcp_output() to immediately flush any pending data if the connection is established. Submitted by: csjp Reviewed by: lstewart MFC after: 1 week	2011-02-04 14:13:15 +00:00
Randall Stewart	0071ee5ede	1) Fix cpu mapping per JB's suggestions 2) Fix it so INIT's don't always end up on CPU0 MFC after: 3 months	2011-02-04 13:50:30 +00:00
Rebecca Cran	492fddb2c4	Fix typo (Tuneable -> Tunable).	2011-02-04 12:03:48 +00:00
Michael Tuexen	252f7f93b0	Fix several bugs in the stream schedulers. From Robin Seggelmann. MFC after: 3 months.	2011-02-03 20:44:49 +00:00
Michael Tuexen	c446091b1e	Make sure that changing the ECN sysctl does not affect existing associations and endpoints. MFC after: 3 months.	2011-02-03 19:59:00 +00:00
Randall Stewart	dec0177df6	1) Move per John Baldwin to mp_maxid 2) Some signed/unsigned errors found by Mac OS compiler (from Michael) 3) a couple of copyright updates on the affected files. MFC after: 3 months	2011-02-03 19:22:21 +00:00
Randall Stewart	ae26e0a472	Fix the per CPU stats so that: 1) They don't use the giant "MAX_CPU" define and instead are allocated dynamically based on mp_ncpus 2) Will zero with the netstat -z -s -p sctp 3) Will be properly handled by both the sctp_init and finish (the multi-net stuff was incorrectly bzero'ing in sctp_init the wrong size.. the bzero is now moved to the right places). And of course the free is put in at the very end. MFC after: 3 Months	2011-02-03 11:52:22 +00:00
Randall Stewart	bfc46083b9	Adds an experimental option to create a pool of threads. These serve as input threads and are queued packets based on the V-tag number. This is similar to what a modern card can do with queue's for TCP... but alas modern cards know nothing about SCTP. MFC after: 3 months (maybe)	2011-02-03 10:05:30 +00:00
Randall Stewart	899288ae4b	1) Allow a chunk to track the cwnd it was at when sent. 2) Add separate max-bursts for retransmit and hb. These are set to sysctlable values but not settable via the socket api. This makes sure we don't blast out HB's or fast-retransmits. 3) Determine on the first data transmission on a net if its local-lan (by being under or over a RTT). This can later be used to think about different algorithms based on locallan vs big-i (experimental) 4) The cwnd should NOT be allowed to grow when an ECNEcho is seen (TCP has this same bug). We fix this in SCTP so an ECNe being seen prevents an advance of cwnd. 5) CWR's should not be sent multiple times to the same network, instead just updating the TSN being transmitted if needed. MFC after: 1 Month	2011-02-02 11:13:23 +00:00
Lawrence Stewart	03f0843bdb	Algorithm modules can define their own private congestion signal types in the top 8 bits of the 32 bit signal bit field space for internal use. These private signals should not be leaked outside of a module. Given that many algorithm modules use the NewReno hook functions to simplify their implementation, the obvious place such a leak would show up is in the NewReno cong_signal hook function. - Show the full number of significant bits in the signal type definitions in <netinet/cc.h>. - Add a bitmask to simplify figuring out if a given signal is in the private or public bit range. - Add a sanity check in newreno_cong_signal() to ensure private signals are not being leaked into the hook function. Sponsored by: FreeBSD Foundation Discussed with: David Hayes <dahayes at swin edu au> MFC after: 1 week X-MFC with: r215166	2011-02-01 13:32:27 +00:00
Lawrence Stewart	ec943febbb	Fix typo in comment: "course" -> "coarse" Sponsored by: FreeBSD Foundation Submitted by: jmallett MFC after: 3 months X-MFC with: r218152	2011-02-01 07:10:13 +00:00
Lawrence Stewart	0927e1a18b	Import an implementation of the CAIA-Hamilton-Delay (CHD) congestion control algorithm described in the paper "Improved coexistence and loss tolerance for delay based TCP congestion control" by Hayes and Armitage. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide tolerance to non-congestion related packet loss and improvements to coexistence with loss-based congestion control algorithms. A key idea in improving coexistence with loss-based congestion control algorithms is the use of a shadow window, which attempts to track how NewReno's congestion window (cwnd) would evolve. At the next packet loss congestion event, CHD uses the shadow window to correct cwnd in a way that reduces the amount of unfairness CHD experiences when competing with loss-based algorithms. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 07:05:14 +00:00
Lawrence Stewart	ac230a79e1	Import a clean-room implementation of the Hamilton-Delay (HD) congestion control algorithm based on the paper "A strategy for fair coexistence of loss and delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and Baker. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. HD uses a probabilistic approach to reacting to delay-based congestion. The probability of reducing cwnd is zero when the queuing delay is very small, increasing to a maximum at a set threshold, then back down to zero again when the queuing delay is high. Normal operation keeps the queuing delay below the set threshold. However, since loss-based congestion control algorithms push the queuing delay high when probing for bandwidth, having the probability of reducing cwnd drop back to zero for high delays allows HD to compete with loss-based algorithms. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 06:42:46 +00:00
Lawrence Stewart	1d4ed791d0	Import a clean-room implementation of the VEGAS congestion control algorithm based on the paper "TCP Vegas: end to end congestion avoidance on a global internet" by Brakmo and Peterson. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. VEGAS uses network delay as a congestion indicator and unlike regular loss-based algorithms, attempts to keep the network operating with stable queuing delays and no congestion losses. By keeping network buffers used along the path within a set range, queuing delays are kept low while maintaining high throughput. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 06:17:00 +00:00
Randall Stewart	493d8e5a83	More ECN fixes: 1) We now remove ECN-Nonce since it will no longer continue as an I-D 2) Eliminate last_tsn_echo, this tied us to an assoc not the net and thus we were not doing m-homing on the ECN-Echo senders side right. 3) Increment the count going out even if the TSN is lower in the pending ECN-Echo, this way the receiver knows exactly how many packets were marked even with network re-ordering 4) Fix so we DO NOT stop doing delayed sack if a ECN Echo is in queue MFC after: 1 month	2011-01-31 11:50:11 +00:00
Bjoern A. Zeeb	7f79e7e4db	Remove duplicate printing of TF_NOPUSH in db_print_tflags(). MFC after: 10 days	2011-01-29 22:11:13 +00:00
Randall Stewart	a21779f050	Fixes to ECN in SCTP. 1) ECN was on an association basis, this is incorrect and will not work with CMT or for that matter if the user is sending to multiple addresses. This commit makes ECN on a per path basis. 2) Adopt the new format for the ECN internet draft. This also maintains compatibility with old format chunks as well. 3) Keep track of the real time of a RTT down to micro seconds. For some future conditional features (for like a data center this is good information to have). MFC after: 1 month	2011-01-29 19:55:29 +00:00
Randall Stewart	410bcbef0a	Keep track of the real last RTT on each net. This will be used for Data Center congestion control, we won't want to engage it in the ECN code unless we KNOW that the RTT is less than 500us. MFC after: 1 week	2011-01-28 21:05:21 +00:00
Randall Stewart	d77e2e42b3	Fix a bug in the way ECN-Echo chunk sends were being accounted for. The counting was such that we counted only when we queued a chunk, not when we sent it. Now keep an additional counter for queuing and one for sending. MFC after: 1 week	2011-01-28 20:49:15 +00:00
Michael Tuexen	f8cdf87663	* Use 300 ms as the default for RTO_MIN. * Disable burst mitigation by default. * Remove unused constant. Discussed with rrs. MFC after: 3 months.	2011-01-26 21:38:17 +00:00
Michael Tuexen	507c72969d	Make SCTP_MAX_BURST compliant with the latest version of the socket API ID. This is not compatible with the API in stable/8.	2011-01-26 19:55:54 +00:00
Michael Tuexen	90fed1d88e	Change infrastructure for SCTP_MAX_BURST to allow compliance with the latest socket API ID. Especially it can be disabled. Full compliance needs changing the structure used in the socket option. Since this breaks the API, it will be a separate commit which will not be MFCed to stable/8. MFC after: 3 months.	2011-01-26 19:49:03 +00:00
Daniel Eischen	e691be70f9	Prison check addresses set with multicast interface options. Reviewed by: bz MFC after: 1 week	2011-01-26 17:31:03 +00:00
Andrew Thompson	965615476e	When matching an incoming ARP against a bridge, ensure both interfaces belong to the same bridge. Submitted by: Alexander Zagrebin	2011-01-25 17:15:23 +00:00
Lawrence Stewart	050570efa7	Import the ERTT (Enhanced Round Trip Time) Khelp module. ERTT uses the Khelp/Hhook KPIs to hook into the TCP stack and maintain a per-connection, low noise estimate of the instantaneous RTT. ERTT's implementation is robust even in the face of delayed acknowledgements and/or TSO being in use for a connection. A high quality, low noise RTT estimate is a requirement for applications such as delay-based congestion control, for which we will be importing some algorithm implementations shortly. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-01-24 23:08:38 +00:00
Michael Tuexen	f7a77f6fd3	Add stream scheduling support. This work is based on a patch received from Robin Seggelmann. MFC after: 3 months.	2011-01-23 19:36:28 +00:00
Lawrence Stewart	a66ac850d7	An sbuf configured with SBUF_AUTOEXTEND will call malloc with M_WAITOK when a write to the buffer causes it to overflow. We therefore can't hold the CC list rwlock over a call to sbuf_printf() for an sbuf configured with SBUF_AUTOEXTEND. Switch to a fixed length sbuf which should be of sufficient size except in the very unlikely event that the sysctl is being processed as one or more new algorithms are loaded. If that happens, we accept the race and may fail the sysctl gracefully if there is insufficient room to print the names of all the algorithms. This should address a WITNESS warning and the potential panic that would occur if the sbuf call to malloc did sleep whilst holding the CC list rwlock. Sponsored by: FreeBSD Foundation Reported by: Nick Hibma Reviewed by: bz MFC after: 3 weeks X-MFC with: r215166	2011-01-23 13:00:25 +00:00
Michael Tuexen	afb048b8ef	Remove unnecessary checking of variable. MFC after: 3 months.	2011-01-23 07:27:35 +00:00
Lawrence Stewart	47f44cdd93	Some correctness and robustness fixes related to CUBIC's mean RTT estimate: - The mean RTT is updated at the end of each congestion epoch, but if we switch to congestion avoidance within the first epoch (e.g. if ssthresh was primed from the hostcache), we'll trigger a divide by zero panic in cubic_ack_received(). Set the mean to the min in cubic_record_rtt() if the mean is less than the min to ensure we have a sane mean for use in this situation. This fixes the panic reported by Nick Hibma. - Adjust conditions under which we update the mean RTT in cubic_post_recovery() to ensure a low latency path won't yield an RTT of less than 1. This avoids another potential divide by zero panic when running CUBIC in networks with sub-millisecond latencies. - Remove the "safety" assignment of min into mean when we don't update the mean because of failed conditions. The above change to the conditions for updating the mean ensures the safety issue is addressed and I feel it is better to keep our previous mean estimate around if we can't update than to revert to the min. - Initialise the mean RTT to 1 on connection startup to act as a safety belt if a situation we haven't considered and addressed with the above changes were to crop up in the wild. Sponsored by: FreeBSD Foundation Reported and tested by: Nick Hibma Discussed with: David Hayes <dahayes at swin edu au> MFC after: 5 weeks X-MFC with: r216114	2011-01-21 05:19:47 +00:00
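
The 8ad66025f6 entry above describes bounding a variable between a minimum and a maximum, with the default value itself checked against the range first. A minimal sketch of that idea follows. `bound_var` and its signature are hypothetical and are not the kernel's ipdn_bound_var(); the policy for out-of-range inputs (fall back to the default below the minimum, cut to the maximum above it) is an assumption made for illustration.

```c
/*
 * Hypothetical helper, not the kernel function: keep *v within [lo, hi].
 * The default is clamped first so that an out-of-range default can never
 * be stored. Assumed policy: values below lo fall back to the (clamped)
 * default, values above hi are cut to hi.
 */
static int
bound_var(int *v, int dflt, int lo, int hi)
{
	if (dflt < lo)
		dflt = lo;
	if (dflt > hi)
		dflt = hi;
	if (*v < lo)
		*v = dflt;
	else if (*v > hi)
		*v = hi;
	return (*v);
}
```

Without the two checks on `dflt`, a call such as `bound_var(&v, 100000, 16, 65536)` with `v` below the minimum would store 100000, which is the failure mode the commit message describes.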

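The da84b2e6c5 entry above explains that a window update can turn into a duplicate ACK when the window scaling granularity exceeds 2 * MSS. The sketch below illustrates the arithmetic only; `advertised_window` and `window_would_grow` are hypothetical names, not functions from tcp_output(), and the 16-bit limit of the on-wire field is ignored for brevity.

```c
#include <stdint.h>

/*
 * Window the peer reconstructs from an advertisement: the header carries
 * space >> scale, so the low `scale` bits of the free space are lost.
 */
static uint32_t
advertised_window(uint32_t space, unsigned int scale)
{
	return ((space >> scale) << scale);
}

/* Nonzero only if a new advertisement would be larger than the old one. */
static int
window_would_grow(uint32_t old_space, uint32_t new_space, unsigned int scale)
{
	return (advertised_window(new_space, scale) >
	    advertised_window(old_space, scale));
}
```

With a shift of 14 (a granularity of 16384 bytes) and an MSS of 1460, draining 2 * MSS from a 65536-byte window leaves the advertised value at 65536, so `window_would_grow(65536, 65536 + 2 * 1460, 14)` is 0 and an update sent at that point would be indistinguishable from a duplicate ACK.
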

4433 Commits