freebsd-skq

Author	SHA1	Message	Date
Gleb Smirnoff	b365d954cc	Remove last remnants of classful addressing: - Remove ia_net, ia_netmask, ia_netbroadcast from struct in_ifaddr. - Remove net.inet.ip.subnetsarelocal, I bet no one needs it in 2011. - Fix a bug where we were not forwarding to a host whose address matches a classful network address. For example, a router with a 192.168.x.y/16 network attached would not forward traffic to 192.168.*.0, which are legal IPs in the CIDR world. - For compatibility, leave autoguessing of mask based on class. Reviewed by: andre, bz, rwatson	2011-10-15 16:28:06 +00:00
Gleb Smirnoff	2a2e6f0aeb	Never switch directly from INIT to MASTER, since this produces nasty status flaps. PR: kern/161123 Submitted by: Damien Fleuriot <dam my.gd> OpenBSD: ip_carp.c, rev. 1.115	2011-10-14 19:05:26 +00:00
Gleb Smirnoff	a0b5928b29	De-spl(9).	2011-10-13 13:30:41 +00:00
Navdeep Parhar	aa4b09c5c7	Make sure the inp wasn't dropped when rexmt let go of the inp and pcbinfo locks. Reviewed by: andre@ MFC after: 7 days	2011-10-12 19:52:23 +00:00
Michael Tuexen	7906f59a29	Use the most significant 6 bits of the dscp instead of the least significant ones. This has changed in the latest version of the socket API ID; it provides backwards compatibility and brings it in sync with the usage of the IP_TOS socket option. MFC after: 3 days.	2011-10-11 13:24:37 +00:00
Qing Li	15d2521975	All indirect routes will fail the rtcheck, except for a special host route where the destination IP and the gateway IP are the same. This special case handling is only meant for backward compatibility reasons. The last commit introduced a bug in the route check logic, where a valid special case is treated as an error. This patch fixes that bug along with some code cleanup. Suggested by: gleb Reviewed by: kmacy, discussed with gleb MFC after: 1 day	2011-10-10 17:41:11 +00:00
Michael Tuexen	69c59f8ba2	Get struct sctp_net_route in sync with struct route. struct route was changed in http://svn.freebsd.org/changeset/base/225698 and since then SCTP support has been broken. This needs to be MFCed to stable/9 to unbreak SCTP support in 9.0. MFC after: 3 days.	2011-10-10 16:31:18 +00:00
Michael Tuexen	3d2443cc84	When moving an stcb to a new inp and copying over the list of bound addresses, update the last used address pointer. Otherwise, it might result in a crash if the old inp goes away. MFC after: 3 days.	2011-10-10 12:28:47 +00:00
Michael Tuexen	629749b60c	Update the inp stored in a HB-timer when moving an stcb to a new inp. Use only this stored inp when processing a HB timeout. This fixes a bug which results in a crash. MFC after: 3 days.	2011-10-09 14:12:17 +00:00
Qing Li	6703e7ea10	Do not try removing an ARP entry associated with a given interface address if that interface does not support ARP. Otherwise the system will generate error messages unnecessarily due to the missing entry. PR: kern/159602 Submitted by: pluknet MFC after: 3 days	2011-10-07 22:22:19 +00:00
Qing Li	41b210c6f6	Remove the reference held on the loopback route when the interface address is being deleted. Only the last reference holder deletes the loopback route. All other delete operations just clear the IFA_RTSELF flag. PR: kern/159601 Submitted by: pluknet Reviewed by: discussed on net@ MFC after: 3 days	2011-10-07 18:01:34 +00:00
Andre Oppermann	1593dcd025	Prevent TCP sessions from stalling indefinitely in reassembly when reaching the zone limit of reassembly queue entries. When the zone limit was reached not even the missing segment that would complete the sequence space could be processed preventing the TCP session forever from making any further progress. Solve this deadlock by using a temporary on-stack queue entry for the missing segment followed by an immediate dequeue again by delivering the contiguous sequence space to the socket. Add logging under net.inet.tcp.log_debug for reassembly queue issues. Reviewed by: lsteward (previous version) Tested by: Steven Hartland <killing-at-multiplay.co.uk> MFC after: 3 days	2011-10-07 16:39:03 +00:00
Andre Oppermann	50b1479e65	Add back the IP header length to the total packet length field on raw IP sockets. It was deducted in ip_input() in preparation for protocols interested only in the payload. On raw sockets the IP header should be delivered as it came in from the network except for the byte order swaps in some fields. This brings us in line with all other OSes that provide raw IP sockets. Reported by: Matthew Cini Sarreo <mcins1-at-gmail.com> MFC after: 3 days	2011-10-07 13:43:01 +00:00
Attilio Rao	4af309c810	For the INP_TIMEWAIT case, there is no valid tcpcb object tied to the inpcb object. Skip the TCP_SIGNATURE check in that case as it is consistent with the output path (no TCP_SIGNATURE for outgoing packets in TIMEWAIT state) and also because for TIMEWAIT state the verify may be less effective. Sponsored by: Sandvine Incorporated Reported by: rwatson No objections by: rwatson MFC after: 3 days	2011-10-06 14:29:38 +00:00
Qing Li	db92413e6a	A system may have multiple physical interfaces, all of which are on the same prefix. Since a single route entry is installed for the prefix (without RADIX_MPATH), incoming packets on the interfaces that are not associated with the prefix route may trigger an error message about being unable to allocate an LLE entry, and L2 resolution fails. This patch makes sure a valid route is present in the system, allows the aforementioned condition to exist, and treats it as valid. Reviewed by: bz MFC after: 5 days	2011-10-03 19:51:18 +00:00
Qing Li	6cf8e3300e	This patch allows ARP to work properly in the presence of self-referencing routes. This patch is a rework of r223862. Reviewed by: bz, zec MFC after: 5 days	2011-10-03 19:06:55 +00:00
Bjoern A. Zeeb	75e54d6017	Unbreak no-ip and no-inet6 module builds with ipfw. For now continue to build the ip_fw_pfil.c hooks and ipfw even in case of no-ip under the assumption that the private L2 hook (which hopefully eventually will be a pfil hook as well) can still be useful. Allow building the module without inet as well. Glanced at by: jhb MFC after: 3 days	2011-09-27 13:27:17 +00:00
Michael Tuexen	87eac1ceb9	Cleanup the iterator code, remove code that is never executed. Approved by: re MFC after: 1 month.	2011-09-19 21:47:20 +00:00
Michael Tuexen	80c79bbe7a	Fix the enabling/disabling of Heartbeats and path MTU discovery when using the SCTP_PEER_ADDR_PARAMS socket option. Approved by: re MFC after: 1 month.	2011-09-17 08:50:29 +00:00
Michael Tuexen	3657c405e3	Fix a typo introduced in http://svn.freebsd.org/changeset/base/225571 Reported by Ilya A. Arkhipov. Approved by: re MFC after: 1 month.	2011-09-15 12:20:52 +00:00
Michael Tuexen	92776dfd5a	Make sure that SCTP rejects broadcast, multicast and wildcard addresses as remote addresses. Approved by: re MFC after: 1 month.	2011-09-15 08:49:54 +00:00
Michael Tuexen	c55b70cef6	Ensure that 1-to-1 style SCTP sockets can only be connected once. Allow implicit setup also for 1-to-1 style sockets as described in the latest version of the socket API ID. Approved by: re MFC after: 1 month	2011-09-14 19:10:13 +00:00
Michael Tuexen	58bdb69150	Fix the handling of the flowlabel and DSCP value in the SCTP_PEER_ADDR_PARAMS socket option. Honor the net.inet6.ip6.auto_flowlabel sysctl setting. Approved by: re (bz) MFC after: 1 month.	2011-09-14 08:15:21 +00:00
John Baldwin	5bb3652f05	Allow the ipfw.ko module built with a kernel to honor any IPFIREWALL_* options defined in the kernel config. This more closely matches the behavior of other modules which inherit configuration settings from the kernel configuration during a kernel + modules build. Reviewed by: luigi Approved by: re (kib) MFC after: 1 week	2011-09-12 21:09:56 +00:00
Michael Tuexen	e4f820b3c6	Improve implementation of the Nagle algorithm for SCTP: Don't delay the final fragment of a fragmented user message. Approved by: re MFC after: 4 weeks	2011-09-09 13:52:37 +00:00
Qing Li	1184509858	When an interface address route is removed from the system, another route with the same prefix is searched for as a replacement. The current code did not bypass routes that have non-operational interfaces. This patch fixes that bug and will find a replacement route with an active interface. PR: kern/159603 Submitted by: pluknet, ambrisko at ambrisko dot com Reviewed by: discussed on net@ Approved by: re (bz) MFC after: 3 days	2011-08-28 00:14:40 +00:00
Bjoern A. Zeeb	b233773bb9	Increase the defaults for the maximum socket buffer limit, and the maximum TCP send and receive buffer limits from 256kB to 2MB. For sb_max_adj we need to add the cast as already used in the sysctl handler to not overflow the type doing the maths. Note that these are just the defaults. They will allow more memory to be consumed per socket/connection if needed but not change the default "idle" memory consumption. All values are still tunable by sysctls. Suggested by: gnn Discussed on: arch (Mar and Aug 2011) MFC after: 3 weeks Approved by: re (kib)	2011-08-25 09:20:13 +00:00
Bjoern A. Zeeb	6f69742441	Fix compilation in case of defined(INET) && defined(IPFIREWALL_FORWARD) but no INET6. Reported by: avg Tested by: avg MFC after: 4 weeks X-MFC with: r225044 Approved by: re (kib)	2011-08-20 18:45:38 +00:00
Bjoern A. Zeeb	8a006adb24	Add support for IPv6 to ipfw fwd: Distinguish IPv4 and IPv6 addresses and optional port numbers in user space to set the option for the correct protocol family. Add support in the kernel for carrying the new IPv6 destination address and port. Add support to TCP and UDP for IPv6 and fix UDP IPv4 to not change the address in the IP header. Add support for IPv6 forwarding to a non-local destination. Add a regression test utilizing VIMAGE to check all 20 possible combinations I could think of. Obtained from: David Dolson at Sandvine Incorporated (original version for ipfw fwd IPv6 support) Sponsored by: Sandvine Incorporated PR: bin/117214 MFC after: 4 weeks Approved by: re (kib)	2011-08-20 17:05:11 +00:00
Bjoern A. Zeeb	f76fdd221b	Hide IPv6 next header parsing warnings under the verbose sysctl so people can possibly disable them when their consoles are flooded, or enable them for debugging. MFC after: 2 weeks Approved by: re (kib)	2011-08-20 14:20:36 +00:00
Bjoern A. Zeeb	0c4dbd5af7	After r225032 fix logging in a similar way, masking the IPv6 more fragments flag off so that offset == 0 checks work properly. PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks X-MFC with: r225032 Approved by: re (kib)	2011-08-20 13:47:08 +00:00
Bjoern A. Zeeb	49239b28da	If we detect an IPv6 fragment header and it is not the first fragment, then terminate the loop as we will not find any further headers and for short fragments this could otherwise lead to a pullup error discarding the fragment. PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks Approved by: re (kib)	2011-08-20 13:46:19 +00:00
Bjoern A. Zeeb	720fee0674	ipfw internally checks for offset == 0 to determine whether the packet is a/the first fragment or not. For IPv6 we have added the "more fragments" flag as well to be able to determine whether more fragments will follow, as we do not have the fragment header available for logging, while for IPv4 this information can be derived directly from the IPv4 header. This allowed fragmented packets to bypass normal rules as proper masking was not done when checking offset. Split variables to not need masking for IPv6 to avoid further errors. PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks Approved by: re (kib)	2011-08-20 13:17:47 +00:00
Bjoern A. Zeeb	391255b8a4	While not explicitly allowed by RFC 2460, in case there is no translation technology involved (and that section is suggested to be removed by Errata 2843), single packet fragments do no harm. There is another erratum under discussion to clarify and allow this. Meanwhile add a sysctl to allow disabling this behaviour again. We will treat a single packet fragment (a fragment header added when not needed) as if there was no fragment header. PR: kern/145733 Submitted by: Matthew Luckie (mjl luckie.org.nz) (original version) Tested by: Matthew Luckie (mjl luckie.org.nz) MFC after: 2 weeks Approved by: re (kib)	2011-08-20 12:40:17 +00:00
Michael Tuexen	3900c0936f	Fix the handling of [gs]etsockopt() for unconnected 1-to-1 style sockets. While there: * Fix a locking issue in setsockopt() of SCTP_CMT_ON_OFF. * Fix a bug in setsockopt() of SCTP_DEFAULT_PRINFO, where the pr_value was ignored. Approved by: re@ MFC after: 2 months.	2011-08-16 21:04:18 +00:00
Michael Tuexen	b10f2dc889	Add support for the spp_dscp field in the SCTP_PEER_ADDR_PARAMS socket option. Backwards compatibility is provided by still supporting the spp_ipv4_tos field. Approved by: re@ MFC after: 2 months.	2011-08-14 20:55:32 +00:00
Kevin Lo	7236660627	If the RTF_HOST flag is specified, then we are interested in the destination address. PR: kern/159600 Submitted by: Svatopluk Kraus <onwahe at gmail dot com> Approved by: re (hrs)	2011-08-10 06:17:06 +00:00
Michael Tuexen	ca85e9482a	The result of a joint work between rrs@ and myself at the IETF: * Decouple the path supervision using a separate HB timer per path. * Add support for potentially failed state. * Bring back RTO.min to 1 second. * Accept packets on IP-addresses already announced via an ASCONF * While there: do some cleanups. Approved by: re@ MFC after: 2 months.	2011-08-03 20:21:00 +00:00
Gleb Smirnoff	217e3abc03	Add missing break; in r223593. Submitted by: sem Pointy hat to: glebius Approved by: re (kib)	2011-08-01 13:41:38 +00:00
Bjoern A. Zeeb	d9a362862c	Add spares to the network stack for FreeBSD-9: - TCP keep* timers - TCP UTO (adjust from what was there already) - netmap - route caching - user cookie (temporary to allow for the real fix) Slightly re-shuffle struct ifnet moving fields out of the middle of spares and to better align. Discussed with: rwatson (slightly earlier version)	2011-07-17 21:15:20 +00:00
Bjoern A. Zeeb	dceced71fb	Unbreak no-INET kernels after r223839 adding the needed #ifdef INET. MFC after: 4 weeks	2011-07-14 13:44:48 +00:00
Michael Tuexen	1a3b5ce2b9	Don't check for SOCK_DGRAM anymore. Also remove multicast related code which is not necessary anymore.	2011-07-12 20:14:03 +00:00
Michael Tuexen	78d9a31d3a	The socket API only specifies SCTP for SOCK_SEQPACKET and SOCK_STREAM, but not SOCK_DGRAM. So don't register it for SOCK_DGRAM. While there, fix some indentation.	2011-07-12 19:29:29 +00:00
Marko Zec	13e255fab7	Permit ARP to proceed for IPv4 host routes for which the gateway is the same as the host address. This already works fine for INET6 and ND6. While here, remove two function pointers from struct lltable which are only initialized but never used. MFC after: 3 days	2011-07-08 09:38:33 +00:00
Andrey V. Elsukov	4659e09dcb	Re-add the check for log_arp_permanent_modify that was accidentally removed in r186119. PR: kern/154831 MFC after: 1 week	2011-07-07 11:59:51 +00:00
Andre Oppermann	1c6e7fa7f1	Remove the TCP_SORECEIVE_STREAM compile time option. The use of soreceive_stream() for TCP still has to be enabled with the loader tuneable net.inet.tcp.soreceive_stream. Suggested by: trociny and others	2011-07-07 10:37:14 +00:00
Colin Percival	472ea5befb	Remove #ifdef notyet code dating back to 4.3BSD Net/2 (and possibly earlier). I think the benefit of making the code cleaner and easier to understand outweighs the humour of leaving this intact (or possibly changing it to #ifdef not_yet_and_probably_never). MFC after: 2 weeks	2011-07-05 18:49:55 +00:00
Colin Percival	ca7122622b	Don't allow lro->len to exceed 65535, as this will result in overflow when len is inserted back into the synthetic IP packet and cause a multiple of 2^16 bytes of TCP "packet loss". This improves Linux->FreeBSD netperf bandwidth by a factor of 300 in testing on Amazon EC2. Reviewed by: jfv MFC after: 2 weeks	2011-07-05 18:43:54 +00:00
Glen Barber	ff19f85d50	- General grammar and mdoc(7) fixes. [1] [2] - While here, remove a paragraph about userspace operation that has been outdated for some time. [2] PR: 158623 Submitted by: Ben Kudak (kaduk % mit!edu) [1] Reviewed by: glebius [2] MFC after: 1 week	2011-07-04 23:00:26 +00:00
Ermal Luçi	e6c90582c7	pf(4) tags now store the state key, but tcp_respond tries to reuse an mbuf as an optimization. This makes pf find the wrong state and causes errors reported as state mismatches. Clear the cached state link on the pf(4) tag to avoid the state mismatches. Approved by: bz	2011-07-04 17:43:04 +00:00
Andrey V. Elsukov	2303570fe8	ARP code reuses the mbuf from an ARP request to make a reply, but it does not reset rcvif to NULL. Since rcvif is not NULL, ipfw(4) assumes that ARP replies were received on the specified interface. Reset rcvif to NULL for ARP replies to fix this issue. PR: kern/131817 Reviewed by: glebius MFC after: 1 month	2011-07-04 05:47:48 +00:00
Michael Tuexen	b845acda75	Add the missing sca_keylength field to the sctp_authkey structure, which is used by the SCTP_AUTH_KEY socket option. MFC after: 1 month.	2011-06-30 16:56:55 +00:00
Andrey V. Elsukov	9527ec6e52	Add new rule actions "call" and "return" to ipfw. They make it possible to organize subroutines with rules. The "call" action saves the current rule number in the internal stack and rules processing continues from the first rule with the specified number (similar to the skipto action). If later a rule with the "return" action is encountered, processing returns to the first rule whose number is that of the saved "call" rule plus one or higher. Submitted by: Vadim Goncharov Discussed by: ipfw@, luigi@	2011-06-29 10:06:58 +00:00
Bjoern A. Zeeb	e0bfbfce79	Update packet filter (pf) code to OpenBSD 4.5. You need to update userland (world and ports) tools to be in sync with the kernel. Submitted by: mlaier Submitted by: eri	2011-06-28 11:57:25 +00:00
Michael Tuexen	3c4401ecab	Add support for SCTP_PR_SCTP_NONE which I missed adding. This constant is defined in the socket API ID. MFC after: 2 months.	2011-06-27 22:03:33 +00:00
Gleb Smirnoff	812f1d32e7	Add possibility to pass IPv6 packets to a divert(4) socket. Submitted by: sem	2011-06-27 12:21:11 +00:00
Andrey V. Elsukov	0511675327	Export AddLink() function from libalias. It can be used when custom alias address needs to be specified. Add inbound handler to the alias_ftp module. It helps handle active FTP transfer mode for the case with external clients and FTP server behind NAT. Fix passive FTP transfer case for server behind NAT using redirect with external IP address different from NAT ip address. PR: kern/157957 Submitted by: Alexander V. Chernikov	2011-06-22 20:00:27 +00:00
Andrey V. Elsukov	62b6e03adf	Document PKT_ALIAS_SKIP_GLOBAL option. Submitted by: Alexander V. Chernikov	2011-06-22 09:55:28 +00:00
Andrey V. Elsukov	bb3dd40974	Do not use SET_HOST_IPLEN() macro for IPv6 packets. PR: kern/157239 MFC after: 2 weeks	2011-06-21 06:06:47 +00:00
Bjoern A. Zeeb	75497cc5eb	Fix a KASSERT from r212803 to check the correct length also in case of IPsec being compiled in and used. Improve reporting by adding the length fields to the panic message, so that we would have some immediate debugging hints. Discussed with: jhb	2011-06-20 07:07:18 +00:00
Bjoern A. Zeeb	f404863979	Remove a comment, incorrect these days, left over from before new-arp. MFC after: 1 week	2011-06-18 13:54:36 +00:00
Michael Tuexen	6037f89c81	Add SCTP_DEFAULT_PRINFO socket option. Fix the SCTP_DEFAULT_SNDINFO socket option: Don't clear the PR SCTP policy when setting sinfo_flags. MFC after: 1 month.	2011-06-16 21:12:36 +00:00
Michael Tuexen	0b064106dd	* Fix the handling of addresses in sctp_sendv(). * Add support for SCTP_SENDV_NOINFO. * Improve the error handling of sctp_sendv() and sctp_recv(). MFC after: 1 month	2011-06-16 15:36:09 +00:00
Michael Tuexen	e2e7c62edc	Add support for the newly added SCTP API. In particular add support for: * SCTP_SNDINFO, SCTP_PRINFO, SCTP_AUTHINFO, SCTP_DSTADDRV4, and SCTP_DSTADDRV6 cmsgs. * SCTP_NXTINFO and SCTP_RCVINFO cmsgs. * SCTP_EVENT, SCTP_RECVRCVINFO, SCTP_RECVNXTINFO and SCTP_DEFAULT_SNDINFO socket options. * Special association ids (SCTP_FUTURE_ASSOC, ...) * sctp_recvv() and sctp_sendv() functions. MFC after: 1 month.	2011-06-15 23:50:27 +00:00
Andrey V. Elsukov	1875bbfe54	Implement "global" mode for ipfw nat. It is similar to natd(8) "globalport" option for multiple NAT instances. If ipfw rule contains "global" keyword instead of nat_number, then for each outgoing packet ipfw_nat looks up translation state in all configured nat instances. If an entry is found, the packet is aliased according to that entry; otherwise the packet is passed unchanged. User can specify "skip_global" option in NAT configuration to exclude an instance from the lookup in global mode. PR: kern/157867 Submitted by: Alexander V. Chernikov (previous version) Tested by: Eugene Grosbein	2011-06-14 13:35:24 +00:00
Andrey V. Elsukov	81a654646e	Sort alias mode flags in the increasing order.	2011-06-14 12:06:38 +00:00
Andrey V. Elsukov	3265f69ce6	Add IPv6 support to the ipfw uid/gid check. Pass an ip_fw_args structure to the check_uidgid() function, since it contains all needed arguments and also a pointer to the mbuf, so now it is possible to use the in_pcblookup_mbuf() function. Since I cannot test it for the non-FreeBSD case, I keep this ifdef unchanged. Tested by: Alexander V. Chernikov MFC after: 3 weeks	2011-06-14 07:20:16 +00:00
John Baldwin	6b7c15e580	Advance the advertised window (rcv_adv) to the currently received data (rcv_nxt) if we are advertising a zero window. This can be true when ACK'ing a window probe whose one byte payload was accepted rather than dropped because the socket's receive buffer was not completely full, but the remaining space was smaller than the window scale. This ensures that window probe ACKs satisfy the assumption made in r221346 and closes a window where rcv_nxt could be greater than rcv_adv. Tested by: trasz, pho, trociny Reviewed by: silby MFC after: 1 week	2011-06-13 15:38:31 +00:00
Bjoern A. Zeeb	ffe8cd7b10	Correct comments and debug logging in ipsec to better match reality. MFC after: 3 days	2011-06-08 03:02:11 +00:00
Andrey V. Elsukov	56e38090a4	Fix indentation.	2011-06-07 06:57:22 +00:00
Andrey V. Elsukov	bd853db48c	Make the behaviour of the libalias-based in-kernel NAT a bit closer to how natd(8) works. natd(8) drops packets only when libalias returns PKT_ALIAS_IGNORED and the "deny_incoming" option is set, but ipfw_nat always dropped packets that were not aliased, even if they should not be aliased and are just passing through. PR: kern/122109, kern/129093, kern/157379 Submitted by: Alexander V. Chernikov (previous version) MFC after: 1 month	2011-06-07 06:42:29 +00:00
Bjoern A. Zeeb	1417604e70	Unbreak kernels with non-default PCBGROUP included but no WITNESS. Rather than including lock.h in in_pcbgroup.c in right order, fix it for all consumers of in_pcb.h by further header file pollution under #ifdef KERNEL. Reported by: Pan Tsu (inyaoo gmail.com)	2011-06-06 21:45:32 +00:00
Robert Watson	52cd27cb58	Implement a CPU-affine TCP and UDP connection lookup data structure, struct inpcbgroup. pcbgroups, or "connection groups", supplement the existing inpcbinfo connection hash table, which, when pcbgroups are enabled, might now be thought of more usefully as a per-protocol 4-tuple reservation table. Connections are assigned to connection groups based on a hash of their 4-tuple; wildcard sockets require special handling, and are members of all connection groups. During a connection lookup, a per-connection group lock is employed rather than the global pcbinfo lock. By aligning connection groups with input path processing, connection groups take on an effective CPU affinity, especially when aligned with RSS work placement (see a forthcoming commit for details). This eliminates cache line migration associated with global, protocol-layer data structures in steady-state TCP and UDP processing (with the exception of protocol-layer statistics; further commit to follow). Elements of this approach were inspired by Willman, Rixner, and Cox's 2006 USENIX paper, "An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems". However, there are also significant differences: we maintain the inpcb lock, rather than using the connection group lock for per-connection state. Likewise, the focus of this implementation is alignment with NIC packet distribution strategies such as RSS, rather than pure software strategies. Despite that focus, software distribution is supported through the parallel netisr implementation, and works well in configurations where the number of hardware threads is greater than the number of NIC input queues, such as in the RMI XLR threaded MIPS architecture. Another important difference is the continued maintenance of existing hash tables as "reservation tables" -- these are useful both to distinguish the resource allocation aspect of protocol name management and the more common-case lookup aspect. In configurations where connection tables are aligned with hardware hashes, it is desirable to use the traditional lookup tables for loopback or encapsulated traffic rather than take the expense of hardware hashes that are hard to implement efficiently in software (such as RSS Toeplitz). Connection group support is enabled by compiling "options PCBGROUP" into your kernel configuration; for the time being, this is an experimental feature, and hence is not enabled by default. Subject to the limited MFCability of change dependencies in inpcb, and its change to the inpcbinfo init function signature, this change in principle could be merged to FreeBSD 8.x. Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-06-06 12:55:02 +00:00
Andrey V. Elsukov	1e587bfa32	Do not return EINVAL when user does `ipfw set N flush` on an empty set. MFC after: 2 weeks	2011-06-06 10:39:38 +00:00
Hiroki Sato	db82af41db	- Implement RDNSS and DNSSL options (RFC 6106, IPv6 Router Advertisement Options for DNS Configuration) into rtadvd(8) and rtsold(8). DNS information received by rtsold(8) will go to resolv.conf(5) by resolvconf(8) script. This is based on work by J.R. Oldroyd (kern/156259) but revised extensively[1]. - rtadvd(8) now supports "noifprefix" to disable gathering on-link prefixes from interfaces when no "addr" is specified[2]. An entry in rtadvd.conf with "noifprefix" + no "addr" generates an RA message with no prefix information option. - rtadvd(8) now supports RTM_IFANNOUNCE message to fix crashes when an interface is added or removed. - Correct bogus ND_OPT_ROUTE_INFO value to one in RFC 4191. Reviewed by: bz[1] PR: kern/156259 [1] PR: bin/152458 [2]	2011-06-06 03:06:43 +00:00
Robert Watson	d3c1f00350	Add _mbuf() variants of various inpcb-related interfaces, including lookup, hash install, etc. For now, these arguments are unused, but as we add RSS support, we will want to use hashes extracted from mbufs, rather than manually calculated hashes of header fields, due to the expense of the software version of Toeplitz (and similar hashes). Add notes that it would be nice to be able to pass mbufs into lookup routines in pf(4), optimising firewall lookup in the same way, but the code structure there doesn't facilitate that currently. (In principle there is no reason this couldn't be MFCed -- the change extends rather than modifies the KBI. However, it won't be useful without other previous possibly less MFCable changes.) Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-06-04 16:33:06 +00:00
Robert Watson	711b3dbd54	IP divert sockets use their inpcbinfo for port reservation, although not for lookup. I missed its call to in_pcbbind() when preparing previous patches, which would lead to a lock assertion failure (although probably not an actual race condition due to global pcbinfo locks providing required synchronisation -- in this particular case only). This change adds the missing locking of the pcbhash lock. (Existing comments in the ipdivert code question the need for using the global hash to manage the namespace, as really it's a simple port namespace and not an address/port namespace. Also, although in_pcbbind is used to manage reservations, the hash tables aren't used for lookup. It might be a good idea to make them use hashed lookup, or to use a different reservation scheme.) Reviewed by: bz Reported by: Kristof Provost <kristof at sigsegv.be> Sponsored by: Juniper Networks	2011-06-04 16:26:02 +00:00
Robert Watson	b598155a85	Do not leak the pcbinfohash lock in the case where in6_pcbladdr() returns an error during TCP connect(2) on an IPv6 socket. Submitted by: bz Sponsored by: Juniper Networks, Inc.	2011-06-02 10:21:05 +00:00
Andrey V. Elsukov	281d42c371	O_FORWARD_IP is the only action which depends on the result of lookup of dynamic rules. We are doing forwarding in the following cases: o For the simple ipfw fwd rule, e.g. fwd 10.0.0.1 ip from any to any out xmit em0 fwd 127.0.0.1,3128 tcp from any to any 80 in recv em1 o For the dynamic fwd rule, e.g. fwd 192.168.0.1 tcp from any to 10.0.0.3 3333 setup keep-state When this rule triggers it creates a dynamic rule, but this dynamic rule should forward packets only in the forward direction. o And the last case, which did not work before - a simple fwd rule which triggers when some dynamic rule has already been executed. PR: kern/147720, kern/150798 MFC after: 1 month	2011-06-01 19:44:52 +00:00
Andrey V. Elsukov	88eb7833cb	Hide some debug messages under debug macro. MFC after: 1 week	2011-06-01 12:33:05 +00:00
Andrey V. Elsukov	e35a05d3e7	Hide useless warning under debug macro. PR: kern/69963 MFC after: 1 week	2011-06-01 12:05:35 +00:00
Bjoern A. Zeeb	d2025bd0f6	Unbreak NOINET kernels after r222488. Reviewed by: rwatson Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems! Pointy hat: to myself for missing this during review?	2011-05-30 18:07:35 +00:00
Robert Watson	fa046d8774	Decompose the current single inpcbinfo lock into two locks: - The existing ipi_lock continues to protect the global inpcb list and inpcb counter. This lock is now relegated to a small number of allocation and free operations, and occasional operations that walk all connections (including, awkwardly, certain UDP multicast receive operations -- something to revisit). - A new ipi_hash_lock protects the two inpcbinfo hash tables for looking up connections and bound sockets, manipulated using new INP_HASH_*() macros. This lock, combined with inpcb locks, protects the 4-tuple address space. Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb connection locks, so may be acquired while manipulating a connection on which a lock is already held, avoiding the need to acquire the inpcbinfo lock preemptively when a binding change might later be required. As a result, however, lookup operations necessarily go through a reference acquire while holding the lookup lock, later acquiring an inpcb lock -- if required. A new function in_pcblookup() looks up connections, and accepts flags indicating how to return the inpcb. Due to lock order changes, callers no longer need to acquire locks before performing a lookup: the lookup routine will acquire the ipi_hash_lock as needed. In the future, it will also be able to use alternative lookup and locking strategies transparently to callers, such as pcbgroup lookup. New lookup flags are, supplementing the existing INPLOOKUP_WILDCARD flag: INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb; INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb. Callers must pass exactly one of these flags (for the time being). Some notes: - All protocols are updated to work within the new regime; especially, TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely eliminated, and global hash lock hold times are dramatically reduced compared to previous locking. - The TCP syncache still relies on the pcbinfo lock, something that we may want to revisit. - Support for reverting to the FreeBSD 7.x locking strategy in TCP input is no longer available -- hash lookup locks are now held only very briefly during inpcb lookup, rather than for potentially extended periods. However, the pcbinfo ipi_lock will still be acquired if a connection state might change such that a connection is added or removed. - Raw IP sockets continue to use the pcbinfo ipi_lock for protection, due to maintaining their own hash tables. - The interface in6_pcblookup_hash_locked() is maintained, which allows callers to acquire hash locks and perform one or more lookups atomically with 4-tuple allocation: this is required only for TCPv6, as there is no in6_pcbconnect_setup(), which there should be. - UDPv6 locking remains significantly more conservative than UDPv4 locking, which relates to source address selection. This needs attention, as it likely significantly reduces parallelism in this code for multithreaded socket use (such as in BIND). - In the UDPv4 and UDPv6 multicast cases, we need to revisit locking somewhat, as they relied on ipi_lock to stabilise 4-tuple matches, which is no longer sufficient. A second check once the inpcb lock is held should do the trick, keeping the general case from requiring the inpcb lock for every inpcb visited. - This work reminds us that we need to revisit locking of the v4/v6 flags, which may be accessed lock-free both before and after this change. - Right now, a single lock name is used for the pcbhash lock -- this is undesirable, and probably another argument is required to take care of this (or a char array name field in the pcbinfo?). This is not an MFC candidate for 8.x due to its impact on lookup and locking semantics. It's possible some of these issues could be worked around with compatibility wrappers, if necessary. Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-30 09:43:55 +00:00
Andrey V. Elsukov	d832ded1a1	Wrap long line. MFC after: 2 weeks	2011-05-30 05:53:00 +00:00
Andrey V. Elsukov	41b6083752	Add tablearg support for ipfw setfib. PR: kern/156410 MFC after: 2 weeks	2011-05-30 05:37:26 +00:00
Michael Tuexen	14cfa970bf	Get rid of unused functions. MFC after: 1 week.	2011-05-29 18:41:06 +00:00
Qing Li	92322284cd	Supply the LLE_STATIC flag bit to in_ifscrub() when scrubbing an interface address so that proper clean up will take place in the routing code. This patch fixes the bootp panic on startup problem. Also, added more error handling and logging code in function in_scrubprefix(). MFC after: 5 days	2011-05-29 02:21:35 +00:00
Bjoern A. Zeeb	8d5a3ca77b	Add FEATURE() definitions for IPv4 and IPv6 so that we can use feature_present(3) to dynamically decide whether to use one or the other family. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 10 days	2011-05-25 00:34:25 +00:00
Robert Watson	61401ec2de	An inpcb lock is no longer required in in_pcbref() since the move to refcount(9). MFC after: 3 weeks Sponsored by: Juniper Networks, Inc.	2011-05-24 13:08:59 +00:00
Robert Watson	79bdc6e5d3	Continue to refine inpcb reference counting and locking, in preparation for reworking of inpcbinfo locking: (1) Convert inpcb reference counting from manually manipulated integers to the refcount(9) KPI. This allows the refcount to be managed atomically with an inpcb read lock rather than write lock, or even with no inpcb lock at all. As a result, in_pcbref() also no longer requires an inpcb lock, so can be performed solely using the lock used to look up an inpcb. (2) Shift more inpcb freeing activity from the in_pcbrele() context (via in_pcbfree_internal) to the explicit in_pcbfree() context. This means that the inpcb refcount is increasingly used only to maintain memory stability, not actually defer the clean up of inpcb protocol parts. This is desirable as many of those protocol parts required the pcbinfo lock, which we'd like not to acquire in in_pcbrele() contexts. Document this in comments better. (3) Introduce new read-locked and write-locked in_pcbrele() variations, in_pcbrele_rlocked() and in_pcbrele_wlocked(), which allow the inpcb to be properly unlocked as needed. in_pcbrele() is a wrapper around the latter, and should probably go away at some point. This makes it easier to use this weak reference model when holding only a read lock, as will happen in the future. This may well be safe to MFC, but some more KBI analysis is required. Reviewed by: bz MFC after: 3 weeks Sponsored by: Juniper Networks, Inc.	2011-05-23 19:32:02 +00:00
Robert Watson	68e0d7e06a	Move from passing a wildcard boolean to a general set of lookup flags into in_pcb_lport(), in_pcblookup_local(), and in_pcblookup_hash(), and similarly for IPv6 functions. In the future, we would like to support other flags relating to locking strategy. This change doesn't appear to modify the KBI in practice, as callers already passed in INPLOOKUP_WILDCARD rather than a simple boolean. MFC after: 3 weeks Reviewed by: bz Sponsored by: Juniper Networks, Inc.	2011-05-23 15:23:18 +00:00
Robert Watson	82a5be494a	A number of quite incremental refinements to struct inpcbinfo's definition: (1) Add a locking guide for inpcbinfo. (2) Annotate inpcbinfo fields with synchronisation information; not all annotations are 100% satisfactory. (3) Reorder inpcbinfo fields so that the lock is at the head of the structure, and close to fields it protects. (4) Sort fields that will eventually be hashlock/pcbgroup-related together even though they remain locked by ipi_lock for now. Reviewed by: bz Sponsored by: Juniper Networks X-MFC after: KBI analysis required	2011-05-23 13:51:57 +00:00
Qing Li	5b84dc789a	The statically configured (permanent) ARP entries are removed when an interface is brought down, even though the interface address is still valid. This patch maintains the permanent ARP entries as long as the interface address (having the same prefix as that of the ARP entries) is valid. Reviewed by: delphij MFC after: 5 days	2011-05-20 19:12:20 +00:00
Michael Tuexen	b7e08865e8	Unbreak INET-less build. Reported by bz@ MFC after: 1 week	2011-05-18 19:49:39 +00:00
Michael Tuexen	4f36da915f	Copy out the mtu when calling getsockopt() with SCTP_GET_PEER_ADDR_INFO. MFC after: 1 week.	2011-05-17 15:57:31 +00:00
Michael Tuexen	c954cac48b	Fix whitespacing. Reported by scf@ MFC after: 1 week.	2011-05-17 15:46:28 +00:00
Michael Tuexen	96f4bcfff2	Fix the source address selection for boundall sockets when sending INITs to a global IPv4 address while having only a private IPv4 address. Allow the usage of a private address and make sure that no other private address will be used by the association. Initial work was done by rrs@. MFC after: 1 week.	2011-05-14 18:22:14 +00:00
John Baldwin	5891ebd6cd	Oops, fix order of sequence numbers in KASSERT()'s to catch negative receive windows to match the labels in the panic message. Submitted by: trociny	2011-05-14 14:41:40 +00:00
Alexander Motin	bc7d18ae72	Refactor TCP ISN increment logic. Instead of firing callout at 100Hz to keep constant ISN growth rate, do the same directly inside tcp_new_isn(), taking into account how much time (ticks) passed since the last call. On my test systems this decreases idle interrupt rate from 140Hz to 70Hz.	2011-05-09 07:37:47 +00:00
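A rough sketch of the idea in the ISN refactor above. Instead of a 100 Hz callout bumping the ISN offset, each call credits the offset with the growth owed for the ticks elapsed since the previous call. The rate ISN_BYTES_PER_SECOND, the tick rate HZ and the IsnOffset class are assumptions for illustration. Any per-connection hashing or randomisation that tcp_new_isn() performs is deliberately left out.

```python
ISN_BYTES_PER_SECOND = 1048576  # assumed target growth rate of the ISN offset
HZ = 1000                       # assumed kernel tick rate (ticks per second)
MASK32 = 0xFFFFFFFF             # sequence numbers are 32 bits wide


class IsnOffset:
    """Advance an ISN offset lazily, based on ticks elapsed between calls."""

    def __init__(self, now_ticks: int) -> None:
        self.offset = 0
        self.last_ticks = now_ticks

    def new_isn(self, now_ticks: int) -> int:
        elapsed = now_ticks - self.last_ticks
        if elapsed > 0:
            # Credit all growth owed since the last call in one step,
            # rather than relying on a periodic callout to add it.
            step = (ISN_BYTES_PER_SECOND // HZ) * elapsed
            self.offset = (self.offset + step) & MASK32
            self.last_ticks = now_ticks
        return self.offset


isn = IsnOffset(now_ticks=0)
print(isn.new_isn(0))     # 0
print(isn.new_isn(1000))  # 1048000 (1048 bytes per tick * 1000 ticks)
```

Because nothing runs between calls, an idle system takes no timer interrupts for ISN maintenance. This is the effect the commit reports as the idle interrupt rate dropping.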
Michael Tuexen	689e6a5fa3	Fix a locking issue showing up on Mac OS X when subscribing to authentication events. DTLS/SCTP renegotiations trigger the bug. MFC after: 2 weeks.	2011-05-08 09:11:59 +00:00
Michael Tuexen	936fc35bb3	Change the name of an internal structure, since the name is used by a structure of the (new) SCTP API. MFC after: 1 week.	2011-05-06 20:40:33 +00:00
Andrey V. Elsukov	318b735cc3	Convert delay parameter back to ms when reporting to user. PR: 156838 MFC after: 1 week	2011-05-06 07:13:34 +00:00
Michael Tuexen	c3d72c80d3	Implement Resource Pooling V2 and an MPTCP like congestion control. Based on a patch received from Martin Becke. MFC after: 2 weeks.	2011-05-04 21:27:05 +00:00
Michael Tuexen	274b0bd51d	Remove code without any effect.	2011-05-03 20:34:02 +00:00
Michael Tuexen	1d663b4658	Add a missing break. This bug was introduced in r221249. MFC after: 1 week	2011-05-03 20:32:21 +00:00
John Baldwin	f701e30d7f	Handle a rare edge case with nearly full TCP receive buffers. If a TCP buffer fills up causing the remote sender to enter into persist mode, but there is still room available in the receive buffer when a window probe arrives (either due to window scaling, or due to the local application very slowly draining data from the receive buffer), then the single byte of data in the window probe is accepted. However, this can cause rcv_nxt to be greater than rcv_adv. This condition will only last until the next ACK packet is pushed out via tcp_output(), and since the previous ACK advertised a zero window, the ACK should be pushed out while the TCP pcb is write-locked. During the window while rcv_nxt is greater than rcv_adv, a few places would compute the remaining receive window via rcv_adv - rcv_nxt. However, this value was then (uint32_t)-1. On a 64 bit machine this could expand to a positive 2^32 - 1 when cast to a long. In particular, when calculating the receive window in tcp_output(), the result would be that the receive window was computed as 2^32 - 1 resulting in advertising a far larger window to the remote peer than actually existed. Fix various places that compute the remaining receive window to either assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the window as full if rcv_nxt is greater than rcv_adv. Reviewed by: bz MFC after: 1 month	2011-05-02 21:05:52 +00:00
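The arithmetic behind the receive-window bug, and the "treat the window as full" variant of the fix, as a small Python model. Python integers do not wrap, so the 32-bit unsigned subtraction is emulated with a mask. The function names are hypothetical. seq_gt() only mirrors the idea of the kernel's SEQ_GT() comparison.

```python
MASK32 = 0xFFFFFFFF


def seq_gt(a: int, b: int) -> bool:
    """a > b in 32-bit TCP sequence space, in the spirit of SEQ_GT()."""
    diff = (a - b) & MASK32
    return diff != 0 and diff < 0x80000000


def recwin_buggy(rcv_adv: int, rcv_nxt: int) -> int:
    # Unsigned 32-bit subtraction widened to a 64-bit long: when
    # rcv_nxt == rcv_adv + 1 this yields 2**32 - 1, not -1.
    return (rcv_adv - rcv_nxt) & MASK32


def recwin_fixed(rcv_adv: int, rcv_nxt: int) -> int:
    # If a window probe pushed rcv_nxt past rcv_adv, treat the window as full.
    if seq_gt(rcv_nxt, rcv_adv):
        return 0
    return (rcv_adv - rcv_nxt) & MASK32


print(recwin_buggy(1000, 1001))  # 4294967295
print(recwin_fixed(1000, 1001))  # 0
```

Because the comparison is done modulo 2^32, the fixed version still gives the right answer when sequence numbers wrap. For example, recwin_fixed(5, 0xFFFFFFFF) is 6.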
Michael Tuexen	ea5eba1157	Some more cleanups related to a kernel without INET. MFC after: 1 week	2011-05-02 15:53:00 +00:00
Bjoern A. Zeeb	29bd2010d4	Fix a mismerge from p4 in that in_localaddr() is not available without INET. Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-30 16:30:18 +00:00
Michael Tuexen	d085528d04	Remove some leftover debug code. MFC after: 1 week	2011-04-30 11:22:30 +00:00
Bjoern A. Zeeb	b287c6c70c	Make the TCP code compile without INET. Sort #includes and add #ifdef INETs. Add some comments at #endifs given more nestedness. To make the compiler happy, some default initializations were added in accordance with the style on the files. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-30 11:21:29 +00:00
Michael Tuexen	e6194c2ed4	Improve compilation of SCTP code without INET support. Some bugs were fixed while doing this: * ASCONF-ACK messages might use wrong port number when using IPv6. * Checking for additional addresses takes the correct address into account and also does not do more comparisons than necessary. This patch is based on one received from bz@ who was sponsored by The FreeBSD Foundation and iXsystems. MFC after: 1 week	2011-04-30 11:18:16 +00:00
Bjoern A. Zeeb	79288c112c	Make the UDP code compile without INET. Expose udp_usrreq.c to IPv6 only as well compiling out most functions adding or extending #ifdef INET coverage. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-30 11:17:00 +00:00
Bjoern A. Zeeb	67107f4594	Make the PCB code compile without INET support by adding #ifdef INETs and correcting few #includes. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-30 11:04:34 +00:00
John Baldwin	672dc4aea2	TCP reuses t_rxtshift to determine the backoff timer used for both the persist state and the retransmit timer. However, the code that implements "bad retransmit recovery" only checks t_rxtshift to see if an ACK has been received during the first retransmit timeout window. As a result, if ticks has wrapped over to a negative value and a socket is in the persist state, it can incorrectly treat an ACK from the remote peer as a "bad retransmit recovery" and restore saved values such as snd_ssthresh and snd_cwnd. However, if the socket has never had a retransmit timeout, then these saved values will be zero, so snd_ssthresh and snd_cwnd will be set to 0. If the socket is in fast recovery (this can be caused by excessive duplicate ACKs such as those fixed by 220794), then each ACK that arrives triggers either NewReno or SACK partial ACK handling which clamps snd_cwnd to be no larger than snd_ssthresh. In effect, the socket's send window is permanently stuck at 0 even though the remote peer is advertising a much larger window and pending data is only sent via TCP window probes (so one byte every few seconds). Fix this by adding a new TCP pcb flag (TF_PREVVALID) that indicates that the various snd_*_prev fields in the pcb are valid and only perform "bad retransmit recovery" if this flag is set in the pcb. The flag is set on the first retransmit timeout that occurs and is cleared on subsequent retransmit timeouts or when entering the persist state. Reviewed by: bz MFC after: 2 weeks	2011-04-29 15:40:12 +00:00
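A simplified model of the TF_PREVVALID rule described above. A boolean stands in for the pcb flag, and the saved snd_*_prev values are restored only when the flag says they are valid. The Tcb dataclass and the function names are hypothetical. Consuming the snapshot after a restore (clearing the flag) is a choice made by this sketch, not something the commit message states.

```python
from dataclasses import dataclass


@dataclass
class Tcb:
    snd_cwnd: int
    snd_ssthresh: int
    snd_cwnd_prev: int = 0      # zero until a first retransmit timeout saves it
    snd_ssthresh_prev: int = 0
    prev_valid: bool = False    # models the TF_PREVVALID flag


def retransmit_timeout(tp: Tcb, rxtshift: int) -> None:
    if rxtshift == 1:
        # First timeout: save state that a spurious timeout may need restored.
        tp.snd_cwnd_prev = tp.snd_cwnd
        tp.snd_ssthresh_prev = tp.snd_ssthresh
        tp.prev_valid = True
    else:
        tp.prev_valid = False   # later timeouts invalidate the snapshot


def enter_persist(tp: Tcb) -> None:
    tp.prev_valid = False       # persist reuses t_rxtshift; no valid snapshot


def bad_retransmit_recovery(tp: Tcb, ack_in_first_rto_window: bool) -> bool:
    """Restore saved values only if they are known to be valid."""
    if not (ack_in_first_rto_window and tp.prev_valid):
        return False
    tp.snd_cwnd = tp.snd_cwnd_prev
    tp.snd_ssthresh = tp.snd_ssthresh_prev
    tp.prev_valid = False       # this sketch consumes the snapshot
    return True


tp = Tcb(snd_cwnd=14600, snd_ssthresh=65535)
enter_persist(tp)
print(bad_retransmit_recovery(tp, True), tp.snd_cwnd)  # False 14600
```

Without the flag check, the persist-state ACK in the usage example would have copied the zeroed snd_cwnd_prev into snd_cwnd. That is exactly the stuck-at-zero window the commit describes.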
Bjoern A. Zeeb	b8e463e644	MfP4 CH=192029: Expose ip_icmp.c to INET6 as well and only export badport_bandlim() along with the two sysctls in the non-INET case. The bandlim types work for all cases I reviewed in IPv6 as well and the sysctls are available as we export net.inet.* from in_proto.c. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:36:35 +00:00
Bjoern A. Zeeb	74e9dcf786	MfP4 CH=192004: Move ip_defttl to raw_ip.c where it is actually used. In an IPv6 only world we do not want to compile ip_input.c in for that and it is a shared default with INET6. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:32:27 +00:00
Bjoern A. Zeeb	a0ae8f04e8	Make various (pseudo) interfaces compile without INET in the kernel adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-27 19:30:44 +00:00
Attilio Rao	2903309aca	Add the possibility to verify MD5 hash of incoming TCP packets. Since this is a costly function, even when compiled in (along with the option TCP_SIGNATURE), it can be disabled via the net.inet.tcp.signature_verify_input sysctl. Sponsored by: Sandvine Incorporated Reviewed by: emaste, bz MFC after: 2 weeks	2011-04-25 17:13:40 +00:00
Bjoern A. Zeeb	acaeca65b3	Be less strict on includes than in r220746. We need in.h for both INET or INET6 as it holds all the IPPROTO_* definitions needed for the SYSCTL_NODE definitions. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 5 days	2011-04-25 16:36:16 +00:00
Gleb Smirnoff	acdef0460e	Use size_t for sopt_valsize. Submitted by: Brandon Gooch <jamesbrandongooch gmail.com>	2011-04-21 08:18:55 +00:00
Bjoern A. Zeeb	00c081e908	MFp4 CH=191760: When compiling out INET we still need the initialization routines as well as the tuning and monitoring sysctls shared with IPv6. Move the two send/recvspace variables up from the middle of the file to ease compiling out the INET only code. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 3 days	2011-04-20 08:03:22 +00:00
Bjoern A. Zeeb	aae49dd304	MFp4 CH=191470: Move the ipport_tick_callout and related functions from ip_input.c to in_pcb.c. The random source port allocation code has been merged and is now local to in_pcb.c only. Use a SYSINIT to get the callout started and no longer depend on initialization from the inet code, which would not work in an IPv6 only setup. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days	2011-04-20 08:00:29 +00:00
Bjoern A. Zeeb	ec4f97277f	MFp4 CH=191466: Move fw_one_pass to where it belongs: it is a property of ipfw, not of ip_input. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 3 days	2011-04-20 07:55:33 +00:00
Gleb Smirnoff	9d0a2ddf69	- Rewrite functions that copyin/out NAT configuration, so that they calculate required memory size dynamically. - Fix races on chain re-lock. - Introduce new field to ip_fw_chain - generation count. Now utilized only in the NAT configuration, but can be utilized wider in ipfw. - Get rid of NAT_BUF_LEN in ip_fw.h PR: kern/143653	2011-04-19 15:06:33 +00:00
Andrey V. Elsukov	e3665201f5	Add sysctl handlers for net.inet.ip.dummynet.hash_size, .pipe_byte_limit and .pipe_slot_limit oids to prevent incorrect values from being set. MFC after: 2 weeks	2011-04-19 11:33:39 +00:00
Andrey V. Elsukov	8ad66025f6	The ipdn_bound_var() function is designed to bound a variable between the specified minimum and maximum. When the specified default value is out of bounds, it does not work as expected and does not limit the variable. Check that the default value is in range and limit it if needed. Also bump the max_hash_size value to 65536 to correspond with the manual page. PR: kern/152887 MFC after: 2 weeks	2011-04-19 11:29:09 +00:00
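A sketch of the bounding rule with the fix applied. It is written as a pure function rather than as an in-place update of a variable. The behaviour it assumes is this: a value below the minimum falls back to the default, and a value above the maximum is clamped to the maximum. That is an assumption based on the description above, not a copy of ipdn_bound_var(). The example bounds reuse the new max_hash_size of 65536 with an illustrative minimum of 16.

```python
def bound_var(value: int, dflt: int, lo: int, hi: int) -> int:
    """Return value bounded to [lo, hi]; too-small values fall back to dflt."""
    # The fix: clamp the default itself first, so an out-of-range default
    # can no longer slip past the bounds.
    dflt = min(max(dflt, lo), hi)
    if value < lo:
        return dflt
    if value > hi:
        return hi
    return value


print(bound_var(0, 100000, 16, 65536))  # 65536
```

Without the first line, the example call would return the out-of-range default 100000 unchanged.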
Andrey V. Elsukov	3ab4af737d	Use M_WAITOK instead of M_WAIT for malloc. Remove unneeded checks. MFC after: 1 week	2011-04-19 05:59:37 +00:00
Gleb Smirnoff	ca47294ddf	LibAliasInit() should allocate memory with M_WAITOK flag. Modify it and its callers.	2011-04-18 20:07:08 +00:00
Gleb Smirnoff	d0e16e0d1e	Pull up to the TCP header length before matching against 'tcpopts'. PR: kern/156180 Reviewed by: luigi	2011-04-18 18:22:10 +00:00
John Baldwin	da84b2e6c5	When checking to see if a window update should be sent to the remote peer, don't force a window update if the window would not actually grow due to window scaling. Specifically, if the window scaling factor is larger than 2 * MSS, then after the local reader has drained 2 * MSS bytes from the socket, a window update can end up advertising the same window. If this happens, the supposed window update actually ends up being a duplicate ACK. This can result in an excessive number of duplicate ACKs when using a higher maximum socket buffer size. Reviewed by: bz MFC after: 1 month	2011-04-18 17:43:16 +00:00
Bjoern A. Zeeb	336d023b2e	Make in_proto.c dependent on either inet or inet6. While it does not provide any functionality for IPv6, it provides the sysctl nodes for net.inet.* that a lot of functionality shared between IPv4 and IPv6 depends on. We cannot change these anymore without breaking a lot of management and tuning. In case of IPv6 only, we compile out everything but the sysctl node declarations. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC After: 5 days	2011-04-17 16:35:16 +00:00
Edward Tomasz Napierala	79bb84fb15	Refactor udp_input(), moving calls to u_tun_func() into udp_append(). Obtained from: Wheel Systems Sp. z o.o. Reviewed by: bz@	2011-04-14 10:40:57 +00:00
Bjoern A. Zeeb	05b9d121aa	The mbuf_frag_size always was and is file local and not queried from base user space tools via kvm. Mark it static. MFC after: 3 days	2011-04-14 09:47:09 +00:00
Sergey Kandaurov	6bed196c35	Staticize malloc types. Approved by: lstewart MFC after: 1 week	2011-04-13 11:28:46 +00:00
Andrey V. Elsukov	9974d151ec	Restore previous behaviour - always match the rule when we are doing tagging, even when the tag already exists. Reported by: Vadim Goncharov MFC after: 1 week	2011-04-12 15:20:34 +00:00
Lawrence Stewart	891b8ed467	Use the full and proper company name for Swinburne University of Technology throughout the source tree. Requested by: Grenville Armitage, Director of CAIA at Swinburne University of Technology MFC after: 3 days	2011-04-12 08:13:18 +00:00
Jack F Vogel	c31aa19c53	Port of the LRO fix from mxge driver to the generic LRO code. Thanks to Andrew Gallatin for the change. MFC after: 7 days	2011-04-07 21:20:26 +00:00
Andrey V. Elsukov	a5620cc6c5	Fill up src_port and dst_port variables for SCTP over IPv4. PR: kern/153415 MFC after: 1 week	2011-03-31 16:30:14 +00:00
Andrey V. Elsukov	5600c92750	Fix malloc types. MFC after: 1 week	2011-03-31 15:11:12 +00:00
Andrey V. Elsukov	3d10d64fd3	Fix a memory leak. Memory that is allocated for schedulers hash table was not freed. PR: kern/156083 MFC after: 1 week	2011-03-31 15:10:41 +00:00
John Baldwin	766282cbe7	Clamp the initial advertised receive window when responding to a SYN/ACK to the maximum allowed window. Growing the window too large would cause an underflow in the calculations in tcp_output() to decide if a window update should be sent which would prevent the persist timer from being started if data was pending and the other end of the connection advertised an initial window size of 0. PR: kern/154006 Submitted by: Stefan `Sec` Zehl sec 42 org Reviewed by: bz MFC after: 1 week	2011-03-30 12:35:39 +00:00
Weongyo Jeong	c45e1b3cad	Covers values if (BYTES_THIS_ACK(tp, th) / tp->t_maxseg) value is from 2.0 to 3.0. Reviewed by: lstewart	2011-03-28 19:03:56 +00:00
Sergey Kandaurov	79d514355c	Reference ifaddr object before unlocking as it can be freed from another context at the moment of later access. PR: kern/155555 Submitted by: Andrew Boyer <aboyer att averesystems.com> Approved by: avg (mentor) MFC after: 2 weeks	2011-03-21 14:19:40 +00:00
Jeff Roberson	e4cd31dd3c	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.	2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb	4d457387fe	Properly check for an IPv4 socket after r219579. In some cases as udp6_connect() without an earlier bind(2) to an address, v4-mapped sockets allowed and a non mapped destination address, we can end up here with both v4 and v6 indicated: inp_vflag = (INP_IPV4|INP_IPV6|INP_IPV6PROTO) In that case however laddrp is NULL as the IPv6 path does not pass in a copy currently. Reported by: Pawel Worach (pawel.worach gmail.com) Tested by: Pawel Worach (pawel.worach gmail.com) MFC after: 6 days X-MFC with: r219579	2011-03-19 19:08:54 +00:00
Bjoern A. Zeeb	efc76f729a	Merge the two identical implementations for local port selections from in_pcbbind_setup() and in6_pcbsetport() in a single in_pcb_lport(). MFC after: 2 weeks	2011-03-12 21:46:37 +00:00
Randall Stewart	f79aab1866	Tunes and fixes the new DC-CC to seem to hit the right mix. Still may need some tweaks but it appears to almost not give away too much to an RFC2581 flow, but can really minimize the amount of buffers used in the net. MFC after: 3 months	2011-03-08 11:58:25 +00:00
Randall Stewart	48b6c64938	Adds a new Congestion Control that helps reduce the RTT that a flow will build up in buffers in transit. It is a slight modification to RFC2581 but is more friendly i.e. less aggressive. MFC after: 3 months	2011-03-01 00:37:46 +00:00
Dimitry Andric	cb8750c269	Fix breakage in sys/netinet/sctp_sysctl.c, introduced by r219057. If SCTP_HAS_RTTC is not defined, this file fails to compile. Insert the necessary #ifdefs to make it work. Pointy hat to: rrs	2011-02-26 22:45:40 +00:00
Randall Stewart	299108c5a2	Improvements to CC modules: 1) Add four new points that allow you to get more information to cc algo's 2) Fix the case where user changes module on an existing TCB, in such a case, the initialization module needs to be called on all nets. 3) Move htcp_cc structure to a union that other modules can use. 4) Add 5th point for get/set socket options for cc_module specific options MFC after: 2 months	2011-02-26 15:23:46 +00:00
Michael Tuexen	0191fb6de2	* Fix several bugs where the scaled versions of srtt and rttvar were used incorrectly. * Use appropriate variable names for RTO instead of RTT. MFC after: 3 months.	2011-02-24 22:58:15 +00:00
Michael Tuexen	be1d917696	* Cleanup the code computing the retransmission timeout. * Fix an initialization bug for the scaled variance of the RTO. MFC after: 3 months.	2011-02-24 22:36:40 +00:00
Rebecca Cran	6bccea7c2b	Fix typos - remove duplicate "the". PR: bin/154928 Submitted by: Eitan Adler <lists at eitanadler.com> MFC after: 3 days	2011-02-21 09:01:34 +00:00
Michael Tuexen	f0878bdcc5	Bugfix: Get per vnet sysctl variables and statistics working. MFC after:3 months.	2011-02-18 20:30:58 +00:00
Bjoern A. Zeeb	1fb51a12f2	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursions, in some places like NFS, POSIX local sockets and some netgraph, recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
Sergey Kandaurov	4fd8408ae7	Bump dummynet module version to meet dummynet schedulers' requirements, and thus unbreak loading dummynet.ko via /boot/loader.conf. Reported by: rihad <rihad att mail.ru> on freebsd-net Approved by: kib (mentor)	2011-02-16 15:43:35 +00:00
Randall Stewart	d69e7322cb	Fix a bug reported by Jonathan Leighton in his web-sctp testing at the Univ-of-Del. Basically, when a 1-to-1 socket did a socket/bind/send(data)/close, if the timing was right we would dereference a NULL socket. MFC after: 1 month	2011-02-13 14:48:11 +00:00
Michael Tuexen	be2a6988a1	Fix several bugs related to stream scheduling. Obtained from: Robin Seggelmann MFC after: 3 months.	2011-02-13 13:53:28 +00:00
Daniel Eischen	9d22191d17	Oops, revert an accidental local change that got added in my last commit (r218627). No damage was done in the last commit, just some duplicated code was added (which is now removed).	2011-02-13 04:44:06 +00:00
Daniel Eischen	f7e6ce6d7a	Allow the SO_SETFIB socket option to select the default (0) routing table. Reviewed by: julian	2011-02-13 00:14:13 +00:00
Michael Tuexen	2678fe1ee9	Remove addresses from endpoint when there are no associations. This fixes a bug reported by brucec@. MFC after: 3 months.	2011-02-10 14:46:37 +00:00
Michael Tuexen	4c97400f86	Fix bugs related to M_FLOWID: * Store the flowid when receiving an SCTP/IPv6 packet. * Store the flowid when receiving an SCTP packet with wrong CRC. * Initialize flowid correctly. * Put test code under INVARIANTS. MFC after: 3 months.	2011-02-07 15:04:23 +00:00
Randall Stewart	f8140f7291	If not set (due to some error Michael is working on fixing) set it for the net. MFC after: 3 months	2011-02-07 08:12:24 +00:00
Randall Stewart	73403d4141	1) Track when flowid does get set. MFC after: 3 months	2011-02-07 08:10:29 +00:00
Randall Stewart	38521fb9b4	1) Use same scheme Michael and I discussed for a selected for a flowid 2) If flowid is not set, arrange so it is stored. 3) If flowid is set by lower layer, use it. MFC after: 3 Months	2011-02-06 13:17:40 +00:00
Luigi Rizzo	9b0456f075	correct the 'output_time' of packets generated by dummynet. In the dec.2009 rewrite I introduced a bug, using for the computation the arrival time instead of the time the packet has exited from the queue. The bandwidth computation was still correct because it is computed elsewhere, but traffic was sent out in bursts. The bug is also present in RELENG_8 after dec.2009 Thanks to Daikichi Osuga for investigating, finding and fixing the bug with detailed graphs of the behaviour before and after the fix. Submitted by: Daikichi Osuga MFC after: 2 weeks	2011-02-05 23:32:17 +00:00
Michael Tuexen	a4ae38f117	Add support for M_FLOWID.	2011-02-05 19:13:38 +00:00
Randall Stewart	5d40cf5d23	1) Typo correction in comments and one spacing change. 2) Mass update to all copyrights. MFC after: 3 Months	2011-02-05 12:12:51 +00:00
John Baldwin	d28b9e89a9	When turning off TCP_NOPUSH, only call tcp_output() to immediately flush any pending data if the connection is established. Submitted by: csjp Reviewed by: lstewart MFC after: 1 week	2011-02-04 14:13:15 +00:00
Randall Stewart	0071ee5ede	1) Fix cpu mapping per JB's suggestions 2) Fix it so INIT's don't always end up on CPU0 MFC after: 3 months	2011-02-04 13:50:30 +00:00
Rebecca Cran	492fddb2c4	Fix typo (Tuneable -> Tunable).	2011-02-04 12:03:48 +00:00
Michael Tuexen	252f7f93b0	Fix several bugs in the stream schedulers. From Robin Seggelmann. MFC after: 3 months.	2011-02-03 20:44:49 +00:00
Michael Tuexen	c446091b1e	Make sure that changing the ECN sysctl does not affect existing associations and endpoints. MFC after: 3 months.	2011-02-03 19:59:00 +00:00
Randall Stewart	dec0177df6	1) Move per John Baldwin to mp_maxid 2) Some signed/unsigned errors found by Mac OS compiler (from Michael) 3) a couple of copyright updates on the affected files. MFC after: 3 months	2011-02-03 19:22:21 +00:00
Randall Stewart	ae26e0a472	Fix the per CPU stats so that: 1) They don't use the giant "MAX_CPU" define and instead are allocated dynamically based on mp_ncpus 2) Will zero with the netstat -z -s -p sctp 3) Will be properly handled by both the sctp_init and finish (the multi-net stuff was incorrectly bzero'ing in sctp_init the wrong size.. the bzero is now moved to the right places). And of course the free is put in at the very end. MFC after: 3 Months	2011-02-03 11:52:22 +00:00
Randall Stewart	bfc46083b9	Adds an experimental option to create a pool of threads. These serve as input threads and are queued packets based on the V-tag number. This is similar to what a modern card can do with queues for TCP... but alas modern cards know nothing about SCTP. MFC after: 3 months (maybe)	2011-02-03 10:05:30 +00:00
Randall Stewart	899288ae4b	1) Allow a chunk to track the cwnd it was at when sent. 2) Add separate max-bursts for retransmit and hb. These are set to sysctlable values but not settable via the socket api. This makes sure we don't blast out HB's or fast-retransmits. 3) Determine on the first data transmission on a net if it's local-lan (by being under or over a RTT). This can later be used to think about different algorithms based on locallan vs big-i (experimental) 4) The cwnd should NOT be allowed to grow when an ECNEcho is seen (TCP has this same bug). We fix this in SCTP so an ECNe being seen prevents an advance of cwnd. 5) CWR's should not be sent multiple times to the same network, instead just updating the TSN being transmitted if needed. MFC after: 1 Month	2011-02-02 11:13:23 +00:00
Lawrence Stewart	03f0843bdb	Algorithm modules can define their own private congestion signal types in the top 8 bits of the 32 bit signal bit field space for internal use. These private signals should not be leaked outside of a module. Given that many algorithm modules use the NewReno hook functions to simplify their implementation, the obvious place such a leak would show up is in the NewReno cong_signal hook function. - Show the full number of significant bits in the signal type definitions in <netinet/cc.h>. - Add a bitmask to simplify figuring out if a given signal is in the private or public bit range. - Add a sanity check in newreno_cong_signal() to ensure private signals are not being leaked into the hook function. Sponsored by: FreeBSD Foundation Discussed with: David Hayes <dahayes at swin edu au> MFC after: 1 week X-MFC with: r215166	2011-02-01 13:32:27 +00:00
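A small model of the private/public split described above. The top 8 bits of the 32-bit signal field are reserved for module-private signals, and a mask test tells the two ranges apart. Only that rule comes from the commit text. The public signal values, the check_public_signal() helper and MY_PRIVATE_SIGNAL are assumptions for illustration.

```python
# Only the "top 8 bits are private" rule comes from the commit text;
# the public signal values below are assumptions for illustration.
CC_ECN = 0x00000001
CC_RTO = 0x00000002
CC_RTO_ERR = 0x00000004
CC_NDUPACK = 0x00000008
CC_SIGPRIVMASK = 0xFF000000  # top 8 bits of the 32-bit signal field


def is_private_signal(sig: int) -> bool:
    return (sig & CC_SIGPRIVMASK) != 0


def check_public_signal(sig: int) -> int:
    """Reject a module-private signal, in the spirit of the sanity check
    added to newreno_cong_signal()."""
    if is_private_signal(sig):
        raise ValueError(f"private congestion signal 0x{sig:08x} leaked")
    return sig


MY_PRIVATE_SIGNAL = 0x01000000  # hypothetical module-private signal
print(is_private_signal(CC_NDUPACK), is_private_signal(MY_PRIVATE_SIGNAL))  # False True
```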
Lawrence Stewart	ec943febbb	Fix typo in comment: "course" -> "coarse" Sponsored by: FreeBSD Foundation Submitted by: jmallett MFC after: 3 months X-MFC with: r218152	2011-02-01 07:10:13 +00:00
Lawrence Stewart	0927e1a18b	Import an implementation of the CAIA-Hamilton-Delay (CHD) congestion control algorithm described in the paper "Improved coexistence and loss tolerance for delay based TCP congestion control" by Hayes and Armitage. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. CHD enhances the approach taken by the Hamilton-Delay (HD) algorithm to provide tolerance to non-congestion related packet loss and improvements to coexistence with loss-based congestion control algorithms. A key idea in improving coexistence with loss-based congestion control algorithms is the use of a shadow window, which attempts to track how NewReno's congestion window (cwnd) would evolve. At the next packet loss congestion event, CHD uses the shadow window to correct cwnd in a way that reduces the amount of unfairness CHD experiences when competing with loss-based algorithms. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 07:05:14 +00:00
Lawrence Stewart	ac230a79e1	Import a clean-room implementation of the Hamilton-Delay (HD) congestion control algorithm based on the paper "A strategy for fair coexistence of loss and delay-based congestion control algorithms" by Budzisz, Stanojevic, Shorten and Baker. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. HD uses a probabilistic approach to reacting to delay-based congestion. The probability of reducing cwnd is zero when the queuing delay is very small, increasing to a maximum at a set threshold, then back down to zero again when the queuing delay is high. Normal operation keeps the queuing delay below the set threshold. However, since loss-based congestion control algorithms push the queuing delay high when probing for bandwidth, having the probability of reducing cwnd drop back to zero for high delays allows HD to compete with loss-based algorithms. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 06:42:46 +00:00
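A sketch of the probability shape the HD description above gives. The probability of reducing cwnd is zero for small queuing delay, peaks at a threshold, and falls back to zero for high delay. The commit describes only this shape. The linear ramps, the parameter names (qmin, qthresh, qmax, pmax) and the example numbers are assumptions, and the actual module may use a different curve and units.

```python
def hd_backoff_probability(qdelay: float, qmin: float, qthresh: float,
                           qmax: float, pmax: float) -> float:
    """Probability of reducing cwnd for a given queuing delay.

    Zero at or below qmin, rising linearly to pmax at qthresh, then falling
    linearly back to zero at qmax and staying zero beyond it.
    """
    if qdelay <= qmin or qdelay >= qmax:
        return 0.0
    if qdelay <= qthresh:
        return pmax * (qdelay - qmin) / (qthresh - qmin)
    return pmax * (qmax - qdelay) / (qmax - qthresh)


print([hd_backoff_probability(d, 10, 20, 40, 0.5) for d in (5, 15, 20, 30, 50)])
# [0.0, 0.25, 0.5, 0.25, 0.0]
```

A sender would compare a uniform random draw against this value on each delay sample. The zero tail above qmax is what lets HD keep its share while loss-based flows push the queue high.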
Lawrence Stewart	1d4ed791d0	Import a clean-room implementation of the VEGAS congestion control algorithm based on the paper "TCP Vegas: end to end congestion avoidance on a global internet" by Brakmo and Peterson. It is implemented as a kernel module compatible with the recently committed modular congestion control framework. VEGAS uses network delay as a congestion indicator and unlike regular loss-based algorithms, attempts to keep the network operating with stable queuing delays and no congestion losses. By keeping network buffers used along the path within a set range, queuing delays are kept low while maintaining high throughput. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-02-01 06:17:00 +00:00
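The classic Vegas rule behind the description above: compare expected throughput (cwnd / baseRTT) with actual throughput (cwnd / RTT) to estimate how many segments sit in network queues, then keep that estimate between alpha and beta. The function name is hypothetical. The defaults alpha=1 and beta=3 are the values commonly cited from the Brakmo and Peterson paper, and are not necessarily the module's settings. cwnd is counted in segments, and the adjustment is meant to run once per RTT.

```python
def vegas_adjust(cwnd: int, base_rtt: float, rtt: float,
                 alpha: int = 1, beta: int = 3) -> int:
    """One per-RTT Vegas adjustment of cwnd (in segments).

    diff = (expected - actual) * base_rtt = cwnd * (rtt - base_rtt) / rtt
    estimates how many of our segments are sitting in network queues.
    """
    diff = cwnd * (rtt - base_rtt) / rtt
    if diff < alpha:
        return cwnd + 1   # too little queued: probe for more bandwidth
    if diff > beta:
        return cwnd - 1   # too much queued: back off before losses occur
    return cwnd           # within [alpha, beta]: hold steady


print([vegas_adjust(20, 100, rtt) for rtt in (100, 110, 200)])  # [21, 20, 19]
```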
Randall Stewart	493d8e5a83	More ECN fixes: 1) We now remove ECN-Nonce since it will no longer continue as an I-D 2) Eliminate last_tsn_echo, this tied us to an assoc not the net and thus we were not doing multi-homing on the ECN-Echo sender's side right. 3) Increment the count going out even if the TSN is lower in the pending ECN-Echo, this way the receiver knows exactly how many packets were marked even with network re-ordering 4) Fix so we DO NOT stop doing delayed sack if an ECN-Echo is in queue MFC after: 1 month	2011-01-31 11:50:11 +00:00
Bjoern A. Zeeb	7f79e7e4db	Remove duplicate printing of TF_NOPUSH in db_print_tflags(). MFC after: 10 days	2011-01-29 22:11:13 +00:00
Randall Stewart	a21779f050	Fixes to ECN in SCTP. 1) ECN was on an association basis, this is incorrect and will not work with CMT or for that matter if the user is sending to multiple addresses. This commit makes ECN work on a per-path basis. 2) Adopt the new format for the ECN internet draft. This maintains compatibility with old format chunks as well. 3) Keep track of the real time of an RTT down to microseconds. For some future conditional features (for example, in a data center) this is good information to have. MFC after: 1 month	2011-01-29 19:55:29 +00:00
Randall Stewart	410bcbef0a	Keep track of the real last RTT on each net. This will be used for Data Center congestion control, we won't want to engage it in the ECN code unless we KNOW that the RTT is less than 500us. MFC after: 1 week	2011-01-28 21:05:21 +00:00
Randall Stewart	d77e2e42b3	Fix a bug in the way ECN-Echo chunk sends were being accounted for. The counting was such that we counted only when we queued a chunk, not when we sent it. Now keep an additional counter for queuing and one for sending. MFC after: 1 week	2011-01-28 20:49:15 +00:00
Michael Tuexen	f8cdf87663	* Use 300 ms as the default for RTO_MIN. * Disable burst mitigation by default. * Remove unused constant. Discussed with rrs. MFC after: 3 months.	2011-01-26 21:38:17 +00:00
Michael Tuexen	507c72969d	Make SCTP_MAX_BURST compliant with the latest version of the socket API ID. This is not compatible with the API in stable/8.	2011-01-26 19:55:54 +00:00
Michael Tuexen	90fed1d88e	Change infrastructure for SCTP_MAX_BURST to allow compliance with the latest socket API ID. Especially it can be disabled. Full compliance needs changing the structure used in the socket option. Since this breaks the API, it will be a separate commit which will not be MFCed to stable/8. MFC after: 3 months.	2011-01-26 19:49:03 +00:00
Daniel Eischen	e691be70f9	Prison check addresses set with multicast interface options. Reviewed by: bz MFC after: 1 week	2011-01-26 17:31:03 +00:00
Andrew Thompson	965615476e	When matching an incoming ARP against a bridge, ensure both interfaces belong to the same bridge. Submitted by: Alexander Zagrebin	2011-01-25 17:15:23 +00:00
Lawrence Stewart	050570efa7	Import the ERTT (Enhanced Round Trip Time) Khelp module. ERTT uses the Khelp/Hhook KPIs to hook into the TCP stack and maintain a per-connection, low noise estimate of the instantaneous RTT. ERTT's implementation is robust even in the face of delayed acknowledgements and/or TSO being in use for a connection. A high quality, low noise RTT estimate is a requirement for applications such as delay-based congestion control, for which we will be importing some algorithm implementations shortly. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz and others along the way MFC after: 3 months	2011-01-24 23:08:38 +00:00
Michael Tuexen	f7a77f6fd3	Add stream scheduling support. This work is based on a patch received from Robin Seggelmann. MFC after: 3 months.	2011-01-23 19:36:28 +00:00
Lawrence Stewart	a66ac850d7	An sbuf configured with SBUF_AUTOEXTEND will call malloc with M_WAITOK when a write to the buffer causes it to overflow. We therefore can't hold the CC list rwlock over a call to sbuf_printf() for an sbuf configured with SBUF_AUTOEXTEND. Switch to a fixed length sbuf which should be of sufficient size except in the very unlikely event that the sysctl is being processed as one or more new algorithms are loaded. If that happens, we accept the race and may fail the sysctl gracefully if there is insufficient room to print the names of all the algorithms. This should address a WITNESS warning and the potential panic that would occur if the sbuf call to malloc did sleep whilst holding the CC list rwlock. Sponsored by: FreeBSD Foundation Reported by: Nick Hibma Reviewed by: bz MFC after: 3 weeks X-MFC with: r215166	2011-01-23 13:00:25 +00:00
Michael Tuexen	afb048b8ef	Remove unnecessary checking of variable. MFC after: 3 months.	2011-01-23 07:27:35 +00:00
Lawrence Stewart	47f44cdd93	Some correctness and robustness fixes related to CUBIC's mean RTT estimate: - The mean RTT is updated at the end of each congestion epoch, but if we switch to congestion avoidance within the first epoch (e.g. if ssthresh was primed from the hostcache), we'll trigger a divide by zero panic in cubic_ack_received(). Set the mean to the min in cubic_record_rtt() if the mean is less than the min to ensure we have a sane mean for use in this situation. This fixes the panic reported by Nick Hibma. - Adjust conditions under which we update the mean RTT in cubic_post_recovery() to ensure a low latency path won't yield an RTT of less than 1. This avoids another potential divide by zero panic when running CUBIC in networks with sub-millisecond latencies. - Remove the "safety" assignment of min into mean when we don't update the mean because of failed conditions. The above change to the conditions for updating the mean ensures the safety issue is addressed and I feel it is better to keep our previous mean estimate around if we can't update than to revert to the min. - Initialise the mean RTT to 1 on connection startup to act as a safety belt if a situation we haven't considered and addressed with the above changes were to crop up in the wild. Sponsored by: FreeBSD Foundation Reported and tested by: Nick Hibma Discussed with: David Hayes <dahayes at swin edu au> MFC after: 5 weeks X-MFC with: r216114	2011-01-21 05:19:47 +00:00
Michael Tuexen	91f17c6faa	Improve comments. MFC after: 1 week.	2011-01-20 13:53:34 +00:00
Randall Stewart	a97009a5cd	Fix it so we align with the new socket API draft for states in destinations (i.e. ACTIVE/INACTIVE/UNCONFIRMED) MFC after: 1 week	2011-01-20 12:40:09 +00:00
Michael Tuexen	0e9a9c104e	Cleanup the management of CC functions. MFC after: 3 months.	2011-01-19 22:10:35 +00:00
Randall Stewart	c3f9cbb0e1	Fix style(9) nit that snuck in when I grabbed the wrong patch ;-0 (thanks Daniel) MFC after: 1 week	2011-01-19 20:57:08 +00:00
Randall Stewart	a38b1c8c5e	Fix a bug where multicast packets sent from a UDP endpoint may end up echoing back to the sender even without joining the multicast group. Reviewed by: gnn, bms, bz? Obtained from: deischen (with help from)	2011-01-19 19:07:16 +00:00
Matthew D Fleming	79c3d51b86	Specify a CTLTYPE_FOO so that a future sysctl(8) change does not need to rely on the format string. For SYSCTL_PROC instances that I noticed a discrepancy between the CTLTYPE and the format specifier, fix the CTLTYPE.	2011-01-18 21:14:13 +00:00
Michael Tuexen	ea8345d6a7	Add support for resource pooling to CMT. An original version of the patch was developed by Martin Becke and Thomas Dreibholz. MFC after: 3 months	2011-01-16 10:02:46 +00:00
John Baldwin	d5eadf1dde	Use a blocking malloc() to initialize the dummynet taskq. Reviewed by: luigi	2011-01-13 17:02:39 +00:00
Christian S.J. Peron	9844b02935	Un-break the build: use the correct format specifier for sizeof()	2011-01-12 23:07:51 +00:00
Matthew D Fleming	f88910cdf5	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the net* piece.	2011-01-12 19:53:50 +00:00
George V. Neville-Neil	09d3f8953e	Fix several bugs in the ARP code related to improperly formatted packets. *) Reject requests with a protocol length not equal to 4. This is IPv4 and there is no reason to accept anything else. *) Reject packets that have a multicast source hardware address. *) Drop requests where the hardware address length is not equal to the hardware address length of the interface. Pointed out by: Rozhuk Ivan MFC after: 1 week	2011-01-12 19:11:17 +00:00
Lawrence Stewart	f1f5cc47d8	Fix some whitespace nits that were introduced in r216758. Sponsored by: FreeBSD Foundation Submitted by: pjd MFC after: 10 weeks X-MFC with: r216758	2011-01-11 01:32:08 +00:00
Lawrence Stewart	d64a46ea1a	Reset the last_sack_ack SACK hint for TCP input processing to ensure that the hint is 0 when no SACK data is received to update the hint with. This was accidentally omitted from r216753. Sponsored by: FreeBSD Foundation MFC after: 10 weeks X-MFC with: 216753	2011-01-10 06:12:01 +00:00
Daniel Eischen	d79fdd98c3	Make sure to always do source address selection on an unbound socket, regardless of any multicast options. If an address is specified via a multicast option, then let it override the normal source address selection. This fixes a bug where source address selection was not being performed when multicast options were present but without an interface being specified. Reviewed by: bz MFC after: 1 day	2011-01-08 22:33:46 +00:00
John Baldwin	79e955ed63	Trim extra spaces before tabs.	2011-01-07 21:40:34 +00:00
George V. Neville-Neil	ede990172f	Fix a memory leak in ARP queues. Pointed out by: jhb@ MFC after: 2 weeks	2011-01-07 20:02:05 +00:00
George V. Neville-Neil	90fdff0706	Adjust ARP hold queue locking. Submitted by: Rozhuk Ivan, jhb MFC after: 2 weeks	2011-01-07 18:14:58 +00:00
John Baldwin	e3e852231b	Use a regular taskqueue for dummynet rather than a "fast" taskqueue. Reviewed by: luigi	2011-01-07 16:47:20 +00:00
Michael Tuexen	2fad0e55b6	Bugfix: Make sure that the COMM_UP notification is delivered first also on the passive side. MFC after: 3 days.	2011-01-02 10:27:27 +00:00
Michael Tuexen	0a80a2de2b	Fix a typo. MFC after: 3 months.	2011-01-01 22:22:57 +00:00
Bjoern A. Zeeb	c744cde428	Try to catch a possible divide-by-zero as early as possible if "mtu" is 0 (also test for negative MTUs if checking it anyway). An MTU of 0 is arguably a bug elsewhere, but this at least gives us some more debugging hints. Sponsored by: ISPsystem (Early 2010) MFC after: 1 week	2010-12-31 21:47:11 +00:00
Michael Tuexen	20b07a4d85	Define and use SCTP_SSN_GE, SCTP_SSN_GT, SCTP_TSN_GE, SCTP_TSN_GT macros and use them instead of the generic compare_with_wrap. Retire compare_with_wrap. MFC after: 3 months.	2010-12-30 21:32:35 +00:00
Michael Tuexen	4a9ef3f833	Code cleanup: Use LIST_FOREACH, LIST_FOREACH_SAFE, TAILQ_FOREACH, TAILQ_FOREACH_SAFE where appropriate. No functional change. MFC after: 3 months.	2010-12-30 16:56:20 +00:00
Michael Tuexen	8ced7318a0	Fix three bugs related to the sequence number wrap-around affecting the processing of ECNE and ASCONF chunks. Reviewed by: rrs MFC after: 3 days.	2010-12-30 16:23:13 +00:00
Lawrence Stewart	e29f3cc76d	Add a comment for the ccv member of struct tcpcb. Sponsored by: FreeBSD Foundation MFC after: 5 weeks X-MFC with: r215166	2010-12-28 12:37:57 +00:00
Lawrence Stewart	39bc9de532	- Add some helper hook points to the TCP stack. The hooks allow Khelp modules to access inbound/outbound events and associated data for established TCP connections. The hooks only run if at least one hook function is registered for the hook point, ensuring the impact on the stack is effectively nil when no TCP Khelp modules are loaded. struct tcp_hhook_data is passed as contextual data to any registered Khelp module hook functions. - Add an OSD (Object Specific Data) pointer to struct tcpcb to allow Khelp modules to associate per-connection data with the TCP control block. - Bump __FreeBSD_version and add a note to UPDATING regarding the ABI changes introduced by this commit and r216753. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: bz, others along the way MFC after: 3 months	2010-12-28 12:13:30 +00:00
Lawrence Stewart	bee9ab2bc5	Add a new sack hint to track the most recent and highest sacked sequence number. This will be used by the incoming Enhanced RTT Khelp module. Sponsored by: FreeBSD Foundation Submitted by: David Hayes <dahayes at swin edu au> Reviewed by: bz and others (as part of a larger patch) MFC after: 3 months	2010-12-28 03:27:20 +00:00
Lawrence Stewart	22968a7d56	Fix a whitespace nit introduced in r215166. Sponsored by: FreeBSD Foundation Spotted by: bz MFC after: 5 weeks X-MFC with: r215166	2010-12-28 01:38:52 +00:00
Robert Watson	eab54f6a13	Remove comment bemoaning the lack of an INP_INHASHLIST above in_pcbdrop(); I fixed this in r189657 in early 2009, so the comment is OBE. Reviewed by: bz MFC after: 3 days	2010-12-27 19:38:25 +00:00
Michael Tuexen	060bd88290	Provide a way to configure the initial congestion window to the value defined in RFC 4960. MFC after: 3 months.	2010-12-22 19:04:14 +00:00
Michael Tuexen	7c99d56fdf	Improve plausibility check in sctp_handle_sack(). Allow cmt_on_off to support values 0 (no CMT), 1 (CMT), and 2 (CMT/RP). MFC after: 3 months.	2010-12-22 17:59:38 +00:00
John Baldwin	b5224580a4	Fix a typo in a comment. MFC after: 1 week	2010-12-21 19:30:24 +00:00
Michael Tuexen	f23ba7b103	Fix a flightsize bug related to the processing of PKTDRP reports. MFC after: 3 days.	2010-12-17 15:39:55 +00:00
Michael Tuexen	8f777478ff	Bugfix: Take also the nr-mapping array into account when detecting gaps. Reviewed by: rrs@ MFC after: 3 days.	2010-12-16 21:01:02 +00:00
Michael Tuexen	36ec9f814d	Add a missing cast. Reported by blade_ly at yahoo.com.cn. MFC after: 1 day.	2010-12-16 09:49:16 +00:00
Bjoern A. Zeeb	8c9cef57ac	Bring back (most of) NATM to avoid further bitrot after r186119. Keep three lines disabled which I am unsure if they had been used at all. This will allow us to seek testers and possibly bring it all back. Discussed with: rwatson MFC after: 7 weeks	2010-12-15 22:58:45 +00:00
Michael Tuexen	0271d0cd13	Bugfix: Do correct accounting using the MIB counters when an association is aborted via sctp_abort_association(). MFC after: 3 days.	2010-12-12 20:50:44 +00:00
Bjoern A. Zeeb	08291968f2	Use the correct field to track the statistics, counting the error as a bad header length. This brings the code in line with what ip_input has been doing since r1.1 in this case. Submitted by: Rozhuk Ivan (rozhuk.im gmail.com) MFC after: 4 days	2010-12-05 01:09:48 +00:00
Michael Tuexen	d9c5cfea61	Fix a bug where also the number of non-renegable gap reports was considered to be potentially renegable. MFC after: 1 day.	2010-12-04 19:29:49 +00:00
Lawrence Stewart	5728a0eae3	Import a clean-room implementation of the experimental H-TCP congestion control algorithm based on the Internet-Draft "draft-leith-tcp-htcp-06.txt". It is implemented as a kernel module compatible with the recently committed modular congestion control framework. H-TCP was designed to provide increased throughput in fast and long-distance networks. It attempts to maintain fairness when competing with legacy NewReno TCP in lower speed scenarios where NewReno is able to operate adequately. The paper "H-TCP: A framework for congestion control in high-speed and long-distance networks" provides additional detail. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: rpaulo (older patch from a few weeks ago) MFC after: 3 months	2010-12-02 06:40:21 +00:00
Lawrence Stewart	67fef78ba4	Import a clean-room implementation of the experimental CUBIC congestion control algorithm based on the Internet-Draft "draft-rhee-tcpm-cubic-02.txt". It is implemented as a kernel module compatible with the recently committed modular congestion control framework. CUBIC was designed to provide increased throughput in fast and long-distance networks. It attempts to maintain fairness when competing with legacy NewReno TCP in lower speed scenarios where NewReno is able to operate adequately. The paper "CUBIC: A New TCP-Friendly High-Speed TCP Variant" provides additional detail. In collaboration with: David Hayes <dahayes at swin edu au> and Grenville Armitage <garmitage at swin edu au> Sponsored by: FreeBSD Foundation Reviewed by: rpaulo (older patch from a few weeks ago) MFC after: 3 months	2010-12-02 06:05:44 +00:00
Lawrence Stewart	74a5a1949e	General cleanup of the NewReno CC module (no functional changes): - Remove superfluous includes and unhelpful comments. - Alphabetically order functions. - Make functions static. Sponsored by: FreeBSD Foundation MFC after: 9 weeks X-MFC with: r215166	2010-12-02 02:32:46 +00:00
Lawrence Stewart	2ea8da28e9	- Reinstantiate the after_idle hook call in tcp_output(), which got lost somewhere along the way due to mismerging r211464 in our development tree. - Capture the essence of r211464 in NewReno's after_idle() hook. We don't use V_ss_fltsz/V_ss_fltsz_local yet which needs to be revisited. Sponsored by: FreeBSD Foundation Submitted by: David Hayes <dahayes at swin edu au> MFC after: 9 weeks X-MFC with: r215166	2010-12-02 01:36:00 +00:00
Lawrence Stewart	6157935fa5	Set ssthresh appropriately on RTO. This change was accidentally not ported from the pre modular CC stack. Sponsored by: FreeBSD Foundation Submitted by: David Hayes <dahayes at swin edu au> MFC after: 9 weeks X-MFC with: r215166	2010-12-02 01:01:37 +00:00
Lawrence Stewart	b5af1b88a5	Pass NULL instead of 0 for the th pointer value. NULL != 0 on all platforms. Submitted by: David Hayes <dahayes at swin edu au> MFC after: 9 weeks X-MFC with: r215166	2010-12-02 00:47:55 +00:00
Gleb Smirnoff	a98c06f1c8	Use time_uptime instead of non-monotonic time_second to drive ARP timeouts. Suggested by: bde	2010-11-30 15:57:00 +00:00
Rebecca Cran	6d79f3f6ae	Fix more continuous/contiguous typos (cf. r215955)	2010-11-27 21:51:39 +00:00
Randall Stewart	6324ca614d	Adds new dtrace probes for cwnd functions and lays groundwork for future dtrace points (rwnd, flightsize, etc). MFC after: 2 months	2010-11-25 13:39:55 +00:00
Gleb Smirnoff	0715546197	Redo r166423. It is important not only to skip freeing multicast entries when the underlying interface is detached, but also to purge pointers to them, to avoid a double-free in the future.	2010-11-24 05:24:36 +00:00
Dimitry Andric	3e288e6238	After some off-list discussion, revert a number of changes to the DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various people working on the affected files. A better long-term solution is still being considered. This reversal may give some modules empty set_pcpu or set_vnet sections, but these are harmless. Changes reverted: ------------------------------------------------------------------------ r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines Instead of unconditionally emitting .globl's for the __start_set_xxx and __stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu sections are actually defined. ------------------------------------------------------------------------ r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout the tree. ------------------------------------------------------------------------ r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.	2010-11-22 19:32:54 +00:00
Marko Zec	0593983963	Remove an apparently redundant CURVNET_SET() / CURVNET_RESTORE() pair. MFC after: 3 days	2010-11-22 14:16:23 +00:00
Lawrence Stewart	92ea5581dd	Fix a minor code redundancy nit. MFC after: 3 days	2010-11-20 08:40:37 +00:00
Lawrence Stewart	052aec123c	When enabling or disabling SIFTR with a VIMAGE kernel, ensure we add or remove the SIFTR pfil(9) hook functions to or from all network stacks. This patch allows packets inbound or outbound from a vnet to be "seen" by SIFTR. Additional work is required to allow SIFTR to actually generate log messages for all vnet related packets because the siftr_findinpcb() function does not yet search for inpcbs across all vnets. This issue will be fixed separately. Reported and tested by: David Hayes <dahayes at swin edu au> MFC after: 3 days	2010-11-20 07:36:43 +00:00
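
The HD commit (ac230a79e1) describes a backoff probability that is zero for very small queuing delays, rises to a maximum at a set threshold, and falls back to zero for high delays. A minimal sketch of such a curve follows, assuming straight-line ramps on both sides of the threshold. The struct `hd_params`, its fields, and both functions are hypothetical names chosen for illustration; they are not the cc_hd module's actual code or sysctl tunables.

```c
#include <stdbool.h>

/* Hypothetical tuning parameters; requires qmin < qthresh < qmax. */
struct hd_params {
    double qmin;    /* queuing delay (ms) at or below which p = 0 */
    double qthresh; /* queuing delay (ms) at which p peaks */
    double qmax;    /* queuing delay (ms) at or above which p = 0 again */
    double pmax;    /* peak backoff probability, in [0, 1] */
};

/* Piecewise-linear probability of reducing cwnd for a measured queuing delay. */
static double hd_backoff_prob(const struct hd_params *hp, double qdelay)
{
    if (qdelay <= hp->qmin || qdelay >= hp->qmax)
        return 0.0;
    if (qdelay <= hp->qthresh)
        /* Rising edge: 0 at qmin up to pmax at qthresh. */
        return hp->pmax * (qdelay - hp->qmin) / (hp->qthresh - hp->qmin);
    /* Falling edge: pmax at qthresh down to 0 at qmax. */
    return hp->pmax * (hp->qmax - qdelay) / (hp->qmax - hp->qthresh);
}

/* Back off when a uniform random sample u in [0, 1) falls below p. */
static bool hd_should_backoff(const struct hd_params *hp, double qdelay, double u)
{
    return u < hd_backoff_prob(hp, qdelay);
}
```

The caller supplies the random sample so the decision itself stays deterministic and easy to test. The falling edge is the part the commit message highlights: once loss-based flows push the delay well past the threshold, the probability of backing off returns to zero, so HD is not starved by them.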
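
The SCTP commit 20b07a4d85 replaces the generic compare_with_wrap with dedicated SCTP_TSN_GT/GE and SCTP_SSN_GT/GE comparisons that account for sequence-number wrap-around. The sketch below shows the general serial-number arithmetic idea (in the style of RFC 1982) for 32-bit TSNs and 16-bit SSNs. The function names are hypothetical and this is not claimed to be the macros' actual definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a is "after" b in 32-bit serial-number space (hypothetical helper). */
static bool tsn_gt(uint32_t a, uint32_t b)
{
    return (a < b && (uint32_t)(b - a) > (UINT32_C(1) << 31)) ||
           (a > b && (uint32_t)(a - b) < (UINT32_C(1) << 31));
}

static bool tsn_ge(uint32_t a, uint32_t b)
{
    return a == b || tsn_gt(a, b);
}

/* Same comparison for 16-bit stream sequence numbers. */
static bool ssn_gt(uint16_t a, uint16_t b)
{
    return (a < b && (uint16_t)(b - a) > (1U << 15)) ||
           (a > b && (uint16_t)(a - b) < (1U << 15));
}

static bool ssn_ge(uint16_t a, uint16_t b)
{
    return a == b || ssn_gt(a, b);
}
```

With this scheme a TSN of 1 compares as newer than 0xFFFFFFFF, because the sequence space has just wrapped. When two values are exactly half the number space apart, neither compares greater, which matches the case RFC 1982 leaves undefined.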
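
The ARP commit 09d3f8953e lists three sanity checks on incoming packets. A sketch of those checks follows. `struct arp_fields` and `arp_sanity_ok()` are hypothetical: a simplified view of the header rather than FreeBSD's `struct arphdr`. The multicast test assumes an Ethernet-style address, where the low-order bit of the first octet marks a group address. For simplicity the sketch applies all three checks to every packet, while the commit message words two of them in terms of requests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the ARP fields being validated. */
struct arp_fields {
    uint8_t hln;        /* hardware address length from the packet */
    uint8_t pln;        /* protocol address length from the packet */
    const uint8_t *sha; /* sender hardware address (hln bytes) */
};

/* Return true if the packet passes the checks, false if it should be dropped. */
static bool arp_sanity_ok(const struct arp_fields *ah, uint8_t if_addrlen)
{
    /* IPv4 protocol addresses are 4 bytes; reject anything else. */
    if (ah->pln != 4)
        return false;
    /*
     * Hardware length must match the receiving interface's address length.
     * The hln == 0 test is an extra guard (not in the commit message) so
     * that sha[0] is never read from an empty address.
     */
    if (ah->hln != if_addrlen || ah->hln == 0)
        return false;
    /* Group bit set in the first octet: multicast (or broadcast) sender. */
    if (ah->sha[0] & 0x01)
        return false;
    return true;
}
```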


4380 Commits