freebsd-skq

Author	SHA1	Message	Date
Gleb Smirnoff	c11a15bf8d	When processing ACK in tcp_do_segment, use sbcut_locked() instead of sbdrop_locked() to cut acked mbufs from the socket buffer. Free this chain in a batch manner after the socket buffer lock is dropped. This measurably reduces contention on socket buffer. Sponsored by: Netflix Sponsored by: Nginx, Inc. Approved by: re (marius)	2013-10-09 12:00:38 +00:00
Mark Johnston	8298c17c6c	Add a separate translator for headers passed to the TCP probes in the input path. These probes get some of the fields in host order, whereas the output probes get them in network order, so a single translator isn't enough. This workaround ensures that the problem is essentially invisible to users: none of the probe arguments or their fields have changed. Approved by: re (hrs)	2013-10-02 17:14:12 +00:00
Bjoern A. Zeeb	a5f44cd7a1	Introduce spares in the TCP syncache and timewait structures so that fixed TCP_SIGNATURE handling can later be merged. This is derived from follow-up work to SVN r183001 posted to net@ on Sep 13 2008. Approved by: re (gjb)	2013-09-21 10:01:51 +00:00
Mikolaj Golub	4d3dfd450a	Unregister inet/inet6 pfil hooks on vnet destroy. Discussed with: andre Approved by: re (rodrigc)	2013-09-13 18:45:10 +00:00
Michael Tuexen	5dc80df9c5	Fix the aborting of association with the iterator using an empty user initiated error cause (using SCTP_ABORT|SCTP_SENDALL). Approved by: re (delphij) MFC after: 1 week	2013-09-09 21:40:07 +00:00
Mikolaj Golub	1f6addd92c	Release the interface last. Reviewed by: glebius Approved by: re (kib)	2013-09-08 18:19:40 +00:00
Michael Tuexen	d4d23375d3	When computing the partial delivery point, take the receiver socket buffer size correctly into account. MFC after: 1 week	2013-09-07 00:45:24 +00:00
John Baldwin	86d93a15ff	Use LIST_FOREACH_SAFE() instead of doing it by hand.	2013-09-05 14:26:37 +00:00
John Baldwin	fa302f207f	Use an unsigned long when indexing into mfchashtbl[] and mf6ctable[]. This matches the types used when computing hash indices and the type of the maximum size of mfchashtbl[]. PR: kern/181821 Submitted by: Sven-Thorsten Dietrich <sven@vyatta.com> (IPv4) MFC after: 1 week	2013-09-05 14:16:37 +00:00
Andrey V. Elsukov	d983befd2f	Remove unused code and sort variables declarations. PR: kern/181822 MFC after: 1 week	2013-09-05 08:12:36 +00:00
Michael Tuexen	0ddb429900	Remove redundant field pr_sctp_on. MFC after: 1 week	2013-09-03 19:31:59 +00:00
Michael Tuexen	a28c9ff0b7	Use uint16_t instead of in_port_t for consistency with the SCTP code. MFC after: 1 week	2013-09-02 23:27:53 +00:00
Michael Tuexen	e6b2b4b65b	All changes affect only SCTP-AUTH: * Remove non working code related to SHA224. * Remove support for non-standardised HMAC-IDs using SHA384 and SHA512. * Prefer SHA256 over SHA1. * Minor cleanup. MFC after: 2 weeks	2013-09-02 22:48:41 +00:00
Navdeep Parhar	7127e6acf0	Merge r254336 from user/np/cxl_tuning. Add a last-modified timestamp to each LRO entry and provide an interface to flush all inactive entries. Drivers decide when to flush and what the inactivity threshold should be. Network drivers that process an rx queue to completion can enter a livelock type situation when the rate at which packets are received reaches equilibrium with the rate at which the rx thread is processing them. When this happens the final LRO flush (normally when the rx routine is done) does not occur. Pure ACKs and segments with total payload < 64K can get stuck in an LRO entry. Symptoms are that TCP tx-mostly connections' performance falls off a cliff during heavy, unrelated rx on the interface. Flushing only inactive LRO entries works better than any of these alternates that I tried: - don't LRO pure ACKs - flush _all_ LRO entries periodically (every 'x' microseconds or every 'y' descriptors) - stop rx processing in the driver periodically and schedule remaining work for later. Reviewed by: andre	2013-08-28 23:00:34 +00:00
John Baldwin	fd77bbb967	Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery	2013-08-26 18:16:05 +00:00
Mark Johnston	1ad19fb657	The second last argument of udp:::receive is supposed to contain the connection state, not the IP header. X-MFC with: r254889	2013-08-26 00:28:57 +00:00
Mark Johnston	57f6086735	Implement the ip, tcp, and udp DTrace providers. The probe definitions use dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD. Tested by: gnn, hiren MFC after: 1 month	2013-08-25 21:54:41 +00:00
Michael Tuexen	1a94cdbea7	Provide human readable debug output.	2013-08-25 12:44:03 +00:00
Andre Oppermann	9850f95989	For now limit printf(9) %x of the 64bit pkthdr.csum_flags field to 32bits. The upper 32bits are not occupied for now. Sponsored by: The FreeBSD Foundation	2013-08-25 09:49:00 +00:00
Andre Oppermann	1b4381afbb	Restructure the mbuf pkthdr to make it fit for upcoming capabilities and features. The changes in particular are: o Remove rarely used "header" pointer and replace it with a 64bit protocol/ layer specific union PH_loc for local use. Protocols can flexibly overlay their own 8 to 64 bit fields to store information while the packet is worked on. o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc instead of pkthdr.header. o Extend csum_flags to 64bits to allow for additional future offload information to be carried (e.g. iSCSI, IPsec offload, and others). o Move the RSS hash type enumerator from abusing m_flags to its own 8bit rsstype field. Adjust accessor macros. o Add cosqos field to store Class of Service / Quality of Service information with the packet. It is not yet supported in any drivers but allows us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with a modernized ALTQ. o Add four 8 bit fields l[2-5]hlen to store the relative header offsets from the start of the packet. This is important for various offload capabilities and to relieve the drivers from having to parse the packet and protocol headers to find out location of checksums and other information. Header parsing in drivers is a lot of copy-paste and unhandled corner cases which we want to avoid. o Add another flexible 64bit union to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage making it very flexible and adaptable to future capabilities. o Restructure the CSUM flags to better signify their outbound (down the stack) and inbound (up the stack) use. The CSUM flags used to be a bit chaotic and rather poorly documented leading to incorrect use in many places. Bring clarity into their use through better naming. Compatibility mappings are provided to preserve the API. The drivers can be corrected one by one and MFC'd without issue. o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures). Sponsored by: The FreeBSD Foundation	2013-08-24 19:51:18 +00:00
Michael Tuexen	6be15a24c4	Export the inpcb features as a 64-bit entity. Bump __FreeBSD_version to 1000048 since the modified structure is user visible and used by netstat, for example.	2013-08-22 20:29:57 +00:00
Michael Tuexen	06c9f9bddf	Make also the features of the association 64-bit. When exporting to xinpcb, just export the lower 32-bit. Using there also 64-bits will break the ABI and will be committed separately. MFC after: 2 weeks X-MFC with: 254248	2013-08-22 19:28:13 +00:00
Xin LI	acde2476c4	Fix an integer overflow in computing the size of a temporary buffer, which can result in a buffer that is too small for the requested operation. Security: CVE-2013-3077 Security: FreeBSD-SA-13:09.ip_multicast	2013-08-22 00:51:37 +00:00
Andre Oppermann	5fc98a7895	Reorder the mbuf defines to make more sense and group related flags together. Add M_FLAG_PRINTF for use with printf(9) %b identifier. Use the generic mbuf flags print names in the net80211 code and adjust the protocol specific bits for their new positions. Change SCTP M_PROTO mapping from 5 to 1 to fit within the 16bit field they use internally to store some additional information. Discussed with: trociny, glebius	2013-08-19 14:25:11 +00:00
Andre Oppermann	86bd049144	Add m_clrprotoflags() to clear protocol specific mbuf flags at up and downwards layer crossings. Consistently use it within IP, IPv6 and ethernet protocols. Discussed with: trociny, glebius	2013-08-19 13:27:32 +00:00
Andre Oppermann	678d7b9461	Move the SCTP specific definition of M_NOTIFICATION onto a protocol specific mbuf flag from sys/mbuf.h to netinet/sctp_os_bsd.h. It is only relevant within SCTP. Discussed with: tuexen	2013-08-19 12:30:18 +00:00
Andre Oppermann	88388bdcbe	Move the global M_SKIP_FIREWALL mbuf flags to a protocol layer specific flag instead. The flag is only used within the IP and IPv6 layer 3 protocols. Because some firewall packages treat IPv4 and IPv6 packets the same the flag should have the same value for both. Discussed with: trociny, glebius	2013-08-19 11:08:36 +00:00
Andre Oppermann	b09dc7e328	Move ip_reassemble()'s use of the global M_FRAG mbuf flag to a protocol layer specific flag instead. The flag is only relevant while the packet stays in the IP reassembly queue. Discussed with: trociny, glebius	2013-08-19 10:34:10 +00:00
Andre Oppermann	fb86dfcd2f	Remove unused M_FRAG, M_FIRSTFRAG and M_LASTFRAG tagging from ip_fragment(). There wasn't any real driver (and hardware) support for it. Modern hardware does full fragmentation/segmentation offload instead.	2013-08-19 10:30:15 +00:00
Mark Johnston	7b77e1fe0f	Specify SDT probe argument types in the probe definition itself rather than using SDT_PROBE_ARGTYPE(). This will make it easy to extend the SDT(9) API to allow probes with dynamically-translated types. There is no functional change. MFC after: 2 weeks	2013-08-15 04:08:55 +00:00
Michael Tuexen	0e05fbded9	Don't send uninitialized memory (two instances of 4 bytes) in every cookie on the wire. This bug was reported in https://bugzilla.mozilla.org/show_bug.cgi?id=905080 MFC after: 3 days	2013-08-14 21:51:32 +00:00
Mikolaj Golub	c5c392e7ed	Virtualize carp(4) variables to have per vnet control. Reviewed by: ae, glebius	2013-08-13 19:59:49 +00:00
Michael Tuexen	2c9c61defa	Make the features a 64-bit value instead of 32-bit. This will allow an easier integration of the support for NDATA. While there, do also some minor cleanups. Obtained from: rrs@ MFC after: 2 weeks	2013-08-12 13:52:15 +00:00
Michael Tuexen	bfd1666aad	Micro-optimization suggested in https://bugzilla.mozilla.org/show_bug.cgi?id=898234 by pchang9. While there simplify the code. MFC after: 1 week	2013-08-01 12:05:23 +00:00
Andrey V. Elsukov	6794f46021	Remove the large part of struct ipsecstat. Only a few fields of this structure are used, but they already have equal fields in the struct newipsecstat, that was introduced with FAST_IPSEC and then was merged together with old ipsecstat structure. This fixes kernel stack overflow on some architectures after migration of ipsecstat to PCPU counters. Reported by: Taku YAMAMOTO, Maciej Milewski	2013-07-23 14:14:24 +00:00
Michael Tuexen	88a95b1f25	Allow the code to be compiled without warnings for any combination of INET, INET6 and SCTP_DEBUG defines. The issue was reported by Lally Singh. MFC after: 2 weeks	2013-07-20 13:14:59 +00:00
Michael Tuexen	da24cfcb35	Get the code compiling without INET and INET6 being defined. This is not possible in FreeBSD, but in the upstream code. MFC after: 2 weeks	2013-07-19 21:16:59 +00:00
Andre Oppermann	ccd040ab18	Free the non-fatal "timestamp missing" debug string manually as it is not covered by the catch-all free for the error cases. Found by: Coverity	2013-07-16 16:37:08 +00:00
Mikolaj Golub	f122b319eb	A complete duplication of binding should be allowed if on both new and duplicated sockets a multicast address is bound and either SO_REUSEPORT or SO_REUSEADDR is set. But actually it works for the following combinations: * SO_REUSEPORT is set for the first socket and SO_REUSEPORT for the new; * SO_REUSEADDR is set for the first socket and SO_REUSEADDR for the new; * SO_REUSEPORT is set for the first socket and SO_REUSEADDR for the new; and fails for this: * SO_REUSEADDR is set for the first socket and SO_REUSEPORT for the new. Fix the last case. PR: 179901 MFC after: 1 month	2013-07-12 19:08:33 +00:00
Andre Oppermann	10c982958c	Unbreak VIMAGE by correctly naming the vnet pointer in struct tcp_syncache. Reported by: trociny, rodrigc	2013-07-12 07:43:56 +00:00
Andre Oppermann	81d392a09d	Improve SYN cookies by encoding the MSS, WSCALE (window scaling) and SACK information into the ISN (initial sequence number) without the additional use of timestamp bits and switching to the very fast and cryptographically strong SipHash-2-4 MAC hash algorithm to protect the SYN cookie against forgeries. The purpose of SYN cookies is to encode all necessary session state in the 32 bits of our initial sequence number to avoid storing any information locally in memory. This is especially important when under heavy spoofed SYN attacks where we would either run out of memory or the syncache would fill with bogus connection attempts swamping out legitimate connections. The original SYN cookies method only stored indexed MSS values in the cookie. This isn't sufficient anymore and breaks down in the presence of WSCALE information which is only exchanged during SYN and SYN-ACK. If we can't keep track of it then we may severely underestimate the available send or receive window. This is compounded with large windows whose size information on the TCP segment header is even lower numerically. A number of years back SYN cookies were extended to store the additional state in the TCP timestamp fields, if available on a connection. While timestamps are common among the BSD, Linux and other *nix systems Windows never enabled them by default and thus are not present for the vast majority of clients seen on the Internet. The common parameters used on TCP sessions have changed quite a bit since SYN cookies were invented some 17 years ago. Today we have a lot more bandwidth available making the use of window scaling almost mandatory. Also SACK has become standard making recovering from packet loss much more efficient. This change moves all necessary information into the ISS removing the need for timestamps. Both the MSS (16 bits) and send WSCALE (4 bits) are stored in 3 bit indexed form together with a single bit for SACK. While this is significantly less than the original range, it is sufficient to encode all common values with minimal rounding. The MSS depends on the MTU of the path and with the dominance of ethernet the main value seen is around 1460 bytes. Encapsulations for DSL lines and some other overheads reduce it by a few more bytes for many connections seen. Rounding down to the next lower value in some cases isn't a problem as we send only slightly more packets for the same amount of data. The send WSCALE index is a bit more tricky as rounding down under-estimates the send space available towards the remote host, however a small number of values dominate and are carefully selected again. The receive WSCALE isn't encoded at all but recalculated based on the local receive socket buffer size when a valid SYN cookie returns. A listen socket buffer size is unlikely to change while active. The index values for MSS and WSCALE are selected for minimal rounding errors based on large traffic surveys. These values have to be periodically validated against newer traffic surveys adjusting the arrays tcp_sc_msstab[] and tcp_sc_wstab[] if necessary. In addition the hash MAC to protect the SYN cookies is changed from MD5 to SipHash-2-4, a much faster and cryptographically secure algorithm. Reviewed by: dwmalone Tested by: Fabian Keil <fk@fabiankeil.de>	2013-07-11 15:29:25 +00:00
Andre Oppermann	07dacf031e	Extend debug logging of TCP timestamp related specification violations. Update related comments and style.	2013-07-10 12:06:01 +00:00
Michael Tuexen	e5aeb83c42	Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics accounting. X-MFC with: r252026	2013-07-09 14:38:26 +00:00
Andrey V. Elsukov	69edf037d7	Migrate struct carpstats to PCPU counters.	2013-07-09 10:02:51 +00:00
Andrey V. Elsukov	2841260cd6	Migrate structs in6_ifstat and icmp6_ifstat to PCPU counters.	2013-07-09 09:59:46 +00:00
Andrey V. Elsukov	a786f67981	Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters.	2013-07-09 09:54:54 +00:00
Andrey V. Elsukov	5b7cb97c2b	Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU counters.	2013-07-09 09:50:15 +00:00
Andrey V. Elsukov	5da0521fce	Use new macros to implement ipstat and tcpstat using PCPU counters. Change interface of kread_counters() similar to kread() in the netstat(1).	2013-07-09 09:43:03 +00:00
Andrey V. Elsukov	c80211e3cf	Prepare network statistics structures for migration to PCPU counters. Use uint64_t as type for all fields of structures. Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat, in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat, pfkeystat, pim6stat, pimstat, rip6stat, udpstat. Discussed with: arch@	2013-07-09 09:32:06 +00:00
Michael Tuexen	ee1ccd9258	Fix a bug where only 2048 streams were usable even though more than 2048 streams were negotiated on the wire. While there, remove the hard coded limit of 2048 streams. MFC after: 3 days	2013-07-05 10:08:49 +00:00
Michael Tuexen	5db47b3def	When processing an incoming ABORT, SHUTDOWN_COMPLETE or ERROR (NAT related) chunk, take always the T-bit into account, when checking the verification tag. MFC after: 3 days	2013-07-04 19:47:46 +00:00
Mikolaj Golub	efdf104bca	In r227207, to fix the issue with possible NULL inp_socket pointer dereferencing, when checking for SO_REUSEPORT option (and SO_REUSEADDR for multicast), INP_REUSEPORT flag was introduced to cache the socket option. It was decided then that one flag would be enough to cache both SO_REUSEPORT and SO_REUSEADDR: when processing SO_REUSEADDR setsockopt(2), it was checked if it was called for a multicast address and INP_REUSEPORT was set accordingly. Unfortunately that approach does not work when setsockopt(2) is called before binding to a multicast address: the multicast check fails and INP_REUSEPORT is not set. Fix this by adding INP_REUSEADDR flag to unconditionally cache SO_REUSEADDR. PR: 179901 Submitted by: Michael Gmelin freebsd grem.de (initial version) Reviewed by: rwatson MFC after: 1 week	2013-07-04 18:38:00 +00:00
Michael Tuexen	56f778aadf	Code cleanups. MFC after: 3 days	2013-07-03 18:48:43 +00:00
Navdeep Parhar	e364d8c44a	Catch up with r238990. LLE_DELETED does not clobber everything else in la_flags since said revision.	2013-07-03 17:27:32 +00:00
Hiroki Sato	e32d93954d	Fix a panic when leaving MC group in a kernel with VIMAGE enabled. in_leavegroup() is called from an asynchronous task, and igmp_change_state() requires that curvnet is set by the caller.	2013-07-02 16:39:12 +00:00
Lawrence Stewart	92a0637f73	Import an implementation of the CAIA Delay-Gradient (CDG) congestion control algorithm, which is based on the 2011 v0.1 patch release and described in the paper "Revisiting TCP Congestion Control using Delay Gradients" by David Hayes and Grenville Armitage. It is implemented as a kernel module compatible with the modular congestion control framework. CDG is a hybrid congestion control algorithm which reacts to both packet loss and inferred queuing delay. It attempts to operate as a delay-based algorithm where possible, but utilises heuristics to detect loss-based TCP cross traffic and will compete effectively as required. CDG is therefore incrementally deployable and suitable for use on shared networks. In collaboration with: David Hayes <david.hayes at ieee.org> and Grenville Armitage <garmitage at swin edu au> MFC after: 4 days Sponsored by: Cisco University Research Program and FreeBSD Foundation	2013-07-02 08:44:56 +00:00
Gleb Smirnoff	42a253e6a1	Fix kmod_*stat_inc() after r249276. The incorrect code actually increased the pointer, not the memory it points to. In collaboration with: kib Reported & tested by: Ian FREISLICH <ianf clue.co.za> Sponsored by: Nginx, Inc.	2013-06-21 06:36:26 +00:00
Andrey V. Elsukov	6659296cb0	Use IPSECSTAT_INC() and IPSEC6STAT_INC() macros for ipsec statistics accounting. MFC after: 2 weeks	2013-06-20 09:55:53 +00:00
Bruce M Simpson	c91950082d	Disable IGMPv3 link timers on a transition to IGMPv2. Submitted by: Alan Smithee	2013-06-07 17:12:08 +00:00
Andre Oppermann	3c914c547e	Allow drivers to specify a maximum TSO length in bytes if they are limited in the amount of data they can handle at once. Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to change the limit. The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything less wouldn't be very useful anymore. The upper limit is still at IP_MAXPACKET (65536 bytes). Raising it requires further auditing of the IPv4/v6 code paths as the length field in the IP header would overflow leading to confusion in firewalls and other packet handlers on the real size of the packet. The placement into "struct ifnet" is a bit hackish but the best place that was found. When the stack/driver boundary is updated it should be handled in a better way. Submitted by: cperciva (earlier version) Reviewed by: cperciva Tested by: cperciva MFC after: 1 week (using spare struct members to preserve ABI)	2013-06-03 12:55:13 +00:00
Michael Tuexen	fe1831e06f	Use LIST_EMPTY when appropriate. MFC after: 1 week	2013-06-02 10:35:08 +00:00
Michael Tuexen	fb4a67d207	Remove redundant checks. MFC after: 2 weeks	2013-05-28 09:25:58 +00:00
Michael Tuexen	3f61f926ea	Withdraw http://svnweb.freebsd.org/changeset/base/250809 since the real fix is in http://svnweb.freebsd.org/changeset/base/250952.	2013-05-24 09:21:18 +00:00
Michael Tuexen	e3581df21e	Initialize the fibnum for outgoing packets to 0. This avoids crashing due to the usage of uninitialized fibnum. This bug became visible after http://svnweb.freebsd.org/changeset/base/250700 MFC after: 2 weeks	2013-05-19 16:06:43 +00:00
Michael Tuexen	553bb0688c	Set errno to ETIMEDOUT if an SCTP association times out during setup. MFC after: 1 week	2013-05-17 22:26:05 +00:00
Michael Tuexen	b05fbf171e	Don't send an ABORT chunk with verification 0. MFC after: 1 week	2013-05-17 21:45:52 +00:00
Jim Harris	d13fc9954b	Fix typo in net.inet.tcp.minmss sysctl description. MFC after: 3 days	2013-05-13 19:55:27 +00:00
Hiroki Sato	b8992a6792	Add IFF_MONITOR support to gre(4). Tested by: Chip Marshall MFC after: 1 week	2013-05-11 19:05:38 +00:00
Gleb Smirnoff	5d81d09598	Rate limit the number of remotely triggered ARP log messages to 1 log message per second.	2013-05-11 10:51:32 +00:00
Michael Tuexen	3457ccdaea	Honor the net.inet6.ip6.v6only sysctl variable and the IPV6_V6ONLY socket option for SCTP sockets in the same way as for UDP or TCP sockets. MFC after: 2 weeks	2013-05-10 18:09:38 +00:00
Andre Oppermann	f89d4c3acf	Back out r249318, r249320 and r249327 due to a heisenbug most likely related to a race condition in the ipi_hash_lock with the exact cause currently unknown but under investigation.	2013-05-06 16:42:18 +00:00
Hiroki Sato	5df1b6b57e	Use FF02:0:0:0:0:2:FF00::/104 prefix for IPv6 Node Information Group Address. Although KAME implementation used FF02:0:0:0:0:2::/96 based on older versions of draft-ietf-ipngwg-icmp-name-lookup, it has been changed in RFC 4620. The kernel always joins the /104-prefixed address, and additionally does /96-prefixed one only when net.inet6.icmp6.nodeinfo_oldmcprefix=1. The default value of the sysctl is 1. ping6(8) -N flag now uses /104-prefixed one. When this flag is specified twice, it uses /96-prefixed one instead. Reviewed by: ume Based on work by: Thomas Scheffler PR: conf/174957 MFC after: 2 weeks	2013-05-04 19:16:26 +00:00
Colin Percival	76089c9511	Move IPPROTO_IPV6 from #ifdef __BSD_VISIBLE to #if __POSIX_VISIBLE >= 201112 since POSIX 2001 states that it shall be defined. Reported by: sbruno Reviewed by: jilles MFC after: 1 week	2013-04-27 23:36:01 +00:00
Gleb Smirnoff	47e8d432d5	Add const qualifier to the dst parameter of the ifnet if_output method.	2013-04-26 12:50:32 +00:00
Gleb Smirnoff	414676ba31	Fix couple of mbuf leaks in incoming ARP processing.	2013-04-25 17:38:04 +00:00
Gleb Smirnoff	4c7a605968	Introduce a pointer to const variable gw, which points either at the same place as dst, or to the sockaddr in the routing table. The const constraint of gw makes us safe from modifying routing table accidentally. And "constantness" of dst allows us to remove several bandaids, when we switched it back at &ro->ro_dst, now it always points there. Reviewed by: rrs	2013-04-25 12:42:09 +00:00
Randall Stewart	0be23a54cf	This fixes the issue with the "randomly changing" default route. What it was is there are two places in ip_output.c where we do a goto again. One place was fine, it copies out the new address and then resets dst = ro->rt_dst; But the other place does not do that, which means earlier when we found the gateway, we have dst pointing there aka dst = ro->rt_gateway is done.. then we do a goto again.. bam now we clobber the default route. The fix is just to move the again so we are always doing dst = &ro->rt_dst; in the again loop. PR: 174749,157796 MFC after: 1 week	2013-04-24 18:30:32 +00:00
Andre Oppermann	5628dd0893	When doing RFC3042 limited transmit on the first or second duplicate ACK make sure we actually have new data to send. This prevents us from sending unnecessary pure ACKs. Reported by: Matt Miller <matt@matthewjmiller.net> Tested by: Matt Miller <matt@matthewjmiller.net> MFC after: 2 weeks	2013-04-23 14:06:32 +00:00
Oleg Bulyzhin	1571132f14	Plug static llentry leak (ipv4 & ipv6 were affected). PR: kern/172985 MFC after: 1 month	2013-04-21 21:28:38 +00:00
Gabor Kovesdan	8fb3bbe770	- Correct misspellings of the word useful Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)	2013-04-17 11:45:15 +00:00
Xin LI	f2297451fe	Fix incomplete printf. PR: kern/177889 Submitted by: Sven-Thorsten Dietrich <sven vyatta com> MFC after: 1 week	2013-04-16 19:32:12 +00:00
Xin LI	c1031303f0	Don't leak lock when returning. PR: kern/177888 Submitted by: Sven-Thorsten Dietrich <sven vyatta com> MFC after: 1 week	2013-04-16 19:25:41 +00:00
Andrey V. Elsukov	e3389419ef	Reflect removing of the counter_u64_subtract() function in the macro.	2013-04-12 16:29:15 +00:00
Gleb Smirnoff	0e2bc05c47	Fix tcp_output() so that tcpcb is updated in the same manner when an mbuf allocation fails, as in a case when ip_output() returns error. To achieve that, move large block of code that updates tcpcb below the out: label. This fixes a panic, that requires the following sequence to happen: 1) The SYN was sent to the network, tp->snd_nxt = iss + 1, tp->snd_una = iss 2) The retransmit timeout happened for the SYN we had sent, tcp_timer_rexmt() sets tp->snd_nxt = tp->snd_una, and calls tcp_output(). In tcp_output m_get() fails. 3) Later on the SYN|ACK for the SYN sent in step 1) came, tcp_input sets tp->snd_una += 1, which leads to tp->snd_una > tp->snd_nxt inconsistency, that later panics in socket buffer code. For reference, this bug was fixed in the DragonFly BSD repo: http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/1ff9b7d322dc5a26f7173aa8c38ecb79da80e419 Reviewed by: andre Tested by: pho Sponsored by: Nginx, Inc. PR: kern/177456 Submitted by: HouYeFei&XiBoLiu <lglion718 163.com>	2013-04-11 18:23:56 +00:00
Gleb Smirnoff	18ba072a22	Fix build.	2013-04-10 08:09:25 +00:00
Andre Oppermann	e8b3186b6a	Change certain heavily used network related mutexes and rwlocks to reside on their own cache line to prevent false sharing with other nearby structures, especially for those in the .bss segment. NB: Those mutexes and rwlocks with variables next to them that get changed on every invocation do not benefit from their own cache line. Actually it may be net negative because two cache misses would be incurred in those cases.	2013-04-09 21:02:20 +00:00
Andre Oppermann	982c1675ff	Fix a race condition on tcp listen socket teardown with pending connections in the accept queue and contiguous new incoming SYNs. Compared to the original submitter's patch I've moved the test next to the SYN handling to have it together in a logical unit and reworded the comment explaining the issue. Submitted by: Matt Miller <matt@matthewjmiller.net> Submitted by: Juan Mojica <jmojica@gmail.com> Reviewed by: Matt Miller (changes) Tested by: pho MFC after: 1 week	2013-04-09 20:52:26 +00:00
Gleb Smirnoff	4a21e86ec1	Fix VIMAGE build.	2013-04-09 09:15:26 +00:00
Andrey V. Elsukov	9cb8d207af	Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats. MFC after: 1 week	2013-04-09 07:11:22 +00:00
Gleb Smirnoff	5923c29332	Merge from projects/counters: TCP/IP stats. Convert 'struct ipstat' and 'struct tcpstat' to counter(9). This speeds up IP forwarding at extreme packet rates, and makes accounting more precise. Sponsored by: Nginx, Inc.	2013-04-08 19:57:21 +00:00
Michael Tuexen	ebae998767	Add a macro for checking for IPv4 link local addresses. MFC after: 1 week	2013-03-31 18:27:46 +00:00
Ed Maste	ce7ad6640c	Keep fwd_tag around for subsequent pcb lookups. For TIMEWAIT handling tcp_input may have to jump back for an additional pass through pcblookup. Prior to this change the fwd_tag had been discarded after the first lookup, so a new connection attempt delivered locally via 'ipfw fwd' would fail to find a match. As of r248886 the tag will be detached and freed when passed to the socket buffer.	2013-03-29 20:51:44 +00:00
Alexander V. Chernikov	ae01d73c04	Add ipfw support for setting/matching DiffServ codepoints (DSCP). Setting DSCP support is done via O_SETDSCP which works for both IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4. Dscp can be specified by name (AFXY, CSX, BE, EF), by value (0..63) or via tablearg. Matching DSCP is done via another opcode (O_DSCP) which accepts several classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words). Many people made their variants of this patch, the ones I'm aware of are (in alphabetic order): Dmitrii Tejblum Marcelo Araujo Roman Bogorodskiy (novel) Sergey Matveichuk (sem) Sergey Ryabin PR: kern/102471, kern/121122 MFC after: 2 weeks	2013-03-20 10:35:33 +00:00
Gleb Smirnoff	7525c48111	In m_megapullup() instead of reserving some space at the end of packet, m_align() it, reserving space to prepend data. Reviewed by: mav	2013-03-17 07:37:10 +00:00
Gleb Smirnoff	aa8bd99d99	- Replace compat macros with function calls.	2013-03-16 08:58:28 +00:00
Gleb Smirnoff	3c26f4a9bc	We can, and should use M_WAITOK here. Sponsored by: Nginx, Inc.	2013-03-15 13:10:06 +00:00
Gleb Smirnoff	dc4ad05ecd	Use m_get/m_gethdr instead of compat macros. Sponsored by: Nginx, Inc.	2013-03-15 12:55:30 +00:00
Gleb Smirnoff	39f6074e2e	- Use m_getcl() instead of hand allocating. Sponsored by: Nginx, Inc.	2013-03-15 12:53:53 +00:00
Gleb Smirnoff	41a7572b26	Functions m_getm2() and m_get2() have different order of arguments, and that can drive someone crazy. While m_get2() is young and not documented yet, change its order of arguments to match m_getm2(). Sorry for churn, but better now than later.	2013-03-12 13:42:47 +00:00
Gleb Smirnoff	f4562a299c	Remove LIBALIAS_LOCK_ASSERT(), including a couple with an uninitialized argument, in code that isn't compiled in kernel. PR: kern/176667 Sponsored by: Nginx, Inc.	2013-03-11 12:22:44 +00:00
Lawrence Stewart	1e0e83d760	The hashmask returned by hashinit() is a valid index in the returned hash array. Fix a siftr(4) potential memory leak and INVARIANTS triggered kernel panic in hashdestroy() by ensuring the last array index in the flow counter hash table is flushed of entries. MFC after: 3 days	2013-03-07 04:42:20 +00:00
Davide Italiano	5b999a6be0	- Make callout(9) tickless, relying on eventtimers(4) as backend for precise time event generation. This greatly improves granularity of callouts which are no longer constrained to wait for the next tick to be scheduled. - Extend the callout KPI introducing a set of callout_reset_sbt* functions, which take a sbintime_t as timeout argument. The new KPI also offers a way for consumers to specify precision tolerance they allow, so that callout can coalesce events and reduce number of interrupts as well as potentially avoid scheduling a SWI thread. - Introduce support for dispatching callouts directly from hardware interrupt context, specifying an additional flag. This feature should be used carefully, as long as interrupt context has some limitations (e.g. no sleeping locks can be held). - Enhance mechanisms to gather information about callwheel, introducing a new sysctl to obtain stats. This change breaks the KBI. struct callout fields have been changed, in particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t' (8 bytes) and another 'sbintime_t' field was added for precision. Together with: mav Reviewed by: attilio, bde, luigi, phk Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm), markj (amd64), mav, Fabian Keil	2013-03-04 11:09:56 +00:00
Michael Tuexen	e045904fdc	Fix a potential race in returning setting errno when an association goes down. Reported by Mozilla in https://bugzilla.mozilla.org/show_bug.cgi?id=845513 MFC after: 3 days	2013-02-27 19:51:47 +00:00
Andrew Gallatin	e5ca1ffab5	Fix tcp_lro_rx_ipv4() for drivers that do not set CSUM_IP_CHECKED. Specifically, in_cksum_hdr() returns 0 (not 0xffff) when the IPv4 checksum is correct. Without this fix, the tcp_lro code will reject good IPv4 traffic from drivers that do not implement IPv4 header hardware csum offload. Sponsored by: Myricom Inc. MFC after: 7 days	2013-02-21 17:00:35 +00:00
Sergey Kandaurov	46f2df9c13	ip_savecontrol() style fixes. No functional changes. - fix indentation - put the operator at the end of the line for long statements - remove spaces between the type and the variable in a cast - remove excessive parentheses Tested by: md5	2013-02-20 15:44:40 +00:00
Michael Tuexen	2416af26a0	Send the adaptation layer indication only if set by the user. MFC after: 3 days Discussed with: rrs	2013-02-11 21:02:49 +00:00
Michael Tuexen	c53f854a17	Don't send kernel provided information in the User Initiated ABORT cause, since the user can also provide this kind of information. So the receiver doesn't know who provided the information. While there: Fix a bug where the stack would send a malformed ABORT chunk when using a send() call with SCTP_ABORT\|SCTP_SENDALL flags. MFC after: 3 days	2013-02-11 13:57:03 +00:00
Gleb Smirnoff	24421c1c32	Resolve source address selection in presence of CARP. Add a couple of helper functions: - carp_master() - boolean function which is true if an address is in the MASTER state. - ifa_preferred() - boolean function that compares two addresses, and is aware of CARP. Utilize ifa_preferred() in ifa_ifwithnet(). The previous version of patch also changed source address selection logic in jails using carp_master(), but we failed to negotiate this part with Bjoern. Maybe we will approach this problem again later. Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc	2013-02-11 10:58:22 +00:00
Michael Tuexen	f0d44a49a0	Make sure that received packets for removed addresses are handled consistently. While there, make variable names consistent. MFC after: 3 days	2013-02-10 19:57:19 +00:00
Michael Tuexen	a1cb341b5d	Cleanup the handling of address scopes. Announce in the INIT/INIT-ACK only the supported address types. While there, do some whitespace cleanups. MFC after: 1 week	2013-02-09 17:26:14 +00:00
Michael Tuexen	c39cfa1f7e	Fix a bug where HEARTBEATs were still sent in SHUTDOWN_SENT or SHUTDOWN_ACK_SENT state. While there, make the corresponding code consistent. MFC after: 1 week	2013-02-09 08:27:08 +00:00
John Baldwin	0d25fab44d	Add placeholder constants to reserve a portion of the socket option name space for use by downstream vendors to add custom options. MFC after: 2 weeks	2013-02-01 15:32:20 +00:00
Andre Oppermann	cda3447bb0	uma_zone_set_max() directly returns the rounded effective zone limit. Use the return value directly instead of doing a second uma_zone_set_max() step. MFC after: 1 week	2013-02-01 14:21:09 +00:00
Gleb Smirnoff	498944374f	- Move AUTHORS and ACKNOWLEDGEMENTS to the end of the page. - Add myself to list of authors.	2013-01-31 10:29:22 +00:00
Gleb Smirnoff	9711a168b9	Retire struct sockaddr_inarp. Since ARP and routing are separated, "proxy only" entries don't have any meaning, thus we don't need additional field in sockaddr to pass SIN_PROXY flag. New kernel is binary compatible with old tools, since sizes of sockaddr_inarp and sockaddr_in match, and sa_family are filled with same value. The structure declaration is left for compatibility with third party software, but in tree code no longer use it. Reviewed by: ru, andre, net@	2013-01-31 08:55:21 +00:00
Gleb Smirnoff	ea26ed7eea	Utilize m_get2() to get mbuf of appropriate size.	2013-01-30 18:40:19 +00:00
Navdeep Parhar	adfaf8f6ad	Add checks for SO_NO_OFFLOAD in a couple of places that I missed earlier in r245915.	2013-01-26 01:41:42 +00:00
Navdeep Parhar	20be068c8a	Teach toe_l2_resolve to resolve IPv6 destinations too. Reviewed by: bz@	2013-01-26 00:57:29 +00:00
Navdeep Parhar	4364ec0852	Move lle_event to if_llatbl.h lle_event replaced arp_update_event after the ARP rewrite and ended up in if_ether.h simply because arp_update_event used to be there too. IPv6 neighbor discovery is going to grow lle_event support and this is a good time to move it to if_llatbl.h. The two in-tree consumers of this event - OFED and toecore - are not affected. Reviewed by: bz@	2013-01-25 23:58:21 +00:00
Navdeep Parhar	460cf046c2	There is no need to call into the TOE driver twice in pru_rcvd (tod_rcvd and then tod_output right after that). Reviewed by: bz@	2013-01-25 22:50:52 +00:00
Navdeep Parhar	464dfeb43f	Add TCP_OFFLOAD hook in syncache_respond for IPv6 too, just like the one that exists for IPv4. Reviewed by: bz@	2013-01-25 22:16:35 +00:00
Navdeep Parhar	b218348bc3	Teach toe_4tuple_check() to deal with IPv6 4-tuples too. Reviewed by: bz@	2013-01-25 20:45:24 +00:00
Navdeep Parhar	37cc0ecb1b	Heed SO_NO_OFFLOAD. MFC after: 1 week	2013-01-25 20:23:33 +00:00
Navdeep Parhar	5cd3dcaa25	Remove redundant test, we know inp_lport is 0. MFC after: 1 week	2013-01-25 20:14:27 +00:00
John Baldwin	1d77fa5a26	Use decimal values for UDP and TCP socket options rather than hex to avoid implying that these constants should be treated as bit masks. Reviewed by: net MFC after: 1 week	2013-01-22 19:45:04 +00:00
Lawrence Stewart	5b648e797b	Simplify and fix a bug in cc_ack_received()'s "are we congestion window limited" logic (refer to [1] for associated discussion). snd_cwnd and snd_wnd are unsigned long and on 64 bit hosts, min() will truncate them to 32 bits and could therefore potentially corrupt the result (although under normal operation, neither variable should legitimately exceed 32 bits). [1] http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034297.html Submitted by: jhb MFC after: 1 week	2013-01-22 09:44:21 +00:00
John Baldwin	6c0ef8957f	Don't drop options from the third retransmitted SYN by default. If the SYNs (or SYN/ACK replies) are dropped due to network congestion, then the remote end of the connection may act as if options such as window scaling are enabled but the local end will think they are not. This can result in very slow data transfers in the case of window scaling disagreements. The old behavior can be obtained by setting the net.inet.tcp.rexmit_drop_options sysctl to a non-zero value. Reviewed by: net@ MFC after: 2 weeks	2013-01-09 20:27:06 +00:00
Peter Wemm	8a1163e82f	Temporarily revert rev 244678. This is causing loopback problems with the lo (loopback) interfaces.	2013-01-03 10:21:28 +00:00
Michael Tuexen	11e03b3200	Some cleanups. MFC after: 3 days	2012-12-27 08:10:58 +00:00
Michael Tuexen	72c123a8b4	Minor cleanups of debug messages. MFC after: 3 days	2012-12-27 08:06:58 +00:00
Michael Tuexen	2c2e3218cb	Fix a copy and paste error. MFC after: 3 days	2012-12-27 08:02:58 +00:00
Gleb Smirnoff	c4d0697685	Garbage collect carp_cksum().	2012-12-25 14:29:38 +00:00
Gleb Smirnoff	7951008b47	Change net.inet.carp.demotion sysctl to add the supplied value to the current demotion factor instead of assigning it. This allows external scripts to control demotion factor together with kernel in a raceless manner.	2012-12-25 14:08:13 +00:00
Gleb Smirnoff	e8db9937f3	Fix sysctl_handle_int() usage. Either arg1 or arg2 should be supplied, and arg2 doesn't pass size of arg1.	2012-12-25 13:55:21 +00:00
Gleb Smirnoff	468e45f3bd	The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify all interested parties in case if interface flag IFF_UP has changed. However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR and SIOCAIFADDR_IN6 can, too. The actual \|= is done not in the protocol code, but in code of interface drivers. To fix this historical layering violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the IFF_UP flag, and if it did, run the if_up() handler. This fixes configuring an address under CARP control on an interface that was initially !IFF_UP. P.S. I intentionally omitted handling the IFF_SMART flag. This flag was never ever used in any driver since it was introduced, and since it means another layering violation, it should be garbage collected instead of pretended to be supported.	2012-12-25 13:01:58 +00:00
Gleb Smirnoff	3e6c8b5366	Minor style(9) changes: - Remove declaration in initializer. - Add empty line between logical blocks.	2012-12-24 21:35:48 +00:00
Gleb Smirnoff	b8056fae06	Fix !INET6 build after r244365.	2012-12-18 08:14:16 +00:00
Gleb Smirnoff	dd029d52fa	Clear correct flag in INET6 case.	2012-12-18 08:09:44 +00:00
Andrey V. Elsukov	f491274582	Since we use different flags to detect tcp forwarding, and we share the same code for IPv4 and IPv6 in tcp_input, we should check both M_IP_NEXTHOP and M_IP6_NEXTHOP flags. MFC after: 3 days	2012-12-17 20:55:33 +00:00
Gleb Smirnoff	b1ec2940af	Fix problem in r238990. The LLE_LINKED flag should be tested prior to entering llentry_free(), and in case if we lose the race, we should simply perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread performing arptimer(), it will remove two references from the lle instead of one. Reported by: Ian FREISLICH <ianf clue.co.za>	2012-12-13 11:11:15 +00:00
Gleb Smirnoff	78a7880f64	Fix a crash in tcp_input(), that happens when mbuf has a fwd_tag on it, but later after processing and freeing the tag, we need to jump back again to the findpcb label. Since the fwd_tag pointer wasn't NULL we tried to process and free the tag for second time. Reported & tested by: Pawel Tyll <ptyll nitronet.pl> MFC after: 3 days	2012-12-12 17:41:21 +00:00
Michael Tuexen	cca6f4a8f3	Get it compiling without INET and INET6 support (mainly userland stack). MFC after: 2 weeks	2012-12-08 15:11:09 +00:00
Pawel Jakub Dawidek	6acd596efb	More warnings for zones that depend on the kern.ipc.maxsockets limit. Obtained from: WHEEL Systems	2012-12-08 12:51:06 +00:00
Michael Tuexen	b11f07d86c	Use correct padding of the ABORT chunk in case of an user initiated abort cause is used. MFC after: 2 weeks	2012-12-08 09:50:38 +00:00
Michael Tuexen	3fb7827628	Ensure that the padding of the last parameter of an INIT chunk is not included in the chunk length as required by RFC 4960. While there, cleanup sctp_send_initiate(). MFC after: 2 weeks	2012-12-08 08:22:33 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Andre Oppermann	da2299c5c7	Remove unused and unnecessary CSUM_IP_FRAGS checksumming capability. Checksumming the IP header of fragments is no different from doing normal IP headers. Discussed with: yongari MFC after: 1 week	2012-11-27 19:31:49 +00:00
Andre Oppermann	13feab8286	Add DELACK to list of timers. MFC after: 1 week	2012-11-27 19:07:28 +00:00
Navdeep Parhar	825fd1e437	Make sure that tcp_timer_activate() correctly sees TCP_OFFLOAD (or not).	2012-11-27 06:42:44 +00:00
Alfred Perlstein	08373e0bc4	Auto size the tcbhashsize structure based on max sockets. While here, also make the code that enforces power-of-two more forgiving, instead of just resetting to 512, graciously round-down to the next lower power of two.	2012-11-27 03:04:24 +00:00
Michael Tuexen	a50f0e3152	Add support for sctp_peeloff() also in the front states of the association. MFC after: 3 days	2012-11-26 16:44:03 +00:00
Michael Tuexen	e3976bb8d7	Find the endpoint for an incoming packet also if the endpoint comes from sctp_peeloff(). MFC after: 3 days	2012-11-26 16:43:32 +00:00
Michael Tuexen	440da2d35b	Allow shutdown() to be used on fds returned from sctp_peeloff(). MFC after: 3 days	2012-11-26 08:50:00 +00:00
Michael Tuexen	a3158782c2	Remove unused function. MFC after: 1 week	2012-11-25 14:25:08 +00:00
Michael Tuexen	3a51a2647a	Add support for SCTP/UDP/IPV6. This completes the support of http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-udp-encaps MFC after: 1 week	2012-11-17 20:04:04 +00:00
Michael Tuexen	325c8c46b1	Get the accounting working. We now have counters how many chunks for each SCTP outgoing stream are in the send and sent queue. While there, improve the naming of NR-SACK related constants recently introduced. MFC after: 1 week	2012-11-16 19:39:10 +00:00
Roman Divacky	8252626fb4	Initialize hdrlen to 0 to avoid clang warning in NOINET case.	2012-11-10 10:41:00 +00:00
Bjoern A. Zeeb	ec89d0398b	Cleanup some whitespace in this file to get it out of an upcoming patch. MFC after: 10 days	2012-11-08 03:29:55 +00:00
Michael Tuexen	a7ad6026e0	Add per outgoing stream accounting for chunks in the send and sent queue. This provides no functional change, but is a preparation for an upcoming stream reset improvement. Done with rrs@. MFC after: 1 week	2012-11-07 22:11:38 +00:00
Michael Tuexen	2a4985847a	Add some missing changes missed in the last commit. MFC after: 1 week X-MFC with: 242708	2012-11-07 21:25:32 +00:00
Michael Tuexen	98f2956c11	Improve PR-SCTP if used in combination with NR-SACK. Based on work done by Mohammad Rajiullah. MFC after: 1 week	2012-11-07 20:59:00 +00:00
Kevin Lo	0f5e7edc14	Fix typo; s/ouput/output	2012-11-07 07:00:59 +00:00
Mateusz Guzik	8e1e6e5f4a	Fix possible spurious sbunlock in sctp_sorecvmsg. Reviewed by: tuexen Approved by: trasz (mentor) MFC after: 3 days	2012-11-06 23:04:23 +00:00
Michael Tuexen	f3b05218ea	Move from early SSN assignment to late SSN assignment. This doesn't change functionality, but makes upcoming change much easier. Developed with rrs@ at the IETF 85. MFC after: 1 week	2012-11-05 20:55:17 +00:00
Andre Oppermann	60ee3bb213	Back out r242262. The simplified window change/update logic wasn't complete and ready for production use. PR: kern/173309	2012-11-05 09:13:06 +00:00
Andrey V. Elsukov	ffdbf9da3b	Remove the recently added sysctl variable net.pfil.forward. Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set. Suggested by: andre	2012-11-02 01:20:55 +00:00
Michael Tuexen	21f67da7c4	Whitespace changes due to upstream integration of SCTP changes in the FreeBSD code base.	2012-10-29 20:47:32 +00:00
Michael Tuexen	24d4ce2c87	Add braces (as used elsewhere in the SCTP code).	2012-10-29 20:44:29 +00:00
Michael Tuexen	09c1c8563a	Use ntohs() and htons() in correct order. However, this doesn't change functionality.	2012-10-29 20:42:48 +00:00
Andre Oppermann	78f59b4bfd	Forced commit to provide the correct commit message to r242251: Defer sending an independent window update if a delayed ACK is pending saving a packet. The window update then gets piggy-backed on the next already scheduled ACK. Added grammar fixes as well. MFC after: 2 weeks	2012-10-29 13:16:33 +00:00
Andre Oppermann	8d045dbdf3	Define the delayed ACK timeout value directly as hz/10 instead of obfuscating it by going through PR_FASTHZ. No functional change. MFC after: 2 weeks	2012-10-29 12:17:02 +00:00
Andre Oppermann	322181c98e	If the user has closed the socket then drop a persisting connection after a much reduced timeout. Typically web servers close their sockets quickly under the assumption that the TCP connection goes away as well. That is not entirely true however. If the peer closed the window we're going to wait for a long time with lots of data in the send buffer. MFC after: 2 weeks	2012-10-28 19:58:20 +00:00
Andre Oppermann	09440655fe	Increase the initial CWND to 10 segments as defined in IETF TCPM draft-ietf-tcpm-initcwnd-05. It explains why the increased initial window improves the overall performance of many web services without risking congestion collapse. As long as it remains a draft it is placed under a sysctl marking it as experimental: net.inet.tcp.experimental.initcwnd10 = 1 When it becomes an official RFC soon the sysctl will be changed to the RFC number and moved to net.inet.tcp. This implementation differs from the RFC draft in that it is a bit more conservative in the case of packet loss on SYN or SYN\|ACK because we haven't reduced the default RTO to 1 second yet. Also the restart window isn't yet increased as allowed. Both will be adjusted with upcoming changes. It is enabled by default. In Linux it is enabled since kernel 3.0. MFC after: 2 weeks	2012-10-28 19:47:46 +00:00
Andre Oppermann	77339e1cdc	Update comment to reflect the change made in r242263. MFC after: 2 weeks	2012-10-28 19:22:18 +00:00
Andre Oppermann	c4ab59c1a1	Add SACK_PERMIT to the list of TCP options that are switched off after retransmitting a SYN three times. MFC after: 2 weeks	2012-10-28 19:20:23 +00:00
Andre Oppermann	79ce26a08c	Simplify and enhance the window change/update acceptance logic, especially in the presence of bi-directional data transfers. snd_wl1 tracks the right edge, including data in the reassembly queue, of valid incoming data. This makes it like rcv_nxt plus reassembly. It never goes backwards to prevent older, possibly reordered segments from updating the window. snd_wl2 tracks the left edge of sent data. This makes it a duplicate of snd_una. However joining them right now is difficult due to separate update dependencies in different places in the code flow. snd_wnd tracks the current advertized send window by the peer. In tcp_output() the effective window is calculated by subtracting the already in-flight data, snd_nxt less snd_una, from it. ACK's become the main clock of window updates and will always update the window when the left edge of what we sent is advanced. The ACK clock is the primary signaling mechanism in ongoing data transfers. This works reliably even in the presence of reordering, reassembly and retransmitted segments. The ACK clock is most important because it determines how much data we are allowed to inject into the network. Zero window updates that get us out of persistence mode are crucial. Here a segment that neither moves ACK nor SEQ but enlarges WND is accepted. When the ACK clock is not active (that is we're not or no longer sending any data) any segment that moves the extended right SEQ edge, including out-of-order segments, updates the window. This gives us updates especially during ping-pong transfers where the peer isn't done consuming the already acknowledged data from the receive buffer while responding with data. The SSH protocol is a prime candidate to benefit from the improved bi-directional window update logic as it has its own windowing mechanism on top of TCP and is frequently sending back protocol ACK's. Tcpdump provided by: darrenr Tested by: darrenr MFC after: 2 weeks	2012-10-28 19:16:22 +00:00
Andre Oppermann	024fd5b6bb	For retransmits of SYN\|ACK from the syncache use the slightly more aggressive special tcp_syn_backoff[] retransmit schedule instead of the normal tcp_backoff[] schedule for established connections. MFC after: 2 weeks	2012-10-28 19:02:07 +00:00
Andre Oppermann	f4748ef5fb	When retransmitting SYN in TCPS_SYN_SENT state use TCPTV_RTOBASE, the default retransmit timeout, as base to calculate the backoff time until next try instead of the TCP_REXMTVAL() macro which only works correctly when we already have measured an actual RTT+RTTVAR. Before it would cause the first retransmit at RTOBASE, the next four at the same time (!) about 200ms later, and then another one again RTOBASE later. MFC after: 2 weeks	2012-10-28 18:56:57 +00:00
Andre Oppermann	602e8e45ee	Remove bogus 'else' in #ifdef that prevented the rttvar from being reset in tcp_timer_rexmt() on retransmit for IPv6 sessions. MFC after: 2 weeks	2012-10-28 18:45:04 +00:00
Andre Oppermann	4faaea5505	Allow arbitrary MSS sizes and don't mind about the cluster size anymore. We've got more cluster sizes for quite some time now and the originally imposed limits and the previously codified thoughts on efficiency gains are no longer true. MFC after: 2 weeks	2012-10-28 18:33:52 +00:00
Andre Oppermann	f3a10d7954	Change the syncache count reporting the current number of entries from an unprotected u_int that reports garbage on SMP to a function based sysctl obtaining the current value from UMA. Also read back the actual cache_limit after page size rounding by UMA. PR: kern/165879 MFC after: 2 weeks	2012-10-28 18:07:34 +00:00
Andre Oppermann	aafa0b4164	Simplify implementation of net.inet.tcp.reass.maxsegments and net.inet.tcp.reass.cursegments. MFC after: 2 weeks	2012-10-28 17:59:46 +00:00
Andre Oppermann	f62563d33c	Prevent a flurry of forced window updates when an application is doing small reads on a (partially) filled receive socket buffer. Normally one would send a window update every time the available space in the socket buffer increases by two times MSS. This leads to a flurry of window updates that do not provide any meaningful new information to the sender. There still is available space in the window and the sender can continue sending data. All window updates then get carried by the regular ACKs. Only when the socket buffer was (almost) full and the window closed accordingly does a window update deliver new information and allow the sender to start sending more data again. Send window updates only every two MSS when the socket buffer has less than 1/8 space available, or the available space in the socket buffer increased by 1/4 its full capacity, or the socket buffer is very small. The next regular data ACK will carry and report the exact window size again. Reported by: sbruno Tested by: darrenr Tested by: Darren Baginski PR: kern/116335 MFC after: 2 weeks	2012-10-28 17:40:35 +00:00
Andre Oppermann	4249614cb0	When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to reduce the initial CWND to one segment. This reduction got lost some time ago due to a change in initialization ordering. Additionally in tcp_timer_rexmt() avoid entering fast recovery when we're still in TCPS_SYN_SENT state. MFC after: 2 weeks	2012-10-28 17:30:28 +00:00
Andre Oppermann	cf8f04f4c0	When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to reduce the initial CWND to one segment. This reduction got lost some time ago due to a change in initialization ordering. Additionally in tcp_timer_rexmt() avoid entering fast recovery when we're still in TCPS_SYN_SENT state. MFC after: 2 weeks	2012-10-28 17:25:08 +00:00
Andre Oppermann	22efabd40c	Adjust the initial default CWND upon connection establishment to the new and increased values specified by RFC5681 Section 3.1. The even larger initial CWND per RFC3390, if enabled, is not affected. MFC after: 2 weeks	2012-10-28 17:16:09 +00:00
Gleb Smirnoff	078468ede4	o Remove last argument to ip_fragment(), and obtain all needed information on checksums directly from mbuf flags. This simplifies code. o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in hardware. Some driver may not announce CSUM_IP in their if_hwassist, although try to do checksums if CSUM_IP set on mbuf. Example is em(4). o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP. After this change CSUM_DELAY_IP vanishes from the stack. Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>	2012-10-26 21:06:33 +00:00
Andrey V. Elsukov	c1de64a495	Remove the IPFIREWALL_FORWARD kernel option and make possible to turn on the related functionality in the runtime via the sysctl variable net.pfil.forward. It is turned off by default. Sponsored by: Yandex LLC Discussed with: net@ MFC after: 2 weeks	2012-10-25 09:39:14 +00:00
Gleb Smirnoff	a7f707cd37	After r241923 the updated ip_len no longer needed.	2012-10-25 09:02:21 +00:00
Gleb Smirnoff	b6fcf6f9f5	Fix error in r241913 that had broken fragment reassembly.	2012-10-25 09:00:57 +00:00
Gleb Smirnoff	9e2a372fd2	Use ip_stripoptions() instead of handrolled version.	2012-10-23 10:30:09 +00:00
Gleb Smirnoff	4937a6561f	Simplify ip_stripoptions() reducing number of intermediate variables.	2012-10-23 10:29:31 +00:00
Gleb Smirnoff	8ad458a471	Do not reduce ip_len by size of IP header in the ip_input() before passing a packet to protocol input routines. For several protocols this means that now protocol needs to do subtraction itself, and for another half this means that we do not need to add header length back to the packet. Make ip_stripoptions() to adjust ip_len, since now we enter this function with a packet header whose ip_len does represent length of entire packet, not payload only.	2012-10-23 08:33:13 +00:00
Xin LI	6f56329a25	Remove __P. Submitted by: kevlo Reviewed by: md5(1) MFC after: 2 months	2012-10-22 21:49:56 +00:00
Gleb Smirnoff	8f134647ca	Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert to host byte order before passing a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtracts header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>	2012-10-22 21:09:03 +00:00
Andrey Zonov	32fe38f123	- Update cachelimit after hashsize and bucketlimit were set. Reported by: az Reviewed by: melifaro Approved by: kib (mentor) MFC after: 1 week	2012-10-19 14:00:03 +00:00
Andre Oppermann	c9b652e3e8	Mechanically remove the last stray remains of spl* calls from net/. They have been Noop's for a long time now.	2012-10-18 13:57:24 +00:00
Ed Maste	983731268c	Avoid potential bad pointer dereference. Previously RuleAdd would leave entry->la unset for the first entry in the proxyList. Sponsored by: ADARA Networks MFC After: 1 week	2012-10-17 20:23:07 +00:00
Gleb Smirnoff	e76163a539	We don't need to convert ip6_len to host byte order before ip6_output(), the IPv6 stack is working in net byte order. The reason this code worked before is that ip6_output() doesn't look at ip6_plen at all and recalculates it based on mbuf length.	2012-10-15 07:57:55 +00:00
Gleb Smirnoff	347d90acff	Fix a miss from r241344: in ip_mloopback() we need to go to net byte order prior to calling in_delayed_cksum(). Reported by: Olivier Cochard-Labbe <olivier cochard.me>	2012-10-14 15:08:07 +00:00
Alexander V. Chernikov	3bff27cd67	Cleanup documentation: cloning route support has been removed in r186119. MFC after: 2 weeks	2012-10-13 09:31:01 +00:00
Gleb Smirnoff	86b61e4748	Revert fixup of ip_len from r241480. Now stack isn't yet ready for that change.	2012-10-12 09:32:38 +00:00
Gleb Smirnoff	105bd2113b	In ip_stripoptions(): - Remove unused argument and incorrect comment. - Fixup ip_len after stripping.	2012-10-12 09:24:24 +00:00
Alexander V. Chernikov	3c2824b9ef	Do not check if found IPv4 rte is dynamic if net.inet.icmp.drop_redirect is enabled. This eliminates one mtx_lock() per each routing lookup thus improving performance in several cases (routing to directly connected interface or routing to default gateway). Icmp redirects should not be used to provide routing direction nowadays, even for end hosts. Routers should not use them too (and this is explicitly restricted in IPv6, see RFC 4861, clause 8.2). Current commit changes rnh_matchaddr function to 'stock' rn_match (and back) for every AF_INET routing table in given VNET instance on drop_redirect sysctl change. This change is part of bigger patch eliminating rte locking. Sponsored by: Yandex LLC MFC after: 2 weeks	2012-10-10 19:06:11 +00:00
Kevin Lo	9823d52705	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
Kevin Lo	a10cee30c9	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
Gleb Smirnoff	23e9c6dc1e	After r241245 it appeared that in_delayed_cksum(), which still expects host byte order, was sometimes called with net byte order. Since we are moving towards net byte order throughout the stack, the function was converted to expect net byte order, and its consumers fixed appropriately: - ip_output(), ipfilter(4) not changed, since already call in_delayed_cksum() with header in net byte order. - divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order there and back. - mrouting code and IPv6 ipsec now need to switch byte order there and back, but I hope, this is temporary solution. - In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum(). - pf_route() catches up on r241245 changes to ip_output().	2012-10-08 08:03:58 +00:00
Gleb Smirnoff	b7fb54d8ae	No reason to play with IP header before calling sctp_delayed_cksum() with offset beyond the IP header.	2012-10-08 07:21:32 +00:00
Gleb Smirnoff	21d172a3f1	A step in resolving mess with byte ordering for AF_INET. After this change: - All packets in NETISR_IP queue are in net byte order. - ip_input() is entered in net byte order and converts packet to host byte order right _after_ processing pfil(9) hooks. - ip_output() is entered in host byte order and converts packet to net byte order right _before_ processing pfil(9) hooks. - ip_fragment() accepts and emits packet in net byte order. - ip_forward(), ip_mloopback() use host byte order (untouched actually). - ip_fastforward() no longer modifies packet at all (except ip_ttl). - Swapping of byte order there and back removed from the following modules: pf(4), ipfw(4), enc(4), if_bridge(4). - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version - __FreeBSD_version bumped. - pfil(9) manual page updated. Reviewed by: ray, luigi, eri, melifaro Tested by: glebius (LE), ray (BE)	2012-10-06 10:02:11 +00:00
Gleb Smirnoff	df4e91d386	There is a complex race in in_pcblookup_hash() and in_pcblookup_group(). Both functions need to obtain lock on the found PCB, and they can't do classic inter-lock with the PCB hash lock, due to lock order reversal. To keep the PCB stable, these functions put a reference on it and after PCB lock is acquired drop it. If the reference was the last one, this means we've raced with in_pcbfree() and the PCB is no longer valid. This approach works okay only if we are acquiring writer-lock on the PCB. In case of reader-lock, the following scenario can happen: - 2 threads locate pcb, and do in_pcbref() on it. - These 2 threads drop the inp hash lock. - Another thread comes to delete pcb via in_pcbfree(), it obtains hash lock, does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which doesn't free the pcb due to two references on it. Then it unlocks the pcb. - 2 aforementioned threads acquire reader lock on the pcb and run in_pcbrele_rlocked(). One gets 1 from in_pcbrele_rlocked() and continues, second gets 0 and considers pcb freed, returns. - The thread that got 1 continues working with detached pcb, which later leads to panic in the underlying protocol level. To plumb that problem an additional INPCB flag is introduced - INP_FREED. We check for that flag in the in_pcbrele_rlocked() and if it is set, we pretend that that was the last reference. Discussed with: rwatson, jhb Reported by: Vladimir Medvedkin <medved rambler-co.ru>	2012-10-02 12:03:02 +00:00
Gleb Smirnoff	891122d180	carp_send_ad() should never return without rescheduling next run.	2012-09-29 05:52:19 +00:00
Gleb Smirnoff	85c05144f1	Fix bug in TCP_KEEPCNT setting, which slipped in in the last round of reviewing of r231025. Unlike other options from this family TCP_KEEPCNT doesn't specify time interval, but a count, thus parameter supplied doesn't need to be multiplied by hz. Reported & tested by: amdmi3	2012-09-27 07:13:21 +00:00
Michael Tuexen	e06f3469e0	Whitespace change. MFC after: 3 days	2012-09-23 07:43:10 +00:00
Michael Tuexen	a98809db78	Declare a static function as such. MFC after: 3 days	2012-09-23 07:23:18 +00:00
Michael Tuexen	efb0814c24	Fix a bug related to handling Re-config chunks. It is not true that the association can be removed if the socket is gone. MFC after: 3 days	2012-09-22 22:04:17 +00:00
Michael Tuexen	2089750009	Small cleanups. No functional change. MFC after: 10 days	2012-09-22 14:39:20 +00:00
Kevin Lo	b7e1113e8f	Fix typo: s/pakcet/packet	2012-09-20 03:29:43 +00:00
Eitan Adler	582212fa04	s/teh/the/g Approved by: cperciva MFC after: 3 days	2012-09-14 21:59:55 +00:00
Michael Tuexen	dcb68fba2d	Small cleanups. No functional change. MFC after: 10 days	2012-09-14 18:32:20 +00:00
Gleb Smirnoff	3b3a8eb937	o Create directory sys/netpfil, where all packet filters should reside, and move there ipfw(4) and pf(4). o Move most modified parts of pf out of contrib. Actual movements: sys/contrib/pf/net/.c -> sys/netpfil/pf/ sys/contrib/pf/net/.h -> sys/net/ contrib/pf/pfctl/.c -> sbin/pfctl contrib/pf/pfctl/.h -> sbin/pfctl contrib/pf/pfctl/pfctl.8 -> sbin/pfctl contrib/pf/pfctl/.4 -> share/man/man4 contrib/pf/pfctl/.5 -> share/man/man5 sys/netinet/ipfw -> sys/netpfil/ipfw The arguable movement is pf/net/*.h -> sys/net. There are future plans to refactor pf includes, so I decided not to break things twice. Not modified bits of pf left in contrib: authpf, ftp-proxy, tftp-proxy, pflogd. The ipfw(4) movement is planned to be merged to stable/9, to make head and stable match. Discussed with: bz, luigi	2012-09-14 11:51:49 +00:00
Michael Tuexen	8225a9bc85	Whitespace changes. MFC after: 10 days	2012-09-09 08:14:04 +00:00
Michael Tuexen	fe6bb0a788	Whitespace cleanup. MFC after: 10 days	2012-09-08 20:54:54 +00:00
Gleb Smirnoff	d6d3f01e0a	Merge the projects/pf/head branch, that was worked on for last six months, into head. The most significant achievements in the new code: o Fine grained locking, thus much better performance. o Fixes to many problems in pf, that were specific to FreeBSD port. New code doesn't have that many ifdefs and much less OpenBSDisms, thus is more attractive to our developers. Those interested in details, can browse through SVN log of the projects/pf/head branch. And for reference, here is exact list of revisions merged: r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330, r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656, r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782, r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868, r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223, r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456, r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505, r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168, r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230, r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398, r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548, r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672, r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169, r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442, r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522, r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661, r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212. 
I'd like to thank people who participated in early testing: Tested by: Florian Smeets <flo freebsd.org> Tested by: Chekaluk Vitaly <artemrts ukr.net> Tested by: Ben Wilber <ben desync.com> Tested by: Ian FREISLICH <ianf cloudseed.co.za>	2012-09-08 06:41:54 +00:00
Michael Tuexen	a169d6ec2b	Don't include a structure containing a flexible array in another structure. MFC after: 10 days	2012-09-07 13:36:42 +00:00
Michael Tuexen	12780a595e	Get rid of a gcc'ism. MFC after: 10 days	2012-09-06 07:03:56 +00:00
Michael Tuexen	dd294dcec6	Using %p in a format string requires a void *. MFC after: 10 days	2012-09-05 18:52:01 +00:00
Michael Tuexen	2899aa8f65	Consistently use the size of a variable. This helps to keep the code simpler for the userland implementation. MFC after: 3 days	2012-09-04 22:45:00 +00:00
Michael Tuexen	c6328f940e	Whitespace change. MFC after: 3 days	2012-09-04 22:40:49 +00:00
Alexander V. Chernikov	7d4317bd40	Introduce new link-layer PFIL hook V_link_pfil_hook. Merge ether_ipfw_chk() and part of bridge_pfil() into unified ipfw_check_frame() function called by PFIL. This change was suggested by rwatson? @ DevSummit. Remove ipfw headers from ether/bridge code since they are unneeded now. Note this change introduces some (temporary) performance penalty since PFIL read lock has to be acquired for every link-level packet. MFC after: 3 weeks	2012-09-04 19:43:26 +00:00
Gleb Smirnoff	478df1d534	Provide a sysctl switch that allows to install ARP entries with multicast bit set. FreeBSD refuses to install such entries since 9.0, and this broke installations running Microsoft NLB, which are violating standards. Tested by: Tarasov Oleg <oleg_tarasov sg-tea.com>	2012-09-03 14:29:28 +00:00
Michael Tuexen	81eb4e6351	Fix a typo which results in RTT to be off by a factor of 10, if the RTT is larger than 1 second. MFC after: 3 days	2012-09-02 12:37:30 +00:00
Eitan Adler	64baf9fbe0	Mark the ipfw interface type as not being ether. This fixes an issue where uuidgen tried to obtain a ipfw device's mac address which was always zero. PR: 170460 Submitted by: wxs Reviewed by: bdrewery Reviewed by: delphij Approved by: cperciva MFC after: 1 week	2012-09-01 23:33:49 +00:00
Randall Stewart	ec03d5433f	This small change takes care of a race condition that can occur when both sides close at the same time. If that occurs, without this fix the connection enters FIN1 on both sides and they will forever send FIN\|ACK at each other until the connection times out. This is because we stopped processing the FIN\|ACK and thus did not advance the sequence and so never ACK'd each others FIN. This fix adjusts it so we do process the FIN properly and the race goes away ;-) MFC after: 1 month	2012-08-25 09:26:37 +00:00
Navdeep Parhar	06fd9875aa	Correctly handle the case where an inp has already been dropped by the time the TOE driver reports that an active open failed. toe_connect_failed is supposed to handle this but it should be provided the inpcb instead of the tcpcb which may no longer be around.	2012-08-21 18:09:33 +00:00
Randall Stewart	7db496de2c	Though I disagree, I concede to jhb & Rui. Note that we still have a problem with this whole structure of locks and in_input.c [it does not lock which it should not, but this can lead to crashes]. (I have seen it in our SQA testbed.. besides the one with a refcnt issue that I will have SQA work on next week ;-)	2012-08-19 11:54:02 +00:00
Randall Stewart	9424879158	Ok jhb, lets move the ifa_free() down to the bottom to assure that all tables and such are removed before we start to free. This won't protect the Hash in ip_input.c but in theory should protect any other uses that do use locks. MFC after: 1 week (or more)	2012-08-17 05:51:46 +00:00
Lawrence Stewart	ee24d3b840	The TCP PAWS fix for kernels with fast tick rates (r231767) changed the TCP timestamp related stack variables to reference ms directly instead of ticks. The h_ertt(4) Khelp module relies on TCP timestamp information in order to calculate its enhanced RTT estimates, but was not updated as part of r231767. Consequently, h_ertt has not been calculating correct RTT estimates since r231767 was committed, which in turn broke all delay-based congestion control algorithms because they rely on the h_ertt RTT estimates. Fix the breakage by switching h_ertt to use tcp_ts_getticks() in place of all previous uses of the ticks variable. This ensures all timestamp related variables in h_ertt use the same units as the TCP stack and therefore results in meaningful comparisons and RTT estimate calculations. Reported & tested by: Naeem Khademi (naeemk at ifi uio no) Discussed with: bz MFC after: 3 days	2012-08-17 01:49:51 +00:00
Randall Stewart	184749821f	It's never a good idea to double free the same address. MFC after: 1 week (after the other commits ahead of this get MFC'd)	2012-08-16 17:55:16 +00:00
Luigi Rizzo	e5813a3bce	s/lenght/length/ in comments	2012-08-07 07:52:25 +00:00
Luigi Rizzo	17369272e4	move functions outside the SYSBEGIN/SYSEND block (SYSBEGIN/SYSEND are specific to ipfw/dummynet and are used to emulate sysctl on platforms that do not have them, and they work by creating an array which contains all the sysctl-ed symbols.)	2012-08-06 11:02:23 +00:00
Luigi Rizzo	00c4633285	use FREE_PKT instead of m_freem to free an mbuf. The former is the standard form used in ipfw/dummynet, so that it is easier to remap it to different memory managers depending on the platform.	2012-08-06 10:50:43 +00:00
Michael Tuexen	55b175e747	Fix a bug found by dim@: Don't use an uninitialized variable, if INVARIANTS is on and an illegal packet with destination 0 is received. MFC after: 3 days X-MFC with: 238003	2012-08-06 10:50:23 +00:00
Mikolaj Golub	655f934b78	In tcp timers, check INP_DROPPED flag a little later, after callout_deactivate(), so if INP_DROPPED is set we return with the timer active flag cleared. For me this fixes negative keep timer values reported by `netstat -x' for connections in CLOSE state. Approved by: net (silence) MFC after: 2 weeks	2012-08-05 17:30:17 +00:00
Michael Tuexen	63c6726e05	Fix a refcount issue. The called function only decrements if stcb is NULL. MFC after: 3 days Discussed with: rrs	2012-08-05 10:47:18 +00:00
Michael Tuexen	832208514f	Fix a bug reported by Simon L. B. Nielsen: If an SCTP endpoint receives an ASCONF with a wildcard lookup address and incorrect verification tag, the system crashes. MFC after: 3 days.	2012-08-04 20:40:36 +00:00
Michael Tuexen	173be2b6cd	Testing an interface property should depend on the interface, not on an address. MFC after: 3 days	2012-08-04 08:03:30 +00:00
Gleb Smirnoff	ea53792942	Fix races between in_lltable_prefix_free(), lla_lookup(), llentry_free() and arptimer(): o Use callout_init_rw() for lle timeout, this allows us to safely disestablish them. - This allows us to simplify the arptimer() and make it race safe. o Consistently use ifp->if_afdata_lock to lock access to linked lists in the lle hashes. o Introduce new lle flag LLE_LINKED, which marks an entry that is attached to the hash. - Use LLE_LINKED to avoid double unlinking via consequent calls to llentry_free(). - Mark lle with LLE_DELETED via \|= operation instead of =, so that other flags won't be lost. o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more consistent and provide more informative KASSERTs. The patch is a collaborative work of all submitters and myself. PR: kern/165863 Submitted by: Andrey Zonov <andrey zonov.org> Submitted by: Ryan Stone <rysto32 gmail.com> Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>	2012-08-02 13:57:49 +00:00
Luigi Rizzo	46f2f751e1	replace __unused with a portable construct; fix a couple of signed/unsigned warnings.	2012-08-02 12:45:13 +00:00
Luigi Rizzo	f5705b527d	replace inet_ntoa_r with the more standard inet_ntop(). As discussed on -current, inet_ntoa_r() is non standard, has different arguments in userspace and kernel, and almost unused (no clients in userspace, only net/flowtable.c, net/if_llatbl.c, netinet/in_pcb.c, netinet/tcp_subr.c in the kernel)	2012-08-01 18:52:07 +00:00
Luigi Rizzo	71ca24f182	add a cast to avoid a signed/unsigned warning (to be removed when we will have TUNABLE_UINT constructors)	2012-08-01 18:49:00 +00:00

4865 Commits