freebsd-dev

Author	SHA1	Message	Date
Gabor Kovesdan	8fb3bbe770	- Corrrect mispellings of word useful Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)	2013-04-17 11:45:15 +00:00
Xin LI	f2297451fe	Fix incomplete printf. PR: kern/177889 Submitted by: Sven-Thorsten Dietrich <sven vyatta com> MFC after: 1 week	2013-04-16 19:32:12 +00:00
Xin LI	c1031303f0	Don't leak lock when returning. PR: kern/177888 Submitted by: Sven-Thorsten Dietrich <sven vyatta com> MFC after: 1 week	2013-04-16 19:25:41 +00:00
Andrey V. Elsukov	e3389419ef	Reflect removing of the counter_u64_subtract() function in the macro.	2013-04-12 16:29:15 +00:00
Gleb Smirnoff	0e2bc05c47	Fix tcp_output() so that tcpcb is updated in the same manner when an mbuf allocation fails, as in a case when ip_output() returns error. To achieve that, move large block of code that updates tcpcb below the out: label. This fixes a panic, that requires the following sequence to happen: 1) The SYN was sent to the network, tp->snd_nxt = iss + 1, tp->snd_una = iss 2) The retransmit timeout happened for the SYN we had sent, tcp_timer_rexmt() sets tp->snd_nxt = tp->snd_una, and calls tcp_output(). In tcp_output m_get() fails. 3) Later on the SYN|ACK for the SYN sent in step 1) came, tcp_input sets tp->snd_una += 1, which leads to tp->snd_una > tp->snd_nxt inconsistency, that later panics in socket buffer code. For reference, this bug fixed in DragonflyBSD repo: http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/1ff9b7d322dc5a26f7173aa8c38ecb79da80e419 Reviewed by: andre Tested by: pho Sponsored by: Nginx, Inc. PR: kern/177456 Submitted by: HouYeFei&XiBoLiu <lglion718 163.com>	2013-04-11 18:23:56 +00:00
Gleb Smirnoff	18ba072a22	Fix build.	2013-04-10 08:09:25 +00:00
Andre Oppermann	e8b3186b6a	Change certain heavily used network related mutexes and rwlocks to reside on their own cache line to prevent false sharing with other nearby structures, especially for those in the .bss segment. NB: Those mutexes and rwlocks with variables next to them that get changed on every invocation do not benefit from their own cache line. Actually it may be net negative because two cache misses would be incurred in those cases.	2013-04-09 21:02:20 +00:00
Andre Oppermann	982c1675ff	Fix a race condition on tcp listen socket teardown with pending connections in the accept queue and contiguous new incoming SYNs. Compared to the original submitters patch I've moved the test next to the SYN handling to have it together in a logical unit and reworded the comment explaining the issue. Submitted by: Matt Miller <matt@matthewjmiller.net> Submitted by: Juan Mojica <jmojica@gmail.com> Reviewed by: Matt Miller (changes) Tested by: pho MFC after: 1 week	2013-04-09 20:52:26 +00:00
Gleb Smirnoff	4a21e86ec1	Fix VIMAGE build.	2013-04-09 09:15:26 +00:00
Andrey V. Elsukov	9cb8d207af	Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats. MFC after: 1 week	2013-04-09 07:11:22 +00:00
Gleb Smirnoff	5923c29332	Merge from projects/counters: TCP/IP stats. Convert 'struct ipstat' and 'struct tcpstat' to counter(9). This speeds up IP forwarding at extreme packet rates, and makes accounting more precise. Sponsored by: Nginx, Inc.	2013-04-08 19:57:21 +00:00
Michael Tuexen	ebae998767	Add a macro for checking for IPv4 link local addresses. MFC after: 1 week	2013-03-31 18:27:46 +00:00
Ed Maste	ce7ad6640c	Keep fwd_tag around for subsequent pcb lookups For TIMEWAIT handling tcp_input may have to jump back for an additional pass through pcblookup. Prior to this change the fwd_tag had been discarded after the first lookup, so a new connection attempt delivered locally via 'ipfw fwd' would fail to find a match. As of r248886 the tag will be detached and freed when passed to the socket buffer.	2013-03-29 20:51:44 +00:00
Alexander V. Chernikov	ae01d73c04	Add ipfw support for setting/matching DiffServ codepoints (DSCP). Setting DSCP support is done via O_SETDSCP which works for both IPv4 and IPv6 packets. Fast checksum recalculation (RFC 1624) is done for IPv4. Dscp can be specified by name (AFXY, CSX, BE, EF), by value (0..63) or via tablearg. Matching DSCP is done via another opcode (O_DSCP) which accepts several classes at once (af11,af22,be). Classes are stored in bitmask (2 u32 words). Many people made their variants of this patch, the ones I'm aware of are (in alphabetic order): Dmitrii Tejblum Marcelo Araujo Roman Bogorodskiy (novel) Sergey Matveichuk (sem) Sergey Ryabin PR: kern/102471, kern/121122 MFC after: 2 weeks	2013-03-20 10:35:33 +00:00
Gleb Smirnoff	7525c48111	In m_megapullup() instead of reserving some space at the end of packet, m_align() it, reserving space to prepend data. Reviewed by: mav	2013-03-17 07:37:10 +00:00
Gleb Smirnoff	aa8bd99d99	- Replace compat macros with function calls.	2013-03-16 08:58:28 +00:00
Gleb Smirnoff	3c26f4a9bc	We can, and should use M_WAITOK here. Sponsored by: Nginx, Inc.	2013-03-15 13:10:06 +00:00
Gleb Smirnoff	dc4ad05ecd	Use m_get/m_gethdr instead of compat macros. Sponsored by: Nginx, Inc.	2013-03-15 12:55:30 +00:00
Gleb Smirnoff	39f6074e2e	- Use m_getcl() instead of hand allocating. Sponsored by: Nginx, Inc.	2013-03-15 12:53:53 +00:00
Gleb Smirnoff	41a7572b26	Functions m_getm2() and m_get2() have different order of arguments, and that can drive someone crazy. While m_get2() is young and not documented yet, change its order of arguments to match m_getm2(). Sorry for churn, but better now than later.	2013-03-12 13:42:47 +00:00
Gleb Smirnoff	f4562a299c	Remove LIBALIAS_LOCK_ASSERT(), including a couple with an uninitialzed argument, in code that isn't compiled in kernel. PR: kern/176667 Sponsored by: Nginx, Inc.	2013-03-11 12:22:44 +00:00
Lawrence Stewart	1e0e83d760	The hashmask returned by hashinit() is a valid index in the returned hash array. Fix a siftr(4) potential memory leak and INVARIANTS triggered kernel panic in hashdestroy() by ensuring the last array index in the flow counter hash table is flushed of entries. MFC after: 3 days	2013-03-07 04:42:20 +00:00
Davide Italiano	5b999a6be0	- Make callout(9) tickless, relying on eventtimers(4) as backend for precise time event generation. This greatly improves granularity of callouts which are not anymore constrained to wait next tick to be scheduled. - Extend the callout KPI introducing a set of callout_reset_sbt* functions, which take a sbintime_t as timeout argument. The new KPI also offers a way for consumers to specify precision tolerance they allow, so that callout can coalesce events and reduce number of interrupts as well as potentially avoid scheduling a SWI thread. - Introduce support for dispatching callouts directly from hardware interrupt context, specifying an additional flag. This feature should be used carefully, as long as interrupt context has some limitations (e.g. no sleeping locks can be held). - Enhance mechanisms to gather informations about callwheel, introducing a new sysctl to obtain stats. This change breaks the KBI. struct callout fields has been changed, in particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t' (8 bytes) and another 'sbintime_t' field was added for precision. Together with: mav Reviewed by: attilio, bde, luigi, phk Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm), markj (amd64), mav, Fabian Keil	2013-03-04 11:09:56 +00:00
Michael Tuexen	e045904fdc	Fix a potential race in returning setting errno when an association goes down. Reported by Mozilla in https://bugzilla.mozilla.org/show_bug.cgi?id=845513 MFC after: 3 days	2013-02-27 19:51:47 +00:00
Andrew Gallatin	e5ca1ffab5	Fix tcp_lro_rx_ipv4() for drivers that do not set CSUM_IP_CHECKED. Specifcially, in_cksum_hdr() returns 0 (not 0xffff) when the IPv4 checksum is correct. Without this fix, the tcp_lro code will reject good IPv4 traffic from drivers that do not implement IPv4 header harder csum offload. Sponsored by: Myricom Inc. MFC after: 7 days	2013-02-21 17:00:35 +00:00
Sergey Kandaurov	46f2df9c13	ip_savecontrol() style fixes. No functional changes. - fix indentation - put the operator at the end of the line for long statements - remove spaces between the type and the variable in a cast - remove excessive parentheses Tested by: md5	2013-02-20 15:44:40 +00:00
Michael Tuexen	2416af26a0	Send the adaptation layer indication only if set by the user. MFC after: 3 days Discussed with: rrs	2013-02-11 21:02:49 +00:00
Michael Tuexen	c53f854a17	Don't send kernel provided information in the User Initiated ABORT cause, since the user can also provide this kind of information. So the receiver doesn't know who provided the information. While there: Fix a bug where the stack would send a malformed ABORT chunk when using a send() call with SCTP_ABORT|SCT_SENDALL flags. MFC after: 3 days	2013-02-11 13:57:03 +00:00
Gleb Smirnoff	24421c1c32	Resolve source address selection in presense of CARP. Add a couple of helper functions: - carp_master() - boolean function which is true if an address is in the MASTER state. - ifa_preferred() - boolean function that compares two addresses, and is aware of CARP. Utilize ifa_preferred() in ifa_ifwithnet(). The previous version of patch also changed source address selection logic in jails using carp_master(), but we failed to negotiate this part with Bjoern. May be we will approach this problem again later. Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc	2013-02-11 10:58:22 +00:00
Michael Tuexen	f0d44a49a0	Make sure that received packets for removed addresses are handled consistently. While there, make variable names consistent. MFC after: 3 days	2013-02-10 19:57:19 +00:00
Michael Tuexen	a1cb341b5d	Cleanup the handling of address scopes. Announce in the INIT/INIT-ACK only the supported address types. While there, do some whitespace cleanups. MFC after: 1 week	2013-02-09 17:26:14 +00:00
Michael Tuexen	c39cfa1f7e	Fix a bug where HEARTBEATs were still sent in SHUTDOWN_SENT or SHUTDOWN_ACK_SENT state. While there, make the corresponding code consistent. MFC after: 1 week	2013-02-09 08:27:08 +00:00
John Baldwin	0d25fab44d	Add placeholder constants to reserve a portion of the socket option name space for use by downstream vendors to add custom options. MFC after: 2 weeks	2013-02-01 15:32:20 +00:00
Andre Oppermann	cda3447bb0	uma_zone_set_max() directly returns the rounded effective zone limit. Use the return value directly instead of doing a second uma_zone_set_max() step. MFC after: 1 week	2013-02-01 14:21:09 +00:00
Gleb Smirnoff	498944374f	- Move AUTHORS and ACKNOWLEDGEMENTS to the end of the page. - Add myself to list of authors.	2013-01-31 10:29:22 +00:00
Gleb Smirnoff	9711a168b9	Retire struct sockaddr_inarp. Since ARP and routing are separated, "proxy only" entries don't have any meaning, thus we don't need additional field in sockaddr to pass SIN_PROXY flag. New kernel is binary compatible with old tools, since sizes of sockaddr_inarp and sockaddr_in match, and sa_family are filled with same value. The structure declaration is left for compatibility with third party software, but in tree code no longer use it. Reviewed by: ru, andre, net@	2013-01-31 08:55:21 +00:00
Gleb Smirnoff	ea26ed7eea	Utilize m_get2() to get mbuf of appropriate size.	2013-01-30 18:40:19 +00:00
Navdeep Parhar	adfaf8f6ad	Add checks for SO_NO_OFFLOAD in a couple of places that I missed earlier in r245915.	2013-01-26 01:41:42 +00:00
Navdeep Parhar	20be068c8a	Teach toe_l2_resolve to resolve IPv6 destinations too. Reviewed by: bz@	2013-01-26 00:57:29 +00:00
Navdeep Parhar	4364ec0852	Move lle_event to if_llatbl.h lle_event replaced arp_update_event after the ARP rewrite and ended up in if_ether.h simply because arp_update_event used to be there too. IPv6 neighbor discovery is going to grow lle_event support and this is a good time to move it to if_llatbl.h. The two in-tree consumers of this event - OFED and toecore - are not affected. Reviewed by: bz@	2013-01-25 23:58:21 +00:00
Navdeep Parhar	460cf046c2	There is no need to call into the TOE driver twice in pru_rcvd (tod_rcvd and then tod_output right after that). Reviewed by: bz@	2013-01-25 22:50:52 +00:00
Navdeep Parhar	464dfeb43f	Add TCP_OFFLOAD hook in syncache_respond for IPv6 too, just like the one that exists for IPv4. Reviewed by: bz@	2013-01-25 22:16:35 +00:00
Navdeep Parhar	b218348bc3	Teach toe_4tuple_check() to deal with IPv6 4-tuples too. Reviewed by: bz@	2013-01-25 20:45:24 +00:00
Navdeep Parhar	37cc0ecb1b	Heed SO_NO_OFFLOAD. MFC after: 1 week	2013-01-25 20:23:33 +00:00
Navdeep Parhar	5cd3dcaa25	Remove redundant test, we know inp_lport is 0. MFC after: 1 week	2013-01-25 20:14:27 +00:00
John Baldwin	1d77fa5a26	Use decimal values for UDP and TCP socket options rather than hex to avoid implying that these constants should be treated as bit masks. Reviewed by: net MFC after: 1 week	2013-01-22 19:45:04 +00:00
Lawrence Stewart	5b648e797b	Simplify and fix a bug in cc_ack_received()'s "are we congestion window limited" logic (refer to [1] for associated discussion). snd_cwnd and snd_wnd are unsigned long and on 64 bit hosts, min() will truncate them to 32 bits and could therefore potentially corrupt the result (although under normal operation, neither variable should legitmately exceed 32 bits). [1] http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034297.html Submitted by: jhb MFC after: 1 week	2013-01-22 09:44:21 +00:00
John Baldwin	6c0ef8957f	Don't drop options from the third retransmitted SYN by default. If the SYNs (or SYN/ACK replies) are dropped due to network congestion, then the remote end of the connection may act as if options such as window scaling are enabled but the local end will think they are not. This can result in very slow data transfers in the case of window scaling disagreements. The old behavior can be obtained by setting the net.inet.tcp.rexmit_drop_options sysctl to a non-zero value. Reviewed by: net@ MFC after: 2 weeks	2013-01-09 20:27:06 +00:00
Peter Wemm	8a1163e82f	Temporarily revert rev 244678. This is causing loopback problems with the lo (loopback) interfaces.	2013-01-03 10:21:28 +00:00
Michael Tuexen	11e03b3200	Some cleanups. MFC after: 3 days	2012-12-27 08:10:58 +00:00

4586 Commits