freebsd-skq

Author	SHA1	Message	Date
glebius	45adeac7f3	When processing ICMP need frag message, ignore the suggested MTU unless it is smaller than the current one for this connection. This is behavior specified by RFC 1191, and this is how original BSD stack behaved, but this was unintentionally regressed in r182851. Reported & tested by: Richard Russo <russor whatsapp.com> Differential Revision: D3567 Sponsored by: Nginx, Inc.	2015-09-30 03:37:37 +00:00
melifaro	91b3356875	Eliminate nd6_nud_hint() and its TCP bindings. Initially function was introduced in r53541 (KAME initial commit) to "provide hints from upper layer protocols that indicate a connection is making "forward progress"" (quote from RFC 2461 7.3.1 Reachability Confirmation). However, it was converted to do nothing (e.g. just return) in r122922 (tcp_hostcache implementation) back in 2003. Some defines were moved to tcp_var.h in r169541. Then, it was broken (for non-corner cases) by r186119 (L2<>L3 split) in 2008 (NULL ifp in nd6_lookup). So, right now this code is broken and has no "real" base users. Differential Revision: https://reviews.freebsd.org/D3699	2015-09-27 05:29:34 +00:00
melifaro	4fed811000	rtsock requests for deleting interface address lles started to return EPERM instead of old "ignore-and-return 0" in r287789. This broke arp -da / ndp -cn behavior (they exit on rtsock command failure). Fix this by translating LLE_IFADDR to RTM_PINNED flag, passing it to userland and making arp/ndp ignore these entries in batched delete. MFC after: 2 weeks	2015-09-27 04:54:29 +00:00
melifaro	7e64f01f85	Replace toe_nd6_resolve() with nd6_resolve(). Reviewed by: np	2015-09-22 19:05:44 +00:00
melifaro	536c20752f	Unify nd6 state switching by using newly-created nd6_llinfo_setstate() function. The change is mostly mechanical with the following exception: Last piece of nd6_resolve_slow() was refactored: ND6_LLINFO_PERMANENT condition was removed as always-true, explicit ND6_LLINFO_NOSTATE -> ND6_LLINFO_INCOMPLETE state transition was removed as duplicate. Reviewed by: ae Sponsored by: Yandex LLC	2015-09-21 11:19:53 +00:00
glebius	8c2720775c	Use proper byteswap macro. This isn't a functional change.	2015-09-17 17:27:49 +00:00
glebius	c2b26bf37d	In tcp_ctlinput() separate the (ip == NULL) block from the rest of the function to reduce so many levels of indentation. Style the lines that got now indentation reduced. No functional change. Checked with: md5	2015-09-16 21:42:33 +00:00
melifaro	44be0cdaa8	Unify loopback route switching: * prepare gateway before insertion * use RTM_CHANGE instead of explicit find/change route * Remove fib argument from ifa_switch_loopback_route added in r264887: if old ifp fib differs from new one, then the caller is doing something wrong * Make ifa_*_loopback_route call single ifa_maintain_loopback_route().	2015-09-16 06:23:15 +00:00
brd	32f63fe59a	Remove redundant 'man page' Reviewed by: allanjude	2015-09-15 21:16:45 +00:00
hiren	85848b2b16	Remove unnecessary tcp state transition call. Differential Revision: D3451 Reviewed by: markj MFC after: 2 weeks Sponsored by: Limelight Networks	2015-09-15 20:04:30 +00:00
melifaro	d0c2460548	* Improve logging invalid arp messages * Remove redundant check in ip_arpinput Suggested by: glebius MFC after: 2 weeks	2015-09-15 08:50:44 +00:00
melifaro	b42c7af3b5	* Require explicit lle unlink prior to calling llentry_delete(). This one slightly decreases time of holding afdata wlock. * While here, make nd6_free() return void. No one has used its return value since r186119.	2015-09-15 06:48:19 +00:00
melifaro	5ad1f2444d	* Do more fine-grained locking: call eventhandlers/free_entry without holding afdata wlock * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3573	2015-09-14 16:48:19 +00:00
melifaro	e64a8234e3	* Improve error checking for arp messages. * Clean stale headers from if_ether.c. Reported by: rozhuk.im at gmail.com Reviewed by: ae MFC after: 2 weeks	2015-09-14 10:28:47 +00:00
hselasky	ac0c211c77	Update TSO limits to include all headers. To make driver programming easier the TSO limits are changed to reflect the values used in the BUSDMA tag a network adapter driver is using. The TCP/IP network stack will subtract space for all linklevel and protocol level headers and ensure that the full mbuf chain passed to the network adapter fits within the given limits. Implementation notes: If a network adapter driver needs to fixup the first mbuf in order to support VLAN tag insertion, the size of the VLAN tag should be subtracted from the TSO limit. Else not. Network adapters which typically inline the complete header mbuf could technically transmit one more segment. This patch does not implement a mechanism to recover the last segment for data transmission. It is believed when sufficiently large mbuf clusters are used, the segment limit will not be reached and recovering the last segment will not have any effect. The current TSO algorithm tries to send MTU-sized packets, where the MTU typically is 1500 bytes, which gives 1448 bytes of TCP data payload per packet for IPv4. That means if the TSO length limitation is set to 65536 bytes, there will be a data payload remainder of (65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to recover total TSO length due to inlining mbuf header data will not have any effect, because adding or removing the ETH/IP/TCP headers to or from 324 bytes will not cause more or less TCP payload to be TSO'ed. Existing network adapter limits will be updated separately. Differential Revision: https://reviews.freebsd.org/D3458 Reviewed by: rmacklem MFC after: 2 weeks	2015-09-14 08:36:22 +00:00
gnn	e39dbc6166	Add DTrace probe points, translators and a corresponding script to provide the TCPDEBUG functionality with pure DTrace. Reviewed by: rwatson MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: D3530	2015-09-13 15:50:55 +00:00
tuexen	63946e657e	Fix compilation issue introduced in r287717. Thanks to bz@ for making me aware of it. MFC after: 1 week	2015-09-12 21:23:24 +00:00
tuexen	779fa4b9f9	Address a compile warning. MFC after: 1 week	2015-09-12 18:00:06 +00:00
tuexen	5429764526	Cleanup the handling of error causes for ERROR chunks. This fixes an inconsistency of the padding handling. The final padding is now considered to be a chunk padding. MFC after: 1 week	2015-09-12 17:08:51 +00:00
tuexen	1947f60716	Ensure that ERROR chunks are always padded by implementing this in the routine, which queues an ERROR chunk, instead of relying on the callers to do so. Since one caller missed this, this actually fixes a bug. MFC after: 1 week	2015-09-11 13:54:33 +00:00
tuexen	8a1adc38eb	RFC 4960 requires that packets containing an INIT chunk bundled with another chunk are silently discarded. Do so, instead of sending an ABORT. MFC after: 1 week	2015-09-07 14:00:38 +00:00
allanjude	1b61937107	missed file that should have been included in r287528 PR: 184110 Submitted by: Marie Helene Kvello-Aune <marieheleneka@gmail.com> Approved by: wblock (mentor)	2015-09-07 02:00:05 +00:00
adrian	d8aa14f523	Replace rss_m2cpuid with rss_soft_m2cpuid_v4 for ip_direct_nh.nh_m2cpuid, because the RSS hash may need to be recalculated. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3564	2015-09-06 20:20:48 +00:00
melifaro	e31cc5ffc6	Do not pass lle to nd6_ns_output(). Use newly-added nd6_llinfo_get_holdsrc() to extract desired IPv6 source from holdchain and pass it to the nd6_ns_output().	2015-09-05 14:14:03 +00:00
glebius	790dc6f94a	Use Jenkins hash for TCP syncache. o Unlike xor, in Jenkins hash every bit of input affects virtually every bit of output, thus salting the hash actually works. With xor salting only provides a false sense of security, since if hash(x) collides with hash(y), then of course, hash(x) ^ salt would also collide with hash(y) ^ salt. [1] o Jenkins provides much better distribution than xor, very close to ideal. TCP connection setup/teardown benchmark has shown a 10% increase with default hash size, and with bigger hashes that still provide possibility for collisions. With enormous hash size, when dataset is by an order of magnitude smaller than hash size, the benchmark has shown 4% decrease in performance, which is expected and acceptable. Noticed by: Jeffrey Knockel <jeffk cs.unm.edu> [1] Benchmarks by: jch Reviewed by: jch, pkelsey, delphij Security: strengthens protection against hash collision DoS Sponsored by: Nginx, Inc.	2015-09-05 10:15:19 +00:00
glebius	f3c8a935a4	Make tcp_mtudisc() static and void. No functional changes. Sponsored by: Nginx, Inc.	2015-09-04 12:02:12 +00:00
tuexen	760f2d0d08	Don't leak memory in an error case. MFC after: 1 week	2015-09-04 09:24:07 +00:00
tuexen	a66dd7d374	Add a NULL pointer check to silence the clang code analyzer. MFC after: 1 week	2015-09-04 09:22:16 +00:00
tuexen	056def9261	Fix a bug where two SHUTDOWN_ACK chunks were sent if a SHUTDOWN chunk was received acking all outstanding data.	2015-09-03 22:15:56 +00:00
jch	c4ab4117ed	Put r284245 back in place: If at first this fix was seen as a temporary workaround for a callout(9) issue, it turns out it is instead the right way to use callout in mpsafe mode without using callout_drain(). r284245 commit message: Fix a callout race condition introduced in TCP timers callouts with r281599. In TCP timer context, it is not enough to check callout_stop() return value to decide if a callout is still running or not, previous callout_reset() return values have also to be checked. Differential Revision: https://reviews.freebsd.org/D2763	2015-08-30 13:44:39 +00:00
tuexen	646fffa685	Use 5 times RTO.Max as the default for the shutdown guard timer as required by RFC 4960. The sysctl variable can be used to overwrite this. Discussed with: rrs MFC after: 1 week	2015-08-29 17:26:29 +00:00
tuexen	b2ac8e86d2	Fix the exporting of SCTP association states to userland. Without this, associations in SHUTDOWN-PENDING were never reported correctly. MFC after: 3 weeks	2015-08-29 09:14:32 +00:00
adrian	4672b4c15b	Rename rss_soft_m2cpuid() -> rss_soft_m2cpuid_v4() in preparation for an IPv6 version to show up. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3504	2015-08-29 06:58:30 +00:00
adrian	2d6b12b499	Replace the printf()s with optional rate limited debugging for RSS. Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn> Differential Revision: https://reviews.freebsd.org/D3471	2015-08-28 05:58:16 +00:00
bz	6d4420afb7	get_inpcbinfo() and get_pcblist() are UDP local functions and do not do what one would expect by name. Prefix them with "udp_" to at least obviously limit the scope. This is a non-functional change. Reviewed by: gnn, rwatson MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3505	2015-08-27 15:27:41 +00:00
jch	9661bffa8e	Revert r284245: "Fix a callout race condition introduced in TCP timers callouts with r281599." r281599 fixed a TCP timer race condition, but due to a callout(9) bug it also introduced another race condition workaround-ed with r284245. The callout(9) bug being fixed with r286880, we can now revert the workaround (r284245). Differential Revision: https://reviews.freebsd.org/D2079 (Initial change) Differential Revision: https://reviews.freebsd.org/D2763 (Workaround) Differential Revision: https://reviews.freebsd.org/D3078 (Fix) Sponsored by: Verisign, Inc. MFC after: 2 weeks	2015-08-24 09:30:27 +00:00
melifaro	54b3b78856	* Split allocation and table linking for lle's. Before that, the logic besides lle_create() was the following: return existing if found, create if not. This behaviour was error-prone since we had to deal with 'sudden' static<>dynamic lle changes. This commit fixes bunch of different issues like: - refcount leak when lle is converted to static. Simple check case: console 1: while true; do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`; do arp -s $i 00:22:44:66:88:00 ; arp -d $i; done; done console 2: ping -f any-dead-host-in-L2 console 3: # watch for memory consumption: vmstat -m | awk '$1~/lltable/{print$2}' - possible problems in arptimer() / nd6_timer() when dropping/reacquiring lock. New logic explicitly handles use-or-create cases in every lla_create user. Basically, most of the changes are purely mechanical. However, we explicitly avoid using existing lle's for interface/static LLE records. * While here, call lle_event handlers on all real table lle change. * Create lltable_free_entry() calling existing per-lltable lle_free_t callback for entry deletion	2015-08-20 12:05:17 +00:00
melifaro	35a4e79d8f	Check value return from lle_create() for NULL. This bug sneaked unnoticed in r286722. Reported by: adrian	2015-08-19 21:08:42 +00:00
jch	6c45383db9	Make clear that TIME_WAIT timeout expiration is managed solely by tcp_tw_2msl_scan(). Sponsored by: Verisign, Inc.	2015-08-18 08:27:26 +00:00
melifaro	235fbf1304	Fix panic when handling non-inet arp message introduced in r286825. Submitted by: delphij	2015-08-18 06:16:19 +00:00
melifaro	bc522110e3	Split arpresolve() into fast/slow path. This change isolates the most common case (e.g. successful lookup) from more complicated scenarios. It also (tries to) make code more simple by avoiding retry: cycle. The actual goal is to prepare code to the upcoming change that will allow LL address retrieval without acquiring LLE lock at all. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D3383	2015-08-16 12:23:58 +00:00
tuexen	41e76c86c5	Allow the path MTU to grow up to the outgoing interface MTU. MFC after: 3 days	2015-08-14 14:26:13 +00:00
melifaro	0ddc1c9d3a	Move lle update code from gigantic ip_arpinput() to separate bunch of functions. The goal is to isolate actual lle updates to permit more fine-grained locking. Do all lle link-level update under AFDATA wlock. Sponsored by: Yandex LLC	2015-08-13 13:38:09 +00:00
hiren	3d944dc04e	Remove unused TCPTV_SRTTDFLT. We initialize srtt with TCPTV_SRTTBASE when we don't have any rtt estimate. Differential Revision: D3334 Sponsored by: Limelight Networks	2015-08-12 16:08:37 +00:00
melifaro	0c24547a66	Use single 'lle_timer' callout in lltable instead of two different names of the same timer.	2015-08-11 12:38:54 +00:00
melifaro	d8f92ce2cf	Store addresses instead of sockaddrs inside llentry. This permits us having (not fully true yet) all the info needed in lookup process in first 64 bytes of 'struct llentry'. struct llentry layout: BEFORE: [rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]] AFTER [ in[6]_addr MAC .. state .. rwlock ] Currently, address part of struct llentry has only 16 bytes for the key. However, lltable does not restrict any custom lltable consumers with long keys use the previous approach (store key at (lle+1)). Sponsored by: Yandex LLC	2015-08-11 09:26:11 +00:00
melifaro	8e6b3a8d59	MFP r276712. * Split lltable_init() into lltable_allocate_htbl() (alloc hash table with default callbacks) and lltable_link() ( links any lltable to the list). * Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field. * Move lltable setup to separate functions in in[6]_domifattach.	2015-08-11 05:51:00 +00:00
melifaro	ba06112c24	Rename rt_foreach_fib() to rt_foreach_fib_walk(). Suggested by: julian	2015-08-10 20:50:31 +00:00
melifaro	4f240a9c31	Partially merge r274887,r275334,r275577,r275578,r275586 to minimize differences between projects/routing and HEAD. This commit tries to keep code logic the same while changing underlying code to use unified callbacks. * Add llt_foreach_entry method to traverse all entries in given llt * Add llt_dump_entry method to export particular lle entry in sysctl/rtsock format (code is not indented properly to minimize diff). Will be fixed in the next commits. * Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle. * Add llt_fill_sa_entry method to export address in the lle to sockaddr format. * Add llt_hash method to use in generic hash table support code. * Add llt_free_entry method which is used in llt_prefix_free code. * Prepare for fine-grained locking by separating lle unlink and deletion in lltable_free() and lltable_prefix_free(). * Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable' access by external callers. * Remove @llt argument from lle_free() lle callback since it was unused. * Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting. * Switch to per-af hashing code. * Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method. Update description from these functions. * Use unified lltable_free_entry() function instead of per-af one. Reviewed by: ae	2015-08-10 12:03:59 +00:00
kp	120a48fab6	tcp_reass_zone is not a VNET variable. This fixes a panic during 'sysctl -a' on VIMAGE kernels. The tcp_reass_zone variable is not VNET_DEFINE() so we can not mark it as a VNET variable (with CTLFLAG_VNET).	2015-08-09 19:07:24 +00:00
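
The TSO arithmetic quoted in commit ac0c211c77 can be checked with a short sketch. The function names below are hypothetical, and the 66-byte header length (14 bytes Ethernet, 20 bytes IPv4, 32 bytes TCP with options) is an assumption for illustration; it does not appear in the commit message. The sketch only reproduces the commit's calculation and is not code from the FreeBSD stack.

```python
def tso_payload_remainder(tso_limit: int, mtu: int, payload: int) -> int:
    """Remainder of (tso_limit - mtu) mod payload, as computed in the commit message."""
    return (tso_limit - mtu) % payload


def full_segments(budget: int, payload: int) -> int:
    """Number of whole payload-sized segments that fit in the given byte budget."""
    return budget // payload


# Figures taken from the commit message: 65536-byte TSO limit,
# 1500-byte MTU, 1448 bytes of TCP payload per IPv4 packet.
TSO_LIMIT, MTU, PAYLOAD = 65536, 1500, 1448

# Assumed header size (not from the commit): Ethernet + IPv4 + TCP with options.
HEADER_LEN = 66

remainder = tso_payload_remainder(TSO_LIMIT, MTU, PAYLOAD)
print(remainder)  # 324, matching the commit message

# Adding or removing the headers from the budget leaves the count of
# full payload segments unchanged, which is the point the commit makes.
base = TSO_LIMIT - MTU
for budget in (base - HEADER_LEN, base, base + HEADER_LEN):
    print(budget, full_segments(budget, PAYLOAD))
```

With these numbers the segment count is the same for all three budgets, so recovering the header bytes would not yield an extra segment of TCP payload.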


5252 Commits