freebsd-skq

Author	SHA1	Message	Date
tuexen	f1eb961773	Move from early SSN assignment to late SSN assignment. This doesn't change functionality, but makes upcoming change much easier. Developed with rrs@ at the IETF 85. MFC after: 1 week	2012-11-05 20:55:17 +00:00
andre	b51779d7ea	Back out r242262. The simplified window change/update logic wasn't complete and ready for production use. PR: kern/173309	2012-11-05 09:13:06 +00:00
ae	4354018055	Remove the recently added sysctl variable net.pfil.forward. Instead, add protocol specific mbuf flags M_IP_NEXTHOP and M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup only when this flag is set. Suggested by: andre	2012-11-02 01:20:55 +00:00
tuexen	139b791e20	Whitespace changes due to upstream integration of SCTP changes in the FreeBSD code base.	2012-10-29 20:47:32 +00:00
tuexen	bd5ecc606d	Add braces (as used elsewhere in the SCTP code).	2012-10-29 20:44:29 +00:00
tuexen	02bbac6d05	Use ntohs() and htons() in correct order. However, this doesn't change functionality.	2012-10-29 20:42:48 +00:00
andre	844d4d2472	Forced commit to provide the correct commit message to r242251: Defer sending an independent window update if a delayed ACK is pending saving a packet. The window update then gets piggy-backed on the next already scheduled ACK. Added grammar fixes as well. MFC after: 2 weeks	2012-10-29 13:16:33 +00:00
andre	abf1521166	Define the delayed ACK timeout value directly as hz/10 instead of obfuscating it by going through PR_FASTHZ. No functional change. MFC after: 2 weeks	2012-10-29 12:17:02 +00:00
andre	07dc51f3cc	If the user has closed the socket then drop a persisting connection after a much reduced timeout. Typically web servers close their sockets quickly under the assumption that the TCP connection goes away as well. That is not entirely true however. If the peer closed the window we're going to wait for a long time with lots of data in the send buffer. MFC after: 2 weeks	2012-10-28 19:58:20 +00:00
andre	b824892b57	Increase the initial CWND to 10 segments as defined in IETF TCPM draft-ietf-tcpm-initcwnd-05. It explains why the increased initial window improves the overall performance of many web services without risking congestion collapse. As long as it remains a draft it is placed under a sysctl marking it as experimental: net.inet.tcp.experimental.initcwnd10 = 1 When it becomes an official RFC soon the sysctl will be changed to the RFC number and moved to net.inet.tcp. This implementation differs from the RFC draft in that it is a bit more conservative in the case of packet loss on SYN or SYN|ACK because we haven't reduced the default RTO to 1 second yet. Also the restart window isn't yet increased as allowed. Both will be adjusted with upcoming changes. It is enabled by default. In Linux it is enabled since kernel 3.0. MFC after: 2 weeks	2012-10-28 19:47:46 +00:00
andre	36473a548b	Update comment to reflect the change made in r242263. MFC after: 2 weeks	2012-10-28 19:22:18 +00:00
andre	ab8a697d0a	Add SACK_PERMIT to the list of TCP options that are switched off after retransmitting a SYN three times. MFC after: 2 weeks	2012-10-28 19:20:23 +00:00
andre	b21f6ebbaa	Simplify and enhance the window change/update acceptance logic, especially in the presence of bi-directional data transfers. snd_wl1 tracks the right edge, including data in the reassembly queue, of valid incoming data. This makes it like rcv_nxt plus reassembly. It never goes backwards to prevent older, possibly reordered segments from updating the window. snd_wl2 tracks the left edge of sent data. This makes it a duplicate of snd_una. However joining them right now is difficult due to separate update dependencies in different places in the code flow. snd_wnd tracks the current send window advertised by the peer. In tcp_output() the effective window is calculated by subtracting the already in-flight data, snd_nxt less snd_una, from it. ACK's become the main clock of window updates and will always update the window when the left edge of what we sent is advanced. The ACK clock is the primary signaling mechanism in ongoing data transfers. This works reliably even in the presence of reordering, reassembly and retransmitted segments. The ACK clock is most important because it determines how much data we are allowed to inject into the network. Zero window updates, which get us out of persistence mode, are crucial. Here a segment that neither moves ACK nor SEQ but enlarges WND is accepted. When the ACK clock is not active (that is we're not or no longer sending any data) any segment that moves the extended right SEQ edge, including out-of-order segments, updates the window. This gives us updates especially during ping-pong transfers where the peer isn't done consuming the already acknowledged data from the receive buffer while responding with data. The SSH protocol is a prime candidate to benefit from the improved bi-directional window update logic as it has its own windowing mechanism on top of TCP and is frequently sending back protocol ACK's. Tcpdump provided by: darrenr Tested by: darrenr MFC after: 2 weeks	2012-10-28 19:16:22 +00:00
andre	ee161fee4d	For retransmits of SYN|ACK from the syncache use the slightly more aggressive special tcp_syn_backoff[] retransmit schedule instead of the normal tcp_backoff[] schedule for established connections. MFC after: 2 weeks	2012-10-28 19:02:07 +00:00
andre	891f33973f	When retransmitting SYN in TCPS_SYN_SENT state use TCPTV_RTOBASE, the default retransmit timeout, as base to calculate the backoff time until next try instead of the TCP_REXMTVAL() macro which only works correctly when we already have measured an actual RTT+RTTVAR. Before it would cause the first retransmit at RTOBASE, the next four at the same time (!) about 200ms later, and then another one again RTOBASE later. MFC after: 2 weeks	2012-10-28 18:56:57 +00:00
andre	06a013a7a6	Remove bogus 'else' in #ifdef that prevented the rttvar from being reset in tcp_timer_rexmt() on retransmit for IPv6 sessions. MFC after: 2 weeks	2012-10-28 18:45:04 +00:00
andre	ff213d7494	Allow arbitrary MSS sizes and don't mind about the cluster size anymore. We've got more cluster sizes for quite some time now and the originally imposed limits and the previously codified thoughts on efficiency gains are no longer true. MFC after: 2 weeks	2012-10-28 18:33:52 +00:00
andre	2d42646150	Change the syncache count reporting the current number of entries from an unprotected u_int that reports garbage on SMP to a function based sysctl obtaining the current value from UMA. Also read back the actual cache_limit after page size rounding by UMA. PR: kern/165879 MFC after: 2 weeks	2012-10-28 18:07:34 +00:00
andre	df63a1d6ea	Simplify implementation of net.inet.tcp.reass.maxsegments and net.inet.tcp.reass.cursegments. MFC after: 2 weeks	2012-10-28 17:59:46 +00:00
andre	a04f01c8df	Prevent a flurry of forced window updates when an application is doing small reads on a (partially) filled receive socket buffer. Normally one would send a window update every time the available space in the socket buffer increases by two times MSS. This leads to a flurry of window updates that do not provide any meaningful new information to the sender. There still is available space in the window and the sender can continue sending data. All window updates then get carried by the regular ACKs. Only when the socket buffer was (almost) full and the window closed accordingly does a window update deliver new information and allow the sender to start sending more data again. Send window updates only every two MSS when the socket buffer has less than 1/8 space available, or the available space in the socket buffer increased by 1/4 its full capacity, or the socket buffer is very small. The next regular data ACK will carry and report the exact window size again. Reported by: sbruno Tested by: darrenr Tested by: Darren Baginski PR: kern/116335 MFC after: 2 weeks	2012-10-28 17:40:35 +00:00
andre	afe4bf4cff	When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to reduce the initial CWND to one segment. This reduction got lost some time ago due to a change in initialization ordering. Additionally in tcp_timer_rexmt() avoid entering fast recovery when we're still in TCPS_SYN_SENT state. MFC after: 2 weeks	2012-10-28 17:30:28 +00:00
andre	79dbdb05fd	When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to reduce the initial CWND to one segment. This reduction got lost some time ago due to a change in initialization ordering. Additionally in tcp_timer_rexmt() avoid entering fast recovery when we're still in TCPS_SYN_SENT state. MFC after: 2 weeks	2012-10-28 17:25:08 +00:00
andre	5589a42386	Adjust the initial default CWND upon connection establishment to the new and increased values specified by RFC5681 Section 3.1. The even larger initial CWND per RFC3390, if enabled, is not affected. MFC after: 2 weeks	2012-10-28 17:16:09 +00:00
glebius	f79061ff05	o Remove last argument to ip_fragment(), and obtain all needed information on checksums directly from mbuf flags. This simplifies code. o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in hardware. Some drivers may not announce CSUM_IP in their if_hwassist, although they try to do checksums if CSUM_IP is set on the mbuf. Example is em(4). o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP. After this change CSUM_DELAY_IP vanishes from the stack. Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>	2012-10-26 21:06:33 +00:00
ae	71112b5a8e	Remove the IPFIREWALL_FORWARD kernel option and make possible to turn on the related functionality in the runtime via the sysctl variable net.pfil.forward. It is turned off by default. Sponsored by: Yandex LLC Discussed with: net@ MFC after: 2 weeks	2012-10-25 09:39:14 +00:00
glebius	3d11eb1465	After r241923 the updated ip_len no longer needed.	2012-10-25 09:02:21 +00:00
glebius	a5c4b7118d	Fix error in r241913 that had broken fragment reassembly.	2012-10-25 09:00:57 +00:00
glebius	285432154c	Use ip_stripoptions() instead of handrolled version.	2012-10-23 10:30:09 +00:00
glebius	e4588fbb85	Simplify ip_stripoptions() reducing number of intermediate variables.	2012-10-23 10:29:31 +00:00
glebius	fea857f2a8	Do not reduce ip_len by size of IP header in the ip_input() before passing a packet to protocol input routines. For several protocols this mean that now protocol needs to do subtraction itself, and for another half this means that we do not need to add header length back to the packet. Make ip_stripoptions() to adjust ip_len, since now we enter this function with a packet header whose ip_len does represent length of entire packet, not payload only.	2012-10-23 08:33:13 +00:00
delphij	3948ce713c	Remove __P. Submitted by: kevlo Reviewed by: md5(1) MFC after: 2 months	2012-10-22 21:49:56 +00:00
glebius	5cc3ac5902	Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert to host byte order before passing a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtracts header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>	2012-10-22 21:09:03 +00:00
zont	5d9ce2d3e8	- Update cachelimit after hashsize and bucketlimit were set. Reported by: az Reviewed by: melifaro Approved by: kib (mentor) MFC after: 1 week	2012-10-19 14:00:03 +00:00
andre	34a9a386cb	Mechanically remove the last stray remains of spl* calls from net/. They have been Noop's for a long time now.	2012-10-18 13:57:24 +00:00
emaste	da1e109451	Avoid potential bad pointer dereference. Previously RuleAdd would leave entry->la unset for the first entry in the proxyList. Sponsored by: ADARA Networks MFC After: 1 week	2012-10-17 20:23:07 +00:00
glebius	eecb11a14e	We don't need to convert ip6_len to host byte order before ip6_output(), the IPv6 stack is working in net byte order. The reason this code worked before is that ip6_output() doesn't look at ip6_plen at all and recalculates it based on mbuf length.	2012-10-15 07:57:55 +00:00
glebius	b6c0be02b6	Fix a miss from r241344: in ip_mloopback() we need to go to net byte order prior to calling in_delayed_cksum(). Reported by: Olivier Cochard-Labbe <olivier cochard.me>	2012-10-14 15:08:07 +00:00
melifaro	85ee5d74ce	Cleanup documentation: cloning route support has been removed in r186119. MFC after: 2 weeks	2012-10-13 09:31:01 +00:00
glebius	1e75ca6470	Revert fixup of ip_len from r241480. Now stack isn't yet ready for that change.	2012-10-12 09:32:38 +00:00
glebius	9879a454af	In ip_stripoptions(): - Remove unused argument and incorrect comment. - Fixup ip_len after stripping.	2012-10-12 09:24:24 +00:00
melifaro	02e40e1b73	Do not check if found IPv4 rte is dynamic if net.inet.icmp.drop_redirect is enabled. This eliminates one mtx_lock() per each routing lookup thus improving performance in several cases (routing to directly connected interface or routing to default gateway). Icmp redirects should not be used to provide routing direction nowadays, even for end hosts. Routers should not use them too (and this is explicitly restricted in IPv6, see RFC 4861, clause 8.2). Current commit changes rnh_matchaddr function to 'stock' rn_match (and back) for every AF_INET routing table in given VNET instance on drop_redirect sysctl change. This change is part of bigger patch eliminating rte locking. Sponsored by: Yandex LLC MFC after: 2 weeks	2012-10-10 19:06:11 +00:00
kevlo	ceb08698f2	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
kevlo	8747a46991	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
glebius	9086143e8c	After r241245 it appeared that in_delayed_cksum(), which still expects host byte order, was sometimes called with net byte order. Since we are moving towards net byte order throughout the stack, the function was converted to expect net byte order, and its consumers fixed appropriately: - ip_output(), ipfilter(4) not changed, since already call in_delayed_cksum() with header in net byte order. - divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order there and back. - mrouting code and IPv6 ipsec now need to switch byte order there and back, but I hope, this is temporary solution. - In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum(). - pf_route() catches up on r241245 changes to ip_output().	2012-10-08 08:03:58 +00:00
glebius	ef52d4e591	No reason to play with IP header before calling sctp_delayed_cksum() with offset beyond the IP header.	2012-10-08 07:21:32 +00:00
glebius	f3a0231bff	A step in resolving mess with byte ordering for AF_INET. After this change: - All packets in NETISR_IP queue are in net byte order. - ip_input() is entered in net byte order and converts packet to host byte order right _after_ processing pfil(9) hooks. - ip_output() is entered in host byte order and converts packet to net byte order right _before_ processing pfil(9) hooks. - ip_fragment() accepts and emits packet in net byte order. - ip_forward(), ip_mloopback() use host byte order (untouched actually). - ip_fastforward() no longer modifies packet at all (except ip_ttl). - Swapping of byte order there and back removed from the following modules: pf(4), ipfw(4), enc(4), if_bridge(4). - Swapping of byte order added to ipfilter(4), based on __FreeBSD_version - __FreeBSD_version bumped. - pfil(9) manual page updated. Reviewed by: ray, luigi, eri, melifaro Tested by: glebius (LE), ray (BE)	2012-10-06 10:02:11 +00:00
glebius	a73c365c3b	There is a complex race in in_pcblookup_hash() and in_pcblookup_group(). Both functions need to obtain lock on the found PCB, and they can't do classic inter-lock with the PCB hash lock, due to lock order reversal. To keep the PCB stable, these functions put a reference on it and after PCB lock is acquired drop it. If the reference was the last one, this means we've raced with in_pcbfree() and the PCB is no longer valid. This approach works okay only if we are acquiring writer-lock on the PCB. In case of reader-lock, the following scenario can happen: - 2 threads locate pcb, and do in_pcbref() on it. - These 2 threads drop the inp hash lock. - Another thread comes to delete pcb via in_pcbfree(), it obtains hash lock, does in_pcbremlists(), drops hash lock, and runs in_pcbrele_wlocked(), which doesn't free the pcb due to two references on it. Then it unlocks the pcb. - 2 aforementioned threads acquire reader lock on the pcb and run in_pcbrele_rlocked(). One gets 1 from in_pcbrele_rlocked() and continues, second gets 0 and considers pcb freed, returns. - The thread that got 1 continues working with detached pcb, which later leads to panic in the underlying protocol level. To plumb that problem an additional INPCB flag is introduced - INP_FREED. We check for that flag in the in_pcbrele_rlocked() and if it is set, we pretend that that was the last reference. Discussed with: rwatson, jhb Reported by: Vladimir Medvedkin <medved rambler-co.ru>	2012-10-02 12:03:02 +00:00
glebius	a16cfb3463	carp_send_ad() should never return without rescheduling next run.	2012-09-29 05:52:19 +00:00
glebius	b83730f01b	Fix bug in TCP_KEEPCNT setting, which slipped in in the last round of reviewing of r231025. Unlike other options from this family TCP_KEEPCNT doesn't specify time interval, but a count, thus parameter supplied doesn't need to be multiplied by hz. Reported & tested by: amdmi3	2012-09-27 07:13:21 +00:00
tuexen	7e065d782a	Whitespace change. MFC after: 3 days	2012-09-23 07:43:10 +00:00
tuexen	0f22c3a78e	Declare a static function as such. MFC after: 3 days	2012-09-23 07:23:18 +00:00
tuexen	fa9edb685b	Fix a bug related to handling Re-config chunks. It is not true that the association can be removed if the socket is gone. MFC after: 3 days	2012-09-22 22:04:17 +00:00
tuexen	392169f5f0	Small cleanups. No functional change. MFC after: 10 days	2012-09-22 14:39:20 +00:00
kevlo	98ccaea0f9	Fix typo: s/pakcet/packet	2012-09-20 03:29:43 +00:00
eadler	752dbba611	s/teh/the/g Approved by: cperciva MFC after: 3 days	2012-09-14 21:59:55 +00:00
tuexen	68d0c9b61d	Small cleanups. No functional change. MFC after: 10 days	2012-09-14 18:32:20 +00:00
glebius	0ccf4838d7	o Create directory sys/netpfil, where all packet filters should reside, and move there ipfw(4) and pf(4). o Move most modified parts of pf out of contrib. Actual movements: sys/contrib/pf/net/*.c -> sys/netpfil/pf/ sys/contrib/pf/net/*.h -> sys/net/ contrib/pf/pfctl/*.c -> sbin/pfctl contrib/pf/pfctl/*.h -> sbin/pfctl contrib/pf/pfctl/pfctl.8 -> sbin/pfctl contrib/pf/pfctl/*.4 -> share/man/man4 contrib/pf/pfctl/*.5 -> share/man/man5 sys/netinet/ipfw -> sys/netpfil/ipfw The arguable movement is pf/net/*.h -> sys/net. There are future plans to refactor pf includes, so I decided not to break things twice. Not modified bits of pf left in contrib: authpf, ftp-proxy, tftp-proxy, pflogd. The ipfw(4) movement is planned to be merged to stable/9, to make head and stable match. Discussed with: bz, luigi	2012-09-14 11:51:49 +00:00
tuexen	3e31feb073	Whitespace changes. MFC after: 10 days	2012-09-09 08:14:04 +00:00
tuexen	5f95805e1a	Whitespace cleanup. MFC after: 10 days	2012-09-08 20:54:54 +00:00
glebius	5190d38ee3	Merge the projects/pf/head branch, that was worked on for last six months, into head. The most significant achievements in the new code: o Fine grained locking, thus much better performance. o Fixes to many problems in pf, that were specific to FreeBSD port. New code doesn't have that many ifdefs and much less OpenBSDisms, thus is more attractive to our developers. Those interested in details, can browse through SVN log of the projects/pf/head branch. And for reference, here is exact list of revisions merged: r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330, r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656, r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782, r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868, r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223, r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456, r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505, r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168, r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230, r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398, r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548, r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672, r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169, r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442, r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522, r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661, r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212. I'd like to thank people who participated in early testing: Tested by: Florian Smeets <flo freebsd.org> Tested by: Chekaluk Vitaly <artemrts ukr.net> Tested by: Ben Wilber <ben desync.com> Tested by: Ian FREISLICH <ianf cloudseed.co.za>	2012-09-08 06:41:54 +00:00
tuexen	001c4aa6c4	Don't include a structure containing a flexible array in another structure. MFC after: 10 days	2012-09-07 13:36:42 +00:00
tuexen	53b4991234	Get rid of a gcc'ism. MFC after: 10 days	2012-09-06 07:03:56 +00:00
tuexen	1f0bc9debb	Using %p in a format string requires a void *. MFC after: 10 days	2012-09-05 18:52:01 +00:00
tuexen	924bc7cbef	Consistently use the size of a variable. This helps to keep the code simpler for the userland implementation. MFC after: 3 days	2012-09-04 22:45:00 +00:00
tuexen	a58740bf0b	Whitespace change. MFC after: 3 days	2012-09-04 22:40:49 +00:00
melifaro	1fbae66b6e	Introduce new link-layer PFIL hook V_link_pfil_hook. Merge ether_ipfw_chk() and part of bridge_pfil() into unified ipfw_check_frame() function called by PFIL. This change was suggested by rwatson? @ DevSummit. Remove ipfw headers from ether/bridge code since they are unneeded now. Note this change introduces some (temporary) performance penalty since PFIL read lock has to be acquired for every link-level packet. MFC after: 3 weeks	2012-09-04 19:43:26 +00:00
glebius	9b72c7eaa7	Provide a sysctl switch that allows to install ARP entries with multicast bit set. FreeBSD refuses to install such entries since 9.0, and this broke installations running Microsoft NLB, which are violating standards. Tested by: Tarasov Oleg <oleg_tarasov sg-tea.com>	2012-09-03 14:29:28 +00:00
tuexen	03552c901b	Fix a typo which results in RTT to be off by a factor of 10, if the RTT is larger than 1 second. MFC after: 3 days	2012-09-02 12:37:30 +00:00
eadler	bb5f6cf89c	Mark the ipfw interface type as not being ether. This fixes an issue where uuidgen tried to obtain a ipfw device's mac address which was always zero. PR: 170460 Submitted by: wxs Reviewed by: bdrewery Reviewed by: delphij Approved by: cperciva MFC after: 1 week	2012-09-01 23:33:49 +00:00
rrs	4952c1e53e	This small change takes care of a race condition that can occur when both sides close at the same time. If that occurs, without this fix the connection enters FIN1 on both sides and they will forever send FIN|ACK at each other until the connection times out. This is because we stopped processing the FIN|ACK and thus did not advance the sequence and so never ACK'd each other's FIN. This fix adjusts it so we do process the FIN properly and the race goes away ;-) MFC after: 1 month	2012-08-25 09:26:37 +00:00
np	7a7bbaad5a	Correctly handle the case where an inp has already been dropped by the time the TOE driver reports that an active open failed. toe_connect_failed is supposed to handle this but it should be provided the inpcb instead of the tcpcb which may no longer be around.	2012-08-21 18:09:33 +00:00
rrs	09ab09a1b5	Though I disagree, I concede to jhb & Rui. Note that we still have a problem with this whole structure of locks and in_input.c [it does not lock which it should not, but this can lead to crashes]. (I have seen it in our SQA testbed.. besides the one with a refcnt issue that I will have SQA work on next week ;-)	2012-08-19 11:54:02 +00:00
rrs	1bcd97d239	Ok jhb, lets move the ifa_free() down to the bottom to assure that all tables and such are removed before we start to free. This won't protect the Hash in ip_input.c but in theory should protect any other uses that do use locks. MFC after: 1 week (or more)	2012-08-17 05:51:46 +00:00
lstewart	71de2d67ba	The TCP PAWS fix for kernels with fast tick rates (r231767) changed the TCP timestamp related stack variables to reference ms directly instead of ticks. The h_ertt(4) Khelp module relies on TCP timestamp information in order to calculate its enhanced RTT estimates, but was not updated as part of r231767. Consequently, h_ertt has not been calculating correct RTT estimates since r231767 was committed, which in turn broke all delay-based congestion control algorithms because they rely on the h_ertt RTT estimates. Fix the breakage by switching h_ertt to use tcp_ts_getticks() in place of all previous uses of the ticks variable. This ensures all timestamp related variables in h_ertt use the same units as the TCP stack and therefore results in meaningful comparisons and RTT estimate calculations. Reported & tested by: Naeem Khademi (naeemk at ifi uio no) Discussed with: bz MFC after: 3 days	2012-08-17 01:49:51 +00:00
rrs	7c7c85dcac	It's never a good idea to double free the same address. MFC after: 1 week (after the other commits ahead of this gets MFC'd)	2012-08-16 17:55:16 +00:00
luigi	1f3be6fa90	s/lenght/length/ in comments	2012-08-07 07:52:25 +00:00
luigi	eed7c1d3a5	move functions outside the SYSBEGIN/SYSEND block (SYSBEGIN/SYSEND are specific to ipfw/dummynet and are used to emulate sysctl on platforms that do not have them, and they work by creating an array which contains all the sysctl-ed symbols.)	2012-08-06 11:02:23 +00:00
luigi	b53e8390d6	use FREE_PKT instead of m_freem to free an mbuf. The former is the standard form used in ipfw/dummynet, so that it is easier to remap it to different memory managers depending on the platform.	2012-08-06 10:50:43 +00:00
tuexen	af3a9a01e1	Fix a bug found by dim@: Don't use an uninitialized variable, if INVARIANTS is on and an illegal packet with destination 0 is received. MFC after: 3 days X-MFC with: 238003	2012-08-06 10:50:23 +00:00
trociny	6bade2af3b	In tcp timers, check INP_DROPPED flag a little later, after callout_deactivate(), so if INP_DROPPED is set we return with the timer active flag cleared. For me this fixes negative keep timer values reported by `netstat -x' for connections in CLOSE state. Approved by: net (silence) MFC after: 2 weeks	2012-08-05 17:30:17 +00:00
tuexen	f3596e49d4	Fix a refcount issue. The called function only decrements if stcb is NULL. MFC after: 3 days Discussed with: rrs	2012-08-05 10:47:18 +00:00
tuexen	3ad906801b	Fix a bug reported by Simon L. B. Nielsen: If an SCTP endpoint receives an ASCONF with a wildcard lookup address and incorrect verification tag, the system crashes. MFC after: 3 days.	2012-08-04 20:40:36 +00:00
tuexen	09a1e2c3bc	Testing an interface property should depend on the interface, not on an address. MFC after: 3 days	2012-08-04 08:03:30 +00:00
glebius	abf245020a	Fix races between in_lltable_prefix_free(), lla_lookup(), llentry_free() and arptimer(): o Use callout_init_rw() for lle timeout, this allows us safely disestablish them. - This allows us to simplify the arptimer() and make it race safe. o Consistently use ifp->if_afdata_lock to lock access to linked lists in the lle hashes. o Introduce new lle flag LLE_LINKED, which marks an entry that is attached to the hash. - Use LLE_LINKED to avoid double unlinking via consequent calls to llentry_free(). - Mark lle with LLE_DELETED via |= operation instead of =, so that other flags won't be lost. o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more consistent and provide more informative KASSERTs. The patch is a collaborative work of all submitters and myself. PR: kern/165863 Submitted by: Andrey Zonov <andrey zonov.org> Submitted by: Ryan Stone <rysto32 gmail.com> Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>	2012-08-02 13:57:49 +00:00
luigi	55897c521e	replace __unused with a portable construct; fix a couple of signed/unsigned warnings.	2012-08-02 12:45:13 +00:00
luigi	15efb3237e	replace inet_ntoa_r with the more standard inet_ntop(). As discussed on -current, inet_ntoa_r() is non standard, has different arguments in userspace and kernel, and almost unused (no clients in userspace, only net/flowtable.c, net/if_llatbl.c, netinet/in_pcb.c, netinet/tcp_subr.c in the kernel)	2012-08-01 18:52:07 +00:00
luigi	bf14eed582	add a cast to avoid a signed/unsigned warning (to be removed when we will have TUNABLE_UINT constructors)	2012-08-01 18:49:00 +00:00
glebius	588de42f27	Some more whitespace cleanup.	2012-08-01 09:00:26 +00:00
glebius	53cb168f80	Some style(9) and whitespace changes. Together with: Andrey Zonov <andrey zonov.org>	2012-07-31 11:31:12 +00:00
luigi	bf32267c00	nobody uses this file except the userspace ipfw code, but the cast of a pointer to an integer needs a cast to prevent a warning for size mismatch. MFC after: 1 week	2012-07-31 08:04:49 +00:00
tuexen	df16a3505f	Fix the sctp_sockstore union such that userland programs don't depend on INET and/or INET6 to be defined and in-tune with how the kernel was compiled. MFC after: 3 days Discussed with: rrs	2012-07-26 08:10:29 +00:00
bz	d78628df35	Fix a problem when CARP is enabled on the interface for IPv4 but not for IPv6. The current checks in nd6_nbr.c along with the old version will result in ifa being NULL and subsequently the packet will be dropped. This prevented NS/NA, from working and with that IPv6. Now return the ifa from the carp lookup function in two cases: 1) if the address matches, is a carp address, and we are MASTER (as before), 2) if the address matches but it is not a carp address at all (new). Reported by: Peter Wemm (new Y! FreeBSD cluster, eating our own dogfood) Tested on: New Y! FreeBSD cluster machines Reviewed by: glebius	2012-07-25 12:14:39 +00:00
rwatson	bb5e5ce48a	Update some stale comments regarding tcbinfo locking in the TCP input path: read locks on tcbinfo are no longer used, so won't happen. No functional change. MFC after: 3 days	2012-07-22 17:31:36 +00:00
glebius	3b4ff3bafb	Plug a reference leak: before doing 'goto again' we need to unref ia->ia_ifa if there is any. Submitted by: Andrey Zonov <andrey zonov.org>	2012-07-18 08:58:30 +00:00
glebius	636d3aa68f	When traversing global in_ifaddr list in the IFP_TO_IA() macro, we need to obtain IN_IFADDR_RLOCK().	2012-07-18 08:41:00 +00:00
tuexen	d0f5f8dadb	Fix a refcount bug when freeing an association. While there: Change code to be consistent. Discussed with rrs@. MFC after: 3 days	2012-07-17 13:03:47 +00:00
glebius	b5cd2a8e46	If ip_output() returns EMSGSIZE to tcp_output(), then the latter calls tcp_mtudisc(), which in its turn may call tcp_output(). Under certain conditions (must admit they are very special) an infinite recursion can happen. To avoid recursion we can pass struct route to ip_output() and obtain correct mtu. This allows us not to use tcp_mtudisc() but call tcp_mss_update() directly. PR: kern/155585 Submitted by: Andrey Zonov <andrey zonov.org> (original version of patch)	2012-07-16 07:08:34 +00:00
tuexen	2357a49326	Changes which improve compilation if neither INET nor INET6 is defined. MFC after: 3 days	2012-07-15 20:16:17 +00:00
tuexen	5895ece053	#ifdef INET and INET6 consistently. This also fixes a bug, where it was done wrong. MFC after: 3 days	2012-07-15 11:04:49 +00:00
tuexen	86ea1d09c9	Provide the correct notification type (SCTP_SEND_FAILED_EVENT) for unsent messages. MFC after: 3 days	2012-07-14 21:25:14 +00:00

4554 Commits