Commit Graph

4607 Commits

Author SHA1 Message Date
Michael Tuexen
f0d44a49a0 Make sure that received packets for removed addresses are handled
consistently. While there, make variable names consistent.

MFC after: 3 days
2013-02-10 19:57:19 +00:00
Michael Tuexen
a1cb341b5d Cleanup the handling of address scopes. Announce in the INIT/INIT-ACK
only the supported address types. While there, do some whitespace
cleanups.

MFC after: 1 week
2013-02-09 17:26:14 +00:00
Michael Tuexen
c39cfa1f7e Fix a bug where HEARTBEATs were still sent in SHUTDOWN_SENT or
SHUTDOWN_ACK_SENT state. While there, make the corresponding
code consistent.

MFC after: 1 week
2013-02-09 08:27:08 +00:00
John Baldwin
0d25fab44d Add placeholder constants to reserve a portion of the socket option
name space for use by downstream vendors to add custom options.

MFC after:	2 weeks
2013-02-01 15:32:20 +00:00
Andre Oppermann
cda3447bb0 uma_zone_set_max() directly returns the rounded effective zone
limit.  Use the return value directly instead of doing a second
uma_zone_set_max() step.

MFC after:	1 week
2013-02-01 14:21:09 +00:00
Gleb Smirnoff
498944374f - Move AUTHORS and ACKNOWLEDGEMENTS to the end of the page.
- Add myself to list of authors.
2013-01-31 10:29:22 +00:00
Gleb Smirnoff
9711a168b9 Retire struct sockaddr_inarp.
Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.

New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.

The structure declaration is left for compatibility with
third party software, but in tree code no longer use it.

Reviewed by:	ru, andre, net@
2013-01-31 08:55:21 +00:00
Gleb Smirnoff
ea26ed7eea Utilize m_get2() to get mbuf of appropriate size. 2013-01-30 18:40:19 +00:00
Navdeep Parhar
adfaf8f6ad Add checks for SO_NO_OFFLOAD in a couple of places that I missed earlier
in r245915.
2013-01-26 01:41:42 +00:00
Navdeep Parhar
20be068c8a Teach toe_l2_resolve to resolve IPv6 destinations too.
Reviewed by:	bz@
2013-01-26 00:57:29 +00:00
Navdeep Parhar
4364ec0852 Move lle_event to if_llatbl.h
lle_event replaced arp_update_event after the ARP rewrite and ended up
in if_ether.h simply because arp_update_event used to be there too.
IPv6 neighbor discovery is going to grow lle_event support and this is a
good time to move it to if_llatbl.h.

The two in-tree consumers of this event - OFED and toecore - are not
affected.

Reviewed by:	bz@
2013-01-25 23:58:21 +00:00
Navdeep Parhar
460cf046c2 There is no need to call into the TOE driver twice in pru_rcvd (tod_rcvd
and then tod_output right after that).

Reviewed by:	bz@
2013-01-25 22:50:52 +00:00
Navdeep Parhar
464dfeb43f Add TCP_OFFLOAD hook in syncache_respond for IPv6 too, just like the one
that exists for IPv4.

Reviewed by:	bz@
2013-01-25 22:16:35 +00:00
Navdeep Parhar
b218348bc3 Teach toe_4tuple_check() to deal with IPv6 4-tuples too.
Reviewed by:	bz@
2013-01-25 20:45:24 +00:00
Navdeep Parhar
37cc0ecb1b Heed SO_NO_OFFLOAD.
MFC after:	1 week
2013-01-25 20:23:33 +00:00
Navdeep Parhar
5cd3dcaa25 Remove redundant test, we know inp_lport is 0.
MFC after:	1 week
2013-01-25 20:14:27 +00:00
John Baldwin
1d77fa5a26 Use decimal values for UDP and TCP socket options rather than hex to avoid
implying that these constants should be treated as bit masks.

Reviewed by:	net
MFC after:	1 week
2013-01-22 19:45:04 +00:00
Lawrence Stewart
5b648e797b Simplify and fix a bug in cc_ack_received()'s "are we congestion window limited"
logic (refer to [1] for associated discussion). snd_cwnd and snd_wnd are
unsigned long and on 64 bit hosts, min() will truncate them to 32 bits and could
therefore potentially corrupt the result (although under normal operation,
neither variable should legitmately exceed 32 bits).

[1] http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034297.html

Submitted by:	jhb
MFC after:	1 week
2013-01-22 09:44:21 +00:00
John Baldwin
6c0ef8957f Don't drop options from the third retransmitted SYN by default. If the
SYNs (or SYN/ACK replies) are dropped due to network congestion, then the
remote end of the connection may act as if options such as window scaling
are enabled but the local end will think they are not.  This can result in
very slow data transfers in the case of window scaling disagreements.

The old behavior can be obtained by setting the
net.inet.tcp.rexmit_drop_options sysctl to a non-zero value.

Reviewed by:	net@
MFC after:	2 weeks
2013-01-09 20:27:06 +00:00
Peter Wemm
8a1163e82f Temporarily revert rev 244678. This is causing loopback problems with
the lo (loopback) interfaces.
2013-01-03 10:21:28 +00:00
Michael Tuexen
11e03b3200 Some cleanups.
MFC after: 3 days
2012-12-27 08:10:58 +00:00
Michael Tuexen
72c123a8b4 Minor cleanups of debug messages.
MFC after: 3 days
2012-12-27 08:06:58 +00:00
Michael Tuexen
2c2e3218cb Fix a copy and paste error.
MFC after: 3 days
2012-12-27 08:02:58 +00:00
Gleb Smirnoff
c4d0697685 Garbage collect carp_cksum(). 2012-12-25 14:29:38 +00:00
Gleb Smirnoff
7951008b47 Change net.inet.carp.demotion sysctl to add the supplied value
to the current demotion factor instead of assigning it.

  This allows external scripts to control demotion factor together
with kernel in a raceless manner.
2012-12-25 14:08:13 +00:00
Gleb Smirnoff
e8db9937f3 Fix sysctl_handle_int() usage. Either arg1 or arg2 should be supplied,
and arg2 doesn't pass size of arg1.
2012-12-25 13:55:21 +00:00
Gleb Smirnoff
468e45f3bd The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify
all interested parties in case if interface flag IFF_UP has changed.

  However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.

  This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.

P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected instead
of pretended to be supported.
2012-12-25 13:01:58 +00:00
Gleb Smirnoff
3e6c8b5366 Minor style(9) changes:
- Remove declaration in initializer.
- Add empty line between logical blocks.
2012-12-24 21:35:48 +00:00
Gleb Smirnoff
b8056fae06 Fix !INET6 build after r244365. 2012-12-18 08:14:16 +00:00
Gleb Smirnoff
dd029d52fa Clear correct flag in INET6 case. 2012-12-18 08:09:44 +00:00
Andrey V. Elsukov
f491274582 Since we use different flags to detect tcp forwarding, and we share the
same code for IPv4 and IPv6 in tcp_input, we should check both
M_IP_NEXTHOP and M_IP6_NEXTHOP flags.

MFC after:	3 days
2012-12-17 20:55:33 +00:00
Gleb Smirnoff
b1ec2940af Fix problem in r238990. The LLE_LINKED flag should be tested prior to
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.

Reported by:	Ian FREISLICH <ianf clue.co.za>
2012-12-13 11:11:15 +00:00
Gleb Smirnoff
78a7880f64 Fix a crash in tcp_input(), that happens when mbuf has a fwd_tag on it,
but later after processing and freeing the tag, we need to jump back again
to the findpcb label. Since the fwd_tag pointer wasn't NULL we tried to
process and free the tag for second time.

Reported & tested by:	Pawel Tyll <ptyll nitronet.pl>
MFC after:		3 days
2012-12-12 17:41:21 +00:00
Michael Tuexen
cca6f4a8f3 Get it compiling without INET and INET6 support (mainly userland stack).
MFC after: 2 weeks
2012-12-08 15:11:09 +00:00
Pawel Jakub Dawidek
6acd596efb More warnings for zones that depend on the kern.ipc.maxsockets limit.
Obtained from:	WHEEL Systems
2012-12-08 12:51:06 +00:00
Michael Tuexen
b11f07d86c Use correct padding of the ABORT chunk in case of an user initiated
abort cause is used.

MFC after: 2 weeks
2012-12-08 09:50:38 +00:00
Michael Tuexen
3fb7827628 Ensure that the padding of the last parameter of an INIT chunk
is not included in the chunk length as required by RFC 4960.
While there, cleanup sctp_send_initiate().

MFC after: 2 weeks
2012-12-08 08:22:33 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Andre Oppermann
da2299c5c7 Remove unused and unnecessary CSUM_IP_FRAGS checksumming capability.
Checksumming the IP header of fragments is no different from doing
normal IP headers.

Discussed with:	yongari
MFC after:	1 week
2012-11-27 19:31:49 +00:00
Andre Oppermann
13feab8286 Add DELACK to list of timers.
MFC after:	1 week
2012-11-27 19:07:28 +00:00
Navdeep Parhar
825fd1e437 Make sure that tcp_timer_activate() correctly sees TCP_OFFLOAD (or not). 2012-11-27 06:42:44 +00:00
Alfred Perlstein
08373e0bc4 Auto size the tcbhashsize structure based on max sockets.
While here, also make the code that enforces power-of-two more
forgiving, instead of just resetting to 512, graciously round-down
to the next lower power of two.
2012-11-27 03:04:24 +00:00
Michael Tuexen
a50f0e3152 Add support for sctp_peeloff() also in the front states of the
association.

MFC after: 3 days
2012-11-26 16:44:03 +00:00
Michael Tuexen
e3976bb8d7 Find the endpoint for an incoming packet also if the endpoint
comes from sctp_peeloff().

MFC after: 3 days
2012-11-26 16:43:32 +00:00
Michael Tuexen
440da2d35b Allow shutdown() to be used on fds returned from sctp_peeloff().
MFC after: 3 days
2012-11-26 08:50:00 +00:00
Michael Tuexen
a3158782c2 Remove unused function.
MFC after: 1 week
2012-11-25 14:25:08 +00:00
Michael Tuexen
3a51a2647a Add support for SCTP/UDP/IPV6.
This completes the support of
http://tools.ietf.org/html/draft-ietf-tsvwg-sctp-udp-encaps

MFC after: 1 week
2012-11-17 20:04:04 +00:00
Michael Tuexen
325c8c46b1 Get the accounting working. We now have counters how many
chunks for each SCTP outgoing stream are in the send and
sent queue.
While there, improve the naming of NR-SACK related constants
recently introduced.

MFC after: 1 week
2012-11-16 19:39:10 +00:00
Roman Divacky
8252626fb4 Initialize hdrlen to 0 to avoid clang warning in NOINET case. 2012-11-10 10:41:00 +00:00
Bjoern A. Zeeb
ec89d0398b Cleanup some whitspace in this file to get it out of an upcoming patch.
MFC after:	10 days
2012-11-08 03:29:55 +00:00
Michael Tuexen
a7ad6026e0 Add per outgoing stream accounting for chunks in the send
and sent queue. This provides no functional change, but is
a preparation for an upcoming stream reset improvement.
Done with rrs@.

MFC after: 1 week
2012-11-07 22:11:38 +00:00
Michael Tuexen
2a4985847a Add some missing changes missed in the last commit.
MFC after: 1 week
X-MFC with: 242708
2012-11-07 21:25:32 +00:00
Michael Tuexen
98f2956c11 Improve PR-SCTP if used in combination with NR-SACK.
Based on work done by Mohammad Rajiullah.

MFC after: 1 week
2012-11-07 20:59:00 +00:00
Kevin Lo
0f5e7edc14 Fix typo; s/ouput/output 2012-11-07 07:00:59 +00:00
Mateusz Guzik
8e1e6e5f4a Fix possible spurious sbunlock in sctp_sorecvmsg.
Reviewed by:	tuexen
Approved by:	trasz (mentor)
MFC after:	3 days
2012-11-06 23:04:23 +00:00
Michael Tuexen
f3b05218ea Move from early SSN assignment to late SSN assignment.
This doesn't change functionality, but makes upcoming change
much easier.
Developed with rrs@ at the IETF 85.

MFC after: 1 week
2012-11-05 20:55:17 +00:00
Andre Oppermann
60ee3bb213 Back out r242262. The simplified window change/update logic wasn't
complete and ready for production use.

PR:	kern/173309
2012-11-05 09:13:06 +00:00
Andrey V. Elsukov
ffdbf9da3b Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.

Suggested by:	andre
2012-11-02 01:20:55 +00:00
Michael Tuexen
21f67da7c4 Whitespace changes due to upstream integration of SCTP changes in the
FreeBSD code base.
2012-10-29 20:47:32 +00:00
Michael Tuexen
24d4ce2c87 Add braces (as used elsewhere in the SCTP code). 2012-10-29 20:44:29 +00:00
Michael Tuexen
09c1c8563a Use ntohs() and htons() in correct order. However, this doesn't change
functionality.
2012-10-29 20:42:48 +00:00
Andre Oppermann
78f59b4bfd Forced commit to provide the correct commit message to r242251:
Defer sending an independent window update if a delayed ACK is pending
  saving a packet.  The window update then gets piggy-backed on the next
  already scheduled ACK.

Added grammar fixes as well.

MFC after:	2 weeks
2012-10-29 13:16:33 +00:00
Andre Oppermann
8d045dbdf3 Define the delayed ACK timeout value directly as hz/10 instead of
obfuscating it by going through PR_FASTHZ.  No functional change.

MFC after:	2 weeks
2012-10-29 12:17:02 +00:00
Andre Oppermann
322181c98e If the user has closed the socket then drop a persisting connection
after a much reduced timeout.

Typically web servers close their sockets quickly under the assumption
that the TCP connections goes away as well.  That is not entirely true
however.  If the peer closed the window we're going to wait for a long
time with lots of data in the send buffer.

MFC after:	2 weeks
2012-10-28 19:58:20 +00:00
Andre Oppermann
09440655fe Increase the initial CWND to 10 segments as defined in IETF TCPM
draft-ietf-tcpm-initcwnd-05. It explains why the increased initial
window improves the overall performance of many web services without
risking congestion collapse.

As long as it remains a draft it is placed under a sysctl marking it
as experimental:
 net.inet.tcp.experimental.initcwnd10 = 1
When it becomes an official RFC soon the sysctl will be changed to
the RFC number and moved to net.inet.tcp.

This implementation differs from the RFC draft in that it is a bit
more conservative in the case of packet loss on SYN or SYN|ACK because
we haven't reduced the default RTO to 1 second yet.  Also the restart
window isn't yet increased as allowed.  Both will be adjusted with
upcoming changes.

Is is enabled by default.  In Linux it is enabled since kernel 3.0.

MFC after:	2 weeks
2012-10-28 19:47:46 +00:00
Andre Oppermann
77339e1cdc Update comment to reflect the change made in r242263.
MFC after:	2 weeks
2012-10-28 19:22:18 +00:00
Andre Oppermann
c4ab59c1a1 Add SACK_PERMIT to the list of TCP options that are switched off after
retransmitting a SYN three times.

MFC after:	2 weeks
2012-10-28 19:20:23 +00:00
Andre Oppermann
79ce26a08c Simplify and enhance the window change/update acceptance logic,
especially in the presence of bi-directional data transfers.

snd_wl1 tracks the right edge, including data in the reassembly
queue, of valid incoming data.  This makes it like rcv_nxt plus
reassembly.  It never goes backwards to prevent older, possibly
reordered segments from updating the window.

snd_wl2 tracks the left edge of sent data.  This makes it a duplicate
of snd_una.  However joining them right now is difficult due to
separate update dependencies in different places in the code flow.

snd_wnd tracks the current advertized send window by the peer.  In
tcp_output() the effective window is calculated by subtracting the
already in-flight data, snd_nxt less snd_una, from it.

ACK's become the main clock of window updates and will always update
the window when the left edge of what we sent is advanced.  The ACK
clock is the primary signaling mechanism in ongoing data transfers.
This works reliably even in the presence of reordering, reassembly
and retransmitted segments.  The ACK clock is most important because
it determines how much data we are allowed to inject into the network.

Zero window updates get us out of persistence mode are crucial.  Here
a segment that neither moves ACK nor SEQ but enlarges WND is accepted.

When the ACK clock is not active (that is we're not or no longer
sending any data) any segment that moves the extended right SEQ edge,
including out-of-order segments, updates the window.  This gives us
updates especially during ping-pong transfers where the peer isn't
done consuming the already acknowledged data from the receive buffer
while responding with data.

The SSH protocol is a prime candidate to benefit from the improved
bi-directional window update logic as it has its own windowing
mechanism on top of TCP and is frequently sending back protocol ACK's.

Tcpdump provided by:	darrenr
Tested by:	darrenr
MFC after:	2 weeks
2012-10-28 19:16:22 +00:00
Andre Oppermann
024fd5b6bb For retransmits of SYN|ACK from the syncache use the slightly more
aggressive special tcp_syn_backoff[] retransmit schedule instead of
the normal tcp_backoff[] schedule for established connections.

MFC after:	2 weeks
2012-10-28 19:02:07 +00:00
Andre Oppermann
f4748ef5fb When retransmitting SYN in TCPS_SYN_SENT state use TCPTV_RTOBASE,
the default retransmit timeout, as base to calculate the backoff
time until next try instead of the TCP_REXMTVAL() macro which only
works correctly when we already have measured an actual RTT+RTTVAR.

Before it would cause the first retransmit at RTOBASE, the next
four at the same time (!) about 200ms later, and then another one
again RTOBASE later.

MFC after:	2 weeks
2012-10-28 18:56:57 +00:00
Andre Oppermann
602e8e45ee Remove bogus 'else' in #ifdef that prevented the rttvar from being reset
tcp_timer_rexmt() on retransmit for IPv6 sessions.

MFC after:	2 weeks
2012-10-28 18:45:04 +00:00
Andre Oppermann
4faaea5505 Allow arbitrary MSS sizes and don't mind about the cluster size anymore.
We've got more cluster sizes for quite some time now and the orginally
imposed limits and the previously codified thoughts on efficiency gains
are no longer true.

MFC after:	2 weeks
2012-10-28 18:33:52 +00:00
Andre Oppermann
f3a10d7954 Change the syncache count reporting the current number of entries
from an unprotected u_int that reports garbage on SMP to a function
based sysctl obtaining the current value from UMA.

Also read back the actual cache_limit after page size rounding by UMA.

PR:		kern/165879
MFC after:	2 weeks
2012-10-28 18:07:34 +00:00
Andre Oppermann
aafa0b4164 Simplify implementation of net.inet.tcp.reass.maxsegments and
net.inet.tcp.reass.cursegments.

MFC after:	2 weeks
2012-10-28 17:59:46 +00:00
Andre Oppermann
f62563d33c Prevent a flurry of forced window updates when an application is
doing small reads on a (partially) filled receive socket buffer.

Normally one would a send a window update every time the available
space in the socket buffer increases by two times MSS.  This leads
to a flurry of window updates that do not provide any meaningful
new information to the sender.  There still is available space in
the window and the sender can continue sending data.  All window
updates then get carried by the regular ACKs.  Only when the socket
buffer was (almost) full and the window closed accordingly a window
updates delivery new information and allows the sender to start
sending more data again.

Send window updates only every two MSS when the socket buffer
has less than 1/8 space available, or the available space in the
socket buffer increased by 1/4 its full capacity, or the socket
buffer is very small.  The next regular data ACK will carry and
report the exact window size again.

Reported by:	sbruno
Tested by:	darrenr
Tested by:	Darren Baginski
PR:		kern/116335
MFC after:	2 weeks
2012-10-28 17:40:35 +00:00
Andre Oppermann
4249614cb0 When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
reduce the initial CWND to one segment.  This reduction got lost
some time ago due to a change in initialization ordering.

Additionally in tcp_timer_rexmt() avoid entering fast recovery when
we're still in TCPS_SYN_SENT state.

MFC after:	2 weeks
2012-10-28 17:30:28 +00:00
Andre Oppermann
cf8f04f4c0 When SYN or SYN/ACK had to be retransmitted RFC5681 requires us to
reduce the initial CWND to one segment.  This reduction got lost
some time ago due to a change in initialization ordering.

Additionally in tcp_timer_rexmt() avoid entering fast recovery when
we're still in TCPS_SYN_SENT state.

MFC after:	2 weeks
2012-10-28 17:25:08 +00:00
Andre Oppermann
22efabd40c Adjust the initial default CWND upon connection establishment to the
new and increased values specified by RFC5681 Section 3.1.

The even larger initial CWND per RFC3390, if enabled, is not affected.

MFC after:	2 weeks
2012-10-28 17:16:09 +00:00
Gleb Smirnoff
078468ede4 o Remove last argument to ip_fragment(), and obtain all needed information
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
  hardware. Some driver may not announce CSUM_IP in theur if_hwassist,
  although try to do checksums if CSUM_IP set on mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
  After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by:	Sebastian Kuzminsky <seb lineratesystems.com>
2012-10-26 21:06:33 +00:00
Andrey V. Elsukov
c1de64a495 Remove the IPFIREWALL_FORWARD kernel option and make possible to turn
on the related functionality in the runtime via the sysctl variable
net.pfil.forward. It is turned off by default.

Sponsored by:	Yandex LLC
Discussed with:	net@
MFC after:	2 weeks
2012-10-25 09:39:14 +00:00
Gleb Smirnoff
a7f707cd37 After r241923 the updated ip_len no longer needed. 2012-10-25 09:02:21 +00:00
Gleb Smirnoff
b6fcf6f9f5 Fix error in r241913 that had broken fragment reassembly. 2012-10-25 09:00:57 +00:00
Gleb Smirnoff
9e2a372fd2 Use ip_stripoptions() instead of handrolled version. 2012-10-23 10:30:09 +00:00
Gleb Smirnoff
4937a6561f Simplify ip_stripoptions() reducing number of intermediate
variables.
2012-10-23 10:29:31 +00:00
Gleb Smirnoff
8ad458a471 Do not reduce ip_len by size of IP header in the ip_input()
before passing a packet to protocol input routines.
  For several protocols this mean that now protocol needs to
do subtraction itself, and for another half this means that
we do not need to add header length back to the packet.

  Make ip_stripoptions() to adjust ip_len, since now we enter
this function with a packet header whose ip_len does represent
length of entire packet, not payload only.
2012-10-23 08:33:13 +00:00
Xin LI
6f56329a25 Remove __P.
Submitted by:	kevlo
Reviewed by:	md5(1)
MFC after:	2 months
2012-10-22 21:49:56 +00:00
Gleb Smirnoff
8f134647ca Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

  After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

  After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by:	luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by:	ray, Olivier Cochard-Labbe <olivier cochard.me>
2012-10-22 21:09:03 +00:00
Andrey Zonov
32fe38f123 - Update cachelimit after hashsize and bucketlimit were set.
Reported by:	az
Reviewed by:	melifaro
Approved by:	kib (mentor)
MFC after:	1 week
2012-10-19 14:00:03 +00:00
Andre Oppermann
c9b652e3e8 Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.
2012-10-18 13:57:24 +00:00
Ed Maste
983731268c Avoid potential bad pointer dereference.
Previously RuleAdd would leave entry->la unset for the first entry in
the proxyList.

Sponsored by: ADARA Networks
MFC After: 1 week
2012-10-17 20:23:07 +00:00
Gleb Smirnoff
e76163a539 We don't need to convert ip6_len to host byte order before
ip6_output(), the IPv6 stack is working in net byte order.

The reason this code worked before is that ip6_output()
doesn't look at ip6_plen at all and recalculates it based
on mbuf length.
2012-10-15 07:57:55 +00:00
Gleb Smirnoff
347d90acff Fix a miss from r241344: in ip_mloopback() we need to go to
net byte order prior to calling in_delayed_cksum().

Reported by:	 Olivier Cochard-Labbe <olivier cochard.me>
2012-10-14 15:08:07 +00:00
Alexander V. Chernikov
3bff27cd67 Cleanup documentation: cloning route support has been removed in r186119.
MFC after:	2 weeks
2012-10-13 09:31:01 +00:00
Gleb Smirnoff
86b61e4748 Revert fixup of ip_len from r241480. Now stack isn't yet
ready for that change.
2012-10-12 09:32:38 +00:00
Gleb Smirnoff
105bd2113b In ip_stripoptions():
- Remove unused argument and incorrect comment.
  - Fixup ip_len after stripping.
2012-10-12 09:24:24 +00:00
Alexander V. Chernikov
3c2824b9ef Do not check if found IPv4 rte is dynamic if net.inet.icmp.drop_redirect is
enabled. This eliminates one mtx_lock() per each routing lookup thus improving
performance in several cases (routing to directly connected interface or routing
to default gateway).

Icmp redirects should not be used to provide routing direction nowadays, even
for end hosts. Routers should not use them too (and this is explicitly restricted
in IPv6, see RFC 4861, clause 8.2).

Current commit changes rnh_machaddr function to 'stock' rn_match (and back) for every
AF_INET routing table in given VNET instance on drop_redirect sysctl change.

This change is part of bigger patch eliminating rte locking.

Sponsored by:	Yandex LLC
MFC after:	2 weeks
2012-10-10 19:06:11 +00:00
Kevin Lo
9823d52705 Revert previous commit...
Pointyhat to:	kevlo (myself)
2012-10-10 08:36:38 +00:00
Kevin Lo
a10cee30c9 Prefer NULL over 0 for pointers 2012-10-09 08:27:40 +00:00
Gleb Smirnoff
23e9c6dc1e After r241245 it appeared that in_delayed_cksum(), which still expects
host byte order, was sometimes called with net byte order. Since we are
moving towards net byte order throughout the stack, the function was
converted to expect net byte order, and its consumers fixed appropriately:
  - ip_output(), ipfilter(4) not changed, since already call
    in_delayed_cksum() with header in net byte order.
  - divert(4), ng_nat(4), ipfw_nat(4) now don't need to swap byte order
    there and back.
  - mrouting code and IPv6 ipsec now need to switch byte order there and
    back, but I hope, this is temporary solution.
  - In ipsec(4) shifted switch to net byte order prior to in_delayed_cksum().
  - pf_route() catches up on r241245 changes to ip_output().
2012-10-08 08:03:58 +00:00
Gleb Smirnoff
b7fb54d8ae No reason to play with IP header before calling sctp_delayed_cksum()
with offset beyond the IP header.
2012-10-08 07:21:32 +00:00