Commit Graph

3890 Commits

Author SHA1 Message Date
Luigi Rizzo
3f18b51c8d put back the assigment to sched_time. It was correct, and
it was necessary.

Submitted by:	Riccardo Panicucci
2010-10-01 15:38:35 +00:00
Bjoern A. Zeeb
544794507a Proper bracketing.
PR:		kern/151100
Submitted by:	SunMinghao (sunminghao hotmail.com)
MFC after:	3 days
2010-10-01 11:48:14 +00:00
Luigi Rizzo
e53a34a766 remove an unnecessary (and wrong) assignment.
It was meant to reset idle_time (and it was not needed),
but i even used the wrong field.

Obtained from:	Oleg
MFC after:	3 days
2010-09-29 21:02:31 +00:00
Luigi Rizzo
38cc301f9f whitespace changes in preparation for future commits 2010-09-29 09:40:20 +00:00
Luigi Rizzo
a47ee22718 fix handling of initial credit for an idle pipe.
This fixes the bug where setting bw > 1 MTU/tick resulted in
infinite bandwidth if io_fast=1

PR:		147245 148429
Obtained from:	Riccardo Panicucci
MFC after:	3 days
2010-09-29 09:22:12 +00:00
Luigi Rizzo
8d74ca8ce9 fix breakage in in-kernel NAT: the code did not honor
net.inet.ip.fw.one_pass and always moved to the next rule
in case of a successful nat.

This should fix several related PR (waiting for feedback
before closing them)

PR:		145167 149572 150141
MFC after:	3 days
2010-09-28 23:23:23 +00:00
Luigi Rizzo
c08e545e99 Whitespace changes to reduce diffs wrt the most recent ipfw/dummynet code:
+ remove an unused macro,
+ adjust the constants in an enum
+ small whitespace changes

MFC after:	3 days
2010-09-28 22:46:13 +00:00
Xin LI
64e0f48e7c Add a bandaid for a long-standing race condition during route entry
un-expiring.

The previous version of code have no locking when testing rt_refcnt.
The result of the lack of locking may result in a condition where
a routing entry have a reference count but at the same time have
RTPRF_OURS bit set and an expiration timer.  These would eventually
lead to a panic:

	panic: rtqkill route really not free

When the system have ICMP redirects accepted from local gateway
in a moderate frequency, for instance.

Commit this workaround for now until we have some better solution.

PR:		kern/149804
Reviewed by:	bz
Tested by:	Zhao Xin, Pete French
MFC after:	2 weeks
2010-09-27 19:26:56 +00:00
Lawrence Stewart
d4d3e21865 Log the number of segments currently in the reassembly queue.
Sponsored by:	FreeBSD Foundation
2010-09-25 09:16:46 +00:00
Lawrence Stewart
0c236c4ebd Internalise reassembly queue related functionality and variables which should
not be used outside of the reassembly queue implementation. Provide a new
function to flush all segments from a reassembly queue and call it from the
appropriate places instead of manipulating the queue directly.

Sponsored by:	FreeBSD Foundation
Reviewed by:	andre, gnn, rpaulo
MFC after:	2 weeks
2010-09-25 04:58:46 +00:00
Attilio Rao
109c1de8ba Make the RPC specific __rpc_inet_ntop() and __rpc_inet_pton() general
in the kernel (just as inet_ntoa() and inet_aton()) are and sync their
prototype accordingly with already mentioned functions.

Sponsored by:	Sandvine Incorporated
Reviewed by:	emaste, rstone
Approved by:	dfr
MFC after:	2 weeks
2010-09-24 15:01:45 +00:00
Attilio Rao
5f6bf4518d IP_BINDANY is not correctly handled in getsockopt() case.
Fix it by specifying the correct bits.

Sponsored by:	Sandvine Incorporated
Reviewed by:	bz, emaste, rstone
Obtained from:	Sandvine Incorporated
MFC after:	10 days
2010-09-24 14:38:54 +00:00
Gleb Smirnoff
6baf7a243a Do not convert some meaningful error value to EINVAL.
Reviewed by:	will
2010-09-20 12:23:10 +00:00
Michael Tuexen
1ea735c802 Fix a locking issue which resulted in aborted associations
due to a corrupted nr-mapping array.

MFC after: 2 weeks.
2010-09-20 12:19:11 +00:00
Michael Tuexen
231b700b17 Allow the initial congestion window to be configure
to one MTU. Improve the description.

MFC after: 2 weeks.
2010-09-19 11:57:21 +00:00
Michael Tuexen
f8faf20cf6 Fix a locking issue which shows up when the code is used
on Mac OS X.

MFC after: 2 weeks.
2010-09-19 11:42:16 +00:00
Andre Oppermann
ed42031102 Rearrange the TSO code to make it more readable and to clearly
separate the decision logic, of whether we can do TSO, and the
calculation of the burst length into two distinct parts.

Change the way the TSO burst length calculation is done. While
TSO could do bursts of 65535 bytes that can't be represented in
ip_len together with the IP and TCP header. Account for that and
use IP_MAXPACKET instead of TCP_MAXWIN as base constant (both
have the same value of 64K). When more data is available prevent
less than MSS sized segments from being sent during the current
TSO burst.

Add two more KASSERTs to ensure the integrity of the packets.

Tested by:	Ben Wilber <ben-at-desync com>
MFC after:	10 days
2010-09-17 22:05:27 +00:00
Michael Tuexen
99ddc825f3 Fix a bug where the wrong PR-SCTP policy was considered.
While there, use always the same code for the check of
TTL expiration.

MFC after: 2 weeks.
2010-09-17 19:20:39 +00:00
Michael Tuexen
dcfc062535 Make the initial congestion window configurable via sysctl.
MFC after: 2 weeks.
2010-09-17 18:53:07 +00:00
Michael Tuexen
25a2a18706 * Implement initial version of send buffer splitting.
* Make send/recv buffer splitting switchable via sysctl.
* While there: Fix some comments.
2010-09-17 16:20:29 +00:00
Andre Oppermann
1c18314d17 Remove the TCP inflight bandwidth limiter as announced in r211315
to give way for the pluggable congestion control framework.  It is
the task of the congestion control algorithm to set the congestion
window and amount of inflight data without external interference.

In 'struct tcpcb' the variables previously used by the inflight
limiter are renamed to spares to keep the ABI intact and to have
some more space for future extensions.

In 'struct tcp_info' the variable 'tcpi_snd_bwnd' is not removed to
preserve the ABI.  It is always set to 0.

In siftr.c in 'struct pkt_node' the variable 'snd_bwnd' is not removed
to preserve the ABI.  It is always set to 0.

These unused variable in the various structures may be reused in the
future or garbage collected before the next release or at some other
point when an ABI change happens anyway for other reasons.

No MFC is planned.  The inflight bandwidth limiter stays disabled by
default in the other branches but remains available.
2010-09-16 21:06:45 +00:00
Andre Oppermann
2c9879e8d3 Improve comment to TCP_MINMSS by taking the wording from lstewart (with
a small difference in the last paragraph though) as suggested by jhb.

Clarify that the 'reviewed by' in r212653 by lstewart was for the
functional change, not the comments in the committed version.
2010-09-16 12:13:06 +00:00
Michael Tuexen
b3f7949dc5 Remove old debug code.
MFC after: 2 weeks.
2010-09-15 23:56:25 +00:00
Michael Tuexen
94b0d96992 Remove unused variable/assignment.
MFC after: 3 weeks.
2010-09-15 23:40:36 +00:00
Michael Tuexen
9eea4a2da7 Delay the assignment of a path for DATA chunk until they hit
the sent_queue. Honor a given path when the SCTP_ADDR_OVER
flag is set.

MFC after: 2 weeks.
2010-09-15 23:10:45 +00:00
Michael Tuexen
24f52bbd9b Use TAILQ_EMPTY() for testing if a tail queue is empty.
Set whoFrom to NULL after freeing whoFrom.
2010-09-15 21:53:10 +00:00
Michael Tuexen
3c8c191bae Remove unused variable/assignment.
MFC after: 2 weeks.
2010-09-15 21:19:54 +00:00
Michael Tuexen
b90b577ff3 Remove assignment without effect.
MFC after: 2 weeks.
2010-09-15 21:08:57 +00:00
Michael Tuexen
107cad7449 * Use !TAILQ_EMPTY() for checking if a tail queue is not empty.
* Remove assignment without any effect.

MFC after: 2 weeks.
2010-09-15 20:53:20 +00:00
Andre Oppermann
c183b9c683 Change the default MSS for IPv4 and IPv6 TCP connections from an
artificial power-of-2 rounded number to their real values specified
in RFC879 and RFC2460.

From the history and existing comments it appears that the rounded
numbers were intended to be advantageous for the kernel and mbuf
system.  However this hasn't been the case at for at least a long
time.  The mbuf clusters used in tcp_output() have enough space
to hold the larger real value for the default MSS for both IPv4 and
IPv6.  Note that the default MSS is only used when path MTU discovery
is disabled.

Update and expand related comments.

Reviewed by:	lsteward (including some word-smithing)
MFC after:	2 weeks
2010-09-15 10:39:30 +00:00
Qing Li
a458eaa039 Adding an address on an interface also requires the loopback route to
that address be installed.

PR:		kern/150481
Submitted by:	Ingo Flaschberger <if at xip.at>
MFC after:	5 days
2010-09-12 18:04:47 +00:00
Michael Tuexen
e95307c5c5 * Remove code which has no effect.
* Clean up the handling in sctp_lower_sosend().

MFC after: 3 weeks.
2010-09-09 20:51:23 +00:00
Will Andrews
15249f73e9 Fix CARP in backup mode by properly registering its hooks for INET and INET6
using ipproto_{un,}register() and the newly created ip6proto_{un,}register()
so that it can again receive IPPROTO_CARP packets allowing its state machine
to work.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-09-06 21:06:06 +00:00
Will Andrews
e24fa11d3e Fix static kernel builds with carp(4) by changing its SYSINIT order so that
it is initialized after basic protocol initialization, which allows it to
register via pf_proto_register().

Reviewed by:	bz
Approved by:	ken (mentor)
2010-09-06 21:03:30 +00:00
Gleb Smirnoff
14a268a073 in_delayed_cksum() requires host byte order.
Reported by:	Alexander Levin <amindomao googlemail.com>
MFC after:	1 week
2010-09-06 13:17:01 +00:00
Michael Tuexen
049640c1f0 Implement correct handling of address parameter and
sendinfo for SCTP send calls.

MFC after: 4 weeks.
2010-09-05 20:13:07 +00:00
Randall Stewart
52129fcd78 Fix some CLANG warnings. One clang warning is left
due to the fact that its bogus.. nam->sa_family will
not change from AF_INET6 to AF_INET (but clang
thinks it does ;-D)
2010-09-05 13:41:45 +00:00
Bjoern A. Zeeb
42db1b87d6 In case of RADIX_MPATH do not leak the IN_IFADDR read lock on
early return.

MFC after:	3 days
2010-09-04 16:06:01 +00:00
Bjoern A. Zeeb
1b48d24533 MFp4 CH=183052 183053 183258:
In protosw we define pr_protocol as short, while on the wire
  it is an uint8_t.  That way we can have "internal" protocols
  like DIVERT, SEND or gaps for modules (PROTO_SPACER).
  Switch ipproto_{un,}register to accept a short protocol number(*)
  and do an upfront check for valid boundries. With this we
  also consistently report EPROTONOSUPPORT for out of bounds
  protocols, as we did for proto == 0.  This allows a caller
  to not error for this case, which is especially important
  if we want to automatically call these from domain handling.

  (*) the functions have been without any in-tree consumer
  since the initial introducation, so this is considered save.

  Implement ip6proto_{un,}register() similarly to their legacy IP
  counter parts to allow modules to hook up dynamically.

Reviewed by:	philip, will
MFC after:	1 week
2010-09-02 17:43:44 +00:00
Michael Tuexen
fc0487080a Fix a bug which results in peer IPv4 addresses a.b.c.d with 224<=d<=239
incorrectly being detected as multicast addresses on little endian systems.

MFC after: 2 weeks
2010-09-01 16:11:26 +00:00
Maxim Konovalov
5a47f206a1 o Some programs could send broadcast/multicast traffic to ipfw
pseudo-interface.  This leads to a panic due to uninitialized
if_broadcastaddr address.  Initialize it and implement ip_output()
method to prevent mbuf leak later.

ipfw pseudo-interface should never send anything therefore call
panic(9) in if_start() method.

PR:		kern/149807
Submitted by:	Dmitrij Tejblum
MFC after:	2 weeks
2010-08-30 09:29:51 +00:00
Michael Tuexen
9c7635e18b Fix the the SCTP_WITH_NO_CSUM option when used in combination with
interface supporting CRC offload. While at it, make use of the
feature that the loopback interface provides CRC offloading.

MFC after: 4 weeks
2010-08-29 18:50:30 +00:00
Michael Tuexen
e24ea413e0 Bugfix: Do not send a packet drop report in response to a received
INIT-ACK with incorrect CRC.
2010-08-28 21:15:00 +00:00
Michael Tuexen
20083c2eb1 Fix the switching on/off of CMT using sysctl and socket option.
Fix the switching on/off of PF and NR-SACKs using sysctl.
Add minor improvement in handling malloc failures.
Improve the address checks when sending.

MFC after: 4 weeks
2010-08-28 17:59:51 +00:00
John Baldwin
98b9eb0db2 Simplify the tcp pcblist estimate logic slightly.
MFC after:	3 days
2010-08-27 18:17:46 +00:00
Andre Oppermann
8502ec25dc Use timestamp modulo comparison macro for automatic receive buffer
scaling to correctly handle wrapping of ticks value.

MFC after:	1 week
2010-08-27 12:34:53 +00:00
Ana Kukec
1db8d1f843 MFp4: anchie_soc2009 branch:
Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971.

The implementation consists of a kernel module that gets packets from
the nd6 code, sends them to user space on a dedicated socket and reinjects
them back for further processing.

Hooks are used from nd6 code paths to divert relevant packets to the
send implementation for processing in user space.  The hooks are only
triggered if the send module is loaded. In case no user space
application is connected to the send socket, processing continues
normaly as if the module would not be loaded. Unloading the module
is not possible at this time due to missing nd6 locking.

The native SeND socket is similar to a raw IPv6 socket but with its own,
internal pseudo-protocol.

Approved by:	bz (mentor)
2010-08-19 11:31:03 +00:00
Andre Oppermann
c3f0bdc66b If a TCP connection has been idle for one retransmit timeout or more
it must reset its congestion window back to the initial window.

RFC3390 has increased the initial window from 1 segment to up to
4 segments.

The initial window increase of RFC3390 wasn't reflected into the
restart window which remained at its original defaults of 4 segments
for local and 1 segment for all other connections.  Both values are
controllable through sysctl net.inet.tcp.local_slowstart_flightsize
and net.inet.tcp.slowstart_flightsize.

The increase helps TCP's slow start algorithm to open up the congestion
window much faster.

Reviewed by:	lstewart
MFC after:	1 week
2010-08-18 18:05:54 +00:00
Andre Oppermann
b7d747ecec Untangle the net.inet.tcp.log_in_vain and net.inet.tcp.log_debug
sysctl's and remove any side effects.

Both sysctl's share the same backend infrastructure and due to the
way it was implemented enabling net.inet.tcp.log_in_vain would also
cause log_debug output to be generated.  This was surprising and
eventually annoying to the user.

The log output backend is kept the same but a little shim is inserted
to properly separate log_in_vain and log_debug and to remove any side
effects.

PR:		kern/137317
MFC after:	1 week
2010-08-18 17:39:47 +00:00
Bjoern A. Zeeb
2278f9927d When calculating the expected memory size for userspace, also take the
number of syncache entries into account for the surplus we add to account
for a possible increase of records in the re-entry window.

Discussed with:		jhb, silby
MFC after:		1 week
2010-08-18 09:28:12 +00:00