Commit Graph

3610 Commits

Author SHA1 Message Date
jhb
5eac2449e3 Several years ago a feature was added to TCP that casued soreceive() to
send an ACK right away if data was drained from a TCP socket that had
previously advertised a zero-sized window.  The current code requires the
receive window to be exactly zero for this to kick in.  If window scaling is
enabled and the window is smaller than the scale, then the effective window
that is advertised is zero.  However, in that case the zero-sized window
handling is not enabled because the window is not exactly zero.  The fix
changes the code to check the raw window value against zero.

Reviewed by:	bz
MFC after:	1 week
2009-11-06 16:55:05 +00:00
oleg
f9007b2aae Fix two issues that can lead to exceeding configured pipe bandwidth:
- do not expire queues which are not ready to be expired.
- properly calculate available burst size.

MFC after:	3 days
2009-11-03 08:41:14 +00:00
tuexen
6196cae2a7 Improve round robin stream scheduler and cleanup some code.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-29 17:40:33 +00:00
brueffer
8dff34f349 Close a stream file descriptor leak.
PR:		138130
Submitted by:	Patroklos Argyroudis <argp@census-labs.com>
MFC after:	1 week
2009-10-28 12:10:29 +00:00
tuexen
8c832e3349 Bugfix: Use formula from section 7.2.3 of RFC 4960. Reported by Martin Becke.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-27 18:17:07 +00:00
tuexen
78210debc4 Improve the round robin stream scheduler.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-26 19:23:34 +00:00
rwatson
a5668d2042 Correct spelling typo in ip_input comment.
Pointed out by:	N.J. Mann <njm at njm.me.uk>,
		John Nielsen <john at jnielsen.net>, julian (!), lstewart
MFC after:	2 days
2009-10-24 09:18:26 +00:00
qingli
c96d27ad80 Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by:	ru
MFC after:	3 days
2009-10-23 18:27:34 +00:00
rwatson
552ae71ff3 Improve grammar in ip_input comment while attempting to maintain what
might be its meaning.

MFC after:	3 days
2009-10-23 13:35:00 +00:00
qingli
eeb330ad1e In the ARP callout timer expiration function, the current time_second
is compared against the entry expiration time value (that was set based
on time_second) to check if the current time is larger than the set
expiration time. Due to the +/- timer granularity value, the comparison
returns false, causing the alternative code to be executed. The
alternative code path freed the memory without removing that entry
from the table list, causing a use-after-free bug.

Reviewed by:	discussed with kmacy
MFC after:	immediately
Verified by:	rnoland, yongari
2009-10-20 17:55:42 +00:00
rwatson
ec5eebfd83 Rewrap ip_input() comment so that it prints more nicely.
MFC after:	3 days
2009-10-18 11:23:56 +00:00
qingli
7d73ff246e This patch fixes the following issues in the ARP operation:
1. There is a regression issue in the ARP code. The incomplete
   ARP entry was timing out too quickly (1 second timeout), as
   such, a new entry is created each time arpresolve() is called.
   Therefore the maximum attempts made is always 1. Consequently
   the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
   lifetime.
3. Return "incomplete" entries to the application.

Reviewed by:	kmacy
MFC after:	3 days
2009-10-15 06:12:04 +00:00
bz
58b36bef21 Compare pointer to NULL rather than 0.
MFC after:	1 month
2009-10-13 20:29:14 +00:00
tuexen
6e9ccb9f7a Fix a race condition where a mutex was destroyed while sleeping on it.
Found while analyzing a report from julian. It might fix his bug.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-11 12:23:56 +00:00
julian
79c1f884ef Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after:	2 months
2009-10-11 05:59:43 +00:00
tuexen
879da4fe23 Correct include order as indicated by bz.
Approved by: re (mentor)
MFC after: 3 days
2009-10-10 13:59:18 +00:00
tuexen
085d02030a Do not include vnet.h twice.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-09 19:30:23 +00:00
tuexen
b69eca12f6 Use correct arguments when calling SCTP_RTALLOC().
Approved by: rrs (mentor)
MFC after: 0 days
2009-10-08 20:33:12 +00:00
rrs
d29665c3a8 Fix so that round robing stream scheduling works as advertised
MFC after:	0 days
2009-10-08 11:36:06 +00:00
rwatson
ef46e20857 Remove tcp_input lock statistics; these are intended for debugging only
and are not intended to ship in 8.0 as they dirty additional cache
lines in a performance-critical per-packet path.

MFC after:	3 days
2009-10-06 20:35:41 +00:00
rwatson
cff0b225cd In tcp_input(), we acquire a global write lock at first only if a
segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN).
If we later have to upgrade the lock, we acquire an inpcb reference
and drop both global/inpcb locks before reacquiring in-order.  In
that gap, the connection may transition into TIMEWAIT, so we need
to loop back and reevaluate the inpcb after relocking.

MFC after:	3 days
Reported by:	Kamigishi Rei <spambox at haruhiism.net>
Reviewed by:	bz
2009-10-05 22:24:13 +00:00
qingli
a1020b6c31 Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuious ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.

MFC after:	immediately
2009-10-02 01:45:11 +00:00
qingli
81ff2327e3 Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.

PR:		kern/139113
MFC after:	3 days
2009-10-02 01:34:55 +00:00
tuexen
990095d301 Fix handling of sctp_drain().
Approved by: rrs (mentor)
MFC after: 2 month
2009-09-20 11:33:39 +00:00
tuexen
0ceae3cfd1 Fix errnos.
Approved by: rrs(mentor)
MFC after: 3 days.
2009-09-20 11:32:22 +00:00
tuexen
064ad1c10a Use appropriate locking when using interface list.
Approved by: rrs (mentor)
MFC after: 1 month.
2009-09-19 14:55:12 +00:00
tuexen
9fa52644a0 Fix the disabling of sctp_drain().
Approved by: rrs (mentor)
MFC after: 1 month.
2009-09-19 14:18:42 +00:00
tuexen
de3c71bd61 Get SCTP working in combination with VIMAGE.
Contains code from bz.
Approved by: rrs (mentor)
MFC after: 1 month.
2009-09-19 14:02:16 +00:00
bms
178f20a475 Return ENOBUFS consistently if user attempts to exceed
in_mcast_maxsocksrc resource limit.

Submitted by:	syrinx
MFC after:	3 days
2009-09-18 15:12:31 +00:00
rrs
1418771847 Support for VNET in SCTP (hopefully) 2009-09-17 15:11:12 +00:00
tuexen
9695ccf66a Fix a bug reported by Daniel Mentz:
When authenticating DATA chunks some DATA chunks
might get stuck when the MTU gets decreased via
an ICMP message.

Approved by: rrs (mentor)
MFC after: immediately
2009-09-16 14:23:31 +00:00
silby
ded53b4033 Add the ability to see TCP timers via netstat -x. This can be a useful
feature when you have a seemingly stuck socket and want to figure
out why it has not been closed yet.

No plans to MFC this, as it changes the netstat sysctl ABI.

Reviewed by:	andre, rwatson, Eric Van Gyzen
2009-09-16 05:33:15 +00:00
andre
3e3fe69f6b -Put the optimized soreceive_stream() under a compile time option called
TCP_SORECEIVE_STREAM for the time being.

Requested by:	brooks

Once compiled in make it easily switchable for testers by using a tuneable
 net.inet.tcp.soreceive_stream
and a corresponding read-only sysctl to report the current state.

Suggested by:	rwatson

MFC after:	2 days
-This line, and those below, will be ignored--
> Description of fields to fill in above:                     76 columns --|
> PR:            If a GNATS PR is affected by the change.
> Submitted by:  If someone else sent in the change.
> Reviewed by:   If someone else reviewed your modification.
> Approved by:   If you needed approval for this commit.
> Obtained from: If the change is from a third party.
> MFC after:     N [day[s]|week[s]|month[s]].  Request a reminder email.
> Security:      Vulnerability reference (one per line) or description.
> Empty fields above will be automatically removed.

M    sys/conf/options
M    sys/kern/uipc_socket.c
M    sys/netinet/tcp_subr.c
M    sys/netinet/tcp_usrreq.c
2009-09-15 22:23:45 +00:00
qingli
3a82e44273 Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by:	bz
MFC after:	immediately
2009-09-15 19:18:34 +00:00
qingli
620c678111 This patch enables the node to respond to ARP requests for
configured proxy ARP entries.

Reviewed by:	bz
MFC after:	immediately
2009-09-15 18:39:27 +00:00
qingli
d073699112 The bootp code installs an interface address and the nfs client
module tries to install the same address again. This extra code
is removed, which was discovered by the removal of a call to
in_ifscrub() in r196714. This call to in_ifscrub is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.

Reviewed by:    kmacy
2009-09-15 01:01:03 +00:00
qingli
47af097237 Previously local end of point-to-point interface is not reachable
within the system that owns the interface. Packets destined to
the local end point leak to the wire towards the default gateway
if one exists. This behavior is changed as part of the L2/L3
rewrite efforts. The local end point is now reachable within the
system. The inpcb code needs to consider this fact during the
address selection process.

Reviewed by:	bz
MFC after:	immediately
2009-09-14 22:19:47 +00:00
rrs
314c72fba8 Fixes two bugs:
1) A lock issue, if we ever had to try again
   we would double lock the INP lock.
2) We were allowing (at wrap) associd 0... which really
   we cannot allow since 0 normally means in most socket
   API calls that we are wishing to effect something on
   the INP not TCB.

MFC after:	1 week
2009-09-13 17:45:31 +00:00
bms
b7a222746c In expire_mfc(), add an assert on the multicast forwarding cache mutex.
PR:		138666
2009-09-13 01:00:24 +00:00
bms
9dcdfc8226 Comment some flawed assumptions in inp_join_group() about
mixing SSM full-state and delta-based APIs.

ENOTIME to fix right now.  No functional changes.

MFC after:	5 days
2009-09-12 20:37:44 +00:00
bms
ebe8706174 Don't allow joins w/o source on an existing group.
This is almost always pilot error.

We don't need to check for group filter UNDEFINED state at t1,
because we only ever allocate filters with their groups, so we
unconditionally reject such calls with EINVAL.
Trying to change the active filter mode w/o going through IP_MSFILTER
is also disallowed.

Deals with the case described in PR 137164 upfront, cumulative
with the fix in svn rev 197132 which only calls imo_match_source()
if the source address family was not unspecified.

PR:		137164
MFC after:	5 days
2009-09-12 20:18:23 +00:00
bms
e3b721990b Tighten input checking in inp_join_group():
* Don't try to use the source address, when its family is unspecified.
 * If we get a join without a source, on an existing inclusive
   mode group, this is an error, as it would change the filter mode.

Fix a problem with the handling of in_mfilter for new memberships:
 * Do not rely on imf being NULL; it is explicitly initialized to a
   non-NULL pointer when constructing a membership.
 * Explicitly initialize *imf to EX mode when the source address
   is unspecified.

This fixes a problem with in_mfilter slot recycling in the join path.

PR:		138690
Submitted by:	Stef Walter
MFC after:	5 days
2009-09-12 19:45:55 +00:00
bms
fe0bf7a268 Fix an obvious logic error in the IPv4 multicast leave processing,
where the filter mode vector was not updated correctly after the leave.

PR:		138691
Submitted by:	Stef Walter
MFC after:	5 days
2009-09-12 19:07:03 +00:00
bms
3031a90b58 Fix an API issue in leave processing for IPv4 multicast groups.
* Do not assume that the group lookup performed by imo_match_group()
   is valid when ifp is NULL in this case.
 * Instead, return EADDRNOTAVAIL if the ifp cannot be resolved for the
   membership we are being asked to leave.

Caveat user:
 * The way IPv4 multicast memberships are implemented in the inpcb layer
   at the moment, has the side-effect that struct ip_moptions will
   still hold the membership, under the old ifp, until ip_freemoptions()
   is called for the parent inpcb.
 * The underlying issue is: the inpcb layer does not get notification
   of ifp being detached going away in a thread-safe manner.
   This is non-trivial to fix.

But hey, at least the kernel should't panic when you unplug a card.

PR:		138689
Submitted by:	Stef Walter
MFC after:	5 days
2009-09-12 18:55:15 +00:00
np
ba75578c03 Add arp_update_event. This replaces route_arp_update_event, which
has not worked since the arp-v2 rewrite.

The event handler will be called with the llentry write-locked and
can examine la_flags to determine whether the entry is being added
or removed.

Reviewed by:	gnn, kmacy
Approved by:	gnn (mentor)
MFC after:	1 month
2009-09-08 21:17:17 +00:00
phk
c8ab4ab72e Move the duplicate definition of struct sockaddr_storage to its own
include file, and include this where the previous duplicate definitions were.

Static program checkers like FlexeLint rightfully take a dim view of
duplicate definitions, even if they currently are identical.
2009-09-08 10:39:38 +00:00
syrinx
805d7d8fea When joining a multicast group, the inp_lookup_mcast_ifp call
does a KASSERT that the group address is multicast, so the
check if this is indeed true and eventually return a EINVAL if not,
should be done before calling inp_lookup_mcast_ifp. This fixes a kernel
crash when calling setsockopt (sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,...)
with invalid group address.

Reviewed by:	bms
Approved by:	bms

MFC after:	3 days
2009-09-07 16:00:33 +00:00
pjd
ce8377f813 Correct comment. 2009-09-06 07:29:22 +00:00
gnn
60eb51e7a3 Add ARP statistics to the kernel and netstat.
New counters now exist for:
requests sent
replies sent
requests received
replies received
packets received
total packets dropped due to no ARP entry
entrys timed out
Duplicate IPs seen

The new statistics are seen in the netstat command
when it is given the -s command line switch.

MFC after:	2 weeks
In collaboration with: bz
2009-09-03 21:10:57 +00:00
bz
867ea93927 In case an upper layer protocol tries to send a packet but the
L2 code does not have the ethernet address for the destination
within the broadcast domain in the table, we remember the
original mbuf in `la_hold' in arpresolve() and send out a
different packet with an arp request.
In case there will be more upper layer packets to send we will
free an earlier one held in `la_hold' and queue the new one.

Once we get a packet in, with which we can perfect our arp table
entry we send out the original 'on hold' packet, should there
be any.
Rather than continuing to process the packet that we received,
we returned without freeing the packet that came in, which
basically means that we leaked an mbuf for every arp request
we sent.

Rather than freeing the received packet and returning, continue
to process the incoming arp packet as well.
This should (a) improve some setups, also proxy-arp, in case it was an
incoming arp request and (b) resembles the behaviour FreeBSD had
from day 1, which alignes with RFC826 "Packet reception" (merge case).

Rename 'm0' to 'hold' to make the code more understandable as
well as diffable to earlier versions more easily.

Handle the link-layer entry 'la' lock comepletely in the block
where needed and release it as early as possible, rather than
holding it longer, down to the end of the function.

Found by:			pointyhat, ns1
Bug hunting session with:	erwin, simon, rwatson
Tested by:			simon on cluster machines
Reviewed by:			ratson, kmacy, julian
MFC after:			3 days
2009-09-01 17:53:01 +00:00