Commit Graph

3596 Commits

Author SHA1 Message Date
Qing Li
fc02323563 In the ARP callout timer expiration function, the current time_second
is compared against the entry expiration time value (that was set based
on time_second) to check if the current time is larger than the set
expiration time. Due to the +/- timer granularity value, the comparison
returns false, causing the alternative code to be executed. The
alternative code path freed the memory without removing that entry
from the table list, causing a use-after-free bug.

Reviewed by:	discussed with kmacy
MFC after:	immediately
Verified by:	rnoland, yongari
2009-10-20 17:55:42 +00:00
Robert Watson
6426657e9f Rewrap ip_input() comment so that it prints more nicely.
MFC after:	3 days
2009-10-18 11:23:56 +00:00
Qing Li
93704ac5d7 This patch fixes the following issues in the ARP operation:
1. There is a regression issue in the ARP code. The incomplete
   ARP entry was timing out too quickly (1 second timeout), as
   such, a new entry is created each time arpresolve() is called.
   Therefore the maximum attempts made is always 1. Consequently
   the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
   lifetime.
3. Return "incomplete" entries to the application.

Reviewed by:	kmacy
MFC after:	3 days
2009-10-15 06:12:04 +00:00
Bjoern A. Zeeb
852da713c3 Compare pointer to NULL rather than 0.
MFC after:	1 month
2009-10-13 20:29:14 +00:00
Michael Tuexen
f71e78a1d9 Fix a race condition where a mutex was destroyed while sleeping on it.
Found while analyzing a report from julian. It might fix his bug.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-11 12:23:56 +00:00
Julian Elischer
0b4b0b0fee Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after:	2 months
2009-10-11 05:59:43 +00:00
Michael Tuexen
45623593fb Correct include order as indicated by bz.
Approved by: re (mentor)
MFC after: 3 days
2009-10-10 13:59:18 +00:00
Michael Tuexen
3b1de911e0 Do not include vnet.h twice.
Approved by: rrs (mentor)
MFC after: 3 days
2009-10-09 19:30:23 +00:00
Michael Tuexen
9dd512290c Use correct arguments when calling SCTP_RTALLOC().
Approved by: rrs (mentor)
MFC after: 0 days
2009-10-08 20:33:12 +00:00
Randall Stewart
806a5b8414 Fix so that round robing stream scheduling works as advertised
MFC after:	0 days
2009-10-08 11:36:06 +00:00
Robert Watson
f681a5fdd4 Remove tcp_input lock statistics; these are intended for debugging only
and are not intended to ship in 8.0 as they dirty additional cache
lines in a performance-critical per-packet path.

MFC after:	3 days
2009-10-06 20:35:41 +00:00
Robert Watson
883e9bc41d In tcp_input(), we acquire a global write lock at first only if a
segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN).
If we later have to upgrade the lock, we acquire an inpcb reference
and drop both global/inpcb locks before reacquiring in-order.  In
that gap, the connection may transition into TIMEWAIT, so we need
to loop back and reevaluate the inpcb after relocking.

MFC after:	3 days
Reported by:	Kamigishi Rei <spambox at haruhiism.net>
Reviewed by:	bz
2009-10-05 22:24:13 +00:00
Qing Li
b4a22c365c Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuious ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.

MFC after:	immediately
2009-10-02 01:45:11 +00:00
Qing Li
fa3cfd39ff Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.

PR:		kern/139113
MFC after:	3 days
2009-10-02 01:34:55 +00:00
Michael Tuexen
4b6492f5ab Fix handling of sctp_drain().
Approved by: rrs (mentor)
MFC after: 2 month
2009-09-20 11:33:39 +00:00
Michael Tuexen
2c19e7fa86 Fix errnos.
Approved by: rrs(mentor)
MFC after: 3 days.
2009-09-20 11:32:22 +00:00
Michael Tuexen
4af6c75c39 Use appropriate locking when using interface list.
Approved by: rrs (mentor)
MFC after: 1 month.
2009-09-19 14:55:12 +00:00
Michael Tuexen
30c3a8430c Fix the disabling of sctp_drain().
Approved by: rrs (mentor)
MFC after: 1 month.
2009-09-19 14:18:42 +00:00
Michael Tuexen
8518270e20 Get SCTP working in combination with VIMAGE.
Contains code from bz.
Approved by: rrs (mentor)
MFC after: 1 month.
2009-09-19 14:02:16 +00:00
Bruce M Simpson
99bf30cf01 Return ENOBUFS consistently if user attempts to exceed
in_mcast_maxsocksrc resource limit.

Submitted by:	syrinx
MFC after:	3 days
2009-09-18 15:12:31 +00:00
Randall Stewart
482444b4a5 Support for VNET in SCTP (hopefully) 2009-09-17 15:11:12 +00:00
Michael Tuexen
d830c305ea Fix a bug reported by Daniel Mentz:
When authenticating DATA chunks some DATA chunks
might get stuck when the MTU gets decreased via
an ICMP message.

Approved by: rrs (mentor)
MFC after: immediately
2009-09-16 14:23:31 +00:00
Mike Silbersack
b8614722ff Add the ability to see TCP timers via netstat -x. This can be a useful
feature when you have a seemingly stuck socket and want to figure
out why it has not been closed yet.

No plans to MFC this, as it changes the netstat sysctl ABI.

Reviewed by:	andre, rwatson, Eric Van Gyzen
2009-09-16 05:33:15 +00:00
Andre Oppermann
11c99a6d7b -Put the optimized soreceive_stream() under a compile time option called
TCP_SORECEIVE_STREAM for the time being.

Requested by:	brooks

Once compiled in make it easily switchable for testers by using a tuneable
 net.inet.tcp.soreceive_stream
and a corresponding read-only sysctl to report the current state.

Suggested by:	rwatson

MFC after:	2 days
2009-09-15 22:23:45 +00:00
Qing Li
9bb7d0f47a Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by:	bz
MFC after:	immediately
2009-09-15 19:18:34 +00:00
Qing Li
cd29a7797d This patch enables the node to respond to ARP requests for
configured proxy ARP entries.

Reviewed by:	bz
MFC after:	immediately
2009-09-15 18:39:27 +00:00
Qing Li
96ed1732bb The bootp code installs an interface address and the nfs client
module tries to install the same address again. This extra code
is removed, which was discovered by the removal of a call to
in_ifscrub() in r196714. This call to in_ifscrub is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.

Reviewed by:    kmacy
2009-09-15 01:01:03 +00:00
Qing Li
f0bb05fca5 Previously local end of point-to-point interface is not reachable
within the system that owns the interface. Packets destined to
the local end point leak to the wire towards the default gateway
if one exists. This behavior is changed as part of the L2/L3
rewrite efforts. The local end point is now reachable within the
system. The inpcb code needs to consider this fact during the
address selection process.

Reviewed by:	bz
MFC after:	immediately
2009-09-14 22:19:47 +00:00
Randall Stewart
f3d06a3c68 Fixes two bugs:
1) A lock issue, if we ever had to try again
   we would double lock the INP lock.
2) We were allowing (at wrap) associd 0... which really
   we cannot allow since 0 normally means in most socket
   API calls that we are wishing to effect something on
   the INP not TCB.

MFC after:	1 week
2009-09-13 17:45:31 +00:00
Bruce M Simpson
6cbbe26f98 In expire_mfc(), add an assert on the multicast forwarding cache mutex.
PR:		138666
2009-09-13 01:00:24 +00:00
Bruce M Simpson
fa2eebfce6 Comment some flawed assumptions in inp_join_group() about
mixing SSM full-state and delta-based APIs.

ENOTIME to fix right now.  No functional changes.

MFC after:	5 days
2009-09-12 20:37:44 +00:00
Bruce M Simpson
0eebc0d7b4 Don't allow joins w/o source on an existing group.
This is almost always pilot error.

We don't need to check for group filter UNDEFINED state at t1,
because we only ever allocate filters with their groups, so we
unconditionally reject such calls with EINVAL.
Trying to change the active filter mode w/o going through IP_MSFILTER
is also disallowed.

Deals with the case described in PR 137164 upfront, cumulative
with the fix in svn rev 197132 which only calls imo_match_source()
if the source address family was not unspecified.

PR:		137164
MFC after:	5 days
2009-09-12 20:18:23 +00:00
Bruce M Simpson
1fc39d5424 Tighten input checking in inp_join_group():
* Don't try to use the source address, when its family is unspecified.
 * If we get a join without a source, on an existing inclusive
   mode group, this is an error, as it would change the filter mode.

Fix a problem with the handling of in_mfilter for new memberships:
 * Do not rely on imf being NULL; it is explicitly initialized to a
   non-NULL pointer when constructing a membership.
 * Explicitly initialize *imf to EX mode when the source address
   is unspecified.

This fixes a problem with in_mfilter slot recycling in the join path.

PR:		138690
Submitted by:	Stef Walter
MFC after:	5 days
2009-09-12 19:45:55 +00:00
Bruce M Simpson
cc5776b24d Fix an obvious logic error in the IPv4 multicast leave processing,
where the filter mode vector was not updated correctly after the leave.

PR:		138691
Submitted by:	Stef Walter
MFC after:	5 days
2009-09-12 19:07:03 +00:00
Bruce M Simpson
67e89408e5 Fix an API issue in leave processing for IPv4 multicast groups.
* Do not assume that the group lookup performed by imo_match_group()
   is valid when ifp is NULL in this case.
 * Instead, return EADDRNOTAVAIL if the ifp cannot be resolved for the
   membership we are being asked to leave.

Caveat user:
 * The way IPv4 multicast memberships are implemented in the inpcb layer
   at the moment, has the side-effect that struct ip_moptions will
   still hold the membership, under the old ifp, until ip_freemoptions()
   is called for the parent inpcb.
 * The underlying issue is: the inpcb layer does not get notification
   of ifp being detached going away in a thread-safe manner.
   This is non-trivial to fix.

But hey, at least the kernel should't panic when you unplug a card.

PR:		138689
Submitted by:	Stef Walter
MFC after:	5 days
2009-09-12 18:55:15 +00:00
Navdeep Parhar
9a31144537 Add arp_update_event. This replaces route_arp_update_event, which
has not worked since the arp-v2 rewrite.

The event handler will be called with the llentry write-locked and
can examine la_flags to determine whether the entry is being added
or removed.

Reviewed by:	gnn, kmacy
Approved by:	gnn (mentor)
MFC after:	1 month
2009-09-08 21:17:17 +00:00
Poul-Henning Kamp
2ac047d1fe Move the duplicate definition of struct sockaddr_storage to its own
include file, and include this where the previous duplicate definitions were.

Static program checkers like FlexeLint rightfully take a dim view of
duplicate definitions, even if they currently are identical.
2009-09-08 10:39:38 +00:00
Shteryana Shopova
e72ae6eafd When joining a multicast group, the inp_lookup_mcast_ifp call
does a KASSERT that the group address is multicast, so the
check if this is indeed true and eventually return a EINVAL if not,
should be done before calling inp_lookup_mcast_ifp. This fixes a kernel
crash when calling setsockopt (sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,...)
with invalid group address.

Reviewed by:	bms
Approved by:	bms

MFC after:	3 days
2009-09-07 16:00:33 +00:00
Pawel Jakub Dawidek
360488410f Correct comment. 2009-09-06 07:29:22 +00:00
George V. Neville-Neil
54fc657d59 Add ARP statistics to the kernel and netstat.
New counters now exist for:
requests sent
replies sent
requests received
replies received
packets received
total packets dropped due to no ARP entry
entrys timed out
Duplicate IPs seen

The new statistics are seen in the netstat command
when it is given the -s command line switch.

MFC after:	2 weeks
In collaboration with: bz
2009-09-03 21:10:57 +00:00
Bjoern A. Zeeb
cc7e9d4325 In case an upper layer protocol tries to send a packet but the
L2 code does not have the ethernet address for the destination
within the broadcast domain in the table, we remember the
original mbuf in `la_hold' in arpresolve() and send out a
different packet with an arp request.
In case there will be more upper layer packets to send we will
free an earlier one held in `la_hold' and queue the new one.

Once we get a packet in, with which we can perfect our arp table
entry we send out the original 'on hold' packet, should there
be any.
Rather than continuing to process the packet that we received,
we returned without freeing the packet that came in, which
basically means that we leaked an mbuf for every arp request
we sent.

Rather than freeing the received packet and returning, continue
to process the incoming arp packet as well.
This should (a) improve some setups, also proxy-arp, in case it was an
incoming arp request and (b) resembles the behaviour FreeBSD had
from day 1, which alignes with RFC826 "Packet reception" (merge case).

Rename 'm0' to 'hold' to make the code more understandable as
well as diffable to earlier versions more easily.

Handle the link-layer entry 'la' lock comepletely in the block
where needed and release it as early as possible, rather than
holding it longer, down to the end of the function.

Found by:			pointyhat, ns1
Bug hunting session with:	erwin, simon, rwatson
Tested by:			simon on cluster machines
Reviewed by:			ratson, kmacy, julian
MFC after:			3 days
2009-09-01 17:53:01 +00:00
Qing Li
1bf38b1292 This patch fixes the following issues:
- Routing messages are not generated when adding and removing
  interface address aliases.
- Loopback route installed for an interface address alias is
  not deleted from the routing table when that address alias
  is removed from the associated interface.
- Function in_ifscrub() is called extraneously.

Reviewed by:	gnn, kmacy, sam
MFC after:	3 days
2009-08-31 21:02:48 +00:00
Michael Tuexen
2b77dd0181 Fix a bug where vlan interfaces are not supported by SCTP.
Approved by: rrs (mentor)
MFC after: 3 days
2009-08-28 08:41:59 +00:00
Qing Li
0437a93339 Do not try to free the rt_lle entry of the cached route in
ip_output() if the cached route was not initialized from the
flow-table. The rt_lle entry is invalid unless it has been
initialized through the flow-table.

Reviewed by:	kmacy, rwatson
MFC after:	immediately
2009-08-28 05:37:31 +00:00
Robert Watson
dc56e98f0d Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables.  This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.

Reviewed by:	bz, kmacy
MFC after:	3 days
2009-08-25 09:52:38 +00:00
Michael Tuexen
24ae5c4a73 This fixes a bug where the value set by SCTP_PARTIAL_DELIVERY_POINT
was not honored, if the socket buffer size was not 4 times that large.

Approved by: rrs (mentor)
MFC after: 3 days.
2009-08-24 11:46:40 +00:00
Randall Stewart
0fa753b3fb This fixes two bugs in the NR-Sack code:
1) When calculating the table offset for sliding the sack
    array, the two byte values must be "ored" together in order
    for us to do the correct sliding of the arrays.
 2) We were NOT properly doing CC and other changes to things only
    NR-Sacked. The solution here is to make a separate function that
    will actually do both CC/updates and free things if its NR sack'd.
    This actually shrinks out common code from three places (much better).

MFC after:	3 days
2009-08-24 11:13:32 +00:00
Marko Zec
2b73aacaf9 Introduce a div_destroy() function which takes over per-vnet cleanup tasks
from the existing modevent / MOD_UNLOAD handler, and register div_destroy()
in protosw as per-vnet .pr_destroy() handler for options VIMAGE builds.  In
nooptions VIMAGE builds, div_destroy() will be invoked from the modevent
handler, resulting in effectively identical operation as it was prior this
change.  div_destroy() also tears down hashtables used by ipdivert, which
were previously left behind on ipdivert kldunloads.

For options VIMAGE builds only, temporarily disable kldunloading of ipdivert,
because without introducing additional locking logic it is impossible to
atomically check whether all ipdivert instances in all vnets are idle, and
proceed with cleanup without opening a race window for a vnet to open an
ipdivert socket while ipdivert tear-down is in progress.

While here, staticize div_init(), because it is not used outside of
ip_divert.c.

In cooperation with:	julian
Approved by:	re (rwatson), julian (mentor)
MFC after:	3 days
2009-08-24 10:06:02 +00:00
Robert Watson
77dfcdc445 Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock.  Either can be held to stablize the lists and indexes, but both
are required to write.  This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions.  As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by:	bz, julian
MFC after:	3 days
2009-08-23 20:40:19 +00:00
Julian Elischer
d3cef1d91e Fix another typo right next to the previous one, that amazingly, I did
not see before.

MFC after:	 1 week
2009-08-23 08:49:32 +00:00