3583 Commits

Author SHA1 Message Date
Robert Watson
1120ce6b69 Merge r198438 from head to stable/8:
Correct spelling typo in ip_input comment.

  Pointed out by:       N.J. Mann <njm at njm.me.uk>,
                John Nielsen <john at jnielsen.net>, julian (!), lstewart
2009-12-14 11:53:02 +00:00
Robert Watson
ec610c212b Merge r198393 from head to stable/8:
Improve grammar in ip_input comment while attempting to maintain what
  might be its meaning.

(Note, merge of the revision correcting a spelling error in this commit
will follow as well!)
2009-12-14 11:15:47 +00:00
Michael Tuexen
cf19fced17 MFC 197288,197326,197327,197328,197342,197914,197929,
197955,199365,199370,199371,199373,199866
This MFCs all SCTP/VNET relevant fixes from head.

Approved by: rrs (mentor)
2009-12-07 07:33:51 +00:00
Bjoern A. Zeeb
b4e227f473 MFC r198050:
Compare pointer to NULL rather than 0.
2009-12-05 19:44:16 +00:00
Luigi Rizzo
3cdcbc4885 some simple MFC:
r200020:
  change the type of the opcode from enum *:8  to u_int8_t
  so the size and alignment of the ipfw_insn is not compiler dependent.
  No changes in the code generated by gcc.

r200023:
  Add new sockopt names for ipfw and dummynet.

  This commit is just grabbing entries for the new names
  that will be used in the future, so you don't need to
  rebuild anything now.

r200034
  Dispatch sockopt calls to ipfw and dummynet
  using the new option numbers, IP_FW3 and IP_DUMMYNET3.
  Right now the modules return an error if called with those arguments
  so there is no danger of unwanted behaviour.

r200040
  - initialize src_ip in the main loop to prevent a compiler warning
    (gcc 4.x under linux, not sure how real the complaint is).
  - rename a macro argument to prevent name clashes.
  - add the macro name on a couple of #endif
  - add a blank line for readability.
2009-12-05 12:51:51 +00:00
Attilio Rao
a5e831ded9 MFC r199208, r199223:
Move inet_aton() (the counterpart of inet_ntoa(), already present in libkern)
into libkern in order to make it usable by modules other than alias_proxy.

Sponsored by:	Sandvine Incorporated
2009-11-22 16:04:49 +00:00
Bruce M Simpson
025bbb4984 MFC r199522..199528:
Pullup IPv6 mcast SSM KPI fixes from HEAD, including fix for
  filter deallocation from Stef Walter.
2009-11-20 12:30:40 +00:00
Michael Tuexen
21bd3c552d MFC 199477
Fix a bug where the system panics when a SHUTDOWN is received with an
illegal TSN.
This bug was reported by Irene Ruengeler.

Approved by: re, rrs (mentor)
2009-11-18 15:35:03 +00:00
John Baldwin
24b458cf34 MFC 198990:
Several years ago a feature was added to TCP that caused soreceive() to
send an ACK right away if data was drained from a TCP socket that had
previously advertised a zero-sized window.  The current code requires the
receive window to be exactly zero for this to kick in.  If window scaling is
enabled and the window is smaller than the scale, then the effective window
that is advertised is zero.  However, in that case the zero-sized window
handling is not enabled because the window is not exactly zero.  The fix
changes the code to check the raw window value against zero.
2009-11-17 16:17:11 +00:00
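The arithmetic behind MFC 198990 above can be sketched in a few lines of C. With window scaling, the 16-bit window field carries the receive window shifted right by the scale factor, so any window smaller than 2^scale reaches the peer as zero. This is only an illustration under stated assumptions: `advertised_window` and `peer_sees_zero_window` are hypothetical helper names, not functions from the FreeBSD TCP stack, and the sketch does not reproduce the committed fix.

```c
#include <stdint.h>

/*
 * Value that would be placed in the 16-bit TCP window field for a
 * receive window of recwin bytes and a window scale shift of rcv_scale.
 * (Hypothetical helper for illustration only.)
 */
static uint16_t
advertised_window(uint32_t recwin, unsigned rcv_scale)
{
	uint32_t scaled = recwin >> rcv_scale;

	return (scaled > 0xffff ? 0xffff : (uint16_t)scaled);
}

/*
 * Nonzero if the peer is told the window is closed, even though
 * recwin itself may be nonzero.
 */
static int
peer_sees_zero_window(uint32_t recwin, unsigned rcv_scale)
{
	return (advertised_window(recwin, rcv_scale) == 0);
}
```

With a scale shift of 7, a 100-byte window is advertised as zero while a 128-byte window is advertised as 1, which is the gap the commit message describes.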
Bruce M Simpson
7ea239483a MFC r199287:
Fix a functional regression in multicast.

  Userland daemons need to see IGMP traffic regardless of the group;
  omit the imo filter check if the proto is IGMP. The kernel part
  of IGMP will have already filtered appropriately at this point.

Submitted by:   Franz Struwig
Reported by:    Ivor Prebeg, Franz Struwig
2009-11-17 10:59:51 +00:00
Oleg Bulyzhin
c366b7c1e2 MFC r198845:
Fix two issues that can lead to exceeding configured pipe bandwidth:
- do not expire queues which are not ready to be expired.
- properly calculate available burst size.

MFC r199073:
style(9): add missing parentheses
2009-11-09 10:13:24 +00:00
Christian Brueffer
a4f93c1075 MFC: r198539
Close a stream file descriptor leak.
2009-11-04 13:30:32 +00:00
Qing Li
f60909e3e2 MFC r198418
Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by:	ru
2009-10-28 21:45:25 +00:00
Qing Li
0eb4d28bac MFC 198301
In the ARP callout timer expiration function, the current time_second
is compared against the entry expiration time value (that was set based
on time_second) to check if the current time is larger than the set
expiration time. Due to the +/- timer granularity value, the comparison
returns false, causing the alternative code to be executed. The
alternative code path freed the memory without removing that entry
from the table list, causing a use-after-free bug.

Reviewed by:	discussed with kmacy
Approved by:	re
Verified by:	rnoland, yongari
2009-10-20 21:36:56 +00:00
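The bug class described in MFC 198301 above (memory freed while the entry is still linked into a list) can be illustrated with a small stand-alone C sketch. It assumes a plain singly linked list; `struct entry`, `entry_insert`, `entry_remove` and `list_length` are hypothetical names and do not correspond to the kernel's lltable code. The point is only the ordering: unlink first, then free, on every path that releases an entry.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table entry; not the kernel's struct llentry. */
struct entry {
	struct entry *next;
	long expire;		/* expiration time in seconds */
};

/* Push a new entry onto the list; returns NULL on allocation failure. */
static struct entry *
entry_insert(struct entry **head, long expire)
{
	struct entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return (NULL);
	e->expire = expire;
	e->next = *head;
	*head = e;
	return (e);
}

/*
 * Unlink e from the list and only then free it, so no list pointer
 * is left referring to freed memory.  Returns 1 if e was found.
 */
static int
entry_remove(struct entry **head, struct entry *e)
{
	struct entry **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;	/* unlink first */
			free(e);	/* then release the memory */
			return (1);
		}
	}
	return (0);
}

/* Count the entries currently linked into the list. */
static int
list_length(const struct entry *head)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return (n);
}
```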
Qing Li
6f99a646e4 MFC r198111
This patch fixes the following issues in the ARP operation:

1. There is a regression issue in the ARP code. The incomplete
   ARP entry was timing out too quickly (1 second timeout), as
   such, a new entry is created each time arpresolve() is called.
   Therefore the maximum attempts made is always 1. Consequently
   the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
   lifetime.
3. Return "incomplete" entries to the application.
4. The return error code was incorrect.

Reviewed by:	kmacy
Approved by:	re
2009-10-20 17:44:50 +00:00
Robert Watson
92b52ada7c Merge r198196 from head to stable/8:
Rewrap ip_input() comment so that it prints more nicely.

Approved by:	re (kib)
2009-10-20 16:22:31 +00:00
Michael Tuexen
bef10df88e MFC r197868.
Use correct arguments when calling SCTP_RTALLOC().
Approved by: re, rrs (mentor)
2009-10-14 17:26:05 +00:00
Robert Watson
3e5cbaa4c7 Merge r197814 from head to stable/8:
Remove tcp_input lock statistics; these are intended for debugging only
  and are not intended to ship in 8.0 as they dirty additional cache
  lines in a performance-critical per-packet path.

Approved by:	re (kib, bz)
2009-10-09 09:18:22 +00:00
Robert Watson
f41dd6dca9 Merge r197795 from head to stable/8:
In tcp_input(), we acquire a global write lock at first only if a
  segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN).
  If we later have to upgrade the lock, we acquire an inpcb reference
  and drop both global/inpcb locks before reacquiring in-order.  In
  that gap, the connection may transition into TIMEWAIT, so we need
  to loop back and reevaluate the inpcb after relocking.

  Reported by:        Kamigishi Rei <spambox at haruhiism.net>
  Reviewed by:        bz

Approved by:	re (kib)
2009-10-08 11:07:15 +00:00
Qing Li
c8c92b5491 MFC r197696
Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuitous ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.

Approved by:	re
2009-10-06 20:33:02 +00:00
Qing Li
7ec99f713d MFC 197695
Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.

PR:		kern/139113
Reviewed by:	bz
Approved by:	re
2009-10-06 19:44:44 +00:00
Michael Tuexen
fe36e02918 MFC r197341.
Fix errnos.

Approved by: re (bz), rrs (mentor)
2009-09-28 18:32:28 +00:00
Bruce M Simpson
bfcfe77605 MFC revs 197129,197130,197132:
Fixes to mcast userland API.
--
  Fix an API issue in leave processing for IPv4 multicast groups.
   * Do not assume that the group lookup performed by imo_match_group()
     is valid when ifp is NULL in this case.
   * Instead, return EADDRNOTAVAIL if the ifp cannot be resolved for the
     membership we are being asked to leave.

  Caveat user:
   * The way IPv4 multicast memberships are implemented in the inpcb layer
     at the moment, has the side-effect that struct ip_moptions will
     still hold the membership, under the old ifp, until ip_freemoptions()
     is called for the parent inpcb.
   * The underlying issue is: the inpcb layer does not get notification
     of ifp being detached going away in a thread-safe manner.
     This is non-trivial to fix.
--
  Fix an obvious logic error in the IPv4 multicast leave processing,
  where the filter mode vector was not updated correctly after the leave.
--
  Tighten input checking in inp_join_group():
   * Don't try to use the source address, when its family is unspecified.
   * If we get a join without a source, on an existing inclusive
     mode group, this is an error, as it would change the filter mode.

  Fix a problem with the handling of in_mfilter for new memberships:
   * Do not rely on imf being NULL; it is explicitly initialized to a
     non-NULL pointer when constructing a membership.
   * Explicitly initialize *imf to EX mode when the source address
     is unspecified.
  This fixes a problem with in_mfilter slot recycling in the join path.
--
  Don't allow joins w/o source on an existing group.
  This is almost always pilot error.

  We don't need to check for group filter UNDEFINED state at t1,
  because we only ever allocate filters with their groups, so we
  unconditionally reject such calls with EINVAL.
  Trying to change the active filter mode w/o going through IP_MSFILTER
  is also disallowed.

  Deals with the case described in PR 137164 upfront, cumulative
  with the fix in svn rev 197132 which only calls imo_match_source()
  if the source address family was not unspecified.
--

Revision 197136 has a text conflict, however it is a comment only change.

PR:		137164, 138689, 138690, 138691
Submitted by:	Stef Walter (with fixups)
Approved by:	re (kib)
2009-09-17 13:41:59 +00:00
Michael Tuexen
6b3c18a020 MFC 197257:
Fix a bug reported by Daniel Mentz:
When authenticating DATA chunks some DATA chunks
might get stuck when the MTU gets decreased via
an ICMP message.

Approved by: re, rrs (mentor)
2009-09-16 14:47:50 +00:00
Michael Tuexen
04a34c6c34 Fixes two bugs:
1) A lock issue, if we ever had to try again
   we would double lock the INP lock.
2) We were allowing (at wrap) associd 0... which really
   we cannot allow since 0 normally means in most socket
   API calls that we are wishing to effect something on
   the INP not TCB.

Approved by: re, rrs (mentor)
2009-09-16 13:44:12 +00:00
Qing Li
553a7dec4b MFC r197227
Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by:	bz
Approved by:	re
2009-09-15 22:46:06 +00:00
Qing Li
bb3b75e86f MFC r197225
This patch enables the node to respond to ARP requests for
configured proxy ARP entries.

Reviewed by:	bz
Approved by:	re
2009-09-15 22:37:17 +00:00
Qing Li
77eb2069ce MFC r197210, 197212, 197235
The bootp code installs an interface address and the nfs client
module tries to install the same address again. This extra code
is removed, which was discovered by the removal of a call to
in_ifscrub() in r196714. This call to in_ifscrub is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.

r197235 reverts file nfs_vfsops.c

Reviewed by:	kmacy
Approved by:	re
2009-09-15 22:25:19 +00:00
Qing Li
6d8337ba49 MFC r196714
This patch fixes the following issues:

- Routing messages are not generated when adding and removing
  interface address aliases.
- Loopback route installed for an interface address alias is
  not deleted from the routing table when that address alias
  is removed from the associated interface.
- Function in_ifscrub() is called extraneously.

Reviewed by:	gnn, kmacy, sam
Approved by:	re
2009-09-15 19:58:33 +00:00
Qing Li
599f45c5dd MFC r197203
Previously, the local end of a point-to-point interface was not reachable
within the system that owns the interface. Packets destined for
the local end point leaked to the wire towards the default gateway
if one exists. This behavior is changed as part of the L2/L3
rewrite efforts. The local end point is now reachable within the
system. The inpcb code needs to consider this fact during the
address selection process.

Reviewed by:	bz
Approved by:	re
2009-09-15 19:38:29 +00:00
Michael Tuexen
ceda2d70e4 MFC 196610:
Fix a bug where vlan interfaces are not supported by SCTP.

Approved by: re, rrs (mentor)
2009-09-12 18:08:44 +00:00
Michael Tuexen
f222133ab7 This fixes a bug where the value set by SCTP_PARTIAL_DELIVERY_POINT
was not honored, if the socket buffer size was not 4 times that large.
MFC of 196509.

Approved by: re, rrs (mentor)
2009-09-12 17:58:15 +00:00
Shteryana Shopova
d51cecd143 MFC r196932:
When joining a multicast group, inp_lookup_mcast_ifp() does a KASSERT
that the group address is multicast, so the check that this is indeed
true, returning EINVAL if not, should be done before calling
inp_lookup_mcast_ifp(). This fixes a kernel crash when calling
setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, ...) with an invalid
group address.

Reviewed by:	bms
Approved by:	re (kib)
2009-09-11 15:07:36 +00:00
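The ordering described in MFC r196932 above (validate the group address, return EINVAL, and only then call code that asserts on it) might look roughly like the following C sketch. `is_ipv4_multicast` and `join_group_checked` are hypothetical names used for illustration; the real join path and inp_lookup_mcast_ifp() are not reproduced here. Addresses are assumed to be in host byte order.

```c
#include <errno.h>
#include <stdint.h>

/* True if a host-byte-order IPv4 address lies in 224.0.0.0/4. */
static int
is_ipv4_multicast(uint32_t addr_host_order)
{
	return ((addr_host_order & 0xf0000000u) == 0xe0000000u);
}

/*
 * Hypothetical join path: reject a non-multicast group with EINVAL
 * before reaching any lookup routine that asserts on the address.
 */
static int
join_group_checked(uint32_t group_host_order)
{
	if (!is_ipv4_multicast(group_host_order))
		return (EINVAL);
	/* ... interface lookup and membership setup would follow here ... */
	return (0);
}
```

For example, 224.0.0.1 (0xe0000001) passes the check, while 192.168.0.1 (0xc0a80001) is rejected with EINVAL.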
Bjoern A. Zeeb
5b628e0c26 MFC r196738:
In case an upper layer protocol tries to send a packet but the
  L2 code does not have the ethernet address for the destination
  within the broadcast domain in the table, we remember the
  original mbuf in `la_hold' in arpresolve() and send out a
  different packet with an arp request.
  In case there will be more upper layer packets to send we will
  free an earlier one held in `la_hold' and queue the new one.

  Once we get a packet in, with which we can perfect our arp table
  entry we send out the original 'on hold' packet, should there
  be any.
  Rather than continuing to process the packet that we received,
  we returned without freeing the packet that came in, which
  basically means that we leaked an mbuf for every arp request
  we sent.

  Rather than freeing the received packet and returning, continue
  to process the incoming arp packet as well.
  This should (a) improve some setups, also proxy-arp, in case it was an
  incoming arp request and (b) resembles the behaviour FreeBSD had
  from day 1, which aligns with RFC826 "Packet reception" (merge case).

  Rename 'm0' to 'hold' to make the code more understandable as
  well as diffable to earlier versions more easily.

  Handle the link-layer entry 'la' lock completely in the block
  where needed and release it as early as possible, rather than
  holding it longer, down to the end of the function.

  Found by:			pointyhat, ns1
  Bug hunting session with:	erwin, simon, rwatson
  Tested by:			simon on cluster machines
  Reviewed by:			rwatson, kmacy, julian

Approved by:	re (kib)
2009-09-02 16:35:57 +00:00
Qing Li
d84f95cd4a MFC r196608
Do not try to free the rt_lle entry of the cached route in
ip_output() if the cached route was not initialized from the
flow-table. The rt_lle entry is invalid unless it has been
initialized through the flow-table.

Reviewed by:	kmacy, rwatson
Approved by:	re
2009-08-30 22:39:49 +00:00
Robert Watson
a0021692f2 Merge r196535 from head to stable/8:
Use locks specific to the lltable code, rather than borrow the ifnet
  list/index locks, to protect link layer address tables.  This avoids
  lock order issues during interface teardown, but maintains the bug that
  sysctl copy routines may be called while a non-sleepable lock is held.

  Reviewed by:  bz, kmacy, qingli

Approved by:	re (kib)
2009-08-28 21:10:26 +00:00
Robert Watson
3ef94f2b72 Merge r196481 from head to stable/8:
Rework global locks for interface list and index management, correcting
  several critical bugs, including race conditions and lock order issues:

  Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
  sxlock.  Either can be held to stabilize the lists and indexes, but both
  are required to write.  This allows the list to be held stable in both
  network interrupt contexts and sleepable user threads across sleeping
  memory allocations or device driver interactions.  As before, writes to
  the interface list must occur from sleepable contexts.

  Reviewed by:  bz, julian

Approved by:	re (kib)
2009-08-28 20:06:02 +00:00
Marko Zec
f04e871efc MFC r196502:
Introduce a div_destroy() function which takes over per-vnet cleanup tasks
  from the existing modevent / MOD_UNLOAD handler, and register div_destroy()
  in protosw as per-vnet .pr_destroy() handler for options VIMAGE builds.  In
  nooptions VIMAGE builds, div_destroy() will be invoked from the modevent
  handler, resulting in effectively identical operation as it was prior to this
  change.  div_destroy() also tears down hashtables used by ipdivert, which
  were previously left behind on ipdivert kldunloads.

  For options VIMAGE builds only, temporarily disable kldunloading of ipdivert,
  because without introducing additional locking logic it is impossible to
  atomically check whether all ipdivert instances in all vnets are idle, and
  proceed with cleanup without opening a race window for a vnet to open an
  ipdivert socket while ipdivert tear-down is in progress.

  While here, staticize div_init(), because it is not used outside of
  ip_divert.c.

  In cooperation with:  julian
  Approved by:  re (rwatson), julian (mentor)

Approved by:	re (rwatson)
2009-08-28 19:10:58 +00:00
Julian Elischer
f8f0b70474 MFC r196423
Fix ipfw's initialization functions to get the correct order of evaluation
  to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c
  to ip_fw2.c and move to mostly using the SYSINIT and VNET_SYSINIT handlers
  instead of the modevent handler. Correct some spelling errors in comments
  in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when
  ipfw is unloaded.

  This patch is a minimal patch for 8.0
  I have a much larger patch that actually fixes the underlying problems
  that will be applied after 8.0

Reviewed by:	zec@, rwatson@, bz@(earlier version)
Approved by:	re (rwatson)
2009-08-21 11:23:29 +00:00
Peter Wemm
21f6a3982f MFC rev 196410 - deal with 'ticks' going negative after 24 days of uptime
with the default 1000hz clock in the timewait expiration code.

Approved by:    re (kensmith)
2009-08-20 23:07:53 +00:00
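The "24 days" figure in MFC rev 196410 above follows from a signed 32-bit tick counter: it goes negative after 2^31 ticks, which at 1000 Hz is about 24.86 days. The C sketch below shows that arithmetic and one common wrap-safe comparison idiom. It assumes a 32-bit signed counter and two's complement conversion; `days_until_negative` and `ticks_after` are hypothetical names, and this is not the code from the committed fix.

```c
#include <stdint.h>

/*
 * Days until a signed 32-bit tick counter that started at zero
 * becomes negative, for a clock running at hz ticks per second.
 */
static double
days_until_negative(int hz)
{
	return (2147483648.0 / hz / 86400.0);
}

/*
 * Wrap-safe test for "a is later than b": subtract in unsigned
 * arithmetic, then interpret the difference as signed.  A direct
 * a > b comparison gives the wrong answer once the counter wraps.
 */
static int
ticks_after(int32_t a, int32_t b)
{
	return ((int32_t)((uint32_t)a - (uint32_t)b) > 0);
}
```

The idiom is only meaningful while the two values are less than 2^31 ticks apart.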
Will Andrews
566abe95b2 MFC r196397 from head:
Fix CARP memory leaks on carp_if's malloc'd using M_CARP.  This occurs when
  CARP tries to free them using M_IFADDR after the last address for a virtual
  host is removed and when detaching from the parent interface.

Approved by:	re (kib), ken (mentor)
2009-08-20 02:49:43 +00:00
Michael Tuexen
d51d92a789 Fix a bug in the handling of unreliable messages which
results in stalled associations.

Approved by: re, rrs (mentor)
2009-08-19 12:12:51 +00:00
Kip Macy
670151d0e4 MFC 196368
- change the interface to flowtable_lookup so that we don't rely on
    the mbuf for obtaining the fib index
  - check that a cached flow corresponds to the same fib index as the
    packet for which we are doing the lookup
  - at interface detach time flush any flows referencing stale rtentrys
    associated with the interface that is going away (fixes reported
    panics)
  - reduce the time between cleans, so that if the cleaner is running
    when the eventhandler is called and the wakeup is missed, less
    time will elapse before the eventhandler returns
  - separate per-vnet initialization from global initialization
    (pointed out by jeli@)

Reviewed by:	sam@
Approved by:	re@
2009-08-18 20:39:35 +00:00
Michael Tuexen
3da1fd00cf Fix a panic when using one-to-one style sockets in non-blocking
mode and there is no listening server.
PR: 137795
Approved by: re, rrs (mentor)
2009-08-18 20:06:00 +00:00
Michael Tuexen
ca007251f2 MFC r196260.
* Fix a bug where PR-SCTP settings are ignored when using implicit
   association setup.
 * Fix a bug where messages with illegal stream ids are not deleted.
 * Fix a crash when reporting back unsent messages from the send_queue.
 * Fix a bug related to INIT retransmission when the socket is already
   closed.
 * Fix a bug where associations were stalled when partial delivery API
   was enabled.
 * Fix a bug where the receive buffer size was smaller than the
   partial_delivery_point.

Approved by: re, rrs (mentor)
2009-08-15 21:37:16 +00:00
Qing Li
6e12c67559 MFC 196234
In function ip_output(), the cached route is flushed when there is a
mismatch between the cached entry and the intended destination. The
cached rtentry{} is flushed but the associated llentry{} is not. This
causes the wrong destination MAC address to be used in the output
packets. The fix is to flush the llentry{} when rtentry{} is cleared.

Reviewed by:	kmacy, rwatson
Approved by:	re
2009-08-15 00:04:12 +00:00
Marko Zec
0e9c71019b MFC r196229:
SCTP is not yet compatible with options VIMAGE kernels although it compiles
  with VIMAGE defined, so explicitly disallow building such kernels.

  Reviewed by:  rrs
  Approved by:  re (rwatson), julian (mentor)

Approved by:	re (rwatson)
2009-08-14 23:01:21 +00:00
Julian Elischer
f394882d89 MFC of r196201
URL: http://svn.freebsd.org/changeset/base/196201

  Fix ipfw crash on uid or gid check.
  Receiving any ip packet for which there is no existing socket will
  crash if ipfw has a uid or gid test rule, as the uid/gid
  of the non existent owner of said non existent socket is tested.
  Brooks introduced this error as part of his >16 gids patch.
  It appears to be a cut-n-paste error from similar code a few lines
  before. The old code used the 'pcb' variable here, but in the
  new code that switched to the 'inp' variable, which is often NULL
  and what is tested in the code further up. The rest of the multi-gid
  patch for ipfw seems solid (and cleaner than previous code).

p.s. What's up with all the properties changing? It is a fresh checkout.

Reviewed by:	brooks
Approved by:	re (rwatson)
2009-08-14 10:25:14 +00:00
Robert Watson
9d2eb78bcb Add padding to struct inpcb, missed during our padding sweep earlier in
the release cycle.

Approved by:	re (kensmith)
2009-08-02 22:47:08 +00:00
Robert Watson
315e3e38fa Many network stack subsystems use a single global data structure to hold
all pertinent statistics for the subsystem.  These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.

Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored.  This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.

The following modules are affected by this change:

      if_bridge
      if_cxgb
      if_gif
      ip_mroute
      ipdivert
      pf

In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack.  However, that change is too aggressive for
this point in the release cycle.

Reviewed by:	bz
Approved by:	re (kib)
2009-08-02 19:43:32 +00:00