Commit Graph

3568 Commits

Author SHA1 Message Date
Robert Watson
92b52ada7c Merge r198196 from head to stable/8:
Rewrap ip_input() comment so that it prints more nicely.

Approved by:	re (kib)
2009-10-20 16:22:31 +00:00
Michael Tuexen
bef10df88e MFC r197868.
Use correct arguments when calling SCTP_RTALLOC().
Approved by: re, rrs (mentor)
2009-10-14 17:26:05 +00:00
Robert Watson
3e5cbaa4c7 Merge r197814 from head to stable/8:
Remove tcp_input lock statistics; these are intended for debugging only
  and are not intended to ship in 8.0 as they dirty additional cache
  lines in a performance-critical per-packet path.

Approved by:	re (kib, bz)
2009-10-09 09:18:22 +00:00
Robert Watson
f41dd6dca9 Merge r197795 from head to stable/8:
In tcp_input(), we acquire a global write lock at first only if a
  segment is likely to trigger a TCP state change (i.e., FIN/RST/SYN).
  If we later have to upgrade the lock, we acquire an inpcb reference
  and drop both global/inpcb locks before reacquiring in-order.  In
  that gap, the connection may transition into TIMEWAIT, so we need
  to loop back and reevaluate the inpcb after relocking.

  Reported by:        Kamigishi Rei <spambox at haruhiism.net>
  Reviewed by:        bz

Approved by:	re (kib)
2009-10-08 11:07:15 +00:00
Qing Li
c8c92b5491 MFC r197696
Remove a log message from production code. This log message can be
triggered by a misconfigured host that is sending out gratuious ARPs.
This log message can also be triggered during a network renumbering
event when multiple prefixes co-exist on a single network segment.

Approved by:	re
2009-10-06 20:33:02 +00:00
Qing Li
7ec99f713d MFC 197695
Previously, if an address alias is configured on an interface, and
this address alias has a prefix matching that of another address
configured on the same interface, then the ARP entry for the alias
is not deleted from the ARP table when that address alias is removed.
This patch fixes the aforementioned issue.

PR:		kern/139113
Reviewed by:	bz
Approved by:	re
2009-10-06 19:44:44 +00:00
Michael Tuexen
fe36e02918 MFC r197341.
Fix errnos.

Approved by: re (bz), rrs (mentor)
2009-09-28 18:32:28 +00:00
Bruce M Simpson
bfcfe77605 MFC revs 197129,197130,197132:
Fixes to mcast userland API.
--
  Fix an API issue in leave processing for IPv4 multicast groups.
   * Do not assume that the group lookup performed by imo_match_group()
     is valid when ifp is NULL in this case.
   * Instead, return EADDRNOTAVAIL if the ifp cannot be resolved for the
     membership we are being asked to leave.

  Caveat user:
   * The way IPv4 multicast memberships are implemented in the inpcb layer
     at the moment, has the side-effect that struct ip_moptions will
     still hold the membership, under the old ifp, until ip_freemoptions()
     is called for the parent inpcb.
   * The underlying issue is: the inpcb layer does not get notification
     of ifp being detached going away in a thread-safe manner.
     This is non-trivial to fix.
--
  Fix an obvious logic error in the IPv4 multicast leave processing,
  where the filter mode vector was not updated correctly after the leave.
--
  Tighten input checking in inp_join_group():
   * Don't try to use the source address, when its family is unspecified.
   * If we get a join without a source, on an existing inclusive
     mode group, this is an error, as it would change the filter mode.

  Fix a problem with the handling of in_mfilter for new memberships:
   * Do not rely on imf being NULL; it is explicitly initialized to a
     non-NULL pointer when constructing a membership.
   * Explicitly initialize *imf to EX mode when the source address
     is unspecified.
  This fixes a problem with in_mfilter slot recycling in the join path.
--
  Don't allow joins w/o source on an existing group.
  This is almost always pilot error.

  We don't need to check for group filter UNDEFINED state at t1,
  because we only ever allocate filters with their groups, so we
  unconditionally reject such calls with EINVAL.
  Trying to change the active filter mode w/o going through IP_MSFILTER
  is also disallowed.

  Deals with the case described in PR 137164 upfront, cumulative
  with the fix in svn rev 197132 which only calls imo_match_source()
  if the source address family was not unspecified.
--

Revision 197136 has a text conflict, however it is a comment only change.

PR:		137164, 138689, 138690, 138691
Submitted by:	Stef Walter (with fixups)
Approved by:	re (kib)
2009-09-17 13:41:59 +00:00
Michael Tuexen
6b3c18a020 MFC 197257:
Fix a bug reported by Daniel Mentz:
When authenticating DATA chunks some DATA chunks
might get stuck when the MTU gets decreased via
an ICMP message.

Approved by: re, rrs (mentor)
2009-09-16 14:47:50 +00:00
Michael Tuexen
04a34c6c34 Fixes two bugs:
1) A lock issue, if we ever had to try again
   we would double lock the INP lock.
2) We were allowing (at wrap) associd 0... which really
   we cannot allow since 0 normally means in most socket
   API calls that we are wishing to effect something on
   the INP not TCB.

Approved by: re, rrs (mentor)
2009-09-16 13:44:12 +00:00
Qing Li
553a7dec4b MFC r197227
Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by:	bz
Approved by:	re
2009-09-15 22:46:06 +00:00
Qing Li
bb3b75e86f MFC r197225
This patch enables the node to respond to ARP requests for
configured proxy ARP entries.

Reviewed by:	bz
Approved by:	re
2009-09-15 22:37:17 +00:00
Qing Li
77eb2069ce MFC r197210, 197212, 197235
The bootp code installs an interface address and the nfs client
module tries to install the same address again. This extra code
is removed, which was discovered by the removal of a call to
in_ifscrub() in r196714. This call to in_ifscrub is put back here
because the SIOCAIFADDR command can be used to change the prefix
length of an existing alias.

r197235 reverts file nfs_vfsops.c

Reviewed by:	kmacy
Approved by:	re
2009-09-15 22:25:19 +00:00
Qing Li
6d8337ba49 MFC r196714
This patch fixes the following issues:

- Routing messages are not generated when adding and removing
  interface address aliases.
- Loopback route installed for an interface address alias is
  not deleted from the routing table when that address alias
  is removed from the associated interface.
- Function in_ifscrub() is called extraneously.

Reviewed by:	gnn, kmacy, sam
Approved by:	re
2009-09-15 19:58:33 +00:00
Qing Li
599f45c5dd MFC r197203
Previously local end of point-to-point interface is not reachable
within the system that owns the interface. Packets destined to
the local end point leak to the wire towards the default gateway
if one exists. This behavior is changed as part of the L2/L3
rewrite efforts. The local end point is now reachable within the
system. The inpcb code needs to consider this fact during the
address selection process.

Reviewed by:	bz
Approved by:	re
2009-09-15 19:38:29 +00:00
Michael Tuexen
ceda2d70e4 MFC 196610:
Fix a bug where vlan interfaces are not supported by SCTP.

Approved by: re, rrs (mentor)
2009-09-12 18:08:44 +00:00
Michael Tuexen
f222133ab7 This fixes a bug where the value set by SCTP_PARTIAL_DELIVERY_POINT
was not honored, if the socket buffer size was not 4 times that large.
MFC of 196509.

Approved by: re, rrs (mentor)`
2009-09-12 17:58:15 +00:00
Shteryana Shopova
d51cecd143 MFC r196932:
When joining a multicast group, the inp_lookup_mcast_ifp call
does a KASSERT that the group address is multicast, so the
check if this is indeed true and eventually return a EINVAL if not,
should be done before calling inp_lookup_mcast_ifp. This fixes a kernel
crash when calling setsockopt (sock, IPPROTO_IP, IP_ADD_MEMBERSHIP,...)
with invalid group address.

Reviewed by:	bms
Approved by:	re (kib)
2009-09-11 15:07:36 +00:00
Bjoern A. Zeeb
5b628e0c26 MFC r196738:
In case an upper layer protocol tries to send a packet but the
  L2 code does not have the ethernet address for the destination
  within the broadcast domain in the table, we remember the
  original mbuf in `la_hold' in arpresolve() and send out a
  different packet with an arp request.
  In case there will be more upper layer packets to send we will
  free an earlier one held in `la_hold' and queue the new one.

  Once we get a packet in, with which we can perfect our arp table
  entry we send out the original 'on hold' packet, should there
  be any.
  Rather than continuing to process the packet that we received,
  we returned without freeing the packet that came in, which
  basically means that we leaked an mbuf for every arp request
  we sent.

  Rather than freeing the received packet and returning, continue
  to process the incoming arp packet as well.
  This should (a) improve some setups, also proxy-arp, in case it was an
  incoming arp request and (b) resembles the behaviour FreeBSD had
  from day 1, which alignes with RFC826 "Packet reception" (merge case).

  Rename 'm0' to 'hold' to make the code more understandable as
  well as diffable to earlier versions more easily.

  Handle the link-layer entry 'la' lock comepletely in the block
  where needed and release it as early as possible, rather than
  holding it longer, down to the end of the function.

  Found by:			pointyhat, ns1
  Bug hunting session with:	erwin, simon, rwatson
  Tested by:			simon on cluster machines
  Reviewed by:			ratson, kmacy, julian

Approved by:	re (kib)
2009-09-02 16:35:57 +00:00
Qing Li
d84f95cd4a MFC r196608
Do not try to free the rt_lle entry of the cached route in
ip_output() if the cached route was not initialized from the
flow-table. The rt_lle entry is invalid unless it has been
initialized through the flow-table.

Reviewed by:	kmacy, rwatson
Approved by:	re
2009-08-30 22:39:49 +00:00
Robert Watson
a0021692f2 Merge r196535 from head to stable/8:
Use locks specific to the lltable code, rather than borrow the ifnet
  list/index locks, to protect link layer address tables.  This avoids
  lock order issues during interface teardown, but maintains the bug that
  sysctl copy routines may be called while a non-sleepable lock is held.

  Reviewed by:  bz, kmacy, qingli

Approved by:	re (kib)
2009-08-28 21:10:26 +00:00
Robert Watson
3ef94f2b72 Merge r196481 from head to stable/8:
Rework global locks for interface list and index management, correcting
  several critical bugs, including race conditions and lock order issues:

  Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
  sxlock.  Either can be held to stablize the lists and indexes, but both
  are required to write.  This allows the list to be held stable in both
  network interrupt contexts and sleepable user threads across sleeping
  memory allocations or device driver interactions.  As before, writes to
  the interface list must occur from sleepable contexts.

  Reviewed by:  bz, julian

Approved by:	re (kib)
2009-08-28 20:06:02 +00:00
Marko Zec
f04e871efc MFC r196502:
Introduce a div_destroy() function which takes over per-vnet cleanup tasks
  from the existing modevent / MOD_UNLOAD handler, and register div_destroy()
  in protosw as per-vnet .pr_destroy() handler for options VIMAGE builds.  In
  nooptions VIMAGE builds, div_destroy() will be invoked from the modevent
  handler, resulting in effectively identical operation as it was prior this
  change.  div_destroy() also tears down hashtables used by ipdivert, which
  were previously left behind on ipdivert kldunloads.

  For options VIMAGE builds only, temporarily disable kldunloading of ipdivert,
  because without introducing additional locking logic it is impossible to
  atomically check whether all ipdivert instances in all vnets are idle, and
  proceed with cleanup without opening a race window for a vnet to open an
  ipdivert socket while ipdivert tear-down is in progress.

  While here, staticize div_init(), because it is not used outside of
  ip_divert.c.

  In cooperation with:  julian
  Approved by:  re (rwatson), julian (mentor)

Approved by:	re (rwatson)
2009-08-28 19:10:58 +00:00
Julian Elischer
f8f0b70474 MFC r196423
Fix ipfw's initialization functions to get the correct order of evaluation
  to allow vnet and non vnet operation. Move some functions from ip_fw_pfil.c
  to ip_fw2.c and mode to mostly using the SYSINIT and VNET_SYSINIT handlers
  instead of the modevent handler. Correct some spelling errors in comments
  in the affected code. Note this bug fixes a crash in NON VIMAGE kernels when
  ipfw is unloaded.

  This patch is a minimal patch for 8.0
  I have a much larger patch that actually fixes the underlying problems
  that will be applied after 8.0

Reviewed by:	zec@, rwatson@, bz@(earlier version)
Approved by:	re (rwatson)
2009-08-21 11:23:29 +00:00
Peter Wemm
21f6a3982f MFC rev 196410 - deal with 'ticks' going negative after 24 days of uptime
with the default 1000hz clock in the timewait expiration code.

Approved by:    re (kensmith)
2009-08-20 23:07:53 +00:00
Will Andrews
566abe95b2 MFC r196397 from head:
Fix CARP memory leaks on carp_if's malloc'd using M_CARP.  This occurs when
  CARP tries to free them using M_IFADDR after the last address for a virtual
  host is removed and when detaching from the parent interface.

Approved by:	re (kib), ken (mentor)
2009-08-20 02:49:43 +00:00
Michael Tuexen
d51d92a789 Fix a bug in the handling of unreliable messages which
results in stalled associations.

Approved by: re, rrs (mentor)
2009-08-19 12:12:51 +00:00
Kip Macy
670151d0e4 MFC 196368
- change the interface to flowtable_lookup so that we don't rely on
    the mbuf for obtaining the fib index
  - check that a cached flow corresponds to the same fib index as the
    packet for which we are doing the lookup
  - at interface detach time flush any flows referencing stale rtentrys
    associated with the interface that is going away (fixes reported
    panics)
  - reduce the time between cleans in case the cleaner is running at
    the time the eventhandler is called and the wakeup is missed less
    time will elapse before the eventhandler returns
  - separate per-vnet initialization from global initialization
    (pointed out by jeli@)

Reviewed by:	sam@
Approved by:	re@
2009-08-18 20:39:35 +00:00
Michael Tuexen
3da1fd00cf Fix a panic when using one-to-one style sockets in non-blocking
mode and there is no listening server.
PR: 137795
Approved by: re, rrs (mentor)
2009-08-18 20:06:00 +00:00
Michael Tuexen
ca007251f2 MFC r196260.
* Fix a bug where PR-SCTP settings are ignore when using implicit
   association setup.
 * Fix a bug where message with illegal stream ids are not deleted.
 * Fix a crash when reporting back unsent messages from the send_queue.
 * Fix a bug related to INIT retransmission when the socket is already
   closed.
 * Fix a bug where associations were stalled when partial delivery API
   was enabled.
 * Fix a bug where the receive buffer size was smaller than the
   partial_delivery_point.

Approved by: re, rrs (mentor)
2009-08-15 21:37:16 +00:00
Qing Li
6e12c67559 MFC 196234
In function ip_output(), the cached route is flushed when there is a
mismatch between the cached entry and the intended destination. The
cached rtentry{} is flushed but the associated llentry{} is not. This
causes the wrong destination MAC address being used in the output
packets. The fix is to flush the llentry{} when rtentry{} is cleared.

Reviewed by:	kmacy, rwatson
Approved by:	re
2009-08-15 00:04:12 +00:00
Marko Zec
0e9c71019b MFC r196229:
SCTP is not yet compatible with options VIMAGE kernels although it compiles
  with VIMAGE defined, so explicitly disallow building such kernels.

  Reviewed by:  rrs
  Approved by:  re (rwatson), julian (mentor)

Approved by:	re (rwatson)
2009-08-14 23:01:21 +00:00
Julian Elischer
f394882d89 MFC of r196201
URL: http://svn.freebsd.org/changeset/base/196201

  Fix ipfw crash on uid or gid check.
  Receiving any ip packet for which there is no existing socket will
  crash if ipfw has a uid or gid test rule, as the uid/gid
  of the non existent owner of said non existent socket is tested.
  Brooks introduced this error as part of his >16 gids patch.
  It appears to be a cut-n-paste error from similar code a few lines
  before. The old code used the 'pcb' variable here, but in the
  new code that switched the 'inp' variable, which is often NULL
  and what is tested in the code further up. The rest of the multi-gid
  patch for ipfw seems solid (and cleaner than previous code).

p.s. What's up with all the properties changing? It is a fresh checkout.

Reviewed by:	brooks
Approved by:	re (rwatson)
2009-08-14 10:25:14 +00:00
Robert Watson
9d2eb78bcb Add padding to struct inpcb, missed during our padding sweep earlier in
the release cycle.

Approved by:	re (kensmith)
2009-08-02 22:47:08 +00:00
Robert Watson
315e3e38fa Many network stack subsystems use a single global data structure to hold
all pertinent statatistics for the subsystem.  These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.

Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored.  This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.

The following modules are affected by this change:

      if_bridge
      if_cxgb
      if_gif
      ip_mroute
      ipdivert
      pf

In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack.  However, that change is too agressive for
this point in the release cycle.

Reviewed by:	bz
Approved by:	re (kib)
2009-08-02 19:43:32 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Xin LI
16e324fc56 Show interface name which received short CARP packet (e.g. a VRRP packet),
in order to match other codepaths nearby.  This makes troubleshooting
easier.

Approved by:	re (kib)
MFC after:	1 month
2009-07-30 17:40:47 +00:00
Julian Elischer
3d1001cb11 Startup the vnet part of initialization a bit after the global part.
Fixes crash on boot if ipfw compiled in.

Submitted by:	tegge@
Reviewed by:	tegge@
Approved by:	re (kib)
2009-07-28 19:58:07 +00:00
Julian Elischer
7973fba3a4 Somewhere along the line accept sockets stopped honoring the
FIB selected for them. Fix this.

Reviewed by:	ambrisko
Approved by:	re (kib)
MFC after:	3 days
2009-07-28 19:43:27 +00:00
Michael Tuexen
bf3d517756 Fix a bug where wrong initialization value
in used for an SCTP specific sysctl variable.

Approved by: re, rrs(mentor).
MFC after: 2 weeks.
2009-07-28 15:07:41 +00:00
Randall Stewart
cfde3ff70b Turns out that when a receiver forwards through its TNS's the
processing code holds the read lock (when processing a
FWD-TSN for pr-sctp). If it finds stranded data that
can be given to the application, it calls sctp_add_to_readq().
The readq function also grabs this lock. So if INVAR is on
we get a double recurse on a non-recursive lock and panic.

This fix will change it so that readq() function gets a
flag to tell if the lock is held, if so then it does not
get the lock.

Approved by:	re@freebsd.org (Kostik Belousov)
MFC after:	1 week
2009-07-28 14:09:06 +00:00
Qing Li
df813b7ea2 This patch does the following:
- Allow loopback route to be installed for address assigned to
      interface of IFF_POINTOPOINT type.
    - Install loopback route for an IPv4 interface addreess when the
      "useloopback" sysctl variable is enabled. Similarly, install
      loopback route for an IPv6 interface address when the sysctl variable
      "nd6_useloopback" is enabled. Deleting loopback routes for interface
      addresses is unconditional in case these sysctl variables were
      disabled after an interface address has been assigned.

Reviewed by:	bz
Approved by:	re
2009-07-27 17:08:06 +00:00
Michael Tuexen
8e71b6947a Fix the handling of unordered messages when using
PR-SCTP.

Approved by: re, rrs (mentor)
MFC after: 3 weeks.
2009-07-27 13:41:45 +00:00
Michael Tuexen
4420d9a062 Get rid of unused field. This will also be deleted
in the official speciication of the SCTP socket API.

Approved by:re, rrs (mentor)
2009-07-27 12:09:32 +00:00
Michael Tuexen
47a490cbbc Add a missing unlock for the inp lock when
returning early from sctp_add_to_readq().

Approved by: re, rrs (mentor)
MFC after: 2 weeks.
2009-07-26 15:06:59 +00:00
Julian Elischer
9d85f50ad5 Catch ipfw up to the rest of the vimage code.
It got left behind when it moved to its new location.

Approved by:	re (kensmith)
2009-07-25 06:42:42 +00:00
Robert Watson
d0728d7174 Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
  occur each time a network stack is instantiated and destroyed.  In the
  !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
  For the VIMAGE case, we instead use SYSINIT's to track their order and
  properties on registration, using them for each vnet when created/
  destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
  previously, as well as its dependency scheme: we now just use the
  SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
  they want init functions to be called for each virtual network stack
  rather than just once at boot, compiling down to DOMAIN_SET() in the
  non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
  of modinfo or DOMAIN_SET() for init/uninit events.  In some cases,
  convert modular components from using modevent to using sysinit (where
  appropriate).  In some cases, do minor rejuggling of SYSINIT ordering
  to make room for or better manage events.

Portions submitted by:	jhb (VNET_SYSINIT), bz (cleanup)
Discussed with:		jhb, bz, julian, zec
Reviewed by:		bz
Approved by:		re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00
Bjoern A. Zeeb
a08362ce46 sysctl_msec_to_ticks is used with both virtualized and
non-vrtiualized sysctls so we cannot used one common function.

Add a macro to convert the arg1 in the virtualized case to
vnet.h to not expose the maths to all over the code.

Add a wrapper for the single virtualized call, properly handling
arg1 and call the default implementation from there.

Convert the two over places to use the new macro.

Reviewed by:	rwatson
Approved by:	re (kib)
2009-07-21 21:58:55 +00:00
Robert Watson
a511354af4 Back out the moving in r195782 of V_ip_id's initialization from the top
back to the bottom of ip_init() as found in 7.x.  I missed the fact that
the bottom half of the init routine only runs in the !VNET case.

Submitted by:	zec
Approved by:	re (vimage blanket)
2009-07-20 19:40:09 +00:00
Robert Watson
0a4747d4d0 Garbage collect vnet module registrations that have neither constructors
nor destructors, as there's no actual work to do.

In most cases, the constructors weren't needed because of the existing
protocol initialization functions run by net_init_domain() as part of
VNET_MOD_NET, or they were eliminated when support for static
initialization of virtualized globals was added.

Garbage collect dependency references to modules without constructors or
destructors, notably VNET_MOD_INET and VNET_MOD_INET6.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-07-20 13:55:33 +00:00