3664 Commits

Author SHA1 Message Date
Robert Watson
4fc9f6b81e Merge r204806 from head to stable/8:
Wrap use of rw_try_upgrade() on pcbinfo with macro INP_INFO_TRY_UPGRADE()
  to match other pcbinfo locking macros.

Approved by:	re (bz)
2010-06-01 14:18:44 +00:00
Kenneth D. Merry
6774e0f9c8 MFC r206844:
Don't clear other flags (e.g. CSUM_TCP) when setting CSUM_TSO.  This was
causing TSO to break for the Xen netfront driver.

Reviewed by:	gibbs, rwatson
2010-05-21 04:47:22 +00:00
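Clearing the other flags is the classic assign-versus-OR mistake on a flags word. The sketch below shows the difference. It assumes hypothetical bit values (the real CSUM_* constants are defined in FreeBSD's <sys/mbuf.h> and differ) and uses hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; not FreeBSD's real ones. */
#define CSUM_IP   0x0001u
#define CSUM_TCP  0x0002u
#define CSUM_TSO  0x0020u

/* Buggy: plain assignment discards every flag already set (e.g. CSUM_TCP). */
static uint32_t set_tso_buggy(uint32_t csum_flags)
{
	(void)csum_flags;
	return CSUM_TSO;
}

/* Fixed: OR the new flag in so existing offload requests survive. */
static uint32_t set_tso_fixed(uint32_t csum_flags)
{
	return csum_flags | CSUM_TSO;
}
```

A driver such as Xen netfront that looks for CSUM_TCP alongside CSUM_TSO only sees both with the OR form.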
Randall Stewart
b5889e7a0d MFC 207985
Fix a long-standing bug in generating a
 FWD-TSN. It appeared when more TSNs had to
 be skipped than fit in a single mbuf.
2010-05-16 16:52:56 +00:00
Randall Stewart
31bd7e42f9 MFC 207983
More PR-SCTP bugs:
   - Make sure that when you kick the streams you add correctly
     using a 16-bit unsigned integer.
   - Make sure when sending out you allow FWD-TSN to skip over
     and list the ACKED chunks in the stream/seq list (so the
     receiver will kick the stream)
2010-05-16 16:51:44 +00:00
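Stream sequence numbers are 16-bit fields on the wire, so arithmetic on them has to wrap modulo 2^16. The sketch below shows a wrapping add and an RFC 1982 style serial-number comparison. The helper names are hypothetical and are not the SCTP stack's own.

```c
#include <assert.h>
#include <stdint.h>

/* Advance a 16-bit stream sequence number (SSN) by n, wrapping
 * modulo 2^16 exactly like the on-the-wire field. */
static uint16_t ssn_advance(uint16_t ssn, uint16_t n)
{
	return (uint16_t)(ssn + n);
}

/* Serial-number comparison: is a "after" b, allowing for wrap? */
static int ssn_gt(uint16_t a, uint16_t b)
{
	return a != b && (uint16_t)(a - b) < 0x8000u;
}
```

With a wider or signed type, 65535 + 1 would yield 65536 instead of the on-wire value 0. Later comparisons against received SSNs would then go wrong.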
Randall Stewart
d536af657c MFC 207966 (for Michael)
Get rid of unused constants.
2010-05-16 16:50:33 +00:00
Randall Stewart
c7a8100b47 MFC of 207963
This fixes PR-SCTP issues:
  - Slide the map at the proper place.
  - Mark the bits in the nr_array ONLY if there
    is no marking.
  - When generating a FWD-TSN, allow it to skip past
    ACKED chunks too.
2010-05-16 16:45:49 +00:00
Randall Stewart
93c3efa7cf MFC of 207924:
This fixes a bug with the one-to-one model socket when a
user sets up a socket to a server, sends data, and closes
the socket before the server has called accept(). This used
to NOT work at all. Now we add a flag to the association and
defer its cleanup so that the accept() will succeed.
2010-05-16 16:42:52 +00:00
Michael Tuexen
0bd5a0aeb4 MFC 206758, 206840, 206891, 206892, 207099, 207191, 207197
* Fix a bug where SACKs are not sent when they should be.
* Get delayed SACK working again.
* Really print the nr_mapping array when it should be printed.
* Update highest_tsn variables when sliding mapping arrays.
* Sending a FWDTSN chunk should not affect the retran count.
* Cleanups.
2010-05-07 20:02:36 +00:00
Bjoern A. Zeeb
480d7c6c41 MFC r207369:
MFP4: @176978-176982, 176984, 176990-176994, 177441

  "Whitespace" churn after the VIMAGE/VNET whirls.

  Remove the need for some "init" functions within the network
  stack, like pim6_init(), icmp_init() or significantly shorten
  others like ip6_init() and nd6_init(), using static initialization
  again where possible and formerly missed.

  Move (most) variables back to where they were before the
  container structs and VIMAGE_GLOBALS (before r185088), and try to
  reduce the diff to stable/7 and earlier as much as possible,
  to help out-of-tree consumers update from 6.x or 7.x to 8 or 9.

  This also removes some header file pollution for putatively
  static global variables.

  Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
  no longer needed.

  Reviewed by:	jhb
  Discussed with:	rwatson
  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	CK Software GmbH
2010-05-06 06:44:19 +00:00
Bruce M Simpson
e58a96a73f MFC r207275:
Fix a regression where DVMRP diagnostic traffic, such as that used
  by mrinfo and mtrace, was dropped by the IGMP TTL check. IGMP control
  traffic must always have a TTL of 1.

Submitted by:	Matthew Luckie
2010-05-03 09:31:51 +00:00
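The sketch below shows one way to express the intended check. It enforces TTL 1 on the group membership message types and lets other IGMP types pass, such as DVMRP (used by mrinfo) and mtrace. The helper name and the local #defines are hypothetical. The type codes are the standard IGMP values, and the real igmp_input() does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local definitions of the standard IGMP message type codes. */
#define IGMP_HOST_MEMBERSHIP_QUERY   0x11
#define IGMP_V1_MEMBERSHIP_REPORT    0x12
#define IGMP_DVMRP                   0x13
#define IGMP_V2_MEMBERSHIP_REPORT    0x16
#define IGMP_HOST_LEAVE_MESSAGE      0x17
#define IGMP_MTRACE_RESP             0x1e
#define IGMP_MTRACE                  0x1f
#define IGMP_V3_MEMBERSHIP_REPORT    0x22

/* Hypothetical helper: accept a packet of this type and TTL? */
static bool igmp_ttl_ok(uint8_t type, uint8_t ttl)
{
	switch (type) {
	case IGMP_HOST_MEMBERSHIP_QUERY:
	case IGMP_V1_MEMBERSHIP_REPORT:
	case IGMP_V2_MEMBERSHIP_REPORT:
	case IGMP_HOST_LEAVE_MESSAGE:
	case IGMP_V3_MEMBERSHIP_REPORT:
		/* Group membership control traffic must stay on-link. */
		return ttl == 1;
	default:
		/* DVMRP, mtrace and other diagnostics may cross routers. */
		return true;
	}
}
```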
Bjoern A. Zeeb
7f6b24dccf MFC r207277:
Enhance the historic behaviour of raw sockets and jails so
  that we allow any of the jail's IPs as the source address rather than
  forcing the "primary" one. While IPv6 naturally has source address
  selection, for legacy IP we do not go through that pain when
  IP_HDRINCL is not set. People should bind(2) for that.

  This will, for example, allow ping(|6) -S to work correctly for
  non-primary addresses.

  Reported by:  (ten 211.ru)
  Tested by:    (ten 211.ru)
2010-05-02 16:36:15 +00:00
Bjoern A. Zeeb
8b5076c9df MFC r206989:
Avoid memory access after free.  Use the (shortened) copy for the
  ipsec mtu lookup as well.

PR:		kern/145736
Submitted by:	Peter Molnar (peter molnar.cc)
2010-05-02 15:55:29 +00:00
Bruce M Simpson
3db0099738 MFC 206452:
Fix a few issues related to the legacy 4.4 BSD multicast APIs.

  IPv4 addresses can and do change during normal operation. Testing by
  pfSense developers exposed an issue where OpenOSPFD was using the IPv4
  address to leave the OSPF link-scope multicast groups on a dynamic
  OpenVPN tun interface, rather than using RFC 3678 with the interface
  index, which won't be raced when the interface's addresses change.

  In inp_join_group():
   If we are already a member of an ASM group, and IP_ADD_MEMBERSHIP or
   MCAST_JOIN_GROUP ioctls are re-issued, return EADDRINUSE as per the
   legacy 4.4BSD multicast API. This bends RFC 3678 slightly, but does
   not violate POLA for apps using the old API.
   It also stops us falling through to kicking IGMP state transactions
   in what is otherwise a no-op case.
   [This has already been dealt with in HEAD, but make it explicit before
    we MFC the change to 8.]

  In inp_leave_group():
   Fix a bogus conditional.
   Move the ifp null check to ioctls MCAST_LEAVE* in the switch..case
   where it actually belongs.
   If an interface was specified, by primary IPv4 address, for ioctl
   IP_DROP_MEMBERSHIP or MCAST_LEAVE_GROUP (an ASM full leave operation),
   then and only then should we look up the ifp from the IPv4 address in
   mreqs.imr_interface.
   If not, we fall through to imo_match_group() as before, but only in
   the IP_DROP_MEMBERSHIP case.

  With these changes, the legacy 4.4BSD multicast API idempotence should
  be mostly preserved in the SSM enabled IPv4 stack.

  [Note: this is not a straight svn merge as head and 8 differ slightly]

Found by:	ermal (with pfSense)
2010-04-27 13:50:15 +00:00
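The legacy join semantics described for inp_join_group() can be sketched with a per-socket membership table keyed by (group, interface index). The fixed-size table and all names are hypothetical. A repeated join returns EADDRINUSE before any IGMP state would be touched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MEMBERSHIPS 20  /* hypothetical per-socket limit */

/* Hypothetical per-socket membership table: (group, ifindex) pairs. */
struct membership_table {
	uint32_t group[MAX_MEMBERSHIPS];
	unsigned ifindex[MAX_MEMBERSHIPS];
	size_t n;
};

static int join_group(struct membership_table *t, uint32_t group,
                      unsigned ifindex)
{
	for (size_t i = 0; i < t->n; i++) {
		/* Legacy 4.4BSD semantics: a repeated join is an error,
		 * and no IGMP state transition is started for it. */
		if (t->group[i] == group && t->ifindex[i] == ifindex)
			return EADDRINUSE;
	}
	if (t->n == MAX_MEMBERSHIPS)
		return ENOBUFS;
	t->group[t->n] = group;
	t->ifindex[t->n] = ifindex;
	t->n++;
	/* A real stack would kick the IGMP join state transition here. */
	return 0;
}
```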
Bjoern A. Zeeb
feb3a5f7df MFC r206481:
Plug reference leaks in the link-layer code ("new-arp") that previously
  prevented the link-layer entry from being freed.

  In both in.c and in6.c (though that code path seems to be basically dead)
  plug a reference leak in case of a pending callout being drained.

  In if_ether.c consistently add a reference before resetting the callout
  and in case we canceled a pending one remove the reference for that.
  In the final case in arptimer, before freeing the expired entry, remove
  the reference again and explicitly call callout_stop() to clear the active
  flag.

  In nd6.c:nd6_free() we are only ever called from the callout function and
  thus need to remove the reference there as well before calling into
  llentry_free().

  In if_llatbl.c when freeing the entire tables make sure that in case we
  cancel a pending callout to remove the reference as well.

  Reviewed by:          qingli (earlier version)
  MFC after:            10 days
  Problem observed, patch tested by: simon on ipv6gw.f.o,
                        Christian Kratzer (ck cksoft.de),
                        Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-21 19:51:22 +00:00
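The reference rule applied in if_ether.c can be sketched in user space. A pending callout owns one reference, so you take a reference before resetting the timer and drop one if a pending callout was cancelled. struct entry and mock_callout_reset() are hypothetical stand-ins. The mock assumes callout_reset(9)'s convention of returning nonzero when it cancelled a pending callout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a kernel callout and an llentry. */
struct callout { bool pending; };

struct entry {
	int refcnt;
	struct callout timer;
	bool freed;
};

/* Mock of callout_reset(): reports whether a pending callout was
 * cancelled (and replaced by the new one). */
static bool mock_callout_reset(struct callout *c)
{
	bool was_pending = c->pending;
	c->pending = true;
	return was_pending;
}

static void entry_rele(struct entry *e)
{
	assert(e->refcnt > 0);
	if (--e->refcnt == 0)
		e->freed = true;  /* a real implementation would free the entry */
}

/* Arm the expiry timer: the scheduled callout holds its own reference. */
static void entry_arm_timer(struct entry *e)
{
	e->refcnt++;                        /* reference for the new callout */
	if (mock_callout_reset(&e->timer))
		entry_rele(e);              /* cancelled one: drop its reference */
}

/* Timer fired: the callout's reference is consumed here. */
static void entry_timer_fired(struct entry *e)
{
	e->timer.pending = false;
	entry_rele(e);
}
```

Without the entry_rele() on cancellation, every re-arm of a pending timer would leak one reference. The entry could then never be freed, which matches the symptom described above.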
Rui Paulo
55f05ae7e5 MFC r206456:
Honor the CE bit even when the CWR bit is set.

 PR:		145600
 Submitted by:	Richard Scheffenegger <rs at netapp.com>
2010-04-17 17:40:12 +00:00
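The sketch below shows the receiver-side rule, assuming RFC 3168 semantics: CWR stops the receiver from echoing ECE, and CE (re)starts it. Handling the two as an either/or (for example an else-if) would silently drop a CE mark that arrives on a segment that also carries CWR. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receiver-side ECN state: set ECE on outgoing ACKs? */
struct ecn_rcv { bool send_ece; };

/* Process one segment's CWR flag (TCP header) and CE mark (IP header). */
static void ecn_input(struct ecn_rcv *s, bool cwr, bool ce)
{
	/* CWR: the sender has reduced its window, so stop echoing ECE. */
	if (cwr)
		s->send_ece = false;
	/* CE must be honored even when the same segment carries CWR,
	 * otherwise this new congestion signal would be lost. */
	if (ce)
		s->send_ece = true;
}
```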
Randall Stewart
0099361644 MFC of 206281
Final MFC of all the IETF hack-a-thon work; head and stable are
now in sync ;-)
2010-04-17 04:19:18 +00:00
Randall Stewart
56be5eba8b MFC of 206151
2010-04-17 04:17:17 +00:00
Randall Stewart
17f2eabb2b MFC of 206137
This is Part III of the great IETF hack-a-thon to fix
the NR-SACK code (the last one, on the CPU options,
was a lull, i.e. MFC 205629); still 2 more to go.
2010-04-17 04:15:46 +00:00
Randall Stewart
07072810f0 MFC of 205629
Adds the option of separating out the SCTP stats per
processor. This will be refined further and is definitely
exploratory (which is why it's an option), i.e. making it
allocate the actual number of processors is coming ;-D.
2010-04-17 04:13:52 +00:00
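Per-processor statistics trade a cheap, contention-free increment for a more expensive read that sums all slots. The flowtable change further down does the same with per-CPU stats and aggregating sysctls. The sketch below assumes a hypothetical fixed CPU bound and a 64-byte cache line. A kernel version would index by curcpu with preemption handled.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_MAX 8  /* hypothetical fixed bound on processors */

/* One counter slot per CPU, each on its own (assumed 64-byte) line
 * so increments on different CPUs do not contend for a cache line. */
struct pcpu_stat {
	_Alignas(64) uint64_t packets_in;
};

static struct pcpu_stat stats[NCPU_MAX];

/* Fast path: a CPU bumps only its own slot, so no lock is needed. */
static void stat_inc(unsigned cpu)
{
	assert(cpu < NCPU_MAX);
	stats[cpu].packets_in++;
}

/* Slow path (e.g. a sysctl handler): sum all slots for reporting. */
static uint64_t stat_total(void)
{
	uint64_t sum = 0;

	for (unsigned i = 0; i < NCPU_MAX; i++)
		sum += stats[i].packets_in;
	return sum;
}
```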
Randall Stewart
469ff22797 MFC of 205628
Out goes the nr_mapping_array expand.
2010-04-17 04:11:45 +00:00
Randall Stewart
f1fb6dd5de MFC of 205627
Part II (more to follow) of the great IETF hack-a-thon to
fix the NR-Sack code.
2010-04-17 04:10:29 +00:00
Randall Stewart
dc47896e05 MFC of 204141
Cleans up so we can have a vtag reflected argument.
One of Michael's fixes ;-)
2010-04-17 04:08:51 +00:00
Randall Stewart
ce6856644b MFC of 204096
One of Michael's changes to fix some sign issues and
some minor locking.
2010-04-17 04:06:40 +00:00
Randall Stewart
6c16609631 MFD 204040
Fixes some argument calls (u_long vs uint32_t).
2010-04-17 04:02:27 +00:00
Randall Stewart
ec15b65695 MFC of 203847
Puts in missing packed declarations (from Michael). It worked
only because it was properly aligned anyway ;-)
2010-04-17 04:00:57 +00:00
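A missing packed declaration can go unnoticed when every field already falls on its natural boundary. In that case the compiler adds no padding, and the packed and unpacked layouts coincide. The sketch below uses the GCC/Clang __attribute__((packed)) extension with hypothetical header layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields already on natural boundaries: packed changes nothing. */
struct hdr_aligned {
	uint8_t  type;
	uint8_t  flags;
	uint16_t length;
};
struct hdr_aligned_packed {
	uint8_t  type;
	uint8_t  flags;
	uint16_t length;
} __attribute__((packed));

/* A 32-bit field at offset 2: without packed, the compiler normally
 * pads it out to offset 4, so the struct no longer matches the wire. */
struct hdr_misaligned {
	uint8_t  type;
	uint8_t  flags;
	uint32_t value;
};
struct hdr_misaligned_packed {
	uint8_t  type;
	uint8_t  flags;
	uint32_t value;
} __attribute__((packed));
```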
Randall Stewart
2b7bba217f MFC of 203503
A fix to how the checksum code works that Michael put in.
2010-04-17 03:58:56 +00:00
Randall Stewart
3c9d6800fc MFC of 202782
Michael's changes that replace [0] with [] (C99 flexible array members)
2010-04-17 03:57:16 +00:00
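The [0] to [] change swaps the GNU zero-length array extension for the standard C99 flexible array member. The sketch below uses a hypothetical TLV-style parameter. Because sizeof the struct excludes the trailing array, the allocation adds the payload length explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length parameter: fixed header plus payload.
 * "data[]" is the C99 form of the older "data[0]" idiom. */
struct param {
	uint16_t type;
	uint16_t length;   /* number of payload bytes that follow */
	uint8_t  data[];
};

static struct param *param_new(uint16_t type, const void *payload,
                               uint16_t len)
{
	/* sizeof(*p) covers only the fixed header, not data[]. */
	struct param *p = malloc(sizeof(*p) + len);

	if (p == NULL)
		return NULL;
	p->type = type;
	p->length = len;
	memcpy(p->data, payload, len);
	return p;
}
```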
Randall Stewart
535f992c6d MFC of 202526
The first round of some of Michael's changes to
get the sack processing in better shape.
2010-04-17 03:55:49 +00:00
Randall Stewart
835d439e3a MFC of 205502
The first result of Michael's and my long fight at the IETF to
get the NR sack code fixed and aligned.
2010-04-17 03:53:44 +00:00
Randall Stewart
9eb2a664ab MFC of 202523
This fixes a closing race condition that is unlikely
to ever happen, but good to fix ;-)
2010-04-17 03:51:13 +00:00
Randall Stewart
57f0b741c6 MFC 202521
More stray ifdefs that had worked their way into the
code base somehow (yes, that's ifdef Windows going out; our
stack runs on Windows, and big thanks for that go to
Kozuka-san and Bruce Cran ;-D)
2010-04-17 03:49:21 +00:00
Randall Stewart
d50db6bd56 MFC of 202520
This aligns us with the socket API for stream
reset, with proper naming and a define for backward
compatibility.
2010-04-17 03:47:04 +00:00
Randall Stewart
394ddd21a6 MFC of 202518
More ifdefs that should not be present...
2010-04-17 03:44:28 +00:00
Randall Stewart
0146f692b5 MFC of 202517
Again gets rid of some rather strange ifdefs for
APPLE/USERSPACE that drifted in through our scrubber
programs.
2010-04-17 03:43:02 +00:00
Randall Stewart
aab42fa148 MFC 202516
This gets rid of some stray #ifdef APPLE that drifted in
somehow.
2010-04-17 03:40:48 +00:00
Randall Stewart
42fc501864 Merge of SVN 196507.
This optimizes the SACK handling a bit and
restructures it so it's much more readable ;-)
2010-04-17 03:38:26 +00:00
Luigi Rizzo
54d63d7b13 add priority scheduler.
2010-04-07 13:18:58 +00:00
Luigi Rizzo
7b3c0af43e fix breakage in ipfw removal.
2010-04-07 12:42:49 +00:00
Randall Stewart
54bb41671a MFC of 2 items to fix the csum for v6 issue:
Revision 205075 and 205104:

---------205075----------
With the recent change of the sctp checksum to support offload,
no delayed checksum was added to the ip6 output code. This
causes cards that do not support SCTP checksum offload to
have SCTP packets that are IPv6 NOT have the sctp checksum
performed. Thus you could not communicate with a peer. This
adds the missing bits to make the checksum happen for these cards.
-------------------------
---------205104----------
The proper fix for the delayed SCTP checksum is to
have the delayed function take an argument as to the offset
to the SCTP header. This allows it to work for V4 and V6.
This of course means changing all callers of the function
to either pass the header len, if they have it, or create
it (ip_hl << 2 or sizeof(ip6_hdr)).
-------------------------
PR:		144529
2010-04-05 13:48:23 +00:00
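The offset each caller passes can be derived from the packet itself. The sketch below uses a hypothetical helper that reads only the first byte of the IP header. It assumes no IPv6 extension headers; a real caller would have to account for them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP6_HDR_LEN 40  /* sizeof(struct ip6_hdr): fixed IPv6 header */

/* Hypothetical helper: offset from the start of the IP packet to the
 * SCTP header, i.e. what a caller would hand to the delayed-checksum
 * routine. Returns 0 for an unknown IP version. */
static size_t sctp_offset(const uint8_t *pkt)
{
	unsigned version = pkt[0] >> 4;

	if (version == 4)
		return (size_t)(pkt[0] & 0x0f) << 2;   /* ip_hl << 2 */
	if (version == 6)
		return IP6_HDR_LEN;                  /* no extension headers */
	return 0;
}
```

For example, 0x45 (IPv4, ip_hl 5) gives 20 and 0x46 (one option word) gives 24.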
Qing Li
c951da56b4 MFC 204902
One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix and the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.
2010-04-02 05:02:50 +00:00
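The commit has two goals: spread connections across the interfaces on a prefix, and never hash onto a path whose link is down. Both can be sketched as a next-hop selection over live paths only. The structure, the modulo hashing and the function name are assumptions for illustration, not the RADIX_MPATH code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical next hop of a multipath (ECMP) prefix route. */
struct nexthop {
	unsigned ifindex;
	bool link_up;   /* mirrors the interface's link state */
};

/* Pick a next hop for a flow by hashing over the live paths only, so
 * traffic is spread across them and never lands on a link that is
 * down. Returns NULL when no path is usable. */
static const struct nexthop *ecmp_select(const struct nexthop *nh,
                                         size_t n, uint32_t flow_hash)
{
	size_t up = 0;

	for (size_t i = 0; i < n; i++)
		if (nh[i].link_up)
			up++;
	if (up == 0)
		return NULL;

	size_t pick = flow_hash % up;   /* index among live next hops */
	for (size_t i = 0; i < n; i++) {
		if (!nh[i].link_up)
			continue;
		if (pick-- == 0)
			return &nh[i];
	}
	return NULL;  /* not reached */
}
```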
Qing Li
ca2d42b2a1 MFC 201131
introduce a local variable rte acting as a cache of ro->ro_rt
within ip_output, achieving (in random order of importance):
- a reduction of the number of 'r's in the source code;
- improved legibility;
- a reduction of 64 bytes in the .text
2010-04-02 04:58:17 +00:00
Kip Macy
e952596a10 MFC 205066, 205069, 205093, 205097, 205488:
r205066:

Log:
 - restructure flowtable to support ipv6
 - add a name argument to flowtable_alloc for printing with ddb commands
 - extend ddb commands to print destination address or 4-tuples
 - don't parse ports in ulp header if FL_HASH_ALL is not passed
 - add kern_flowtable_insert to enable more generic use of flowtable
   (e.g. system calls for adding entries)
 - don't hash loopback addresses
 - cleanup whitespace
 - keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
 - add sysctls to accumulate stats and report aggregate

r205069:
Log:
 fix stats reporting sysctl

r205093:
Log:
 re-update copyright to 2010
 pointed out by danfe@

r205097:

Log:
 flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB

 pointed out by jkim@

r205488:

Log:
 - boot-time size the ipv4 flowtable and the maximum number of flows
 - increase flow cleaning frequency and decrease flow caching time
   when near the flow limit
 - stop allocating new flows when within 3% of maxflows; don't start
   allocating again until below 12.5%
2010-04-01 00:36:40 +00:00
Luigi Rizzo
353be77138 A last-minute change in the previous commit broke rule deletion,
so I am fixing it, this time with a more detailed description
of what the code is supposed to do.
2010-03-31 01:51:08 +00:00
Luigi Rizzo
d15984d46e mfc 205830:
fixes to rule set handling (including potential kernel panics)
2010-03-29 12:32:16 +00:00
Luigi Rizzo
0d3003c0c8 remove a leftover debugging message
2010-03-29 12:29:34 +00:00
Bjoern A. Zeeb
397069f2c5 MFC r205251:
Add pcb reference counting to the pcblist sysctl handler functions
  to ensure type stability while caching the pcb pointers for the
  copyout.

  Reviewed by:  rwatson
2010-03-27 17:51:27 +00:00
Bjoern A. Zeeb
62f500d0c2 MFC r204838:
Destroy TCP UMA zones (empty or not) upon network stack teardown
  to not leak them, otherwise making UMA/vmstat unhappy with every
  stopped vnet.
  We will still leak pages (especially for zones marked NOFREE).

  Reshuffle cleanup order in tcp_destroy() to get rid of what we can
  easily free first.

  Reviewed by:  rwatson
2010-03-27 17:50:02 +00:00
Bjoern A. Zeeb
3662f299d2 MFC r204807:
Destroy UDP UMA zones (empty or not) upon network stack teardown
  to not leak them, which would otherwise make UMA/vmstat -z unhappy with every stopped vnet.
  We will still leak pages (especially as zones are marked NOFREE).
2010-03-27 17:46:06 +00:00
Bjoern A. Zeeb
1198bd71ba MFC r204143:
Upon virtual network stack teardown properly release the TCP syncache
  resources.

  Reviewed by:  rwatson
2010-03-27 17:36:52 +00:00
Bjoern A. Zeeb
e47658ce90 MFC r204140:
Split up ip_drain() into an outer lock and iterator part and
  a "locked" version that will only handle a single network stack
  instance. The latter is called directly from ip_destroy().

  Hook up an ip_destroy() function to release resources from the
  legacy IP network layer upon virtual network stack teardown.

  Reviewed by:  rwatson
2010-03-27 17:34:57 +00:00
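The split can be sketched with POSIX threads. An outer function takes the locks and iterates over all instances, and a single-instance worker is called directly by the teardown path. All names except ip_drain() and ip_destroy() are hypothetical, including ip_drain_locked(). The assumption that the "locked" variant expects its caller to hold a reassembly-queue lock is inferred from the name, not taken from the source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical network-stack instance with reclaimable IP state. */
struct vnet {
	int frags_queued;        /* e.g. fragments awaiting reassembly */
	struct vnet *next;
};

static pthread_mutex_t vnet_list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ipq_lock = PTHREAD_MUTEX_INITIALIZER;
static struct vnet *vnet_list;

/* Single-instance drain; the caller must hold ipq_lock. */
static void ip_drain_locked(struct vnet *vnet)
{
	vnet->frags_queued = 0;  /* a real stack would free the mbufs */
}

/* Outer lock-and-iterate part: drain every instance. */
static void ip_drain(void)
{
	pthread_mutex_lock(&vnet_list_lock);
	pthread_mutex_lock(&ipq_lock);
	for (struct vnet *v = vnet_list; v != NULL; v = v->next)
		ip_drain_locked(v);
	pthread_mutex_unlock(&ipq_lock);
	pthread_mutex_unlock(&vnet_list_lock);
}

/* Teardown of one instance releases only that instance's resources. */
static void ip_destroy(struct vnet *vnet)
{
	pthread_mutex_lock(&ipq_lock);
	ip_drain_locked(vnet);
	pthread_mutex_unlock(&ipq_lock);
}
```

The design lets the memory-pressure path and the per-instance teardown path share one cleanup routine, while only the former pays for walking the whole instance list.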