* Fix a bug where SACKs are not sent when they should.
* Get delayed SACK working again.
* Really print the nr_mapping array when it should be printed.
* Update highest_tsn variables when sliding mapping arrays.
* Sending a FWDTSN chunk should not affect the retran count.
* Cleanups.
MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.
Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.
Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.
This also removes some header file pollution for putatively
static global variables.
Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.
Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Fix a regression where DVMRP diagnostic traffic, such as that used
by mrinfo and mtrace, was dropped by the IGMP TTL check. IGMP control
traffic must always have a TTL of 1.
Submitted by: Matthew Luckie
Enhance the historic behaviour of raw sockets and jails in a way
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.
This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.
Reported by: (ten 211.ru)
Tested by: (ten 211.ru)
Fix a few issues related to the legacy 4.4 BSD multicast APIs.
IPv4 addresses can and do change during normal operation. Testing by
pfSense developers exposed an issue where OpenOSPFD was using the IPv4
address to leave the OSPF link-scope multicast groups on a dynamic
OpenVPN tun interface, rather than using RFC 3678 with the interface
index, which won't be raced when the interface's addresses change.
In inp_join_group():
If we are already a member of an ASM group, and IP_ADD_MEMBERSHIP or
MCAST_JOIN_GROUP ioctls are re-issued, return EADDRINUSE as per the
legacy 4.4BSD multicast API. This bends RFC 3678 slightly, but does
not violate POLA for apps using the old API.
It also stops us falling through to kicking IGMP state transactions
in what is otherwise a no-op case.
[This has already been dealt with in HEAD, but make it explicit before
we MFC the change to 8.]
In inp_leave_group():
Fix a bogus conditional.
Move the ifp null check to ioctls MCAST_LEAVE* in the switch..case
where it actually belongs.
If an interface was specified, by primary IPv4 address, for ioctl
IP_DROP_MEMBERSHIP or MCAST_LEAVE_GROUP (an ASM full leave operation),
then and only then should we look up the ifp from the IPv4 address in
mreqs.imr_interface.
If not, we fall through to imo_match_group() as before, but only in
the IP_DROP_MEMBERSHIP case.
With these changes, the legacy 4.4BSD multicast API idempotence should
be mostly preserved in the SSM enabled IPv4 stack.
[Note: this is not a straight svn merge as head and 8 differ slightly]
Found by: ermal (with pfSense)
Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.
In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.
In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.
In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().
In if_llatbl.c when freeing the entire tables make sure that in case we
cancel a pending callout to remove the reference as well.
Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE
This is Part III of the great IETF hack-a-thon to fix
the NR-Sack code. (the last one on the cpu options
was a lull.. i.e MFC 205629).. still 2 more to go.
Adds the option of seperating out the sctp stats per
processor. This will be refined further and is definetly
exploratory (which is why its an option) i.e. making it
allocate the actual number of processors is coming ;-D.
More stray ifdef's that had worked their way into the
code base somehow (yes thats ifdef Windows going out.. our
stack runs on windows .. big thanks for that goes to
Kozuka-san and Bruce Cran ;-D)
Revision 205075 and 205104:
---------205075----------
With the recent change of the sctp checksum to support offload,
no delayed checksum was added to the ip6 output code. This
causes cards that do not support SCTP checksum offload to
have SCTP packets that are IPv6 NOT have the sctp checksum
performed. Thus you could not communicate with a peer. This
adds the missing bits to make the checksum happen for these cards.
-------------------------
---------205104----------
The proper fix for the delayed SCTP checksum is to
have the delayed function take an argument as to the offset
to the SCTP header. This allows it to work for V4 and V6.
This of course means changing all callers of the function
to either pass the header len, if they have it, or create
it (ip_hl << 2 or sizeof(ip6_hdr)).
-------------------------
PR: 144529
One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.
The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.
Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.
introduce a local variable rte acting as a cache of ro->ro_rt
within ip_output, achieving (in random order of importance):
- a reduction of the number of 'r's in the source code;
- improved legibility;
- a reduction of 64 bytes in the .text
r205066:
Log:
- restructure flowtable to support ipv6
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
(e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate
r205069:
Log:
fix stats reporting sysctl
r205093:
Log:
re-update copyright to 2010
pointed out by danfe@
r205097:
Log:
flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB
pointed out by jkim@
r205488:
Log:
- boot-time size the ipv4 flowtable and the maximum number of flows
- increase flow cleaning frequency and decrease flow caching time
when near the flow limit
- stop allocating new flows when within 3% of maxflows don't start
allocating again until below 12.5%
Add pcb reference counting to the pcblist sysctl handler functions
to ensure type stability while caching the pcb pointers for the
copyout.
Reviewed by: rwatson
Destroy TCP UMA zones (empty or not) upon network stack teardown
to not leak them, otherwise making UMA/vmstat unhappy with every
stoped vnet.
We will still leak pages (especially for zones marked NOFREE).
Reshuffle cleanup order in tcp_destroy() to get rid of what we can
easily free first.
Reviewed by: rwatson
Destroy UDP UMA zones (empty or not) upon network stack teardown
to not leak them making UMA/vmstat -z unhappy with every stoped vnet.
We will still leak pages (especially as zones are marked NOFREE).
Split up ip_drain() into an outer lock and iterator part and
a "locked" version that will only handle a single network stack
instance. The latter is called directly from ip_destroy().
Hook up an ip_destroy() function to release resources from the
legacy IP network layer upon virtual network stack teardown.
Reviewed by: rwatson
Properly free resources when destroying the TCP hostcache while
tearing down a network stack (in the VIMAGE jail+vnet case).
For that break out the logic from tcp_hc_purge() into an internal
function we can call from both, the sysctl handler and the
tcp_hc_destroy().
Reviewed by: silby, lstewart
Honor ip.fw.one_pass when a packet comes out of a pipe without being delayed.
I forgot to handle this case when i did the mtag cleanup three months ago.
I am merging immediately because this bugfix is important for
people using RELENG_8.
PR: 145004
done in CURRENT over the last 4 months.
HEAD and RELENG_8 are almost in sync now for ipfw, dummynet
the pfil hooks and related components.
Among the most noticeable changes:
- r200855 more efficient lookup of skipto rules, and remove O(N)
blocks from critical sections in the kernel;
- r204591 large restructuring of the dummynet module, with support
for multiple scheduling algorithms (4 available so far)
See the original commit logs for details.
Changes in the kernel/userland ABI should be harmless because the
kernel is able to understand previous requests from RELENG_8 and
RELENG_7. For this reason, this changeset would be applicable
to RELENG_7 as well, but i am not sure if it is worthwhile.
201408,201325,200089,198822,197373,197372,197214,196162). Since one of those
changes was a semicolon cleanup from somebody else, this touches a lot more.
Some of the existing ppp and vpn related scripts create and set
the IP addresses of the tunnel end points to the same value. In
these cases the loopback route is not installed for the local
end.
Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.
and
Unbreak the VIMAGE build with IPSEC, broken with r197952 by
virtualizing the pfil hooks.
For consistency add the V_ to virtualize the pfil hooks in here as well.
(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.
PR: 137213
Submitted by: Eygene Ryabinkin (initial version)