3393 Commits

Author SHA1 Message Date
rwatson
67227c12c0 Put TCPSTAT_ADD() and TCPSTAT_INC() behind _KERNEL.
MFC after:	3 days
2009-04-12 21:28:35 +00:00
rwatson
2c81196a78 Update stats in struct carpstats using two new macros: CARPSTATS_ADD()
and CARPSTATS_INC(), rather than directly manipulating the fields of
the structure.  This will make it easier to change the implementation
of these statistics, such as using per-CPU versions of the data
structure.

MFC after:	3 days
2009-04-12 14:19:37 +00:00
rwatson
0d601ec4b5 Update stats in struct pimstat using two new macros: PIMSTAT_ADD()
and PIMSTAT_INC(), rather than directly manipulating the fields of
the structure.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structure.

MFC after:	3 days
2009-04-12 14:06:26 +00:00
rwatson
7dde6eabe7 Update stats in struct mrtstat using two new macros: MRTSTAT_ADD()
and MRTSTAT_INC(), rather than directly manipulating the fields of
the structure.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structure.

MFC after:	3 days
2009-04-12 14:00:36 +00:00
rwatson
b6c3ef293c Update stats in struct igmpstat using two new macros:
IGMPSTAT_ADD() and IGMPSTAT_INC(), rather than directly
manipulating the fields of the structure.  This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

MFC after:	3 days
2009-04-12 13:41:13 +00:00
rwatson
4801e9aee9 Update stats in struct icmpstat and icmp6stat using four new
macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and
ICMP6STAT_INC(), rather than directly manipulating the fields
of these structures across the kernel.  This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

In on case, icmp6stat members are manipulated indirectly, by
icmp6_errcount(), and this will require further work to fix
for per-CPU stats.

MFC after:	3 days
2009-04-12 13:22:33 +00:00
rwatson
5dc0256f14 Update stats in struct udpstat using two new macros, UDPSTAT_ADD()
and UDPSTAT_INC(), rather than directly manipulating the fields
across the kernel.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structures.

MFC after:	3 days
2009-04-12 11:42:40 +00:00
rwatson
692f8aa2fa Update stats in struct ipstat using four new macros, IPSTAT_ADD(),
IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly
manipulating the fields across the kernel.  This will make it easier
to change the implementation of these statistics, such as using
per-CPU versions of the data structures.

MFC after:	3 days
2009-04-11 23:35:20 +00:00
rwatson
b79ff9a30d Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and
TCPSTAT_INC(), rather than directly manipulating the fields across the
kernel.  This will make it easier to change the implementation of
these statistics, such as using per-CPU versions of the data structures.

MFC after:	3 days
2009-04-11 22:07:19 +00:00
piso
dc27114478 What's the point of adjusting a checksum if we are going to toss the
packet? Anticipate the check/return code.
2009-04-11 15:26:31 +00:00
piso
a36990c3db Plug two bugs introduced with modules conversion:
-UdpAliasIn(): correctly check return code after modules ran.
-alias_nbt: in case of malformed packets (or some other unrecoverable
 error), toss the packet.
2009-04-11 15:19:09 +00:00
piso
e605b01ef4 Remove stale comments. 2009-04-11 15:05:19 +00:00
zec
b39b54e6de Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized.  In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw).  Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module.  The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on.  Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first.  The framework will panic if it detects any unresolved
dependencies before completing system initialization.  Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run.  In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container.  IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-11 05:58:58 +00:00
kmacy
fbd3646842 Import "flowid" support for serializing flows across transmit queues
Reviewed by:	rwatson and jeli
2009-04-10 06:16:14 +00:00
luigi
5c7675fccb Add emulation of delay profiles, which lets you model various
types of MAC overheads such as preambles, link level retransmissions
and more.

Note- this commit changes the userland/kernel ABI for pipes
(but not for ordinary firewall rules) so you need to rebuild
kernel and /sbin/ipfw to use dummynet features.

Please check the manpage for details on the new feature.

The MFC would be trivial but it breaks the ABI, so it will
be postponed until after 7.2 is released.

Interested users are welcome to apply the patch manually
to their RELENG_7 tree.

Work supported by the European Commission, Projects Onelab and
Onelab2 (contract 224263).
2009-04-09 12:46:00 +00:00
rrs
1b546b35c3 Fix a FR bug. When doing PR-SCTP with number rtx
set to a low number. The check for skipping was in the
incorrect place. Which meant we would FR chunks we
should not.
MFC after:	1 Month
2009-04-08 12:52:05 +00:00
rrs
0d84cb3f8b Add more padding and a new variable. This will
help us be able to keep ABI compatibility between
8 and 9.
MFC after:	Never
2009-04-08 12:49:36 +00:00
piso
5551ee2e3d -don't pass down, to module's fingerprint function, unused data like
a pointer to the ip header.
-style
-spacing
2009-04-08 11:56:49 +00:00
bz
055adbbfb8 With the right comparison we get a proper wscale value and thus
more adequate TCP performance with IPv6.

Changes for IPv4, r166403 and r172795, both ignored the
IPv6 counterpart and left it in the state of art of year 2000.

The same logic in syncache already shares code between v4 and v6 so
things do not need to be adapted there.

Reported by:	Steinar Haug (sthaug nethelp.no)
Tested by:	Steinar Haug (sthaug nethelp.no)
MFC after:	3 days
2009-04-07 14:42:40 +00:00
zec
c85551e0bc First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

        At this stage, the new per-vnet initializer functions are
	directly called from the existing global initialization code,
	which should in most cases result in compiler inlining those
	new functions, hence yielding a near-zero functional change.

        Modify the existing initializer functions which are invoked via
        protosw, like ip_init() et. al., to allow them to be invoked
	multiple times, i.e. per each vnet.  Global state, if any,
	is initialized only if such functions are called within the
	context of vnet0, which will be determined via the
	IS_DEFAULT_VNET(curvnet) check (currently always true).

        While here, V_irtualize a few remaining global UMA zones
        used by net/netinet/netipsec networking code.  While it is
        not yet clear to me or anybody else whether this is the right
        thing to do, at this stage this makes the code more readable,
        and makes it easier to track uncollected UMA-zone-backed
        objects on vnet removal.  In the long run, it's quite possible
        that some form of shared use of UMA zone pools among multiple
        vnets should be considered.

	Bump __FreeBSD_version due to changes in layout of structs
	vnet_ipfw, vnet_inet and vnet_net.

Approved by:	julian (mentor)
2009-04-06 22:29:41 +00:00
kan
6f5bda3790 If KTR_SUBSYS is compiled in, it does not necessarily mean that user
is interested in being spammed by mcast-related printfs.

Use proper check against ktr_mask instead KTR_COMPILE.
2009-04-05 23:25:06 +00:00
bms
d3a98ed39c Fix mbuf chain layout pessimization:
in the case where a single mbuf is allocated due to
 m_getcl() returning NULL, we already call MH_ALIGN,
 so do not increment m->m_data in this case.

Found during MLDv2 port.
2009-04-04 15:32:23 +00:00
bms
b8d7b83e60 Do not obliterate QQI with MAXRESP.
Found during MLDv2 port.
2009-04-04 15:26:32 +00:00
rrs
f72ef579b2 Many bug fixes (from the IETF hack-fest):
- PR-SCTP had major issues when skipping through a multi-part message.
  o Did not look at socket buffer.
  o Did not properly handle the reassmebly queue.
  o The MARKED segments could interfere and un-skip a chunk causing
    a problem with the proper FWD-TSN.
  o No FR of FWD-TSN's was being done.
- NR-Sack code was basically disabled. It needed fixes that
  never got into the real code.
- CMT code had issues when the two paths were NOT the same b/w. We
  found a few small bugs, but also the critcal one here was not
  dividing the rwnd amongst the paths.

Obtained from:	Michael Tuexen and myself at the IETF hack-fest ;-)
2009-04-04 11:43:32 +00:00
piso
c9b4c10995 Implement an ipfw action to reassemble ip packets: reass. 2009-04-01 20:23:47 +00:00
bms
635596f1db Don't call m_freem() after ip_output(), as it always consumes
the mbuf chain provided to it.

Found by:	Pierre Guinoiseau
2009-03-24 01:22:12 +00:00
jmallett
4cae140c41 Remove local in6_addr variables for local and foreign addresses in sysctl_drop,
they were passed uninitialized to in6_pcblookup_hash.  Instead, do as is done
for IPv4 and use the addresses within the sockaddr structure, which are
correctly populated.

This fixes tcpdrop(8) for IPv6 address pairs.

Reviewed by:	bz
2009-03-22 00:45:47 +00:00
bms
76ed51bd5a Fix brainos introduced during mechanical KTR change.
Pointy hat to:	bms
2009-03-20 13:13:50 +00:00
bms
70bfeb48b5 Cleanup: Nuke debug.mrtdebug, and replace it with KTR. 2009-03-19 14:14:21 +00:00
bms
76f193cd69 Introduce a number of changes to the MROUTING code.
This is purely a forwarding plane cleanup; no control plane
code is involved.

Summary:
 * Split IPv4 and IPv6 MROUTING support. The static compile-time
   kernel option remains the same, however, the modules may now
   be built for IPv4 and IPv6 separately as ip_mroute_mod and
   ip6_mroute_mod.
 * Clean up the IPv4 multicast forwarding code to use BSD queue
   and hash table constructs. Don't build our own timer abstractions
   when ratecheck() and timevalclear() etc will do.
 * Expose the multicast forwarding cache (MFC) and virtual interface
   table (VIF) as sysctls, to reduce netstat's dependence on libkvm
   for this information for running kernels.
   * bandwidth meters however still require libkvm.
 * Make the MFC hash table size a boot/load-time tunable ULONG,
   net.inet.ip.mfchashsize (defaults to 256).
 * Remove unused members from struct vif and struct mfc.
 * Kill RSVP support, as no current RSVP implementation uses it.
   These stubs could be moved to raw_ip.c.
 * Don't share locks or initialization between IPv4 and IPv6.
 * Don't use a static struct route_in6 in ip6_mroute.c.
   The v6 code is still using a cached struct route_in6, this is
   moved to mif6 for the time being.
 * More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c.

v4 path tested using ports/net/mcast-tools.
v6 changes are mostly mechanical locking and *have not* been tested.
As these changes partially break some kernel ABIs, they will not
be MFCed. There is a lot more work to be done here.

Reviewed by:	Pavlin Radoslavov
2009-03-19 01:43:03 +00:00
bms
be7293a078 Comment IGMP_PIM as being very historic, as in, don't use. 2009-03-19 01:15:26 +00:00
bms
f2006dd38e Deal with the case where ifma_protospec may be NULL, during
any IPv4 multicast operations which reference it.

There is a potential race because ifma_protospec is set to NULL
when we discover the underlying ifnet has gone away. This write
is not covered by the IF_ADDR_LOCK, and it's difficult to widen
its scope without making it a recursive lock. It isn't clear why
this manifests more quickly with 802.11 interfaces, but does not
seem to manifest at all with wired interfaces.

With this change, the 802.11 related panics reported by sam@
and cokane@ should go away. It is not the right fix, that requires
more thought before 8.0.

Idea from:	sam
Tested by:	cokane
2009-03-17 14:41:54 +00:00
rwatson
70b6a8119c Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free.  This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile.  They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

        if_ar
        if_axe
        if_aue
        if_cdce
        if_cue
        if_kue
        if_ray
        if_rue
        if_rum
        if_sr
        if_udav
        if_ural
        if_zyd

Drivers that were already disabled because of tty changes:

        if_ppp
        if_sl

Discussed on:	arch@
2009-03-15 14:21:05 +00:00
rwatson
038bfe209e Correct a number of evolved problems with inp_vflag and inp_flags:
certain flags that should have been in inp_flags ended up in inp_vflag,
meaning that they were inconsistently locked, and in one case,
interpreted.  Move the following flags from inp_vflag to gaps in the
inp_flags space (and clean up the inp_flags constants to make gaps
more obvious to future takers):

  INP_TIMEWAIT
  INP_SOCKREF
  INP_ONESBCAST
  INP_DROPPED

Some aspects of this change have no effect on kernel ABI at all, as these
are UDP/TCP/IP-internal uses; however, netstat and sockstat detect
INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this
into account.

MFC after:      1 week (or after dependencies are MFC'd)
Reviewed by:    bz
2009-03-15 09:58:31 +00:00
rrs
723e44888d Opps.. I missed a file on the commit :-) 2009-03-14 23:13:16 +00:00
das
58fce43140 Namespace: Defining htonl() and friends here instead of arpa/inet.h is
a BSD extension.
2009-03-14 20:16:54 +00:00
rrs
f25b21ac98 Fixes several PR-SCTP releated bugs.
- When sending large PR-SCTP messages over a
   lossy link we would incorrectly calculate the fwd-tsn
 - When receiving large multipart pr-sctp packets we would
   incorrectly send back a SACK that would renege improperly
   on already received packets thus causing unneeded retransmissions.
2009-03-14 13:42:13 +00:00
rwatson
fae6f1ab82 Add INP_INHASHLIST flag for inpcb->inp_flags to indicate whether
or not the inpcb is currenty on various hash lookup lists, rather
than using (lport != 0) to detect this.  This means that the full
4-tuple of a connection can be retained after close, which should
lead to more sensible netstat output in the window between TCP
close and socket close.

MFC after:	2 weeks
2009-03-11 00:29:22 +00:00
rwatson
f0bf25503d Remove unused v6 macro aliases for inpcb fields:
in6p_ip6_nxt
        in6p_vflag
        in6p_flags
        in6p_socket
        in6p_lport
        in6p_fport
        in6p_ppcb

Remove unused v6 macro aliases for inpcb flags:

        IN6P_HIGHPORT
        IN6P_LOWPORT
        IN6P_ANONPORT
        IN6P_RECVIF
        IN6P_MTUDISC
        IN6P_FAITH
        IN6P_CONTROLOPTS

References to in6p_lport and in6_fport in sockstat are also replaced with
normal inp_lport and inp_fport references.

MFC after:	3 days
Reviewed by:	bz
2009-03-10 17:57:41 +00:00
bms
b00c7265e0 Don't print inm_print() chatter when KTR_IGMPV3 is not enabled
in the KTR_COMPILE mask.

Found by:	gnn
2009-03-10 17:48:49 +00:00
rwatson
d085935615 Remove now-unused INP_UNMAPPABLEOPTS.
MFC after:	3 days
Discussed with:	bz
2009-03-10 11:04:19 +00:00
bms
4961c33f6b Fix uninitialized use of ifp for ii.
Found by:	Peter Holm
2009-03-09 22:54:17 +00:00
bms
71233409ea Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.

Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.

__FreeBSD_version is bumped to 800070.
2009-03-09 17:53:05 +00:00
marius
74f63d4ce1 On architectures with strict alignment requirements compensate
the misalignment of the IP header that prepending the EtherIP
header might have caused.

PR:		131921
MFC after:	1 week
2009-03-07 19:08:58 +00:00
rrs
41426e5a37 Fixes for window probes:
1) WP should never be marked unless flight size is 0
 2) When recovering from wp if the peer ack's it we don't mark for retran
 3) When recovering, we must assure a timer is still running.
2009-03-06 11:03:52 +00:00
rrs
f2254573b9 - PR-SCTP bug, where the CUM-ACK was not being updated
into the advance_peer_ack point so we would incorrectly
  send a wrong value in the FWD-TSN
- PR-SCTP bug, where an PR packet is used for a window
  probe which could incorrectly get the packet moved
  back into the send_queue, which will cause major issues and
  should not happen.
- Fix a trace to use the proper macro.
2009-03-04 20:54:42 +00:00
bms
6370f947cc In ip_output(), do not acquire the IN_MULTI_LOCK(),
and do not attempt to perform a group lookup.
This is a socket layer lock, and the bottom half of IP
really has no business taking it.

Use the value of the in_mcast_loop sysctl to determine
if we should loop back by default, in the absence of
any multicast socket options. Because the check on
group membership is now deferred to the input path,
an m_copym() is now required.

This should increase multicast send performance where the
source has not requested loopback, although this has not been
benchmarked or measured.

It is also a necessary change for IN_MULTI_LOCK to become
non-recursive, which is required in order to implement IGMPv3
in a thread-safe way.
2009-03-04 03:45:34 +00:00
bms
ea92a55057 Add sysctl net.inet.ip.mcast.loop. This controls whether or not
IPv4 multicast sends are looped back to senders by default
on a stack-wide basis, rather than relying on the socket option.
Note that the sysctl only applies to newly created multicast sockets.
2009-03-04 03:40:02 +00:00
bms
24af7b630b Merge header file definitions used by the new IGMPv3 implementation.
This is a partial merge. Compatibility defines are retained for
the existing IGMPv2 implementation.
2009-03-04 03:22:03 +00:00
bms
28b42e9e29 Add various defines/macros required by IGMPv3:
* MCAST_UNDEFINED state.
 * in_allhosts() macro (group is 224.0.0.1).
   This uses a const endian comparison.
 * IP_MAX_GROUP_SRC_FILTER, IP_MAX_SOCK_SRC_FILTER
   default resource limits.
2009-03-04 03:01:05 +00:00