Commit Graph

3481 Commits

Author SHA1 Message Date
Randall Stewart
096ed42dad repository sync to multi-OS repo ... spaceing change 2009-05-07 16:43:49 +00:00
Randall Stewart
892f1c7141 ABI expansions to hopefully future-proof our MIB/netstat code for 8.0 2009-05-07 16:42:45 +00:00
Marko Zec
94e9f5a1c2 Remove unnecessary CURVNET_SET() calls where curvnet context is
(i.e. seems to be) already set.

This should reduce console noise due to curvnet recursion reports.

This change has no impact on nooptions VIMAGE builds.
Approved by:	julian (mentor)
2009-05-06 13:30:46 +00:00
Marko Zec
743da3bcdb Unbreak options VIMAGE kernel builds.
Approved by:	julian (mentor)
2009-05-06 08:49:39 +00:00
Marko Zec
21ca7b57bd Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
Marko Zec
5f416f8e84 Make indentation more uniform accross vnet container structs.
This is a purely cosmetic / NOP change.

Reviewed by:	bz
Approved by:	julian (mentor)
Verified by:	svn diff -x -w producing no output
2009-05-02 08:16:26 +00:00
Marko Zec
d7fcc52895 Unbreak options VIMAGE + nooptions INVARIANTS kernel builds.
Submitted by:	julian
Approved by:	julian (mentor)
2009-05-02 05:02:28 +00:00
Marko Zec
f6dfe47a14 Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)
2009-04-30 13:36:26 +00:00
Bruce M Simpson
33cde13046 Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev.  Summary of changes:

 * Connect netinet6/in6_mcast.c to build.
   The legacy KAME KPIs are mostly preserved.
 * Eliminate now dead code from ip6_output.c.
   Don't do mbuf bingo, we are not going to do RFC 2292 style
   CMSG tricks for multicast options as they are not required
   by any current IPv6 normative reference.
 * Refactor transports (UDP, raw_ip6) to do own mcast filtering.
   SCTP, TCP unaffected by this change.
 * Add ip6_msource, in6_msource structs to in6_var.h.
 * Hookup mld_ifinfo state to in6_ifextra, allocate from
   domifattach path.
 * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
   Kernel consumers which need this should use in6m_lookup().
 * Refactor IPv6 socket group memberships to use a vector (like IPv4).
 * Update ifmcstat(8) for IPv6 SSM.
 * Add witness lock order for IN6_MULTI_LOCK.
 * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
 * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
 * Update carp(4) for new IPv6 SSM KPIs.
 * Virtualize ip6_mrouter socket.
   Changes mostly localized to IPv6 MROUTING.
 * Don't do a local group lookup in MROUTING.
 * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
 * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
 * Bump __FreeBSD_version to 800084.
 * Update UPDATING.

NOTE WELL:
 * This code hasn't been tested against real MLDv2 queriers
   (yet), although the on-wire protocol has been verified in Wireshark.
 * There are a few unresolved issues in the socket layer APIs to
   do with scope ID propagation.
 * There is a LOR present in ip6_output()'s use of
   in6_setscope() which needs to be resolved. See comments in mld6.c.
   This is believed to be benign and can't be avoided for the moment
   without re-introducing an indirect netisr.

This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
Bruce M Simpson
9efc1a1bbf Add MLDv2 prototypes and defines. 2009-04-29 10:20:17 +00:00
Bruce M Simpson
5cf93e5d2c Use KTR_INET for MROUTING CTRs. 2009-04-29 10:17:08 +00:00
Bruce M Simpson
c566b47669 Cut over to KTR_INET for CTR.
For clarity, put pointer incremement/size decrement on own line
when copying out in-mode source filters to userland.
2009-04-29 10:14:16 +00:00
Bruce M Simpson
1096332a4a Do not assume that ip6_moptions is always set, it is
a lazy-allocated structure.
2009-04-29 10:13:22 +00:00
Bruce M Simpson
31a3e65dc2 Fix a problem whereby enqueued IGMPv3 filter list changes would be
incorrectly output, if the RB-tree enumeration happened to reuse the
same chain for a mode switch: that is, both ALLOW and BLOCK records
were appended for the same group, in the same mbuf packet chain.

This was introduced during an mbuf chain layout bug fix involving
m_getptr(), which obviously cannot count from offset 0 on the
second pass through the RB-tree when serializing the IGMPv3
group records into the pending mbuf chain.

Cut over to KTR_INET for IGMPv3 CTR usage.
2009-04-29 10:12:01 +00:00
Edward Tomasz Napierala
1a4998162e Don't require packet to match a route (any route; this information wasn't
used anyway, so a typical workaround was to add a dummy route) if it's going
to be sent through IPSec tunnel.

Reviewed by:	bz
2009-04-28 11:10:33 +00:00
Oleg Bulyzhin
a3a981b7f9 Optimize packet flow: if net.inet.ip.fw.one_pass != 0 and packet was
processed by ipfw once - avoid second ipfw_chk() call.
This saves us from unnecessary IPFW_RLOCK(), m_tag_find() calls and
ip/tcp/udp header parsing.

MFC after:	2 month
2009-04-27 17:37:36 +00:00
Marko Zec
093f25f8c8 In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by:	bz (an older version of the patch)
Approved by:	julian (mentor)
2009-04-26 22:06:42 +00:00
Robert Watson
db091502fb Acquire IF_ADDR_LOCK() around most iterations over ifp->if_addrhead
(colloquially known as if_addrlist).  Currently not acquired around
interface address loops that call out to the routing code due to
potential lock order issues.

MFC after:	3 weeks
2009-04-26 19:05:40 +00:00
Robert Watson
588885f2f5 Expand coverage of IF_ADDR_LOCK() in in_control() from point of initial
lookup of 'ia' from if_addrhead through most use.  Note that we
currently have to drop it prematurely in some cases due to calls out to
the routing and interface code while using 'ia', but this closes many
races.  Annotate several potential races that persist after this change.
Move to using M_NOWAIT for allocating new interface addresses due to
lock(s) being held.

MFC after:	3 weeks
2009-04-25 23:02:57 +00:00
Robert Watson
07cde5e92c In in_purgemaddrs(), remove the inm being freed from the address list
before freeing it, rather than vice version, to avoid potential use
after free.

Reviewed by:	bms
2009-04-24 22:11:53 +00:00
Robert Watson
cf7b18f15e Relocate permissions checking code in in_control() to before the body
of the implementation of ioctls.  This makes the mapping of ioctls to
specific privileges more explicit, and also simplifies the
implementation by reducing the use of FALLTHROUGH handling in switch.

While this is not intended to be a functional change, it does mean
that certain privilege checks are now performed earlier, so EPERM
might be returned in preference to EADDRNOTAVAIL for management
ioctls that could have failed for both reasons.

MFC after:	3 weeks
2009-04-24 09:54:46 +00:00
Robert Watson
bbb3fb6194 Reorganize in_control() so that invariants are more obvious, and so
that it is easier to lock:

- Handle the unsupported ioctl case at the beginning of in_control(),
  handing off to ifp->if_ioctl, rather than looking up interfaces and
  addresses unnecessarily in this case.

- Make it an invariant that ifp is always non-NULL when running
  in_control()-implemented ioctls, simplifying the code structure.

MFC after:	3 weeks
2009-04-23 21:41:37 +00:00
Bruce M Simpson
86979280fc Bracket struct mfc and struct rtdetq with #ifdef _KERNEL.
Match the bracketing in netstat.
Since the cleanup of MROUTING, ports have broken because they
expect to include <netinet/ip_mroute.h> without including
<sys/queue.h>. Fix breakage at source.

The real fix, of course, is to fix the MROUTING APIs by blowing them
away and replacing them with something else...
2009-04-21 12:47:09 +00:00
Bruce M Simpson
5def3edcad remove IFF_ASSERTGIANT 2009-04-21 09:43:51 +00:00
Robert Watson
4ed6f8c1f1 Prefer actual field names (if_addrhead, ifa_link) to macros aliasing
those field names in FreeBSD code.

MFC after:	2 weeks
2009-04-20 22:40:44 +00:00
Robert Watson
0aade26e6d In ip_input(), cache the received mbuf's network interface in a local
variable.  Acquire the interface address list lock when iterating over
the interface address list searching for a matching received broadcast
address.

MFC after:	2 weeks
2009-04-20 14:35:42 +00:00
Robert Watson
33c4f96d88 In icmp_reflect(), acquire the inteface address list lock when
searching for a source address to use.

MFC after:	2 weeks
Reviewed by:	bz
2009-04-20 13:45:39 +00:00
Robert Watson
072b8f8ea7 Lock the interface address list when searching for a matching interface
by address, or when implementing 'me' rules on IPv6.  Prefer the field
name if_addrhead to the macro if_addrlist.

MFC after:	2 weeks
2009-04-19 22:34:35 +00:00
Robert Watson
b132600ab2 In divert_packet(), lock the interface address list before iterating over
it in search of an address.

MFC after:	2 weeks
2009-04-19 22:29:16 +00:00
Robert Watson
9317b04e46 Lock interface address lists in in_pcbladdr() when searching for a
source address for a connection and there's no route or now interface
for the route.

MFC after:	2 weeks
2009-04-19 22:25:09 +00:00
Robert Watson
8021456a24 Protect against some writer-writer races in in_control() by acquiring
the interface address list lock around interface address list
modifications.  More to do here.

MFC after:	2 weeks
2009-04-19 22:16:19 +00:00
Bruce M Simpson
b5fbc0b98f Now that IFF_NEEDSGIANT has been removed from the network
stack, catch up with this in IGMPv3 and remove dead code.
This has the side-effect of not being back-portable to RELENG_7
w/o further changes.
2009-04-19 08:14:21 +00:00
Kip Macy
65111ec7aa - Allocate a small flowtable in ip_input.c (changeable by tuneable)
- Use for accelerating ip_output
2009-04-19 04:44:05 +00:00
Kip Macy
ab25fa3558 s/void/void */ 2009-04-16 23:02:56 +00:00
Kip Macy
114f15c686 restore spare pointers for MFCing 2009-04-16 22:47:43 +00:00
Kip Macy
279aa3d419 Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by:	rwatson
2009-04-16 20:30:28 +00:00
Kip Macy
8b12a7c2a6 - convert pspare pointers in inpcb to an llentry and rtentry cache
- add flags to indicate their validity
2009-04-15 22:22:00 +00:00
Kip Macy
773b573a96 - add second flags field to to inpcb
- update comments in vflag
2009-04-15 22:09:42 +00:00
Kip Macy
82c33e73f2 provide additional convenience macros for inpcb locking (upgrade, downgrade, exclusive) 2009-04-15 21:39:56 +00:00
Kip Macy
582b6122ab make LLTABLE visible to netinet 2009-04-15 20:49:59 +00:00
Kip Macy
de4ab55e43 add an llentry to struct route{_in6} to allow it to be passed around with
the rtentry
2009-04-15 20:34:19 +00:00
Randall Stewart
e261340ef7 Add missing address lock when we look at the ifa list 2009-04-14 19:20:27 +00:00
Randall Stewart
544e35bd97 Move the flight size reduction to right after
we recognize its a retransmit, ahead of the PR-SCTP
work. Without this fix, we end up NOT reducing flight
size and causing an miscalculation when PR-SCTP is active
and data is skipped.

Obtained from:	Michael Tuexen.
2009-04-14 07:50:29 +00:00
Robert Watson
de231a063a Put TCPSTAT_ADD() and TCPSTAT_INC() behind _KERNEL.
MFC after:	3 days
2009-04-12 21:28:35 +00:00
Robert Watson
6bf65bcf3a Update stats in struct carpstats using two new macros: CARPSTATS_ADD()
and CARPSTATS_INC(), rather than directly manipulating the fields of
the structure.  This will make it easier to change the implementation
of these statistics, such as using per-CPU versions of the data
structure.

MFC after:	3 days
2009-04-12 14:19:37 +00:00
Robert Watson
07cf7ab29c Update stats in struct pimstat using two new macros: PIMSTAT_ADD()
and PIMSTAT_INC(), rather than directly manipulating the fields of
the structure.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structure.

MFC after:	3 days
2009-04-12 14:06:26 +00:00
Robert Watson
fb83a36856 Update stats in struct mrtstat using two new macros: MRTSTAT_ADD()
and MRTSTAT_INC(), rather than directly manipulating the fields of
the structure.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structure.

MFC after:	3 days
2009-04-12 14:00:36 +00:00
Robert Watson
bd88cce2ed Update stats in struct igmpstat using two new macros:
IGMPSTAT_ADD() and IGMPSTAT_INC(), rather than directly
manipulating the fields of the structure.  This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

MFC after:	3 days
2009-04-12 13:41:13 +00:00
Robert Watson
e27b0c8775 Update stats in struct icmpstat and icmp6stat using four new
macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and
ICMP6STAT_INC(), rather than directly manipulating the fields
of these structures across the kernel.  This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

In on case, icmp6stat members are manipulated indirectly, by
icmp6_errcount(), and this will require further work to fix
for per-CPU stats.

MFC after:	3 days
2009-04-12 13:22:33 +00:00
Robert Watson
026decb8f3 Update stats in struct udpstat using two new macros, UDPSTAT_ADD()
and UDPSTAT_INC(), rather than directly manipulating the fields
across the kernel.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structures.

MFC after:	3 days
2009-04-12 11:42:40 +00:00
Robert Watson
86425c62a0 Update stats in struct ipstat using four new macros, IPSTAT_ADD(),
IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly
manipulating the fields across the kernel.  This will make it easier
to change the implementation of these statistics, such as using
per-CPU versions of the data structures.

MFC after:	3 days
2009-04-11 23:35:20 +00:00
Robert Watson
78b5071407 Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and
TCPSTAT_INC(), rather than directly manipulating the fields across the
kernel.  This will make it easier to change the implementation of
these statistics, such as using per-CPU versions of the data structures.

MFC after:	3 days
2009-04-11 22:07:19 +00:00
Paolo Pisati
50d25dda1b What's the point of adjusting a checksum if we are going to toss the
packet? Anticipate the check/return code.
2009-04-11 15:26:31 +00:00
Paolo Pisati
ea80b0ac03 Plug two bugs introduced with modules conversion:
-UdpAliasIn(): correctly check return code after modules ran.
-alias_nbt: in case of malformed packets (or some other unrecoverable
 error), toss the packet.
2009-04-11 15:19:09 +00:00
Paolo Pisati
1cd68a24c7 Remove stale comments. 2009-04-11 15:05:19 +00:00
Marko Zec
bfe1aba468 Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized.  In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw).  Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module.  The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on.  Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first.  The framework will panic if it detects any unresolved
dependencies before completing system initialization.  Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run.  In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container.  IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-11 05:58:58 +00:00
Kip Macy
80cb9f211a Import "flowid" support for serializing flows across transmit queues
Reviewed by:	rwatson and jeli
2009-04-10 06:16:14 +00:00
Luigi Rizzo
4bb7ae9deb Add emulation of delay profiles, which lets you model various
types of MAC overheads such as preambles, link level retransmissions
and more.

Note- this commit changes the userland/kernel ABI for pipes
(but not for ordinary firewall rules) so you need to rebuild
kernel and /sbin/ipfw to use dummynet features.

Please check the manpage for details on the new feature.

The MFC would be trivial but it breaks the ABI, so it will
be postponed until after 7.2 is released.

Interested users are welcome to apply the patch manually
to their RELENG_7 tree.

Work supported by the European Commission, Projects Onelab and
Onelab2 (contract 224263).
2009-04-09 12:46:00 +00:00
Randall Stewart
abe15ad66c Fix a FR bug. When doing PR-SCTP with number rtx
set to a low number. The check for skipping was in the
incorrect place. Which meant we would FR chunks we
should not.
MFC after:	1 Month
2009-04-08 12:52:05 +00:00
Randall Stewart
e29d4aa6bd Add more padding and a new variable. This will
help us be able to keep ABI compatibility between
8 and 9.
MFC after:	Never
2009-04-08 12:49:36 +00:00
Paolo Pisati
43197d291a -don't pass down, to module's fingerprint function, unused data like
a pointer to the ip header.
-style
-spacing
2009-04-08 11:56:49 +00:00
Bjoern A. Zeeb
970caf60dd With the right comparison we get a proper wscale value and thus
more adequate TCP performance with IPv6.

Changes for IPv4, r166403 and r172795, both ignored the
IPv6 counterpart and left it in the state of art of year 2000.

The same logic in syncache already shares code between v4 and v6 so
things do not need to be adapted there.

Reported by:	Steinar Haug (sthaug nethelp.no)
Tested by:	Steinar Haug (sthaug nethelp.no)
MFC after:	3 days
2009-04-07 14:42:40 +00:00
Marko Zec
1ed81b739e First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

        At this stage, the new per-vnet initializer functions are
	directly called from the existing global initialization code,
	which should in most cases result in compiler inlining those
	new functions, hence yielding a near-zero functional change.

        Modify the existing initializer functions which are invoked via
        protosw, like ip_init() et. al., to allow them to be invoked
	multiple times, i.e. per each vnet.  Global state, if any,
	is initialized only if such functions are called within the
	context of vnet0, which will be determined via the
	IS_DEFAULT_VNET(curvnet) check (currently always true).

        While here, V_irtualize a few remaining global UMA zones
        used by net/netinet/netipsec networking code.  While it is
        not yet clear to me or anybody else whether this is the right
        thing to do, at this stage this makes the code more readable,
        and makes it easier to track uncollected UMA-zone-backed
        objects on vnet removal.  In the long run, it's quite possible
        that some form of shared use of UMA zone pools among multiple
        vnets should be considered.

	Bump __FreeBSD_version due to changes in layout of structs
	vnet_ipfw, vnet_inet and vnet_net.

Approved by:	julian (mentor)
2009-04-06 22:29:41 +00:00
Alexander Kabaev
024a4bd626 If KTR_SUBSYS is compiled in, it does not necessarily mean that user
is interested in being spammed by mcast-related printfs.

Use proper check against ktr_mask instead KTR_COMPILE.
2009-04-05 23:25:06 +00:00
Bruce M Simpson
448895b7fc Fix mbuf chain layout pessimization:
in the case where a single mbuf is allocated due to
 m_getcl() returning NULL, we already call MH_ALIGN,
 so do not increment m->m_data in this case.

Found during MLDv2 port.
2009-04-04 15:32:23 +00:00
Bruce M Simpson
0fd99912de Do not obliterate QQI with MAXRESP.
Found during MLDv2 port.
2009-04-04 15:26:32 +00:00
Randall Stewart
8933fa13b6 Many bug fixes (from the IETF hack-fest):
- PR-SCTP had major issues when skipping through a multi-part message.
  o Did not look at socket buffer.
  o Did not properly handle the reassmebly queue.
  o The MARKED segments could interfere and un-skip a chunk causing
    a problem with the proper FWD-TSN.
  o No FR of FWD-TSN's was being done.
- NR-Sack code was basically disabled. It needed fixes that
  never got into the real code.
- CMT code had issues when the two paths were NOT the same b/w. We
  found a few small bugs, but also the critcal one here was not
  dividing the rwnd amongst the paths.

Obtained from:	Michael Tuexen and myself at the IETF hack-fest ;-)
2009-04-04 11:43:32 +00:00
Paolo Pisati
eb2e411915 Implement an ipfw action to reassemble ip packets: reass. 2009-04-01 20:23:47 +00:00
Bruce M Simpson
5b35d05538 Don't call m_freem() after ip_output(), as it always consumes
the mbuf chain provided to it.

Found by:	Pierre Guinoiseau
2009-03-24 01:22:12 +00:00
Juli Mallett
34f27ade44 Remove local in6_addr variables for local and foreign addresses in sysctl_drop,
they were passed uninitialized to in6_pcblookup_hash.  Instead, do as is done
for IPv4 and use the addresses within the sockaddr structure, which are
correctly populated.

This fixes tcpdrop(8) for IPv6 address pairs.

Reviewed by:	bz
2009-03-22 00:45:47 +00:00
Bruce M Simpson
545dff6fd1 Fix brainos introduced during mechanical KTR change.
Pointy hat to:	bms
2009-03-20 13:13:50 +00:00
Bruce M Simpson
98b59af731 Cleanup: Nuke debug.mrtdebug, and replace it with KTR. 2009-03-19 14:14:21 +00:00
Bruce M Simpson
443fc3176d Introduce a number of changes to the MROUTING code.
This is purely a forwarding plane cleanup; no control plane
code is involved.

Summary:
 * Split IPv4 and IPv6 MROUTING support. The static compile-time
   kernel option remains the same, however, the modules may now
   be built for IPv4 and IPv6 separately as ip_mroute_mod and
   ip6_mroute_mod.
 * Clean up the IPv4 multicast forwarding code to use BSD queue
   and hash table constructs. Don't build our own timer abstractions
   when ratecheck() and timevalclear() etc will do.
 * Expose the multicast forwarding cache (MFC) and virtual interface
   table (VIF) as sysctls, to reduce netstat's dependence on libkvm
   for this information for running kernels.
   * bandwidth meters however still require libkvm.
 * Make the MFC hash table size a boot/load-time tunable ULONG,
   net.inet.ip.mfchashsize (defaults to 256).
 * Remove unused members from struct vif and struct mfc.
 * Kill RSVP support, as no current RSVP implementation uses it.
   These stubs could be moved to raw_ip.c.
 * Don't share locks or initialization between IPv4 and IPv6.
 * Don't use a static struct route_in6 in ip6_mroute.c.
   The v6 code is still using a cached struct route_in6, this is
   moved to mif6 for the time being.
 * More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c.

v4 path tested using ports/net/mcast-tools.
v6 changes are mostly mechanical locking and *have not* been tested.
As these changes partially break some kernel ABIs, they will not
be MFCed. There is a lot more work to be done here.

Reviewed by:	Pavlin Radoslavov
2009-03-19 01:43:03 +00:00
Bruce M Simpson
1975dc405a Comment IGMP_PIM as being very historic, as in, don't use. 2009-03-19 01:15:26 +00:00
Bruce M Simpson
56663a40eb Deal with the case where ifma_protospec may be NULL, during
any IPv4 multicast operations which reference it.

There is a potential race because ifma_protospec is set to NULL
when we discover the underlying ifnet has gone away. This write
is not covered by the IF_ADDR_LOCK, and it's difficult to widen
its scope without making it a recursive lock. It isn't clear why
this manifests more quickly with 802.11 interfaces, but does not
seem to manifest at all with wired interfaces.

With this change, the 802.11 related panics reported by sam@
and cokane@ should go away. It is not the right fix, that requires
more thought before 8.0.

Idea from:	sam
Tested by:	cokane
2009-03-17 14:41:54 +00:00
Robert Watson
e5adda3d51 Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free.  This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile.  They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

        if_ar
        if_axe
        if_aue
        if_cdce
        if_cue
        if_kue
        if_ray
        if_rue
        if_rum
        if_sr
        if_udav
        if_ural
        if_zyd

Drivers that were already disabled because of tty changes:

        if_ppp
        if_sl

Discussed on:	arch@
2009-03-15 14:21:05 +00:00
Robert Watson
ad71fe3c35 Correct a number of evolved problems with inp_vflag and inp_flags:
certain flags that should have been in inp_flags ended up in inp_vflag,
meaning that they were inconsistently locked, and in one case,
interpreted.  Move the following flags from inp_vflag to gaps in the
inp_flags space (and clean up the inp_flags constants to make gaps
more obvious to future takers):

  INP_TIMEWAIT
  INP_SOCKREF
  INP_ONESBCAST
  INP_DROPPED

Some aspects of this change have no effect on kernel ABI at all, as these
are UDP/TCP/IP-internal uses; however, netstat and sockstat detect
INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this
into account.

MFC after:      1 week (or after dependencies are MFC'd)
Reviewed by:    bz
2009-03-15 09:58:31 +00:00
Randall Stewart
49633f4b36 Opps.. I missed a file on the commit :-) 2009-03-14 23:13:16 +00:00
David Schultz
b3c11b5b91 Namespace: Defining htonl() and friends here instead of arpa/inet.h is
a BSD extension.
2009-03-14 20:16:54 +00:00
Randall Stewart
0c0982b80c Fixes several PR-SCTP releated bugs.
- When sending large PR-SCTP messages over a
   lossy link we would incorrectly calculate the fwd-tsn
 - When receiving large multipart pr-sctp packets we would
   incorrectly send back a SACK that would renege improperly
   on already received packets thus causing unneeded retransmissions.
2009-03-14 13:42:13 +00:00
Robert Watson
111d57a69c Add INP_INHASHLIST flag for inpcb->inp_flags to indicate whether
or not the inpcb is currenty on various hash lookup lists, rather
than using (lport != 0) to detect this.  This means that the full
4-tuple of a connection can be retained after close, which should
lead to more sensible netstat output in the window between TCP
close and socket close.

MFC after:	2 weeks
2009-03-11 00:29:22 +00:00
Robert Watson
4cf172fd65 Remove unused v6 macro aliases for inpcb fields:
in6p_ip6_nxt
        in6p_vflag
        in6p_flags
        in6p_socket
        in6p_lport
        in6p_fport
        in6p_ppcb

Remove unused v6 macro aliases for inpcb flags:

        IN6P_HIGHPORT
        IN6P_LOWPORT
        IN6P_ANONPORT
        IN6P_RECVIF
        IN6P_MTUDISC
        IN6P_FAITH
        IN6P_CONTROLOPTS

References to in6p_lport and in6_fport in sockstat are also replaced with
normal inp_lport and inp_fport references.

MFC after:	3 days
Reviewed by:	bz
2009-03-10 17:57:41 +00:00
Bruce M Simpson
30e239fe64 Don't print inm_print() chatter when KTR_IGMPV3 is not enabled
in the KTR_COMPILE mask.

Found by:	gnn
2009-03-10 17:48:49 +00:00
Robert Watson
b9bbb597b1 Remove now-unused INP_UNMAPPABLEOPTS.
MFC after:	3 days
Discussed with:	bz
2009-03-10 11:04:19 +00:00
Bruce M Simpson
c75aa3548f Fix uninitialized use of ifp for ii.
Found by:	Peter Holm
2009-03-09 22:54:17 +00:00
Bruce M Simpson
d10910e6ce Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.

Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.

__FreeBSD_version is bumped to 800070.
2009-03-09 17:53:05 +00:00
Marius Strobl
c89c8a1029 On architectures with strict alignment requirements compensate
the misalignment of the IP header that prepending the EtherIP
header might have caused.

PR:		131921
MFC after:	1 week
2009-03-07 19:08:58 +00:00
Randall Stewart
5171328bd6 Fixes for window probes:
1) WP should never be marked unless flight size is 0
 2) When recovering from wp if the peer ack's it we don't mark for retran
 3) When recovering, we must assure a timer is still running.
2009-03-06 11:03:52 +00:00
Randall Stewart
dfb11ef895 - PR-SCTP bug, where the CUM-ACK was not being updated
into the advance_peer_ack point so we would incorrectly
  send a wrong value in the FWD-TSN
- PR-SCTP bug, where an PR packet is used for a window
  probe which could incorrectly get the packet moved
  back into the send_queue, which will cause major issues and
  should not happen.
- Fix a trace to use the proper macro.
2009-03-04 20:54:42 +00:00
Bruce M Simpson
8b889dbb9e In ip_output(), do not acquire the IN_MULTI_LOCK(),
and do not attempt to perform a group lookup.
This is a socket layer lock, and the bottom half of IP
really has no business taking it.

Use the value of the in_mcast_loop sysctl to determine
if we should loop back by default, in the absence of
any multicast socket options. Because the check on
group membership is now deferred to the input path,
an m_copym() is now required.

This should increase multicast send performance where the
source has not requested loopback, although this has not been
benchmarked or measured.

It is also a necessary change for IN_MULTI_LOCK to become
non-recursive, which is required in order to implement IGMPv3
in a thread-safe way.
2009-03-04 03:45:34 +00:00
Bruce M Simpson
dd7fd7c07c Add sysctl net.inet.ip.mcast.loop. This controls whether or not
IPv4 multicast sends are looped back to senders by default
on a stack-wide basis, rather than relying on the socket option.
Note that the sysctl only applies to newly created multicast sockets.
2009-03-04 03:40:02 +00:00
Bruce M Simpson
346e3178ea Merge header file definitions used by the new IGMPv3 implementation.
This is a partial merge. Compatibility defines are retained for
the existing IGMPv2 implementation.
2009-03-04 03:22:03 +00:00
Bruce M Simpson
b554b6ca91 Add various defines/macros required by IGMPv3:
* MCAST_UNDEFINED state.
 * in_allhosts() macro (group is 224.0.0.1).
   This uses a const endian comparison.
 * IP_MAX_GROUP_SRC_FILTER, IP_MAX_SOCK_SRC_FILTER
   default resource limits.
2009-03-04 03:01:05 +00:00
Bruce M Simpson
f0dcb78326 Add function ip_checkrouteralert(), which will be used
by IGMPv3 to check for the IPv4 Router Alert [RFC2113]
option in a pulled-up IP mbuf chain.
2009-03-04 02:51:22 +00:00
Bjoern A. Zeeb
1263305f0c Start removing IPv6 Type 0 Routing header code.
RH0 was deprecated by RFC 5095.

While most of the code had been disabled by #if 0 already, leave a
bit of infrastructure for possible RH2 code and a log message under
BURN_BRIDGES in case a user still tries to send RH0 packets.

Reviewed by:	gnn (a bit back, earlier version)
2009-03-03 13:12:12 +00:00
Luigi Rizzo
ac6bb60e0a curr_time is a 64 bit variable so SYSCTL_LONG is not appropriate
as a handler.
The variable was exported only for debugging, but there is little reason
to do it now that the timekeeping is supported by various other variables.
For the time being just comment out the sysctl, but I think this
should go away.
2009-03-02 22:16:50 +00:00
Luigi Rizzo
0906f40fd8 fw_debug has been unused for ages, so remove it from the list
of sysctl_variables.
I would also remove it from the VNET record but I am unsure if
there is any ABI issue -- so for the time being just mark it as
unused in ip_fw.h, and then we will collect the garbage at some
appropriate time in the future.

MFC after:	3 days
2009-03-02 22:11:48 +00:00
Bjoern A. Zeeb
2bebb49117 Add size-guards evaluated at compile-time to the main struct vnet_*
which are not in a module of their own like gif.

Single kernel compiles and universe will fail if the size of the struct
changes. Th expected values are given in sys/vimage.h.
See the comments where how to handle this.

Requested by:	peter
2009-03-01 11:01:00 +00:00
Robert Watson
8e5057ed20 Remove unreachable code for generating RST segments from tcp_twcheck();
this code became stale when T/TCP support was removed.

Discussed with:	bz, sam
MFC after:	1 month
2009-02-28 22:58:52 +00:00
Randall Stewart
8aae94933f Fix the add stream feature of strm-reset to really work:
- Fix the copy, we can't do a blind copy but must transfer
   the data from the old to the new.
 - Fix the ACK processing so we properly stop retransmitting
   the thing.
 - Fix it so if we get a retran we will properly reply with
   the saved response without doing anything.

MFC after:	1 month
2009-02-27 20:54:45 +00:00