Commit Graph

229 Commits

Author SHA1 Message Date
Dimitry Andric
3e288e6238 After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
Marko Zec
0593983963 Remove an apparently redundant CURVNET_SET() / CURVNET_RESTORE() pair.
MFC after:	3 days
2010-11-22 14:16:23 +00:00
Dimitry Andric
31c6a0037e Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
George V. Neville-Neil
e162ea60d4 Add a queue to hold packets while we await an ARP reply.
When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry.  This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.

This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out.  The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.

Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..

Reviewed by:	bz, rpaulo
MFC after: 3 weeks
2010-11-12 22:03:02 +00:00
John Baldwin
33b31db666 Don't leak the LLE lock if the arptimer callout is pending or inactive.
Reported by:	David Rhodus
MFC after:	1 month
2010-11-02 13:00:56 +00:00
Gleb Smirnoff
27bf126d23 Remove meaningless XXXXX, that is a remain of comment, removed in r186200. 2010-10-29 11:13:42 +00:00
Gleb Smirnoff
28e1f17c81 Revert a small part of the r198301, that is entirely unrelated to the
r198301 itself. It also broke the logic of not sending more than one
ARP request per second, that consequently lead to a potential problem
of flooding network with broadcast packets.

MFC after:	1 week
2010-10-29 10:57:18 +00:00
Will Andrews
9963e8a52c Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-08-11 20:18:19 +00:00
Will Andrews
54bfbd5153 Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
Bjoern A. Zeeb
19291ab3de Document the mandatory argument to the arptimer() and
nd6_llinfo_timer() functions with a KASSERT().
Note: there is no need to return after panic.

In the legacy IP case, only assign the arg after the check,
in the IPv6 case, remove the extra checks for the table and
interface as they have to be there unless we freed and forgot
to cancel the timer.  It doesn't matter anyway as we would
panic on the NULL pointer deref immediately and the bug is
elsewhere.
This unifies the code of both address families to some extend.

Reviewed by:	rwatson
MFC after:	6 days
2010-07-31 21:33:18 +00:00
Bjoern A. Zeeb
82cea7e6f3 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
Bjoern A. Zeeb
becba438d2 Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by:		qingli (earlier version)
MFC after:		10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
			Christian Kratzer (ck cksoft.de),
			Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-11 16:04:08 +00:00
Qing Li
ee8a75d320 An existing incomplete ARP entry would expire a subsequent
statically configured entry of the same host. This bug was
due to the expiration timer was not cancelled when installing
the static entry. Since there exist a potential race condition
with respect to timer cancellation, simply check for the
LLE_STATIC bit inside the expiration function instead of
cancelling the active timer.

MFC after:	5 days
2010-01-05 00:35:46 +00:00
Navdeep Parhar
567145993f Avoid NULL dereference in arpresolve. 2010-01-03 06:43:13 +00:00
Qing Li
6cb2b4e7a8 Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by:	ru
MFC after:	3 days
2009-10-23 18:27:34 +00:00
Qing Li
fc02323563 In the ARP callout timer expiration function, the current time_second
is compared against the entry expiration time value (that was set based
on time_second) to check if the current time is larger than the set
expiration time. Due to the +/- timer granularity value, the comparison
returns false, causing the alternative code to be executed. The
alternative code path freed the memory without removing that entry
from the table list, causing a use-after-free bug.

Reviewed by:	discussed with kmacy
MFC after:	immediately
Verified by:	rnoland, yongari
2009-10-20 17:55:42 +00:00
Qing Li
93704ac5d7 This patch fixes the following issues in the ARP operation:
1. There is a regression issue in the ARP code. The incomplete
   ARP entry was timing out too quickly (1 second timeout), as
   such, a new entry is created each time arpresolve() is called.
   Therefore the maximum attempts made is always 1. Consequently
   the error code returned to the application is always 0.
2. Set the expiration of each incomplete entry to a 20-second
   lifetime.
3. Return "incomplete" entries to the application.

Reviewed by:	kmacy
MFC after:	3 days
2009-10-15 06:12:04 +00:00
Qing Li
cd29a7797d This patch enables the node to respond to ARP requests for
configured proxy ARP entries.

Reviewed by:	bz
MFC after:	immediately
2009-09-15 18:39:27 +00:00
Navdeep Parhar
9a31144537 Add arp_update_event. This replaces route_arp_update_event, which
has not worked since the arp-v2 rewrite.

The event handler will be called with the llentry write-locked and
can examine la_flags to determine whether the entry is being added
or removed.

Reviewed by:	gnn, kmacy
Approved by:	gnn (mentor)
MFC after:	1 month
2009-09-08 21:17:17 +00:00
George V. Neville-Neil
54fc657d59 Add ARP statistics to the kernel and netstat.
New counters now exist for:
requests sent
replies sent
requests received
replies received
packets received
total packets dropped due to no ARP entry
entrys timed out
Duplicate IPs seen

The new statistics are seen in the netstat command
when it is given the -s command line switch.

MFC after:	2 weeks
In collaboration with: bz
2009-09-03 21:10:57 +00:00
Bjoern A. Zeeb
cc7e9d4325 In case an upper layer protocol tries to send a packet but the
L2 code does not have the ethernet address for the destination
within the broadcast domain in the table, we remember the
original mbuf in `la_hold' in arpresolve() and send out a
different packet with an arp request.
In case there will be more upper layer packets to send we will
free an earlier one held in `la_hold' and queue the new one.

Once we get a packet in, with which we can perfect our arp table
entry we send out the original 'on hold' packet, should there
be any.
Rather than continuing to process the packet that we received,
we returned without freeing the packet that came in, which
basically means that we leaked an mbuf for every arp request
we sent.

Rather than freeing the received packet and returning, continue
to process the incoming arp packet as well.
This should (a) improve some setups, also proxy-arp, in case it was an
incoming arp request and (b) resembles the behaviour FreeBSD had
from day 1, which alignes with RFC826 "Packet reception" (merge case).

Rename 'm0' to 'hold' to make the code more understandable as
well as diffable to earlier versions more easily.

Handle the link-layer entry 'la' lock comepletely in the block
where needed and release it as early as possible, rather than
holding it longer, down to the end of the function.

Found by:			pointyhat, ns1
Bug hunting session with:	erwin, simon, rwatson
Tested by:			simon on cluster machines
Reviewed by:			ratson, kmacy, julian
MFC after:			3 days
2009-09-01 17:53:01 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Qing Li
df813b7ea2 This patch does the following:
- Allow loopback route to be installed for address assigned to
      interface of IFF_POINTOPOINT type.
    - Install loopback route for an IPv4 interface addreess when the
      "useloopback" sysctl variable is enabled. Similarly, install
      loopback route for an IPv6 interface address when the sysctl variable
      "nd6_useloopback" is enabled. Deleting loopback routes for interface
      addresses is unconditional in case these sysctl variables were
      disabled after an interface address has been assigned.

Reviewed by:	bz
Approved by:	re
2009-07-27 17:08:06 +00:00
Robert Watson
1e77c1056a Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used.  Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with:	bz, julian
Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-16 21:13:04 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Robert Watson
2d9cfabad4 Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support.  As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done.  This means that one
class of reader-writer races still exists.

MFC after:      6 weeks
Reviewed by:    bz
2009-06-25 11:52:33 +00:00
Robert Watson
f8574c7a22 Add missing unlock of if_addr_mtx when an unmatched ARP packet is received.
Reported by:	lstewart
MFC after:	6 weeks
2009-06-24 14:49:26 +00:00
Robert Watson
09d547787f In ARP input, more consistently acquire and release ifaddr references.
MFC after:	6 weeks
2009-06-24 10:33:35 +00:00
Bjoern A. Zeeb
5736e6fb9d After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.
2009-06-23 17:03:45 +00:00
Bjoern A. Zeeb
8d8bc0182e After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Robert Watson
d4b5cae49b Reimplement the netisr framework in order to support parallel netisr
threads:

- Support up to one netisr thread per CPU, each processings its own
  workstream, or set of per-protocol queues.  Threads may be bound
  to specific CPUs, or allowed to migrate, based on a global policy.

  In the future it would be desirable to support topology-centric
  policies, such as "one netisr per package".

- Allow each protocol to advertise an ordering policy, which can
  currently be one of:

  NETISR_POLICY_SOURCE: packets must maintain ordering with respect to
    an implicit or explicit source (such as an interface or socket).

  NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work,
    as well as allowing protocols to provide a flow generation function
    for mbufs without flow identifers (m2flow).  Falls back on
    NETISR_POLICY_SOURCE if now flow ID is available.

  NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for
    each packet handled by netisr (m2cpuid).

- Provide utility functions for querying the number of workstreams
  being used, as well as a mapping function from workstream to CPU ID,
  which protocols may use in work placement decisions.

- Add explicit interfaces to get and set per-protocol queue limits, and
  get and clear drop counters, which query data or apply changes across
  all workstreams.

- Add a more extensible netisr registration interface, in which
  protocols declare 'struct netisr_handler' structures for each
  registered NETISR_ type.  These include name, handler function,
  optional mbuf to flow ID function, optional mbuf to CPU ID function,
  queue limit, and ordering policy.  Padding is present to allow these
  to be expanded in the future.  If no queue limit is declared, then
  a default is used.

- Queue limits are now per-workstream, and raised from the previous
  IFQ_MAXLEN default of 50 to 256.

- All protocols are updated to use the new registration interface, and
  with the exception of netnatm, default queue limits.  Most protocols
  register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use
  NETISR_POLICY_FLOW, and will therefore take advantage of driver-
  generated flow IDs if present.

- Formalize a non-packet based interface between interface polling and
  the netisr, rather than having polling pretend to be two protocols.
  Provide two explicit hooks in the netisr worker for start and end
  events for runs: netisr_poll() and netisr_pollmore(), as well as a
  function, netisr_sched_poll(), to allow the polling code to schedule
  netisr execution.  DEVICE_POLLING still embeds single-netisr
  assumptions in its implementation, so for now if it is compiled into
  the kernel, a single and un-bound netisr thread is enforced
  regardless of tunable configuration.

In the default configuration, the new netisr implementation maintains
the same basic assumptions as the previous implementation: a single,
un-bound worker thread processes all deferred work, and direct dispatch
is enabled by default wherever possible.

Performance measurement shows a marginal performance improvement over
the old implementation due to the use of batched dequeue.

An rmlock is used to synchronize use and registration/unregistration
using the framework; currently, synchronized use is disabled
(replicating current netisr policy) due to a measurable 3%-6% hit in
ping-pong micro-benchmarking.  It will be enabled once further rmlock
optimization has taken place.  However, in practice, netisrs are
rarely registered or unregistered at runtime.

A new man page for netisr will follow, but since one doesn't currently
exist, it hasn't been updated.

This change is not appropriate for MFC, although the polling shutdown
handler should be merged to 7-STABLE.

Bump __FreeBSD_version.

Reviewed by:	bz
2009-06-01 10:41:38 +00:00
Marko Zec
21ca7b57bd Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
Kip Macy
279aa3d419 Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by:	rwatson
2009-04-16 20:30:28 +00:00
Kip Macy
582b6122ab make LLTABLE visible to netinet 2009-04-15 20:49:59 +00:00
Marko Zec
bfe1aba468 Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized.  In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw).  Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module.  The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on.  Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first.  The framework will panic if it detects any unresolved
dependencies before completing system initialization.  Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run.  In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container.  IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-11 05:58:58 +00:00
Marko Zec
1ed81b739e First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

        At this stage, the new per-vnet initializer functions are
	directly called from the existing global initialization code,
	which should in most cases result in compiler inlining those
	new functions, hence yielding a near-zero functional change.

        Modify the existing initializer functions which are invoked via
        protosw, like ip_init() et. al., to allow them to be invoked
	multiple times, i.e. per each vnet.  Global state, if any,
	is initialized only if such functions are called within the
	context of vnet0, which will be determined via the
	IS_DEFAULT_VNET(curvnet) check (currently always true).

        While here, V_irtualize a few remaining global UMA zones
        used by net/netinet/netipsec networking code.  While it is
        not yet clear to me or anybody else whether this is the right
        thing to do, at this stage this makes the code more readable,
        and makes it easier to track uncollected UMA-zone-backed
        objects on vnet removal.  In the long run, it's quite possible
        that some form of shared use of UMA zone pools among multiple
        vnets should be considered.

	Bump __FreeBSD_version due to changes in layout of structs
	vnet_ipfw, vnet_inet and vnet_net.

Approved by:	julian (mentor)
2009-04-06 22:29:41 +00:00
Bruce M Simpson
d10910e6ce Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.

Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.

__FreeBSD_version is bumped to 800070.
2009-03-09 17:53:05 +00:00
Bjoern A. Zeeb
33553d6e99 For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
Kip Macy
5e96c0a13e Fix missed unlock and reference drop of lle
Found by: pho
2008-12-24 05:31:26 +00:00
Qing Li
ce9122fd3e Don't create a bogus ARP entry for 0.0.0.0. 2008-12-23 03:33:32 +00:00
Qing Li
897d75c98e The proxy-arp code was broken and responds to ARP
requests for addresses that are not proxied locally.
2008-12-19 11:07:34 +00:00
Kip Macy
00a46b3122 default to doing lla_lookup with shared afdata lock and returning a
shared lock on the lle - thus restoring parallel performance to
pre-arpv2 level
2008-12-17 00:14:28 +00:00
Kip Macy
86cd829d64 don't unlock lle if it is NULL 2008-12-16 02:48:12 +00:00
Qing Li
6e6b3f7cbc This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle, Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
  provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
  me maintaining that branch before the svn conversion
2008-12-15 06:10:57 +00:00
Christian S.J. Peron
4e57bc3338 in_rtalloc1(9) returns a locked route, so make sure that we use
RTFREE_LOCKED() here.  This macro makes sure the reference count
on the route is being managed properly.  This elimates another
case which results in the following message being printed to the
console:

rtfree: 0xc841ee88 has 1 refs

Reviewed by:	bz
MFC after:	2 weeks
2008-12-06 19:09:38 +00:00
Bjoern A. Zeeb
4b79449e2f Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
Marko Zec
97021c2464 Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by:  bz, julian
Approved by:  julian (mentor)
Obtained from:        //depot/projects/vimage-commit2/...
X-MFC after:  never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
2008-11-26 22:32:07 +00:00
Marko Zec
44e33a0758 Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions.  As a rule,
initialization at instatiation for such variables should never be
introduced again from now on.  Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact.  In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-19 09:39:34 +00:00
Marko Zec
3ff0b2135b Remove a useless global static variable.
Approved by:	bz (ad-hoc mentor)
2008-10-16 12:31:03 +00:00