Commit Graph

156 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
abbe8356ea In nd6_options() ignore the RFC 6106 options completely rather than printing
them if nd6_debug is enabled as unknown.  Leave a comment about the RFC4191
option as I am undecided so far.

Discussed with:	hrs
MFC after:	3 days
2012-03-04 18:51:45 +00:00
Hiroki Sato
a1875676ca Remove a redundant check. 2012-03-02 07:22:04 +00:00
Bjoern A. Zeeb
81d5d46b3c Add multi-FIB IPv6 support to the core network stack supplementing
the original IPv4 implementation from r178888:

- Use RT_DEFAULT_FIB in the IPv4 implementation where noticed.
- Use rt*fib() KPI with explicit RT_DEFAULT_FIB where applicable in
  the NFS code.
- Use the new in6_rt* KPI in TCP, gif(4), and the IPv6 network stack
  where applicable.
- Split in6_rtqtimo() and in6_mtutimo() as done in IPv4 and equally
  prevent multiple initializations of callouts in in6_inithead().
- Use wrapper functions where needed to preserve the current KPI to
  ease MFCs.  Use BURN_BRIDGES to indicate expected future cleanup.
- Fix (related) comments (both technical or style).
- Convert to rtinit() where applicable and only use custom loops where
  currently not possible otherwise.
- Multicast group, most neighbor discovery address actions and faith(4)
  are locked to the default FIB.  Individual IPv6 addresses will only
  appear in the default FIB, however redirect information and prefixes
  of connected subnets are automatically propagated to all FIBs by
  default (mimicking IPv4 behavior as closely as possible).

Sponsored by:	Cisco Systems, Inc.
2012-02-03 13:08:44 +00:00
Sergey Kandaurov
8e4609a4a3 Remove unused variable.
The actual ia6->ia6_lifetime access is hidden in
IFA6_IS_INVALID/IFA6_IS_DEPRECATED macros since a long time ago
(see netinet6/nd6.c, r1.104 of KAME for the reference).

MFC after:	3 days
2012-01-25 08:53:42 +00:00
John Baldwin
137f91e80f Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by:	bz
MFC after:	2 weeks
2012-01-05 19:00:36 +00:00
John Baldwin
3b0b2840be Use queue(3) macros instead of home-rolled versions in several places in
the INET6 code.  This includes retiring the 'ndpr_next' and 'pfr_next'
macros.

Submitted by:	pluknet (earlier version)
Reviewed by:	pluknet
2011-12-29 18:25:18 +00:00
Gleb Smirnoff
08b68b0e4c A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR:		kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by:	bz
Submitted by:	bz [1]
2011-12-16 12:16:56 +00:00
Qing Li
0f1aca6519 A default route learned from the RAs could be deleted manually
after its installation. This removal may be accidental and can
prevent the default route from being installed in the future if
the associated default router has the best preference. The cause
is the lack of status update in the default router on the state
of its route installation in the kernel FIB. This patch fixes
the described problem.

Reviewed by:	hrs, discussed with hrs
MFC after:	5 days
2011-11-11 23:22:38 +00:00
Hiroki Sato
154d5f7321 Fix a problem that an interface unexpectedly becomes IFF_UP by
just doing "ifconfing inet6 -ifdisabled" when the interface has
ND6_IFF_AUTO_LINKLOCAL flag and no link-local address.
2011-10-16 19:46:52 +00:00
Bjoern A. Zeeb
d7ae37140a Fix an obvious bug from r186196 shadowing a variable, not correctly
appending the new mbuf to the chain reference but possibly causing an mbuf
nextpkt loop leading to a memory used after handoff (or having been freed)
and leaking an mbuf here.

Reviewed by:	rwatson, brooks
MFC after:	3 days
2011-09-30 18:20:16 +00:00
Hiroki Sato
23be782526 Do not activate automatic LL addr configuration when 0/1->1 transition of
ND6_IFF_IFDISABLED flag.
2011-06-06 04:12:57 +00:00
Hiroki Sato
77bc49858c - Make the code more proactively clear an ND6_IFF_IFDISABLED flag when
an explicit action for INET6 configuration happens.  The changes are:

  1. When an ND6 flag is changed via SIOCSIFINFO_FLAGS ioctl,
     setting ND6_IFF_ACCEPT_RTADV and/or ND6_IFF_AUTO_LINKLOCAL now triggers
     an attempt to clear the ND6_IFF_IFDISABLED flag.

  2. When an AF_INET6 address is added successfully to an interface and
     it is marked as ND6_IFF_IFDISABLED, an attempt to clear the
     ND6_IFF_IFDISABLED happens.

  This simplifies ND6_IFF_IFDISABLED flag manipulation by users via ifconfig(8);
  in most cases manual configuration is no longer needed.

- When ND6_IFF_AUTO_LINKLOCAL is set and no link-local address is assigned to
  an interface, SIOCSIFINFO_FLAGS ioctl now calls in6_ifattach() to configure
  a link-local address.

  This change ensures link-local address configuration when "ifconfig IF inet6"
  command is invoked.  For example, "ifconfig IF inet6 auto_linklocal" now
  always try to configure an LL addr even if ND6_IFF_AUTO_LINKLOCAL is already
  set to 1 (i.e. down/up cycle is no longer needed).

Reviewed by:	bz
2011-06-06 02:37:38 +00:00
Hiroki Sato
e7fa8d0ada - Accept Router Advertisement messages even when net.inet6.ip6.forwarding=1.
- A new per-interface knob IFF_ND6_NO_RADR and sysctl IPV6CTL_NO_RADR.
  This controls if accepting a route in an RA message as the default route.
  The default value for each interface can be set by net.inet6.ip6.no_radr.
  The system wide default value is 0.

- A new sysctl: net.inet6.ip6.norbit_raif.  This controls if setting R-bit in
  NA on RA accepting interfaces.  The default is 0 (R-bit is set based on
  net.inet6.ip6.forwarding).

Background:

 IPv6 host/router model suggests a router sends an RA and a host accepts it for
 router discovery.  Because of that, KAME implementation does not allow
 accepting RAs when net.inet6.ip6.forwarding=1.  Accepting RAs on a router can
 make the routing table confused since it can change the default router
 unintentionally.

 However, in practice there are cases where we cannot distinguish a host from
 a router clearly.  For example, a customer edge router often works as a host
 against the ISP, and as a router against the LAN at the same time.  Another
 example is a complex network configurations like an L2TP tunnel for IPv6
 connection to Internet over an Ethernet link with another native IPv6 subnet.
 In this case, the physical interface for the native IPv6 subnet works as a
 host, and the pseudo-interface for L2TP works as the default IP forwarding
 route.

Problem:

 Disabling processing RA messages when net.inet6.ip6.forwarding=1 and
 accepting them when net.inet6.ip6.forward=0 cause the following practical
 issues:

 - A router cannot perform SLAAC.  It becomes a problem if a box has
   multiple interfaces and you want to use SLAAC on some of them, for
   example.  A customer edge router for IPv6 Internet access service
   using an IPv6-over-IPv6 tunnel sometimes needs SLAAC on the
   physical interface for administration purpose; updating firmware
   and so on (link-local addresses can be used there, but GUAs by
   SLAAC are often used for scalability).

 - When a host has multiple IPv6 interfaces and it receives multiple RAs on
   them, controlling the default route is difficult.  Router preferences
   defined in RFC 4191 works only when the routers on the links are
   under your control.

Details of Implementation Changes:

 Router Advertisement messages will be accepted even when
 net.inet6.ip6.forwarding=1.  More precisely, the conditions are as
 follow:

 (ACCEPT_RTADV && !NO_RADR && !ip6.forwarding)
	=> Normal RA processing on that interface. (as IPv6 host)

 (ACCEPT_RTADV && (NO_RADR || ip6.forwarding))
	=> Accept RA but add the router to the defroute list with
	   rtlifetime=0 unconditionally.  This effectively prevents
	   from setting the received router address as the box's
	   default route.

 (!ACCEPT_RTADV)
	=> No RA processing on that interface.

 ACCEPT_RTADV and NO_RADR are per-interface knob.  In short, all interface
 are classified as "RA-accepting" or not.  An RA-accepting interface always
 processes RA messages regardless of ip6.forwarding.  The difference caused by
 NO_RADR or ip6.forwarding is whether the RA source address is considered as
 the default router or not.

 R-bit in NA on the RA accepting interfaces is set based on
 net.inet6.ip6.forwarding.  While RFC 6204 W-1 rule (for CPE case) suggests
 a router should disable the R-bit completely even when the box has
 net.inet6.ip6.forwarding=1, I believe there is no technical reason with
 doing so.  This behavior can be set by a new sysctl net.inet6.ip6.norbit_raif
 (the default is 0).

Usage:

 # ifconfig fxp0 inet6 accept_rtadv
	=> accept RA on fxp0
 # ifconfig fxp0 inet6 accept_rtadv no_radr
	=> accept RA on fxp0 but ignore default route information in it.
 # sysctl net.inet6.ip6.norbit_no_radr=1
	=> R-bit in NAs on RA accepting interfaces will always be set to 0.
2011-06-06 02:14:23 +00:00
Jeff Roberson
e4cd31dd3c - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
Bjoern A. Zeeb
1d5089c2c2 Loosen the locking in nd6-free() again after r216022 to avoid
a LOR and a recursed lock.

Reported by:	delphij
Tested by:	delphij
PR:		kern/148857
MFC After:	3 days
2010-12-07 22:43:29 +00:00
Bjoern A. Zeeb
e6950476b9 Plug well observed races on la_hold entries with the callout handler.
Call the handler function with the lock held, return unlocked as we
might free the entry.  Rework functions later in the call graph to be
either called with the lock held or, only if needed, unlocked.

Place asserts to document and tighten assumptions on various lle locking,
which were not always true before.

We call nd6_ns_output() unlocked and the assignment of ip6->ip6_src was
decentralized to minimize possible complexity introduced with the formerly
missing locking there.  This also resulted in a push down of local
variable scopes into smaller blocks.

Reported by:	many
PR:		kern/148857
Submitted by:	Dmitrij Tejblum (tejblum yandex-team.ru) (original version)
MFC After:	4 days
2010-11-29 00:04:08 +00:00
Dimitry Andric
3e288e6238 After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
Bjoern A. Zeeb
683525038b Do not initialize flag variables before needed.
Consistently use the LLE_ prefix for lla_lookup() and the ND6_ prefix
for nd6_lookup() even though both are defined the same. Use the right
flag variable when checking each.

No real functional change.

MFC after:	4 days
2010-11-17 10:43:20 +00:00
Bjoern A. Zeeb
20723e34e3 No need to re-initialize the callout. We initially do it in in6_lltable_new()
right after allocation.  Worse, we are losing the right flags here.

MFC after:	4 days
2010-11-17 09:25:08 +00:00
Dimitry Andric
31c6a0037e Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
Ana Kukec
1db8d1f843 MFp4: anchie_soc2009 branch:
Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971.

The implementation consists of a kernel module that gets packets from
the nd6 code, sends them to user space on a dedicated socket and reinjects
them back for further processing.

Hooks are used from nd6 code paths to divert relevant packets to the
send implementation for processing in user space.  The hooks are only
triggered if the send module is loaded. In case no user space
application is connected to the send socket, processing continues
normaly as if the module would not be loaded. Unloading the module
is not possible at this time due to missing nd6 locking.

The native SeND socket is similar to a raw IPv6 socket but with its own,
internal pseudo-protocol.

Approved by:	bz (mentor)
2010-08-19 11:31:03 +00:00
Bjoern A. Zeeb
19291ab3de Document the mandatory argument to the arptimer() and
nd6_llinfo_timer() functions with a KASSERT().
Note: there is no need to return after panic.

In the legacy IP case, only assign the arg after the check,
in the IPv6 case, remove the extra checks for the table and
interface as they have to be there unless we freed and forgot
to cancel the timer.  It doesn't matter anyway as we would
panic on the NULL pointer deref immediately and the bug is
elsewhere.
This unifies the code of both address families to some extend.

Reviewed by:	rwatson
MFC after:	6 days
2010-07-31 21:33:18 +00:00
Bjoern A. Zeeb
82cea7e6f3 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
Bjoern A. Zeeb
becba438d2 Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by:		qingli (earlier version)
MFC after:		10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
			Christian Kratzer (ck cksoft.de),
			Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-11 16:04:08 +00:00
Bjoern A. Zeeb
d715e397f0 We are holding a write lock here so avoid aquiring it twice calling
the "locked" version rather than the wrapper function.

MFC after:	6 days
2010-03-25 10:29:00 +00:00
Qing Li
c1752bcd65 Use reference counting instead of locking to secure an address while
that address is being used to generate temporary IPv6 address. This
approach is sufficient and avoids recursive locking.

MFC after:	3 days
2010-02-27 07:12:25 +00:00
Qing Li
baf7c37373 Multiple IPv6 addresses of the same prefix can be installed on the
same interface. The first address will install the prefix route into
the kernel routing table and that prefix will be marked as on-link.
Without RADIX_MPATH enabled, the other address aliases of the same
prefix will update the prefix reference count but no other routes
will be installed. Consequently the prefixes associated with these
addresses would not be marked as on-link. As such, incoming packets
destined to these address aliases will fail the ND6 on-link check
on input. This patch fixes the above problem by searching the kernel
routing table and try to find an on-link prefix on the given interface.

MFC after:	5 days
2009-12-30 21:51:23 +00:00
Hajimu UMEMOTO
ef8d671cca - We are not guaranteed that we're not dropping a reference that
we did not add.  Call LLE_REMREF() only when callout_stop()
  actually canceled a pending callout.
- callout_reset() may cancel a pending callout.  When
  callout_reset() canceled a pending callout, call LLE_REMREF()
  to drop a reference for the canceled callout.

MFC after:	1 week
2009-11-12 14:48:36 +00:00
Hajimu UMEMOTO
f0c0b1430c CURVNET_RESTORE() was not called in certain cases.
MFC after:	3 days
2009-11-11 08:28:18 +00:00
Hajimu UMEMOTO
287e3cb475 Make nd6_llinfo_timer() does its job, again. ln->la_expire was
greater than time_second, in most cases.

MFC after:	3 days
2009-11-06 17:34:26 +00:00
Hajimu UMEMOTO
2eb10edccb Don't call LLE_FREE() after nd6_free().
MFC after:	3 days
2009-11-06 10:07:38 +00:00
Hiroki Sato
a283298ce3 Improve flexibility of receiving Router Advertisement and
automatic link-local address configuration:

- Convert a sysctl net.inet6.ip6.accept_rtadv to one for the
  default value of a per-IF flag ND6_IFF_ACCEPT_RTADV, not a
  global knob.  The default value of the sysctl is 0.

- Add a new per-IF flag ND6_IFF_AUTO_LINKLOCAL and convert a
  sysctl net.inet6.ip6.auto_linklocal to one for its default
  value.  The default value of the sysctl is 1.

- Make ND6_IFF_IFDISABLED more robust.  It can be used to disable
  IPv6 functionality of an interface now.

- Receiving RA is allowed if ip6_forwarding==0 *and*
  ND6_IFF_ACCEPT_RTADV is set on that interface.  The former
  condition will be revisited later to support a "host + router" box
  like IPv6 CPE router.  The current behavior is compatible with
  the older releases of FreeBSD.

- The ifconfig(8) now supports these ND6 flags as well as "nud",
  "prefer_source", and "disabled" in ndp(8).  The ndp(8) now
  supports "auto_linklocal".

Discussed with:	bz and jinmei
Reviewed by:	bz
MFC after:	3 days
2009-09-12 22:08:20 +00:00
Robert Watson
77dfcdc445 Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock.  Either can be held to stablize the lists and indexes, but both
are required to write.  This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions.  As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by:	bz, julian
MFC after:	3 days
2009-08-23 20:40:19 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Robert Watson
1e77c1056a Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used.  Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with:	bz, julian
Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-16 21:13:04 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Robert Watson
d1da0a0672 Add address list locking for in6_ifaddrhead/ia_link: as with locking
for in_ifaddrhead, we stick with an rwlock for the time being, which
we will revisit in the future with a possible move to rmlocks.

Some pieces of code require significant further reworking to be
safe from all classes of writer-writer races.

Reviewed by:	bz
MFC after:	6 weeks
2009-06-25 16:35:28 +00:00
Robert Watson
80af0152f3 Convert netinet6 to using queue(9) rather than hand-crafted linked lists
for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead).  Adopt
the code styles and conventions present in netinet where possible.

Reviewed by:	gnn, bz
MFC after:	6 weeks (possibly not MFCable?)
2009-06-24 21:00:25 +00:00
Robert Watson
8c0fec805f Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references.  The following routines now return references:

  ifaddr_byindex
  ifa_ifwithaddr
  ifa_ifwithbroadaddr
  ifa_ifwithdstaddr
  ifa_ifwithnet
  ifaof_ifpforaddr
  ifa_ifwithroute
  ifa_ifwithroute_fib
  rt_getifa
  rt_getifa_fib
  IFP_TO_IA
  ip_rtaddr
  in6_ifawithifp
  in6ifa_ifpforlinklocal
  in6ifa_ifpwithaddr
  in6_ifadd
  carp_iamatch6
  ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

  IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc).  In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit.  Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by:	bz
Obtained from:	Apple, Inc. (portions)
MFC after:	6 weeks (portions)
2009-06-23 20:19:09 +00:00
Bjoern A. Zeeb
8d8bc0182e After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
Marko Zec
bc29160df3 Introduce an infrastructure for dismantling vnet instances.
Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state.  The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers.  Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels.  Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by:	bz, julian
Approved by:	rwatson, kib (re), julian (mentor)
2009-06-08 17:15:40 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Marko Zec
0733f6a615 Remove an #undef MIN that slipped under the radar and led me to
hastily introduce an #define MIN() a few lines below in r191816.

Approved by:	julian (mentor)
Discussed with:	bz
2009-06-01 20:59:40 +00:00
Marko Zec
21ca7b57bd Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
Marko Zec
f6dfe47a14 Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)
2009-04-30 13:36:26 +00:00
Robert Watson
c4dd3fe108 Prefer structure fields (ifa_link) to macro aliases for them
(ifa_list).

MFC after:	2 weeks
2009-04-20 22:45:21 +00:00
Robert Watson
1e6a41398c Acquire interface address list lock around access to if_addrhead,
closing several writer-writer races, and some read-write races.

MFC after:	2 weeks
2009-04-20 21:37:46 +00:00
Robert Watson
f68ffa034b Use TAILQ_FOREACH() and TAILQ_FOREACH_SAFE() rather than manually
accessing queue(9) structure fields for if_addrhead.

Prefer FreeBSD field name if_addrhead to compatibility macro
if_addrlist.

MFC after:	2 weeks
2009-04-20 21:05:37 +00:00
Kip Macy
279aa3d419 Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by:	rwatson
2009-04-16 20:30:28 +00:00
Robert Watson
e27b0c8775 Update stats in struct icmpstat and icmp6stat using four new
macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and
ICMP6STAT_INC(), rather than directly manipulating the fields
of these structures across the kernel.  This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

In on case, icmp6stat members are manipulated indirectly, by
icmp6_errcount(), and this will require further work to fix
for per-CPU stats.

MFC after:	3 days
2009-04-12 13:22:33 +00:00