Commit Graph

928 Commits

Author SHA1 Message Date
Marko Zec
21ca7b57bd Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The curvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, with an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking-only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typical cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack; and when executing
timer-driven networking functions.
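
A minimal userland sketch of the save / set / restore pattern described
above.  The macro bodies and the saved_vnet variable name are assumptions
made for illustration, not the kernel's definitions, and a C11
_Thread_local variable stands in for curthread->td_vnet.

```c
#include <stddef.h>

struct vnet {
    int id;
};

/* Stand-in for curthread->td_vnet: one value per thread. */
_Thread_local struct vnet *curvnet = NULL;

/* Hypothetical macro bodies: remember the previous context, then switch. */
#define CURVNET_SET(arg)                    \
    struct vnet *saved_vnet = curvnet;      \
    curvnet = (arg)

/* Hypothetical macro body: put the previous context back. */
#define CURVNET_RESTORE()                   \
    curvnet = saved_vnet

/* Runs "networking work" inside the given vnet and returns that vnet's id. */
int
process_in_vnet(struct vnet *vnet)
{
    int id;

    CURVNET_SET(vnet);
    id = curvnet->id;       /* networking code would run here */
    CURVNET_RESTORE();
    return (id);
}
```

Because the previous value is saved in a local variable, a nested
CURVNET_SET() in a callee restores to the caller's context, which is what
makes recursion possible in this sketch.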

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
Marko Zec
5f416f8e84 Make indentation more uniform across vnet container structs.
This is a purely cosmetic / NOP change.

Reviewed by:	bz
Approved by:	julian (mentor)
Verified by:	svn diff -x -w producing no output
2009-05-02 08:16:26 +00:00
Bruce M Simpson
a3d3b633a9 Limit scope of acquisition of INP_RLOCK for multicast input filter
to the scope of its use, even though this may thrash the lock if
the INP is referenced for other purposes.

Tested by:	David Wolfskill
2009-05-01 11:05:24 +00:00
Marko Zec
f6dfe47a14 Permit building kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with two additional fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.
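
The offset-based lookup in item 6 can be sketched roughly as follows.
This is only an illustration of the idea (base pointer plus stored
offset); struct vnet_container, its fields and resolve_v_arg1() are
hypothetical names, not the actual SYSCTL_RESOLVE_V_ARG1() definition.

```c
#include <stddef.h>

/* Hypothetical container standing in for a vnet module struct. */
struct vnet_container {
    int _ipforwarding;
    int _ip_defttl;
};

/* Hypothetical helper: turn a stored offset into the variable's address. */
void *
resolve_v_arg1(void *base, size_t offset)
{
    return ((char *)base + offset);
}

/* Reads _ip_defttl the way a virtualized sysctl handler might. */
int
read_defttl(struct vnet_container *vc)
{
    /* The offset is what would be stored in oid_arg1 at declaration time. */
    size_t off = offsetof(struct vnet_container, _ip_defttl);

    return (*(int *)resolve_v_arg1(vc, off));
}
```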

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)
2009-04-30 13:36:26 +00:00
Bruce M Simpson
33cde13046 Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev.  Summary of changes:

 * Connect netinet6/in6_mcast.c to build.
   The legacy KAME KPIs are mostly preserved.
 * Eliminate now dead code from ip6_output.c.
   Don't do mbuf bingo, we are not going to do RFC 2292 style
   CMSG tricks for multicast options as they are not required
   by any current IPv6 normative reference.
 * Refactor transports (UDP, raw_ip6) to do own mcast filtering.
   SCTP, TCP unaffected by this change.
 * Add ip6_msource, in6_msource structs to in6_var.h.
 * Hookup mld_ifinfo state to in6_ifextra, allocate from
   domifattach path.
 * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
   Kernel consumers which need this should use in6m_lookup().
 * Refactor IPv6 socket group memberships to use a vector (like IPv4).
 * Update ifmcstat(8) for IPv6 SSM.
 * Add witness lock order for IN6_MULTI_LOCK.
 * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
 * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
 * Update carp(4) for new IPv6 SSM KPIs.
 * Virtualize ip6_mrouter socket.
   Changes mostly localized to IPv6 MROUTING.
 * Don't do a local group lookup in MROUTING.
 * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
 * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
 * Bump __FreeBSD_version to 800084.
 * Update UPDATING.

NOTE WELL:
 * This code hasn't been tested against real MLDv2 queriers
   (yet), although the on-wire protocol has been verified in Wireshark.
 * There are a few unresolved issues in the socket layer APIs to
   do with scope ID propagation.
 * There is a LOR present in ip6_output()'s use of
   in6_setscope() which needs to be resolved. See comments in mld6.c.
   This is believed to be benign and can't be avoided for the moment
   without re-introducing an indirect netisr.

This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
Bruce M Simpson
0279cfbe91 Add MLDv2 protocol header, but do not connect it to the build. 2009-04-29 11:31:23 +00:00
Bruce M Simpson
8f002c6ce7 Import IPv6 SSM module but do not connect it to the build. 2009-04-29 11:26:45 +00:00
Bruce M Simpson
ba970783a9 Add IN6ADDR_LINKLOCAL_ALLV2ROUTERS_INIT, in6addr_linklocal_allv2routers
for use by MLDv2.
Add IPv6 SSM socket layer membership vector size constants and
tree bounds.
Remove unreferenced struct ipv6_mreq_source; SSM for IPv6 goes
straight to the RFC 3678 socket options.
2009-04-29 10:22:44 +00:00
Marko Zec
093f25f8c8 In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by:	bz (an older version of the patch)
Approved by:	julian (mentor)
2009-04-26 22:06:42 +00:00
Bjoern A. Zeeb
3f795dd3c7 Compare protosw pointer with NULL.
MFC after:	1 month
2009-04-23 17:41:54 +00:00
Robert Watson
93c83dd8bf Assert the interface address list lock in IFP_TO_IA6(), as it will
iterate the interface address list.  Marginally expand IF_ADDR_LOCK()
coverage in mld6.c to make sure it's held when IFP_TO_IA6() is called.

MFC after:	2 weeks
2009-04-20 22:56:34 +00:00
Robert Watson
c4dd3fe108 Prefer structure fields (ifa_link) to macro aliases for them
(ifa_list).

MFC after:	2 weeks
2009-04-20 22:45:21 +00:00
Robert Watson
1e6a41398c Acquire interface address list lock around access to if_addrhead,
closing several writer-writer races, and some read-write races.

MFC after:	2 weeks
2009-04-20 21:37:46 +00:00
Robert Watson
f68ffa034b Use TAILQ_FOREACH() and TAILQ_FOREACH_SAFE() rather than manually
accessing queue(9) structure fields for if_addrhead.

Prefer FreeBSD field name if_addrhead to compatibility macro
if_addrlist.
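
A small userland sketch of the two iteration macros, assuming a
<sys/queue.h> that provides the TAILQ family.  struct ifaddr_sketch and
both helper functions are hypothetical; the fallback definition of
TAILQ_FOREACH_SAFE() is only there for systems whose header lacks it.

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Fallback for queue headers that do not ship the _SAFE variant. */
#ifndef TAILQ_FOREACH_SAFE
#define TAILQ_FOREACH_SAFE(var, head, field, tvar)              \
    for ((var) = TAILQ_FIRST((head));                           \
        (var) && ((tvar) = TAILQ_NEXT((var), field), 1);        \
        (var) = (tvar))
#endif

struct ifaddr_sketch {
    int id;
    TAILQ_ENTRY(ifaddr_sketch) ifa_link;
};
TAILQ_HEAD(ifaddrhead_sketch, ifaddr_sketch);

/* Read-only walk: plain TAILQ_FOREACH() is sufficient. */
int
count_addrs(struct ifaddrhead_sketch *head)
{
    struct ifaddr_sketch *ifa;
    int n = 0;

    TAILQ_FOREACH(ifa, head, ifa_link)
        n++;
    return (n);
}

/* Removing while walking: the _SAFE variant caches the next element. */
void
purge_addrs(struct ifaddrhead_sketch *head)
{
    struct ifaddr_sketch *ifa, *next;

    TAILQ_FOREACH_SAFE(ifa, head, ifa_link, next) {
        TAILQ_REMOVE(head, ifa, ifa_link);
        free(ifa);
    }
}
```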

MFC after:	2 weeks
2009-04-20 21:05:37 +00:00
Robert Watson
ac6ba96269 Close some but not all writer-writer races when maintaining IPv6
interface address lists by locking the interface address list lock.

MFC after:	2 weeks
2009-04-20 16:05:16 +00:00
Robert Watson
1e1d603e2f Lock interface address lists before iterating over them in nd6.
MFC after:	2 weeks
2009-04-20 14:41:23 +00:00
Kip Macy
279aa3d419 Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by:	rwatson
2009-04-16 20:30:28 +00:00
Kip Macy
de4ab55e43 add an llentry to struct route{_in6} to allow it to be passed around with
the rtentry
2009-04-15 20:34:19 +00:00
Robert Watson
e27b0c8775 Update stats in struct icmpstat and icmp6stat using four new
macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and
ICMP6STAT_INC(), rather than directly manipulating the fields
of these structures across the kernel.  This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

In one case, icmp6stat members are manipulated indirectly, by
icmp6_errcount(), and this will require further work to fix
for per-CPU stats.
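
A rough sketch of the accessor-macro idea, assuming the macros simply
wrap a field update on a global structure.  The macro bodies and the
reduced struct icmpstat_sketch are illustrative assumptions, not the
committed definitions.

```c
/* Reduced stand-in for struct icmpstat with two of its counters. */
struct icmpstat_sketch {
    unsigned long icps_error;
    unsigned long icps_badlen;
};

struct icmpstat_sketch icmpstat;

/* Assumed macro bodies: callers name the field, not the storage. */
#define ICMPSTAT_ADD(name, val) (icmpstat.name += (val))
#define ICMPSTAT_INC(name)      ICMPSTAT_ADD(name, 1)

/* A caller updates counters without touching the structure directly. */
void
count_bad_packet(void)
{
    ICMPSTAT_INC(icps_badlen);
    ICMPSTAT_ADD(icps_error, 2);
}
```

With this shape, switching to per-CPU storage would only require
changing the macro bodies, not every call site.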

MFC after:	3 days
2009-04-12 13:22:33 +00:00
Robert Watson
f68f9f77fe Commit file omitted in r190962:
Update stats in struct udpstat using two new macros, UDPSTAT_ADD()
and UDPSTAT_INC(), rather than directly manipulating the fields
across the kernel.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structures.

MFC after:    3 days
2009-04-12 11:53:12 +00:00
Marko Zec
bfe1aba468 Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized.  In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw).  Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module.  The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on.  Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first.  The framework will panic if it detects any unresolved
dependencies before completing system initialization.  Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.
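
The single-prerequisite rule can be illustrated with the sketch below.
Note that it is not the mechanism described above: the framework records
the registration sequence, whereas this sketch repeatedly scans an array
until no more modules can be initialized.  All names here are
hypothetical.

```c
#define MOD_NONE (-1)

struct modinfo_sketch {
    int id;
    int dependson;      /* MOD_NONE, or the id of the single prerequisite */
    int initialized;
};

/*
 * Initialize modules so that each one runs after its prerequisite,
 * recording the resulting order in order[].  Returns the number of
 * modules initialized; a result smaller than n means at least one
 * dependency could not be resolved.
 */
int
init_in_order(struct modinfo_sketch *mods, int n, int *order)
{
    int done = 0;
    int progress = 1;

    while (progress) {
        progress = 0;
        for (int i = 0; i < n; i++) {
            int ready;

            if (mods[i].initialized)
                continue;
            ready = (mods[i].dependson == MOD_NONE);
            for (int j = 0; j < n && !ready; j++) {
                if (mods[j].id == mods[i].dependson &&
                    mods[j].initialized)
                    ready = 1;
            }
            if (ready) {
                mods[i].initialized = 1;
                order[done++] = mods[i].id;
                progress = 1;
            }
        }
    }
    return (done);
}
```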

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run.  In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container.  IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-11 05:58:58 +00:00
Marko Zec
1ed81b739e First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

        At this stage, the new per-vnet initializer functions are
	directly called from the existing global initialization code,
	which should in most cases result in compiler inlining those
	new functions, hence yielding a near-zero functional change.

        Modify the existing initializer functions which are invoked via
        protosw, like ip_init() et al., to allow them to be invoked
	multiple times, i.e. per each vnet.  Global state, if any,
	is initialized only if such functions are called within the
	context of vnet0, which will be determined via the
	IS_DEFAULT_VNET(curvnet) check (currently always true).
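
A minimal sketch of an initializer that may be invoked once per vnet
while touching global state only for the default vnet.  The variable
names, the IS_DEFAULT_VNET() body and ip_init_sketch() are assumptions
for illustration only.

```c
struct vnet {
    int inited;         /* stands in for per-vnet state */
};

struct vnet vnet0_storage;
struct vnet *vnet0 = &vnet0_storage;
struct vnet *curvnet = &vnet0_storage;

/* Assumed definition: the default vnet is simply vnet0. */
#define IS_DEFAULT_VNET(v) ((v) == vnet0)

int global_init_count;  /* stands in for global, non-virtualized state */

/* May be called once per vnet; global setup happens only for vnet0. */
void
ip_init_sketch(void)
{
    curvnet->inited = 1;            /* per-vnet initialization */

    if (!IS_DEFAULT_VNET(curvnet))
        return;
    global_init_count++;            /* global initialization */
}
```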

        While here, V_irtualize a few remaining global UMA zones
        used by net/netinet/netipsec networking code.  While it is
        not yet clear to me or anybody else whether this is the right
        thing to do, at this stage this makes the code more readable,
        and makes it easier to track uncollected UMA-zone-backed
        objects on vnet removal.  In the long run, it's quite possible
        that some form of shared use of UMA zone pools among multiple
        vnets should be considered.

	Bump __FreeBSD_version due to changes in layout of structs
	vnet_ipfw, vnet_inet and vnet_net.

Approved by:	julian (mentor)
2009-04-06 22:29:41 +00:00
Bruce M Simpson
443fc3176d Introduce a number of changes to the MROUTING code.
This is purely a forwarding plane cleanup; no control plane
code is involved.

Summary:
 * Split IPv4 and IPv6 MROUTING support. The static compile-time
   kernel option remains the same, however, the modules may now
   be built for IPv4 and IPv6 separately as ip_mroute_mod and
   ip6_mroute_mod.
 * Clean up the IPv4 multicast forwarding code to use BSD queue
   and hash table constructs. Don't build our own timer abstractions
   when ratecheck() and timevalclear() etc will do.
 * Expose the multicast forwarding cache (MFC) and virtual interface
   table (VIF) as sysctls, to reduce netstat's dependence on libkvm
   for this information for running kernels.
   * bandwidth meters however still require libkvm.
 * Make the MFC hash table size a boot/load-time tunable ULONG,
   net.inet.ip.mfchashsize (defaults to 256).
 * Remove unused members from struct vif and struct mfc.
 * Kill RSVP support, as no current RSVP implementation uses it.
   These stubs could be moved to raw_ip.c.
 * Don't share locks or initialization between IPv4 and IPv6.
 * Don't use a static struct route_in6 in ip6_mroute.c.
   The v6 code is still using a cached struct route_in6, this is
   moved to mif6 for the time being.
 * More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c.

v4 path tested using ports/net/mcast-tools.
v6 changes are mostly mechanical locking and *have not* been tested.
As these changes partially break some kernel ABIs, they will not
be MFCed. There is a lot more work to be done here.

Reviewed by:	Pavlin Radoslavov
2009-03-19 01:43:03 +00:00
Robert Watson
e5adda3d51 Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free.  This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile.  They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

        if_ar
        if_axe
        if_aue
        if_cdce
        if_cue
        if_kue
        if_ray
        if_rue
        if_rum
        if_sr
        if_udav
        if_ural
        if_zyd

Drivers that were already disabled because of tty changes:

        if_ppp
        if_sl

Discussed on:	arch@
2009-03-15 14:21:05 +00:00
Robert Watson
ad71fe3c35 Correct a number of evolved problems with inp_vflag and inp_flags:
certain flags that should have been in inp_flags ended up in inp_vflag,
meaning that they were inconsistently locked, and in one case,
interpreted.  Move the following flags from inp_vflag to gaps in the
inp_flags space (and clean up the inp_flags constants to make gaps
more obvious to future takers):

  INP_TIMEWAIT
  INP_SOCKREF
  INP_ONESBCAST
  INP_DROPPED

Some aspects of this change have no effect on kernel ABI at all, as these
are UDP/TCP/IP-internal uses; however, netstat and sockstat detect
INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this
into account.

MFC after:      1 week (or after dependencies are MFC'd)
Reviewed by:    bz
2009-03-15 09:58:31 +00:00
Marius Strobl
c89c8a1029 On architectures with strict alignment requirements compensate
the misalignment of the IP header that prepending the EtherIP
header might have caused.

PR:		131921
MFC after:	1 week
2009-03-07 19:08:58 +00:00
Bjoern A. Zeeb
1263305f0c Start removing IPv6 Type 0 Routing header code.
RH0 was deprecated by RFC 5095.

While most of the code had been disabled by #if 0 already, leave a
bit of infrastructure for possible RH2 code and a log message under
BURN_BRIDGES in case a user still tries to send RH0 packets.

Reviewed by:	gnn (a bit back, earlier version)
2009-03-03 13:12:12 +00:00
Bjoern A. Zeeb
2bebb49117 Add size-guards evaluated at compile-time to the main struct vnet_*
which are not in a module of their own like gif.

Single kernel compiles and universe will fail if the size of the struct
changes. The expected values are given in sys/vimage.h.
See the comments there on how to handle this.
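
The idea of a compile-time size guard can be sketched with C11
_Static_assert, used here in place of whatever assertion macro the
commit itself relies on.  struct vnet_example and SIZEOF_VNET_EXAMPLE
are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container; fixed-width fields keep its size predictable. */
struct vnet_example {
    uint32_t _field_a;
    uint32_t _field_b;
};

/* Expected size, recorded next to the guard (hypothetical constant). */
#define SIZEOF_VNET_EXAMPLE 8

/* The build fails here if a field is added or removed without review. */
_Static_assert(sizeof(struct vnet_example) == SIZEOF_VNET_EXAMPLE,
    "struct vnet_example changed size; update SIZEOF_VNET_EXAMPLE");

size_t
vnet_example_size(void)
{
    return (sizeof(struct vnet_example));
}
```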

Requested by:	peter
2009-03-01 11:01:00 +00:00
Bjoern A. Zeeb
33553d6e99 For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
Bjoern A. Zeeb
61cab5d638 Shuffle the vimage.h includes or add where missing. 2009-02-27 13:22:26 +00:00
Robert Watson
a714e55f73 Assert the radix head lock in in6_rtqkill().
MFC after:	3 days
2009-02-23 22:58:59 +00:00
Bjoern A. Zeeb
97aa4a517a Try to remove/assimilate as much of formerly IPv4/6 specific
(duplicate) code in sys/netipsec/ipsec.c and fold it into
common, INET/6 independent functions.

The file local functions ipsec4_setspidx_inpcb() and
ipsec6_setspidx_inpcb() were 1:1 identical after the change
in r186528. Rename to ipsec_setspidx_inpcb() and remove the
duplicate.

Public functions ipsec[46]_get_policy() were 1:1 identical.
Remove one copy and merge in the factored out code from
ipsec_get_policy() into the other. The public function left
is now called ipsec_get_policy() and callers were adapted.

Public functions ipsec[46]_set_policy() were 1:1 identical.
Rename file local ipsec_set_policy() function to
ipsec_set_policy_internal().
Remove one copy of the public functions, rename the other
to ipsec_set_policy() and adapt callers.

Public functions ipsec[46]_hdrsiz() were logically identical
(ignoring one questionable assert in the v6 version).
Rename the file local ipsec_hdrsiz() to ipsec_hdrsiz_internal(),
the public function to ipsec_hdrsiz(), remove the duplicate
copy and adapt the callers.
The v6 version had been unused anyway. Cleanup comments.

Public functions ipsec[46]_in_reject() were logically identical
apart from statistics. Move the common code into a file local
ipsec46_in_reject() leaving vimage+statistics in small AF specific
wrapper functions. Note: unfortunately we already have a public
ipsec_in_reject().

Reviewed by:	sam
Discussed with:	rwatson (renaming to *_internal)
MFC after:	26 days
X-MFC:		keep wrapper functions for public symbols?
2009-02-08 09:27:07 +00:00
Jamie Gritton
67c19233f1 Don't bother null-checking the thread pointer before the prison checks
in udp6_connect (td is already dereferenced elsewhere without such a
check).  This makes the conversion from a sockaddr to a sockaddr_in6
always happen, so convert once at the beginning of the function rather
than twice in the middle.

Approved by:	bz (mentor)
2009-02-05 15:04:23 +00:00
Jamie Gritton
7c2f3cb964 Remove redundant calls of prison_local_ip4 in in_pcbbind_setup, and of
prison_local_ip6 in in6_pcbbind.

Approved by:	bz (mentor)
2009-02-05 14:25:53 +00:00
Jamie Gritton
b89e82dd87 Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise.  The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family.  For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes).  Remove the explicit
jailed() checks that preceded many of the function calls.
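
The return convention described above might look roughly like this in
isolation.  struct prison_sketch and check_ip4_sketch() are hypothetical,
and a NULL prison pointer is used here to model a non-jailed credential.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a prison's IPv4 address list. */
struct prison_sketch {
    size_t ip4_count;
    const uint32_t *ip4;
};

/*
 * Returns 0 on success, EAFNOSUPPORT if the prison has no IPv4
 * addresses, and EADDRNOTAVAIL if addr is not one of them.
 */
int
check_ip4_sketch(const struct prison_sketch *pr, uint32_t addr)
{
    if (pr == NULL)                 /* not jailed: always succeeds */
        return (0);
    if (pr->ip4_count == 0)
        return (EAFNOSUPPORT);
    for (size_t i = 0; i < pr->ip4_count; i++) {
        if (pr->ip4[i] == addr)
            return (0);
    }
    return (EADDRNOTAVAIL);
}
```

Callers can then pass the returned value straight up instead of
substituting a hard-coded error.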

Approved by:	bz (mentor)
2009-02-05 14:06:09 +00:00
Bjoern A. Zeeb
5f16e341d4 When iterating through the list trying to find a router in
defrouter_select(), NULL the cached llentry after unlocking
as we are no longer interested in it and with the second
iteration would try to unlock it again resulting in
panic: Lock (rw) lle not locked @ ...

Reported by:	Mark Atkinson <m.atkinson@f5.com>
Tested by:	Mark Atkinson <m.atkinson@f5.com>
PR:		kern/128247 (in follow-up, unrelated to original report)
2009-02-04 10:35:27 +00:00
Randall Stewart
a99b67833a - Cleanup checksum code.
- Prepare for CRC offloading, add MIB counters (RS/MT).
- Bugfix: Disable CRC computation for IPv6 addresses with local scope (MT).
- Bugfix: Handle close() with SO_LINGER correctly when notifications
          are generated during the close() call (MT).
- Bugfix: Generate DRY event when sender is dry during subscription.
          Only for 1-to-1 style sockets (RS/MT)
- Bugfix: Put vtags for the correct amount of time into time-wait (MT).
- Bugfix: Clear vtag entries correctly on expiration (MT).
- Bugfix: shutdown() indicates ENOTCONN when called for unconnected
          1-to-1 style sockets (MT).
- Bugfix: In sctp Auth code (PL).
- Add support for devices that support SCTP csum offload (igb).
- Add missing sctp_associd to mib sysctl xsctp_tcb structure (RS)
Obtained from:	With help from Peter Lei and Michael Tuexen
2009-02-03 11:04:03 +00:00
Bjoern A. Zeeb
09f8c3ff36 Remove the single global unlocked route cache ip6_forward_rt
from the inet6 stack along with statistics and make sure we
properly free the rt in all cases.

While the current situation is not better performance-wise, it
prevents panics seen more often these days.
After more inet6 and ipsec cleanup we should be able to improve
the situation again passing the rt to ip6_forward directly.

Leave the ip6_forward_rt entry in struct vinet6 but mark it
for removal.

PR:		kern/128247, kern/131038
MFC after:	25 days
Committed from:	Bugathon #6
Tested by:	Denis Ahrens <denis@h3q.com> (different initial version)
2009-02-01 21:11:08 +00:00
Bjoern A. Zeeb
959e14c15e Remove unused local MACROs.
Submitted by:	Christoph Mallon christoph.mallon@gmx.de
MFC after:	2 weeks
2009-01-31 17:35:44 +00:00
Bjoern A. Zeeb
39f046dac2 Coalesce two consecutive #ifdef IPSEC blocks.
Move the skip_ipsec: label below the goto as we can never have
ipsecrt set if we get to that label so there is no need to check.

MFC after:	2 weeks
2009-01-31 12:24:53 +00:00
Bjoern A. Zeeb
e173d3df0c Remove dead code from #if 0:
we do not have an ipsrcchk_rt anywhere else.

MFC after:	2 weeks
2009-01-31 11:19:20 +00:00
Bjoern A. Zeeb
2e730bea0a Like with r185713 make sure to not leak a lock as rtalloc1(9) returns
a locked route. Thus we have to use RTFREE_LOCKED(9) to get it unlocked
and rtfree(9)d rather than just rtfree(9)d.

Since the PR was filed, new places with the same problem were added
with new code.  Also check that the rt is valid before freeing it
either way there.

PR:		kern/129793
Submitted by:	Dheeraj Reddy <dheeraj@ece.gatech.edu>
MFC after:	2 weeks
Committed from:	Bugathon #6
2009-01-31 10:48:02 +00:00
Bjoern A. Zeeb
351c4745f1 Remove 4 entirely unused ip6 variables.
Leave them in struct vinet6 to not break the ABI with kernel modules
but mark them for removal so we can do it in one batch when the time
is right.

MFC after:	1 month
2009-01-30 23:40:24 +00:00
Bjoern A. Zeeb
1cecba0fcd For consistency with prison_{local,remote,check}_ipN rename
prison_getipN to prison_get_ipN.

Submitted by:	jamie (as part of a larger patch)
MFC after:	1 week
2009-01-25 10:11:58 +00:00
Sam Leffler
cbd1844537 remove too noisy DIAGNOSTIC code
Reviewed by:	qingli
2009-01-18 07:20:02 +00:00
Qing Li
14981d8057 Revive the RTF_LLINFO flag in route.h. The kernel code is guarded
by the new kernel option COMPAT_ROUTE_FLAGS for binary backward
compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO.
RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is
always returned to the userland regardless of whether COMPAT_ROUTE_FLAGS
is defined.
2009-01-12 11:24:32 +00:00
Bjoern A. Zeeb
813dd6ae5e Restrict arp, ndp and theoretically the FIB listing (if not
read with libkvm) to the addresses of a prison, when inside a
jail. [1]
As the patch from the PR was pre-'new-arp', add checks to the
llt_dump handlers as well.

While touching RTM_GET in route_output(), consistently use
curthread credentials rather than the creds from the socket
there. [2]

PR:		kern/68189
Submitted by:	Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1]
Discussed with:	rwatson [2]
Reviewed by:	rwatson
MFC after:	4 weeks
2009-01-09 21:57:49 +00:00
Bjoern A. Zeeb
5ce0eb7f08 Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related
jail-aware. Up to now we returned the first address of the interface
for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for
programs querying for an address but running inside a jail, as the
address returned usually did not belong to the jail.
Like for v6, if there was an ifr_addr given on v4, you could probe
for more addresses on the interfaces that you were not allowed to see
from inside a jail. Return an error (EADDRNOTAVAIL) in that case
now unless the address is on the given interface and valid for the
jail.

PR:		kern/114325
Reviewed by:	rwatson
MFC after:	4 weeks
2009-01-09 13:06:56 +00:00
Randall Stewart
bbb0e3d9d5 Addresses Robert's comments on comments. Also adds
the KASSERT and checks suggested.

Reviewed by:	The udp tunneling was discussed on net@ under the
                thread entitled "Heads up -- Thinking about UDP and tunneling"
2009-01-06 13:27:56 +00:00
Randall Stewart
c7c7ea4b5a Add the ability of an alternate transport protocol
to easily tunnel over udp by providing a hook
function that will be called instead of appending
to the socket buffer.
2009-01-06 12:13:40 +00:00
Bjoern A. Zeeb
4b5c098fdf Switch the last protosw* structs to C99 initializers.
Reviewed by:	ed, julian, Christoph Mallon <christoph.mallon@gmx.de>
MFC after:	2 weeks
2009-01-05 20:29:01 +00:00
Robert Watson
5e48a30d2e Unlike with struct protosw, several instances of struct ip6protosw
did not use C99-style sparse structure initialization, so remove
NULL assignments for now-removed pr_usrreq function pointers.
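
A short sketch of why C99-style sparse (designated) initializers
tolerate member removal.  struct protosw_sketch is a hypothetical,
much-reduced stand-in for the real protocol switch structures.

```c
#include <stddef.h>

/* Hypothetical, much-reduced protocol switch structure. */
struct protosw_sketch {
    short pr_type;
    short pr_protocol;
    void (*pr_init)(void);
    void (*pr_drain)(void);
};

void
sketch_init(void)
{
}

/*
 * Only the members of interest are named; every other member is
 * zero-initialized.  Removing an unrelated member from the struct does
 * not require touching this initializer, unlike a positional one that
 * spells out a NULL for each function pointer.
 */
struct protosw_sketch sketch_sw = {
    .pr_type = 2,
    .pr_init = sketch_init,
};
```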

Reported by:	Chris Ruiz <yr.retarded at gmail.com>
2009-01-04 21:53:42 +00:00
Robert Watson
cba318dc12 struct ip6protosw is a copy of struct protosw, so remove pr_usrreq there
to reflect removal from struct protosw.

Spotted by:	ed
2009-01-04 21:13:51 +00:00
Qing Li
dc49549713 Some modules such as SCTP supply a valid route entry as an input argument
to ip_output(). The destination is represented in a sockaddr{} object
that may contain other pieces of information, e.g., port number. This
same destination sockaddr{} object may be passed into L2 code, which
could be used to create a L2 entry. Since there exists a L2 table per
address family, the L2 lookup function can make address family specific
comparison instead of the generic bcmp() operation over the entire
sockaddr{} structure.
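
The difference between the two comparisons can be sketched in userland
as follows.  Both function names are hypothetical, and the sketch covers
only the AF_INET case.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* Address-family-specific comparison: only the IPv4 address matters. */
int
same_l2_key_inet(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return (a->sin_addr.s_addr == b->sin_addr.s_addr);
}

/* Generic comparison over the whole structure, including the port. */
int
same_whole_sockaddr(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return (memcmp(a, b, sizeof(*a)) == 0);
}
```

Two sockaddr_in objects that carry the same address but different port
numbers match under the first function and differ under the second,
which is the situation the commit describes for destinations handed down
by SCTP.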

Note in the IPv6 case the sin6_scope_id is not compared because the
address is currently stored in the embedded form inside the kernel.
The in6_lltable_lookup() has to account for the scope-id if this
storage format were to change in the future.
2009-01-03 00:27:28 +00:00
Qing Li
8eca593c5a This checkin addresses a couple of issues:
1. The "route" command allows route insertion through the interface-direct
   option "-iface". During if_attach(), a sockaddr_dl{} entry is created
   for the interface and is part of the interface address list. This
   sockaddr_dl{} entry describes the interface in detail. The "route"
   command selects this entry as the "gateway" object when the "-iface"
   option is present. The "arp" and "ndp" commands also interact with the
   kernel through the routing socket when adding and removing static L2
   entries. The static L2 information is also provided through the
   "gateway" object with an AF_LINK family type, similar to what is
   provided by the "route" command. In order to differentiate between
   these two types of operations, a RTF_LLDATA flag is introduced. This
   flag is set by the "arp" and "ndp" commands when issuing the add and
   delete commands. This flag is also set in each L2 entry returned by the
   kernel. The "arp" and "ndp" commands follow a convention where a RTM_GET
   is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills
   in the fields for a "rtm" object, which is reinjected into the kernel by
   a subsequent RTM_ADD/DELETE command. The entry returned from RTM_GET
   is a prefix route, so the RTF_LLDATA flag must be specified when issuing
   the RTM_ADD/DELETE messages.

2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the
   specification for retrieving L2 information. Also optimized the
   code logic.

Reviewed by:   julian
2008-12-26 19:45:24 +00:00
Kip Macy
ee6326a30b avoid lock recursion by deferring the link check until after LLE lock is dropped 2008-12-24 01:08:18 +00:00
Bjoern A. Zeeb
f5d35259fe Correct variable name in comment.
MFC after:	4 weeks
2008-12-22 12:54:52 +00:00
Qing Li
ebf1c74403 Similar to the INET case, do not destroy the nd6 entries for
interface addresses until those addresses are removed. I already
made the patch in INET but forgot to bring the code over for
INET6.
2008-12-22 07:11:15 +00:00
Bjoern A. Zeeb
099d0bd34b Only unlock the llentry if it is actually valid.
Reported by:	ed
2008-12-18 19:09:14 +00:00
Bjoern A. Zeeb
97590249ad Another step assimilating IPv[46] PCB code:
normalize IN6P_* compat flags usage to their equivalent
INP_* counterpart.

Discussed with:	rwatson
Reviewed by:	rwatson
MFC after:	4 weeks
2008-12-17 13:00:18 +00:00
Bjoern A. Zeeb
dcdb4371ca Use inc_flags instead of the inc_isipv6 alias which so far
had been the only flag with random usage patterns.
Switch inc_flags to be used as a real bit field by using
INC_ISIPV6 with bitops to check for the 'isipv6' condition.

While here fix a place or two where in case of v4 inc_flags
were not properly initialized before.[1]

Found by:	rwatson during review [1]
Discussed with:	rwatson
Reviewed by:	rwatson
MFC after:	4 weeks
2008-12-17 12:52:34 +00:00
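The switch from a boolean-style alias to a real bit field described in the commit above can be sketched as follows. The structure and the flag value are simplified, hypothetical stand-ins (only the names inc_flags and INC_ISIPV6 come from the commit message); the real definitions live in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative value; assumed here, not copied from the kernel headers. */
#define INC_ISIPV6 0x01u

/* Hypothetical, cut-down stand-in for the connection-info structure. */
struct sk_conninfo {
	uint8_t inc_flags;
};

/* IPv4 case: initialize the flags explicitly instead of leaving them stale. */
static void
sk_conninfo_init_v4(struct sk_conninfo *inc)
{
	assert(inc != NULL);
	inc->inc_flags = 0;
}

/* IPv6 case: start from a clean field, then set the bit with a bit operation. */
static void
sk_conninfo_init_v6(struct sk_conninfo *inc)
{
	assert(inc != NULL);
	inc->inc_flags = 0;
	inc->inc_flags |= INC_ISIPV6;
}

/* Test the 'isipv6' condition with a mask rather than treating the field as a boolean. */
static bool
sk_conninfo_is_ipv6(const struct sk_conninfo *inc)
{
	assert(inc != NULL);
	return ((inc->inc_flags & INC_ISIPV6) != 0);
}
```

Checking with a mask keeps the test correct if further bits are later added to the same field.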
Qing Li
9928dafbb8 Remove the rt argument from nd6_storelladdr() because
rt is no longer accessed.
2008-12-17 10:27:34 +00:00
Qing Li
f16e1269b4 A couple of files were not meant to be committed. 2008-12-17 10:19:53 +00:00
Qing Li
bbd8aebaba in6_clsroute() was applied to prefix routes causing some
of them to expire. in6_clsroute() was only applied to
cloned routes that are no longer applicable after the
arp-v2 commit.
2008-12-17 10:03:49 +00:00
Kip Macy
a614678035 * Compare pointer with NULL
* Remove trailing whitespace (added in r186162)
* Reduce indentation by rephrasing test

Submitted by:	Christopher Mallon (christoph dot mallon at gmx dot de)
2008-12-16 23:56:24 +00:00
Kip Macy
fd14c50bbb - Simplify handling of the deferring of mbuf transmit until after lle lock drop
- add a couple of comments to clarify intent
2008-12-16 23:06:36 +00:00
Kip Macy
75bab8b81d check pointers against NULL 2008-12-16 06:01:08 +00:00
Kip Macy
aba53ef0a6 convert more pointer validation checks to checking against NULL 2008-12-16 03:12:44 +00:00
Kip Macy
d78be3a909 simplify locking in find_pfxlist_reachable_router 2008-12-16 03:05:18 +00:00
Kip Macy
23ee1bfa82 explicitly check return of lla_lookup against NULL 2008-12-16 02:47:22 +00:00
Kip Macy
15209fb6e8 advance tail pointer in nd6_output_lle and check lla_output return against NULL 2008-12-16 02:33:53 +00:00
Kip Macy
688d079b2d check return from lla_lookup against NULL not zero 2008-12-16 02:30:42 +00:00
Kip Macy
56c423b065 make sure redirect doesn't return without dropping the lock 2008-12-16 02:06:26 +00:00
Kip Macy
83904f7116 need to check that lle is not null before unlock if the break condition is not met
also fix the break condition to explicitly check against NULL
2008-12-16 02:05:11 +00:00
Kip Macy
6289115121 unlock the llentry after use in find_pfxlist_reachable_router 2008-12-16 01:58:30 +00:00
Qing Li
3d3728e9f8 Initialize the variable "router", and apply "static_route" flag
across the entire nd6_cache_lladdr() function.
2008-12-16 01:21:19 +00:00
Kip Macy
fbc2ca1bef unlock and destroy an llentry's lock before freeing
Found by: sam
2008-12-16 00:20:49 +00:00
Kip Macy
c0641cc03b unlock looked up llentrys in defrouter_select 2008-12-16 00:18:04 +00:00
Kip Macy
c2b3a02b38 fix two use after frees in nd6_cache_lladdr caused by last minute unlock shuffling 2008-12-16 00:16:51 +00:00
Bjoern A. Zeeb
fc384fa5d6 Another step assimilating IPv[46] PCB code - directly use
the inpcb names rather than the following IPv6 compat macros:
in6pcb,in6p_sp, in6p_ip6_nxt,in6p_flowinfo,in6p_vflag,
in6p_flags,in6p_socket,in6p_lport,in6p_fport,in6p_ppcb and
sotoin6pcb().

Apart from removing duplicate code in netipsec, this is a pure
whitespace, not a functional change.

Discussed with:	rwatson
Reviewed by:	rwatson (version before review requested changes)
MFC after:	4 weeks (set the timer and see then)
2008-12-15 21:50:54 +00:00
Qing Li
6e6b3f7cbc The main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code.

The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle. Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improve and refactor the code, and
  provided valuable reviews
- Julian Elischer set up the perforce tree for me and has helped
  me maintain that branch before the svn conversion
2008-12-15 06:10:57 +00:00
Kip Macy
41c6def2d1 in6_addroute is called through rnh_addadr which is always called with the radix node head lock held
exclusively. Pass RTF_RNH_LOCKED to rtalloc so that rtalloc1_fib will not try to re-acquire the lock.
2008-12-13 20:15:42 +00:00
Bjoern A. Zeeb
1b193af610 Second round of putting global variables, which were virtualized
but formerly missed, under VIMAGE_GLOBAL.

Put the extern declarations of the virtualized globals
under VIMAGE_GLOBAL, as the globals themselves already are.
This will help when the time comes to remove the globals
entirely.

Sponsored by:	The FreeBSD Foundation
2008-12-13 19:13:03 +00:00
Kip Macy
7b5ba4e7f0 RTF_RNH_LOCKED needs to be passed in the flags arg not report,
apologies to thompsa
2008-12-12 02:07:45 +00:00
Andrew Thompson
84b5117529 Pass RTF_RNH_LOCKED to rtalloc1 since the node head is locked; this avoids a
recursive lock panic on inet6 detach.

Reviewed by:	kmacy
2008-12-12 01:46:59 +00:00
Bjoern A. Zeeb
86413abf5f Put global variables, which were virtualized but formerly
missed, under VIMAGE_GLOBAL.

Start putting the extern declarations of the virtualized globals
under VIMAGE_GLOBAL, as the globals themselves already are.
This will help when the time comes to remove the globals
entirely.

While there garbage collect a few dead externs from ip6_var.h.

Sponsored by:	The FreeBSD Foundation
2008-12-11 16:26:38 +00:00
Marko Zec
385195c062 Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instantiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out.  Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs.  This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps.  PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw.  Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-12-10 23:12:39 +00:00
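The V_ macro resolution shown in the commit above can be expanded into a small compilable sketch. The container struct here is reduced to a single field and the variable is given type int purely for illustration; both are assumptions, as are the sk_* accessor names. Only V_rt_tables, rt_tables, vnet_net_0 and VIMAGE_GLOBALS come from the commit message.

```c
#include <assert.h>

/* Define VIMAGE_GLOBALS before this point to select the original globals. */

#ifdef VIMAGE_GLOBALS
/* Original layout: a plain global (type simplified to int for this sketch). */
static int rt_tables;
#define V_rt_tables rt_tables
#else
/* Containerized layout: the variable becomes a field of a container struct. */
struct sk_vnet_net {
	int _rt_tables;
};
static struct sk_vnet_net vnet_net_0;
#define V_rt_tables vnet_net_0._rt_tables
#endif

/* Callers only ever use the V_ name, so they compile unchanged either way. */
static void
sk_set_rt_tables(int n)
{
	assert(n >= 0);
	V_rt_tables = n;
}

static int
sk_get_rt_tables(void)
{
	return (V_rt_tables);
}
```

The point of the indirection is that code written against V_rt_tables does not need to know which of the two storage layouts was selected at compile time.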
Warner Losh
609ff41f16 Add missing include to sys/lock.h before sys/rwlock.h 2008-12-08 00:28:21 +00:00
Kip Macy
3120b9d428 - convert radix node head lock from mutex to rwlock
- make radix node head lock not recursive
 - fix LOR in rtexpunge
 - fix LOR in rtredirect

Reviewed by:	sam
2008-12-07 21:15:43 +00:00
Randall Stewart
830d754d52 Code from the hack-session known as the IETF (and a
bit of debugging afterwards):
- Fix protection code for notification generation.
- Decouple associd from vtag
- Allow vtags to have less stringent requirements in non-uniqueness.
   o don't pre-hash them when you issue one in a cookie.
   o Allow duplicates and use addresses and ports to
     discriminate amongst the duplicates during lookup.
- Add support for the NAT draft draft-ietf-behave-sctpnat-00, this
  is still experimental and needs more extensive testing with the
  Jason Butt ipfw changes.
- Support for the SENDER_DRY event to get DTLS in OpenSSL working
  with a set of patches from Michael Tuexen (hopefully heading to OpenSSL soon).
- Update the support of SCTP-AUTH by Peter Lei.
- Use macros for refcounting.
- Fix MTU for UDP encapsulation.
- Fix reporting back of unsent data.
- Update assoc send counter handling to be consistent with endpoint sent counter.
- Fix a bug in PR-SCTP.
- Fix so we only send another FWD-TSN when a SACK arrives IF and only
  if the adv-peer-ack point progressed. However we still make sure
  a timer is running if we do have an adv_peer_ack point.
- Fix PR-SCTP bug where chunks were retransmitted if they are sent
  unreliable but not abandoned yet.

With the help of:	Michael Tuexen and Peter Lei :-)
MFC after:	 4 weeks
2008-12-06 13:19:54 +00:00
Bjoern A. Zeeb
4b79449e2f Rather than using hidden includes (with circular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
Bjoern A. Zeeb
413628a7e3 MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addition to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking, etc.

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around over the last years.

Bump __FreeBSD_version for the aforementioned and in-kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
  and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
  help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
  suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
  on cluster machines as well as all the testers and people
  who provided feedback over the last months on freebsd-jail and
  other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by:	(see above)
MFC after:	3 months (this is just so that I get the mail)
X-MFC Before:   7.2-RELEASE if possible
2008-11-29 14:32:14 +00:00
Marko Zec
f02493cbbd Unhide declarations of network stack virtualization structs from
underneath #ifdef VIMAGE blocks.

This change introduces some churn in #include ordering and nesting
throughout the network stack and drivers but is not expected to cause
any additional issues.

In the next step this will allow us to instantiate the virtualization
container structures and switch from using global variables to their
"containerized" counterparts.

Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-28 23:30:51 +00:00
Bjoern A. Zeeb
6aee2fc550 Merge in6_pcbfree() into in_pcbfree() which after the previous
IPsec change in r185366 only differed in two additional IPv6 lines.
Rather than splattering conditional code everywhere add the v6
check centrally at this single place.

Reviewed by:	rwatson (as part of a larger changeset)
MFC after:	6 weeks (*)
(*) possibly need to leave a stub wrapper in 7 to keep the symbol.
2008-11-27 12:04:35 +00:00
Bjoern A. Zeeb
6974bd9e75 Unify ipsec[46]_delete_pcbpolicy in ipsec_delete_pcbpolicy.
Ignoring different names because of macros (in6pcb, in6p_sp) and
inp vs. in6p variable name both functions were entirely identical.

Reviewed by:	rwatson (as part of a larger changeset)
MFC after:	6 weeks (*)
(*) possibly need to leave stub wrappers in 7 to keep the symbols.
2008-11-27 10:43:08 +00:00
Marko Zec
97021c2464 Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by:  bz, julian
Approved by:  julian (mentor)
Obtained from:        //depot/projects/vimage-commit2/...
X-MFC after:  never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
2008-11-26 22:32:07 +00:00
Bjoern A. Zeeb
0206cdb846 Remove in6_pcbdetach() as it is exactly the same function
as in_pcbdetach() and we don't need the code twice.

Reviewed by:	rwatson
MFC after:	6 weeks (*)
(*) possibly need to leave a stub wrapper in 7 to keep the symbol.
2008-11-26 20:52:26 +00:00
Bjoern A. Zeeb
a7df09e8c9 Unify the v4 and v6 versions of pcbdetach and pcbfree as far
as possible so that they are easily diffable.

No functional changes.

Reviewed by:	rwatson
MFC after:	6 weeks
2008-11-26 12:54:31 +00:00
Bjoern A. Zeeb
b0fab0344f Plug a credential leak in case the inpcb is freed by
in6_pcbfree() instead of in_pcbfree(); missed in r183606.

Reviewed by:	rwatson
MFC after:	3 days (instantly for 7.1-RC?)
2008-11-26 12:24:18 +00:00
Marko Zec
44e33a0758 Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instantiation,
assign initial values to them in initializer functions.  As a rule,
initialization at instantiation for such variables should never be
introduced again from now on.  Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentially, this change should have zero functional impact.  In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methodology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to initialize multiple instances of such container structures.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-19 09:39:34 +00:00
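The reasoning behind moving initial values out of the instantiation and into initializer functions can be illustrated with a minimal sketch. All names and default values below are hypothetical and chosen only for the example; the sketch shows why an init function scales to several container instances where a static initializer covers only the one global.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container holding variables that used to be plain globals. */
struct sk_vnet_inet {
	int sk_defttl;
	int sk_forwarding;
};

/*
 * Old style (avoided): "static int sk_defttl = 64;" sets the value once,
 * at instantiation, and cannot be repeated for a second instance.
 *
 * New style: an initializer function that can run once per instance.
 * The default values are illustrative assumptions.
 */
static void
sk_vnet_inet_init(struct sk_vnet_inet *v)
{
	assert(v != NULL);
	v->sk_defttl = 64;
	v->sk_forwarding = 0;
}
```

Calling sk_vnet_inet_init() on two separate structures gives each its own copy of the defaults, so a later change to one instance leaves the other untouched.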