206 Commits

Author SHA1 Message Date
tuexen
493873a6ef MFC r264241:
Call sctp_addr_change() from rt_addrmsg() instead of rt_newaddrmsg_fib(),
since rt_addrmsg() gets also called from other functions.
2014-06-22 16:36:14 +00:00
asomers
322a1ee4a0 MFC r264887
Fix host and network routes for new interfaces when net.add_addr_allfibs=0

sys/net/route.c
        In rtinit1, use the interface fib instead of the process fib.  The
        latter wasn't very useful because ifconfig(8) is usually invoked
        with the default process fib.  Changing ifconfig(8) to use setfib(2)
        would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
        Clear the expected ATF failure

sys/net/if.c
        Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
        Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
        in_scrubprefix.  Pass it the interface fib.
2014-06-06 21:45:14 +00:00
asomers
a8aa481895 MFC changes relating to running multiple interfaces on different fibs but
with addresses on the same subnet.

MFC r266860

Fix unintended KBI change from r264905.  Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr()  The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
        Add legacy-compatible functions as described above.  Ensure legacy
        behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
        Call with _fib() functions if we must use a specific fib, or the
        legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
        Improve the udp_dontroute test.  The bug that this test exercises is
        that ifa_ifwithnet() will return the wrong address, if multiple
        interfaces have addresses on the same subnet but with different
        fibs.  The previous version of the test only considered one possible
        failure mode: that ifa_ifwithnet_fib() might fail to find any
        suitable address at all.  The new version also checks whether
        ifa_ifwithnet_fib() finds the correct address by checking where the
        ARP request goes.

MFC r264917

Style fixes, mostly trailing whitespace elimination.  No functional change.

MFC r264905

Fix subnet and default routes on different FIBs on the same subnet.

These two bugs are closely related.  The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
        Add a fib argument to ifa_ifwithnet and ifa_ifwithdstadddr.  Those
        functions will only return an address whose interface fib equals the
        argument.

sys/net/route.c
        Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
        arguments.

sys/netinet/in.c
        Update in_addprefix to consider the interface fib when adding
        prefixes.  This will prevent it from not adding a subnet route when
        one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
        Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
        In some cases it there wasn't a clear specific fib number to use.
        In others, I was unable to test those functions so I chose
        RT_DEFAULT_FIB to minimize divergence from current behavior.  I will
        fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
        Revert r263738.  The udp_dontroute test was right all along.
        However, bugs kern/187550 and kern/187553 cancelled each other out
        when it came to this test.  Because of kern/187553, ifa_ifwithnet
        searched the default fib instead of the requested one, but because
        of kern/187550, there was an applicable subnet route on the default
        fib.  The new test added in r263738 doesn't work right, however.  I
        can verify with dtrace that ifa_ifwithnet returned the wrong address
        before I applied this commit, but route(8) miraculously found the
        correct interface to use anyway.  I don't know how.

        Clear expected failure messages for kern/187550 and kern/187552.

MFC r263738

tests/sys/netinet/Makefile
tests/sys/netinet/fibs.sh
        Replace fibs:udp_dontroute with fibs:src_addr_selection_by_subnet.
        The original test was poorly written; it was actually testing
        kern/167947 instead of the desired kern/187553.  The root cause of the
        bug is that ifa_ifwithnet did not have a fib argument.  The new test
        more directly targets that behavior.

tests/sys/netinet/udp_dontroute.c
        Delete the auxilliary binary used by the old test
2014-06-06 20:35:40 +00:00
melifaro
aaa6b80bb3 Merge 260488, r260508.
r260488:
  Split rt_newaddrmsg_fib() into two different functions.
  Adding/deleting interface addresses involves access to 3 different subsystems,
  int different parts of code. Each call can fail, so reporting successful
  operation by rtsock in the middle of the process error-prone.

  Further split routing notification API and actual rtsock calls via creating
  public-available rt_addrmsg() / rt_routemsg() functions with "private"
  rtsock_* backend.

r260508:
  Simplify inet alias handling code: if we're adding/removing alias which
  has the same prefix as some other alias on the same interface, use
  newly-added rt_addrmsg() instead of hand-rolled in_addralias_rtmsg().

  This eliminates the following rtsock messages:

  Pinned RTM_ADD for prefix (for alias addition).
  Pinned RTM_DELETE for prefix (for alias withdrawal).

  Example (got 10.0.0.1/24 on vlan4, playing with 10.0.0.2/24):

  before commit, addition:

    got message of size 116 on Fri Jan 10 14:13:15 2014
    RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

    got message of size 192 on Fri Jan 10 14:13:15 2014
    RTM_ADD: Add Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
    locks:  inits:
    sockaddrs: <DST,GATEWAY,NETMASK>
     10.0.0.0 10.0.0.2 (255) ffff ffff ff

  after commit, addition:

    got message of size 116 on Fri Jan 10 13:56:26 2014
    RTM_NEWADDR: address being added to iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 14.0.0.2 14.0.0.255

  before commit, wihdrawal:

    got message of size 192 on Fri Jan 10 13:58:59 2014
    RTM_DELETE: Delete Route: len 192, pid: 0, seq 0, errno 0, flags:<UP,PINNED>
    locks:  inits:
    sockaddrs: <DST,GATEWAY,NETMASK>
     10.0.0.0 10.0.0.2 (255) ffff ffff ff

    got message of size 116 on Fri Jan 10 13:58:59 2014
    RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

  adter commit, withdrawal:

    got message of size 116 on Fri Jan 10 14:14:11 2014
    RTM_DELADDR: address being removed from iface: len 116, metric 0, flags:
    sockaddrs: <NETMASK,IFP,IFA,BRD>
     255.255.255.0 vlan4:8.0.27.c5.29.d4 10.0.0.2 10.0.0.255

  Sending both RTM_ADD/RTM_DELETE messages to rtsock is completely wrong
  (and requires some hacks to keep prefix in route table on RTM_DELETE).

  I've tested this change with quagga (no change) and bird (*).

  bird alias handling is already broken in *BSD sysdep code, so nothing
  changes here, too.

  I'm going to MFC this change if there will be no complains about behavior
  change.

  While here, fix some style(9) bugs introduced by r260488
  (pointed by glebius and bde).
2014-05-08 21:03:31 +00:00
melifaro
5ca6003c5c Merge r260379, r260460.
r260379:
  Partially fix IPv4 interface routes deletion in RADIX_MPATH.

  Noticed by:   Nikolay Denev <ndenev at gmail.com>

r260460:
  Constanly use RT_ALL_FIBS everywhere instead of -1.
2014-05-08 20:41:39 +00:00
melifaro
d42ec49fe7 Merge r259528, r259528, r260295.
r259528:
  Simplify contiguous mask checking.

  Suggested by: glebius

r260228:
  Remove useless register variable modifiers.
  Do some more style(9).

r260295:
  Change semantics for rnh_lookup() function: now
  it performs exact match search, regardless of netmask existance.
  This simplifies most of rnh_lookup() consumers.

  Fix panic triggered by deleting non-existent host route.

  PR:           kern/185092
  Submitted by: Nikolay Denev <ndenev at gmail.com>
2014-05-08 20:27:06 +00:00
glebius
a25c39725c Merge r263203: garbage collect long time obsoleted (or never used) stuff
from routing API.
2014-04-09 11:15:50 +00:00
glebius
03fdc2934e Merge r262763, r262767, r262771, r262806 from head:
- Remove rt_metrics_lite and simply put its members into rtentry.
  - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
    removes another cache trashing ++ from packet forwarding path.
  - Create zini/fini methods for the rtentry UMA zone. Via initialize
    mutex and counter in them.
  - Fix reporting of rmx_pksent to routing socket.
  - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.
2014-03-21 15:15:30 +00:00
glebius
ed41469327 Merge r261582, r261601, r261610, r261613, r261627, r261640, r261641, r261823,
r261825, r261859, r261875, r261883, r261911, r262027, r262028, r262029,
      r262030, r262162 from head.

  Large flowtable revamp. See commit messages for merged revisions for
  details.

Sponsored by:	Netflix
2014-03-04 15:14:47 +00:00
rodrigc
a19b1d3a58 MFC 258591
In vnet_route_uninit(), free some memory that is allocated in vnet_route_init().

To reproduce the problem:
  (1)  Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS,
       INVARIANTS.
  (2)  Run this command in a loop:
       jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo

       see: http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html
            http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021291.html

This doesn't eliminate all the "Freed UMA keg was not empty" warning messages
on the console, but it helps.

Approved by: re (gjb)
2013-12-04 07:55:49 +00:00
melifaro
967c651cd7 Fix rte leak introduced in r248070.
MFC after:	2 weeks
2013-05-18 07:10:22 +00:00
julian
329247aec2 Finally change the mbuf to have its own fib field instead of stealing
4 flag bits. This was supposed to happen in 8.0, and again in 2012..

MFC after:	never
2013-05-16 16:20:17 +00:00
melifaro
fccbb24392 Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.

Discussed with:	andre, eri
MFC after:	3 weeks
2013-03-08 20:33:50 +00:00
melifaro
2a77ba4103 Write lock is not required for find&compare operation.
MFC after:	2 weeks
2013-03-05 13:38:45 +00:00
bz
a45ce69b30 Hide kernel option ROUTETABLES evaluations in the implementation
rather than the header file.  With this also move RT_MAXFIBS and
RT_NUMFIBS into the implemantion to avoid further usage in other
code. rt_numfibs is all that should be needed.

This allows users to change the number of FIBs from 1..RT_MAXFIBS(16)
dynamically using the tunable without the need to change the kernel
config for the maximum anymore.  This means that thet multi-FIB
feature is now fully available with GENERIC kernels.
The kernel option ROUTETABLES can still be used to set the default
numbers of FIBs in absence of the tunable.

Ok.ed by:	julian, hrs, melifaro
MFC after:	2 weeks
2012-03-18 11:23:40 +00:00
bz
dcdb23291f Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by:	Cisco Systems, Inc.
Reviewed by:	melifaro (basically)
MFC after:	10 days
2012-02-17 02:39:58 +00:00
bz
417a8b2daa Replace random ARIN direct assignment legacy IPs with proper RFC 5735
TEST-NET1 block for use in documentation and example code addresses.

MFC after:	3 days
2012-01-24 15:20:31 +00:00
glebius
c43116e67e Simplify rtrequest(RTM_ADD): ifa can't be NULL after rt_getifa_fib(). 2011-12-15 12:49:10 +00:00
qingli
9ae130094e The host-id/interface-id can have a specific value and is properly
masked out when adding a prefix route through the "route" command.
However, when deleting the route, simply changing the command keyword
from "add" to "delete" does not work. The failoure is observed in
both IPv4 and IPv6 route insertion. The patch makes the route command
behavior consistent between the "add" and the "delete" operation.

MFC after:	1 week
2011-10-25 00:34:39 +00:00
bz
a13ffdabcc Pass the fibnum where we need filtering of the message on the
rtsock allowing routing daemons to filter routing updates on an
rtsock per FIB.

Adjust raw_input() and split it into wrapper and a new function
taking an optional callback argument even though we only have one
consumer [1] to keep the hackish flags local to rtsock.c.

PR:		kern/134931
Submitted by:	multiple (see PR)
Suggested by:	rwatson [1]
Reviewed by:	rwatson
MFC after:	3 days
2011-09-28 13:48:36 +00:00
kmacy
99851f359e In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
kevlo
c7105822b4 In rtinit1(), before rtrequest1_fib() is called, info.rti_flags is
initialized by flags (function argument) or-ed with ifa->ifa_flags.
If both NIC has a loopback route to itself, so IFA_RTSELF is set on ifa(s).
As IFA_RTSELF is defined by RTF_HOST, rtrequest1_fib() is called with
RTF_HOST flag even if netmask is not NULL. Consequently, netmask is set
to zero in rtrequest1_fib(), and request to add network route is changed
under hands to request to add host route.

Tested by:	Andrew Boyer <aboyer at averesystems.com>
Submitted by:	Svatopluk Kraus <onwahe at gmail dot com>
Approved by:	re (hrs)
2011-08-08 05:25:51 +00:00
bz
a8e6967ab3 Garbage collect never used global, sysctl, externs.
MFC after:	1 week
2011-06-21 07:19:03 +00:00
bz
98020d5ece Leave an extra comment about flowtable and IPv6 support rectifying a
previous comment.

MFC after:	1 week
2011-06-20 12:35:12 +00:00
dchagin
a96a94aaf2 ouch, newrt is used on the return path, my fault.
Partialy revert the previous change.

MFC after:	1 Week.
2011-03-19 21:10:57 +00:00
dchagin
b3fb445553 A bit rearranged rtalloc1_fib() code.
Initialize a variable when it is really needed.
To avoid code duplication move the miss label to line up and jump on it.

MFC after:	1 Week
2011-03-19 19:50:36 +00:00
dchagin
9dffdca01d Remove a now unused variable.
MFC after:	1 Week
2011-03-19 16:52:06 +00:00
brucec
6d9b42b486 Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
mdf
5e41205b16 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the net* piece.
2011-01-12 19:53:50 +00:00
dim
fb307d7d1d After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
dim
fda4020a88 Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
qingli
f6ab4a6810 This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after:	3 days
2010-05-25 20:42:35 +00:00
bz
0a90ef1728 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
qingli
93013817b0 One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after:	5 days
2010-03-09 01:11:45 +00:00
qingli
ed965a92bc The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after:	5 days
2009-12-30 21:35:34 +00:00
luigi
003a092f8a Move the scan for max_keylen into route.c::route_init(),
and make max_keylen an argument for rn_init().
This removes an unnecessary dependency on domain.h from radix.c

MFC after:	7 days
2009-12-14 20:12:51 +00:00
tuexen
7b17948ff8 Fix a LOR showing up with sctp_bsd_addr(): Do not hold a rt lock
when calling rt_newaddrmsg().

Reviewed by: qingli
Approved by: rrs (mentor)
MFC after: 1 month
2009-11-17 12:57:10 +00:00
bz
1e3cae3b31 Put #ifdef INET around parts of the FLOWTABLE code, to unbreak
nooptions INET kernel builds.

MFC after:	3 days
X-MFC:		with r197687
2009-10-03 10:56:03 +00:00
qingli
42eac0e4cd The flow-table associates TCP/UDP flows and IP destinations with
specific routes. When the routing table changes, for example,
when a new route with a more specific prefix is inserted into the
routing table, the flow-table is not updated to reflect that change.
As such existing connections cannot take advantage of the new path.
In some cases the path is broken. This patch will update the affected
flow-table entries when a more specific route is added. The route
entry is properly marked when a route is deleted from the table.
In this case, when the flow-table performs a search, the stale
entry is updated automatically. Therefore this patch is not
necessary for route deletion.

Submitted by:	simon, phk
Reviewed by:	bz, kmacy
MFC after:	3 days
2009-10-01 20:32:29 +00:00
rwatson
fb9ffed650 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
rwatson
b3be1c6e3b Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
  occur each time a network stack is instantiated and destroyed.  In the
  !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
  For the VIMAGE case, we instead use SYSINIT's to track their order and
  properties on registration, using them for each vnet when created/
  destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
  previously, as well as its dependency scheme: we now just use the
  SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
  they want init functions to be called for each virtual network stack
  rather than just once at boot, compiling down to DOMAIN_SET() in the
  non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
  of modinfo or DOMAIN_SET() for init/uninit events.  In some cases,
  convert modular components from using modevent to using sysinit (where
  appropriate).  In some cases, do minor rejuggling of SYSINIT ordering
  to make room for or better manage events.

Portions submitted by:	jhb (VNET_SYSINIT), bz (cleanup)
Discussed with:		jhb, bz, julian, zec
Reviewed by:		bz
Approved by:		re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00
rwatson
88f8de4d40 Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used.  Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with:	bz, julian
Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-16 21:13:04 +00:00
rwatson
57ca4583e7 Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
kmacy
ccb66bdf1d Re-factoring for adding weighted routes introduced a
fairly irritating bug where the system will panic
when RADIX_MPATH is enabled. This change fixes this.

Approved by:	re@
2009-07-11 21:56:23 +00:00
rwatson
c9ef486fe1 Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references.  The following routines now return references:

  ifaddr_byindex
  ifa_ifwithaddr
  ifa_ifwithbroadaddr
  ifa_ifwithdstaddr
  ifa_ifwithnet
  ifaof_ifpforaddr
  ifa_ifwithroute
  ifa_ifwithroute_fib
  rt_getifa
  rt_getifa_fib
  IFP_TO_IA
  ip_rtaddr
  in6_ifawithifp
  in6ifa_ifpforlinklocal
  in6ifa_ifpwithaddr
  in6_ifadd
  carp_iamatch6
  ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

  IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc).  In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit.  Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by:	bz
Obtained from:	Apple, Inc. (portions)
MFC after:	6 weeks (portions)
2009-06-23 20:19:09 +00:00
bz
309ab541f2 Move virtualization of routing related variables into their own
Vimage module, which had been there already but now is stateful.

All variables are now file local; so this further limits the global
spreading of routing related things throughout the kernel.

Add a missing function local variable in case of MPATHing.

Reviewed by:	zec
2009-06-22 17:48:16 +00:00
bz
425e742e59 Collect all VIMAGE_GLOBALS variables in one place.
No longer export rt_tables as all lookups go through
rt_tables_get_rnh().

We cannot make rt_tables (and rtstat, rttrash[1]) static as
netstat -r (-rs[1]) would stop working on a stripped
VIMAGE_GLOBALS kernel.

Reviewed by:		zec
Presumably broken by:	phk 13.5y ago in r12820 [1]
2009-06-22 15:07:12 +00:00
rwatson
3c02410d55 Add a new function, ifa_ifwithaddr_check(), which rather than returning
a pointer to an ifaddr matching the passed socket address, returns a
boolean indicating whether one was present.  In the (near) future,
ifa_ifwithaddr() will return a referenced ifaddr rather than a raw
ifaddr pointer, and the new wrapper will allow callers that care only
about the boolean condition to avoid having to free that reference.

MFC after:	3 weeks
2009-06-22 10:59:34 +00:00
rwatson
1f7e54e8c5 Clean up common ifaddr management:
- Unify reference count and lock initialization in a single function,
  ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Instead of using a u_int protected by a mutex to refcount(9) for
  reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after:	3 weeks
2009-06-21 19:30:33 +00:00
zec
8b1f38241a Introduce an infrastructure for dismantling vnet instances.
Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state.  The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers.  Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels.  Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by:	bz, julian
Approved by:	rwatson, kib (re), julian (mentor)
2009-06-08 17:15:40 +00:00