Commit Graph

955 Commits

Author SHA1 Message Date
bms
f88254b35a Add missing #include <sys/ktr.h>.
Submitted by:	Hideki Yamamoto
MFC after:	1 week
2009-12-15 10:40:40 +00:00
bz
932cbdbe4d Throughout the network stack we have a few places of
if (jailed(cred))
left.  If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that,  as it is used
outside the network stack as well.

Discussed with:	julian, zec, jamie, rwatson (back in Sept)
MFC after:	5 days
2009-12-13 13:57:32 +00:00
bms
cb3a6b3546 Adapt r197136 to IPv6 stack:
Comment some flawed assumptions in in6p_join_group() about
  mixing SSM full-state and delta-based APIs.

MFC after:	1 day
2009-11-19 13:39:07 +00:00
bms
b006145221 Adapt r197135 to IPv6 stack:
Don't allow joins w/o source on an existing group.
  This is almost always pilot error.

  We don't need to check for group filter UNDEFINED state at t1,
  because we only ever allocate filters with their groups, so we
  unconditionally reject such calls with EINVAL.
  Trying to change the active filter mode w/o going through IPV6_MSFILTER
  is also disallowed.

MFC after:	1 day
2009-11-19 13:33:23 +00:00
bms
d8ea9b2a5b Adapt r197132 to IPv6 stack:
Tighten input checking in in6p_join_group():
   * Don't try to use the source address, when its family is unspecified.
   * If we get a join without a source, on an existing inclusive
     mode group, this is an error, as it would change the filter mode.

  Fix a problem with the handling of in6_mfilter for new memberships:
   * Do not rely on im6f being NULL; it is explicitly initialized to a
     non-NULL pointer when constructing a membership.
   * Explicitly initialize *im6f to EX mode when the source address
     is unspecified.

  This fixes a problem with in_mfilter slot recycling in the join path.

MFC after:	1 day
2009-11-19 13:30:06 +00:00
bms
375b60ebd5 Adapt r197314 to IPv6 stack:
Return ENOBUFS consistently if user attempts to exceed
  in_mcast_maxsocksrc resource limit.

MFC after:	1 day
2009-11-19 12:21:20 +00:00
bms
63de6a0a63 Adapt r197130 to IPv6 stack:
Fix an obvious logic error in the IPv4 multicast leave processing,
  where the filter mode vector was not updated correctly after the leave.

MFC after:	1 day
2009-11-19 12:18:30 +00:00
bms
028af3a421 Adapt the fix for IGMPv2 in r199287 for the IPv6 stack.
Only multicast routing is affected by the issue.

MFC after:	1 day
2009-11-19 11:55:19 +00:00
ume
ff25cdd646 - We are not guaranteed that we're not dropping a reference that
we did not add.  Call LLE_REMREF() only when callout_stop()
  actually canceled a pending callout.
- callout_reset() may cancel a pending callout.  When
  callout_reset() canceled a pending callout, call LLE_REMREF()
  to drop a reference for the canceled callout.

MFC after:	1 week
2009-11-12 14:48:36 +00:00
ume
d551d6d272 CURVNET_RESTORE() was not called in certain cases.
MFC after:	3 days
2009-11-11 08:28:18 +00:00
ume
8bc39a4def Make nd6_llinfo_timer() does its job, again. ln->la_expire was
greater than time_second, in most cases.

MFC after:	3 days
2009-11-06 17:34:26 +00:00
ume
e8e9c60f27 Don't call LLE_FREE() after nd6_free().
MFC after:	3 days
2009-11-06 10:07:38 +00:00
qingli
c96d27ad80 Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by:	ru
MFC after:	3 days
2009-10-23 18:27:34 +00:00
bz
d7403abaf5 Explicitly compare to a return code.
Discussed with:	philip (after we both misread the logic there the 1st time)
MFC after:	6 weeks
2009-10-14 12:01:11 +00:00
hrs
62171fd4d3 - Do not assign a link-local address when ND6_IFF_IFDISABLED.
Adding a tentative address is useless.

- Comment out a confused warning message when
  in6_ifattach_linklocal() fails.  This can occur when the
  interface does not support ioctl(SIOCAIFADDR) (interfaces
  associated with 802.11 wireless network device drivers, for
  example).
2009-10-12 18:54:02 +00:00
julian
79c1f884ef Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after:	2 months
2009-10-11 05:59:43 +00:00
hrs
707daf685f Enable adding a link-local address even if ND6_IFF_IFDISABLED.
Note that when the interface has ND6_IFF_IFDISABLED, a newly-added
address is always marked as IN6_IFF_TENTATIVE so that the interface
can perform DAD after the ND6_IFF_IFDISABLED is cleared.
2009-10-02 07:00:20 +00:00
rrs
1418771847 Support for VNET in SCTP (hopefully) 2009-09-17 15:11:12 +00:00
qingli
3a82e44273 Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by:	bz
MFC after:	immediately
2009-09-15 19:18:34 +00:00
hrs
2eb62239d7 Improve flexibility of receiving Router Advertisement and
automatic link-local address configuration:

- Convert a sysctl net.inet6.ip6.accept_rtadv to one for the
  default value of a per-IF flag ND6_IFF_ACCEPT_RTADV, not a
  global knob.  The default value of the sysctl is 0.

- Add a new per-IF flag ND6_IFF_AUTO_LINKLOCAL and convert a
  sysctl net.inet6.ip6.auto_linklocal to one for its default
  value.  The default value of the sysctl is 1.

- Make ND6_IFF_IFDISABLED more robust.  It can be used to disable
  IPv6 functionality of an interface now.

- Receiving RA is allowed if ip6_forwarding==0 *and*
  ND6_IFF_ACCEPT_RTADV is set on that interface.  The former
  condition will be revisited later to support a "host + router" box
  like IPv6 CPE router.  The current behavior is compatible with
  the older releases of FreeBSD.

- The ifconfig(8) now supports these ND6 flags as well as "nud",
  "prefer_source", and "disabled" in ndp(8).  The ndp(8) now
  supports "auto_linklocal".

Discussed with:	bz and jinmei
Reviewed by:	bz
MFC after:	3 days
2009-09-12 22:08:20 +00:00
qingli
5b9cf14b54 The addresses that are assigned to the loopback interface
should be part of the kernel routing table.

Reviewed by:	bz
MFC after:	immediately
2009-09-05 20:24:37 +00:00
qingli
831e6c957c This patch fixes an address scope violation. Considering the
scenario where an anycast address is assigned on one interface,
and a global address with the same scope is assigned on another
interface. In other words, the interface owns the anycast
address has only the link-local address as one other address.
Without this patch, "ping6" the anycast address from another
station will observe the source address of the returned ICMP6
echo reply has the link-local address, not the global address
that exists on the other interface in the same node.

Reviewed by:	bz
MFC after:	immediately
2009-09-05 16:50:55 +00:00
qingli
0cca60c70d This patch fixes the following issues:
- Interface link-local address is not reachable within the
  node that owns the interface, this is due to the mismatch
  in address scope as the result of the installed interface
  address loopback route. Therefore for each interface
  address loopback route, the rt_gateway field (of AF_LINK
  type) will be used to track which interface a given
  address belongs to. This will aid the address source to
  use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
  the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
  only for validation reason. Doing so will eliminate as much
  of the special case (loopback addresses) handling code
  as possible, however, these empty nd6 entries should not
  be returned to the userland applications such as the
  "ndp" command.
Since both of the above issues contain common files, these
files are committed together.

Reviewed by:	bz
MFC after:	immediately
2009-09-05 16:43:16 +00:00
qingli
e888e87097 Prefix on-link verification is being performed on statically
configured prefixes. Since these statically configured prefixes
do not have any associated advertising routers, these prefixes
are treated as unreachable and those prefix routes are deleted
from the routing table. Therefore bypass prefixes that are not
learned from router advertisements during prefix on-link check.

Reviewed by:	hrs
2009-08-30 02:07:23 +00:00
qingli
487e2c0d00 When multiple interfaces exist in the system, with each interface having
an IPv6 address assigned to it, and if an incoming packet received on
one interface has a packet destination address that belongs to another
interface, the routing table is consulted to determine how to reach this
packet destination. Since the packet destination is an interface address,
the route table will return a host route with the loopback interface as
rt_ifp. The input code must recognize this fact, instead of using the
loopback interface, the input code performs a search to find the right
interface that owns the given IPv6 address.

Reviewed by:	bz, gnn, kmacy
MFC after:	immediately
2009-08-26 21:32:50 +00:00
rwatson
544dfa0789 Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables.  This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.

Reviewed by:	bz, kmacy
MFC after:	3 days
2009-08-25 09:52:38 +00:00
rwatson
ef8d755d4d Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock.  Either can be held to stablize the lists and indexes, but both
are required to write.  This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions.  As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by:	bz, julian
MFC after:	3 days
2009-08-23 20:40:19 +00:00
qingli
d7ddc7ccbe A piece of code was added to install a host route when an IPv6 interface
address is configured with a /128 prefix. This is no longer necessary due
to r192011. In fact that code conflicts with r192011. This patch removes
the host route installation when detecting the /128 prefix, and instead
let the code added by r192011 to install the loopback route for that IPv6
interface address.

Reviewed by:	bz
Approved by:	re
2009-08-12 19:15:26 +00:00
rwatson
5c6699ad3d Many network stack subsystems use a single global data structure to hold
all pertinent statatistics for the subsystem.  These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.

Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored.  This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.

The following modules are affected by this change:

      if_bridge
      if_cxgb
      if_gif
      ip_mroute
      ipdivert
      pf

In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack.  However, that change is too agressive for
this point in the release cycle.

Reviewed by:	bz
Approved by:	re (kib)
2009-08-02 19:43:32 +00:00
rwatson
fb9ffed650 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
qingli
8c1899d934 This patch does the following:
- Allow loopback route to be installed for address assigned to
      interface of IFF_POINTOPOINT type.
    - Install loopback route for an IPv4 interface addreess when the
      "useloopback" sysctl variable is enabled. Similarly, install
      loopback route for an IPv6 interface address when the sysctl variable
      "nd6_useloopback" is enabled. Deleting loopback routes for interface
      addresses is unconditional in case these sysctl variables were
      disabled after an interface address has been assigned.

Reviewed by:	bz
Approved by:	re
2009-07-27 17:08:06 +00:00
rwatson
b3be1c6e3b Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
  occur each time a network stack is instantiated and destroyed.  In the
  !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
  For the VIMAGE case, we instead use SYSINIT's to track their order and
  properties on registration, using them for each vnet when created/
  destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
  previously, as well as its dependency scheme: we now just use the
  SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
  they want init functions to be called for each virtual network stack
  rather than just once at boot, compiling down to DOMAIN_SET() in the
  non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
  of modinfo or DOMAIN_SET() for init/uninit events.  In some cases,
  convert modular components from using modevent to using sysinit (where
  appropriate).  In some cases, do minor rejuggling of SYSINIT ordering
  to make room for or better manage events.

Portions submitted by:	jhb (VNET_SYSINIT), bz (cleanup)
Discussed with:		jhb, bz, julian, zec
Reviewed by:		bz
Approved by:		re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00
bz
1f4b104d4d sysctl_msec_to_ticks is used with both virtualized and
non-vrtiualized sysctls so we cannot used one common function.

Add a macro to convert the arg1 in the virtualized case to
vnet.h to not expose the maths to all over the code.

Add a wrapper for the single virtualized call, properly handling
arg1 and call the default implementation from there.

Convert the two over places to use the new macro.

Reviewed by:	rwatson
Approved by:	re (kib)
2009-07-21 21:58:55 +00:00
rwatson
d32e9e9901 Garbage collect vnet module registrations that have neither constructors
nor destructors, as there's no actual work to do.

In most cases, the constructors weren't needed because of the existing
protocol initialization functions run by net_init_domain() as part of
VNET_MOD_NET, or they were eliminated when support for static
initialization of virtualized globals was added.

Garbage collect dependency references to modules without constructors or
destructors, notably VNET_MOD_INET and VNET_MOD_INET6.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-07-20 13:55:33 +00:00
rwatson
6955067932 Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use).  Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions.  Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by:	bz
Approved by:	re (kib)
2009-07-19 14:20:53 +00:00
bms
630d769cdb Fix a problem, whereby misbehaving IPv6 applications, which don't include
a valid zone ID or interface identifier in a v6 multicast leave, would
trigger a fairly paranoid KASSERT().

Observed with Boost++ regression tests on ref8.freebsd.org.

Approved by:	re (kib)
2009-07-18 17:38:18 +00:00
rwatson
88f8de4d40 Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used.  Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with:	bz, julian
Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-16 21:13:04 +00:00
rwatson
57ca4583e7 Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
qingli
1d57d752af This patch adds a host route to an interface address (that is assigned
to a non loopback/ppp link type) through the loopback interface. Prior
to the new L2/L3 rewrite, this host route was explicitly created when
processing the IPv6 address assignment. This loopback host route is
deleted when that IPv6 address is removed from the interface.

Reviewed by:	bz, gnn
Approved by:	re
2009-07-12 19:20:55 +00:00
rwatson
0734bf13fc Fix "options VIMAGE_GLOBALS" build following introduction of
in6_ifaddrhead.

Approved by:	re (kib)
2009-06-29 15:23:50 +00:00
rwatson
3b6551a921 In in6_update_ifa(), jump to 'cleanup' rather than returning directly
in one additional case, avoiding an ifaddr reference leak.

Defer releasing the in6_ifaddr's in6_ifaddrhead reference until the
end of in6_unlink_ifa(), as callers are inconsistent regarding whether
or not they hold a reference across the call.  This avoids using the
ifaddr after it may have been freed.

Reported by:	tegge
Reviewed by:	tegge
Approved by:	re (blanket)
MFC after:	6 weeks
2009-06-27 11:05:53 +00:00
rwatson
bd6eb7be79 Add address list locking for in6_ifaddrhead/ia_link: as with locking
for in_ifaddrhead, we stick with an rwlock for the time being, which
we will revisit in the future with a possible move to rmlocks.

Some pieces of code require significant further reworking to be
safe from all classes of writer-writer races.

Reviewed by:	bz
MFC after:	6 weeks
2009-06-25 16:35:28 +00:00
rwatson
4cf6a458ca Clean up reference management in in6_update_ifa and in6_unlink_ifa, and
in particular, add a reference for in6_ifaddrhead since we do remove a
reference for it when an IPv6 address is removed.  This fixes ifconfig
delete of an IPv6 alias.

Reported by:	tegge
MFC after:	6 weeks
2009-06-25 08:37:38 +00:00
rwatson
9c4380a8ee Convert netinet6 to using queue(9) rather than hand-crafted linked lists
for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead).  Adopt
the code styles and conventions present in netinet where possible.

Reviewed by:	gnn, bz
MFC after:	6 weeks (possibly not MFCable?)
2009-06-24 21:00:25 +00:00
bz
a8839212d2 Make callers to in6_selectsrc() and in6_pcbladdr() pass in memory
to save the selected source address rather than returning an
unreferenced copy to a pointer that might long be gone by the
time we use the pointer for anything meaningful.

Asked for by:	rwatson
Reviewed by:	rwatson
2009-06-23 22:08:55 +00:00
rwatson
c9ef486fe1 Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references.  The following routines now return references:

  ifaddr_byindex
  ifa_ifwithaddr
  ifa_ifwithbroadaddr
  ifa_ifwithdstaddr
  ifa_ifwithnet
  ifaof_ifpforaddr
  ifa_ifwithroute
  ifa_ifwithroute_fib
  rt_getifa
  rt_getifa_fib
  IFP_TO_IA
  ip_rtaddr
  in6_ifawithifp
  in6ifa_ifpforlinklocal
  in6ifa_ifpwithaddr
  in6_ifadd
  carp_iamatch6
  ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

  IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc).  In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit.  Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by:	bz
Obtained from:	Apple, Inc. (portions)
MFC after:	6 weeks (portions)
2009-06-23 20:19:09 +00:00
bz
0808d0b1a6 After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.
2009-06-23 17:03:45 +00:00
bz
2583b2bbb7 In r194702 I meant to remove vnet.h which is no longer needed, not route.h. 2009-06-23 14:54:42 +00:00
bz
1b28c68371 in6_rtqdrain() has been unused. Cleanup.
As this was the only consumer of net/route.h left remove that as well.
2009-06-23 13:22:19 +00:00
rwatson
1f7e54e8c5 Clean up common ifaddr management:
- Unify reference count and lock initialization in a single function,
  ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Instead of using a u_int protected by a mutex to refcount(9) for
  reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after:	3 weeks
2009-06-21 19:30:33 +00:00