566 Commits

Author SHA1 Message Date
Xin LI
41c8c6e876 Add interface description capability as inspired by OpenBSD.
MFC after:	3 months
2009-11-11 21:30:58 +00:00
Qing Li
46e7f9838b A wrong variable is used when setting up the interface
address route, which broke source address selection in
some code paths.

Submitted by:	noted by bz
Reviewed by:	hrs
MFC after:	immediately
2009-09-20 17:22:19 +00:00
Qing Li
9bb7d0f47a Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by:	bz
MFC after:	immediately
2009-09-15 19:18:34 +00:00
Robert Watson
ed2dabfc68 Add IFNET_HOLD reserved pointer value for the ifindex ifnet array,
which allows an index to be reserved for an ifnet without making
the ifnet available for management operations.  Use this in if_alloc()
while the ifnet lock is released between initial index allocation and
completion of ifnet initialization.

Add ifindex_free() to centralize the implementation of releasing an
ifindex value.  Use in if_free() and if_vmove(), as well as when
releasing a held index in if_alloc().

Reviewed by:	bz
MFC after:	3 days
2009-08-26 11:13:10 +00:00
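The index reservation described in ed2dabfc68 can be illustrated with a small userland model. This is a sketch under stated assumptions, not the code in if.c: the table size and every model_* / MODEL_* name are hypothetical, the sentinel only imitates the idea of a reserved pointer value such as IFNET_HOLD, and all locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ifnet; only the index matters here. */
struct model_ifnet {
    int if_index;
};

#define MODEL_TABLE_SIZE 8
/* Reserved, never-dereferenced pointer value marking a held slot. */
#define MODEL_HOLD ((struct model_ifnet *)(uintptr_t)(-1))

/* Slot 0 is left unused so that 0 is never a valid index. */
static struct model_ifnet *model_table[MODEL_TABLE_SIZE];

/* Reserve the lowest free index, or return -1 if the table is full. */
static int
model_index_alloc(void)
{
    for (int i = 1; i < MODEL_TABLE_SIZE; i++) {
        if (model_table[i] == NULL) {
            model_table[i] = MODEL_HOLD;
            return (i);
        }
    }
    return (-1);
}

/* Lookups treat a held slot as empty, hiding the unfinished ifnet. */
static struct model_ifnet *
model_byindex(int idx)
{
    if (idx <= 0 || idx >= MODEL_TABLE_SIZE)
        return (NULL);
    return (model_table[idx] == MODEL_HOLD ? NULL : model_table[idx]);
}

/* Publish a fully initialized ifnet into its reserved slot. */
static void
model_index_publish(int idx, struct model_ifnet *ifp)
{
    assert(model_table[idx] == MODEL_HOLD);
    ifp->if_index = idx;
    model_table[idx] = ifp;
}

/* Single place that releases an index, whether held or published. */
static void
model_index_free(int idx)
{
    model_table[idx] = NULL;
}
```

In this model a second allocation cannot reuse an index that is merely held, while management-style lookups still see the slot as empty until the ifnet is published.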
Robert Watson
61f6986b07 Break out allocation of new ifindex values from if_alloc() and if_vmove(),
and centralize in a single function ifindex_alloc().  Assert the
IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc().  This does not
close all known races in this code.

Reviewed by:	bz
MFC after:	3 days
2009-08-25 20:21:16 +00:00
Robert Watson
8e937462f4 Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.

MFC after:	3 days
2009-08-24 12:52:05 +00:00
Marko Zec
52db6805ea When moving ifnets from one vnet to another, and the ifnet
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.

This change affects only options VIMAGE builds.

Submitted by:	bz
Reviewed by:	bz
Approved by:	re (rwatson), julian (mentor)
2009-08-24 10:14:09 +00:00
Robert Watson
77dfcdc445 Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock.  Either can be held to stabilize the lists and indexes, but both
are required to write.  This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions.  As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by:	bz, julian
MFC after:	3 days
2009-08-23 20:40:19 +00:00
Marko Zec
9abb486279 Appease VNET_DEBUG - in if_vmove we temporarily switch i.e.
recurse from one vnet to another which is OK, so no need
to flood the console with warnings here.

Approved by:	re (rwatson), julian (mentor)
2009-08-14 22:46:45 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h; we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Bjoern A. Zeeb
be31e5e7b5 Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctls
(ifconfig ifN (-)vnet <jname|jid>) work correctly.

Move vi_if_move to if.c and split it up into two functions(*),
one for each ioctl.

In the reclaim case, correctly set the vnet before calling if_vmove.

Instead of silently allowing a move of an interface from the current
vnet to the current vnet, return an error. (*)

There is some duplicate interface name checking before actually moving
the interface between network stacks without locking and thus race
prone. Ideally if_vmove will correctly and automagically handle these
in the future.

Suggested by:	rwatson (*)
Approved by:	re (kib)
2009-07-26 11:29:26 +00:00
Robert Watson
d0728d7174 Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
  occur each time a network stack is instantiated and destroyed.  In the
  !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
  For the VIMAGE case, we instead use SYSINIT's to track their order and
  properties on registration, using them for each vnet when created/
  destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
  previously, as well as its dependency scheme: we now just use the
  SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
  they want init functions to be called for each virtual network stack
  rather than just once at boot, compiling down to DOMAIN_SET() in the
  non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
  of modinfo or DOMAIN_SET() for init/uninit events.  In some cases,
  convert modular components from using modevent to using sysinit (where
  appropriate).  In some cases, do minor rejuggling of SYSINIT ordering
  to make room for or better manage events.

Portions submitted by:	jhb (VNET_SYSINIT), bz (cleanup)
Discussed with:		jhb, bz, julian, zec
Reviewed by:		bz
Approved by:		re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00
Robert Watson
006e9db452 Normalize field naming for struct vnet, fix two debugging printfs that
print them.

Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-19 17:40:45 +00:00
Robert Watson
5ee847d3ac Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use).  Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions.  Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by:	bz
Approved by:	re (kib)
2009-07-19 14:20:53 +00:00
Jamie Gritton
7afcbc18b3 Remove the interim vimage containers, struct vimage and struct procg,
and the ioctl-based interface that supported them.

Approved by:	re (kib), bz (mentor)
2009-07-17 14:48:21 +00:00
Robert Watson
1e77c1056a Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used.  Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with:	bz, julian
Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-16 21:13:04 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
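The accessor scheme described in eddfbb763d (a reference copy, one private copy per network stack, and a macro that turns "current vnet plus symbol" into an address) can be modeled in userland roughly as follows. This is only a sketch: a plain struct stands in for the linker-set region, the two variable names are illustrative, and every model_* / MODEL_* identifier is hypothetical rather than part of the kernel KPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Reference copy of the virtualized globals, with static initializers. */
struct model_vnet_globals {
    int ipforwarding;
    int ip_defttl;
};
static const struct model_vnet_globals model_reference = { 0, 64 };

/* One private data region per network stack instance. */
struct model_vnet {
    char *data;
};

/* The stack instance the current code path is working in. */
static struct model_vnet *model_curvnet;

/* Create an instance whose globals start as a copy of the reference. */
static struct model_vnet *
model_vnet_alloc(void)
{
    struct model_vnet *vnet = malloc(sizeof(*vnet));

    if (vnet == NULL)
        return (NULL);
    vnet->data = malloc(sizeof(model_reference));
    if (vnet->data == NULL) {
        free(vnet);
        return (NULL);
    }
    memcpy(vnet->data, &model_reference, sizeof(model_reference));
    return (vnet);
}

static void
model_vnet_free(struct model_vnet *vnet)
{
    assert(vnet != NULL);
    free(vnet->data);
    free(vnet);
}

/* Accessor: current vnet base plus the symbol's offset in the region. */
#define MODEL_VNET(type, name)                                  \
    (*(type *)(model_curvnet->data +                            \
        offsetof(struct model_vnet_globals, name)))
```

Writing through MODEL_VNET(int, ip_defttl) changes only the instance selected by model_curvnet; the reference copy and other instances keep their statically initialized value.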
Brooks Davis
6cb7f168db Remove support for the /dev/net/* per-interface devices. They serve
little purpose and are unused in the base system.

The IOCTL functionality is entirely duplicated and routing sockets
provide a richer interface than the kqueue functionality.

Further, it is not practical for these devices to be made sensible in
the face of VIMAGE.

Bump __FreeBSD_version on the off chance that there is any code out
there that actually uses this stuff.

Reviewed by:	rwatson
Discussed with:	bz, zec
Approved by:	re@ (kensmith)
2009-06-29 19:46:29 +00:00
Robert Watson
395cbe82d2 Remove unnecessary include of kdb.h that snuck in during ifaddr refcount
work.

Reported by:	pluknet <pluknet at gmail.com>
Approved by:	re (kib)
2009-06-27 10:30:28 +00:00
Robert Watson
f9ef96ca71 Define four wrapper functions for interface address locking,
if_addr_rlock() and if_addr_runlock() for regular address lists, and
if_maddr_rlock() and if_maddr_runlock() for multicast address lists.

We will use these in various kernel modules to avoid encoding specific
type and locking strategy information into modules that currently use
IF_ADDR_LOCK() and IF_ADDR_UNLOCK() directly.

MFC after:	6 weeks
2009-06-26 00:36:47 +00:00
Robert Watson
3baaf2974d In if_setlladdr(), use IF_ADDR_LOCK() and ifaddr references to improve
the safety of link layer address manipulation.

MFC after:	6 weeks
2009-06-24 10:36:48 +00:00
Robert Watson
8c0fec805f Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references.  The following routines now return references:

  ifaddr_byindex
  ifa_ifwithaddr
  ifa_ifwithbroadaddr
  ifa_ifwithdstaddr
  ifa_ifwithnet
  ifaof_ifpforaddr
  ifa_ifwithroute
  ifa_ifwithroute_fib
  rt_getifa
  rt_getifa_fib
  IFP_TO_IA
  ip_rtaddr
  in6_ifawithifp
  in6ifa_ifpforlinklocal
  in6ifa_ifpwithaddr
  in6_ifadd
  carp_iamatch6
  ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

  IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc).  In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit.  Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by:	bz
Obtained from:	Apple, Inc. (portions)
MFC after:	6 weeks (portions)
2009-06-23 20:19:09 +00:00
Bjoern A. Zeeb
a877d0cffa Remove duplicate #include <net/route.h> from the middle of the file.
2009-06-23 13:16:16 +00:00
Bjoern A. Zeeb
b58ea5f310 Move virtualization of routing related variables into their own
Vimage module, which had been there already but now is stateful.

All variables are now file local; so this further limits the global
spreading of routing related things throughout the kernel.

Add a missing function local variable in case of MPATHing.

Reviewed by:	zec
2009-06-22 17:48:16 +00:00
Robert Watson
8896f83a58 Add a new function, ifa_ifwithaddr_check(), which rather than returning
a pointer to an ifaddr matching the passed socket address, returns a
boolean indicating whether one was present.  In the (near) future,
ifa_ifwithaddr() will return a referenced ifaddr rather than a raw
ifaddr pointer, and the new wrapper will allow callers that care only
about the boolean condition to avoid having to free that reference.

MFC after:	3 weeks
2009-06-22 10:59:34 +00:00
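The motivation given in 8896f83a58 (callers that only need a yes/no answer should not have to manage a reference) can be sketched with a userland model. The address table, the integer addresses, and all model_* names are hypothetical; the real functions take socket addresses and involve locking that is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ifaddr. */
struct model_ifaddr {
    unsigned int addr;
    unsigned int refs;
};

/* Each entry starts with one reference owned by the address list. */
static struct model_ifaddr model_addrs[] = {
    { 0x0a000001u, 1 },
    { 0x0a000002u, 1 },
};

/* Return a referenced entry; the caller must model_ifa_free() it. */
static struct model_ifaddr *
model_ifa_ifwithaddr(unsigned int addr)
{
    size_t n = sizeof(model_addrs) / sizeof(model_addrs[0]);

    for (size_t i = 0; i < n; i++) {
        if (model_addrs[i].addr == addr) {
            model_addrs[i].refs++;
            return (&model_addrs[i]);
        }
    }
    return (NULL);
}

/* Drop a reference obtained from a lookup. */
static void
model_ifa_free(struct model_ifaddr *ifa)
{
    assert(ifa->refs > 0);
    ifa->refs--;
}

/* Boolean wrapper: takes and drops the reference on the caller's behalf. */
static bool
model_ifa_ifwithaddr_check(unsigned int addr)
{
    struct model_ifaddr *ifa = model_ifa_ifwithaddr(addr);

    if (ifa == NULL)
        return (false);
    model_ifa_free(ifa);
    return (true);
}
```

A caller using the wrapper leaves the reference count unchanged, which is the property that makes it safe to convert existing boolean-style callers mechanically.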
Bjoern A. Zeeb
bed56bb51b After the update to fxp(4) in r194573 we should no longer need
this DELAY(100) hack introduced in r56938.

Thanks to:	yongari
MFC after:	6 weeks
X-MFC note:	not before the fxp(4) changes
2009-06-22 10:27:20 +00:00
Robert Watson
1099f828b3 Clean up common ifaddr management:
- Unify reference count and lock initialization in a single function,
  ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Use refcount(9) for reference count management instead of a u_int
  protected by a mutex.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after:	3 weeks
2009-06-21 19:30:33 +00:00
Roman Divacky
e40bae9a45 Switch cmd argument to u_long. This matches what if_ethersubr.c does and
allows the code to compile cleanly on amd64 with clang.

Reviewed by:	rwatson
Approved by:	ed (mentor)
2009-06-21 10:29:31 +00:00
Sam Leffler
d659538f72 r193336 moved ifq_detach to if_free which broke if_alloc followed
by if_free (w/o doing if_attach); move ifq_attach to if_alloc and
rename ifq_attach/detach to ifq_init/ifq_delete to better identify
their purpose

Reviewed by:	jhb, kmacy
2009-06-15 19:50:03 +00:00
Jamie Gritton
679e13901c Manage vnets via the jail system. If a jail is given the boolean
parameter "vnet" when it is created, a new vnet instance will be created
along with the jail.  Network interfaces can be moved between prisons
with an ioctl similar to the one that moves them between vimages.
For now vnets will co-exist under both jails and vimages, but soon
struct vimage will be going away.

Reviewed by:	zec, julian
Approved by:	bz (mentor)
2009-06-15 18:59:29 +00:00
Bjoern A. Zeeb
259d2d5431 carp(4) allows people to share a set of IP addresses and can only
use IPv4/v6 for inter-node communication (according to my reading).

Properly wrap the carp callouts in INET || INET6 and reflect this
in sys/conf/files as well.  While in theory this should be ok,
it might be a bit optimistic to think that carp could build with
inet6 only[1].

Discussed with:		mlaier [1]
2009-06-11 10:26:38 +00:00
Konstantin Belousov
d8b0556c6d Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by:	jhb [1]
Noted by:	jhb [2]
Reviewed by:	jhb
Tested by:	rnoland
2009-06-10 20:59:32 +00:00
Bjoern A. Zeeb
8d8bc0182e After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
Marko Zec
bc29160df3 Introduce an infrastructure for dismantling vnet instances.
Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state.  The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers.  Many of such issues are
already known and will be incrementally fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels.  Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by:	bz, julian
Approved by:	rwatson, kib (re), julian (mentor)
2009-06-08 17:15:40 +00:00
Robert Watson
bcf11e8d00 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
Sam Leffler
c9dd371765 move ifq_detach from if_detach to if_free; this permits callers to
reference if_snd in the period between detach+free which helps simplify
detach code

Reviewed by:	jhb, rwatson
2009-06-02 18:53:21 +00:00
Bjoern A. Zeeb
c2c2a7c11e Convert the two dimensional array to be malloced and introduce
an accessor function to get the correct rnh pointer back.

Update netstat to get the correct pointer using kvm_read()
as well.

This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.

Reviewed by:	julian, rwatson, zec
X-MFC:		not possible
2009-06-01 15:49:42 +00:00
Marko Zec
feb08d06b9 Introduce an interim userland-kernel API for creating vnets and
assigning ifnets from one vnet to another.  Deletion of vnets is not
yet supported.

The interface is implemented as an ioctl extension so that no syscalls
had to be introduced.  This should be acceptable given that the new
interface will be used for a short / interim period only, until the
new jail management framework gains the capability of managing vnets.
This method for managing vimages / vnets has been in use for the past
7 years without any observable issues.

The userland tool to be used in conjunction with the interim API can be
found in p4: //depot/projects/vimage-commit2/src/usr.sbin/vimage/... and
will most probably never get committed to svn.

While here, bump copyright notices in kern_vimage.c and vimage.h to
cover work done in year 2009.

Approved by:	julian (mentor)
Discussed with:	bz, rwatson
2009-05-31 12:10:04 +00:00
Marko Zec
67da1f3d8d Set ifp->if_afdata_initialized to 0 while holding IF_AFDATA_LOCK on ifp,
not after the lock has been released.

Reviewed by:	bz
Discussed with:	rwatson
2009-05-22 22:22:21 +00:00
Marko Zec
e0c14af9b3 Introduce the if_vmove() function, which will be used in the future
for reassigning ifnets from one vnet to another.

if_vmove() works by calling a restricted subset of actions normally
executed by if_detach() on an ifnet in the current vnet, and then
switches to the target vnet and executes an appropriate subset of
if_attach() actions there.

if_attach() and if_detach() have become wrapper functions around
if_attach_internal() and if_detach_internal(), where the latter
variants have an additional argument, a flag indicating whether a
full attach or detach sequence is to be executed, or only a
restricted subset suitable for moving an ifnet from one vnet to
another.  Hence, if_vmove() will not call if_detach() and if_attach()
directly, but will call the if_detach_internal() and
if_attach_internal() variants instead, with the vmove flag set.

While here, staticize ifnet_setbyindex() since it is not referenced
from outside of sys/net/if.c.

Also rename ifccnt field in struct vimage to ifcnt, and do some minor
whitespace garbage collection where appropriate.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz, rwatson, brooks?
Approved by:	julian (mentor)
2009-05-22 22:09:00 +00:00
Marko Zec
21ca7b57bd Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The curvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, with an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typical cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
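The set/restore discipline described in 21ca7b57bd can be sketched in userland C as below. This is a model under assumptions, not the kernel macros: the MODEL_* macros, the thread-local variable, and both functions are hypothetical, and C11 _Thread_local stands in for curthread->td_vnet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vnet. */
struct model_vnet {
    int id;
};

/* Per-thread "current vnet"; NULL outside networking code. */
static _Thread_local struct model_vnet *model_curvnet;

/* Save the previous context in a local, then switch to the new one. */
#define MODEL_CURVNET_SET(v)                                    \
    struct model_vnet *model_saved_vnet = model_curvnet;        \
    model_curvnet = (v)
/* Revert to whatever context was current on entry. */
#define MODEL_CURVNET_RESTORE()                                 \
    model_curvnet = model_saved_vnet

/* Timer-style entry point: runs in the vnet it was armed for. */
static int
model_timer_tick(struct model_vnet *v)
{
    MODEL_CURVNET_SET(v);
    assert(model_curvnet != NULL);
    int seen = model_curvnet->id;
    MODEL_CURVNET_RESTORE();
    return (seen);
}

/* Socket-style entry point that recurses into another vnet context. */
static int
model_socket_op(struct model_vnet *so_vnet, struct model_vnet *other,
    int *inner_seen)
{
    MODEL_CURVNET_SET(so_vnet);
    *inner_seen = model_timer_tick(other);
    /* The inner restore put so_vnet back as the current context. */
    int seen = model_curvnet->id;
    MODEL_CURVNET_RESTORE();
    return (seen);
}
```

Because each entry point keeps its saved context in a local variable, a nested set/restore pair unwinds correctly, and the thread ends up with no current vnet once the outermost call returns.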
Marko Zec
f6dfe47a14 Permit building kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)
2009-04-30 13:36:26 +00:00
Robert Watson
8bd015a1ca As with ifnet_byindex_ref(), don't return IFF_DYING interfaces from
ifunit_ref().  ifunit() continues to return them.

MFC after:	3 weeks
2009-04-23 15:56:01 +00:00
Robert Watson
6064c5d362 Add ifunit_ref(), a version of ifunit(), that returns not just an
interface pointer, but also a reference to it.

Modify ifioctl() to use ifunit_ref(), holding the reference until
all ioctls, etc, have completed.

This closes a class of reader-writer races in which interfaces
could be removed during long-running ioctls, leading to crashes.
Many other consumers of ifunit() should now use ifunit_ref() to
avoid similar races.

MFC after:	3 weeks
2009-04-23 13:08:47 +00:00
Robert Watson
111c6b617b During if_detach(), invoke if_dead() to set the ifnet's function
pointers to "dead" implementations that no-op rather than invoking
the device driver.  This would generally be unexpected and
possibly quite badly handled by most device drivers after
if_detach() has completed.

Reviewed by:	bms
MFC after:	3 weeks
2009-04-23 11:51:53 +00:00
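The "dead" function pointer technique from 111c6b617b can be shown with a small userland model. It is a sketch only: the structure has a single method, all model_* names are hypothetical, and returning ENXIO from the dead method is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ifnet with one driver method. */
struct model_ifnet {
    int (*if_output)(struct model_ifnet *, const void *, size_t);
    unsigned long opackets;
};

/* Driver-provided method: pretends to transmit and counts the packet. */
static int
model_driver_output(struct model_ifnet *ifp, const void *pkt, size_t len)
{
    (void)pkt;
    (void)len;
    ifp->opackets++;
    return (0);
}

/* Dead method: never reaches the driver, just reports an error. */
static int
model_dead_output(struct model_ifnet *ifp, const void *pkt, size_t len)
{
    (void)ifp;
    (void)pkt;
    (void)len;
    return (ENXIO);
}

/* Point every method at its dead implementation. */
static void
model_if_dead(struct model_ifnet *ifp)
{
    ifp->if_output = model_dead_output;
}

/* Detach: stack teardown is elided; the methods are neutralized. */
static void
model_if_detach(struct model_ifnet *ifp)
{
    model_if_dead(ifp);
}
```

Callers that still hold a pointer after detach can keep calling through if_output; they get an error back rather than re-entering a driver that no longer expects calls.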
Robert Watson
d6f157ea9a Move portions of data structure initialization from if_attach() to
if_alloc(), and portions of data structure destruction from if_detach()
to if_free().  These changes leave more of the struct ifnet in a
safe-to-access condition between alloc and attach, and between detach
and free, and focus on attach/detach as stack usage events rather than
data structure initialization.

Affected fields include the linkstate task queue, if_afdata lock,
address lists, kqueue state, and MAC labels.  ifq_attach() and ifq_detach()
are not moved as ifq_attach() may use a queue length set by the device
driver between if_alloc() and if_attach().

MFC after:	3 weeks
2009-04-23 10:59:40 +00:00
Robert Watson
242a8e72eb Add a new interface flag, IFF_DYING, which is set when a device driver
calls if_free(), and remains set if the refcount is elevated.  IFF_DYING
skips the bit in the if_flags bitmask previously used by IFF_NEEDSGIANT,
so that an MFC can be done without changing which bit is used, as
IFF_NEEDSGIANT is still present in 7.x.

ifnet_byindex_ref() checks for IFF_DYING and returns NULL if it is set,
preventing new references from being acquired by index and preventing
monitoring sysctls from seeing it.  Other lookup mechanisms currently
do not check IFF_DYING, but may need to in the future.

MFC after:	3 weeks
2009-04-23 09:32:30 +00:00
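The lookup filtering described in 242a8e72eb can be modeled as follows. This is a userland sketch with hypothetical names throughout; the flag value is arbitrary and is not the real IFF_DYING bit, the counter is a plain integer, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary bit chosen for the model; not the kernel's value. */
#define MODEL_IFF_DYING 0x2u
#define MODEL_TABLE_SIZE 4

/* Hypothetical stand-in for struct ifnet. */
struct model_ifnet {
    unsigned int flags;
    unsigned int refcount;
};

static struct model_ifnet *model_table[MODEL_TABLE_SIZE];

/* Plain lookup: still returns an interface that is being torn down. */
static struct model_ifnet *
model_byindex(int idx)
{
    if (idx < 0 || idx >= MODEL_TABLE_SIZE)
        return (NULL);
    return (model_table[idx]);
}

/* Referencing lookup: refuses to hand out new references once dying. */
static struct model_ifnet *
model_byindex_ref(int idx)
{
    struct model_ifnet *ifp = model_byindex(idx);

    if (ifp == NULL || (ifp->flags & MODEL_IFF_DYING) != 0)
        return (NULL);
    ifp->refcount++;
    return (ifp);
}

/* Driver-side free: mark dying and drop the driver's reference. */
static void
model_if_free(struct model_ifnet *ifp)
{
    assert(ifp->refcount > 0);
    ifp->flags |= MODEL_IFF_DYING;
    ifp->refcount--;
    /* Real code would release the memory once the count reaches zero. */
}
```

While an earlier holder keeps the count elevated, the flag stays set, so the referencing lookup returns NULL even though the table entry is still present.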
Robert Watson
27d37320ec Start to address a number of races relating to use of ifnet pointers
after the corresponding interface has been destroyed:

(1) Add an ifnet refcount, ifp->if_refcount.  Initialize it to 1 in
    if_alloc(), and modify if_free_type() to decrement and check the
    refcount.

(2) Add new if_ref() and if_rele() interfaces to allow kernel code
    walking global interface lists to release IFNET_[RW]LOCK() yet
    keep the ifnet stable.  Currently, if_rele() is a no-op wrapper
    around if_free(), but this may change in the future.

(3) Add new ifnet field, if_alloctype, which caches the type passed
    to if_alloc(), but unlike if_type, won't be changed by drivers.
    This allows asynchronous free's of the interface after the
    driver has released it to still use the right type.  Use that
    instead of the type passed to if_free_type(), but assert that
    they are the same (might have to rethink this if that doesn't
    work out).

(4) Add a new ifnet_byindex_ref(), which looks up an interface by
    index and returns a reference rather than a pointer to it.

(5) Fix if_alloc() to fully initialize the if_addr_mtx before hooking
    up the ifnet to global lists.

(6) Modify sysctls in if_mib.c to use ifnet_byindex_ref() and release
    the ifnet when done.

When this change is MFC'd, it will need to replace if_ispare fields
rather than adding new fields in order to avoid breaking the binary
interface.  Once this change is MFC'd, if_free_type() should be
removed, as its 'type' argument is now optional.

This refcount is not appropriate for counting mbuf pkthdr references,
and also not for counting entry into the device driver via ifnet
function pointers.  An rmlock may be appropriate for the latter.
Rather, this is about ensuring data structure stability when reaching
an ifnet via global ifnet lists and tables followed by copy in or out
of userspace.

MFC after:      3 weeks
Reported by:    mdtancsa
Reviewed by:    brooks
2009-04-21 22:43:32 +00:00
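Points (1) to (3) of 27d37320ec can be illustrated with a userland model of the reference count. This is a sketch, not the kernel implementation: C11 atomics stand in for the kernel's counter, every model_* name is hypothetical, and the free counter exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ifnet. */
struct model_ifnet {
    atomic_uint refcount;
    int alloctype;      /* type cached at allocation time */
};

/* Number of structures actually released; for demonstration only. */
static unsigned int model_freed;

/* Allocate with one reference, owned by the driver. */
static struct model_ifnet *
model_if_alloc(int type)
{
    struct model_ifnet *ifp = calloc(1, sizeof(*ifp));

    if (ifp == NULL)
        return (NULL);
    atomic_init(&ifp->refcount, 1);
    ifp->alloctype = type;
    return (ifp);
}

/* Take an extra reference so the structure outlives a dropped lock. */
static void
model_if_ref(struct model_ifnet *ifp)
{
    atomic_fetch_add(&ifp->refcount, 1);
}

/* Decrement and check: only the last reference releases the memory. */
static void
model_if_free(struct model_ifnet *ifp)
{
    if (atomic_fetch_sub(&ifp->refcount, 1) == 1) {
        free(ifp);
        model_freed++;
    }
}

/* Currently just a wrapper around the free path, as in the message. */
static void
model_if_rele(struct model_ifnet *ifp)
{
    model_if_free(ifp);
}
```

If a list walker takes a reference before the driver calls the free routine, the structure (including the cached allocation type) remains valid until the walker releases it.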
Robert Watson
ab5ed8a5aa Acquire the interface address list lock over some iterations over
if_addrhead.  This closes some reader-writer races associated with
the address list.

MFC after:	2 weeks
2009-04-21 19:06:47 +00:00
Kip Macy
7cc5b47fb3 export if_qflush for use by driver if_qflush routines
only set ifp->if_{transmit, qflush} if not already set
KASSERT that neither or both are set
2009-04-16 23:05:10 +00:00
Marko Zec
4d79e3d5e8 In the !VIMAGE_GLOBALS case, make sure not to call vnet_net_iattach()
both via the vnet_mod_register() framework and then directly, but only
once.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-15 18:15:29 +00:00
Kip Macy
aee3056f64 call default if_qflush on ifq if default method isn't used by the driver
2009-04-14 03:17:44 +00:00
Marko Zec
bfe1aba468 Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized.  In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw).  Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module.  The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on.  Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first.  The framework will panic if it detects any unresolved
dependencies before completing system initialization.  Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run.  In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container.  IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-11 05:58:58 +00:00
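
As an illustration of the single-prerequisite rule in bfe1aba468 above, the sketch below initializes a module only once the one module it depends on has been initialized, and reports how many modules it managed to initialize so a caller can detect unresolved dependencies. It is a simplified model under stated assumptions: mod_sketch, MOD_NONE and init_in_order are hypothetical names, and the repeated-sweep strategy is chosen for brevity rather than taken from the actual framework, which records the order established by SYSINIT and protosw.

```c
#include <assert.h>
#include <stddef.h>

#define MOD_NONE	(-1)	/* hypothetical marker: no prerequisite */
#define MAX_MODS	8

struct mod_sketch {
	int	id;		/* unique, statically assigned module ID */
	int	dependson;	/* ID of the single prerequisite, or MOD_NONE */
	int	initialized;
};

/*
 * Sweep the registered modules until no further progress is made,
 * initializing each module whose prerequisite is already initialized.
 * The IDs are written to order[] in the sequence they were initialized.
 * Returns the number of initialized modules; a result smaller than n
 * means at least one dependency could not be resolved.
 */
int
init_in_order(struct mod_sketch *mods, int n, int *order)
{
	int done = 0, progress = 1;

	assert(n <= MAX_MODS);
	while (progress) {
		progress = 0;
		for (int i = 0; i < n; i++) {
			if (mods[i].initialized)
				continue;
			int ready = (mods[i].dependson == MOD_NONE);
			for (int j = 0; j < n && !ready; j++)
				if (mods[j].id == mods[i].dependson &&
				    mods[j].initialized)
					ready = 1;
			if (ready) {
				mods[i].initialized = 1;
				order[done++] = mods[i].id;
				progress = 1;
			}
		}
	}
	return (done);
}
```

Where the real framework panics on unresolved dependencies at boot, a caller of this sketch would compare the return value against n.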
Max Laier
8623f9fd7a Follow up for r190895 It's not only the "all" group that is affected, but
all groups on the given interface.

PR:		kern/130977, kern/131310
MFC after:	3 days (%vnet)
2009-04-10 19:16:14 +00:00
Max Laier
876d5c038f Remove interfaces from IFG_ALL on detach. This cures a couple of pf panics
when using the "self" keyword in tables or as ()-style host address and
fixes "ifconfig -g all" output.

PR:		kern/130977, kern/131310
Submitted by:	Mikolaj Golub
MFC after:	3 days
2009-04-10 14:41:51 +00:00
Marko Zec
1ed81b739e First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

        At this stage, the new per-vnet initializer functions are
	directly called from the existing global initialization code,
	which should in most cases result in compiler inlining those
	new functions, hence yielding a near-zero functional change.

        Modify the existing initializer functions which are invoked via
        protosw, like ip_init() et al., to allow them to be invoked
	multiple times, i.e. per each vnet.  Global state, if any,
	is initialized only if such functions are called within the
	context of vnet0, which will be determined via the
	IS_DEFAULT_VNET(curvnet) check (currently always true).

        While here, V_irtualize a few remaining global UMA zones
        used by net/netinet/netipsec networking code.  While it is
        not yet clear to me or anybody else whether this is the right
        thing to do, at this stage this makes the code more readable,
        and makes it easier to track uncollected UMA-zone-backed
        objects on vnet removal.  In the long run, it's quite possible
        that some form of shared use of UMA zone pools among multiple
        vnets should be considered.

	Bump __FreeBSD_version due to changes in layout of structs
	vnet_ipfw, vnet_inet and vnet_net.

Approved by:	julian (mentor)
2009-04-06 22:29:41 +00:00
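
The pattern described in 1ed81b739e above (an initializer that may run once per vnet but touches global state only for the default vnet) can be sketched as follows. The struct vnet_sketch, its is_default field and proto_init are hypothetical; the kernel uses the IS_DEFAULT_VNET(curvnet) check instead of a flag in the structure.

```c
#include <assert.h>
#include <stddef.h>

struct vnet_sketch {		/* hypothetical per-vnet container */
	int	is_default;	/* stands in for IS_DEFAULT_VNET(curvnet) */
	int	pkt_count;	/* example of per-vnet state */
};

int global_inits;		/* counts global-state initializations */

/* May be called once for every vnet instance. */
void
proto_init(struct vnet_sketch *v)
{
	assert(v != NULL);

	/* Per-vnet state is set up on every call. */
	v->pkt_count = 0;

	/* Global state is set up only in the context of the default vnet. */
	if (v->is_default)
		global_inits++;
}
```

Calling proto_init() for the default vnet and then for any number of additional vnets initializes the global state exactly once.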
Sam Leffler
a51f44a7a6 enable setting the mac address of 802.11 devices 2009-03-28 17:36:56 +00:00
Jamie Gritton
bc3977f122 Call the interface's if_ioctl from ifioctl(), if the protocol didn't
handle the ioctl.  There are other paths that already call it, but this
allows for a non-interface socket (like AF_LOCAL which ifconfig now
uses) to use a broader class of interface ioctls.

Approved by:	bz (mentor), rwatson
2009-03-20 13:41:23 +00:00
Robert Watson
e5adda3d51 Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free.  This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile.  They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

        if_ar
        if_axe
        if_aue
        if_cdce
        if_cue
        if_kue
        if_ray
        if_rue
        if_rum
        if_sr
        if_udav
        if_ural
        if_zyd

Drivers that were already disabled because of tty changes:

        if_ppp
        if_sl

Discussed on:	arch@
2009-03-15 14:21:05 +00:00
Sam Leffler
c0c9ea90a8 remove stray ; 2009-03-14 17:54:58 +00:00
Bjoern A. Zeeb
33553d6e99 For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
Jamie Gritton
b89e82dd87 Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise.  The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family.  For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes).  Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by:	bz (mentor)
2009-02-05 14:06:09 +00:00
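
The return-value convention from b89e82dd87 above could be modelled like this: zero on success, EAFNOSUPPORT when the prison has no addresses in the family, EADDRNOTAVAIL when the address does not match, and unconditional success for a non-jailed credential. The types prison_sketch and cred_sketch and the function check_ip4 are hypothetical simplifications, not the kernel's prison_foo_ip4 functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct prison_sketch {			/* hypothetical, IPv4 only */
	const uint32_t	*ip4;		/* addresses assigned to the jail */
	size_t		 nip4;
};

struct cred_sketch {
	const struct prison_sketch *prison;	/* NULL: not jailed */
};

/* Returns 0 on success and an errno value otherwise. */
int
check_ip4(const struct cred_sketch *cred, uint32_t addr)
{
	assert(cred != NULL);

	/* The jailed() check lives inside the function: non-jailed succeeds. */
	if (cred->prison == NULL)
		return (0);

	/* The prison has no addresses in this address family. */
	if (cred->prison->nip4 == 0)
		return (EAFNOSUPPORT);

	for (size_t i = 0; i < cred->prison->nip4; i++)
		if (cred->prison->ip4[i] == addr)
			return (0);

	/* The address being checked for does not match the prison. */
	return (EADDRNOTAVAIL);
}
```

Callers can then pass the returned code upward instead of hard-coding EADDRNOTAVAIL or EINVAL.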
John Baldwin
eb322a6f77 Only start the if_slowtimo timer (which drives the if_watchdog methods of
network interfaces) if we have at least one interface with an if_watchdog
routine.

MFC after:	2 weeks
2009-01-23 20:53:01 +00:00
Kip Macy
6241d13a1b if_rtdel is always called with the RADIX_NODE_HEAD lock held 2008-12-18 09:59:24 +00:00
Kip Macy
d24c444ca0 add ifnet_byindex_locked to allow for use of IFNET_RLOCK 2008-12-18 04:50:44 +00:00
Kip Macy
c368cff776 avoid trying to acquire a shared lock while holding an exclusive lock
by making the ifnet lock acquisition exclusive
2008-12-17 04:33:52 +00:00
Kip Macy
991f8615e4 convert ifnet and afdata locks from mutexes to rwlocks 2008-12-17 00:11:56 +00:00
Qing Li
6e6b3f7cbc The main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code

The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle; Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improve/refactor the code, and
  provided valuable reviews
- Julian Elischer set up the perforce tree for me and has helped
  me maintain that branch before the svn conversion
2008-12-15 06:10:57 +00:00
Bjoern A. Zeeb
40eb85e75e Whitespace changes only - tabs must have been converted to spaces
somehow, when moving the code from p4 to svn.

Sponsored by:	The FreeBSD Foundation
2008-12-11 15:42:59 +00:00
Marko Zec
385195c062 Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instantiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out.  Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs.  This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps.  PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw.  Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-12-10 23:12:39 +00:00
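
To make the V_ macro switch in 385195c062 above concrete, here is a small compilable sketch in the same spirit. The variable rt_count, the container vnet_net_sketch and the two helper functions are hypothetical; only the VIMAGE_GLOBALS option name and the overall shape of the #ifdef come from the commit message. The value is assigned in an initializer function rather than at instantiation, as the earlier vimage commits require.

```c
#include <assert.h>

struct vnet_net_sketch {	/* hypothetical container structure */
	int	_rt_count;
};

#ifdef VIMAGE_GLOBALS
int rt_count;				/* original global */
#define	V_rt_count	rt_count
#else
struct vnet_net_sketch vnet_net_0;	/* single container instance */
#define	V_rt_count	vnet_net_0._rt_count
#endif

/* Initial value assigned in an init function, not at instantiation. */
void
rt_count_init(void)
{
	V_rt_count = 0;
}

/* Code using the V_ name compiles unchanged under either setting. */
int
rt_count_bump(void)
{
	assert(V_rt_count >= 0);
	return (++V_rt_count);
}
```

Building with or without -DVIMAGE_GLOBALS changes where the counter is stored but not how rt_count_bump() behaves.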
Bjoern A. Zeeb
21b14a75f6 It does not make much sense to include net/route.h twice.
Remove one #include.
2008-12-09 21:09:05 +00:00
Bjoern A. Zeeb
653735c44c Add rwlock.h (and lock.h for that) to keep no-INET kernels compiling
after RADIX_NODE_HEAD_{,UN}LOCK() were added.  Must have been "learned"
by pollution before (most likely: route.h -> radix.h -> rwlock.h)
2008-12-09 20:05:58 +00:00
Bjoern A. Zeeb
4b79449e2f Rather than using hidden includes (with circular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
Bjoern A. Zeeb
413628a7e3 MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addition to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking, etc.

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the aforementioned and in-kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
  and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
  help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
  suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
  on cluster machines as well as all the testers and people
  who provided feedback the last months on freebsd-jail and
  other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by:	(see above)
MFC after:	3 months (this is just so that I get the mail)
X-MFC Before:   7.2-RELEASE if possible
2008-11-29 14:32:14 +00:00
Marko Zec
97021c2464 Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by:  bz, julian
Approved by:  julian (mentor)
Obtained from:        //depot/projects/vimage-commit2/...
X-MFC after:  never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
2008-11-26 22:32:07 +00:00
Sam Leffler
1444358966 use consistent style 2008-11-24 17:34:00 +00:00
Kip Macy
db7f0b974f - bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
  allow drivers to efficiently manage multiple hardware queues
  (i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.
2008-11-22 05:55:56 +00:00
Marko Zec
44e33a0758 Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instantiation,
assign initial values to them in initializer functions.  As a rule,
initialization at instantiation for such variables should never be
introduced again from now on.  Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentially, this change should have zero functional impact.  In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methodology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to initialize multiple instances of such container structures.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-19 09:39:34 +00:00
Bjoern A. Zeeb
5a97c9d46c Include if_arp.h for IFP2AC so that the netgraph parts in if.c
are happy even if compiled without INET or INET6.

MFC after:	2 months
2008-11-06 15:26:09 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Ed Schouten
6bfa9a2d66 Replace all calls to minor() with dev2unit().
After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by:	kib
2008-09-27 08:51:18 +00:00
Ed Schouten
d3ce832719 Remove unit2minor() use from kernel code.
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macros.
The unit2minor() and minor2unit() macros were no-ops.

We'd better not remove these four macros from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by:	kib
2008-09-26 14:19:52 +00:00
Bjoern A. Zeeb
f0c042211b Make the checks for ptp interfaces in ifa_ifwithdstaddr() and
ifa_ifwithnet() look more similar by comparing the pointer to NULL
in both cases.

MFC after:	3 months
2008-08-24 11:03:43 +00:00
Andrew Thompson
516993d48e ifnet_setbyindex() is only used locally, go back to being static. 2008-08-20 05:00:18 +00:00
Julian Elischer
ac957cd271 A bunch of formatting fixes brought to light by, or created by the Vimage commit
a few days ago.
2008-08-20 01:05:56 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Robert Watson
02f4879d3a Introduce locking around use of ifindex_table, whose use was previously
unsynchronized.  While races were extremely rare, we've now had a
couple of reports of panics in environments involving large numbers of
IPSEC tunnels being added very quickly on an active system.

- Add accessor functions ifnet_byindex(), ifaddr_byindex(),
  ifdev_byindex() to replace existing accessor macros.  These functions
  now acquire the ifnet lock before dereferencing the table.
- Add IFNET_WLOCK_ASSERT().
- Add static accessor functions ifnet_setbyindex(), ifdev_setbyindex(),
  which set values in the table either asserting or acquiring the ifnet
  lock.
- Use accessor functions throughout if.c to modify and read
  ifindex_table.
- Rework ifnet attach/detach to lock around ifindex_table modification.

Note that these changes simply close races around use of ifindex_table,
and make no attempt to solve the problem of disappearing ifnets.  Further
refinement of this work, including with respect to ifindex_table
resizing, is still required.

In a future change, the ifnet lock should be converted from a mutex to an
rwlock in order to reduce contention.

Reviewed and tested by:	brooks
2008-06-26 23:05:28 +00:00
Brooks Davis
d94ccb096b The if_check() function performed three actions:
 - Verified that the ifp->if_snd.ifq_mtx was initialized for
   all attached interfaces.  This was pointless because it was
   initialized for all interfaces in if_attach() so I've removed it.
 - Checked that ifp->if_snd.ifq_maxlen is initialized and set it to
   ifqmaxlen if unset.  This makes more sense in if_attach() so
   I moved it there.
 - The first call of if_slowtimo().  Delete if_check() and call
   if_slowtimo() directly from the SYSINIT().
2008-05-17 03:38:13 +00:00
Julian Elischer
8b07e49a00 Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

  One thing where FreeBSD has been falling behind, and which by chance I
  have some time to work on is "policy based routing", which allows
  different packet streams to be routed by more than just the destination
  address.

  Constraints:
  ------------

  I want to make some form of this available in the 6.x tree
  (and by extension 7.x) , but FreeBSD in general needs it so I might as
  well do it in -current and back port the portions I need.

  One of the ways that this can be done is to have the ability to
  instantiate multiple kernel routing tables (which I will now
  refer to as "Forwarding Information Bases" or "FIBs" for political
  correctness reasons). Which FIB a particular packet uses to make
  the next hop decision can be decided by a number of mechanisms.
  The policies these mechanisms implement are the "Policies" referred
  to in "Policy based routing".

  One of the constraints I have if I try to back port this work to
  6.x is that it must be implemented as a EXTENSION to the existing
  ABIs in 6.x so that third party applications do not need to be
  recompiled in timespan of the branch.

  This first version will not have some of the bells and whistles that
  will come with later versions. It will, for example, be limited to 16
  tables in the first commit.

  Implementation method, Compatible version. (part 1)
  -------------------------------
  For this reason I have implemented a "sufficient subset" of a
  multiple routing table solution in Perforce, and back-ported it
  to 6.x. (also in Perforce though not  always caught up with what I
  have done in -current/P4). The subset allows a number of FIBs
  to be defined at compile time (8 is sufficient for my purposes in 6.x)
  and implements the changes needed to allow IPV4 to use them. I have not
  done the changes for ipv6 simply because I do not need it, and I do not
  have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

  Other protocol families are left untouched and should there be
  users with proprietary protocol families, they should continue to work
  and be oblivious to the existence of the extra FIBs.

  To understand how this is done, one must know that the current FIB
  code starts everything off with a single dimensional array of
  pointers to FIB head structures (One per protocol family), each of
  which in turn points to the trie of routes available to that family.

  The basic change in the ABI compatible version of the change is to
  extend that array to be a 2 dimensional array, so that
  instead of protocol family X looking at rt_tables[X] for the
  table it needs, it looks at rt_tables[Y][X], where for all
  protocol families except ipv4 Y is always 0.
  Code that is unaware of the change always just sees the first row
  of the table, which of course looks just like the one dimensional
  array that existed before.
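
The two-dimensional lookup described above can be illustrated with a small sketch. All names here (rt_tables_sketch, table_slot, legacy_slot, the _SKETCH constants) are hypothetical, and the 16-row limit is taken from the "limited to 16 tables" remark earlier in these notes; the real rt_tables array and its element type differ.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FIBS_SKETCH	16	/* rows: one per FIB */
#define AF_MAX_SKETCH	4	/* columns: one per protocol family */
#define AF_INET_SKETCH	2	/* hypothetical index for IPv4 */

struct rnh_sketch {		/* stand-in for a routing table head */
	int	unused;
};

struct rnh_sketch *rt_tables_sketch[NUM_FIBS_SKETCH][AF_MAX_SKETCH];

/*
 * Return the slot for (fib, family).  Every family except IPv4 is
 * forced to row 0.  Returns NULL if the FIB number is out of range.
 */
struct rnh_sketch **
table_slot(int fib, int family)
{
	assert(family >= 0 && family < AF_MAX_SKETCH);

	if (family != AF_INET_SKETCH)
		fib = 0;
	if (fib < 0 || fib >= NUM_FIBS_SKETCH)
		return (NULL);
	return (&rt_tables_sketch[fib][family]);
}

/* Code unaware of FIBs sees only the first row, as with the old 1-D array. */
struct rnh_sketch **
legacy_slot(int family)
{
	return (table_slot(0, family));
}
```

With this layout, table_slot(3, AF_INET_SKETCH) reaches row 3, while the same call for any other family lands in row 0.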

  The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
  are all maintained, but refer only to the first row of the array,
  so that existing callers in proprietary protocols can continue to
  do the "right thing".
  Some new entry points are added, for the exclusive use of ipv4 code
  called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
  which have an extra argument which refers the code to the correct row.

  In addition, there are some new entry points (currently called
  rtalloc_fib() and friends) that check the Address family being
  looked up and call either rtalloc() (and friends) if the protocol
  is not IPv4 forcing the action to row 0 or to the appropriate row
  if it IS IPv4 (and that info is available). These are for calling
  from code that is not specific to any particular protocol. The way
  these are implemented would change in the non ABI preserving code
  to be added later.

  One feature of the first version of the code is that for ipv4,
  the interface routes show up automatically on all the FIBs, so
  that no matter what FIB you select you always have the basic
  direct attached hosts available to you. (rtinit() does this
  automatically).

  You CAN delete an interface route from one FIB should you want
  to but by default it's there. ARP information is also available
  in each FIB. It's assumed that the same machine would have the
  same MAC address, regardless of which FIB you are using to get
  to it.

  This brings us as to how the correct FIB is selected for an outgoing
  IPV4 packet.

  Firstly, all packets have a FIB associated with them. if nothing
  has been done to change it, it will be FIB 0. The FIB is changed
  in the following ways.

  Packets fall into one of a number of classes.

  1/ locally generated packets, coming from a socket/PCB.
     Such packets select a FIB from a number associated with the
     socket/PCB. This in turn is inherited from the process,
     but can be changed by a socket option. The process in turn
     inherits it on fork. I have written a utility called setfib
     that acts a bit like nice..

         setfib -3 ping target.example.com # will use fib 3 for ping.

     It is an obvious extension to make it a property of a jail
     but I have not done so. It can be achieved by combining the setfib and
     jail commands.

  2/ packets received on an interface for forwarding.
     By default these packets would use table 0,
     (or possibly a number settable in a sysctl(not yet)).
     but prior to routing the firewall can inspect them (see below).
     (possibly in the future you may be able to associate a FIB
     with packets received on an interface..  An ifconfig arg, but not yet.)

  3/ packets inspected by a packet classifier, which can arbitrarily
     associate a fib with it on a packet by packet basis.
     A fib assigned to a packet by a packet classifier
     (such as ipfw) would over-ride a fib associated by
     a more default source. (such as cases 1 or 2).

  4/ a tcp listen socket associated with a fib will generate
     accept sockets that are associated with that same fib.

  5/ Packets generated in response to some other packet (e.g. reset
     or icmp packets). These should use the FIB associated with the
     packet being responded to.

  6/ Packets generated during encapsulation.
     gif, tun and other tunnel interfaces will encapsulate using the FIB
     that was in effect with the process that set up the tunnel.
     Thus setfib 1 ifconfig gif0 [tunnel instructions]
     will set the fib for the tunnel to use to be fib 1.
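
The precedence between cases 1 to 3 in the list above can be summarized in a few lines. This is only a sketch of the ordering as the notes describe it: pkt_sketch, FIB_UNSET and select_fib are hypothetical names, and in the kernel the FIB number travels with the packet rather than in a structure like this one.

```c
#include <assert.h>
#include <stddef.h>

#define FIB_UNSET	(-1)	/* hypothetical "no FIB assigned" marker */

struct pkt_sketch {
	int	socket_fib;	/* from the socket/PCB; FIB_UNSET if forwarded */
	int	classifier_fib;	/* set by a classifier such as ipfw, or FIB_UNSET */
};

int
select_fib(const struct pkt_sketch *p)
{
	assert(p != NULL);

	/* Case 3: a classifier's choice overrides the more default sources. */
	if (p->classifier_fib != FIB_UNSET)
		return (p->classifier_fib);

	/* Case 1: locally generated packets use the socket's FIB. */
	if (p->socket_fib != FIB_UNSET)
		return (p->socket_fib);

	/* Case 2: forwarded packets default to table 0. */
	return (0);
}
```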

  Routing messages would be associated with their
  process, and thus select one FIB or another.
  messages from the kernel would be associated with the fib they
  refer to and would only be received by a routing socket associated
  with that fib. (not yet implemented)

  In addition Netstat has been edited to be able to cope with the
  fact that the array is now 2 dimensional. (It looks in system
  memory using libkvm (!)). Old versions of netstat see only the first FIB.

  In addition two sysctls are added to give:
  a) the number of FIBs compiled in (active)
  b) the default FIB of the calling process.

  Early testing experience:
  -------------------------

  Basically our (IronPort's) appliance does this functionality already
  using ipfw fwd but that method has some drawbacks.

  For example,
  It can't fully simulate a routing table because it can't influence the
  socket's choice of local address when a connect() is done.

  Testing during the generating of these changes has been
  remarkably smooth so far. Multiple tables have co-existed
  with no notable side effects, and packets have been routed
  accordingly.

  ipfw has grown 2 new keywords:

  setfib N ip from any to any
  count ip from any to any fib N

  In pf there seems to be a requirement to be able to give symbolic names to the
  fibs but I do not have that capacity. I am not sure if it is required.

  SCTP has interestingly enough built in support for this, called VRFs
  in Cisco parlance. it will be interesting to see how that handles it
  when it suddenly actually does something.

  Where to next:
  --------------------

  After committing the ABI compatible version and MFCing it, I'd
  like to proceed in a forward direction in -current. this will
  result in some roto-tilling in the routing code.

  Firstly: the current code's idea of having a separate tree per
  protocol family, all of the same format, and pointed to by the
  1 dimensional array is a bit silly. Especially when one considers that
  there is code that makes assumptions about every protocol having the
  same internal structures there. Some protocols don't WANT that
  sort of structure. (for example the whole idea of a netmask is foreign
  to appletalk). This needs to be made opaque to the external code.

  My suggested first change is to add routing method pointers to the
  'domain' structure, along with information pointing the data.
  instead of having an array of pointers to uniform structures,
  there would be an array pointing to the 'domain' structures
  for each protocol address domain (protocol family),
  and the methods this reached would be called. The methods would have
  an argument that gives FIB number, but the protocol would be free
  to ignore it.

  When the ABI can be changed it raises the possibility of the
  addition of a fib entry into the "struct route". Currently,
  the structure contains the sockaddr of the destination, and the resulting
  fib entry. To make this work fully, one could add a fib number
  so that given an address and a fib, one can find the third element, the
  fib entry.

  Interaction with the ARP layer/ LL layer would need to be
  revisited as well. Qing Li has been working on this already.

  This work was sponsored by Ironport Systems/Cisco

Reviewed by:    several including rwatson, bz and mlaier (parts each)
Obtained from:  Ironport systems/Cisco
2008-05-09 23:03:00 +00:00
Brooks Davis
ae0615f633 Delay the global registration of the struct ifnet in if_alloc() until after
we're certain the allocation will entirely succeed.  This fixes a leak in a
fairly unlikely case.

Reported by:	vijay singh <vijjus at rocketmail dot com>
MFC after:	1 week
2008-04-19 22:04:51 +00:00
Sam Leffler
fb27dd1db3 expose if_purgemaddrs, it will be used by the vap code unless someone
redesigns the mcast support code in the next few weeks

MFC after:	3 weeks
2008-03-25 21:23:32 +00:00
Robert Watson
237fdd787b In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation.  This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after:	1 month
Discussed with:	imp, rink
2008-03-16 10:58:09 +00:00
Robert Watson
b9175c4556 Move IFF_NEEDSGIANT warning from if_ethersubr.c to if.c so it is displayed
for all network interfaces, not just ethernet-like ones.

Upgrade it to a louder WARNING and be explicit that the flag is obsolete.
Support for IFF_NEEDSGIANT will be removed in a few months (see arch@ for
details) and will not appear in 8.0.

Upgrade if_watchdog to a WARNING.
2008-03-07 16:00:44 +00:00
Robert Watson
30d239bc4c Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

  mac_<object>_<method/action>
  mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme.  Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier.  Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods.  Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updated as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by:	SPARTA (original patches against Mac OS X)
Obtained from:	TrustedBSD Project, Apple Computer
2007-10-24 19:04:04 +00:00
Robert Watson
33d2bb9ca3 First in a series of changes to remove the now-unused Giant compatibility
framework for non-MPSAFE network protocols:

- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associated SYSINITs used by it to force
  debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
  debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
  debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
  which keyed off debug_mpsafenet to determine various aspects of their own
  locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT() into
  no-ops, as their entire behavior was determined by the value in
  debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.

Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.

Reviewed by:	bz, jhb
Approved by:	re (kensmith)
2007-07-27 11:59:57 +00:00
Brooks Davis
a45cbf12c8 Update the comments on if_alloc(), if_free(), if_free_type(), and
if_attach.

Remove a comment about pre-3.0 network drivers from if_attach().

Be a bit more consistent about whitespace near comments.
2007-05-16 19:59:01 +00:00
Andrew Thompson
18242d3b09 Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.
The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on:	current@
2007-04-17 00:35:11 +00:00
Andrew Thompson
b47888ceba Add the trunk(4) driver for providing link aggregation, failover and fault
tolerance.  This driver allows aggregation of multiple network interfaces as
one virtual interface using a number of different protocols/algorithms.

failover    - Sends traffic through the secondary port if the master becomes
              inactive.
fec         - Supports Cisco Fast EtherChannel.
lacp        - Supports the IEEE 802.3ad Link Aggregation Control Protocol
              (LACP) and the Marker Protocol.
loadbalance - Static loadbalancing using an outgoing hash.
roundrobin  - Distributes outgoing traffic using a round-robin scheduler
              through all active ports.

This code was obtained from OpenBSD and this also includes 802.3ad LACP support
from agr(4) in NetBSD.
2007-04-10 00:27:25 +00:00
Bruce M Simpson
75ae0c016b Fix a case where hardware removal of an interface caused an attempt to
announce an ll_ifma which has gone away. Add a KASSERT to catch regressions.

Bug found by:	Tom Uffner
2007-03-27 16:11:28 +00:00
Bruce M Simpson
5896d12465 Fix tinderbox; ng_ether needs to see if_findmulti(). 2007-03-20 03:15:43 +00:00
Bruce M Simpson
ec002fee99 Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from:	p4 branch bms_netdev
Reviewed by:	andre
Sponsored by:	Garance A Drosehn
Idea from:	NetBSD
MFC after:	1 month
2007-03-20 00:36:10 +00:00
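
A minimal user-space sketch of the reference-counting idea in ec002fee99
above, assuming hypothetical names (ifma_sketch, sketch_ifma_*). It shows
only the general pattern: a membership can outlive its interface, records
that the interface is gone, and is freed when the last reference is
dropped. It is not the kernel's implementation and omits all locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an interface. */
struct ifnet_rc_sketch {
	int	alive;
};

/* Hypothetical stand-in for a multicast membership record. */
struct ifma_sketch {
	int			 refcount;
	struct ifnet_rc_sketch	*ifp;	/* NULL once the interface is gone */
};

static struct ifma_sketch *
sketch_ifma_alloc(struct ifnet_rc_sketch *ifp)
{
	struct ifma_sketch *ifma;

	ifma = calloc(1, sizeof(*ifma));
	if (ifma != NULL) {
		ifma->refcount = 1;
		ifma->ifp = ifp;
	}
	return (ifma);
}

static void
sketch_ifma_ref(struct ifma_sketch *ifma)
{
	ifma->refcount++;
}

/* Interface detach: the record survives but forgets the interface. */
static void
sketch_ifma_orphan(struct ifma_sketch *ifma)
{
	ifma->ifp = NULL;
}

/* Drop one reference; returns 1 if the record was freed. */
static int
sketch_ifma_release(struct ifma_sketch *ifma)
{
	assert(ifma->refcount > 0);
	if (--ifma->refcount > 0)
		return (0);
	free(ifma);
	return (1);
}
```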
Bruce M Simpson
40d8a30241 Fix a bug in if_findmulti(), whereby it would not find (and thus delete)
a link-layer multicast group membership.
Such memberships are needed in order to support protocols such as
IS-IS without putting the interface into PROMISC or ALLMULTI modes.

sa_equal() is not OK for comparing sockaddr_dl as it has deeper structure
than a simple byte array, so add sa_dl_equal() and use that instead.

Reviewed by:	rwatson
Verified with:	/usr/sbin/mtest
Bug found by:	Jouke Witteveen
MFC after:	2 weeks
2007-02-22 00:14:02 +00:00
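
To illustrate why a byte-wise comparison is not enough for the addresses
discussed in 40d8a30241 above, here is a small sketch using a simplified,
hypothetical structure (dl_addr) in place of struct sockaddr_dl. The
helper names raw_equal and dl_equal are also assumptions; the kernel's
sa_dl_equal() may differ in detail.

```c
#include <assert.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct sockaddr_dl. */
struct dl_addr {
	unsigned char	nlen;		/* interface name length */
	unsigned char	alen;		/* link-layer address length */
	char		data[16];	/* name bytes, then address bytes */
};

/* The link-layer address starts after the (optional) name. */
static const char *
dl_lladdr(const struct dl_addr *a)
{
	return (a->data + a->nlen);
}

/* Byte-wise comparison: affected by the name prefix and its length. */
static int
raw_equal(const struct dl_addr *a, const struct dl_addr *b)
{
	return (memcmp(a, b, sizeof(*a)) == 0);
}

/* Structure-aware comparison: only the link-layer address matters. */
static int
dl_equal(const struct dl_addr *a, const struct dl_addr *b)
{
	assert((size_t)a->nlen + a->alen <= sizeof(a->data));
	assert((size_t)b->nlen + b->alen <= sizeof(b->data));
	if (a->alen != b->alen)
		return (0);
	return (memcmp(dl_lladdr(a), dl_lladdr(b), a->alen) == 0);
}
```

Two records carrying the same multicast MAC address, one with an interface
name in front and one without, differ byte for byte yet name the same
group, which is the case a plain byte comparison misses.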
Gleb Smirnoff
c18ffdc87d The recent issues with the em(4) interface have shown that the old 4.4BSD
if_watchdog/if_timer interface doesn't fit modern SMP network
stack design.

Device drivers that need watchdog to monitor their hardware should
implement it themselves.

Eventually the if_watchdog/if_timer API will be removed. For now,
warn that a driver uses it.

Reviewed by:	scottl
2006-11-30 15:02:01 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Robert Watson
aed5570872 Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h.  sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from:	TrustedBSD Project
Sponsored by:	SPARTA
2006-10-22 11:52:19 +00:00
Andre Oppermann
773725a255 Fix the socket option IP_ONESBCAST by giving it its own case in ip_output()
and skip over the normal IP processing.

Add a supporting function ifa_ifwithbroadaddr() to verify and validate the
supplied subnet broadcast address.

PR:		kern/99558
Tested by:	Andrey V. Elsukov <bu7cher-at-yandex.ru>
Sponsored by:	TCP/IP Optimization Fundraise 2005
MFC after:	3 days
2006-09-06 17:12:10 +00:00
Sam Leffler
6b7330e2d4 Revise network interface cloning to take an optional opaque
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)

Reviewed by:	arch@
2006-07-09 06:04:01 +00:00
Yaroslav Tykhiy
4b97d7affd There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures.  Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
for the system crashing honestly instead of masking possible bugs.

Suggested by:	glebius, jhb, ru
2006-06-29 19:22:05 +00:00
Gleb Smirnoff
457f48e65c - First initialize ifnet, and then insert it into global
list.
- First remove from global list, then start destroying.

PR:		kern/97679
Submitted by:	Alex Lyashkov <shadow itt.net.ru>
Reviewed by:	rwatson, brooks
2006-06-21 06:02:35 +00:00
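
The ordering rule in 457f48e65c above (initialize, then publish; unpublish,
then destroy) can be sketched as follows. The list and the names
ifnet_sketch, sketch_attach and sketch_detach are hypothetical, and the
locking a real kernel would need around the list is deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct ifnet_sketch {
	char			 name[16];
	int			 ready;	/* set once fully initialized */
	struct ifnet_sketch	*next;
};

static struct ifnet_sketch *iflist;	/* hypothetical global list */

static struct ifnet_sketch *
sketch_attach(const char *name)
{
	struct ifnet_sketch *ifp;

	assert(name != NULL);
	ifp = calloc(1, sizeof(*ifp));
	if (ifp == NULL)
		return (NULL);
	/* 1. Initialize every field first... */
	strncpy(ifp->name, name, sizeof(ifp->name) - 1);
	ifp->ready = 1;
	/* 2. ...and only then make the object visible to list walkers. */
	ifp->next = iflist;
	iflist = ifp;
	return (ifp);
}

static void
sketch_detach(struct ifnet_sketch *ifp)
{
	struct ifnet_sketch **pp;

	/* 1. Unlink first so that no new walker can find the object... */
	for (pp = &iflist; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == ifp) {
			*pp = ifp->next;
			break;
		}
	}
	/* 2. ...then tear it down. */
	ifp->ready = 0;
	free(ifp);
}
```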
Max Laier
0dad3f0e15 Import interface groups from OpenBSD. This allows interfaces to be
grouped in order to - for example - apply firewall rules to a whole group of
interfaces.  This is required for importing pf from OpenBSD 3.9.

Obtained from:	OpenBSD (with changes)
Discussed on:	-net (back in April)
2006-06-19 22:20:45 +00:00
Max Khon
affcaf7871 Fix KASSERT conditions in if_deregister_com_alloc(). 2006-06-11 22:09:28 +00:00
Andrew Thompson
f3b90d48bb Announce all interfaces to devd on attach/detach. This adds a new devctl
notification so all interfaces including pseudo are reported. When netif
creates the clones at startup devctl_disable has not been turned off yet so the
interfaces will not be initialised twice, enforce this by adding an explicit
order between rc.d/netif and rc.d/devd.

This change allows actions to be taken in userland when an interface is cloned
and the pseudo interface will be automatically configured if an ifconfig_<int>=""
line exists in rc.conf.

Reviewed by:		brooks
No objections on:	net
2006-06-01 00:41:07 +00:00
Gleb Smirnoff
93a69f5703 No direct call to carp_ifdetach() anymore. It is called by
event handler.

PR:		kern/82908
Submitted by:	Dan Lukes <dan obluda.cz>
2006-03-21 14:31:18 +00:00
Paul Saab
19cf04981a Implement SIOCGIFCONF for 32bit binaries. 2006-02-02 19:58:37 +00:00
Gleb Smirnoff
75ee267c22 Merge the //depot/user/yar/vlan branch into CVS. It contains some collective
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.

The most important changes:

o   Instead of global linked list of all vlan softc use a per-trunk
  hash. The size of hash is dynamically adjusted, depending on
  number of entries. This changes struct ifnet, replacing counter
  of vlans with a pointer to trunk structure. This change is an
  improvement for setups with big number of VLANs, several interfaces
  and several CPUs. It is a small regression for a setup with a single
  VLAN interface.
    An alternative to dynamic hash is a per-trunk static array with
  4096 entries, which is a compile time option - VLAN_ARRAY. In my
  experiments the array is not an improvement, probably because such
  a big trunk structure doesn't fit into CPU cache.
o   Introduce an UMA zone for VLAN tags. Since drivers depend on it,
  the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
  This change is a big improvement for any setup utilizing vlan(4).
o   Use rwlock(9) instead of mutex(9) for locking. We are the first
  ones to do this! :)
o   Some drivers can do hardware VLAN tagging + hardware checksum
  offloading. Add an infrastructure for this. Whenever vlan(4) is
  attached to a parent or parent configuration is changed, the flags
  on vlan(4) interface are updated.

In collaboration with:	yar, thompsa
In collaboration with:	Mihail Balikov <mihail.balikov interbgc.com>
2006-01-30 13:45:15 +00:00
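
A rough sketch of the per-trunk hash described in 75ee267c22 above. It
assumes a fixed table size and hypothetical names (trunk_sketch,
vlan_sketch, trunk_insert, trunk_lookup); the commit's dynamic resizing,
the VLAN_ARRAY alternative and all locking are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define TRUNK_HASH_SIZE	16	/* hypothetical; the commit resizes dynamically */
#define TRUNK_HASH(tag)	((tag) & (TRUNK_HASH_SIZE - 1))

struct vlan_sketch {
	unsigned short		 tag;	/* 12-bit VLAN ID */
	struct vlan_sketch	*next;	/* hash chain */
};

/* One of these hangs off each parent interface instead of a global list. */
struct trunk_sketch {
	struct vlan_sketch	*hash[TRUNK_HASH_SIZE];
	int			 refcnt;	/* number of vlans configured */
};

static int
trunk_insert(struct trunk_sketch *t, struct vlan_sketch *v)
{
	struct vlan_sketch *p;

	assert(v->tag < 4096);
	for (p = t->hash[TRUNK_HASH(v->tag)]; p != NULL; p = p->next)
		if (p->tag == v->tag)
			return (-1);	/* tag already in use on this trunk */
	v->next = t->hash[TRUNK_HASH(v->tag)];
	t->hash[TRUNK_HASH(v->tag)] = v;
	t->refcnt++;
	return (0);
}

static struct vlan_sketch *
trunk_lookup(const struct trunk_sketch *t, unsigned short tag)
{
	struct vlan_sketch *p;

	for (p = t->hash[TRUNK_HASH(tag)]; p != NULL; p = p->next)
		if (p->tag == tag)
			return (p);
	return (NULL);
}
```

A lookup walks only the chain for one bucket of one trunk, rather than a
list of every vlan on the system.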
Yaroslav Tykhiy
83ec464f61 Be consistent in checking ifa->ifa_addr for NULL.
Found by:	Coverity Prevent (tm)
MFC after:	3 days
2006-01-23 10:30:34 +00:00
Ruslan Ermilov
4a0d6638b3 - Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
  through ifp anyway.  IF_LLADDR() works faster, and all (except
  one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
  and drop the IFP2ENADDR() macro; all users have been converted
  to use IF_LLADDR() instead.
2005-11-11 16:04:59 +00:00
Ruslan Ermilov
d09ed26fd8 - Make IFP2ENADDR() a pointer to IF_LLADDR() rather than another
copy of Ethernet address.

- Change iso88025_ifattach() and fddi_ifattach() to accept MAC
  address as an argument, similar to ether_ifattach(), to make
  this work.
2005-11-11 07:36:14 +00:00
Yaroslav Tykhiy
b5c8bd5924 Clean up consistency checks in if_setflag():
. use KASSERT for all checks so that the source of an error can be detected;
. use __func__ instead of spelling function name each time;
. fix a typo.
2005-10-03 02:14:51 +00:00
Yaroslav Tykhiy
7aebc5e86e Log a message about entering or leaving permanently promiscuous mode,
as it is done for usual promiscuous mode already.  This info is important
because promiscuous mode in the hands of a malicious party can jeopardize
the whole network.
2005-10-03 01:47:43 +00:00
Robert Watson
b1c53bc9c0 Take a first cut at cleaning up ifnet removal and multicast socket
panics, which occur when stale ifnet pointers are left in struct
moptions hung off of inpcbs:

- Add in_ifdetach(), which matches in6_ifdetach(), and allows the
  protocol to perform early tear-down on the interface early in
  if_detach().

- Annotate that if_detach() needs careful consideration.

- Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR --
  this is not the place to detect interface removal!  This also
  removes what is basically a nasty (and now unnecessary) hack.

- Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP
  IPv4 sockets.

It is now possible to run the msocket_ifnet_remove regression test
using HEAD without panicking.

MFC after:	3 days
2005-09-18 17:36:28 +00:00
Robert Watson
0a53be4671 In netkqfilter(), return EINVAL instead of 1 (EPERM) when a filter type
is requested on a network interface file descriptor that is non-applicable.

MFC after:	3 days
2005-09-12 19:26:03 +00:00
Sam Leffler
62313e4c3f reclaim sbuf and clear lock on error in ifconf
Submitted by:	Ted Unangst
Reviewed by:	rwatson
MFC after:	3 days
2005-09-04 17:32:47 +00:00
Brooks Davis
dc7c539e33 When we started calling if_findindex() from if_alloc() with an empty
struct ifnet most of if_findindex() became a complex no-op.  Remove it
and replace it with a corrected version of the four line for loop it
devolved to plus some error handling.  This should probably be replaced
with subr_unit at some point.

Switch from checking ifaddr_byindex to ifnet_byindex when looking for
empty indexes.  Since we're doing this from if_alloc/if_free, we can
only be sure that ifnet_byindex will be correct.  This fixes panics when
loading the ef(4) module.  The panics were caused by the fact that
if_alloc was called four times before if_attach was called and thus
ifaddr_byindex was not set and the same unit was allocated again.  This
in turn caused the first if_attach to fail because the ifp was not the
one in ifnet_byindex(ifp->if_index).

Reported by:	"Wojciech A. Koszek" <dunstan at freebsd dot czest dot pl>
PR:		kern/84987
MFC After:	1 day
2005-08-18 18:36:40 +00:00
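
The short search loop mentioned in dc7c539e33 above might look roughly
like the sketch below. The table, its size and the function names are
hypothetical. The point it illustrates is that the free-slot test and the
claim use the same table, so repeated allocations before any attach step
cannot hand out the same index twice.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXINDEX	8	/* hypothetical table size */

/* Slot 0 is unused; a NULL slot means the index is free. */
static void *ifnet_table[SKETCH_MAXINDEX + 1];

/* Return the lowest free index and claim it, or -1 if the table is full. */
static int
sketch_index_alloc(void *ifp)
{
	int idx;

	assert(ifp != NULL);
	for (idx = 1; idx <= SKETCH_MAXINDEX; idx++) {
		if (ifnet_table[idx] == NULL) {
			/* Claim at once so a second alloc cannot reuse it. */
			ifnet_table[idx] = ifp;
			return (idx);
		}
	}
	return (-1);
}

static void
sketch_index_free(int idx)
{
	assert(idx >= 1 && idx <= SKETCH_MAXINDEX);
	ifnet_table[idx] = NULL;
}
```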
Brooks Davis
7cf30146f0 - Move IF_ADDR_LOCK_DESTROY(ifp) from if_free to if_free_type.
- Add a note that additions should be made to if_free_type and not
  if_free to help avoid this in the future.

This apparently fixes a use after free in if_bridge and may fix bugs
in other direct if_free_type consumers.

Reported by:	thompsa
2005-08-16 17:02:35 +00:00
Robert Watson
292ee7be1c Rename IFF_RUNNING to IFF_DRV_RUNNING, IFF_OACTIVE to IFF_DRV_OACTIVE,
and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making
and documenting the locking of these flags the responsibility of the
device driver, not the network stack.  The flags for these two fields
will be mutually exclusive so that they can be exposed to user space as
though they were stored in the same variable.

Provide #defines to provide the old names #ifndef _KERNEL, so that user
applications (such as ifconfig) can use the old flag names.  Using the
old names in a device driver will result in a compile error in order to
help device driver writers adopt the new model.

When exposing the interface flags to user space, via interface ioctls
or routing sockets, bitwise-OR the two fields together.  Since the driver
flags cannot currently be set from user space, no new logic is currently
required to handle this case.

Add some assertions that general purpose network stack routines, such
as if_setflags(), are not improperly used on driver-owned flags.

With this change, a large number of very minor network stack races are
closed, subject to correct device driver locking.  Most were likely
never triggered.

Driver sweep to follow; many thanks to pjd and bz for the line-by-line
review they gave this patch.

Reviewed by:	pjd, bz
MFC after:	7 days
2005-08-09 10:16:17 +00:00
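
A small sketch of the split described in 292ee7be1c above: two flag words
with disjoint bit sets, combined only when reported to user space. The bit
values and the SK_ names here are hypothetical, chosen just so that the
two sets do not overlap.

```c
#include <assert.h>

/* Hypothetical bit values; what matters is that the sets are disjoint. */
#define SK_IFF_UP		0x0001	/* stack-owned */
#define SK_IFF_PROMISC		0x0100	/* stack-owned */
#define SK_IFF_DRV_RUNNING	0x0040	/* driver-owned */
#define SK_IFF_DRV_OACTIVE	0x0400	/* driver-owned */
#define SK_DRV_MASK		(SK_IFF_DRV_RUNNING | SK_IFF_DRV_OACTIVE)

struct ifnet_flags_sketch {
	int	if_flags;	/* protected by the network stack */
	int	if_drv_flags;	/* protected by the device driver */
};

/* Combined view, as it would be reported to user space. */
static int
sketch_user_flags(const struct ifnet_flags_sketch *ifp)
{
	/* Neither word may carry bits that belong to the other. */
	assert((ifp->if_flags & SK_DRV_MASK) == 0);
	assert((ifp->if_drv_flags & ~SK_DRV_MASK) == 0);
	return (ifp->if_flags | ifp->if_drv_flags);
}
```

Because the bit sets are mutually exclusive, the OR is lossless and user
space sees what looks like a single flags variable.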
Sam Leffler
456d182d5b destroy lock _before_ free'ing the structure it resides in 2005-08-06 18:42:01 +00:00
John Baldwin
6da3131abd Initialize the if_addr mutex in if_alloc() rather than waiting until
if_attach().  This allows ethernet drivers to use it in their routines
to program their MAC filters before ether_ifattach() is called (de(4) is
one such driver).  Also, the if_addr mutex is destroyed in if_free()
rather than if_detach(), so there was another potential bug in that a
driver that failed during attach and called if_free() without having
called ether_ifattach() would have tried to destroy an uninitialized mutex.

Reported by:	Holm Tiffe holm at freibergnet dot de
Discussed with:	rwatson
2005-08-04 14:39:47 +00:00
Robert Watson
c3b31afd92 Protect link layer network interface multicast address list manipulation
using ifp->if_addr_mtx:

- Initialize if_addr_mtx when ifnet is initialized.

- Destroy if_addr_mtx when ifnet is torn down.

- Rename ifmaof_ifpforaddr() to if_findmulti(); assert if_addr_mtx.
  Staticize.

- Extract ifmultiaddr allocation and initialization into if_allocmulti();
  accept a 'mflags' argument to indicate whether or not sleeping is
  permitted.  This centralizes error handling and address duplication.

- Extract ifmultiaddr tear-down and deallocation in if_freemulti().

- Re-structure if_addmulti() to hold if_addr_mtx around manipulation of
  the ifnet multicast address list and reference count manipulation.
  Make use of non-sleeping allocations.  Annotate the fact that we only
  generate routing socket events for explicit address addition, not
  implicit link layer address addition.

- Re-structure if_delmulti() to hold if_addr_mtx around manipulation of
  the ifnet multicast address list and reference count manipulation.
  Annotate the lack of a routing socket event for implicit link layer
  address removal.

- De-spl all and sundry.

Problem reported by:	Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after:		1 week
2005-08-02 23:23:26 +00:00
Robert Watson
2432c31c8b In multicast routines:
Compare pointers with NULL rather than treating them as booleans.

Compare pointers with NULL rather than 0 to make it more clear
they are pointers.

Assign pointers value of NULL rather than 0 to make it more clear
they are pointers.

MFC after:	3 days
2005-07-19 10:12:58 +00:00
Robert Watson
d8d5b10e84 Rename equal() macro to sa_equal(), which matches the definitions
of sa_equal() in other files, and makes it more clear what equal()
is comparing.

MFC after:	3 days
2005-07-19 10:03:47 +00:00
Max Laier
52023244de Move eventhandler for 'ifnet_departure_event' to the end of the process.
Some of the (IPv6) cleanup functions send packets to inform peers of the
departure.  These packets confused users of ifnet_departure_event (pf at the
moment).

PR:		kern/80627
Tested by:	Divacky Roman
MFC after:	1 week
2005-07-14 20:26:43 +00:00
Yaroslav Tykhiy
1a3b685942 MFp4:
- Introduce a helper function if_setflag() containing the code common
  to ifpromisc() and if_allmulti() instead of duplicating the code poorly,
  with different bugs.
- Call ifp->if_ioctl() in a consistent way: always use more compatible C
  syntax and check whether ifp->if_ioctl is not NULL prior to the call.

MFC after:	1 month
2005-07-14 13:56:51 +00:00
Suleiman Souhlal
571dcd15e2 Fix the recent panics/LORs/hangs created by my kqueue commit by:
- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by:	jmg
Approved by:	re (scottl), scottl (mentor override)
Pointyhat to:	ssouhlal
Will be happy:	everyone
2005-07-01 16:28:32 +00:00
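
The pluggable-lock idea in 571dcd15e2 above (three function pointers, with
NULL selecting the default) can be sketched like this. The toy lock types
and every name below are hypothetical stand-ins; the real defaults are the
mutex operations named in the commit message, which are not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default lock; the kernel would use a mutex here. */
struct toy_lock {
	int	held;
};

static void toy_lock_acquire(void *arg) { ((struct toy_lock *)arg)->held = 1; }
static void toy_lock_release(void *arg) { ((struct toy_lock *)arg)->held = 0; }
static int toy_lock_owned(void *arg) { return (((struct toy_lock *)arg)->held); }

/* A second, hypothetical lock type, standing in for the vnode lock. */
struct count_lock {
	int	depth;
};

static void count_lock_acquire(void *arg) { ((struct count_lock *)arg)->depth++; }
static void count_lock_release(void *arg) { ((struct count_lock *)arg)->depth--; }
static int count_lock_owned(void *arg) { return (((struct count_lock *)arg)->depth > 0); }

struct knlist_sketch {
	void	(*kl_lock)(void *);
	void	(*kl_unlock)(void *);
	int	(*kl_locked)(void *);
	void	*kl_lockarg;
};

/* NULL callbacks select the default lock operations. */
static void
sketch_knlist_init(struct knlist_sketch *kl, void *lockarg,
    void (*lock)(void *), void (*unlock)(void *), int (*locked)(void *))
{
	assert(kl != NULL && lockarg != NULL);
	kl->kl_lockarg = lockarg;
	kl->kl_lock = (lock != NULL) ? lock : toy_lock_acquire;
	kl->kl_unlock = (unlock != NULL) ? unlock : toy_lock_release;
	kl->kl_locked = (locked != NULL) ? locked : toy_lock_owned;
}
```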
Brooks Davis
1436936ab0 Spelling/grammar fixes in comment.
Reported by:	Hans Petter Selasky <hselasky at c2i dot net>
Approved by:	re (ifnet blanket)
2005-06-17 17:19:34 +00:00
Brooks Davis
28ef2db496 Return NULL instead of a bogus pointer from if_alloc when if_com_alloc
fails.

Move detaching the ifnet from the ifindex_table into if_free so we can
both keep the sanity checks and actually delete the ifnets. [0]

Reported by:	gallatin [0]
Approved by:	re (blanket)
2005-06-12 00:53:03 +00:00
Brooks Davis
fc74a9f93a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
Brooks Davis
9d80a3307a Send link state change notifications to /dev/devctl. This is needed to
start the OpenBSD dhclient when links come up.
2005-06-06 19:08:11 +00:00
Andrew Thompson
8f86751705 Add hooks into the networking layer to support if_bridge. This changes struct
ifnet so a buildworld is necessary.

Approved by:	mlaier (mentor)
Obtained from:	NetBSD
2005-06-05 03:13:13 +00:00
Peter Edwards
45778b37b2 Separate out address-detaching part of if_detach into if_purgeaddrs,
so if_tap doesn't need to rely on locally-rolled code to do same.

The observable symptom of if_tap's bzero'ing the address details
was a crash in "ifconfig tap0" after an if_tap device was closed.

Reported By: Matti Saarinen (mjsaarin at cc dot helsinki dot fi)
2005-05-25 13:52:03 +00:00
Gleb Smirnoff
68a3482f69 Do not call all link state callbacks directly, but schedule
a taskqueue(9) task. This fixes LORs and adds possibility
to serve such events pseudorecursively, when link state
change of interface causes subsequent change on other
interfaces.

Sponsored by:	Rambler
Reviewed by:	sam, brooks, mux
2005-04-20 09:30:54 +00:00
Colin Percival
fbd24c5ed6 Zero the ifr.ifr_name buffer in ifconf() in order to avoid
accidental disclosure of kernel memory to userland.

Security:	FreeBSD-SA-05:04.ifconf
2005-04-15 01:52:40 +00:00
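
The class of fix in fbd24c5ed6 above can be illustrated with a user-space
sketch: clear the whole record before filling it, so that bytes after the
name's terminating NUL are zero instead of leftover memory. The structure
and function names are hypothetical, and SK_IFNAMSIZ is an assumed size.

```c
#include <assert.h>
#include <string.h>

#define SK_IFNAMSIZ	16	/* assumed name buffer size */

struct ifreq_sketch {
	char	ifr_name[SK_IFNAMSIZ];
	int	ifr_value;
};

/*
 * Fill a record destined for user space.  Zeroing first guarantees that
 * the tail of the name buffer and any padding carry no stale contents.
 */
static int
sketch_fill_ifreq(struct ifreq_sketch *ifr, const char *name, int value)
{
	size_t len;

	assert(ifr != NULL && name != NULL);
	len = strlen(name);
	if (len >= sizeof(ifr->ifr_name))
		return (-1);		/* name too long */
	memset(ifr, 0, sizeof(*ifr));
	memcpy(ifr->ifr_name, name, len);
	ifr->ifr_value = value;
	return (0);
}
```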
Gleb Smirnoff
d4d2297060 ifma_protospec is a pointer. Use NULL when assigning or comparing it. 2005-03-20 14:31:45 +00:00
Gleb Smirnoff
5515c2e793 Add a sysctl net.link.log_link_state_change, which allows
logging of interface link state changes to be suppressed.

Requested by:	sam, kan
2005-03-12 12:58:03 +00:00
Brooks Davis
bc9d299133 Change the definition of struct if_data's member ifi_epoch from wall
clock time to uptime because wall clock time may go backwards.

This is a change in the API which will impact SNMP agents who are using
ifi_epoch to set RFC2233's ifCounterDiscontinuityTime.  None are known to
exist today.  This will not impact applications that are using the
<index, epoch> tuple to verify interface uniqueness except that it
eliminates a race which could lead to a false assumption of uniqueness.

Because this is a behavior change, bump __FreeBSD_version.

Discussed with:	re (jhb, scottl)
MFC after:	3 days
Pointed out by:	pkh (way back at EuroBSDCon)
Pointy hat:	brooks
2005-02-25 19:46:41 +00:00
Gleb Smirnoff
8b25904e36 Typo in comment. 2005-02-22 15:29:29 +00:00
Gleb Smirnoff
4d96314f88 - In if_link_state_change() extract function body from if-block, to improve
readability.
- Call carp_carpdev_state() from if_link_state_change() if interface has
  associated CARP interface.

Sponsored by:	Rambler
2005-02-22 14:21:59 +00:00
Gleb Smirnoff
a97719482d Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by:	mlaier
Obtained from:	OpenBSD (mickey, mcbride)
2005-02-22 13:04:05 +00:00
Xin LI
b0b4b28bf1 Validate ifc->ifc_len before passing it on to sbuf_new;
an invalid value would eventually lead to a kernel panic.

Security:	This prevents a local (root-launched) DoS
Submitted by:	Wojciech A. Koszek [dunstan at freebsd czest pl]
PR:		77421
MFC After:	1 week
2005-02-12 17:51:12 +00:00
Gleb Smirnoff
8b02df2485 Log changes of link state.
Reviewed by:	rwatson
2005-01-30 12:57:47 +00:00
Gleb Smirnoff
1c7899c74e This change adds reliability for Ethernet trunks built with ng_one2many:
- Introduce another ng_ether(4) callback ng_ether_link_state_p, which
  is called from if_link_state_change(), every time link is changed.
- In ng_ether_link_state() send netgraph control message notifying
  of link state change to a node connected to "lower" hook.

Reviewed by:	sam
MFC after:	2 weeks
2005-01-08 12:42:03 +00:00
Warner Losh
c398230b64 /* -> /*- for license, minor formatting changes 2005-01-07 01:45:51 +00:00
Sam Leffler
94f5c9cfc0 Cleanup link state change notification:
o add new if_link_state_change routine that deals with link state changes
o change mii to use if_link_state_change
2004-12-08 05:45:59 +00:00
Max Laier
69fb23b73d Implement the check I was talking about in the previous message already.
Introduce domain_init_status to keep track of the init status of the domains
list (surprise). 0 = uninitialized, 1 = initialized/unpopulated, 2 =
initialized/done. Higher values can be used to support late addition of
domains which right now "works", but is potentially dangerous. I chose to
only give a warning when doing so.

Use domain_init_status with if_attachdomain[1]() to ensure that we have a
complete domains list when we init the if_afdata array. Store the current
value of domain_init_status in if_afdata_initialized. This way we can update
if_afdata after a new protocol has been added (once that is allowed).

Submitted by:	se (with changes)
Reviewed by:	julian, glebius, se
PR:		kern/73321	(partly)
2004-11-30 22:38:37 +00:00
Robert Watson
6237419d5c Assign if_broadcastaddr to NULL not 0 in if_attach().
Printf() a warning if if_attachdomain() is called more than once on an
  interface to generate some noise on mailing lists when this occurs.

Fix up style in if_start(), where spaces crept in instead of tabs at
some point.

MFC after:	1 week
MFC note:	Not the printf().
2004-11-23 23:31:33 +00:00
Robert Watson
0b762445b9 Move if_handoff() from an inline in if_var.h to a function in if.c
in order to harden the ABI for 5.x; this will permit us to modify
the locking in the ifnet packet dispatch without requiring drivers
to be recompiled.

MFC after:	3 days
Discussed at:	EuroBSDCon Developer's Summit
2004-10-30 09:39:13 +00:00
Robert Watson
31302ebf9d Define IFF_LOCKGIANT() and IFF_UNLOCKGIANT() macros, which conditionally
acquire Giant if the passed interface has IFF_NEEDSGIANT set on it.
Modify calls into (ifp)->if_ioctl() in if.c to use these macros in order
to ensure that Giant is held.

MFC after:	3 days
Bumped into by:	jmg
2004-10-19 18:11:55 +00:00
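
A toy model of the conditional locking in 31302ebf9d above. It assumes a
counter in place of the real Giant mutex and uses hypothetical SK_ names
and flag values; it shows only the shape of "lock around the driver call
if, and only if, the interface asks for it".

```c
#include <assert.h>

#define SK_IFF_NEEDSGIANT	0x1	/* hypothetical bit value */

static int giant_depth;		/* toy stand-in for the Giant lock */

struct ifnet_giant_sketch {
	int	if_flags;
	int	(*if_ioctl)(struct ifnet_giant_sketch *, int);
};

/* Take or drop "Giant" only for interfaces that ask for it. */
#define SK_LOCKGIANT(ifp) do {					\
	if ((ifp)->if_flags & SK_IFF_NEEDSGIANT)		\
		giant_depth++;					\
} while (0)

#define SK_UNLOCKGIANT(ifp) do {				\
	if ((ifp)->if_flags & SK_IFF_NEEDSGIANT) {		\
		assert(giant_depth > 0);			\
		giant_depth--;					\
	}							\
} while (0)

/* Wrap a driver ioctl call in the conditional lock. */
static int
sketch_call_ioctl(struct ifnet_giant_sketch *ifp, int cmd)
{
	int error;

	SK_LOCKGIANT(ifp);
	error = ifp->if_ioctl(ifp, cmd);
	SK_UNLOCKGIANT(ifp);
	return (error);
}

/* Example driver method: reports whether "Giant" was held when called. */
static int
sketch_driver_ioctl(struct ifnet_giant_sketch *ifp, int cmd)
{
	(void)ifp;
	(void)cmd;
	return (giant_depth);
}
```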
Brian Feldman
5ed8cedc83 Call sbuf_finish() before sbuf_data() so as to not panic the system. 2004-09-22 12:53:27 +00:00
Brooks Davis
4dcf2bbbff Fix a LOR where ifconf() used copyout while holding a mutex. This LOR
was seen when configuring addresses on interfaces using ifconfig.  This
patch has been verified to work with over eight thousand addresses
assigned to an interface.

LOR id:		031
2004-09-22 08:59:41 +00:00
Brooks Davis
71672bb6f6 Log the renaming of an interface. This should make it easier to follow
kernel log files.
2004-09-18 05:02:08 +00:00
Brooks Davis
55287f2a60 Re-add ifi_epoch, to struct if_data, this time replacing ifi_unused
to avoid ABI changes.  It is set to the last time the interface
counters were zeroed, currently the time if_attach() was called.  It is
intended to be a valid value for RFC2233's ifCounterDiscontinuityTime
and to make it easier for applications to verify that the interface they
find at a given index is the one that was there last time they looked.

Due to space constraints ifi_epoch is a time_t rather than a struct
timeval.  SNMP would prefer higher precision, but this is unlikely to be
useful in practice.
2004-09-08 04:50:55 +00:00
John-Mark Gurney
9b90387dcf don't call f_detach if the filter has already removed the knote.. This
happens when a proc exits, but needs to inform the user that this has
happened..  This also means we can remove the check for detached from
proc and sig f_detach functions as this is done in kqueue now...

MFC after:	5 days
2004-09-06 19:02:42 +00:00
Brooks Davis
4ff62bd97b Back out ifi_epoch. The ABI breakage is too disruptive this close to
5-STABLE. ifi_epoch will shortly be reintroduced with less precision
using the space currently allocated to ifi_unused.
2004-09-02 05:07:29 +00:00
Max Laier
7b21048cea Fix an assertion when if_down()ing an ALTQ managed interface. The lock
should have been in place all the time; the mtx_assert in the ALTQ code just
discovered the shortcoming.

PR:		i386/71195
Tested by:	Bettan (PR originator), myself
MFC after:	5 days
2004-09-01 19:56:47 +00:00
Brooks Davis
9e734b4468 Use a spare byte in struct if_data to store the structure size without
increasing it.  Add code to ifconfig to use this size to find the
sockaddr_dl after the struct if_data in the routing message.  This
allows struct if_data to grow (up to 255 bytes) without breaking
ifconfig.

Submitted by:	peter
2004-09-01 18:22:14 +00:00
Brooks Davis
1fc4519b1d Add a new variable, ifi_epoch, to struct if_data. It is set to the last
time the interface counters were zeroed, currently the time if_attach()
was called.  It is intended to be a valid value for RFC2233's
ifCounterDiscontinuityTime and to make it easier for applications to
verify that the interface they find at a given index is the one that was
there last time they looked.

An if_epoch "compatibility" macro has not been created as ifi_epoch has
never been a member of struct ifnet.

Approved by:	andre, bms, wollman
2004-08-30 06:29:26 +00:00
Brooks Davis
b9907cd45b When detaching an interface, don't leave an obsolete pointer to the
soon to be deleted struct ifnet around.

PR:		kern/52260
MFC After:	3 days
2004-08-27 19:42:40 +00:00
John-Mark Gurney
ad3b9257c2 Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowledge of how things are
implemented from the drivers.  Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using its filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
acquire duplicate locks.  Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by:	green, rwatson (both earlier versions)
2004-08-15 06:24:42 +00:00
Peter Pentchev
3f35d5150b Do not attempt to clean up data that has not been initialized yet.
This fixes two kernel panics on boot when the xl driver fails to
allocate bus/port/memory resources.

Reviewed by:	silence on -net
2004-08-06 09:08:33 +00:00
Robert Watson
af5e59bf28 Add a new network interface flag, IFF_NEEDSGIANT, which will allow
device drivers to declare that the ifp->if_start() method implemented
by the driver requires Giant in order to operate correctly.

Add a 'struct task' to 'struct ifnet' that can be used to execute a
deferred ifp->if_start() in the event that if_start needs to be called
in a Giant-free environment.  To do this, introduce if_start(), a
wrapper function for ifp->if_start().  If the interface can run MPSAFE,
it directly dispatches into the interface start routine.  If it can't
run MPSAFE, we're running with debug.mpsafenet != 0, and Giant isn't
currently held, the task is queued to execute in a swi holding Giant
via if_start_deferred().

Modify if_handoff() to use if_start() instead of direct dispatch.
Modify 802.11 to use if_start() instead of direct dispatch.

This is intended to provide increased compatibility for non-MPSAFE
network device drivers in the presence of Giant-free operation via
asynchronous dispatch.  However, this commit does not mark any network
interfaces as IFF_NEEDSGIANT.
2004-07-27 23:20:45 +00:00
Robert Watson
8bbfdc98e4 Gratuitous whitespace change to un-wrap a short line. 2004-07-18 19:53:35 +00:00
Brooks Davis
f889d2ef8d Major overhaul of pseudo-interface cloning. Highlights include:
- Split the code out into if_clone.[ch].
 - Locked struct if_clone. [1]
 - Add a per-cloner match function rather than simply matching names of
   the form <name><unit> and <name>.
 - Use the match function to allow creation of <interface>.<tag>
   vlan interfaces.  The old way is preserved unchanged!
 - Also use the match function to allow creation of stf(4) interfaces named
   stf0, stf, or 6to4.  This is the only major user visible change in
   that "ifconfig stf" creates the interface stf rather than stf0 and
   does not print "stf0" to stdout.
 - Allow destroy functions to fail so they can refuse to delete
   interfaces.  Currently, we forbid the deletion of interfaces which
   were created in the init function, particularly lo0, pflog0, and
   pfsync0.  In the case of lo0 this was a panic implementation so it
   does not count as a user visible change. :-)
 - Since most interfaces do not need the new functionality, a family of
   wrapper functions, ifc_simple_*(), were created to wrap old style
   cloner functions.
 - The IF_CLONE_INITIALIZER macro is replaced with a new incompatible
   IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE
   instead.

Submitted by:   Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1]
Reviewed by:    andre, mlaier
Discussed on:	net
2004-06-22 20:13:25 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Max Laier
4cb655c020 Transform tbr_dequeue into a function pointer in order to build drivers with
ALTQ enabled versions of IFQ_* macros by default, as requested by several
others. This is a follow-up to the quick fix I committed yesterday which
turned off the ALTQ checks for non-ALTQ kernels.
2004-06-15 01:45:19 +00:00
Max Laier
02b199f158 Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
separate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by:	(i386)LINT
2004-06-13 17:29:10 +00:00
Luigi Rizzo
3fefbff0c2 arpcom untangling:
consistently with the rest of the code, use IFP2AC(ifp) to access
the arpcom structure given the ifp.

In this case also fix a difference in assumptions WRT the rest of
the net/ sources: it is not the 'struct *softc' that starts with a
'struct arpcom', but a 'struct arpcom' that starts with a
'struct ifnet'
2004-04-24 22:24:48 +00:00
Luigi Rizzo
f4247b5934 Fix a recently introduced panic in if_detach() by delaying
the invalidation of ifindex_table[] entry. Probably this
code should be moved even further down, but for the time being
let's do it this way.
2004-04-19 17:28:15 +00:00
Max Laier
8614fb12a0 Make if_(un)route static in if.c as they are called from if_up/if_down only.
This is also cleanup to make locking easier.

Reviewed by:	luigi
Approved by:	bms(mentor)
2004-04-18 18:59:44 +00:00
Luigi Rizzo
9046571f1c Use if_link instead of the alias if_list, and change a for() into
the TAILQ_FOREACH() form.

Comment the need to store the same info (mac address for ethernet-type
devices) in two different places.

No functional changes. Even the compiler output should be unmodified
by this change.
2004-04-16 10:32:13 +00:00
Luigi Rizzo
9b98ee2c4f Consistently use ifaddr_byindex() to access the link-level address
of an interface. No functional change.

On passing, comment a likely bug in net/rtsock.c:sysctl_ifmalist()
which, if confirmed, would deserve to be fixed and MFC'ed
2004-04-16 08:14:34 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Brooks Davis
bc1470f1f1 Don't allow interfaces to be renamed to the empty string.
While I'm here, errors aren't bools.

Pointed out by:	hmp
2004-03-13 02:35:03 +00:00
Brooks Davis
196f7f54d2 Remove if_withname. It came in with the KAME import, but never got
used.  Should someone need its functionality, it's a really expensive
implementation of:
	ifnet_byindex(sdl->sdl_index)

Reviewed by:    bde, ume
2004-03-13 02:31:40 +00:00
Max Laier
25a4adcec4 Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dialup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)
2004-02-26 04:27:55 +00:00
Poul-Henning Kamp
dc08ffec87 Device megapatch 4/6:
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
2004-02-21 21:10:55 +00:00
Yaroslav Tykhiy
913e410e29 Minor beautifications related to style(9) and code consistency.
No functional changes.
2004-02-21 12:56:09 +00:00
Yaroslav Tykhiy
efb4018be7 Improve the SIOCSIFCAP handler a bit:
- allow for ifp->if_ioctl being NULL, as the rest of ifioctl() does;
- give the interface driver a chance to report an error to the caller;
- don't forget to update ifp->if_lastchange upon successful modification
  of interface operation parameters.
2004-02-21 12:48:25 +00:00
Brooks Davis
36c19a572a Add the kernel side of network interface renaming support.
The basic process is to send a routing socket announcement that the
interface has departed, change if_xname, update the sockaddr_dl
associated with the interface, and announce the arrival of the interface
on the routing socket.

As part of this change, ifunit() is greatly simplified by testing
if_xname directly.  if_clone_destroy() now uses if_dname to look up the
cloner for the interface and if_dunit to identify the unit number.

Reviewed by:	ru, sam (concept)
		Vincent Jardin <vjardin AT free.fr>
		Max Laier <max AT love2party.net>
2004-02-04 02:54:25 +00:00
Brooks Davis
ccb82468ac More macro cleanup. Use the system roundup2() macro instead of making
our own ROUNDUP() macro.

Suggested by:	bde
2004-02-02 21:55:34 +00:00
Brooks Davis
a8773564ca Cleanup malloc() use in if_attach():
- malloc() returns a void* and does not need a cast
 - when called with M_WAITOK, malloc() can not return NULL so don't
   check for that case.  The result of the check was bogus anyway since
   it would leave the interface broken.
2004-01-27 19:35:05 +00:00
Brooks Davis
8abaf58586 Clean up macro usage in if_attach():
- Use the system offsetof macro rather than making our own.
 - undef ROUND after we use it rather then polluting the whole file.
2004-01-27 03:15:09 +00:00
Ruslan Ermilov
12b8b80e45 Don't panic if there are more than 255 interfaces in the system. 2004-01-23 15:53:23 +00:00
Brian Feldman
5d7252afab Don't truncate the interface name in ifunit(). It's now possible to query
"very long interface names", e.g.:
ndis_atheros0: flags=8847<UP,BROADCAST,DEBUG,RUNNING,SIMPLEX,MULTICAST> mtu 1500
2003-12-26 18:09:35 +00:00
Brooks Davis
9bf40ede4a Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration semantics.

Approved By:	re (in principle)
Reviewed By:	njl, imp
Tested On:	i386, amd64, sparc64
Obtained From:	NetBSD (if_xname)
2003-10-31 18:32:15 +00:00
Brooks Davis
13fb40df0a Replace a couple printfs with if_printfs. 2003-10-31 01:35:07 +00:00
Hajimu UMEMOTO
234a35c714 Since dp->dom_ifattach calls malloc() with M_WAITOK, we cannot
use mutex lock directly here.  Protect ifp->if_afdata instead.

Reported by:	grehan
2003-10-24 16:57:59 +00:00
Dag-Erling Smørgrav
72fd1b6a20 Clean up whitespace, remove "register" keyword, ANSIfy.
No functional changes.
2003-10-23 13:49:10 +00:00
Hajimu UMEMOTO
e115574c1d protect by IFNET_RLOCK. 2003-10-22 15:10:39 +00:00
Hajimu UMEMOTO
31b1bfe1b0 - add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from:	KAME
2003-10-17 15:46:31 +00:00
Hajimu UMEMOTO
212bd869db AF_LINK sockaddr has to be attached to ifp->if_addrlist until the
end, as much of the code assumes that TAILQ_FIRST(ifp->if_addrlist)
is non-null.

Submitted by:	itojun
2003-10-16 13:38:29 +00:00
Sam Leffler
d1dd20be6e Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents.  Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
  may be returned; this was never used and added unwarranted complexity
  for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
  we assume the parent will remain as long as the clone; doing this avoids
  a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
   applications cannot/do not know about mutexes.  Doing this requires
   that the mutex be the last element in the structure.  A better solution
   is to introduce an externalized version of struct rtentry but this is
   a major task because of the intertwining of rtentry and other data
   structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
   work to eliminate many held references.  If not these will be resolved
   prior to release.
3. ATM changes are untested.

Sponsored by:	FreeBSD Foundation
Obtained from:	BSD/OS (partly)
2003-10-04 03:44:50 +00:00
Poul-Henning Kamp
ed692400eb I don't know from where the notion that device driver should or
even could call VOP_REVOKE() on vnodes associated with its dev_t's
has originated, but it stops right here.

If there are things people believe destroy_dev() needs to learn how to
do, please tell me about it, preferably with a reproducible test case.

Include <sys/uio.h> in bluetooth code rather than rely on <sys/vnode.h>
to do so.

The fact that some of the USB code needs to include <sys/vnode.h>
still disturbs me greatly, but I do not have time to chase that.
2003-09-28 20:48:13 +00:00
Hajimu UMEMOTO
89eaef50bb Disabling multicast on vlan interface caused kernel panic.
PR:		kern/40723
Submitted by:	Hideki ONO <ono@kame.net>
MFC after:	1 week
2003-07-19 16:47:16 +00:00
Mark Murray
51da11a27a Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.
2003-04-30 12:57:40 +00:00
John Baldwin
31566c96f4 Use td->td_ucred instead of td->td_proc->p_ucred. 2003-03-20 21:17:40 +00:00
Poul-Henning Kamp
d42ee4e410 Note that MAJOR_AUTO is now the default if d_maj is not initialized. This
is more robust and prevents the hijacking of /dev/console for the typical
mistake.

Remove unneeded MAJOR_AUTO uses, it is only needed explicitly now if the
driver source has cross-branch compatibility to old releases.
2003-03-09 11:03:45 +00:00
Poul-Henning Kamp
182a9f7455 Make nokqfilter() return the correct return value.
Ditch the D_KQFILTER flag which was used to prevent calling NULL pointers.
2003-03-03 16:24:47 +00:00
Poul-Henning Kamp
7ac40f5f59 Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initialization and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by:    re(scottl)
2003-03-03 12:15:54 +00:00
Maxime Henrion
7e1f8a0b2f Make the network /dev entries use MAJOR_AUTO. 2003-02-28 18:04:42 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
Max Khon
6cdcc15976 - add support for IPX (tested with mount -t nwfs and mars_nwe),
IP fast forwarding, SIOCGIFADDR, setting hardware address (not currently
enabled in cm driver), multicasts (experimental)
- add ARC_MAX_DATA, use IF_HANDOFF, remove arc_sprintf() and some unused
variables
- if_simloop logic is made more similar to ethernet
- drop packets that are not ours early (if we are not in promiscuous mode)

Submitted by:	mark tinguely (partially)
2003-01-24 01:32:20 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Jeffrey Hsu
956b0b653c SMP locking for radix nodes. 2002-12-24 03:03:39 +00:00
Jeffrey Hsu
b30a244c34 SMP locking for ifnet list. 2002-12-22 05:35:03 +00:00
Jeffrey Hsu
19fc74fb60 Lock up ifaddr reference counts. 2002-12-18 11:46:59 +00:00
Sam Leffler
0f43e1aada Back out rev 1.150; things are more complicated than this. 2002-11-15 18:42:10 +00:00
Sam Leffler
10ed96fd9c if_attach should not sleep; change malloc's M_WAITOK to M_NOWAIT 2002-11-15 18:35:41 +00:00
Brooks Davis
fa882e87a5 Add a new helper function if_printf() modeled on device_printf(). The
function takes a struct ifnet pointer followed by the usual printf
arguments and prints "<interfacename>: " before the results of printf.
Since this is the primary form of printf calls in network device drivers
and accounts for most uses of the ifnet member if_unit, this
significantly simplifies many printf()s.
2002-09-24 17:35:08 +00:00
Juli Mallett
6e82956c21 Clean up a comment talking about C strings, which are terminated with the
ASCII NUL character (0, or '\0' in C).
2002-08-19 17:20:03 +00:00
Maxim Sobolev
ffb079be0c Implement user-settable promiscuous mode (a new `promisc' flag for ifconfig(8)).
Also, for all interfaces in this mode pass all ethernet frames to upper layer,
even those not addressed to our own MAC, which allows packets encapsulated
in those frames be processed with packet filters (ipfw(8) et al).

Emphatically requested by:	Anton Turygin <pa3op@ukr-link.net>
Valuable suggestions by:	fenner
2002-08-19 15:16:38 +00:00
Maxim Sobolev
62f7648682 Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by:	-hackers, -net
2002-08-18 07:05:00 +00:00
Robert Watson
8f293a63ce Introduce support for Mandatory Access Control and extensible
kernel access control.

Introduce two ioctls, SIOCGIFMAC, SIOCSIFMAC, which permit user
processes to manage the MAC labels on network interfaces.  Note
that this is part of the user process API/ABI that will be revised
prior to 5.0-RELEASE.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-08-01 21:15:53 +00:00
Robert Watson
e70cd26366 Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the interface management code so that MAC labels are
properly maintained on network interfaces (struct ifnet).  In
particular, invoke entry points when interfaces are created and
removed.  MAC policies may initialize the label interface based
on a variety of factors, including the interface name.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2002-07-31 16:16:03 +00:00
Jonathan Mini
13990766ef Check retifma for NULL before using it.
PR:		kern/9391
Submitted by:	Assar Westerlund <assar@sics.se>
MFC after:	3 days
2002-07-02 08:23:00 +00:00
Brooks Davis
ae5a19be8e Move all unit number management for cloned interfaces into the cloning
code.  This reverts the API change which made the <if>_clone_destroy()
functions return an int instead of void, bringing us into closer
alignment with NetBSD.

Reviewed by:	net (a long time ago)
2002-05-25 20:17:04 +00:00
SUZUKI Shinsuke
88ff5695c1 just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by:	ume
MFC after:	1 week
2002-04-19 04:46:24 +00:00
Peter Wemm
d637e9891d Add missing 'struct ifreq ifr;' that was forgotten in the last commit. 2002-04-10 06:07:16 +00:00
SUZUKI Shinsuke
ee0a4f7ee7 fixed a kernel crash when enabling multicast on vlan interface
owing to a NULL argument to vlan_ioctl() at if_allmulti().

Reviewed by:    ume
MFC after:   	1 week
2002-04-10 04:18:42 +00:00
John Baldwin
6008862bc2 Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on:	i386, alpha, sparc64
2002-04-04 21:03:38 +00:00
John Baldwin
44731cab3b Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API.  The entire API now consists of two functions
similar to the pre-KSE API.  The suser() function takes a thread pointer
as its only argument.  The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0.  The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on:	smp@
2002-04-01 21:31:13 +00:00
Hajimu UMEMOTO
c61cd599ec Make `route add -inet6 default ::1 -ifp gif0' work actually.
The change between 1.13 and 1.14 is specific to AF_INET.

MFC after:	1 week
2002-04-01 16:17:13 +00:00
Alfred Perlstein
929ddbbb89 Remove __P. 2002-03-19 21:54:18 +00:00
Maxime Henrion
3b16e7b252 Simplify the interface cloning framework by handling
unit allocation with a bitmap in the generic layer.  This
allows us to get rid of the duplicated rman code in every
clonable interface.

Reviewed by:	brooks
Approved by:	phk
2002-03-11 09:26:07 +00:00
Brian Feldman
0346e9733a Use revoke_and_destroy_dev() instead of destroy_dev() when removing /dev/net
pseudo-devices when an interface goes away.  Otherwise, an open /dev/net/foo0
when the interface is removed can cause a crash.

Not objected to by:	jlemon
2002-03-05 17:50:35 +00:00
Brooks Davis
b75496fedf Change the network interface cloning API so the destroy function returns
an int errorcode instead of void in preparation for merging cloning of
the loopback device.

Submitted by:	mux
MFC after:	2 weeks
2002-03-04 21:43:49 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Peter Wemm
c0933269c3 Fix a warning by pulling prototype for arp_ifinit() into scope.
Then fix a cast of the correct value into an incorrect value, which was not
detected due to the missing prototype (but was harmless anyway).
2002-02-26 01:11:08 +00:00
Luigi Rizzo
b2c08f43d0 When the local link address is changed, send out gratuitous ARPs
to notify other nodes about the address change. Otherwise, they
might try and keep using the old address until their arp table
entry times out and the address is refreshed.

Maybe this ought to be done for INET6 addresses as well but i have
no idea how to do it. It should be pretty straightforward though.

MFC-after: 10 days
2002-02-18 22:50:13 +00:00
Ruslan Ermilov
7b6edd044b Introduce an interface announcement message for the routing
socket so that routing daemons and other interested parties
know when an interface is attached/detached.

PR:		kern/33747
Obtained from:	NetBSD
MFC after:	2 weeks
2002-01-18 14:33:04 +00:00
Jonathan Lemon
de5934508a Add a SIOCGIFINDEX ioctl, which returns the index of a named interface.
This will be used to more efficiently support if_nametoindex(3).
2001-10-17 19:40:44 +00:00
Jonathan Lemon
10930aad3f Cleanup ifunit(), so it uses the dev_named() function to map an interface
name into a device.
2001-10-17 18:58:14 +00:00
Ruslan Ermilov
8071913df2 Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.
Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument.  Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest.  3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost no one is
using it anyways).

Benefit: the following command now works.  Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''.  It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from:	BSD/OS, NetBSD
MFC after:	1 month
PR:		kern/28360
2001-10-17 18:07:05 +00:00
Ruslan Ermilov
66afbd6890 Revision 1.13 corresponded to CSRG revision 8.4.
Revision 1.59 corresponded to CSRG revision 8.5.
2001-10-17 10:41:00 +00:00
Bill Fenner
05153c617d if_index is the highest interface index in the system, not the next
available index.
2001-10-17 04:23:14 +00:00
Max Khon
322dcb8d3d bring in ARP support for variable length link level addresses
Reviewed by:	jdp
Approved by:	jdp
Obtained from:	NetBSD
MFC after:	6 weeks
2001-10-14 20:17:53 +00:00
Jonathan Lemon
d2b4566aa6 Fix the ``WARNING: Driver mistake: repeat make_dev'', caused by using
the wrong index variable within a loop.  I have no idea how this managed
to work on my test box.

Spotted by: fenner
2001-10-11 18:39:05 +00:00
Jonathan Lemon
ffb5a10458 Move device nodes into a /dev/net/ directory, to avoid conflict with
existing devices (e.g.: tunX).  This may need a little more thought.

Create a /dev/netX alias for devices.  net0 is reserved.

Allow wiring of net aliases in /boot/device.hints of the form:
	hint.net.1.dev="lo0"
	hint.net.12.ether="00:a0:c9:c9:9d:63"
2001-10-11 05:54:39 +00:00
Jonathan Lemon
9a2a57a1de Add ability to attach knotes to network devices.
Introduce EVFILT_NETDEV to report network device changes.
2001-09-29 18:32:35 +00:00
Jonathan Lemon
f13ad20660 Introduce network device nodes. Network devices will now automatically
appear in /dev.  Interface hardware ioctls (not protocol or routing) can
be performed on the descriptor.  The SIOCGIFCONF ioctl may be performed
on the special /dev/network node.
2001-09-29 05:55:04 +00:00