Commit Graph

146 Commits

Author SHA1 Message Date
Gleb Smirnoff
eaf151c49d Improve style(9) of bcopy() to and from mbuf tag.
Submitted by:	bde
2012-05-30 13:51:00 +00:00
Gleb Smirnoff
a856ddc665 After r228571 carp_output() expects carp_softc * pointer in the mtag.
Noticed by:	thompsa
2012-05-30 07:11:27 +00:00
Gleb Smirnoff
a9a2c40ced It is a logical error that in carp_multicast_cleanup()
we look at count of addresses on a particular vhid, we
should account number of addresses on cif.

To achieve this we need to run carp_attach() and
carp_detach() under appropriate cif lock.
2012-04-11 12:26:30 +00:00
Gleb Smirnoff
9ca1fe0db8 CARP should be capable to run on if_bridge(4). Unfortunately,
this commit is not enough to enable CARP operation on
if_bridge(4), because the latter doesn't handle or even
initialize its ifp->if_link_state.

Reported by:	Alexander Lunev <sol289 gmail.com>
2012-04-10 05:42:48 +00:00
Gleb Smirnoff
afdbac98df Set vnet context in callouts and taskqueues.
PR:		164696
2012-02-08 13:39:38 +00:00
Gleb Smirnoff
2512b096a2 o Provide functions carp_ifa_addroute()/carp_ifa_delroute()
to cleanup routes from a single ifa.
o Implement carp_addroute()/carp_delroute() via above functions.
o Call carp_ifa_delroute() in the carp_detach() to avoid
  junk routes left in routing table, in case if user
  removes an address in a MASTER state. [1]

Reported by:	az [1]
2012-02-01 11:07:41 +00:00
John Baldwin
137f91e80f Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by:	bz
MFC after:	2 weeks
2012-01-05 19:00:36 +00:00
Gleb Smirnoff
1c435c73a1 Use a better log message for master down event. 2011-12-22 18:48:21 +00:00
Gleb Smirnoff
f08535f872 Restore a feature that was present in 5.x and 6.x, and was cleared in
7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption, while it is running its bulk update.

However, reimplement the feature in more elegant manner, that is
partially inspired by newer OpenBSD:

- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several
  conditions, for now these are:
  - interface goes down
  - carp(4) has problems with ip_output() or ip6_output()
  - pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but
  is actual value added to advskew. The adjustment values for
  particular error conditions are also configurable, and their
  defaults are maximum advskew value, so a single failure bumps
  demotion to maximum. This is for POLA compatibility, and should
  satisfy most users.
- Demotion factor is a writable sysctl, so user can do
  foot shooting, if he desires to.
2011-12-20 13:53:31 +00:00
Gleb Smirnoff
08b68b0e4c A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR:		kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by:	bz
Submitted by:	bz [1]
2011-12-16 12:16:56 +00:00
Brooks Davis
4b22573a89 In r191367 the need for if_free_type() was removed and a new member
if_alloctype was used to store the origional interface type.  Take
advantage of this change by removing all existing uses of if_free_type()
in favor of if_free().

MFC after:	1 Month
2011-11-11 22:57:52 +00:00
Gleb Smirnoff
2a2e6f0aeb Never switch directly from INIT to MASTER, since this produces
nasty status flaps.

PR:		kern/161123
Submitted by:	Damien Fleuriot <dam my.gd>
OpenBSD:	ip_carp.c, rev. 1.115
2011-10-14 19:05:26 +00:00
Bjoern A. Zeeb
a0ae8f04e8 Make various (pseudo) interfaces compile without INET in the kernel
adding appropriate #ifdefs.  For module builds the framework needs
adjustments for at least carp.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-27 19:30:44 +00:00
Gleb Smirnoff
0715546197 Redo r166423. It is important not only skip freeing multicast
entires when underlying interface is detached, but also purge
pointers to them, to avoid double-free in future.
2010-11-24 05:24:36 +00:00
Gleb Smirnoff
6baf7a243a Do not convert some meaningful error value to EINVAL.
Reviewed by:	will
2010-09-20 12:23:10 +00:00
Will Andrews
15249f73e9 Fix CARP in backup mode by properly registering its hooks for INET and INET6
using ipproto_{un,}register() and the newly created ip6proto_{un,}register()
so that it can again receive IPPROTO_CARP packets allowing its state machine
to work.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-09-06 21:06:06 +00:00
Will Andrews
e24fa11d3e Fix static kernel builds with carp(4) by changing its SYSINIT order so that
it is initialized after basic protocol initialization, which allows it to
register via pf_proto_register().

Reviewed by:	bz
Approved by:	ken (mentor)
2010-09-06 21:03:30 +00:00
Will Andrews
9963e8a52c Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-08-11 20:18:19 +00:00
Will Andrews
54bfbd5153 Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
Xin LI
9fe5092de1 Address an edge condition that we found at work, where the carp(4)
interface goes to issue LINK_UP, then LINK_DOWN, then LINK_UP at
cold boot.  This behavior is not observed when carp(4) interface
is created slightly later, when the underlying interface is fully
up.

Before this change what happen at boot is roughly:

 - ifconfig creates em0 interface;
 - ifconfig clones a carp device using em0;
   (em0's link state is DOWN at this point)
 - carp state: INIT -> BACKUP [*]
 - carp state: BACKUP -> MASTER
 - [Some negotiate between em0 and switch]
 - em0 kicks up link state change event
   (em0's link state is now up DOWN at this point)
 - do_link_state_change() -> carp_carpdev_state()
 - carp state: MASTER -> INIT (via carp_set_state(sc, INIT)) [+]
 - carp state: INIT -> BACKUP
 - carp state: BACKUP -> MASTER

At the [*] stage, em0 did not received any broadcast message from other
node, and assume our node is the master, thus carp(4) sets the link
state to "UP" after becoming a master.  At [+], the master status
is forcely set to "INIT", then an election is casted, after which our
node would actually become a master.

We believe that at the [*] stage, the master status should remain as
"INIT" since the underlying parent interface's link state is not up.

Obtained from:	iXsystems, Inc.
Reported by:	jpaetzel
MFC after:	2 months
2010-08-08 07:04:27 +00:00
Ruslan Ermilov
acc0fee071 Complete the swap of carp(4) log levels and document the change.
MFC after:	3 days
2010-01-08 16:14:41 +00:00
Gleb Smirnoff
e81ab87652 Until this moment carp(4) used a strange logging priority. It used debug
priority for such important information as MASTER/BACKUP state change,
and used a normal logging priority for such innocent messages as receiving
short packet (which is a normal VRRP packet between some other routers) or
receving a CARP packet on non-carp interface (someone else running CARP).

This commit shifts message logging priorities to a more sane default.
2009-12-02 13:24:21 +00:00
Will Andrews
52e12426d1 Fix CARP memory leaks on carp_if's malloc'd using M_CARP. This occurs when
CARP tries to free them using M_IFADDR after the last address for a virtual
host is removed and when detaching from the parent interface.

Reviewed by:	mlaier
Approved by:	re (kib), ken (mentor)
2009-08-20 02:33:12 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Xin LI
16e324fc56 Show interface name which received short CARP packet (e.g. a VRRP packet),
in order to match other codepaths nearby.  This makes troubleshooting
easier.

Approved by:	re (kib)
MFC after:	1 month
2009-07-30 17:40:47 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Robert Watson
d1da0a0672 Add address list locking for in6_ifaddrhead/ia_link: as with locking
for in_ifaddrhead, we stick with an rwlock for the time being, which
we will revisit in the future with a possible move to rmlocks.

Some pieces of code require significant further reworking to be
safe from all classes of writer-writer races.

Reviewed by:	bz
MFC after:	6 weeks
2009-06-25 16:35:28 +00:00
Robert Watson
2d9cfabad4 Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support.  As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done.  This means that one
class of reader-writer races still exists.

MFC after:      6 weeks
Reviewed by:    bz
2009-06-25 11:52:33 +00:00
Robert Watson
32187eb6d9 Fix CARP build.
Reported by:	bz
2009-06-24 21:34:38 +00:00
Robert Watson
80af0152f3 Convert netinet6 to using queue(9) rather than hand-crafted linked lists
for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead).  Adopt
the code styles and conventions present in netinet where possible.

Reviewed by:	gnn, bz
MFC after:	6 weeks (possibly not MFCable?)
2009-06-24 21:00:25 +00:00
Robert Watson
8c0fec805f Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references.  The following routines now return references:

  ifaddr_byindex
  ifa_ifwithaddr
  ifa_ifwithbroadaddr
  ifa_ifwithdstaddr
  ifa_ifwithnet
  ifaof_ifpforaddr
  ifa_ifwithroute
  ifa_ifwithroute_fib
  rt_getifa
  rt_getifa_fib
  IFP_TO_IA
  ip_rtaddr
  in6_ifawithifp
  in6ifa_ifpforlinklocal
  in6ifa_ifpwithaddr
  in6_ifadd
  carp_iamatch6
  ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

  IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc).  In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit.  Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by:	bz
Obtained from:	Apple, Inc. (portions)
MFC after:	6 weeks (portions)
2009-06-23 20:19:09 +00:00
Bruce M Simpson
33cde13046 Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev.  Summary of changes:

 * Connect netinet6/in6_mcast.c to build.
   The legacy KAME KPIs are mostly preserved.
 * Eliminate now dead code from ip6_output.c.
   Don't do mbuf bingo, we are not going to do RFC 2292 style
   CMSG tricks for multicast options as they are not required
   by any current IPv6 normative reference.
 * Refactor transports (UDP, raw_ip6) to do own mcast filtering.
   SCTP, TCP unaffected by this change.
 * Add ip6_msource, in6_msource structs to in6_var.h.
 * Hookup mld_ifinfo state to in6_ifextra, allocate from
   domifattach path.
 * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
   Kernel consumers which need this should use in6m_lookup().
 * Refactor IPv6 socket group memberships to use a vector (like IPv4).
 * Update ifmcstat(8) for IPv6 SSM.
 * Add witness lock order for IN6_MULTI_LOCK.
 * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
 * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
 * Update carp(4) for new IPv6 SSM KPIs.
 * Virtualize ip6_mrouter socket.
   Changes mostly localized to IPv6 MROUTING.
 * Don't do a local group lookup in MROUTING.
 * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
 * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
 * Bump __FreeBSD_version to 800084.
 * Update UPDATING.

NOTE WELL:
 * This code hasn't been tested against real MLDv2 queriers
   (yet), although the on-wire protocol has been verified in Wireshark.
 * There are a few unresolved issues in the socket layer APIs to
   do with scope ID propagation.
 * There is a LOR present in ip6_output()'s use of
   in6_setscope() which needs to be resolved. See comments in mld6.c.
   This is believed to be benign and can't be avoided for the moment
   without re-introducing an indirect netisr.

This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
Robert Watson
db091502fb Acquire IF_ADDR_LOCK() around most iterations over ifp->if_addrhead
(colloquially known as if_addrlist).  Currently not acquired around
interface address loops that call out to the routing code due to
potential lock order issues.

MFC after:	3 weeks
2009-04-26 19:05:40 +00:00
Kip Macy
279aa3d419 Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by:	rwatson
2009-04-16 20:30:28 +00:00
Robert Watson
6bf65bcf3a Update stats in struct carpstats using two new macros: CARPSTATS_ADD()
and CARPSTATS_INC(), rather than directly manipulating the fields of
the structure.  This will make it easier to change the implementation
of these statistics, such as using per-CPU versions of the data
structure.

MFC after:	3 days
2009-04-12 14:19:37 +00:00
Qing Li
6e6b3f7cbc This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle, Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
  provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
  me maintaining that branch before the svn conversion
2008-12-15 06:10:57 +00:00
Gleb Smirnoff
0b476f1cce In a case of CARP status change run through the if_link_state_change()
routine, so that devd(8) and others are notified about link state change.
2008-12-05 14:37:14 +00:00
Bjoern A. Zeeb
4b79449e2f Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Ermal Luçi
7972c979c5 Fix carp(4) panics that can occur during carp interface configuration.
Approved by:	mlaier (mentor)
Reported by:	Scott Ullrich
MFC after:	1 week
2008-07-14 20:11:51 +00:00
Max Laier
1ead26d4e1 Sort IP addresses before hashing them for the signature. Otherwise carp is
sensitive to address configuration order.

PR:		kern/121574
Reported by:	Douglas K. Rand, Wouter de Jong
Obtained from:	OpenBSD (rev 1.114 + fixes)
MFC after:	2 weeks
2008-06-02 18:58:07 +00:00
Gleb Smirnoff
e60a0104f8 If the vhid already present, return EEXIST instead of
non-informative EINVAL.
2008-02-07 13:18:59 +00:00
Mike Silbersack
4b421e2daa Add FBSDID to all files in netinet so that people can more
easily include file version information in bug reports.

Approved by:	re (kensmith)
2007-10-07 20:44:24 +00:00
Robert Watson
c6b2899785 Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove
definition of NET_CALLOUT_MPSAFE, which is no longer required now that
debug.mpsafenet has been removed.

The once over:	bz
Approved by:	re (kensmith)
2007-07-28 07:31:30 +00:00
Bruce M Simpson
71498f308b Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
 * IPv4 multicast socket processing is now moved out of ip_output.c
   into a new module, in_mcast.c.
 * The in_mcast.c module implements the IPv4 legacy any-source API in
   terms of the protocol-independent source-specific API.
 * Source filters are lazy allocated as the common case does not use them.
   They are part of per inpcb state and are covered by the inpcb lock.
 * struct ip_mreqn is now supported to allow applications to specify
   multicast joins by interface index in the legacy IPv4 any-source API.
 * In UDP, an incoming multicast datagram only requires that the source
   port matches the 4-tuple if the socket was already bound by source port.
   An unbound socket SHOULD be able to receive multicasts sent from an
   ephemeral source port.
 * The UDP socket multicast filter mode defaults to exclusive, that is,
   sources present in the per-socket list will be blocked from delivery.
 * The RFC 3678 userland functions have been added to libc: setsourcefilter,
   getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
 * Definitions for IGMPv3 are merged but not yet used.
 * struct sockaddr_storage is now referenced from <netinet/in.h>. It
   is therefore defined there if not already declared in the same way
   as for the C99 types.
 * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
   which are then interpreted as interface indexes) is now deprecated.
 * A patch for the Rhyolite.com routed in the FreeBSD base system
   is available in the -net archives. This only affects individuals
   running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
 * Make IPv6 detach path similar to IPv4's in code flow; functionally same.
 * Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from:  p4://bms_netdev
Submitted by:   Wilbert de Graaf (original work)
Reviewed by:    rwatson (locking), silence from fenner,
		net@ (but with encouragement)
2007-06-12 16:24:56 +00:00
Gleb Smirnoff
e9bf9fb67c Do not leak lock in the case of EEXIST error.
PR:		kern/92776
Submitted by:	Ed Schouten <Ed.Schouten tunix.nl>
2007-06-06 14:21:49 +00:00
Gleb Smirnoff
fbfdcf8735 Since rev. 1.94 of netinet/in.c, the netinet layer frees all its
multicast memberships, when interface is detached. Thus, when
an underlying interface is detached, we do not need to free
our multicast memberships.

Reviewed by:	bms
2007-02-02 09:39:09 +00:00
Gleb Smirnoff
3cf0d02480 Make it possible that carpdetach() unlocks on return. Then, in
carp_clone_destroy() we are on a safe side, we don't need to
unlock the cif, that can me already non-existent at this point.

Reported by:	Anton Yuzhaninov <citrin rambler-co.ru>
2007-01-25 18:03:40 +00:00
Gleb Smirnoff
62dae1e917 Spacing. 2007-01-25 17:58:16 +00:00
Robert Watson
acd3428b7d Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
Bjoern A. Zeeb
7002145d8e Set scope on MC address so IPv6 carp advertisement will not get dropped
in ip6_output. In case this fails  handle the error directly and log it[1].
In addition permit CARP over v6 in ip_fw2.

PR:                     kern/98622
Similar patch by:       suz
Discussed with:         glebius [1]
Tested by:              Paul.Dekkers surfnet.nl, Philippe.Pegon crc.u-strasbg.fr
MFC after:              3 days
2006-10-07 10:19:58 +00:00
Bruce M Simpson
13c8384424 Fix an incompatibility between CARP and IPv4 multicast routing, whereby
the VRRPv2 advertisements will originate from the wrong source address.
This only affects kernels compiled with MROUTING and after the MRT_INIT
ioctl() has been issued.
Set imo_multicast_vif in carp's softc to the invalid value -1 after it is
zeroed by softc allocation, to stop the ip_output() path looking up the
incorrect source address thinking a vif is set.

PR:		kern/100532
Submitted by:	Bohus Plucinsky
MFC after:	1 week
2006-09-25 11:53:54 +00:00
Sam Leffler
6b7330e2d4 Revise network interface cloning to take an optional opaque
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)

Reviewed by:	arch@
2006-07-09 06:04:01 +00:00
Max Laier
05206588f2 Make in-kernel multicast protocols for pfsync and carp work after enabling
dynamic resizing of multicast membership array.

Reported and testing by:	Maxim Konovalov, Scott Ullrich
Reminded by:			thompsa
MFC after:			2 weeks
2006-07-08 00:01:01 +00:00
Christian S.J. Peron
16d878cc99 Fix the following bpf(4) race condition which can result in a panic:
(1) bpf peer attaches to interface netif0
	(2) Packet is received by netif0
	(3) ifp->if_bpf pointer is checked and handed off to bpf
	(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
	    initialized to NULL.
	(5) ifp->if_bpf is dereferenced by bpf machinery
	(6) Kaboom

This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.

Summary of changes:

- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
  bpf interface structure. Once this is done, ifp->if_bpf should never be
  NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
  a lockless read bpf peer list associated with the interface. It should
  be noted that the bpf code will pickup the bpf_interface lock before adding
  or removing bpf peers. This should serialize the access to the bpf descriptor
  list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
  can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present

Now what happens is:

	(1) Packet is received by netif0
	(2) Check to see if bpf descriptor list is empty
	(3) Pickup the bpf interface lock
	(4) Hand packet off to process

From the attach/detach side:

	(1) Pickup the bpf interface lock
	(2) Add/remove from bpf descriptor list

Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).

[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
    not we have any bpf peers that might be interested in receiving packets.

In collaboration with:	sam@
MFC after:	1 month
2006-06-02 19:59:33 +00:00
Gleb Smirnoff
0fa0801895 o Introduce carp_multicast_cleanup(), which removes and frees
multicast addresses from carp interface. [1]
o Rewrite carpdetach(), so that it does the following things: [1]
  - Stops callouts.
  - Decrements carp_suppress_preempt, if needed.
  - Downs interface and sets CARP state to INIT.
  - Calls carp_multicast_cleanup().
  - Detaches softc from carp_if and if we are the last frees
    the carp_if.
o Use new carpdetach() in carp_clone_destroy().
o In carp_ifdetach() acquire the carp_if lock and cleanup all
  interfaces hanging on carp_if. [1]
o Make carp_ifdetach() static and use EVENT(9) to call it
  from if_detach(). [2]
o In carp_setrun() exit if the softc doesn't have a valid pointer
  to parent. [1]

Obtained from:	OpenBSD [1]
Submitted by:	Dan Lukes <dan obluda.cz> [2]
PR:		kern/82908 [2]
2006-03-21 14:29:48 +00:00
Gleb Smirnoff
218837618a MFOpenBSD 1.62:
Prevent backup CARP hosts from replying to arp requests, fixes strangeness
  with some layer-3 switches. From Bill Marquette.

Tested by:	Kazuaki Oda <kaakun highway.ne.jp>
2005-11-17 12:56:40 +00:00
Ruslan Ermilov
433aaf04cb Unbreak for !INET6 case. 2005-11-14 12:50:23 +00:00
Ruslan Ermilov
4a0d6638b3 - Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
  through ifp anyway.  IF_LLADDR() works faster, and all (except
  one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
  and drop the IFP2ENADDR() macro; all users have been converted
  to use IF_LLADDR() instead.
2005-11-11 16:04:59 +00:00
Andrew Thompson
4e7e0183e1 Move the cloned interface list management in to if_clone. For some drivers the
softc lists and associated mutex are now unused so these have been removed.

Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all thats needed to unload.

Idea by:	brooks
Reviewed by:	brooks
2005-11-08 20:08:34 +00:00
Yaroslav Tykhiy
9f4abef9a3 Since carp(4) interfaces presently are kinda fake yet possess
IP addresses, mark them with LOOPBACK so that routing daemons
take them easy for link-state routing protocols.

Reviewed by:	glebius
2005-10-26 05:57:35 +00:00
Max Laier
1e4b360655 Fix build after in6_joingroup change. It remains unclear if DAD breaks CARP
or not.
2005-10-22 14:54:02 +00:00
Andrew Thompson
febd0759f3 Change the reference counting to count the number of cloned interfaces for each
cloner. This ensures that ifc->ifc_units is not prematurely freed in
if_clone_detach() before the clones are destroyed, resulting in memory modified
after free. This could be triggered with if_vlan.

Assert that all cloners have been destroyed when freeing the memory.

Change all simple cloners to destroy their clones with ifc_simple_destroy() on
module unload so the reference count is properly updated. This also cleans up
the interface destroy routines and allows future optimisation.

Discussed with:	brooks, pjd, -current
Reviewed by:	brooks
2005-10-12 19:52:16 +00:00
Gleb Smirnoff
5d40d65b5a When a carp(4) interface is being destroyed and is in a promiscous mode,
first interface is detached from parent and then bpfdetach() is called.
If the interface was the last carp(4) interface attached to parent, then
the mutex on parent is destroyed. When bpfdetach() calls if_setflags()
we panic on destroyed mutex.

To prevent the above scenario, clear pointer to parent, when we detach
ourselves from parent.
2005-09-09 08:41:39 +00:00
Robert Watson
13f4c340ae Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags.  Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags.  This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by:	pjd, bz
MFC after:	7 days
2005-08-09 10:20:02 +00:00
Hajimu UMEMOTO
29da8af658 include netinet6/scope6_var.h. 2005-07-25 12:36:43 +00:00
Hajimu UMEMOTO
a1f7e5f8ee scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
  scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
  scoped addresses as a special case.
- scope boundary check will be stricter.  For example, the current
  *BSD code allows a packet with src=::1 and dst=(some global IPv6
  address) to be sent outside of the node, if the application do:
    s = socket(AF_INET6);
    bind(s, "::1");
    sendto(s, some_global_IPv6_addr);
  This is clearly wrong, since ::1 is only meaningful within a single
  node, but the current implementation of the *BSD kernel cannot
  reject this attempt.

Submitted by:	JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from:	KAME
2005-07-25 12:31:43 +00:00
Gleb Smirnoff
a196a3c8aa When doing ARP load balancing source IP is taken in network byte order,
so residue of division for all hosts on net is the same, and thus only
one VHID answers. Change source IP in host byte order.

Reviewed by:	mlaier
Approved by:	re (scottl)
2005-07-01 08:22:13 +00:00
David Malone
01399f34a5 Fix some long standing bugs in writing to the BPF device attached to
a DLT_NULL interface. In particular:

        1) Consistently use type u_int32_t for the header of a
           DLT_NULL device - it continues to represent the address
           family as always.
        2) In the DLT_NULL case get bpf_movein to store the u_int32_t
           in a sockaddr rather than in the mbuf, to be consistent
           with all the DLT types.
        3) Consequently fix a bug in bpf_movein/bpfwrite which
           only permitted packets up to 4 bytes less than the MTU
           to be written.
        4) Fix all DLT_NULL devices to have the code required to
           allow writing to their bpf devices.
        5) Move the code to allow writing to if_lo from if_simloop
           to looutput, because it only applies to DLT_NULL devices
           but was being applied to other devices that use if_simloop
           possibly incorrectly.

PR:		82157
Submitted by:	Matthew Luckie <mjl@luckie.org.nz>
Approved by:	re (scottl)
2005-06-26 18:11:11 +00:00
Brooks Davis
fc74a9f93a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
Gleb Smirnoff
32247f8629 - When carp interface is destroyed, and it affects global preemption
suppresion counter, decrease the latter. [1]
- Add sysctl to monitor preemption suppression.

PR:		kern/80972 [1]
Submitted by:	Frank Volf [1]
MFC after:	1 week
2005-05-15 01:44:26 +00:00
Gleb Smirnoff
9dc1f8e41e Remove anti-LOR bandaid, it is not needed now.
Sponsored by:	Rambler
2005-04-20 09:32:05 +00:00
Gleb Smirnoff
4cb39345c0 When several carp interfaces are attached to Ethernet interface,
carp_carpdev_state_locked() is called every time carp interface is attached.
The first call backs up flags of the first interface, and the second
call backs up them again, erasing correct values.
  To solve this, a carp_sc_state_locked() function is introduced. It is
called when interface is attached to parent, instead of calling
carp_carpdev_state_locked. carp_carpdev_state_locked() calls
carp_sc_state_locked() for each sc in chain.

Reported by:	Yuriy N. Shkandybin, sem
2005-03-30 11:44:43 +00:00
Gleb Smirnoff
ee6f227017 If vhid exists return more informative EEXIST instead of EINVAL. While here
remove redundant brackets.
2005-03-18 13:41:38 +00:00
Gleb Smirnoff
9860bab349 Fix a potential crash that could occur when CARP_LOG is being used.
Obtained from:	OpenBSD (pat)
2005-03-18 13:18:34 +00:00
Gleb Smirnoff
b82936c5d4 Fix typo. Unbreak build. Take pointy hat. 2005-03-02 09:11:18 +00:00
Gleb Smirnoff
d220759b41 Add more locking when reading/writing to carp softc. When carp softc is
attached to a parent interface we use its mutex to lock the softc. This
means that in several places like carp_ioctl() we lock softc conditionaly.
This should be redesigned.

To avoid LORs when MII announces us a link state change, we schedule
a quick callout and call carp_carpdev_state_locked() from it.

Initialize callouts using NET_CALLOUT_MPSAFE.

Sponsored by:	Rambler
Reviewed by:	mlaier
2005-03-01 13:14:33 +00:00
Gleb Smirnoff
d92d54d54d - Add carp_mtx. Use it to protect list of all carp interfaces.
- In carp_send_ad_all() walk through list of all carp interfaces
  instead of walking through list of all interfaces.

Sponsored by:	Rambler
Reviewed by:	mlaier
2005-03-01 12:36:07 +00:00
Gleb Smirnoff
3a84d72a78 Revert change to struct ifnet. Use ifnet pointer in softc. Embedding
ifnet into smth will soon be removed.

Requested by:	brooks
2005-03-01 10:59:14 +00:00
Gleb Smirnoff
4358dfc32f Remove debugging printf.
Reviewed by:	mlaier
2005-03-01 09:31:36 +00:00
Yaroslav Tykhiy
630481bb92 Support running carp(4) over a vlan(4) parent interface.
Encouraged by:	glebius
2005-02-28 16:19:11 +00:00
Gleb Smirnoff
1d0a237660 Remove unused field from carp softc.
OK'ed by:	mcbride@OpenBSD
2005-02-28 11:57:03 +00:00
Gleb Smirnoff
3e07def4cd Fix tcpdump(8) on carp(4) interface:
- Use our loop DLT type, not OpenBSD. [1]
- The fields that are converted to network byte order are not 32-bit
  fields but 16-bit fields, so htons should be used in htonl. [1]
- Secondly, ip_input changes ip->ip_len into its value without
  the ip-header length. So, restore the length to make bpf happy. [1]
- Use bpf_mtap2(), use temporary af1, since bpf_mtap2 doesn't
  understand uint8_t af identifier.

Submitted by:	Frank Volf [1]
2005-02-28 11:54:36 +00:00
Max Laier
a4e5390551 Unbreak the build. carp_iamatch6 and carp_macmatch6 are not supposed to be
static as they are used elsewhere.
2005-02-27 11:32:26 +00:00
Gleb Smirnoff
e8c34a71eb Remove carp_softc.sc_ifp member in favor of union pointers in struct ifnet.
Obtained from:	OpenBSD
2005-02-26 13:55:07 +00:00
Gleb Smirnoff
5c1f0f6de5 Staticize local functions. 2005-02-26 10:33:14 +00:00
Gleb Smirnoff
88bf82a62e New lines when logging. 2005-02-25 11:26:39 +00:00
Gleb Smirnoff
947b7cf3c6 Embrace macros with do {} while (0)
Submitted by:	maxim
2005-02-25 10:49:47 +00:00
Gleb Smirnoff
39aeaa0eb5 Call carp_carpdev_state() from carp_set_addr6(). See log for rev 1.4.
Sponsored by:	Rambler
2005-02-25 10:12:11 +00:00
Gleb Smirnoff
1e9e65729b Improve logging:
- Simplify CARP_LOG() and making it working (we don't have addlog in FreeBSD).
 - Introduce CARP_DEBUG() which logs with LOG_DEBUG severity when
   net.inet.carp.log > 1
 - Use CARP_DEBUG to log state changes of carp interfaces.

After CARP_LOG() cleanup it appeared that carp_input_c() does not need sc
argument. Remove it.

Sponsored by:	Rambler
2005-02-25 10:09:44 +00:00
Gleb Smirnoff
6fba4c0bae Fix problem when master comes up with one interface down, and preempts
mastering on all other interfaces:

- call carp_carpdev_state() on initialize instead of just setting to INIT
- in carp_carpdev_state() check that interface is UP, instead of checking
  that it is not DOWN, because a rebooted machine may have interface in
  UNKNOWN state.

Sponsored by:	Rambler
Obtained from:	OpenBSD (partially)
2005-02-24 09:05:28 +00:00
Maxime Henrion
2368737719 Unbreak CARP build on 64-bit architectures.
Tested on:	sparc64
2005-02-23 00:20:33 +00:00
Gleb Smirnoff
67df421496 Remove promisc counter from parent interface in carp_clone_destroy(),
so that parent interface is not left in promiscous mode after carp
interface is destroyed.

This is not perfect, since promisc counter is added when carp
interface is assigned an IP address. However, when address is removed
parent interface is still in promiscuous mode. Only removal of
carp interface removes promisc from parent. Same way in OpenBSD.

Sponsored by:	Rambler
2005-02-22 16:24:55 +00:00
Gleb Smirnoff
a97719482d Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by:	mlaier
Obtained from:	OpenBSD (mickey, mcbride)
2005-02-22 13:04:05 +00:00