Commit Graph

61 Commits

Author SHA1 Message Date
qingli
ec826ad5c7 This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle, Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
  provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
  me maintaining that branch before the svn conversion
2008-12-15 06:10:57 +00:00
glebius
1a93d72fb6 In a case of CARP status change run through the if_link_state_change()
routine, so that devd(8) and others are notified about link state change.
2008-12-05 14:37:14 +00:00
bz
604d89458a Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
des
66f807ed8b Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
zec
8797d4caec Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
bz
1021d43b56 Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
eri
253913181a Fix carp(4) panics that can occur during carp interface configuration.
Approved by:	mlaier (mentor)
Reported by:	Scott Ullrich
MFC after:	1 week
2008-07-14 20:11:51 +00:00
mlaier
5acdb2c237 Sort IP addresses before hashing them for the signature. Otherwise carp is
sensitive to address configuration order.

PR:		kern/121574
Reported by:	Douglas K. Rand, Wouter de Jong
Obtained from:	OpenBSD (rev 1.114 + fixes)
MFC after:	2 weeks
2008-06-02 18:58:07 +00:00
glebius
c845f83019 If the vhid already present, return EEXIST instead of
non-informative EINVAL.
2008-02-07 13:18:59 +00:00
silby
f965c7bdc4 Add FBSDID to all files in netinet so that people can more
easily include file version information in bug reports.

Approved by:	re (kensmith)
2007-10-07 20:44:24 +00:00
rwatson
a62dbe240a Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove
definition of NET_CALLOUT_MPSAFE, which is no longer required now that
debug.mpsafenet has been removed.

The once over:	bz
Approved by:	re (kensmith)
2007-07-28 07:31:30 +00:00
bms
ffd77d9ba5 Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

Summary
 * IPv4 multicast socket processing is now moved out of ip_output.c
   into a new module, in_mcast.c.
 * The in_mcast.c module implements the IPv4 legacy any-source API in
   terms of the protocol-independent source-specific API.
 * Source filters are lazy allocated as the common case does not use them.
   They are part of per inpcb state and are covered by the inpcb lock.
 * struct ip_mreqn is now supported to allow applications to specify
   multicast joins by interface index in the legacy IPv4 any-source API.
 * In UDP, an incoming multicast datagram only requires that the source
   port matches the 4-tuple if the socket was already bound by source port.
   An unbound socket SHOULD be able to receive multicasts sent from an
   ephemeral source port.
 * The UDP socket multicast filter mode defaults to exclusive, that is,
   sources present in the per-socket list will be blocked from delivery.
 * The RFC 3678 userland functions have been added to libc: setsourcefilter,
   getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
 * Definitions for IGMPv3 are merged but not yet used.
 * struct sockaddr_storage is now referenced from <netinet/in.h>. It
   is therefore defined there if not already declared in the same way
   as for the C99 types.
 * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
   which are then interpreted as interface indexes) is now deprecated.
 * A patch for the Rhyolite.com routed in the FreeBSD base system
   is available in the -net archives. This only affects individuals
   running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
 * Make IPv6 detach path similar to IPv4's in code flow; functionally same.
 * Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from:  p4://bms_netdev
Submitted by:   Wilbert de Graaf (original work)
Reviewed by:    rwatson (locking), silence from fenner,
		net@ (but with encouragement)
2007-06-12 16:24:56 +00:00
glebius
27c6e3f25e Do not leak lock in the case of EEXIST error.
PR:		kern/92776
Submitted by:	Ed Schouten <Ed.Schouten tunix.nl>
2007-06-06 14:21:49 +00:00
glebius
325d4d7fda Since rev. 1.94 of netinet/in.c, the netinet layer frees all its
multicast memberships, when interface is detached. Thus, when
an underlying interface is detached, we do not need to free
our multicast memberships.

Reviewed by:	bms
2007-02-02 09:39:09 +00:00
glebius
34079a8c02 Make it possible that carpdetach() unlocks on return. Then, in
carp_clone_destroy() we are on a safe side, we don't need to
unlock the cif, that can me already non-existent at this point.

Reported by:	Anton Yuzhaninov <citrin rambler-co.ru>
2007-01-25 18:03:40 +00:00
glebius
b7c8a97d9b Spacing. 2007-01-25 17:58:16 +00:00
rwatson
10d0d9cf47 Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges.  These may
require some future tweaking.

Sponsored by:           nCircle Network Security, Inc.
Obtained from:          TrustedBSD Project
Discussed on:           arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
                        Alex Lyashkov <umka at sevcity dot net>,
                        Skip Ford <skip dot ford at verizon dot net>,
                        Antoine Brodin <antoine dot brodin at laposte dot net>
2006-11-06 13:42:10 +00:00
bz
af0ae0b158 Set scope on MC address so IPv6 carp advertisement will not get dropped
in ip6_output. In case this fails  handle the error directly and log it[1].
In addition permit CARP over v6 in ip_fw2.

PR:                     kern/98622
Similar patch by:       suz
Discussed with:         glebius [1]
Tested by:              Paul.Dekkers surfnet.nl, Philippe.Pegon crc.u-strasbg.fr
MFC after:              3 days
2006-10-07 10:19:58 +00:00
bms
e5ef61bea6 Fix an incompatibility between CARP and IPv4 multicast routing, whereby
the VRRPv2 advertisements will originate from the wrong source address.
This only affects kernels compiled with MROUTING and after the MRT_INIT
ioctl() has been issued.
Set imo_multicast_vif in carp's softc to the invalid value -1 after it is
zeroed by softc allocation, to stop the ip_output() path looking up the
incorrect source address thinking a vif is set.

PR:		kern/100532
Submitted by:	Bohus Plucinsky
MFC after:	1 week
2006-09-25 11:53:54 +00:00
sam
2350e92037 Revise network interface cloning to take an optional opaque
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)

Reviewed by:	arch@
2006-07-09 06:04:01 +00:00
mlaier
f7e47bf374 Make in-kernel multicast protocols for pfsync and carp work after enabling
dynamic resizing of multicast membership array.

Reported and testing by:	Maxim Konovalov, Scott Ullrich
Reminded by:			thompsa
MFC after:			2 weeks
2006-07-08 00:01:01 +00:00
csjp
2c4f67981e Fix the following bpf(4) race condition which can result in a panic:
(1) bpf peer attaches to interface netif0
	(2) Packet is received by netif0
	(3) ifp->if_bpf pointer is checked and handed off to bpf
	(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
	    initialized to NULL.
	(5) ifp->if_bpf is dereferenced by bpf machinery
	(6) Kaboom

This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.

Summary of changes:

- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
  bpf interface structure. Once this is done, ifp->if_bpf should never be
  NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
  a lockless read bpf peer list associated with the interface. It should
  be noted that the bpf code will pickup the bpf_interface lock before adding
  or removing bpf peers. This should serialize the access to the bpf descriptor
  list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
  can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present

Now what happens is:

	(1) Packet is received by netif0
	(2) Check to see if bpf descriptor list is empty
	(3) Pickup the bpf interface lock
	(4) Hand packet off to process

From the attach/detach side:

	(1) Pickup the bpf interface lock
	(2) Add/remove from bpf descriptor list

Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).

[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
    not we have any bpf peers that might be interested in receiving packets.

In collaboration with:	sam@
MFC after:	1 month
2006-06-02 19:59:33 +00:00
glebius
aca7253de4 o Introduce carp_multicast_cleanup(), which removes and frees
multicast addresses from carp interface. [1]
o Rewrite carpdetach(), so that it does the following things: [1]
  - Stops callouts.
  - Decrements carp_suppress_preempt, if needed.
  - Downs interface and sets CARP state to INIT.
  - Calls carp_multicast_cleanup().
  - Detaches softc from carp_if and if we are the last frees
    the carp_if.
o Use new carpdetach() in carp_clone_destroy().
o In carp_ifdetach() acquire the carp_if lock and cleanup all
  interfaces hanging on carp_if. [1]
o Make carp_ifdetach() static and use EVENT(9) to call it
  from if_detach(). [2]
o In carp_setrun() exit if the softc doesn't have a valid pointer
  to parent. [1]

Obtained from:	OpenBSD [1]
Submitted by:	Dan Lukes <dan obluda.cz> [2]
PR:		kern/82908 [2]
2006-03-21 14:29:48 +00:00
glebius
be367893cd MFOpenBSD 1.62:
Prevent backup CARP hosts from replying to arp requests, fixes strangeness
  with some layer-3 switches. From Bill Marquette.

Tested by:	Kazuaki Oda <kaakun highway.ne.jp>
2005-11-17 12:56:40 +00:00
ru
815788042c Unbreak for !INET6 case. 2005-11-14 12:50:23 +00:00
ru
f70f525b49 - Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
  through ifp anyway.  IF_LLADDR() works faster, and all (except
  one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
  and drop the IFP2ENADDR() macro; all users have been converted
  to use IF_LLADDR() instead.
2005-11-11 16:04:59 +00:00
thompsa
48c0bcb5c2 Move the cloned interface list management in to if_clone. For some drivers the
softc lists and associated mutex are now unused so these have been removed.

Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all thats needed to unload.

Idea by:	brooks
Reviewed by:	brooks
2005-11-08 20:08:34 +00:00
yar
54f0eb5f04 Since carp(4) interfaces presently are kinda fake yet possess
IP addresses, mark them with LOOPBACK so that routing daemons
take them easy for link-state routing protocols.

Reviewed by:	glebius
2005-10-26 05:57:35 +00:00
mlaier
25376bdec2 Fix build after in6_joingroup change. It remains unclear if DAD breaks CARP
or not.
2005-10-22 14:54:02 +00:00
thompsa
d6130a4703 Change the reference counting to count the number of cloned interfaces for each
cloner. This ensures that ifc->ifc_units is not prematurely freed in
if_clone_detach() before the clones are destroyed, resulting in memory modified
after free. This could be triggered with if_vlan.

Assert that all cloners have been destroyed when freeing the memory.

Change all simple cloners to destroy their clones with ifc_simple_destroy() on
module unload so the reference count is properly updated. This also cleans up
the interface destroy routines and allows future optimisation.

Discussed with:	brooks, pjd, -current
Reviewed by:	brooks
2005-10-12 19:52:16 +00:00
glebius
044310140d When a carp(4) interface is being destroyed and is in a promiscous mode,
first interface is detached from parent and then bpfdetach() is called.
If the interface was the last carp(4) interface attached to parent, then
the mutex on parent is destroyed. When bpfdetach() calls if_setflags()
we panic on destroyed mutex.

To prevent the above scenario, clear pointer to parent, when we detach
ourselves from parent.
2005-09-09 08:41:39 +00:00
rwatson
5d770a09e8 Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags.  Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags.  This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by:	pjd, bz
MFC after:	7 days
2005-08-09 10:20:02 +00:00
ume
817761fa0f include netinet6/scope6_var.h. 2005-07-25 12:36:43 +00:00
ume
da2cf62b28 scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
  scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
  scoped addresses as a special case.
- scope boundary check will be stricter.  For example, the current
  *BSD code allows a packet with src=::1 and dst=(some global IPv6
  address) to be sent outside of the node, if the application do:
    s = socket(AF_INET6);
    bind(s, "::1");
    sendto(s, some_global_IPv6_addr);
  This is clearly wrong, since ::1 is only meaningful within a single
  node, but the current implementation of the *BSD kernel cannot
  reject this attempt.

Submitted by:	JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from:	KAME
2005-07-25 12:31:43 +00:00
glebius
c6a8611901 When doing ARP load balancing source IP is taken in network byte order,
so residue of division for all hosts on net is the same, and thus only
one VHID answers. Change source IP in host byte order.

Reviewed by:	mlaier
Approved by:	re (scottl)
2005-07-01 08:22:13 +00:00
dwmalone
f1f0123e88 Fix some long standing bugs in writing to the BPF device attached to
a DLT_NULL interface. In particular:

        1) Consistently use type u_int32_t for the header of a
           DLT_NULL device - it continues to represent the address
           family as always.
        2) In the DLT_NULL case get bpf_movein to store the u_int32_t
           in a sockaddr rather than in the mbuf, to be consistent
           with all the DLT types.
        3) Consequently fix a bug in bpf_movein/bpfwrite which
           only permitted packets up to 4 bytes less than the MTU
           to be written.
        4) Fix all DLT_NULL devices to have the code required to
           allow writing to their bpf devices.
        5) Move the code to allow writing to if_lo from if_simloop
           to looutput, because it only applies to DLT_NULL devices
           but was being applied to other devices that use if_simloop
           possibly incorrectly.

PR:		82157
Submitted by:	Matthew Luckie <mjl@luckie.org.nz>
Approved by:	re (scottl)
2005-06-26 18:11:11 +00:00
brooks
567ba9b00a Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
 - Struct arpcom is no longer referenced in normal interface code.
   Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
   To enforce this ac_enaddr has been renamed to _ac_enaddr.
 - The second argument to ether_ifattach is now always the mac address
   from driver private storage rather than sometimes being ac_enaddr.

Reviewed by:	sobomax, sam
2005-06-10 16:49:24 +00:00
glebius
ed7b2b9937 - When carp interface is destroyed, and it affects global preemption
suppresion counter, decrease the latter. [1]
- Add sysctl to monitor preemption suppression.

PR:		kern/80972 [1]
Submitted by:	Frank Volf [1]
MFC after:	1 week
2005-05-15 01:44:26 +00:00
glebius
63fda197fc Remove anti-LOR bandaid, it is not needed now.
Sponsored by:	Rambler
2005-04-20 09:32:05 +00:00
glebius
20adbdefb7 When several carp interfaces are attached to Ethernet interface,
carp_carpdev_state_locked() is called every time carp interface is attached.
The first call backs up flags of the first interface, and the second
call backs up them again, erasing correct values.
  To solve this, a carp_sc_state_locked() function is introduced. It is
called when interface is attached to parent, instead of calling
carp_carpdev_state_locked. carp_carpdev_state_locked() calls
carp_sc_state_locked() for each sc in chain.

Reported by:	Yuriy N. Shkandybin, sem
2005-03-30 11:44:43 +00:00
glebius
285d285103 If vhid exists return more informative EEXIST instead of EINVAL. While here
remove redundant brackets.
2005-03-18 13:41:38 +00:00
glebius
e90a54cc90 Fix a potential crash that could occur when CARP_LOG is being used.
Obtained from:	OpenBSD (pat)
2005-03-18 13:18:34 +00:00
glebius
96ce85223e Fix typo. Unbreak build. Take pointy hat. 2005-03-02 09:11:18 +00:00
glebius
1f4abe7c9a Add more locking when reading/writing to carp softc. When carp softc is
attached to a parent interface we use its mutex to lock the softc. This
means that in several places like carp_ioctl() we lock softc conditionaly.
This should be redesigned.

To avoid LORs when MII announces us a link state change, we schedule
a quick callout and call carp_carpdev_state_locked() from it.

Initialize callouts using NET_CALLOUT_MPSAFE.

Sponsored by:	Rambler
Reviewed by:	mlaier
2005-03-01 13:14:33 +00:00
glebius
1f62a67ad7 - Add carp_mtx. Use it to protect list of all carp interfaces.
- In carp_send_ad_all() walk through list of all carp interfaces
  instead of walking through list of all interfaces.

Sponsored by:	Rambler
Reviewed by:	mlaier
2005-03-01 12:36:07 +00:00
glebius
ea3bf9bbdd Revert change to struct ifnet. Use ifnet pointer in softc. Embedding
ifnet into smth will soon be removed.

Requested by:	brooks
2005-03-01 10:59:14 +00:00
glebius
d86b1595dd Remove debugging printf.
Reviewed by:	mlaier
2005-03-01 09:31:36 +00:00
yar
d792578421 Support running carp(4) over a vlan(4) parent interface.
Encouraged by:	glebius
2005-02-28 16:19:11 +00:00
glebius
3bad63fb1d Remove unused field from carp softc.
OK'ed by:	mcbride@OpenBSD
2005-02-28 11:57:03 +00:00
glebius
b190483209 Fix tcpdump(8) on carp(4) interface:
- Use our loop DLT type, not OpenBSD. [1]
- The fields that are converted to network byte order are not 32-bit
  fields but 16-bit fields, so htons should be used in htonl. [1]
- Secondly, ip_input changes ip->ip_len into its value without
  the ip-header length. So, restore the length to make bpf happy. [1]
- Use bpf_mtap2(), use temporary af1, since bpf_mtap2 doesn't
  understand uint8_t af identifier.

Submitted by:	Frank Volf [1]
2005-02-28 11:54:36 +00:00