1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,
The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.
Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:
- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.
For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.
Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit
Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.
Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().
Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).
All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).
(*) netipsec/keysock.c did not validate depending on compile time options.
Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
and protocol-independent host mode multicast. The code is written to
accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work.
This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.
The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html
Summary
* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.
This work was financially supported by another FreeBSD committer.
Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)
multicast memberships, when interface is detached. Thus, when
an underlying interface is detached, we do not need to free
our multicast memberships.
Reviewed by: bms
carp_clone_destroy() we are on a safe side, we don't need to
unlock the cif, that can me already non-existent at this point.
Reported by: Anton Yuzhaninov <citrin rambler-co.ru>
specific privilege names to a broad range of privileges. These may
require some future tweaking.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
in ip6_output. In case this fails handle the error directly and log it[1].
In addition permit CARP over v6 in ip_fw2.
PR: kern/98622
Similar patch by: suz
Discussed with: glebius [1]
Tested by: Paul.Dekkers surfnet.nl, Philippe.Pegon crc.u-strasbg.fr
MFC after: 3 days
the VRRPv2 advertisements will originate from the wrong source address.
This only affects kernels compiled with MROUTING and after the MRT_INIT
ioctl() has been issued.
Set imo_multicast_vif in carp's softc to the invalid value -1 after it is
zeroed by softc allocation, to stop the ip_output() path looking up the
incorrect source address thinking a vif is set.
PR: kern/100532
Submitted by: Bohus Plucinsky
MFC after: 1 week
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)
Reviewed by: arch@
(1) bpf peer attaches to interface netif0
(2) Packet is received by netif0
(3) ifp->if_bpf pointer is checked and handed off to bpf
(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
initialized to NULL.
(5) ifp->if_bpf is dereferenced by bpf machinery
(6) Kaboom
This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.
Summary of changes:
- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
bpf interface structure. Once this is done, ifp->if_bpf should never be
NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
a lockless read bpf peer list associated with the interface. It should
be noted that the bpf code will pickup the bpf_interface lock before adding
or removing bpf peers. This should serialize the access to the bpf descriptor
list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present
Now what happens is:
(1) Packet is received by netif0
(2) Check to see if bpf descriptor list is empty
(3) Pickup the bpf interface lock
(4) Hand packet off to process
From the attach/detach side:
(1) Pickup the bpf interface lock
(2) Add/remove from bpf descriptor list
Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).
[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
not we have any bpf peers that might be interested in receiving packets.
In collaboration with: sam@
MFC after: 1 month
multicast addresses from carp interface. [1]
o Rewrite carpdetach(), so that it does the following things: [1]
- Stops callouts.
- Decrements carp_suppress_preempt, if needed.
- Downs interface and sets CARP state to INIT.
- Calls carp_multicast_cleanup().
- Detaches softc from carp_if and if we are the last frees
the carp_if.
o Use new carpdetach() in carp_clone_destroy().
o In carp_ifdetach() acquire the carp_if lock and cleanup all
interfaces hanging on carp_if. [1]
o Make carp_ifdetach() static and use EVENT(9) to call it
from if_detach(). [2]
o In carp_setrun() exit if the softc doesn't have a valid pointer
to parent. [1]
Obtained from: OpenBSD [1]
Submitted by: Dan Lukes <dan obluda.cz> [2]
PR: kern/82908 [2]
Prevent backup CARP hosts from replying to arp requests, fixes strangeness
with some layer-3 switches. From Bill Marquette.
Tested by: Kazuaki Oda <kaakun highway.ne.jp>
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.
- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.
softc lists and associated mutex are now unused so these have been removed.
Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all thats needed to unload.
Idea by: brooks
Reviewed by: brooks
cloner. This ensures that ifc->ifc_units is not prematurely freed in
if_clone_detach() before the clones are destroyed, resulting in memory modified
after free. This could be triggered with if_vlan.
Assert that all cloners have been destroyed when freeing the memory.
Change all simple cloners to destroy their clones with ifc_simple_destroy() on
module unload so the reference count is properly updated. This also cleans up
the interface destroy routines and allows future optimisation.
Discussed with: brooks, pjd, -current
Reviewed by: brooks
first interface is detached from parent and then bpfdetach() is called.
If the interface was the last carp(4) interface attached to parent, then
the mutex on parent is destroyed. When bpfdetach() calls if_setflags()
we panic on destroyed mutex.
To prevent the above scenario, clear pointer to parent, when we detach
ourselves from parent.
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.
Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.
Reviewed by: pjd, bz
MFC after: 7 days
- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from: KAME
so residue of division for all hosts on net is the same, and thus only
one VHID answers. Change source IP in host byte order.
Reviewed by: mlaier
Approved by: re (scottl)
a DLT_NULL interface. In particular:
1) Consistently use type u_int32_t for the header of a
DLT_NULL device - it continues to represent the address
family as always.
2) In the DLT_NULL case get bpf_movein to store the u_int32_t
in a sockaddr rather than in the mbuf, to be consistent
with all the DLT types.
3) Consequently fix a bug in bpf_movein/bpfwrite which
only permitted packets up to 4 bytes less than the MTU
to be written.
4) Fix all DLT_NULL devices to have the code required to
allow writing to their bpf devices.
5) Move the code to allow writing to if_lo from if_simloop
to looutput, because it only applies to DLT_NULL devices
but was being applied to other devices that use if_simloop
possibly incorrectly.
PR: 82157
Submitted by: Matthew Luckie <mjl@luckie.org.nz>
Approved by: re (scottl)
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.
This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.
Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.
Reviewed by: sobomax, sam
carp_carpdev_state_locked() is called every time carp interface is attached.
The first call backs up flags of the first interface, and the second
call backs up them again, erasing correct values.
To solve this, a carp_sc_state_locked() function is introduced. It is
called when interface is attached to parent, instead of calling
carp_carpdev_state_locked. carp_carpdev_state_locked() calls
carp_sc_state_locked() for each sc in chain.
Reported by: Yuriy N. Shkandybin, sem
attached to a parent interface we use its mutex to lock the softc. This
means that in several places like carp_ioctl() we lock softc conditionaly.
This should be redesigned.
To avoid LORs when MII announces us a link state change, we schedule
a quick callout and call carp_carpdev_state_locked() from it.
Initialize callouts using NET_CALLOUT_MPSAFE.
Sponsored by: Rambler
Reviewed by: mlaier
- In carp_send_ad_all() walk through list of all carp interfaces
instead of walking through list of all interfaces.
Sponsored by: Rambler
Reviewed by: mlaier
- Use our loop DLT type, not OpenBSD. [1]
- The fields that are converted to network byte order are not 32-bit
fields but 16-bit fields, so htons should be used in htonl. [1]
- Secondly, ip_input changes ip->ip_len into its value without
the ip-header length. So, restore the length to make bpf happy. [1]
- Use bpf_mtap2(), use temporary af1, since bpf_mtap2 doesn't
understand uint8_t af identifier.
Submitted by: Frank Volf [1]