usable again with options VIMAGE kernels.
Submitted by: bz (the original version, probably identical to this one)
Reviewed by: many @ DevSummit Cambridge
MFC after: 3 days
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.
Reviewed by: bz
MFC after: immediately
has not worked since the arp-v2 rewrite.
The event handler will be called with the llentry write-locked and
can examine la_flags to determine whether the entry is being added
or removed.
Reviewed by: gnn, kmacy
Approved by: gnn (mentor)
MFC after: 1 month
- Interface link-local address is not reachable within the
node that owns the interface, this is due to the mismatch
in address scope as the result of the installed interface
address loopback route. Therefore for each interface
address loopback route, the rt_gateway field (of AF_LINK
type) will be used to track which interface a given
address belongs to. This will aid the address source to
use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
only for validation reason. Doing so will eliminate as much
of the special case (loopback addresses) handling code
as possible, however, these empty nd6 entries should not
be returned to the userland applications such as the
"ndp" command.
Since both of the above issues contain common files, these
files are committed together.
Reviewed by: bz
MFC after: immediately
New counters now exist for:
requests sent
replies sent
requests received
replies received
packets received
total packets dropped due to no ARP entry
entrys timed out
Duplicate IPs seen
The new statistics are seen in the netstat command
when it is given the -s command line switch.
MFC after: 2 weeks
In collaboration with: bz
and sysuninit handlers.
Previously, sx_vnet, which is a lock designated for protecting
the vnet list, was (ab)used for protecting vnet sysinit / sysuninit
handler lists as well. Holding exclusively the sx_vnet lock while
invoking sysinit and / or sysuninit handlers turned out to be
problematic, since some of the handlers may attempt to wake up
another thread and wait for it to walk over the vnet list, hence
acquire a shared lock on sx_vnet, which in turn leads to a deadlock.
Protecting vnet sysinit / sysuninit lists with a separate lock
mitigates this issue, which was first observed with
flowtable_flush() / flowtable_cleaner() in sys/net/flowtable.c.
Reviewed by: rwatson, jhb
MFC after: 3 days
information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type.
Since the L2 information (rt_lle) is invalid for these interface
types, accidental caching attempt will trigger panic when the invalid
rt_lle reference is accessed.
When installing a new route, or when updating an existing route, the
user supplied gateway address may be an interface address (this is
particularly true for point-to-point interface related modules such
as ppp, if_tun, if_gif). Currently the routing command handler always
set the RTF_GATEWAY flag if the gateway address is given as part of the
command paramters. Therefore the gateway address must be verified against
interface addresses or else the route would be treated as an indirect
route, thus making that route unusable.
Reviewed by: kmacy, julia, rwatson
Verified by: marcus
MFC after: 3 days
which allows an index to be reserved for an ifnet without making
the ifnet available for management operations. Use this in if_alloc()
while the ifnet lock is released between initial index allocation and
completion of ifnet initialization.
Add ifindex_free() to centralize the implementation of releasing an
ifindex value. Use in if_free() and if_vmove(), as well as when
releasing a held index in if_alloc().
Reviewed by: bz
MFC after: 3 days
and centralize in a single function ifindex_alloc(). Assert the
IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not
close all known races in this code.
Reviewed by: bz
MFC after: 3 days
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.
Reviewed by: bz, kmacy
MFC after: 3 days
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.
This change affects only options VIMAGE builds.
Submitted by: bz
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)
ifnet list during if_ef load, directly acquire the ifnet_sxlock
exclusively. That way when if_alloc() recurses the lock, it's a write
recursion rather than a read->write recursion.
This code structure is arguably a bug, so add a comment indicating that
this is the case. Post-8.0, we should fix this, but this commit
resolves panic-on-load for if_ef.
Discussed with: bz, julian
Reported by: phk
MFC after: 3 days
several critical bugs, including race conditions and lock order issues:
Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.
Reviewed by: bz, julian
MFC after: 3 days
Specifically, not until the per-vnet parts have been set up.
Submitted by: kmacy@
Reviewed by: julian@, zec@
Approved by: re(rwatson)
MFC after: immediately
moving a frequently executed flowtable syslog statement from being
conditional on bootverbose to conditional on a per-vnet flowtable
sysctl.
Approved by: re@
the mbuf for obtaining the fib index
- check that a cached flow corresponds to the same fib index as the
packet for which we are doing the lookup
- at interface detach time flush any flows referencing stale rtentrys
associated with the interface that is going away (fixes reported
panics)
- reduce the time between cleans in case the cleaner is running at
the time the eventhandler is called and the wakeup is missed less
time will elapse before the eventhandler returns
- separate per-vnet initialization from global initialization
(pointed out by jeli@)
Reviewed by: sam@
Approved by: re@
with their own virtual network stack. Jails only inheriting a
network stack cannot change anything that cannot be changed from
within a prison.
Reviewed by: rwatson, zec
Approved by: re (kib)
NET_RT_DUMP, which otherwise only returned information of AF_MAX.
This was broken in r193232 (save your time - my bug, my fix).
PR: kern/137700
Reported by: Larry Baird (lab gta.com)
Tested by: Larry Baird (lab gta.com)
Reviewed by: zec, lstewart, qing
Approved by: re (kib)
when adding the __start_ symbol knows the expected section alignment
and can place the __start_ symbol correctly.
These sections will not support symbols with super-cache line alignment
requirements.
For full details, see posting to freebsd-current, 2009-08-10,
Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>.
Debugging and testing patches by:
Kamigishi Rei (spambox haruhiism.net),
np, lstewart, jhb, kib, rwatson
Tested by: Kamigishi Rei, lstewart
Reviewed by: kib
Approved by: re
all pertinent statatistics for the subsystem. These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.
Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored. This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.
The following modules are affected by this change:
if_bridge
if_cxgb
if_gif
ip_mroute
ipdivert
pf
In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack. However, that change is too agressive for
this point in the release cycle.
Reviewed by: bz
Approved by: re (kib)
vnet.c to iterate virtual network stacks without being aware of
the implementation details previously hidden in kern_vimage.c.
Now they are in the same file, so remove this added complexity.
Reviewed by: bz
Approved by: re (vimage blanket)
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.
Reviewed by: bz
Approved by: re (vimage blanket)
L2 information. For an indirect route the cached L2 entry contains the
MAC address of the gateway. Typically the default route is used to
transmit multicast packets when explicit multicast routes are not
available. The ether_output() function bypasses L2 resolution function
if it verifies the L2 cache is valid, because the cached L2 address
(a unicast MAC address) is copied into the packets as the destination
MAC address. This validation, however, does not apply to broadcast and
multicast packets because the destination MAC address is mapped
according to a standard method instead.
Submitted by: Xin Li
Reviewed by: bz
Approved by: re
- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface addreess when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.
Reviewed by: bz
Approved by: re
things a bit:
- use dpcpu data to track the ifps with packets queued up,
- per-cpu locking and driver flags
- along with .nh_drainedcpu and NETISR_POLICY_CPU.
- Put the mbufs in flight reference count, preventing interfaces
from going away, under INVARIANTS as this is a general problem
of the stack and should be solved in if.c/netisr but still good
to verify the internal queuing logic.
- Permit changing the MTU to virtually everythinkg like we do for loopback.
Hook epair(4) up to the build.
Approved by: re (kib)
(ifconfig ifN (-)vnet <jname|jid>) work correctly.
Move vi_if_move to if.c and split it up into two functions(*),
one for each ioctl.
In the reclaim case, correctly set the vnet before calling if_vmove.
Instead of silently allowing a move of an interface from the current
vnet to the current vnet, return an error. (*)
There is some duplicate interface name checking before actually moving
the interface between network stacks without locking and thus race
prone. Ideally if_vmove will correctly and automagically handle these
in the future.
Suggested by: rwatson (*)
Approved by: re (kib)
network stacks, VNET_SYSINIT:
- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
occur each time a network stack is instantiated and destroyed. In the
!VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
For the VIMAGE case, we instead use SYSINIT's to track their order and
properties on registration, using them for each vnet when created/
destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
previously, as well as its dependency scheme: we now just use the
SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
they want init functions to be called for each virtual network stack
rather than just once at boot, compiling down to DOMAIN_SET() in the
non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
of modinfo or DOMAIN_SET() for init/uninit events. In some cases,
convert modular components from using modevent to using sysinit (where
appropriate). In some cases, do minor rejuggling of SYSINIT ordering
to make room for or better manage events.
Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup)
Discussed with: jhb, bz, julian, zec
Reviewed by: bz
Approved by: re (VIMAGE blanket)
non-vrtiualized sysctls so we cannot used one common function.
Add a macro to convert the arg1 in the virtualized case to
vnet.h to not expose the maths to all over the code.
Add a wrapper for the single virtualized call, properly handling
arg1 and call the default implementation from there.
Convert the two over places to use the new macro.
Reviewed by: rwatson
Approved by: re (kib)
nor destructors, as there's no actual work to do.
In most cases, the constructors weren't needed because of the existing
protocol initialization functions run by net_init_domain() as part of
VNET_MOD_NET, or they were eliminated when support for static
initialization of virtualized globals was added.
Garbage collect dependency references to modules without constructors or
destructors, notably VNET_MOD_INET and VNET_MOD_INET6.
Reviewed by: bz
Approved by: re (vimage blanket)
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use). Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.
Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions. Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.
Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.
Update various consumers of these KPIs based on whether they may sleep
or not.
Reviewed by: bz
Approved by: re (kib)