r201282
-------
The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.
r201543
-------
The IFA_RTSELF address flag marks a loopback route has been installed
for the interface address. This marker is necessary to properly support
PPP types of links where multiple links can have the same local end
IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which
was combined into the route flag bits during prefix installation in
IPv6. This inclusion causing the prefix route to be unusable. This
patch fixes this bug by excluding the IFA_RTSELF flag during route
installation.
PR: ports/141342, kern/141134
Change vlan interfaces to cope more usefully with the parent interface being
renamed. Previously the vlan interfaces would lose their configuration as if
the parent interface had been physically removed. Now vlan interfaces ignore
rename events.
- Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being
renamed. This flag can be checked in ifnet departure/arrival event
handlers to treat rename events differently.
- Change the ifnet departure event handler in the if_vlan(4) driver to
ignore departure events due to a trunk interface being renamed.
Throughout the network stack we have a few places of
if (jailed(cred))
left. If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.
Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.
We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.
Discussed with: julian, zec, jamie, rwatson (back in Sept)
The flow-table function flowtable_route_flush() may be called
during system initialization time. Since the flow-table is
designed to maintain per CPU flow cache, the existing code
did not check whether "smp_started" is true before calling
sched_bind() and sched_unbind(), which triggers a page fault.
Reviewed by: jeff
Approved by: re
The flow-table associates TCP/UDP flows and IP destinations with
specific routes. When the routing table changes, for example,
when a new route with a more specific prefix is inserted into the
routing table, the flow-table is not updated to reflect that change.
As such existing connections cannot take advantage of the new path.
In some cases the path is broken. This patch will update the affected
flow-table entries when a more specific route is added. The route
entry is properly marked when a route is deleted from the table.
In this case, when the flow-table performs a search, the stale
entry is updated automatically. Therefore this patch is not
necessary for route deletion.
Reviewed by: bz, kmacy
Approved by: re
A wrong variable is used when setting up the interface
address route, which broke source address selection in
some code paths.
Submitted by: noted by bz
Reviewed by: hrs
Approved by: re (kib)
Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.
Reviewed by: bz
Approved by: re
attached to the bridge, rather than just in the case
when some device cannot do TSO. Customer tests have
shown that even when all devices can do TSO that LRO
will cause problems when bridging.
Approved by: re
This patch fixes the following issues:
- Interface link-local address is not reachable within the
node that owns the interface, this is due to the mismatch
in address scope as the result of the installed interface
address loopback route. Therefore for each interface
address loopback route, the rt_gateway field (of AF_LINK
type) will be used to track which interface a given
address belongs to. This will aid the address source to
use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
only for validation reason. Doing so will eliminate as much
of the special case (loopback addresses) handling code
as possible, however, these empty nd6 entries should not
be returned to the userland applications such as the
"ndp" command.
Since both of the above issues contain common files, these
files are committed together.
Reviewed by: bz
Approved by: re
Introduce a separate sx lock for protecting lists of vnet sysinit
and sysuninit handlers.
Previously, sx_vnet, which is a lock designated for protecting
the vnet list, was (ab)used for protecting vnet sysinit / sysuninit
handler lists as well. Holding exclusively the sx_vnet lock while
invoking sysinit and / or sysuninit handlers turned out to be
problematic, since some of the handlers may attempt to wake up
another thread and wait for it to walk over the vnet list, hence
acquire a shared lock on sx_vnet, which in turn leads to a deadlock.
Protecting vnet sysinit / sysuninit lists with a separate lock
mitigates this issue, which was first observed with
flowtable_flush() / flowtable_cleaner() in sys/net/flowtable.c.
Reviewed by: rwatson, jhb
MFC after: 3 days
Approved by: re (rwatson)
In ip_output(), the flow-table module must not try to cache L2/L3
information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type.
Since the L2 information (rt_lle) is invalid for these interface
types, accidental caching attempt will trigger panic when the invalid
rt_lle reference is accessed.
When installing a new route, or when updating an existing route, the
user supplied gateway address may be an interface address (this is
particularly true for point-to-point interface related modules such
as ppp, if_tun, if_gif). Currently the routing command handler always
set the RTF_GATEWAY flag if the gateway address is given as part of the
command paramters. Therefore the gateway address must be verified against
interface addresses or else the route would be treated as an indirect
route, thus making that route unusable.
Reviewed by: kmacy, julian, rwatson
Approved by: re
Add IFNET_HOLD reserved pointer value for the ifindex ifnet array,
which allows an index to be reserved for an ifnet without making
the ifnet available for management operations. Use this in if_alloc()
while the ifnet lock is released between initial index allocation and
completion of ifnet initialization.
Add ifindex_free() to centralize the implementation of releasing an
ifindex value. Use in if_free() and if_vmove(), as well as when
releasing a held index in if_alloc().
Reviewed by: bz
Approved by: re (kib)
Break out allocation of new ifindex values from if_alloc() and if_vmove(),
and centralize in a single function ifindex_alloc(). Assert the
IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not
close all known races in this code.
Reviewed by: bz
Approved by: re (kib)
Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.
Reviewed by: bz, kmacy, qingli
Approved by: re (kib)
Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.
Approved by: re (kib)
Rather than using IFNET_RLOCK() when iterating over (and modifying) the
ifnet list during if_ef load, directly acquire the ifnet_sxlock
exclusively. That way when if_alloc() recurses the lock, it's a write
recursion rather than a read->write recursion.
This code structure is arguably a bug, so add a comment indicating that
this is the case. Post-8.0, we should fix this, but this commit
resolves panic-on-load for if_ef.
Discussed with: bz, julian
Reported by: phk
Approved by: re (kib)
Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:
Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.
Reviewed by: bz, julian
Approved by: re (kib)
When moving ifnets from one vnet to another, and the ifnet
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.
This change affects only options VIMAGE builds.
Submitted by: bz
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Don't allow access to the internals until it has all been set up.
Specifically, not until the per-vnet parts have been set up.
Submitted by: kmacy@
Reviewed by: julian@, zec@
Approved by: re(rwatson)
moving a frequently executed flowtable syslog statement from being
conditional on bootverbose to conditional on a per-vnet flowtable
sysctl.
Approved by: re@
- change the interface to flowtable_lookup so that we don't rely on
the mbuf for obtaining the fib index
- check that a cached flow corresponds to the same fib index as the
packet for which we are doing the lookup
- at interface detach time flush any flows referencing stale rtentrys
associated with the interface that is going away (fixes reported
panics)
- reduce the time between cleans in case the cleaner is running at
the time the eventhandler is called and the wakeup is missed less
time will elapse before the eventhandler returns
- separate per-vnet initialization from global initialization
(pointed out by jeli@)
Reviewed by: sam@
Approved by: re@
Appease VNET_DEBUG - in if_vmove we temporarily switch i.e.
recurse from one vnet to another which is OK, so no need
to flood the console with warnings here.
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Make VNET_DEBUG a standalone compile-time option, i.e. decouple it from
INVARIANTS.
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)
Approved by: re (rwatson)
Make it possible to change the vnet sysctl variables on jails
with their own virtual network stack. Jails only inheriting a
network stack cannot change anything that cannot be changed from
within a prison.
Reviewed by: rwatson, zec
Approved by: re (kib)
Put multiple instructions into a block when iterating; unbreaks
NET_RT_DUMP, which otherwise only returned information of AF_MAX.
This was broken in r193232 (save your time - my bug, my fix).
Reported by: Larry Baird (lab gta.com)
Tested by: Larry Baird (lab gta.com)
Reviewed by: zec, lstewart, qing
PR: kern/137700
Approved by: re (kib)
Always embed pointer to BPF JIT function in BPF descriptor
to avoid inconsistency when opt_bpf.h is not included.
Reviewed by: rwatson
Approved by: re (rwatson)
Put minimum alignment on the dpcpu and vnet section so that ld
when adding the __start_ symbol knows the expected section alignment
and can place the __start_ symbol correctly.
These sections will not support symbols with super-cache line alignment
requirements.
For full details, see posting to freebsd-current, 2009-08-10,
Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>.
Debugging and testing patches by:
Kamigishi Rei (spambox haruhiism.net),
np, lstewart, jhb, kib, rwatson
Tested by: Kamigishi Rei, lstewart
Reviewed by: kib
Approved by: re
all pertinent statatistics for the subsystem. These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.
Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored. This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.
The following modules are affected by this change:
if_bridge
if_cxgb
if_gif
ip_mroute
ipdivert
pf
In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack. However, that change is too agressive for
this point in the release cycle.
Reviewed by: bz
Approved by: re (kib)
vnet.c to iterate virtual network stacks without being aware of
the implementation details previously hidden in kern_vimage.c.
Now they are in the same file, so remove this added complexity.
Reviewed by: bz
Approved by: re (vimage blanket)
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.
Reviewed by: bz
Approved by: re (vimage blanket)