Commit Graph

2549 Commits

Author SHA1 Message Date
Michael Tuexen
cf19fced17 MFC 197288,197326,197327,197328,197342,197914,197929,
197955,199365,199370,199371,199373,199866
This MFCs all SCTP/VNET relevant fixes from head.

Approved by: rrs (mentor)
2009-12-07 07:33:51 +00:00
Hajimu UMEMOTO
0e78091638 MFC r197286, r197306:
V_irtualize the lltables list, making ARP and ND reasonably
usable again with options VIMAGE kernels.

Discussed with:	hrs
2009-11-17 16:11:53 +00:00
Qing Li
54b7653bd5 MFC r198353
Verify "smp_started" is true before calling
sched_bind() and sched_unbind().

Reviewed by:	kmacy
2009-10-28 21:43:16 +00:00
Qing Li
dbd4bfe317 MFC 198306
The flow-table function flowtable_route_flush() may be called
during system initialization time. Since the flow-table is
designed to maintain per CPU flow cache, the existing code
did not check whether "smp_started" is true before calling
sched_bind() and sched_unbind(), which triggers a page fault.

Reviewed by:	jeff
Approved by:	re
2009-10-22 18:48:25 +00:00
Robert Watson
cbdd92bda4 Merge r198233 from head to stable/8:
Clean up comments, white space, and style in pfil.c (VNET changes not
  MFC'd)

Approved by:	re (kib)
2009-10-21 14:05:51 +00:00
Robert Watson
5e2ef9933c Merge r198198 from head to stable/8:
Line-wrap pfil.c so that it prints more nicely.

Approved by:	re (kensmith)
2009-10-21 13:11:38 +00:00
Robert Watson
304762b138 Merge r198219 from head to stable/8:
Remove unused pfil_flags field in packet_filter_hook.

Approved by:	re (kib)
2009-10-21 09:53:55 +00:00
Robert Watson
752b1b6971 Merge r198218 from head to stable/8:
Sort function prototypes in pfil.h, clean up white space, and better
  align fields for printing.

Approved by:	re (kensmith)
2009-10-20 18:54:51 +00:00
Bjoern A. Zeeb
67f0b21fa6 MFC r197727:
Put #ifdef INET around parts of the FLOWTABLE code, to unbreak
  nooptions INET kernel builds.

Approved by:	re (kib)
2009-10-08 20:58:09 +00:00
Qing Li
e85f0cc52d MFC r197687
The flow-table associates TCP/UDP flows and IP destinations with
specific routes. When the routing table changes, for example,
when a new route with a more specific prefix is inserted into the
routing table, the flow-table is not updated to reflect that change.
As such existing connections cannot take advantage of the new path.
In some cases the path is broken. This patch will update the affected
flow-table entries when a more specific route is added. The route
entry is properly marked when a route is deleted from the table.
In this case, when the flow-table performs a search, the stale
entry is updated automatically. Therefore this patch is not
necessary for route deletion.

Reviewed by:	bz, kmacy
Approved by:	re
2009-10-06 18:47:02 +00:00
Qing Li
8cb7f8f861 MFC r197364
A wrong variable is used when setting up the interface
address route, which broke source address selection in
some code paths.

Submitted by:	noted by bz
Reviewed by:	hrs
Approved by:	re (kib)
2009-09-20 17:46:56 +00:00
Qing Li
553a7dec4b MFC r197227
Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by:	bz
Approved by:	re
2009-09-15 22:46:06 +00:00
Jack F Vogel
6a89c3ede1 Make LRO turned off uncategorically for devices
attached to the bridge, rather than just in the case
when some device cannot do TSO. Customer tests have
shown that even when all devices can do TSO that LRO
will cause problems when bridging.

Approved by:  re
2009-09-08 23:25:39 +00:00
Qing Li
69406c1632 MFC r196871
The addresses that are assigned to the loopback interface
should be part of the kernel routing table.

Reviewed by:	bz
Approved by:	re
2009-09-05 20:35:18 +00:00
Qing Li
3d2a8d364d MFC r196864
This patch fixes the following issues:
- Interface link-local address is not reachable within the
  node that owns the interface, this is due to the mismatch
  in address scope as the result of the installed interface
  address loopback route. Therefore for each interface
  address loopback route, the rt_gateway field (of AF_LINK
  type) will be used to track which interface a given
  address belongs to. This will aid the address source to
  use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
  the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
  only for validation reason. Doing so will eliminate as much
  of the special case (loopback addresses) handling code
  as possible, however, these empty nd6 entries should not
  be returned to the userland applications such as the
  "ndp" command.
Since both of the above issues contain common files, these
files are committed together.

Reviewed by:	bz
Approved by:	re
2009-09-05 17:40:27 +00:00
Marko Zec
e9cedda843 MFC r196633:
Introduce a separate sx lock for protecting lists of vnet sysinit
  and sysuninit handlers.

  Previously, sx_vnet, which is a lock designated for protecting
  the vnet list, was (ab)used for protecting vnet sysinit / sysuninit
  handler lists as well.  Holding exclusively the sx_vnet lock while
  invoking sysinit and / or sysuninit handlers turned out to be
  problematic, since some of the handlers may attempt to wake up
  another thread and wait for it to walk over the vnet list, hence
  acquire a shared lock on sx_vnet, which in turn leads to a deadlock.
  Protecting vnet sysinit / sysuninit lists with a separate lock
  mitigates this issue, which was first observed with
  flowtable_flush() / flowtable_cleaner() in sys/net/flowtable.c.

  Reviewed by:  rwatson, jhb
  MFC after:    3 days

Approved by:	re (rwatson)
2009-08-31 09:44:07 +00:00
Qing Li
c7276c59ff As part of r196609, a call to "rtalloc" did not take the fib into
account. So call the appropriate "rtalloc_ign_fib()" instead of
calling "rtalloc_ign()".

Reviewed by:	pointed out by bz
Approved by:	re
2009-08-31 00:18:17 +00:00
Qing Li
ba3ae75b3c MFC r196609
In ip_output(), the flow-table module must not try to cache L2/L3
information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type.
Since the L2 information (rt_lle) is invalid for these interface
types, accidental caching attempt will trigger panic when the invalid
rt_lle reference is accessed.

When installing a new route, or when updating an existing route, the
user supplied gateway address may be an interface address (this is
particularly true for point-to-point interface related modules such
as ppp, if_tun, if_gif). Currently the routing command handler always
set the RTF_GATEWAY flag if the gateway address is given as part of the
command paramters. Therefore the gateway address must be verified against
interface addresses or else the route would be treated as an indirect
route, thus making that route unusable.

Reviewed by:	kmacy, julian, rwatson
Approved by:	re
2009-08-30 22:42:32 +00:00
Robert Watson
d6f7f21cac Merge r196559 from head to stable/8:
Add IFNET_HOLD reserved pointer value for the ifindex ifnet array,
  which allows an index to be reserved for an ifnet without making
  the ifnet available for management operations.  Use this in if_alloc()
  while the ifnet lock is released between initial index allocation and
  completion of ifnet initialization.

  Add ifindex_free() to centralize the implementation of releasing an
  ifindex value.  Use in if_free() and if_vmove(), as well as when
  releasing a held index in if_alloc().

  Reviewed by:  bz

Approved by:	re (kib)
2009-08-28 21:14:04 +00:00
Robert Watson
57d231bba6 Merge r196553 from head to stable/8:
Break out allocation of new ifindex values from if_alloc() and if_vmove(),
  and centralize in a single function ifindex_alloc().  Assert the
  IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc().  This does not
  close all known races in this code.

  Reviewed by:  bz

Approved by:	re (kib)
2009-08-28 21:12:38 +00:00
Robert Watson
a0021692f2 Merge r196535 from head to stable/8:
Use locks specific to the lltable code, rather than borrow the ifnet
  list/index locks, to protect link layer address tables.  This avoids
  lock order issues during interface teardown, but maintains the bug that
  sysctl copy routines may be called while a non-sleepable lock is held.

  Reviewed by:  bz, kmacy, qingli

Approved by:	re (kib)
2009-08-28 21:10:26 +00:00
Robert Watson
b569420afa Merge r196510 from head to stable/8:
Make if_grow static -- it's not used outside of if.c, and with the
  internals destined to change, it's better if it remains that way.

Approved by:	re (kib)
2009-08-28 21:07:43 +00:00
Robert Watson
1b257b0e92 Merge r196482 from head to stable/8:
Rather than using IFNET_RLOCK() when iterating over (and modifying) the
  ifnet list during if_ef load, directly acquire the ifnet_sxlock
  exclusively.  That way when if_alloc() recurses the lock, it's a write
  recursion rather than a read->write recursion.

  This code structure is arguably a bug, so add a comment indicating that
  this is the case.  Post-8.0, we should fix this, but this commit
  resolves panic-on-load for if_ef.

  Discussed with:       bz, julian
  Reported by:  phk

Approved by:	re (kib)
2009-08-28 20:07:38 +00:00
Robert Watson
3ef94f2b72 Merge r196481 from head to stable/8:
Rework global locks for interface list and index management, correcting
  several critical bugs, including race conditions and lock order issues:

  Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
  sxlock.  Either can be held to stablize the lists and indexes, but both
  are required to write.  This allows the list to be held stable in both
  network interrupt contexts and sleepable user threads across sleeping
  memory allocations or device driver interactions.  As before, writes to
  the interface list must occur from sleepable contexts.

  Reviewed by:  bz, julian

Approved by:	re (kib)
2009-08-28 20:06:02 +00:00
Marko Zec
d6976e0558 MFC r196504:
When moving ifnets from one vnet to another, and the ifnet
  has ifaddresses of AF_LINK type which thus have an embedded
  if_index "backpointer", we must update that if_index backpointer
  to reflect the new if_index that our ifnet just got assigned.

  This change affects only options VIMAGE builds.

  Submitted by: bz
  Reviewed by:  bz
  Approved by:  re (rwatson), julian (mentor)

Approved by:	re (rwatson)
2009-08-28 19:18:20 +00:00
Julian Elischer
1261248008 MFC r196419:
Don't allow access to the internals until it has all been set up.
  Specifically, not until the per-vnet parts have been set up.

Submitted by:	kmacy@
Reviewed by:	julian@, zec@
Approved by:	re(rwatson)
2009-08-21 10:05:26 +00:00
Robert Watson
aeba9e80ff Merge r196263 from head to stable/8:
Remove unused if_rawoutput() macro; it has been unused since at least
  FreeBSD 2.

Approved by:	re (kib)
2009-08-20 21:14:52 +00:00
Kip Macy
786f829ddb This change fixes a comment and addresses a complaint by kib@ by
moving a frequently executed flowtable syslog statement from being
 conditional on bootverbose to conditional on a per-vnet flowtable
 sysctl.

Approved by:	re@
2009-08-19 20:17:36 +00:00
Kip Macy
670151d0e4 MFC 196368
- change the interface to flowtable_lookup so that we don't rely on
    the mbuf for obtaining the fib index
  - check that a cached flow corresponds to the same fib index as the
    packet for which we are doing the lookup
  - at interface detach time flush any flows referencing stale rtentrys
    associated with the interface that is going away (fixes reported
    panics)
  - reduce the time between cleans in case the cleaner is running at
    the time the eventhandler is called and the wakeup is missed less
    time will elapse before the eventhandler returns
  - separate per-vnet initialization from global initialization
    (pointed out by jeli@)

Reviewed by:	sam@
Approved by:	re@
2009-08-18 20:39:35 +00:00
Kip Macy
bf4e402b83 fix netboot issue by disabling flowtable lookups until initialization has been run
+ mergeinfo garbage

Reviewed by:	rwatson@
Approved by:	re@
2009-08-17 20:06:00 +00:00
Marko Zec
d326ff4914 MFC r196230:
Appease VNET_DEBUG - in if_vmove we temporarily switch i.e.
  recurse from one vnet to another which is OK, so no need
  to flood the console with warnings here.

  Approved by:  re (rwatson), julian (mentor)

Approved by:	re (rwatson)
2009-08-14 23:05:10 +00:00
Marko Zec
857e561523 MFC r196228:
Make VNET_DEBUG a standalone compile-time option, i.e. decouple it from
  INVARIANTS.

  Reviewed by:  bz
  Approved by:  re (rwatson), julian (mentor)

Approved by:	re (rwatson)
2009-08-14 22:55:54 +00:00
Bjoern A. Zeeb
da2a30fca1 MFC r196176:
Make it possible to change the vnet sysctl variables on jails
  with their own virtual network stack. Jails only inheriting a
  network stack cannot change anything that cannot be changed from
  within a prison.

  Reviewed by:  rwatson, zec

Approved by:	re (kib)
2009-08-13 10:31:02 +00:00
Bjoern A. Zeeb
9b36fbe976 MFC r196174:
Put multiple instructions into a block when iterating; unbreaks
  NET_RT_DUMP, which otherwise only returned information of AF_MAX.
  This was broken in r193232 (save your time - my bug, my fix).

  Reported by:  Larry Baird (lab gta.com)
  Tested by:    Larry Baird (lab gta.com)
  Reviewed by:  zec, lstewart, qing

PR:		kern/137700
Approved by:	re (kib)
2009-08-13 09:32:15 +00:00
Jung-uk Kim
3eced0dd19 MFC: r196150
Always embed pointer to BPF JIT function in BPF descriptor
to avoid inconsistency when opt_bpf.h is not included.

Reviewed by:	rwatson
Approved by:	re (rwatson)
2009-08-12 17:45:55 +00:00
Bjoern A. Zeeb
3baf84f587 MFC r196129:
Update DDB show vnet command to print all used and available information.

  Reviewed by:  rwatson, zec

Approved by:	re
2009-08-12 12:05:07 +00:00
Bjoern A. Zeeb
28a2b3dd57 MFC r196118:
Put minimum alignment on the dpcpu and vnet section so that ld
  when adding the __start_ symbol knows the expected section alignment
  and can place the __start_ symbol correctly.

  These sections will not support symbols with super-cache line alignment
  requirements.

  For full details, see posting to freebsd-current, 2009-08-10,
  Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>.

  Debugging and testing patches by:
                Kamigishi Rei (spambox haruhiism.net),
                np, lstewart, jhb, kib, rwatson
  Tested by:    Kamigishi Rei, lstewart
  Reviewed by:  kib

Approved by:	re
2009-08-12 10:32:20 +00:00
Robert Watson
315e3e38fa Many network stack subsystems use a single global data structure to hold
all pertinent statatistics for the subsystem.  These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.

Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored.  This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.

The following modules are affected by this change:

      if_bridge
      if_cxgb
      if_gif
      ip_mroute
      ipdivert
      pf

In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack.  However, that change is too agressive for
this point in the release cycle.

Reviewed by:	bz
Approved by:	re (kib)
2009-08-02 19:43:32 +00:00
Robert Watson
6aad5c1c93 The colour was red as shall be the letters of this warning to people upon
boot if the experimental VIMAGE feature was compiled into the kernel.

Submitted by:	bz
Reviewed by:	zec
Approved by:	re (vimage blanket)
2009-08-01 22:22:45 +00:00
Robert Watson
c8f6a13820 Minor style tweaks.
Approved by:	re (vimage blanket)
2009-08-01 21:58:32 +00:00
Robert Watson
6bc2c7b70c Make the vnet alloc/destroy paths a bit easier to followg by merging
vnet_data_init/vnet_data_destroy into vnet_alloc/vnet_destroy.

Reviewed by:	bz, zec
Approved by:	re (vimage blanket)
2009-08-01 21:54:15 +00:00
Robert Watson
7429a3f3d8 Remove vnet_foreach() utility function, which previously allowed
vnet.c to iterate virtual network stacks without being aware of
the implementation details previously hidden in kern_vimage.c.
Now they are in the same file, so remove this added complexity.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 20:24:45 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Robert Watson
ed3db012fc Reorder and recomment vnet.c and vnet.h on the basis that they are no longer
solely about the virtual network stack memory allocator.

Approved by:	re (vimage blanket)
2009-07-30 12:41:19 +00:00
Robert Watson
a9bcca799e Revise header comments for vnet.h as we now implement VNET_SYSINIT, not
just VNET_DEFINE in vnet.h.

Approved by:	re (vimage blanket)
2009-07-28 22:17:34 +00:00
Qing Li
9fca4f79c7 The new flow table caches both the routing table entry as well as the
L2 information. For an indirect route the cached L2 entry contains the
MAC address of the gateway. Typically the default route is used to
transmit multicast packets when explicit multicast routes are not
available. The ether_output() function bypasses L2 resolution function
if it verifies the L2 cache is valid, because the cached L2 address
(a unicast MAC address) is copied into the packets as the destination
MAC address. This validation, however, does not apply to broadcast and
multicast packets because the destination MAC address is mapped
according to a standard method instead.

Submitted by:	Xin Li
Reviewed by:	bz
Approved by:	re
2009-07-28 17:16:54 +00:00
Qing Li
df813b7ea2 This patch does the following:
- Allow loopback route to be installed for address assigned to
      interface of IFF_POINTOPOINT type.
    - Install loopback route for an IPv4 interface addreess when the
      "useloopback" sysctl variable is enabled. Similarly, install
      loopback route for an IPv6 interface address when the sysctl variable
      "nd6_useloopback" is enabled. Deleting loopback routes for interface
      addresses is unconditional in case these sysctl variables were
      disabled after an interface address has been assigned.

Reviewed by:	bz
Approved by:	re
2009-07-27 17:08:06 +00:00
Bjoern A. Zeeb
d0ea47437a Update epair(4) to the new netisr implementation and polish
things a bit:
- use dpcpu data to track the ifps with packets queued up,
- per-cpu locking and driver flags
- along with .nh_drainedcpu and NETISR_POLICY_CPU.
- Put the mbufs in flight reference count, preventing interfaces
  from going away, under INVARIANTS as this is a general problem
  of the stack and should be solved in if.c/netisr but still good
  to verify the internal queuing logic.
- Permit changing the MTU to virtually everythinkg like we do for loopback.

Hook epair(4) up to the build.

Approved by:	re (kib)
2009-07-26 12:20:07 +00:00
Bjoern A. Zeeb
be31e5e7b5 Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctls
(ifconfig ifN (-)vnet <jname|jid>) work correctly.

Move vi_if_move to if.c and split it up into two functions(*),
one for each ioctl.

In the reclaim case, correctly set the vnet before calling if_vmove.

Instead of silently allowing a move of an interface from the current
vnet to the current vnet, return an error. (*)

There is some duplicate interface name checking before actually moving
the interface between network stacks without locking and thus race
prone. Ideally if_vmove will correctly and automagically handle these
in the future.

Suggested by:	rwatson (*)
Approved by:	re (kib)
2009-07-26 11:29:26 +00:00
Robert Watson
d0728d7174 Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
  occur each time a network stack is instantiated and destroyed.  In the
  !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
  For the VIMAGE case, we instead use SYSINIT's to track their order and
  properties on registration, using them for each vnet when created/
  destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
  previously, as well as its dependency scheme: we now just use the
  SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
  they want init functions to be called for each virtual network stack
  rather than just once at boot, compiling down to DOMAIN_SET() in the
  non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
  of modinfo or DOMAIN_SET() for init/uninit events.  In some cases,
  convert modular components from using modevent to using sysinit (where
  appropriate).  In some cases, do minor rejuggling of SYSINIT ordering
  to make room for or better manage events.

Portions submitted by:	jhb (VNET_SYSINIT), bz (cleanup)
Discussed with:		jhb, bz, julian, zec
Reviewed by:		bz
Approved by:		re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00