Commit Graph

3429 Commits

Author SHA1 Message Date
Marko Zec
94e9f5a1c2 Remove unnecessary CURVNET_SET() calls where curvnet context is
(i.e. seems to be) already set.

This should reduce console noise due to curvnet recursion reports.

This change has no impact on nooptions VIMAGE builds.
Approved by:	julian (mentor)
2009-05-06 13:30:46 +00:00
Marko Zec
743da3bcdb Unbreak options VIMAGE kernel builds.
Approved by:	julian (mentor)
2009-05-06 08:49:39 +00:00
Marko Zec
21ca7b57bd Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The curvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, with an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typical cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack; and when executing
timer-driven networking functions.
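
The save / set / restore pattern described above can be sketched in
userspace as follows.  This is only an illustration under stated
assumptions: the macro bodies are guesses at the general shape, not the
kernel definitions; struct vnet is reduced to a name; and a C11
_Thread_local variable stands in for curthread->td_vnet.  The helper
functions inner_op() and socket_request() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct vnet. */
struct vnet {
	const char *name;
};

/* Stand-in for curthread->td_vnet: one current vnet per thread. */
static _Thread_local struct vnet *curvnet = NULL;

/* Assumed shape: remember the previous context, then switch. */
#define CURVNET_SET(arg)					\
	struct vnet *saved_vnet = curvnet;			\
	curvnet = (arg)

/* Assumed shape: revert to the context saved by CURVNET_SET(). */
#define CURVNET_RESTORE()					\
	curvnet = saved_vnet

/* Inner networking function; it recurses on curvnet. */
static const char *
inner_op(struct vnet *vnet)
{
	const char *seen;

	CURVNET_SET(vnet);
	seen = curvnet->name;
	CURVNET_RESTORE();
	return (seen);
}

/* Entry point: sets the context, calls down, restores on exit. */
static const char *
socket_request(struct vnet *outer, struct vnet *inner)
{
	const char *seen;

	CURVNET_SET(outer);
	seen = inner_op(inner);
	/* The nested set/restore must leave our own context intact. */
	assert(curvnet == outer);
	CURVNET_RESTORE();
	return (seen);
}
```

Because each CURVNET_SET() keeps the previous value in a local
variable, nested calls unwind in the right order, which is why
recursion is tolerated even though it is discouraged.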

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
Marko Zec
5f416f8e84 Make indentation more uniform across vnet container structs.
This is a purely cosmetic / NOP change.

Reviewed by:	bz
Approved by:	julian (mentor)
Verified by:	svn diff -x -w producing no output
2009-05-02 08:16:26 +00:00
Marko Zec
d7fcc52895 Unbreak options VIMAGE + nooptions INVARIANTS kernel builds.
Submitted by:	julian
Approved by:	julian (mentor)
2009-05-02 05:02:28 +00:00
Marko Zec
f6dfe47a14 Permit building kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with two additional fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.
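
Items 1) and 2) above can be illustrated with a small userspace sketch
of the options VIMAGE flavour of the macros.  The macro text follows
the expansions quoted in the list; everything else is an assumption
made for illustration: the value of VNET_MOD_NET, the reduced struct
layouts, the int type given to _ifnet, and the read_ifnet() and
write_ifnet() helpers.

```c
#include <assert.h>
#include <stddef.h>

#define VNET_MOD_NET	0	/* hypothetical module index */
#define VNET_MOD_MAX	1	/* hypothetical table size */

/* Per-module container; the real one holds many more fields. */
struct vnet_net {
	int _ifnet;		/* placeholder type, for illustration */
};

/* Each vnet carries one data pointer per registered module. */
struct vnet {
	void *mod_data[VNET_MOD_MAX];
};

/* Declare and set up the base pointer used by V_ accessors. */
#define INIT_VNET_NET(vnet)					\
	struct vnet_net *vnet_net = (vnet)->mod_data[VNET_MOD_NET]

/* Resolve the virtualized variable through the base pointer. */
#define V_ifnet	(((struct vnet_net *)vnet_net)->_ifnet)

static int
read_ifnet(struct vnet *vnet)
{
	INIT_VNET_NET(vnet);

	assert(vnet_net != NULL);
	return (V_ifnet);
}

static void
write_ifnet(struct vnet *vnet, int value)
{
	INIT_VNET_NET(vnet);

	assert(vnet_net != NULL);
	V_ifnet = value;
}
```

The same V_ifnet spelling in the function bodies reaches a different
object depending on which vnet was passed to INIT_VNET_NET(), which is
the property the accessor macros are meant to provide.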

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)
2009-04-30 13:36:26 +00:00
Bruce M Simpson
33cde13046 Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev.  Summary of changes:

 * Connect netinet6/in6_mcast.c to build.
   The legacy KAME KPIs are mostly preserved.
 * Eliminate now dead code from ip6_output.c.
   Don't do mbuf bingo, we are not going to do RFC 2292 style
   CMSG tricks for multicast options as they are not required
   by any current IPv6 normative reference.
 * Refactor transports (UDP, raw_ip6) to do own mcast filtering.
   SCTP, TCP unaffected by this change.
 * Add ip6_msource, in6_msource structs to in6_var.h.
 * Hookup mld_ifinfo state to in6_ifextra, allocate from
   domifattach path.
 * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
   Kernel consumers which need this should use in6m_lookup().
 * Refactor IPv6 socket group memberships to use a vector (like IPv4).
 * Update ifmcstat(8) for IPv6 SSM.
 * Add witness lock order for IN6_MULTI_LOCK.
 * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
 * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
 * Update carp(4) for new IPv6 SSM KPIs.
 * Virtualize ip6_mrouter socket.
   Changes mostly localized to IPv6 MROUTING.
 * Don't do a local group lookup in MROUTING.
 * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
 * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
 * Bump __FreeBSD_version to 800084.
 * Update UPDATING.

NOTE WELL:
 * This code hasn't been tested against real MLDv2 queriers
   (yet), although the on-wire protocol has been verified in Wireshark.
 * There are a few unresolved issues in the socket layer APIs to
   do with scope ID propagation.
 * There is a LOR present in ip6_output()'s use of
   in6_setscope() which needs to be resolved. See comments in mld6.c.
   This is believed to be benign and can't be avoided for the moment
   without re-introducing an indirect netisr.

This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
2009-04-29 19:19:13 +00:00
Bruce M Simpson
9efc1a1bbf Add MLDv2 prototypes and defines. 2009-04-29 10:20:17 +00:00
Bruce M Simpson
5cf93e5d2c Use KTR_INET for MROUTING CTRs. 2009-04-29 10:17:08 +00:00
Bruce M Simpson
c566b47669 Cut over to KTR_INET for CTR.
For clarity, put pointer increment/size decrement on own line
when copying out in-mode source filters to userland.
2009-04-29 10:14:16 +00:00
Bruce M Simpson
1096332a4a Do not assume that ip6_moptions is always set, it is
a lazy-allocated structure.
2009-04-29 10:13:22 +00:00
Bruce M Simpson
31a3e65dc2 Fix a problem whereby enqueued IGMPv3 filter list changes would be
incorrectly output, if the RB-tree enumeration happened to reuse the
same chain for a mode switch: that is, both ALLOW and BLOCK records
were appended for the same group, in the same mbuf packet chain.

This was introduced during an mbuf chain layout bug fix involving
m_getptr(), which obviously cannot count from offset 0 on the
second pass through the RB-tree when serializing the IGMPv3
group records into the pending mbuf chain.

Cut over to KTR_INET for IGMPv3 CTR usage.
2009-04-29 10:12:01 +00:00
Edward Tomasz Napierala
1a4998162e Don't require packet to match a route (any route; this information wasn't
used anyway, so a typical workaround was to add a dummy route) if it's going
to be sent through IPSec tunnel.

Reviewed by:	bz
2009-04-28 11:10:33 +00:00
Oleg Bulyzhin
a3a981b7f9 Optimize packet flow: if net.inet.ip.fw.one_pass != 0 and packet was
processed by ipfw once - avoid second ipfw_chk() call.
This saves us from unnecessary IPFW_RLOCK(), m_tag_find() calls and
ip/tcp/udp header parsing.

MFC after:	2 months
2009-04-27 17:37:36 +00:00
Marko Zec
093f25f8c8 In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by:	bz (an older version of the patch)
Approved by:	julian (mentor)
2009-04-26 22:06:42 +00:00
Robert Watson
db091502fb Acquire IF_ADDR_LOCK() around most iterations over ifp->if_addrhead
(colloquially known as if_addrlist).  Currently not acquired around
interface address loops that call out to the routing code due to
potential lock order issues.

MFC after:	3 weeks
2009-04-26 19:05:40 +00:00
Robert Watson
588885f2f5 Expand coverage of IF_ADDR_LOCK() in in_control() from point of initial
lookup of 'ia' from if_addrhead through most use.  Note that we
currently have to drop it prematurely in some cases due to calls out to
the routing and interface code while using 'ia', but this closes many
races.  Annotate several potential races that persist after this change.
Move to using M_NOWAIT for allocating new interface addresses due to
lock(s) being held.

MFC after:	3 weeks
2009-04-25 23:02:57 +00:00
Robert Watson
07cde5e92c In in_purgemaddrs(), remove the inm being freed from the address list
before freeing it, rather than vice versa, to avoid potential use
after free.
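
The ordering matters because LIST_REMOVE() dereferences the element's
link fields.  A minimal userspace sketch with <sys/queue.h>, assuming a
hypothetical struct maddr and helper functions add_maddr() and
purge_maddrs() (none of these are the kernel's in_multi code):

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical list element standing in for a multicast address. */
struct maddr {
	int group;
	LIST_ENTRY(maddr) link;
};
LIST_HEAD(maddr_head, maddr);

/* Allocate an element and link it at the head of the list. */
static struct maddr *
add_maddr(struct maddr_head *head, int group)
{
	struct maddr *m;

	m = calloc(1, sizeof(*m));
	if (m != NULL) {
		m->group = group;
		LIST_INSERT_HEAD(head, m, link);
	}
	return (m);
}

/* Free every element; returns the number of elements freed. */
static int
purge_maddrs(struct maddr_head *head)
{
	struct maddr *m, *next;
	int freed = 0;

	for (m = LIST_FIRST(head); m != NULL; m = next) {
		/* Fetch the successor before m goes away. */
		next = LIST_NEXT(m, link);
		/* Unlink first: LIST_REMOVE() reads m's link fields. */
		LIST_REMOVE(m, link);
		free(m);
		freed++;
	}
	return (freed);
}
```

Calling free(m) before LIST_REMOVE(m, link) would make the unlink
step read freed memory.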

Reviewed by:	bms
2009-04-24 22:11:53 +00:00
Robert Watson
cf7b18f15e Relocate permissions checking code in in_control() to before the body
of the implementation of ioctls.  This makes the mapping of ioctls to
specific privileges more explicit, and also simplifies the
implementation by reducing the use of FALLTHROUGH handling in switch.

While this is not intended to be a functional change, it does mean
that certain privilege checks are now performed earlier, so EPERM
might be returned in preference to EADDRNOTAVAIL for management
ioctls that could have failed for both reasons.
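
The change in error precedence can be sketched as a two-phase handler.
The request names, the boolean parameters and the control() function
are hypothetical; this only illustrates checking privilege for all
requests before any request body runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical management requests. */
enum cmd {
	CMD_GET_ADDR,
	CMD_SET_ADDR,
	CMD_DEL_ADDR
};

static int
control(enum cmd cmd, bool privileged, bool addr_exists)
{
	/* Phase 1: explicit mapping from request to privilege check. */
	switch (cmd) {
	case CMD_SET_ADDR:
	case CMD_DEL_ADDR:
		if (!privileged)
			return (EPERM);
		break;
	case CMD_GET_ADDR:
		break;
	}

	/* Phase 2: request bodies, free of privilege handling. */
	switch (cmd) {
	case CMD_GET_ADDR:
	case CMD_DEL_ADDR:
		if (!addr_exists)
			return (EADDRNOTAVAIL);
		break;
	case CMD_SET_ADDR:
		break;
	}
	return (0);
}
```

With this layout an unprivileged delete of a missing address reports
EPERM, because the privilege phase runs before the address lookup.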

MFC after:	3 weeks
2009-04-24 09:54:46 +00:00
Robert Watson
bbb3fb6194 Reorganize in_control() so that invariants are more obvious, and so
that it is easier to lock:

- Handle the unsupported ioctl case at the beginning of in_control(),
  handing off to ifp->if_ioctl, rather than looking up interfaces and
  addresses unnecessarily in this case.

- Make it an invariant that ifp is always non-NULL when running
  in_control()-implemented ioctls, simplifying the code structure.

MFC after:	3 weeks
2009-04-23 21:41:37 +00:00
Bruce M Simpson
86979280fc Bracket struct mfc and struct rtdetq with #ifdef _KERNEL.
Match the bracketing in netstat.
Since the cleanup of MROUTING, ports have broken because they
expect to include <netinet/ip_mroute.h> without including
<sys/queue.h>. Fix breakage at source.

The real fix, of course, is to fix the MROUTING APIs by blowing them
away and replacing them with something else...
2009-04-21 12:47:09 +00:00
Bruce M Simpson
5def3edcad remove IFF_ASSERTGIANT 2009-04-21 09:43:51 +00:00
Robert Watson
4ed6f8c1f1 Prefer actual field names (if_addrhead, ifa_link) to macros aliasing
those field names in FreeBSD code.

MFC after:	2 weeks
2009-04-20 22:40:44 +00:00
Robert Watson
0aade26e6d In ip_input(), cache the received mbuf's network interface in a local
variable.  Acquire the interface address list lock when iterating over
the interface address list searching for a matching received broadcast
address.
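
A userspace sketch of the pattern, using a pthread mutex in place of
the interface address list lock.  The struct layouts and the
pkt_is_broadcast() helper are assumptions for illustration, not the
ip_input() code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced interface address entry. */
struct ifaddr_sketch {
	uint32_t broadaddr;
	struct ifaddr_sketch *next;
};

/* Hypothetical, reduced interface with its address list lock. */
struct ifnet_sketch {
	pthread_mutex_t addr_lock;
	struct ifaddr_sketch *addrhead;
};

/* Hypothetical, reduced packet header. */
struct pkt_sketch {
	struct ifnet_sketch *rcvif;
};

static bool
pkt_is_broadcast(struct pkt_sketch *m, uint32_t dst)
{
	/* Cache the receive interface in a local variable. */
	struct ifnet_sketch *ifp = m->rcvif;
	struct ifaddr_sketch *ifa;
	bool match = false;

	assert(ifp != NULL);
	/* Hold the lock for the whole walk over the address list. */
	pthread_mutex_lock(&ifp->addr_lock);
	for (ifa = ifp->addrhead; ifa != NULL; ifa = ifa->next) {
		if (ifa->broadaddr == dst) {
			match = true;
			break;
		}
	}
	pthread_mutex_unlock(&ifp->addr_lock);
	return (match);
}
```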

MFC after:	2 weeks
2009-04-20 14:35:42 +00:00
Robert Watson
33c4f96d88 In icmp_reflect(), acquire the interface address list lock when
searching for a source address to use.

MFC after:	2 weeks
Reviewed by:	bz
2009-04-20 13:45:39 +00:00
Robert Watson
072b8f8ea7 Lock the interface address list when searching for a matching interface
by address, or when implementing 'me' rules on IPv6.  Prefer the field
name if_addrhead to the macro if_addrlist.

MFC after:	2 weeks
2009-04-19 22:34:35 +00:00
Robert Watson
b132600ab2 In divert_packet(), lock the interface address list before iterating over
it in search of an address.

MFC after:	2 weeks
2009-04-19 22:29:16 +00:00
Robert Watson
9317b04e46 Lock interface address lists in in_pcbladdr() when searching for a
source address for a connection and there's no route or no interface
for the route.

MFC after:	2 weeks
2009-04-19 22:25:09 +00:00
Robert Watson
8021456a24 Protect against some writer-writer races in in_control() by acquiring
the interface address list lock around interface address list
modifications.  More to do here.

MFC after:	2 weeks
2009-04-19 22:16:19 +00:00
Bruce M Simpson
b5fbc0b98f Now that IFF_NEEDSGIANT has been removed from the network
stack, catch up with this in IGMPv3 and remove dead code.
This has the side-effect of not being back-portable to RELENG_7
w/o further changes.
2009-04-19 08:14:21 +00:00
Kip Macy
65111ec7aa - Allocate a small flowtable in ip_input.c (changeable by tunable)
- Use for accelerating ip_output
2009-04-19 04:44:05 +00:00
Kip Macy
ab25fa3558 s/void/void */ 2009-04-16 23:02:56 +00:00
Kip Macy
114f15c686 restore spare pointers for MFCing 2009-04-16 22:47:43 +00:00
Kip Macy
279aa3d419 Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by:	rwatson
2009-04-16 20:30:28 +00:00
Kip Macy
8b12a7c2a6 - convert pspare pointers in inpcb to an llentry and rtentry cache
- add flags to indicate their validity
2009-04-15 22:22:00 +00:00
Kip Macy
773b573a96 - add second flags field to inpcb
- update comments in vflag
2009-04-15 22:09:42 +00:00
Kip Macy
82c33e73f2 provide additional convenience macros for inpcb locking (upgrade, downgrade, exclusive) 2009-04-15 21:39:56 +00:00
Kip Macy
582b6122ab make LLTABLE visible to netinet 2009-04-15 20:49:59 +00:00
Kip Macy
de4ab55e43 add an llentry to struct route{_in6} to allow it to be passed around with
the rtentry
2009-04-15 20:34:19 +00:00
Randall Stewart
e261340ef7 Add missing address lock when we look at the ifa list 2009-04-14 19:20:27 +00:00
Randall Stewart
544e35bd97 Move the flight size reduction to right after
we recognize it's a retransmit, ahead of the PR-SCTP
work. Without this fix, we end up NOT reducing flight
size and causing a miscalculation when PR-SCTP is active
and data is skipped.

Obtained from:	Michael Tuexen.
2009-04-14 07:50:29 +00:00
Robert Watson
de231a063a Put TCPSTAT_ADD() and TCPSTAT_INC() behind _KERNEL.
MFC after:	3 days
2009-04-12 21:28:35 +00:00
Robert Watson
6bf65bcf3a Update stats in struct carpstats using two new macros: CARPSTATS_ADD()
and CARPSTATS_INC(), rather than directly manipulating the fields of
the structure.  This will make it easier to change the implementation
of these statistics, such as using per-CPU versions of the data
structure.

MFC after:	3 days
2009-04-12 14:19:37 +00:00
Robert Watson
07cf7ab29c Update stats in struct pimstat using two new macros: PIMSTAT_ADD()
and PIMSTAT_INC(), rather than directly manipulating the fields of
the structure.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structure.

MFC after:	3 days
2009-04-12 14:06:26 +00:00
Robert Watson
fb83a36856 Update stats in struct mrtstat using two new macros: MRTSTAT_ADD()
and MRTSTAT_INC(), rather than directly manipulating the fields of
the structure.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structure.

MFC after:	3 days
2009-04-12 14:00:36 +00:00
Robert Watson
bd88cce2ed Update stats in struct igmpstat using two new macros:
IGMPSTAT_ADD() and IGMPSTAT_INC(), rather than directly
manipulating the fields of the structure.  This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

MFC after:	3 days
2009-04-12 13:41:13 +00:00
Robert Watson
e27b0c8775 Update stats in struct icmpstat and icmp6stat using four new
macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and
ICMP6STAT_INC(), rather than directly manipulating the fields
of these structures across the kernel.  This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

In one case, icmp6stat members are manipulated indirectly, by
icmp6_errcount(), and this will require further work to fix
for per-CPU stats.

MFC after:	3 days
2009-04-12 13:22:33 +00:00
Robert Watson
026decb8f3 Update stats in struct udpstat using two new macros, UDPSTAT_ADD()
and UDPSTAT_INC(), rather than directly manipulating the fields
across the kernel.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structures.

MFC after:	3 days
2009-04-12 11:42:40 +00:00
Robert Watson
86425c62a0 Update stats in struct ipstat using four new macros, IPSTAT_ADD(),
IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly
manipulating the fields across the kernel.  This will make it easier
to change the implementation of these statistics, such as using
per-CPU versions of the data structures.

MFC after:	3 days
2009-04-11 23:35:20 +00:00
Robert Watson
78b5071407 Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and
TCPSTAT_INC(), rather than directly manipulating the fields across the
kernel.  This will make it easier to change the implementation of
these statistics, such as using per-CPU versions of the data structures.
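
A minimal sketch of the accessor-macro approach.  The macro bodies,
the reduced struct tcpstat_sketch, its field names and the
account_send() helper are assumptions for illustration; the point is
only that callers name a field and never touch the storage directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced statistics structure. */
struct tcpstat_sketch {
	uint64_t sndtotal;	/* packets sent */
	uint64_t sndbyte;	/* bytes sent */
};

static struct tcpstat_sketch tcpstat_sketch;

/* Assumed shape: all updates funnel through these two macros. */
#define TCPSTAT_ADD(name, val)	(tcpstat_sketch.name += (val))
#define TCPSTAT_INC(name)	TCPSTAT_ADD(name, 1)

/* Example caller: account for one transmitted segment. */
static void
account_send(uint64_t len)
{
	TCPSTAT_INC(sndtotal);
	TCPSTAT_ADD(sndbyte, len);
}
```

Switching to, say, per-CPU counters would then mean changing the two
macro bodies rather than every call site.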

MFC after:	3 days
2009-04-11 22:07:19 +00:00