Commit Graph

377 Commits

Author SHA1 Message Date
andre
c59c83ebb6 Replace the ill-named ZERO_COPY_SOCKET kernel option with two
more appropriate named kernel options for the very distinct
send and receive path.

"options SOCKET_SEND_COW" enables VM page copy-on-write based
sending of data on an outbound socket.

NB: The COW based send mechanism is not safe and may result
in kernel crashes.

"options SOCKET_RECV_PFLIP" enables VM kernel/userspace page
flipping for special disposable pages attached as external
storage to mbufs.

Only the naming of the kernel options is changed and their
corresponding #ifdef sections are adjusted.  No functionality
is added or removed.

Discussed with:	alc (mechanism and limitations of send side COW)
2012-10-23 14:19:44 +00:00
andre
6062df3e66 Grammar fixes to r241781.
Submitted by:	alc
2012-10-20 19:38:22 +00:00
andre
1211b4598c Hide the unfortunate named sysctl kern.ipc.somaxconn from sysctl -a
output and replace it with a new visible sysctl kern.ipc.acceptqueue
of the same functionality.  It specifies the maximum length of the
accept queue on a listen socket.

The old kern.ipc.somaxconn remains available for reading and writing
for compatibility reasons so that existing programs, scripts and
configurations continue to work.  There no plans to ever remove the
orginal and now hidden kern.ipc.somaxconn.
2012-10-20 12:53:14 +00:00
andre
e3d1886535 Tidy up somaxconn (accept queue limit) and related functions
and move it together into one place.
2012-10-20 10:51:32 +00:00
andre
b521bc0b76 Move socket UMA zone initialization functionality together into
one place.
2012-10-19 12:16:29 +00:00
andre
a00941fc1d Move UMA socket zone initialization from uipc_domain.c to uipc_socket.c
into one place next to its other related functions to avoid confusion.
2012-10-19 10:15:32 +00:00
andre
0a3cd6fb1a Remove unnecessary includes from sosend_copyin() and fix
a couple of style issues.
2012-10-18 21:04:30 +00:00
andre
bd86ceebe0 Remove double-wrapping of #ifdef ZERO_COPY_SOCKETS within
zero copy specialized sosend_copyin() helper function.
2012-10-18 20:22:17 +00:00
wollman
b94c59684c Fix spelling of the function name in two assertion messages. 2012-10-02 18:38:05 +00:00
trociny
ae35ddce9c In soreceive_generic() remove the optimization for the case when
MSG_WAITALL is set, and it is possible to do the entire receive
operation at once if we block (resid <= hiwat). Actually it might make
the recv(2) with MSG_WAITALL flag get stuck when there is enough space
in the receiver buffer to satisfy the request but not enough to open
the window closed previously due to the buffer being full.

The issue can be reproduced using the following scenario:

On the sender side do 2 send(2) requests:

1) data of size much smaller than SOBUF_SIZE (e.g. SOBUF_SIZE / 10);
2) data of size equal to SOBUF_SIZE.

On the receiver side do 2 recv(2) requests with MSG_WAITALL flag set:

1) recv() data of SOBUF_SIZE / 10 size;
2) recv() data of SOBUF_SIZE size;

We totally fill the receiver buffer with one SOBUF_SIZE/10 size request
and partial SOBUF_SIZE request. When the first request is processed we
get SOBUF_SIZE/10 free space. It is just enough to receive the rest of
bytes for the second request, and soreceive_generic() blocks in the
part that is a subject of this change waiting for the rest. But the
window was closed when the buffer was filled and to avoid silly window
syndrome it opens only when available space is larger than sb_hiwat/4
or maxseg. So it is stuck and pending data is only sent via TCP window
probes.

Discussed with:	kib (long ago)
MFC after:	2 weeks
2012-09-02 07:33:52 +00:00
trociny
e483d14d74 In soreceive_generic() when checking if the type of mbuf has changed
check it for MT_CONTROL type too, otherwise the assertion
"m->m_type == MT_DATA" below may be triggered by the following scenario:

- the sender sends some data (MT_DATA) and then a file descriptor
  (MT_CONTROL);
- the receiver calls recv(2) with a MSG_WAITALL asking for data larger
  than the receive buffer (uio_resid > hiwat).

MFC after:	2 week
2012-09-02 07:29:37 +00:00
trociny
3ef0ae6cd1 Fix KASSERT message.
MFC after:	3 days
2012-07-03 19:08:02 +00:00
np
307ef13f94 - Remove redundant call to pr_ctloutput from code that handles SO_SETFIB.
- Add a check for errors during copyin while here.

Reviewed by:	julian, bz
MFC after:	2 weeks
2012-04-03 18:38:00 +00:00
kib
79e65f67d4 Add SO_PROTOCOL/SO_PROTOTYPE socket SOL_SOCKET-level option to get the
socket protocol number.  This is useful since the socket type can
be implemented by different protocols in the same protocol family,
e.g. SOCK_STREAM may be provided by both TCP and SCTP.

Submitted by:	Jukka A. Ukkonen <jau iki fi>
PR:	  kern/162352
Discussed with:	bz
Reviewed by:	glebius
MFC after:	2 weeks
2012-02-26 13:55:43 +00:00
kib
18b7fd6ba3 Remove apparently redundand checks for socket so_proto being non-NULL
from sosetopt() and sogetopt().  No exposed sockets may have so_proto
invalid.

Discussed with:	bz, rwatson
Reviewed by:	glebius
MFC after:	2 weeks
2012-02-26 13:51:05 +00:00
kib
80ae8fe82c Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with:	bde, das (previous versions)
MFC after:	1 month
2012-02-21 01:05:12 +00:00
bz
dcdb23291f Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by:	Cisco Systems, Inc.
Reviewed by:	melifaro (basically)
MFC after:	10 days
2012-02-17 02:39:58 +00:00
hrs
43b8619604 Fix input validation in SO_SETFIB.
Reviewed by:	bz
MFC after:	1 day
2012-02-04 15:00:26 +00:00
rmh
ff5c11fefd Remove a few bits of FreeBSD 2.x compatibility code.
Approved by:	kib (mentor)
2011-11-14 18:21:27 +00:00
attilio
683d7a54ce Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks
2011-08-25 15:51:54 +00:00
andre
41fc3214df In the experimental soreceive_stream():
o Move the non-blocking socket test below the SBS_CANTRCVMORE so that EOF
   is correctly returned on a remote connection close.
 o In the non-blocking socket test compare SS_NBIO against the so->so_state
   field instead of the incorrect sb->sb_state field.
 o Simplify the ENOTCONN test by removing cases that can't occur.

Submitted by:	trociny (with some further tweaks by committer)
Tested by:	trociny
2011-07-08 10:50:13 +00:00
andre
19bc1179a6 Remove the TCP_SORECEIVE_STREAM compile time option. The use of
soreceive_stream() for TCP still has to be enabled with the loader
tuneable net.inet.tcp.soreceive_stream.

Suggested by:	trociny and others
2011-07-07 10:37:14 +00:00
trociny
1dfa9ab873 In soreceive_generic(), if MSG_WAITALL is set but the request is
larger than the receive buffer, we have to receive in sections.
When notifying the protocol that some data has been drained the
lock is released for a moment. Returning we block waiting for the
rest of data. There is a race, when data could arrive while the
lock was released and then the connection stalls in sbwait.

Fix this by checking for data before blocking and skip blocking
if there are some.

PR:		kern/154504
Reported by:	Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
Tested by:	Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
Reviewed by:	rwatson
Approved by:	kib (co-mentor)
MFC after:	2 weeks
2011-05-29 18:00:50 +00:00
bz
b9b7d3e93a Mfp4 CH=177274,177280,177284-177285,177297,177324-177325
VNET socket push back:
  try to minimize the number of places where we have to switch vnets
  and narrow down the time we stay switched.  Add assertions to the
  socket code to catch possibly unset vnets as seen in r204147.

  While this reduces the number of vnet recursion in some places like
  NFS, POSIX local sockets and some netgraph, .. recursions are
  impossible to fix.

  The current expectations are documented at the beginning of
  uipc_socket.c along with the other information there.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:  jhb
  Tested by:    zec

Tested by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	2 weeks
2011-02-16 21:29:13 +00:00
deischen
cc0a83ec21 Allow the SO_SETFIB socket option to select the default (0)
routing table.

Reviewed by:	julian
2011-02-13 00:14:13 +00:00
bz
aad842be85 Mfp4 CH=177255:
Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS.

  Change the syntax to match KASSERT() to allow more flexible panic
  messages rather than having a printf with hardcoded arguments
  before panic.

  Adjust the few assertions we have to the new format (and enhance
  the output).

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:	jhb

MFC after:	2 weeks
2011-02-11 13:27:00 +00:00
luigi
d5e8d236f4 This commit implements the SO_USER_COOKIE socket option, which lets
you tag a socket with an uint32_t value. The cookie can then be
used by the kernel for various purposes, e.g. setting the skipto
rule or pipe number in ipfw (this is the reason SO_USER_COOKIE has
been implemented; however there is nothing ipfw-specific in its
implementation).

The ipfw-related code that uses the optopn will be committed separately.

This change adds a field to 'struct socket', but the struct is not
part of any driver or userland-visible ABI so the change should be
harmless.

See the discussion at
http://lists.freebsd.org/pipermail/freebsd-ipfw/2009-October/004001.html

Idea and code from Paul Joe, small modifications and manpage
changes by myself.

Submitted by:	Paul Joe
MFC after:	1 week
2010-11-12 13:02:26 +00:00
rwatson
b9d3291981 With reworking of the socket life cycle in 7.x, the need for a "sotryfree()"
was eliminated: all references to sockets are explicitly managed by sorele()
and the protocols.  As such, garbage collect sotryfree(), and update
sofree() comments to make the new world order more clear.

MFC after:	3 days
Reported by:	Anuranjan Shukla <anshukla at juniper dot net>
2010-09-18 11:18:42 +00:00
tuexen
542f657a7f Fix a bug where MSG_TRUNC was not returned in all necessary cases for
SOCK_DGRAM socket. MSG_TRUNC was only returned when some mbufs could
not be copied to the application. If some data was left in the last
mbuf, it was correctly discarded, but MSG_TRUNC was not set.

Reviewed by: bz
MFC after: 3 weeks
2010-08-07 17:57:58 +00:00
rwatson
c7e8976175 When close() is called on a connected socket pair, SO_ISCONNECTED might be
set but be cleared before the call to sodisconnect().  In this case,
ENOTCONN is returned: suppress this error rather than returning it to
userspace so that close() doesn't report an error improperly.

PR:		kern/144061
Reported by:	Matt Reimer <mreimer at vpop.net>,
		Nikolay Denev <ndenev at gmail.com>,
		Mikolaj Golub <to.my.trociny at gmail.com>
MFC after:	3 days
2010-05-27 15:27:31 +00:00
nwhitehorn
142a4d2993 Provide groundwork for 32-bit binary compatibility on non-x86 platforms,
for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32
option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts
of the kernel and enhances the freebsd32 compatibility code to support
big-endian platforms.

Reviewed by:	kib, jhb
2010-03-11 14:49:06 +00:00
bz
7772630fb3 Set curvnet earlier so that it also covers calls to sodisconnect(), which
before were possibly panicing the system in ULP code in the VIMAGE case.

Submitted by:	Igor (igor ispsystem.com)
MFC after:	5 days
2010-02-20 22:29:28 +00:00
rwatson
86b3bcad7d Don't comment on stream socket handling in sosend_dgram, since that's
not handled.

MFC after:	3 weeks
2009-10-02 21:31:15 +00:00
andre
3e3fe69f6b -Put the optimized soreceive_stream() under a compile time option called
TCP_SORECEIVE_STREAM for the time being.

Requested by:	brooks

Once compiled in make it easily switchable for testers by using a tuneable
 net.inet.tcp.soreceive_stream
and a corresponding read-only sysctl to report the current state.

Suggested by:	rwatson

MFC after:	2 days
-This line, and those below, will be ignored--
> Description of fields to fill in above:                     76 columns --|
> PR:            If a GNATS PR is affected by the change.
> Submitted by:  If someone else sent in the change.
> Reviewed by:   If someone else reviewed your modification.
> Approved by:   If you needed approval for this commit.
> Obtained from: If the change is from a third party.
> MFC after:     N [day[s]|week[s]|month[s]].  Request a reminder email.
> Security:      Vulnerability reference (one per line) or description.
> Empty fields above will be automatically removed.

M    sys/conf/options
M    sys/kern/uipc_socket.c
M    sys/netinet/tcp_subr.c
M    sys/netinet/tcp_usrreq.c
2009-09-15 22:23:45 +00:00
rwatson
53eaed07bc Use C99 initialization for struct filterops.
Obtained from:	Mac OS X
Sponsored by:	Apple Inc.
MFC after:	3 weeks
2009-09-12 20:03:45 +00:00
jilles
0b9c3c2f3d Fix poll() on half-closed sockets, while retaining POLLHUP for fifos.
This reverts part of r196460, so that sockets only return POLLHUP if both
directions are closed/error. Fifos get POLLHUP by closing the unused
direction immediately after creating the sockets.

The tools/regression/poll/*poll.c tests now pass except for two other things:
- if POLLHUP is returned, POLLIN is always returned as well instead of only
  when there is data left in the buffer to be read
- fifo old/new reader distinction does not work the way POSIX specs it

Reviewed by:	kib, bde
2009-08-25 21:44:14 +00:00
kib
8b1803af93 Fix the conformance of poll(2) for sockets after r195423 by
returning POLLHUP instead of POLLIN for several cases. Now, the
tools/regression/poll results for FreeBSD are closer to that of the
Solaris and Linux.

Also, improve the POSIX conformance by explicitely clearing POLLOUT
when POLLHUP is reported in pollscan(), making the fix global.

Submitted by:	bde
Reviewed by:	rwatson
MFC after:	1 week
2009-08-23 12:44:15 +00:00
rwatson
fb9ffed650 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
julian
f22b416ddb Somewhere along the line accept sockets stopped honoring the
FIB selected for them. Fix this.

Reviewed by:	ambrisko
Approved by:	re (kib)
MFC after:	3 days
2009-07-28 19:43:27 +00:00
rwatson
80ed051e0c Normalize field naming for struct vnet, fix two debugging printfs that
print them.

Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-19 17:40:45 +00:00
kib
6c6bda868d Fix poll(2) and select(2) for named pipes to return "ready for read"
when all writers, observed by reader, exited. Use writer generation
counter for fifo, and store the snapshot of the fifo generation in the
f_seqcount field of struct file, that is otherwise unused for fifos.
Set FreeBSD-undocumented POLLINIGNEOF flag only when file f_seqcount is
equal to fifo' fi_wgen, and revert r89376.

Fix POLLINIGNEOF for sockets and pipes, and return POLLHUP for them.
Note that the patch does not fix not returning POLLHUP for fifos.

PR:	kern/94772
Submitted by:	bde (original version)
Reviewed by:	rwatson, jilles
Approved by:	re (kensmith)
MFC after:	6 weeks (might be)
2009-07-07 09:43:44 +00:00
andre
e66ed06df4 Add soreceive_stream(), an optimized version of soreceive() for
stream (TCP) sockets.

It is functionally identical to generic soreceive() but has a
number stream specific optimizations:
o does only one sockbuf unlock/lock per receive independent of
  the length of data to be moved into the uio compared to
  soreceive() which unlocks/locks per *mbuf*.
o uses m_mbuftouio() instead of its own copy(out) variant.
o much more compact code flow as a large number of special
  cases is removed.
o much improved reability.

It offers significantly reduced CPU usage and lock contention
when receiving fast TCP streams.  Additional gains are obtained
when the receiving application is using SO_RCVLOWAT to batch up
some data before a read (and wakeup) is done.

This function was written by "reverse engineering" and is not
just a stripped down variant of soreceive().

It is not yet enabled by default on TCP sockets.  Instead it is
commented out in the protocol initialization in tcp_usrreq.c
until more widespread testing has been done.

Testers, especially with 10GigE gear, are welcome.

MFP4:	r164817 //depot/user/andre/soreceive_stream/
2009-06-22 23:08:05 +00:00
jamie
f950eed7d7 Get vnets from creds instead of threads where they're available, and from
passed threads instead of curthread.

Reviewed by:	zec, julian
Approved by:	bz (mentor)
2009-06-15 19:01:53 +00:00
kib
e1cb2941d4 Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by:	jhb [1]
Noted by:	jhb [2]
Reviewed by:	jhb
Tested by:	rnoland
2009-06-10 20:59:32 +00:00
rwatson
f4934662e5 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
rwatson
0f9e858440 Add internal 'mac_policy_count' counter to the MAC Framework, which is a
count of the number of registered policies.

Rather than unconditionally locking sockets before passing them into MAC,
lock them in the MAC entry points only if mac_policy_count is non-zero.

This avoids locking overhead for a number of socket system calls when no
policies are registered, eliminating measurable overhead for the MAC
Framework for the socket subsystem when there are no active policies.

Possibly socket locks should be acquired by policies if they are required
for socket labels, which would further avoid locking overhead when there
are policies but they don't require labeling of sockets, or possibly
don't even implement socket controls.

Obtained from:	TrustedBSD Project
2009-06-02 18:26:17 +00:00
jhb
a1af9ecca4 Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
  locked.  It is not permissible to call soisconnected() with this lock
  held; however, so socket upcalls now return an integer value.  The two
  possible values are SU_OK and SU_ISCONNECTED.  If an upcall returns
  SU_ISCONNECTED, then the soisconnected() will be invoked on the
  socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls.  The
  API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
  the receive socket buffer automatically.  Note that a SO_SND upcall
  should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
  instead of calling soisconnected() directly.  They also no longer need
  to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
  state machine, but other accept filters no longer have any explicit
  knowlege of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
  while invoking soreceive() as a temporary band-aid.  The plan for
  the future is to add a new flag to allow soreceive() to be called with
  the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
  buffer locked.  Previously sowakeup() would drop the socket buffer
  lock only to call aio_swake() which immediately re-acquired the socket
  buffer lock for the duration of the function call.

Discussed with:	rwatson, rmacklem
2009-06-01 21:17:03 +00:00
zec
b31e199a10 A NOP change: style / whitespace cleanup of the noise that slipped
into r191816.

Spotted by:	bz
Approved by:	julian (mentor) (an earlier version of the diff)
2009-05-08 14:34:25 +00:00
zec
d78a1b1a82 Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
zec
39b6dc8ba2 Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)
2009-04-30 13:36:26 +00:00