Commit Graph

1190 Commits

Author SHA1 Message Date
Max Khon
707d205808 Constify "address" argument of ng_address_path(). 2011-11-06 05:20:27 +00:00
Gleb Smirnoff
e5fe87b387 - If KDB & NETGRAPH_DEBUG are on, print traces on discovered failed
invariants.
- Reduce tautology in NETGRAPH_DEBUG output.
2011-10-27 09:43:25 +00:00
Alexander V. Chernikov
7aabe9d9e0 Free mbuf in case when protocol in unknown in ng_ipfw_rcvdata().
This change fixes (theoretically) possible mbuf leak introduced in
r225586. Reorder code a bit and change return codes to be more specific

Reviewed by:	glebius
Approved by:    kib (mentor)
2011-10-10 09:33:07 +00:00
Andrey V. Elsukov
f2a66f8e17 Add IPv6 support to the ng_ipfw(4) [1]. Also add ifdefs to be able
build it with and without INET/INET6 support.

Submitted by:	Alexander V. Chernikov <melifaro at yandex-team.ru> [1]
Tested by:	Alexander V. Chernikov <melifaro at yandex-team.ru> [1]
Approved by:	re (bz)
MFC after:	2 weeks
2011-09-15 12:28:17 +00:00
Robert Watson
a9d2f8d84f Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *.  With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by:	re (bz)
Submitted by:	jonathan
Sponsored by:	Google Inc
2011-08-11 12:30:23 +00:00
Andriy Gapon
7a0b13ed28 remove RESTARTABLE_PANICS option
This is done per request/suggestion from John Baldwin
who introduced the option.  Trying to resume normal
system operation after a panic is very unpredictable
and dangerous.  It will become even more dangerous
when we allow a thread in panic(9) to penetrate all
lock contexts.
I understand that the only purpose of this option was
for testing scenarios potentially resulting in panic.

Suggested by:	jhb
Reviewed by:	attilio, jhb
X-MFC-After:	never
Approved by:	re (kib)
2011-07-25 09:12:48 +00:00
Marko Zec
2cdf8c49a6 Clear pending ifnet events, in an attempt at preventing
ng_ether_link_state() from being dispatched after we have
cleared our IFP2NG(ifp).

MFC after:	3 days
2011-07-16 19:11:45 +00:00
Gleb Smirnoff
0a7d45d349 In ng_attach_cntl() first allocate things that may fail, and then
do the rest of initialization. This simplifies code and fixes
a double free in failure scenario.

Reviewed by:	bz
2011-07-14 18:38:10 +00:00
Gleb Smirnoff
acfc07098c Add missing unlocks. 2011-07-06 09:43:25 +00:00
Gleb Smirnoff
ea7e163882 o Eliminate flow6_hash_entry in favor of flow_hash_entry. We don't need
a separate struct to start a slist of semi-opaque structs. This
  makes some code more compact.
o Rewrite ng_netflow_flow_show() and its API/ABI:
  - Support for IPv6 is added.
  - Request and response now use same struct. Structure specifies
    version (6 or 4), index of last retrieved hash, and also index
    of last retrieved entry in the hash entry.
2011-07-05 14:48:39 +00:00
Gleb Smirnoff
d33dc2fa5c Fix build with NETGRAPH_DEBUG. 2011-07-04 20:50:09 +00:00
Gleb Smirnoff
f8dd68c912 Fix build with NETGRAPH_DEBUG. 2011-07-04 13:55:55 +00:00
Gleb Smirnoff
3fbdf77459 - Use refcount(9) API to manage node and hook refcounting.
- Make ng_unref_node() void, since caller shouldn't be
  interested in whether node is valid after call or not,
  since it can't be guaranteed to be valid. [1]

Ok from:	julian [1]
2011-07-04 07:03:44 +00:00
Bjoern A. Zeeb
a34c6aeb85 Tag mbufs of all incoming frames or packets with the interface's FIB
setting (either default or if supported as set by SIOCSIFFIB, e.g.
from ifconfig).

Submitted by:	Alexander V. Chernikov (melifaro ipfw.ru)
Reviewed by:	julian
MFC after:	2 weeks
2011-07-03 16:08:38 +00:00
Gleb Smirnoff
9b2139a27e Fix double free.
Submitted by:	Alexander V. Chernikov <melifaro ipfw.ru>
2011-07-01 08:27:03 +00:00
Hans Petter Selasky
f1a16106b6 - Move all USB device ID arrays into so-called sections,
sorted according to the mode which they support:
	host, device or dual mode
- Add generic tool to extract these data:
	tools/bus_autoconf

Discussed with:	imp
Suggested by:	Robert Millan <rmh@debian.org>
PR:		misc/157903
MFC after:	14 days
2011-06-24 02:30:02 +00:00
Gleb Smirnoff
cd5bdbcb4d Be consistent with r160968: keep autoSrcAddr flag untouched when
node receives NGM_SHUTDOWN.

Submitted by:	pluknet
2011-06-23 09:42:41 +00:00
Andrey V. Elsukov
c57e67d04e Sync ng_nat with recent (r222806) ipfw_nat changes:
Make a behaviour of the libalias based in-kernel NAT a bit closer to
  how natd(8) does work. natd(8) drops packets only when libalias returns
  PKT_ALIAS_IGNORED and "deny_incoming" option is set, but ipfw_nat
  always did drop packets that were not aliased, even if they should
  not be aliased and just are going through.

Also add SCTP support: mark response packets to skip firewall processing.

MFC after:	1 month
2011-06-07 06:48:42 +00:00
Marko Zec
2956ec9bc7 Assume the link to be dead if bit error rate (BER) parameter is set to 1.
When a transition from link alive to link dead configuration or vice
versa occurs, notify any upstream and / or downstream peers using
NGM_FLOW messagges.

Link state notification using NGM_FLOW messages is modelled around
around already existing code in ng_ether.c.

MFC after:	3 days
2011-05-24 14:36:32 +00:00
Marko Zec
7d5ddd30cd Provide fake link status information in an attempt to let ng_eiface(4)
virtual ifnets more realistically mimic physical ethernet interfaces.
The main motivation behind this change is to allow for ng_eiface(4)
interfaces to participate in STP if_bridge(4) configurations.

When announcing link status changes, switch to the vnet to which the
ifnet belongs, since it is possible for ng_eiface ifnets to be assigned
to a vnet different from the one in which its netgraph node resides.

MFC after:	3 days
2011-05-24 14:10:33 +00:00
Andriy Gapon
a5db8fd19e usb: fix a missed use of use_generic in r222051
Submitted by:	gcooper
Pointyhat to:	avg
MFC after:	1 month
X-MFC with:	r222051
2011-05-18 11:38:36 +00:00
Gleb Smirnoff
ca47294ddf LibAliasInit() should allocate memory with M_WAITOK flag. Modify it
and its callers.
2011-04-18 20:07:08 +00:00
Gleb Smirnoff
e0c7bc79c4 Finish last change.
Pointy hat to: glebius
2011-04-18 14:07:01 +00:00
Gleb Smirnoff
c1d21557c5 Further cleanup of node creation path from M_NOWAIT usage. 2011-04-18 14:05:26 +00:00
Gleb Smirnoff
b6770143c4 ng_netflow_cache_init() can be void. 2011-04-18 09:14:23 +00:00
Gleb Smirnoff
674d86bf91 Node constructor methods are supposed to be called in syscall
context always. Convert nodes to consistently use M_WAITOK flag
for memory allocation.

Reviewed by:	julian
2011-04-18 09:12:27 +00:00
Andrey V. Elsukov
ffbfc0aacb Use M_WAITOK flag instead M_WAIT for malloc.
Suggested by:	glebius
MFC after:	1 week
2011-04-18 09:10:27 +00:00
Gleb Smirnoff
5633ca7116 Fix error where error variable was assigned result of comparison,
instead of function return value.

Submitted by:	Przemyslaw Frasunek <przemyslaw frasunek.com>
MFC after:	4 days
2011-04-17 16:31:21 +00:00
Marko Zec
fae147aab3 Properly unref ng_hub nodes on shutdown, so that we don't leak them.
MFC after:	3 days
2011-04-07 11:40:10 +00:00
Gleb Smirnoff
a7da736a64 Improve locking of creating and dropping links in the graph, acquiring
the topology mutex in the following functions, that manipulate pointers
to peer nodes:

- ng_bypass()
- ng_path2noderef() when switching to the next node in sequence.
  Rewrite the function a bit.
- ng_address_hook()
- ng_address_path()

This patch improves stability of large mpd5 installations.
2011-03-21 14:18:40 +00:00
Gleb Smirnoff
ce4b2e2c63 Remove spl(9) remnants. 2011-03-19 19:37:53 +00:00
Bjoern A. Zeeb
3090c02041 Unbreak the build for no options INET6.
PR:		kern/155227
Submitted by:	Dmitry Afanasiev (KOT MATPOCKuH.Ru)
2011-03-03 16:16:49 +00:00
Gleb Smirnoff
5dcd9c1061 Add support for NetFlow version 9 into ng_netflow(4) node.
Submitted by:	Alexander V. Chernikov <melifaro ipfw.ru>
2011-03-02 16:15:11 +00:00
Andrey V. Elsukov
633c5bdac8 Add XMIT_FAILOVER transmit algorithm to ng_one2many node. Packets are
delivered out the first active "many" hook.

PR:		kern/137775
Submitted by:	Maxim Ignatenko
MFC after:	2 weeks
2011-03-01 13:10:56 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Bjoern A. Zeeb
1fb51a12f2 Mfp4 CH=177274,177280,177284-177285,177297,177324-177325
VNET socket push back:
  try to minimize the number of places where we have to switch vnets
  and narrow down the time we stay switched.  Add assertions to the
  socket code to catch possibly unset vnets as seen in r204147.

  While this reduces the number of vnet recursion in some places like
  NFS, POSIX local sockets and some netgraph, .. recursions are
  impossible to fix.

  The current expectations are documented at the beginning of
  uipc_socket.c along with the other information there.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:  jhb
  Tested by:    zec

Tested by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	2 weeks
2011-02-16 21:29:13 +00:00
Matthew D Fleming
f29fc08590 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the netgraph piece.
2011-01-12 19:53:39 +00:00
John Baldwin
58ccf5b41c Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by:	bde
2011-01-11 13:59:06 +00:00
Marko Zec
57ce8ebf8c Simplify ng_pipe locking model by relying on the netgraph framework
to provide serialization of calls into the node, which is accomplished
by markng the node as single-threaded (NGF_FORCE_WRITER).

The price we pay is that each ng_pipe instance now has its own callout
handler which polls for queued frames on each clock tick, as long as
the pipe has any frames in its internal queues.  OTOH, we got rid of
the global ng_pipe mutex, so from now on multiple ng_pipe instances
can operate in parallel.  This change also fixes counting of forwarded
frames when an ng_pipe node is not enforcing any packet impairments.

While here, attempt to improve adherance to style(9) throughout
otherwise mostly unreadable code.

MFC after:	3 days
2010-11-24 16:02:58 +00:00
Dimitry Andric
3e288e6238 After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
Marko Zec
abe80e1272 Allow for MTU sizes of up to ETHER_MAX_LEN_JUMBO (i.e. 9018) bytes to be
configured on ng_eiface ifnets.  The default MTU remains unchanged at
1500 bytes.

Mark ng_eiface ifnets as IFCAP_VLAN_MTU capable, so that the associated
vlan(4) ifnets may use full-sized Ethernet MTUs (1500 bytes).

MFC after:	3 days
2010-11-22 12:32:19 +00:00
Dimitry Andric
31c6a0037e Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
Rui Paulo
d28843a449 When calling panic(), always pass a format string. 2010-10-13 17:21:21 +00:00
Maksim Yevmenkin
fae1602944 Fix typo
PR:	kern/140590
MFC after:	3 days
2010-08-02 22:26:08 +00:00
Gleb Smirnoff
b9bff254af Fix operation of "netgraph" action in conjunction with the
net.inet.ip.fw.one_pass sysctl.

The "ngtee" action is still broken.

PR:		kern/148885
Submitted by:	Nickolay Dudorov <nnd mail.nsk.ru>
2010-07-27 14:26:34 +00:00
Gleb Smirnoff
9da8211828 Zero padding fields of netflow records. This helps to reduce
size of compressed export logs.

Requested by:	Alexey Illarionov <littlesavage orionet.ru>
2010-07-26 13:48:35 +00:00
Ed Maste
67d187beb1 Remove defunct email address from header as well. 2010-07-06 16:55:39 +00:00
Ed Maste
d11f4f5dea Remove email address that no longer exists. 2010-07-06 16:42:11 +00:00
Marko Zec
c9840b23d5 Fix a double-free bug which can occur if both bit error rate and packet
duplication probability are configured on a ng_pipe node.

Submitted by:	Jeffrey Ahrenholtz
MFC after:	3 days
2010-07-06 12:13:15 +00:00
Gleb Smirnoff
7418c6e1a9 Avoid double-free. In error cases ipfw(4) frees the mbuf(4), we don't
need to.

PR:		kern/145462
2010-07-06 10:45:38 +00:00
Gleb Smirnoff
956000b838 The struct ipfw_rule_ref follows the struct m_tag. Deal with this
correctly. This fixes breakage of ng_ipfw(4) in r201527.

Submitted by:	Alexander Zagrebin <alexz visp.ru>
2010-07-01 17:46:12 +00:00
Andrey V. Elsukov
cac2fe695e * Include sys/systm.h for KASSERT()
* Remove unneeded includes and comment
* Replace home made OFFSETOF() macro with standard offsetof()

Pointed out by:	bde
Approved by:	kib (mentor)
2010-06-15 08:53:13 +00:00
Andrey V. Elsukov
6c0a2eb136 Style(9) fixes:
* Sort includes
* Replace #define<SPACE> to #define<TAB>
* Split declarations and initializations
* Split long lines

Requested by:	kib
Approved by:	kib (mentor)
MFC after:	1 month
2010-06-10 16:45:30 +00:00
Andrey V. Elsukov
d359a62d44 New netgraph node ng_patch(4). It performs data modification of packets
passing through. Modifications are restricted to a subset of C language
operations on unsigned integers of 8, 16, 32 or 64 bit size.
These are: set to new value (=), addition (+=), subtraction (-=),
multiplication (*=), division (/=), negation (= -), bitwise AND (&=),
bitwise OR (|=), bitwise eXclusive OR (^=), shift left (<<=),
shift right (>>=). Several operations are all applied to a packet
sequentially in order they were specified by user.

Submitted by:	Maxim Ignatenko <gelraen.ua at gmail.com>
		Vadim Goncharov <vadimnuclight at tpu.ru>
Discussed with:	net@
Approved by:	mav (mentor)
MFC after:	1 month
2010-06-09 12:25:57 +00:00
Alexander Motin
5a73d193c4 Remove some dead and incorrect code.
Found with:   Coverity Prevent(tm)
CID:          4562
2010-06-05 10:16:23 +00:00
Attilio Rao
b1b11ad27e Fix a race between ngs_rcvmsg() and soclose() which closes the control
socket while it is still in use.
priv->ctlsock is checked at the top of the function but without any
lock held, which means the control socket state may certainly change.
Add a similar protection to ngs_shutdown() even if a race is unlikely
to be experienced there.

Sponsored by:	Sandvine Incorporated
Obtained from:	Nima Misaghian @ Sandvine Incorporated
		<nmisaghian at sandvine dot com>
MFC after:	10 days
2010-05-19 15:06:09 +00:00
Marko Zec
98a5a343e3 Increase the target buffer for performing NGM_ASCII2BINARY conversion
from 2000 bytes to 20 Kbytes, which now matches the buffer size used for
NGM_BINARY2ASCII conversions.

The aim of this change is to allow for bigger binary structures to be
managed via netgraph ASCII messages, until we come up with an API
improvement which would get rid of such arbitrary hardcoded limits.

MFC after:	3 days
2010-05-13 16:48:28 +00:00
Fabien Thomas
f9e4dd7122 Fix an invalid parameter detected by INVARIANT and confirmed by r193272. 2010-05-06 20:58:23 +00:00
Marko Zec
f8aab721b2 Add an optional "persistent" flag to ng_hub and ng_bridge, which if set,
disables automatic node shutdown when the last hook gets disconnected.

Reviewed by:	julian
2010-05-05 22:06:05 +00:00
Marko Zec
a3f93b7269 When destroying a vnet, shut down all netgraph nodes tied to that vnet
before proceeding with dismantling other protocol domains.

This change only affects options VIMAGE builds.

Reviewed by:	julian, bz
MFC after:	3 days
2010-05-03 16:08:24 +00:00
Maxim Sobolev
e50d35e6c6 Add new tunable 'net.link.ifqmaxlen' to set default send interface
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.

MFC after:	1 month
2010-05-03 07:32:50 +00:00
Edward Tomasz Napierala
8016863c91 Avoid undefined behaviour.
Reviewed by:	zec@
2010-04-30 07:09:13 +00:00
Joel Dahl
c0587701ad Start copyright notice with /*- 2010-04-07 16:29:10 +00:00
Alexander Motin
38f2d636ca Remove alignment constraints. 2010-04-01 16:20:36 +00:00
Alexander Motin
b652a5fa66 Remove alignment constraints. 2010-04-01 16:18:16 +00:00
Alexander Motin
7937d24b12 Remove alignment constraints. 2010-04-01 10:41:01 +00:00
Alexander Motin
39228864fd Remove some more alignment constraints. 2010-03-31 22:47:55 +00:00
Alexander Motin
5c100aeaad Make ng_ksocket fulfill lower protocol stack layers alignment requirements
on platforms with strict alignment constraints.
This fixes kernel panics on arm and probably other architectures.

PR:		sparc64/80410
2010-03-31 22:16:05 +00:00
Alexander Motin
148ac1dacc Make ng_l2tp irrelevant to data alignment. 2010-03-31 22:11:06 +00:00
Alexander Motin
d6b013b537 Make ng_ppp fulfill upper protocol stack layers alignment requirements
on platforms with strict alignment constraints.
This fixes kernel panics on arm and probably other architectures.

PR:		sparc64/80410
2010-03-31 20:37:44 +00:00
Gleb Smirnoff
cecdd23f87 Remove disabled code. In 99% cases exports are send to ng_ksocket(4), which
already forces queued mode, so what was suggested in disabled code is already
done.
2010-03-25 10:13:21 +00:00
Gleb Smirnoff
c1b90938b1 Now fix functionality of 'netstat -f netgraph' that hasn't worked
starting from netgraph import in 1999.

netstat(8) used pointer to node as node address, oops. That didn't
work, we need the node ID in brackets to successfully address a node.
We can't look into ng_node, due to inability to include netgraph/netgraph.h
in userland code. So let the node make a hint for a userland, storing
the node ID in its private data.

MFC after:	2 weeks
2010-03-12 15:04:59 +00:00
Gleb Smirnoff
60a25b4ba1 Fix 'netstat -f netgraph', which I had broken in r163463 ling time
ago in 2006. This linked list is actually needed for userland.

PR:		kern/140446
Submitted by:	Adrian Steinmann <ast marabu.ch>
2010-03-12 14:51:42 +00:00
Andrew Thompson
ea4ca115b7 Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR:		kern/142927
Submitted by:	Nikolay Denev
2010-01-18 20:34:00 +00:00
Max Khon
bb027e0647 Send link state change control messages to "orphans" hook as well.
MFC after:	1 week
2010-01-09 19:03:48 +00:00
Luigi Rizzo
1aa0091897 ip_var.h now needs to be before ip_fw_private.h 2010-01-07 14:23:19 +00:00
Luigi Rizzo
7173b6e554 Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
  the firewall in the middle of a rulechain. On reentry, all tags
  containing reinject info are renamed to MTAG_IPFW_RULE so the
  processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
  everywhere. Conversion is done only once instead of tracking
  the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).
2010-01-04 19:01:22 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Luigi Rizzo
e59084e086 bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects
to find it there. Unfortunately this reintroduces the dependency
on ip_fw_pfil.c
2009-12-28 12:29:13 +00:00
Luigi Rizzo
830c6e2b97 bring in several cleanups tested in ipfw3-head branch, namely:
r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
  ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
  reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
  and netgraph;

r201049
- merge some common code to attach/detach hooks into
  a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
  and output processing uses almost exactly the same code so
  there is no need to use two separate hooks.
  ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
  between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
  localize variables so the compiler does not think they are uninitialized,
  do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
  this is what the radix code expects (this fixes a bug in the
  recently-introduced 'lookup' option)

No ABI changes in this commit.

MFC after:	1 week
2009-12-28 10:47:04 +00:00
Luigi Rizzo
de240d1013 merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

 1. introduce a IPFW_UH_LOCK to arbitrate requests from
     the upper half of the kernel. Some things, such as 'ipfw show',
     can be done holding this lock in read mode, whereas insert and
     delete require IPFW_UH_WLOCK.

  2. introduce a mapping structure to keep rules together. This replaces
     the 'next' chain currently used in ipfw rules. At the moment
     the map is a simple array (sorted by rule number and then rule_id),
     so we can find a rule quickly instead of having to scan the list.
     This reduces many expensive lookups from O(N) to O(log N).

  3. when an expensive operation (such as insert or delete) is done
     by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
     without blocking the bottom half of the kernel, then acquire
     IPFW_WLOCK and quickly update pointers to the map and related info.
     After dropping IPFW_LOCK we can then continue the cleanup protected
     by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
     is only blocked for O(1).

  4. do not pass pointers to rules through dummynet, netgraph, divert etc,
     but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
     We validate the slot index (in the array of #2) with chain_id,
     and if successful do a O(1) dereference; otherwise, we can find
     the rule in O(log N) through <rulenum, rule_id>

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

  Function				Old	Now	  Planned
-------------------------------------------------------------------
  + skipto X, non cached		O(N)	O(log N)
  + skipto X, cached			O(1)	O(1)
XXX dynamic rule lookup			O(1)	O(log N)  O(1)
  + skipto tablearg			O(N)	O(1)
  + reinject, non cached		O(N)	O(log N)
  + reinject, cached			O(1)	O(1)
  + kernel blocked during setsockopt()	O(N)	O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI

Supported by: Valeria Paoli
MFC after:	1 month
2009-12-22 19:01:47 +00:00
Luigi Rizzo
5f2e16424c add ip_fw_private.h to ng_ipfw.c, forgotten in previous commit;
comment out remove ip_fw.h from ng_bridge.c, as it seems unused.

MFC after:	1 month
2009-12-15 18:33:12 +00:00
John Baldwin
e1b17582f4 Take a step towards removing if_watchdog/if_timer. Don't explicitly set
if_watchdog/if_timer to NULL/0 when initializing an ifnet.  if_alloc()
sets those members to NULL/0 already.
2009-11-06 14:55:01 +00:00
Ruslan Ermilov
90147b7506 Spell DIAGNOSTIC correctly. 2009-10-24 18:49:17 +00:00
Julian Elischer
0b4b0b0fee Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after:	2 months
2009-10-11 05:59:43 +00:00
Maksim Yevmenkin
06acf4bc6f Get those pesky RFCOMM RPM data bits right. This is likely a noop.
MFC after:	1 month
2009-09-10 23:30:13 +00:00
Robert Watson
77dfcdc445 Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock.  Either can be held to stablize the lists and indexes, but both
are required to write.  This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions.  As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by:	bz, julian
MFC after:	3 days
2009-08-23 20:40:19 +00:00
Robert Watson
530c006014 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
Robert Watson
d0728d7174 Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
  occur each time a network stack is instantiated and destroyed.  In the
  !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
  For the VIMAGE case, we instead use SYSINIT's to track their order and
  properties on registration, using them for each vnet when created/
  destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
  previously, as well as its dependency scheme: we now just use the
  SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
  they want init functions to be called for each virtual network stack
  rather than just once at boot, compiling down to DOMAIN_SET() in the
  non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
  of modinfo or DOMAIN_SET() for init/uninit events.  In some cases,
  convert modular components from using modevent to using sysinit (where
  appropriate).  In some cases, do minor rejuggling of SYSINIT ordering
  to make room for or better manage events.

Portions submitted by:	jhb (VNET_SYSINIT), bz (cleanup)
Discussed with:		jhb, bz, julian, zec
Reviewed by:		bz
Approved by:		re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00
Robert Watson
5ee847d3ac Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use).  Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions.  Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by:	bz
Approved by:	re (kib)
2009-07-19 14:20:53 +00:00
Robert Watson
1e77c1056a Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used.  Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with:	bz, julian
Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-16 21:13:04 +00:00
Robert Watson
eddfbb763d Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
Alexander Motin
505feb8f37 Fix infinite loop in ng_iface, that happens when packet passes out via
two different ng interfaces sequentially due to tunnelling.

PR:		kern/134557
Submitted by:	Mikolaj Golub
Approved by:	re (kensmith)
MFC after:	3 days
2009-07-01 08:08:56 +00:00
Stanislav Sedov
fe1d3f15f6 - Turn the third (islocked) argument of the knote call into flags parameter.
Introduce the new flag KNF_NOKQLOCK to allow event callers to be called
  without KQ_LOCK mtx held.
- Modify VFS knote calls to always use KNF_NOKQLOCK flag.  This is required
  for ZFS as its getattr implementation may sleep.

Approved by:	re (rwatson)
Reviewed by:	kib
MFC after:	2 weeks
2009-06-28 21:49:43 +00:00
Robert Watson
eb956cd041 Use if_maddr_rlock()/if_maddr_runlock() rather than IF_ADDR_LOCK()/
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs.  This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.

For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.

Approved by:	re (kib)
MFC after:	6 weeks
2009-06-26 11:45:06 +00:00
Robert Watson
c4c96d5ea5 Update Netgraph nodes to use if_addr_rlock()/if_addr_runlock() instead
of IF_ADDR_LOCK()/IF_ADDR_UNLOCK() when iterating ifp->if_addrhead.

MFC after:	6 weeks
2009-06-26 00:49:12 +00:00
Roman Divacky
8419ef9a1e Use proper form of gnu designated initalizers. This lets
clang compile this files.

Approved by: ed (mentor)
Silence from: harti (maintainer?)
2009-06-24 12:01:10 +00:00
Bjoern A. Zeeb
5736e6fb9d After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.
2009-06-23 17:03:45 +00:00
Alexander Motin
c1b8a9edab Mark ng_ether node hooks as HI_STACK. It is usually the last point when
netgraph may unroll the call stack, and I have found that in some cases 2K
guarantied there for i386 may be not enough for NIC driver and BPF.
2009-06-23 12:30:21 +00:00
Andrew Thompson
8f9e0ef947 Fix a typeo in the frame len function to unbreak the build, make it shorter
while I am here.
2009-06-23 06:00:31 +00:00