VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.
While this reduces the number of vnet recursion in some places like
NFS, POSIX local sockets and some netgraph, .. recursions are
impossible to fix.
The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec
Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 2 weeks
to provide serialization of calls into the node, which is accomplished
by markng the node as single-threaded (NGF_FORCE_WRITER).
The price we pay is that each ng_pipe instance now has its own callout
handler which polls for queued frames on each clock tick, as long as
the pipe has any frames in its internal queues. OTOH, we got rid of
the global ng_pipe mutex, so from now on multiple ng_pipe instances
can operate in parallel. This change also fixes counting of forwarded
frames when an ng_pipe node is not enforcing any packet impairments.
While here, attempt to improve adherance to style(9) throughout
otherwise mostly unreadable code.
MFC after: 3 days
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.
Changes reverted:
------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines
Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.
------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines
Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
configured on ng_eiface ifnets. The default MTU remains unchanged at
1500 bytes.
Mark ng_eiface ifnets as IFCAP_VLAN_MTU capable, so that the associated
vlan(4) ifnets may use full-sized Ethernet MTUs (1500 bytes).
MFC after: 3 days
passing through. Modifications are restricted to a subset of C language
operations on unsigned integers of 8, 16, 32 or 64 bit size.
These are: set to new value (=), addition (+=), subtraction (-=),
multiplication (*=), division (/=), negation (= -), bitwise AND (&=),
bitwise OR (|=), bitwise eXclusive OR (^=), shift left (<<=),
shift right (>>=). Several operations are all applied to a packet
sequentially in order they were specified by user.
Submitted by: Maxim Ignatenko <gelraen.ua at gmail.com>
Vadim Goncharov <vadimnuclight at tpu.ru>
Discussed with: net@
Approved by: mav (mentor)
MFC after: 1 month
socket while it is still in use.
priv->ctlsock is checked at the top of the function but without any
lock held, which means the control socket state may certainly change.
Add a similar protection to ngs_shutdown() even if a race is unlikely
to be experienced there.
Sponsored by: Sandvine Incorporated
Obtained from: Nima Misaghian @ Sandvine Incorporated
<nmisaghian at sandvine dot com>
MFC after: 10 days
from 2000 bytes to 20 Kbytes, which now matches the buffer size used for
NGM_BINARY2ASCII conversions.
The aim of this change is to allow for bigger binary structures to be
managed via netgraph ASCII messages, until we come up with an API
improvement which would get rid of such arbitrary hardcoded limits.
MFC after: 3 days
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.
MFC after: 1 month
starting from netgraph import in 1999.
netstat(8) used pointer to node as node address, oops. That didn't
work, we need the node ID in brackets to successfully address a node.
We can't look into ng_node, due to inability to include netgraph/netgraph.h
in userland code. So let the node make a hint for a userland, storing
the node ID in its private data.
MFC after: 2 weeks
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.
The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.
PR: kern/142927
Submitted by: Nikolay Denev
- use a uniform mtag format for all packets that exit and re-enter
the firewall in the middle of a rulechain. On reentry, all tags
containing reinject info are renamed to MTAG_IPFW_RULE so the
processing is simpler.
- make ipfw and dummynet use ip_len and ip_off in network format
everywhere. Conversion is done only once instead of tracking
the format in every place.
- use a macro FREE_PKT to dispose of mbufs. This eases portability.
On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.
Overall the code shrinks a bit and is hopefully more readable.
I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.
PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
MFC after: 1 month
r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
ipfw-specific. This removes a dependency on ng_ipfw.h from some files.
- move many equivalent definitions of direction (IN, OUT) for
reinjected packets into ip_fw_private.h
- document the structure of the packet tags used for dummynet
and netgraph;
r201049
- merge some common code to attach/detach hooks into
a single function.
r201055
- remove some duplicated code in ip_fw_pfil. The input
and output processing uses almost exactly the same code so
there is no need to use two separate hooks.
ip_fw_pfil.o goes from 2096 to 1382 bytes of .text
r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
between host and network format more explicit
r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
localize variables so the compiler does not think they are uninitialized,
do not insist on precise allocation size if we have more than we need.
r201119
- when doing a lookup, keys must be in big endian format because
this is what the radix code expects (this fixes a bug in the
recently-introduced 'lookup' option)
No ABI changes in this commit.
MFC after: 1 week
and remove all O(N) sequences from kernel critical sections in ipfw.
In detail:
1. introduce a IPFW_UH_LOCK to arbitrate requests from
the upper half of the kernel. Some things, such as 'ipfw show',
can be done holding this lock in read mode, whereas insert and
delete require IPFW_UH_WLOCK.
2. introduce a mapping structure to keep rules together. This replaces
the 'next' chain currently used in ipfw rules. At the moment
the map is a simple array (sorted by rule number and then rule_id),
so we can find a rule quickly instead of having to scan the list.
This reduces many expensive lookups from O(N) to O(log N).
3. when an expensive operation (such as insert or delete) is done
by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
without blocking the bottom half of the kernel, then acquire
IPFW_WLOCK and quickly update pointers to the map and related info.
After dropping IPFW_LOCK we can then continue the cleanup protected
by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
is only blocked for O(1).
4. do not pass pointers to rules through dummynet, netgraph, divert etc,
but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
We validate the slot index (in the array of #2) with chain_id,
and if successful do a O(1) dereference; otherwise, we can find
the rule in O(log N) through <rulenum, rule_id>
All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t
Operation costs now are as follows:
Function Old Now Planned
-------------------------------------------------------------------
+ skipto X, non cached O(N) O(log N)
+ skipto X, cached O(1) O(1)
XXX dynamic rule lookup O(1) O(log N) O(1)
+ skipto tablearg O(N) O(1)
+ reinject, non cached O(N) O(log N)
+ reinject, cached O(1) O(1)
+ kernel blocked during setsockopt() O(N) O(1)
-------------------------------------------------------------------
The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI
Supported by: Valeria Paoli
MFC after: 1 month