Commit Graph

1271 Commits

Author SHA1 Message Date
mav
8ca0043002 Make ng_ppp fulfill upper protocol stack layers alignment requirements
on platforms with strict alignment constraints.
This fixes kernel panics on arm and probably other architectures.

PR:		sparc64/80410
2010-03-31 20:37:44 +00:00
glebius
411453536a Remove disabled code. In 99% cases exports are send to ng_ksocket(4), which
already forces queued mode, so what was suggested in disabled code is already
done.
2010-03-25 10:13:21 +00:00
glebius
e93d329696 Now fix functionality of 'netstat -f netgraph' that hasn't worked
starting from netgraph import in 1999.

netstat(8) used pointer to node as node address, oops. That didn't
work, we need the node ID in brackets to successfully address a node.
We can't look into ng_node, due to inability to include netgraph/netgraph.h
in userland code. So let the node make a hint for a userland, storing
the node ID in its private data.

MFC after:	2 weeks
2010-03-12 15:04:59 +00:00
glebius
aad46e823b Fix 'netstat -f netgraph', which I had broken in r163463 ling time
ago in 2006. This linked list is actually needed for userland.

PR:		kern/140446
Submitted by:	Adrian Steinmann <ast marabu.ch>
2010-03-12 14:51:42 +00:00
thompsa
5056e27c2d Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR:		kern/142927
Submitted by:	Nikolay Denev
2010-01-18 20:34:00 +00:00
fjoe
698b013832 Send link state change control messages to "orphans" hook as well.
MFC after:	1 week
2010-01-09 19:03:48 +00:00
luigi
2d0bc470bd ip_var.h now needs to be before ip_fw_private.h 2010-01-07 14:23:19 +00:00
luigi
40024ff7c3 Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
  the firewall in the middle of a rulechain. On reentry, all tags
  containing reinject info are renamed to MTAG_IPFW_RULE so the
  processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
  everywhere. Conversion is done only once instead of tracking
  the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).
2010-01-04 19:01:22 +00:00
antoine
bfd388c026 (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
luigi
b41c473d90 bring the NGM_IPFW_COOKIE back into ng_ipfw.h, libnetgraph expects
to find it there. Unfortunately this reintroduces the dependency
on ip_fw_pfil.c
2009-12-28 12:29:13 +00:00
luigi
483862a5a2 bring in several cleanups tested in ipfw3-head branch, namely:
r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
  ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
  reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
  and netgraph;

r201049
- merge some common code to attach/detach hooks into
  a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
  and output processing uses almost exactly the same code so
  there is no need to use two separate hooks.
  ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
  between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
  localize variables so the compiler does not think they are uninitialized,
  do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
  this is what the radix code expects (this fixes a bug in the
  recently-introduced 'lookup' option)

No ABI changes in this commit.

MFC after:	1 week
2009-12-28 10:47:04 +00:00
luigi
2043aec456 merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

 1. introduce a IPFW_UH_LOCK to arbitrate requests from
     the upper half of the kernel. Some things, such as 'ipfw show',
     can be done holding this lock in read mode, whereas insert and
     delete require IPFW_UH_WLOCK.

  2. introduce a mapping structure to keep rules together. This replaces
     the 'next' chain currently used in ipfw rules. At the moment
     the map is a simple array (sorted by rule number and then rule_id),
     so we can find a rule quickly instead of having to scan the list.
     This reduces many expensive lookups from O(N) to O(log N).

  3. when an expensive operation (such as insert or delete) is done
     by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
     without blocking the bottom half of the kernel, then acquire
     IPFW_WLOCK and quickly update pointers to the map and related info.
     After dropping IPFW_LOCK we can then continue the cleanup protected
     by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
     is only blocked for O(1).

  4. do not pass pointers to rules through dummynet, netgraph, divert etc,
     but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
     We validate the slot index (in the array of #2) with chain_id,
     and if successful do a O(1) dereference; otherwise, we can find
     the rule in O(log N) through <rulenum, rule_id>

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

  Function				Old	Now	  Planned
-------------------------------------------------------------------
  + skipto X, non cached		O(N)	O(log N)
  + skipto X, cached			O(1)	O(1)
XXX dynamic rule lookup			O(1)	O(log N)  O(1)
  + skipto tablearg			O(N)	O(1)
  + reinject, non cached		O(N)	O(log N)
  + reinject, cached			O(1)	O(1)
  + kernel blocked during setsockopt()	O(N)	O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI

Supported by: Valeria Paoli
MFC after:	1 month
2009-12-22 19:01:47 +00:00
luigi
38b08bc779 add ip_fw_private.h to ng_ipfw.c, forgotten in previous commit;
comment out remove ip_fw.h from ng_bridge.c, as it seems unused.

MFC after:	1 month
2009-12-15 18:33:12 +00:00
jhb
892fe839c4 Take a step towards removing if_watchdog/if_timer. Don't explicitly set
if_watchdog/if_timer to NULL/0 when initializing an ifnet.  if_alloc()
sets those members to NULL/0 already.
2009-11-06 14:55:01 +00:00
ru
e866365bb9 Spell DIAGNOSTIC correctly. 2009-10-24 18:49:17 +00:00
julian
79c1f884ef Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after:	2 months
2009-10-11 05:59:43 +00:00
emax
9c5bffc4a5 Get those pesky RFCOMM RPM data bits right. This is likely a noop.
MFC after:	1 month
2009-09-10 23:30:13 +00:00
rwatson
ef8d755d4d Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock.  Either can be held to stablize the lists and indexes, but both
are required to write.  This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions.  As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by:	bz, julian
MFC after:	3 days
2009-08-23 20:40:19 +00:00
rwatson
fb9ffed650 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
rwatson
b3be1c6e3b Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
  occur each time a network stack is instantiated and destroyed.  In the
  !VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
  For the VIMAGE case, we instead use SYSINIT's to track their order and
  properties on registration, using them for each vnet when created/
  destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
  previously, as well as its dependency scheme: we now just use the
  SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
  they want init functions to be called for each virtual network stack
  rather than just once at boot, compiling down to DOMAIN_SET() in the
  non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
  of modinfo or DOMAIN_SET() for init/uninit events.  In some cases,
  convert modular components from using modevent to using sysinit (where
  appropriate).  In some cases, do minor rejuggling of SYSINIT ordering
  to make room for or better manage events.

Portions submitted by:	jhb (VNET_SYSINIT), bz (cleanup)
Discussed with:		jhb, bz, julian, zec
Reviewed by:		bz
Approved by:		re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00
rwatson
6955067932 Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use).  Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions.  Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by:	bz
Approved by:	re (kib)
2009-07-19 14:20:53 +00:00
rwatson
88f8de4d40 Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used.  Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with:	bz, julian
Reviewed by:	bz
Approved by:	re (kensmith, kib)
2009-07-16 21:13:04 +00:00
rwatson
57ca4583e7 Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
mav
7efd755635 Fix infinite loop in ng_iface, that happens when packet passes out via
two different ng interfaces sequentially due to tunnelling.

PR:		kern/134557
Submitted by:	Mikolaj Golub
Approved by:	re (kensmith)
MFC after:	3 days
2009-07-01 08:08:56 +00:00
stas
c61e1d6988 - Turn the third (islocked) argument of the knote call into flags parameter.
Introduce the new flag KNF_NOKQLOCK to allow event callers to be called
  without KQ_LOCK mtx held.
- Modify VFS knote calls to always use KNF_NOKQLOCK flag.  This is required
  for ZFS as its getattr implementation may sleep.

Approved by:	re (rwatson)
Reviewed by:	kib
MFC after:	2 weeks
2009-06-28 21:49:43 +00:00
rwatson
be5740a255 Use if_maddr_rlock()/if_maddr_runlock() rather than IF_ADDR_LOCK()/
IF_ADDR_UNLOCK() across network device drivers when accessing the
per-interface multicast address list, if_multiaddrs.  This will
allow us to change the locking strategy without affecting our driver
programming interface or binary interface.

For two wireless drivers, remove unnecessary locking, since they
don't actually access the multicast address list.

Approved by:	re (kib)
MFC after:	6 weeks
2009-06-26 11:45:06 +00:00
rwatson
b12e610687 Update Netgraph nodes to use if_addr_rlock()/if_addr_runlock() instead
of IF_ADDR_LOCK()/IF_ADDR_UNLOCK() when iterating ifp->if_addrhead.

MFC after:	6 weeks
2009-06-26 00:49:12 +00:00
rdivacky
841360d91c Use proper form of gnu designated initalizers. This lets
clang compile this files.

Approved by: ed (mentor)
Silence from: harti (maintainer?)
2009-06-24 12:01:10 +00:00
bz
0808d0b1a6 After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.
2009-06-23 17:03:45 +00:00
mav
d3c42edbcc Mark ng_ether node hooks as HI_STACK. It is usually the last point when
netgraph may unroll the call stack, and I have found that in some cases 2K
guarantied there for i386 may be not enough for NIC driver and BPF.
2009-06-23 12:30:21 +00:00
thompsa
30004d4d8e Fix a typeo in the frame len function to unbreak the build, make it shorter
while I am here.
2009-06-23 06:00:31 +00:00
thompsa
74c6c20b93 - Make struct usb_xfer opaque so that drivers can not access the internals
- Reduce the number of headers needed for a usb driver, the common case is just   usb.h and usbdi.h
2009-06-23 02:19:59 +00:00
thompsa
06303d491a s/usb2_/usb_|usbd_/ on all function names for the USB stack. 2009-06-15 01:02:43 +00:00
zec
0867d9f94a Assign ng_eiface nodes a netgraph name on instantiation, in a way which
is consistent with the current behavior of ng_iface, i.e. borrow the
same naming code from ng_iface.c.

Approved by:	julian (mentor)
2009-06-12 09:20:31 +00:00
zec
6448f34585 Introduce a mechanism for detecting calls from outbound path of the
network stack when reentering the inbound path from netgraph, and
force queueing of mbufs at the outbound netgraph node.

The mechanism relies on two components.  First, in netgraph nodes
where outbound path of the network stack calls into netgraph, the
current thread has to be appropriately marked using the new
NG_OUTBOUND_THREAD_REF() macro before proceeding to call further
into the netgraph topology, and unmarked using the
NG_OUTBOUND_THREAD_UNREF() macro before returning to the caller.
Second, netgraph nodes which can potentially reenter the network
stack in the inbound path have to mark their inbound hooks using
NG_HOOK_SET_TO_INBOUND() macro.  The netgraph framework will then
detect when there is a danger of a call graph looping back from
outbound to inbound path via netgraph, and defer handing off the
mbufs to the "inbound" node to a worker thread with a clean stack.

In this first pass only the most obvious netgraph nodes have been
updated to ensure no outbound to inbound calls can occur.  Nodes
such as ng_ipfw, ng_gif etc. should be further examined whether a
potential for outbound to inbound call looping exists.

This commit changes the layout of struct thread, but due to
__FreeBSD_version number shortage a version bump has been omitted
at this time, nevertheless kernel and modules have to be rebuilt.

Reviewed by:	julian, rwatson, bz
Approved by:	julian (mentor)
2009-06-11 16:50:49 +00:00
oleg
1980405dfd Close long existed race with net.inet.ip.fw.one_pass = 0:
If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc)
it carries pointer to matching ipfw rule. If this packet then reinjected back
to ipfw, ruleset processing starts from that rule. If rule was deleted
meanwhile, due to existed race condition panic was possible (as well as
other odd effects like parsing rules in 'reap list').

P.S. this commit changes ABI so userland ipfw related binaries should be
recompiled.

MFC after:	1 month
Tested by:	Mikolaj Golub
2009-06-09 21:27:11 +00:00
imp
aeed4d8561 World now builds without these defines, so eliminate them.
Approved by:	julian@
2009-06-09 07:07:20 +00:00
bz
b7ff2bdc20 After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
zec
8b1f38241a Introduce an infrastructure for dismantling vnet instances.
Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state.  The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers.  Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels.  Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by:	bz, julian
Approved by:	rwatson, kib (re), julian (mentor)
2009-06-08 17:15:40 +00:00
jhb
a1af9ecca4 Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
  locked.  It is not permissible to call soisconnected() with this lock
  held; however, so socket upcalls now return an integer value.  The two
  possible values are SU_OK and SU_ISCONNECTED.  If an upcall returns
  SU_ISCONNECTED, then the soisconnected() will be invoked on the
  socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls.  The
  API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
  the receive socket buffer automatically.  Note that a SO_SND upcall
  should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
  instead of calling soisconnected() directly.  They also no longer need
  to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
  state machine, but other accept filters no longer have any explicit
  knowlege of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
  while invoking soreceive() as a temporary band-aid.  The plan for
  the future is to add a new flag to allow soreceive() to be called with
  the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
  buffer locked.  Previously sowakeup() would drop the socket buffer
  lock only to call aio_swake() which immediately re-acquired the socket
  buffer lock for the duration of the function call.

Discussed with:	rwatson, rmacklem
2009-06-01 21:17:03 +00:00
thompsa
44c17bdf07 s/usb2_/usb_/ on all typedefs for the USB stack. 2009-05-29 18:46:57 +00:00
thompsa
af6fb4f3d2 s/usb2_/usb_/ on all C structs for the USB stack. 2009-05-28 17:36:36 +00:00
thompsa
9263872892 Hook ubt and ubtbcmfw back up to the build. 2009-05-27 16:43:40 +00:00
thompsa
f5ce731edd move ng_ubt_var.h back to its original place 2009-05-27 16:34:08 +00:00
thompsa
97718a5597 move ng_ubt.c back to its original place 2009-05-27 16:33:08 +00:00
thompsa
b60849b714 move ubtbcmfw.c back to its original place 2009-05-27 16:32:05 +00:00
thompsa
77ea8140da Delete the bluetooth drivers for the old usb stack. 2009-05-27 16:29:56 +00:00
mav
41798f5f08 Fix copy-paste bug in NGM_NETFLOW_SETCONFIG argument size verification.
PR:		kern/134220
Submitted by:	Eugene Mychlo
MFC after:	1 week
2009-05-13 02:26:34 +00:00
zec
0be321084a Unbreak LINT build, caused by a change in struct ng_node layout introduced
with r191816, which become uncovered only with NETGRAPH_DEBUG defined.

NOT approved by mentor (julian) due to emergency.
2009-05-05 16:26:06 +00:00
zec
d78a1b1a82 Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one.  The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE().  Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc.  Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by:	julian (mentor)
2009-05-05 10:56:12 +00:00
zec
c269cc664e In preparation to make options VIMAGE operational, where needed,
initialize / release netgraph related state in iattach() / idetach()
functions called via the vnet module registration / initialization
framework, instead of initialization / cleanups being done in
mod_event handlers.

While here, introduce a crude hack aimed at preventing ng_ether to
autoattach to ng_eiface ifnets, which are also netgraph nodes already.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-04-26 07:14:50 +00:00
rwatson
80e4437a3b Lock the interface address list while building replies to
NGM_CISCO_COOKIE messages in ng_iface.

MFC after:	2 weeks
2009-04-19 22:05:39 +00:00
rwatson
8a092572b5 Lock interface address list when building a reply to NGM_EIFACE_GET_IFADDRS
messages in ng_eiface.

MFC after:	2 weeks
2009-04-19 22:04:29 +00:00
ed
8dc23b5452 Switch ubtbcmfw(4) to use si_drv1 instead of storing the unit number.
The unit number is still used to store the type of the device node.

Approved by:	emax
2009-04-17 22:13:41 +00:00
kmacy
24b38efdce Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by:	rwatson
2009-04-16 20:30:28 +00:00
ed
7cfe6a40d7 Make Netgraph compile with Clang.
Clang disallows structs with variable length arrays to be nested inside
other structs, because this is in violation with ISO C99. Even though we
can keep bugging the LLVM folks about this issue, we'd better just fix
our code to not do this. This code seems to be the only code in the
entire source tree that does this.

I haven't tested this patch by using the kernel modules in question, but
Diane Bruce and I have compared disassembled versions of these kernel
modules. We would have expected them to be exactly the same, but due to
randomness in the register allocator and reordering of instructions,
there were some minor differences.

Approved by:	julian
2009-03-03 18:47:33 +00:00
ed
322413c46c Add memmove() to the kernel, making the kernel compile with Clang.
When copying big structures, LLVM generates calls to memmove(), because
it may not be able to figure out whether structures overlap. This caused
linker errors to occur. memmove() is now implemented using bcopy().
Ideally it would be the other way around, but that can be solved in the
future. On ARM we don't do add anything, because it already has
memmove().

Discussed on:	arch@
Reviewed by:	rdivacky
2009-02-28 16:21:25 +00:00
bz
df2be82cec For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
emax
b3c91fe7cc Update comment. soalloc() is no longer performing M_WAITOK memory allocations.
Submitted by:	ru
MFC after:	3 days
2009-02-10 20:27:05 +00:00
emax
9bddc26cc8 Allow unprivileged users to run l2ping(8).
MFC after:	1 month
2009-02-04 22:44:09 +00:00
mav
6191153d95 Check for infinite recursion possible on some broken PPTP/L2TP/... VPN setups.
Mark packets with mbuf_tag on first interface passage and drop on second.

PR:		ports/129625, ports/125303,
MFC after:	2 weeks
2009-01-20 22:26:09 +00:00
emax
bf3d7c639d Properly return error code to the caller. This should fix the following
panic in ng_l2cap(4).

panic: ng_l2cap_l2ca_con_req: ubt0l2cap - could not find connection!

While i'm here get rid of few goto's.

MFC after:	1 week
2009-01-19 22:06:35 +00:00
mav
2ff507514a If source mbuf chain consists of only one mbuf, use it directly as source
buffer to avoid extra copying.
2009-01-18 21:09:34 +00:00
mav
74ed22786b Use m_unshare()+m_copyback() instead of m_freem()+m_devget() to keep
original mbuf chain headers. It can be less efficient in some cases, but it
looks better then mess of copying headers into the nonempty chain.
2009-01-18 19:25:36 +00:00
mav
facc5c0f61 Remove strict limitation on minimal multilink MRRU. RFC claims that MRRU
of 1500 must be supported, but allows smaller values to be negotiated.
Enforce specified MRRU for outgoing frames.

MFC after:	2 weeks
2009-01-18 12:03:43 +00:00
mav
8c23321acb Mark ng_vjc node as FORCE_WRITER to protect slcompress state.
I think it can be the reason of livelock in netgraph reported by some
mpd users.

MFC after:	3 days
2009-01-08 17:51:15 +00:00
julian
eee0b0af37 shave about 7% off the overhead of ng_ether by using per-hook
receive data methods.
2008-12-25 09:02:55 +00:00
julian
36efa471ac Add a trivial node to reflect ethernet frames to whence they came.
MFC after: 1 month
2008-12-25 00:01:29 +00:00
emax
110365dd9e Change message severity level from WARN to INFO. This should reduce
amount of messages sent to syslog

MFC after:	1 week
2008-12-24 00:00:52 +00:00
mav
1290464fbc Unroll two loops of SHA1Update(). 60 bytes of static memory is not a price. 2008-12-16 19:15:31 +00:00
qingli
ec826ad5c7 This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle, Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
  provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
  me maintaining that branch before the svn conversion
2008-12-15 06:10:57 +00:00
mav
3888dd07b1 To avoid one doubtless netgraph SMP scalability limitation point, switch
node queues processing from single swi:net thread to several specialized
threads.

Reviewed by:	julian
Tested with:	Netperf Cluster
2008-12-14 20:15:30 +00:00
mav
79df8e4599 Revert rev. 183277:
Remove ng_rmnode_flags() function.
ng_rmnode_self() was made to be called only while having node locked.
When node is properly locked, any function call sent to it will always be
queued. So turning ng_rmnode_self() into the ng_rmnode_flags() is not just
meaningless, but incorrent, as it violates node locking when called outside.

No objections:	julian, thompsa
2008-12-13 22:26:24 +00:00
mav
f83f5582ff Remove node shutdown on tty close. This could be easily done by user-level
while it's present implementation with ng_rmnode_flags() is at least
incorrect.
2008-12-13 22:05:46 +00:00
mav
37aff7daa7 Change ttyhook_register() second argument from thread to process pointer.
Thread was not really needed there, while previous ng_tty implementation
that used thread pointer had locking issues (using sx while holding mutex).
2008-12-13 21:17:46 +00:00
zec
7b573d1496 Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out.  Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs.  This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps.  PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw.  Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at:	devsummit Strassburg
Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-12-10 23:12:39 +00:00
mav
a689f48383 Carefully handle memory errors to keep peers compression/encryption state
consistent. There are some cases reported where peers fatally getting out
of sync without any visible reason. I hope this solve the problem.
2008-12-06 23:00:48 +00:00
bz
604d89458a Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by:	brooks, gnn, des, zec, imp
Sponsored by:	The FreeBSD Foundation
2008-12-02 21:37:28 +00:00
zec
7ecd715d48 Unhide declarations of network stack virtualization structs from
underneath #ifdef VIMAGE blocks.

This change introduces some churn in #include ordering and nesting
throughout the network stack and drivers but is not expected to cause
any additional issues.

In the next step this will allow us to instantiate the virtualization
container structures and switch from using global variables to their
"containerized" counterparts.

Reviewed by:	bz, julian
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-11-28 23:30:51 +00:00
mav
97bb9372d9 Remove unused variable.
Found with:     Coverity Prevent(tm)
CID:            3682
2008-11-22 16:55:55 +00:00
mav
f1487e0920 Fix typo. Clear session stats instead of config and part of stats.
Found with:	Coverity Prevent(tm)
CID:		2472
2008-11-22 16:40:12 +00:00
mav
5d12d947b3 Remove unneeded NULL check. At first msg can't be null here and and at second
NG_FREE_MSG() also checks it.

Found with:	Coverity Prevent(tm)
2008-11-22 16:03:18 +00:00
kmacy
2b4df7158b convert calls to IFQ_HANDOFF to if_transmit 2008-11-22 07:35:45 +00:00
mav
ef99a9576f Don't use curthread to resolve file descriptor. Request may be queued, so
thread will be different. Instead require sender to send process ID
together with file descriptor.
2008-11-08 06:25:57 +00:00
mav
ddff5a26cd Assign new cookie to the node to reflect API change.
All applications will have to be adapted and rebuilt.
2008-11-08 02:05:41 +00:00
mav
5eabda5fe7 Don't assign completely meaningless name to the node on creation.
As soon as node is created from the netgraph side now, it can be found
without using this. Allow application to assign whatever name it want later.
2008-11-07 19:51:07 +00:00
des
a1e1ad22e0 Fix a number of style issues in the MALLOC / FREE commit. I've tried to
be careful not to fix anything that was already broken; the NFSv4 code is
particularly bad in this respect.
2008-10-23 20:26:15 +00:00
des
66f807ed8b Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
mav
ca86878f8e Add ability to generate egress netflow instead or in addition to ingress.
Use mbuf tagging for accounted packets to not account packets twice when
both ingress and egress netflow enabled.
To keep compatibility new "setconfig" message added to control new
functionality. By default node works as before, doing only ingress
accounting without using mbuf tags.

Reviewed by:	glebius
2008-10-08 10:37:07 +00:00
emax
c790c4de64 Abort transfers on all pipes before closing them. This fixes the crash
when Bluetooth USB device is pulled out without stopping the stack first.

Submitted by:	Vladimir Grebenschikov vova at fbsd dot ru
MFC after:	1 week
2008-10-03 22:40:42 +00:00
thompsa
a8d2a9f2d1 Update ng_tty for MPSAFE TTY.
This changes from a line discipline to the tty_hooks mechanism. Data will come
in directly via rint_bypass and sent to the peer node in a single mbuf.

As line disciplines are no longer used a new netgraph command called
NGM_TTY_SET_TTY is used to attach the tty. This takes a pointer to to the open
file descriptor of the tty and registers the tty hooks. When the tty disappears
the node will shutdown.

Thanks to:	ed
Sponsored by:	Hobnob, Inc
2008-10-03 05:14:54 +00:00
zec
8797d4caec Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
ed
4efdef565f Replace all calls to minor() with dev2unit().
After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by:	kib
2008-09-27 08:51:18 +00:00
ed
4212d51a7d Remove unit2minor() use from kernel code.
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by:	kib
2008-09-26 14:19:52 +00:00
thompsa
a689a9e914 Add ng_rmnode_flags() so the caller can pass NG_QUEUE and have the node
destroyed asynchronously due to locking or other constraints.

Reviewed by:	julian
2008-09-22 19:28:18 +00:00
zec
c888ce3315 Fix error message content.
Approved by:	julian (mentor)
MFC after:	3 days
2008-09-21 07:33:33 +00:00
mav
ac69cb023f We can't implicitly trust the hook on NGQF_FN/NGQF_FN2 processing in
ng_apply_item(). There are possible (and I have got one) use-after-free
class panics because of it.

If hook is specified, require it to be valid at the apply time. The only
exceptions are the internal ng_con_part2(), ng_con_part3() and
ng_rmhook_part2() functions which are specially made to work with invalid
hooks.
2008-09-13 09:17:02 +00:00
julian
89264d09f2 Add Marko's pipe node.
This allows one to do flow modulation similar to dummynet
between arbitrary nodes.
2008-09-03 18:17:45 +00:00
jkim
8fb51c23ad Make sure BPF program is not bigger than set maximum (net.bpf.maxinsns). 2008-08-29 15:49:40 +00:00
julian
0592958505 A bunch of formatting fixes brough to light by, or created by the Vimage commit
a few days ago.
2008-08-20 01:05:56 +00:00
bz
1021d43b56 Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
emax
5d25ad7852 Implement ratelimiting for debug messages. For now, allow at most
one message per second. In the future might add a sysctl knob for
each socket family to fine tune this.

MFC after:	1 week
2008-08-01 00:36:43 +00:00
emax
dbb1414312 Increase maximum input queue size limit for raw Bluetooth HCI sockets.
MFC after:	3 days
2008-08-01 00:16:40 +00:00
emax
8ea609367b Fix locking bug, i.e. lock "wildcard" matched pcb before return. 2008-08-01 00:13:32 +00:00
emax
bb4c6de0cf Introduce support for Bluetooth SCO sockets. This is based on older
code that was revisted.

MFC after:	3 months
2008-07-30 22:41:23 +00:00
emax
ab370dee42 Simplify ubt_isoc_in_complete2(). Also should fix off by 1 bug.
MFC after:	3 months
2008-07-29 00:17:53 +00:00
mav
a5ebe70ed0 Don't use memcpy() to copy several bytes.
Store IDs is host order. It is not so important to bloat code for it.
Combine m_adj() and M_PREPEND() into single M_PREPEND().
2008-07-28 22:22:38 +00:00
trhodes
b82c051e92 Fill in the string portion of the bluetooth stack version sysctl.
Approved by:	emax
2008-07-14 13:45:05 +00:00
emax
797b3cb554 Dust off old code for support of USB isochronous transfers.
USB isochronous transfer support is required for Bluetooth SCO.
While i'm here change u_int to uint and update TODO.
This should produce no visible changes unless the device is
broken (or really old).

MFC after:	3 months
2008-07-11 17:13:43 +00:00
emax
b07b40d7fe Get in some basic infrastructure for Bluetooth SCO support.
MFC after:	3 months
2008-07-10 00:15:29 +00:00
gonzo
e29b848dd4 Back out r180370. It was not discussed with subsystem maintainers. 2008-07-08 20:19:43 +00:00
gonzo
2d2c066986 Queue decapsulated packed instead of performing direct dispatch. Some
execution pathes might hit stack limit under certain circumstances
(e.g. ng_mppc).

PR:                     kern/125314
Reported by:            Illya Klymov <ilia dot klimov at gmail dot com>
2008-07-08 18:21:44 +00:00
rwatson
482bfeab47 Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly
dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific
netisr handlers to always be dispatched via a queue (deferred).  Mark the
usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly
acquire Giant in those handlers.

Previously, any netisr handler not marked NETISR_MPSAFE would necessarily
run deferred and with Giant acquired.  This change removes Giant
scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows
non-MPSAFE handlers to continue to force deferred dispatch so as to avoid
lock order reversals between their acqusition of Giant and any calling
context.

It is likely we will be able to remove NETISR_FORCEQUEUE once
IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no
longer be supported.

Reviewed by:	bz
MFC after:	1 month
X-MFC note:	We can't remove NETISR_MPSAFE from stable/7 for KPI reasons,
		but the rest can go back.
2008-07-04 00:21:38 +00:00
gnn
2b9abdf860 Make it simpler to build netgraph modules outside of the kernel source
tree.  This change follows similar ones in the device tree.

MFC after:	2 weeks
2008-06-24 18:49:49 +00:00
mav
4d5dd96154 Pass really available buffer size to libalias instead of MCLBYTES constant.
MCLBYTES constant were used with believe that m_megapullup() always moves
date into a fresh cluster that may become not so.
2008-06-01 15:13:32 +00:00
rwatson
a3623cb733 Remove netatm from HEAD as it is not MPSAFE and relies on the now removed
NET_NEEDS_GIANT.  netatm has been disconnected from the build for ten
months in HEAD/RELENG_7.  Specifics:

- netatm include files
- netatm command line management tools
- libatm
- ATM parts in rescue and sysinstall
- sample configuration files and documents
- kernel support as a module or in NOTES
- netgraph wrapper nodes for netatm
- ctags data for netatm.
- netatm-specific device drivers.

MFC after:	3 weeks
Reviewed by:	bz
Discussed with:	bms, bz, harti
2008-05-25 22:11:40 +00:00
julian
1dfc5c98a4 Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

  One thing where FreeBSD has been falling behind, and which by chance I
  have some time to work on is "policy based routing", which allows
  different
  packet streams to be routed by more than just the destination address.

  Constraints:
  ------------

  I want to make some form of this available in the 6.x tree
  (and by extension 7.x) , but FreeBSD in general needs it so I might as
  well do it in -current and back port the portions I need.

  One of the ways that this can be done is to have the ability to
  instantiate multiple kernel routing tables (which I will now
  refer to as "Forwarding Information Bases" or "FIBs" for political
  correctness reasons). Which FIB a particular packet uses to make
  the next hop decision can be decided by a number of mechanisms.
  The policies these mechanisms implement are the "Policies" referred
  to in "Policy based routing".

  One of the constraints I have if I try to back port this work to
  6.x is that it must be implemented as a EXTENSION to the existing
  ABIs in 6.x so that third party applications do not need to be
  recompiled in timespan of the branch.

  This first version will not have some of the bells and whistles that
  will come with later versions. It will, for example, be limited to 16
  tables in the first commit.
  Implementation method, Compatible version. (part 1)
  -------------------------------
  For this reason I have implemented a "sufficient subset" of a
  multiple routing table solution in Perforce, and back-ported it
  to 6.x. (also in Perforce though not  always caught up with what I
  have done in -current/P4). The subset allows a number of FIBs
  to be defined at compile time (8 is sufficient for my purposes in 6.x)
  and implements the changes needed to allow IPV4 to use them. I have not
  done the changes for ipv6 simply because I do not need it, and I do not
  have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

  Other protocol families are left untouched and should there be
  users with proprietary protocol families, they should continue to work
  and be oblivious to the existence of the extra FIBs.

  To understand how this is done, one must know that the current FIB
  code starts everything off with a single dimensional array of
  pointers to FIB head structures (One per protocol family), each of
  which in turn points to the trie of routes available to that family.

  The basic change in the ABI compatible version of the change is to
  extent that array to be a 2 dimensional array, so that
  instead of protocol family X looking at rt_tables[X] for the
  table it needs, it looks at rt_tables[Y][X] when for all
  protocol families except ipv4 Y is always 0.
  Code that is unaware of the change always just sees the first row
  of the table, which of course looks just like the one dimensional
  array that existed before.

  The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
  are all maintained, but refer only to the first row of the array,
  so that existing callers in proprietary protocols can continue to
  do the "right thing".
  Some new entry points are added, for the exclusive use of ipv4 code
  called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
  which have an extra argument which refers the code to the correct row.

  In addition, there are some new entry points (currently called
  rtalloc_fib() and friends) that check the Address family being
  looked up and call either rtalloc() (and friends) if the protocol
  is not IPv4 forcing the action to row 0 or to the appropriate row
  if it IS IPv4 (and that info is available). These are for calling
  from code that is not specific to any particular protocol. The way
  these are implemented would change in the non ABI preserving code
  to be added later.

  One feature of the first version of the code is that for ipv4,
  the interface routes show up automatically on all the FIBs, so
  that no matter what FIB you select you always have the basic
  direct attached hosts available to you. (rtinit() does this
  automatically).

  You CAN delete an interface route from one FIB should you want
  to but by default it's there. ARP information is also available
  in each FIB. It's assumed that the same machine would have the
  same MAC address, regardless of which FIB you are using to get
  to it.

  This brings us as to how the correct FIB is selected for an outgoing
  IPV4 packet.

  Firstly, all packets have a FIB associated with them. if nothing
  has been done to change it, it will be FIB 0. The FIB is changed
  in the following ways.

  Packets fall into one of a number of classes.

  1/ locally generated packets, coming from a socket/PCB.
     Such packets select a FIB from a number associated with the
     socket/PCB. This in turn is inherited from the process,
     but can be changed by a socket option. The process in turn
     inherits it on fork. I have written a utility call setfib
     that acts a bit like nice..

         setfib -3 ping target.example.com # will use fib 3 for ping.

     It is an obvious extension to make it a property of a jail
     but I have not done so. It can be achieved by combining the setfib and
     jail commands.

  2/ packets received on an interface for forwarding.
     By default these packets would use table 0,
     (or possibly a number settable in a sysctl(not yet)).
     but prior to routing the firewall can inspect them (see below).
     (possibly in the future you may be able to associate a FIB
     with packets received on an interface..  An ifconfig arg, but not yet.)

  3/ packets inspected by a packet classifier, which can arbitrarily
     associate a fib with it on a packet by packet basis.
     A fib assigned to a packet by a packet classifier
     (such as ipfw) would over-ride a fib associated by
     a more default source. (such as cases 1 or 2).

  4/ a tcp listen socket associated with a fib will generate
     accept sockets that are associated with that same fib.

  5/ Packets generated in response to some other packet (e.g. reset
     or icmp packets). These should use the FIB associated with the
     packet being reponded to.

  6/ Packets generated during encapsulation.
     gif, tun and other tunnel interfaces will encapsulate using the FIB
     that was in effect withthe proces that set up the tunnel.
     thus setfib 1 ifconfig gif0 [tunnel instructions]
     will set the fib for the tunnel to use to be fib 1.

  Routing messages would be associated with their
  process, and thus select one FIB or another.
  messages from the kernel would be associated with the fib they
  refer to and would only be received by a routing socket associated
  with that fib. (not yet implemented)

  In addition Netstat has been edited to be able to cope with the
  fact that the array is now 2 dimensional. (It looks in system
  memory using libkvm (!)). Old versions of netstat see only the first FIB.

  In addition two sysctls are added to give:
  a) the number of FIBs compiled in (active)
  b) the default FIB of the calling process.

  Early testing experience:
  -------------------------

  Basically our (IronPort's) appliance does this functionality already
  using ipfw fwd but that method has some drawbacks.

  For example,
  It can't fully simulate a routing table because it can't influence the
  socket's choice of local address when a connect() is done.

  Testing during the generating of these changes has been
  remarkably smooth so far. Multiple tables have co-existed
  with no notable side effects, and packets have been routes
  accordingly.

  ipfw has grown 2 new keywords:

  setfib N ip from anay to any
  count ip from any to any fib N

  In pf there seems to be a requirement to be able to give symbolic names to the
  fibs but I do not have that capacity. I am not sure if it is required.

  SCTP has interestingly enough built in support for this, called VRFs
  in Cisco parlance. it will be interesting to see how that handles it
  when it suddenly actually does something.

  Where to next:
  --------------------

  After committing the ABI compatible version and MFCing it, I'd
  like to proceed in a forward direction in -current. this will
  result in some roto-tilling in the routing code.

  Firstly: the current code's idea of having a separate tree per
  protocol family, all of the same format, and pointed to by the
  1 dimensional array is a bit silly. Especially when one considers that
  there is code that makes assumptions about every protocol having the
  same internal structures there. Some protocols don't WANT that
  sort of structure. (for example the whole idea of a netmask is foreign
  to appletalk). This needs to be made opaque to the external code.

  My suggested first change is to add routing method pointers to the
  'domain' structure, along with information pointing the data.
  instead of having an array of pointers to uniform structures,
  there would be an array pointing to the 'domain' structures
  for each protocol address domain (protocol family),
  and the methods this reached would be called. The methods would have
  an argument that gives FIB number, but the protocol would be free
  to ignore it.

  When the ABI can be changed it raises the possibilty of the
  addition of a fib entry into the "struct route". Currently,
  the structure contains the sockaddr of the desination, and the resulting
  fib entry. To make this work fully, one could add a fib number
  so that given an address and a fib, one can find the third element, the
  fib entry.

  Interaction with the ARP layer/ LL layer would need to be
  revisited as well. Qing Li has been working on this already.

  This work was sponsored by Ironport Systems/Cisco

Reviewed by:    several including rwatson, bz and mlair (parts each)
Obtained from:  Ironport systems/Cisco
2008-05-09 23:03:00 +00:00
mav
d2bb3d9ce8 ng_address_hook() microoptimization. Use local variables as they should be.
It helps compiller to avoid some extra memory accesses.
2008-04-19 05:30:49 +00:00
mav
0e42cfa396 Use separate UMA zone for data items allocation. It is a partial
rev. 1.149 rework.
It allows to save several percents of CPU time on SMP by using UMA's
internal per-CPU allocation limits instead of own global variable
each time updated with atomics.

Tested with:    Netperf cluster
2008-04-16 19:52:29 +00:00
kris
267b42a43b Replace callout_init(..., 1) with callout_init(..., CALLOUT_MPSAFE) for
better grep-compliance and to standardize with the rest of the kernel.

Reviewed by:	       jhb
MFC after:	       1 week
2008-04-16 16:47:14 +00:00
mav
9ed14d3ac1 Several changes breaking netgraph module ABI collected together:
- reorder structures fields (XX_refs) a bit to group fields modified
   same time together. According to my tests it gives up to 10%
   SMP performance benefit on real workload due to reduced inter-CPU
   cache trashing.
 - change q_flags from long to int as long is not really needed there and
   it's usage with atomics is argued by some people.
 - move NGF_WORKQ flag into the separate field q_flags2 as it protected by
   queue mutex instead of node writer protection used by the rest of flags.
 - move nd_work queue entry to ng_queue structure to which it is more
   related and make it STAILQ instead of TAILQ as now it is a classic FIFO.
 - remove q_node pointer from ng_queue structure as it is not really needed.
 - reimplement item queue using STAILQ instead of own equal implementation.
   As soon as BT subsystem has own item queues using ng_item.el_next update
   it also.
 - change depth field in ng_item from uintptr_t to u_int. It was made
   uintptr_t to keep ABI compatibility.

Reviewed by:	julian, emax
Tested with:	Netperf cluster
2008-04-15 21:15:32 +00:00
mav
07c143122a Add memory barriers to the node locking operations.
Add some comments.
2008-04-09 19:03:19 +00:00
mav
254de061de Rewrite node's r/w/q-lock semantics using only atomics instead of mutex
and atomics combination. Mutex is now used only for queue protection.
Also avoid unneded extra swi scheduling calls.
2008-04-06 15:26:32 +00:00
mav
19447c2a3e - Account all node stats at the shape mode.
- Do not check destination hook presence, it will be done by netgraph.
- Use u_int instead of int in some places to simplify type conversions.
- Use NG_SEND_DATA_ONLY() macro instead of selfmade equivalent.
2008-03-30 07:53:51 +00:00
mav
3847b38de2 Use new atomic_fetchadd() primitive instead of looping atomic_cmpset(). 2008-03-30 00:27:48 +00:00
mav
dd8be7d6b9 There is no need to erase hook->hk_node before freing hook. 2008-03-29 22:53:58 +00:00
mav
586e0246eb Remove ng_setisr() call from ng_dequeue(). It is useless as we any way
will never exit ngintr(), while there is some ready requests on the queue.
It was made years ago with hope of parallel queue processing by several
net threads. But even if we have several threads sometimes, we have no
rights to process queue in parallel as it will break original requests
serialization that is critically important for some setups.
2008-03-27 23:02:30 +00:00
mav
fd0bef772c Switch from timeval to bintime, to use 1/(2^20) of seconds instead of
microseconds. It allows to use bit shifts instead of some heavy 64bit
mul/div math operations.
2008-03-27 20:04:20 +00:00
mav
5b9ac353f2 Some minor code and math optimizations. 2008-03-26 21:19:03 +00:00
mav
9af7fc155d Rewrite node to support multiple hooks, alike to ng_l2tp, to use one pair
of pptpgre and ksocket nodes for all calls between two peers. This patch
modifies node's API by adding new "session_%04x" hook names support, while
keeping backward compatibility.

Together with appropriate user-level support (by latest mpd5) it gives
huge performance benefits for case of multiple active calls between
two peers because of avoiding data duplication and extra socket processing.
On my benchmarks I have got more then 10 times speedup for the 200
simultaneous PPTP calls between two peers.
In conclusion, it allows now to build effective "clients <=> PAC <=> PNS"
setups.
2008-03-24 22:55:22 +00:00
mav
b09a1d85ff Remove impossible (hk_peer == NULL) check from ng_address_hook().
Valid hook can't have NULL peer. Even invalid one can't, as it is resets to
deadhook, but not NULL.
2008-03-16 23:12:17 +00:00
mav
c3b3361aa8 Add session ID hashing to speedup incoming packets dispatch in case
of many connections working via the same tunnel. For example, in case
of full "client <-> LAC <-> LNS" setup.
2008-03-16 21:33:12 +00:00
mav
dd8463bf88 Improve apply callback error reporting:
Before this patch callback returned result of the last finished call chain.
Now it returns last nonzero result from all call chain results in this request.

As soon as this improvement gives reliable error reporting, it is now possible
to remove dirty workaround in ng_socket, made to return ENOBUFS error statuses
of request-response operations. That workaround was responsible for returning
ENOBUFS errors to completely unrelated requests working at the same time
on socket.
2008-03-11 21:58:48 +00:00
mav
be092ffb41 Addition to the previous commit. Release inproc in case of memory error. 2008-03-09 11:17:00 +00:00
mav
786b28be16 To avoid control data losses do not acknowledge recieving of control packet
if netgraph reported error while delivering to destination.
Reset 'next send' counter to the last requested by peer on ack timeout
to resend all subsequest packets after lost one again without additional hints.
2008-03-08 23:55:29 +00:00
mav
cbff2d5782 Send only one incoming notification at a time to reduce queue
trashing and improve performance.
Remove waitflag argument from ng_ksocket_incoming2(), it means nothing
as function call was queued by netgraph.
Remove node validity check, as node validity guarantied by netgraph.
Update comments.
2008-03-07 21:12:56 +00:00
mav
19469bf19f Increase default queue items allocation limit from 512 to 4096 items
to avoid terrible unpredicted effects for netgraph operation of their
exhaustion while allocating control messages.
Add separate configurable 512 items limit for data items allocation
for DoS/overload protection.

Discussed with:	julian
2008-03-05 22:12:34 +00:00
mav
3bb463bbf1 Implement 128 items node name hash for faster name search.
Increase node ID hash size from 32 to 128 items.
2008-03-04 18:22:18 +00:00
mav
7e474ff50b Fix incorrect field name. 2008-03-04 11:10:54 +00:00
mav
469edf6052 Use more compact LIST instead of TAILQ for session hash.
Add all listening hooks into LIST to simplify searches.
Use ng_findhook() instead of own equal implementation.
2008-03-03 19:36:03 +00:00
mav
0b47bb7a15 Make session ID generator to use session ID hash.
Make session ID generator thread-safe.
2008-03-02 23:26:35 +00:00
mav
354de8687f Add support for the libalias redirect functionality.
Submitted by:   Vadim Goncharov <vadim_nuclight@mail.ru>
2008-03-01 17:14:02 +00:00
mav
2d0cb9a815 Fix incorrect constant used in rev. 1.146 that broke node writer locking. 2008-02-25 21:24:53 +00:00
mav
166cb6ef20 Fix shutdown bug made by previous commit. 2008-02-24 10:13:32 +00:00
glebius
34f9d2c8a9 Use rtalloc1() instead of rtalloc_ign(). It returns a locked
rtentry. We quickly copy the fields of interest, and then
RTFREE_LOCKED(). This should be faster then lock & unlock the
rtentry twice.
2008-02-07 11:10:17 +00:00
mav
a735f997fd Do not use bcmp() to compare two bytes with constants. 2008-02-06 20:37:34 +00:00
mav
a51f95cd58 Cleanup and tune ng_snd_item() function as it is one of the
most busy netgraph functions.
Tune stack protection constants to avoid division operation.
2008-02-06 18:50:40 +00:00
mav
cd7d07a63f Prepare hooks direct pointers on setup to avoid heavy ng_findhook() calls
during operarion.
2008-02-04 19:26:53 +00:00
mav
083c0a5fcb Move all possible node logic out of the rcvdata() function
to the newhook()/disconnect().
Unify function names with other nodes.
2008-02-03 18:55:45 +00:00
mav
45f131f91d Revert previous commit.
glebius@ noticed that it was not a bug, but undocumented feature.
2008-02-03 10:30:45 +00:00
marck
92f929f0dd Fix one more grammo.
Noticed by:	ru
2008-02-02 08:41:53 +00:00
marck
e190c967dd Reword recent comment a bit. 2008-02-01 17:35:46 +00:00
mav
49092fb3fc Add comments about stack protection mechanism. 2008-02-01 11:01:15 +00:00
mav
7e0b4128cb Tune the message for better informativity.
Print the hook pointer as other functions do.
2008-02-01 07:25:06 +00:00
benno
538bebc843 Band-aid recent commit by mav by replacing a variable in a CTR statement with
the variable that appears as if it should've been there.

Pointy hat to:		mav
Not tested either by:	benno
2008-02-01 07:17:26 +00:00
mav
5af3bb221f Implement Session-ID hashing to improve receive performance scalability
for big number of concurrent sessions.
2008-01-31 22:42:37 +00:00
mav
94236d3d42 Some code reformat. 2008-01-31 10:13:04 +00:00
mav
5df3e934b9 Implement stack protection based on GET_STACK_USAGE() macro.
This fixes system panics possible with complicated netgraph setups
and allows to avoid unneded extra queueing for stack unwrapping.
2008-01-31 08:51:48 +00:00
mav
5eebdfa072 Avoid data copying when it is possible.
bpf_filter() is able to work directly on mbuf chain.
2008-01-28 22:37:17 +00:00
mav
1a411ba3c5 Run expire even without export hook connected.
PR:	kern/119839
2008-01-27 15:01:16 +00:00
mav
c2d1050fba Fix memory leak when export hook is not connected. 2008-01-27 09:22:10 +00:00
mav
2adafc5538 Remove one very strange unneded if. 2008-01-27 08:52:41 +00:00
mav
34b15a0c5e Slightly simplify code. 2008-01-27 02:04:12 +00:00
mav
afe3cc011f Improve multilink receive performance by netgraph item reuse. 2008-01-26 22:42:47 +00:00
mav
56891c11ee Improve multilink xmit performance by netgraph item reuse. 2008-01-26 22:41:14 +00:00
mav
4b87cfa8f8 Improve multilink receive performance with fragment headers preallocation. 2008-01-26 22:39:05 +00:00
mav
1a704c3175 Fix bundle xmit octets stats for packet-split operation mode. 2008-01-23 11:47:09 +00:00
jeff
ce18638805 Remove explicit locking of struct file.
- Introduce a finit() which is used to initailize the fields of struct file
   in such a way that the ops vector is only valid after the data, type,
   and flags are valid.
 - Protect f_flag and f_count with atomic operations.
 - Remove the global list of all files and associated accounting.
 - Rewrite the unp garbage collection such that it no longer requires
   the global list of all files and instead uses a list of all unp sockets.
 - Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by:	kris, pho
2007-12-30 01:42:15 +00:00
mav
cfcf7dfc62 Add support for optional "AC-Name\Service-Name" syntax at NGM_PPPOE_CONNECT
argument. It allows ppp, mpd or any other node consumer to request
connection to specified access concentrator.

Proposed by:	Alexander A. Burylov <burylov@mail.ru>
2007-12-29 19:44:41 +00:00
mav
b82597528b Fix incorrectly placed bracket in pppoe_find_svc(). 2007-12-26 19:33:53 +00:00
mav
23a956d0c8 Remove some prehistoric never used defines. 2007-12-26 19:15:07 +00:00
rwatson
bdee30611d Add a new 'why' argument to kdb_enter(), and a set of constants to use
for that argument.  This will allow DDB to detect the broad category of
reason why the debugger has been entered, which it can use for the
purposes of deciding which DDB script to run.

Assign approximate why values to all current consumers of the
kdb_enter() interface.
2007-12-25 17:52:02 +00:00
mav
f381ac2711 Add option to set packets per second limits instead of default
bits per second ones.
2007-12-19 22:50:14 +00:00
mav
a1921d35de Increase control channel xmit queue to 128 packets.
Previous value 16 was too small for real LAC as temporal activity
spike cound easily overflow queue demanding tunnel disconnection due
to possible state inconsistency.
2007-12-12 19:04:30 +00:00
julian
4140cda7c9 Add ipv6 to ng_cisco node. ipv6 wasn't a reality when I wrote it..
Submitted by: Marko Zec
2007-11-30 23:27:39 +00:00
glebius
6cd20c281d - Merge all the ng_send_fn2* functions into one - ng_send_fn2(),
removing some copy&pasted code.
- Reduce copy and paste in ng_apply_item().
- Resurrect ng_send_fn() as a valid symbol, not a define.

Reviewed by:	mav, julian
2007-11-14 11:25:58 +00:00
emax
ff226f6ee0 Fix locking issue in ng_btsocket_l2cap_ctloutput()
Submitted by:	Heiko Wundram (Beenic) < wundram at beenic dot net >
MFC after:	3 days
2007-10-31 16:17:20 +00:00
emax
0cf18b2c7c Allow RFCOMM servers to bind to a ''wildcard'' RFCOMM channel
zero (0). Actual RFCOMM channel will be assigned after listen(2)
call is done on a RFCOMM socket bound to a ''wildcard'' RFCOMM
channel zero (0).

Address locking issues in ng_btsocket_rfcomm_bind()

Submitted by:	Heiko Wundram (Beenic) < wundram at beenic dot net >
MFC after:	1 week
2007-10-29 19:06:47 +00:00
mav
a7fd7284ec Minor debug message fix. 2007-10-28 18:05:59 +00:00
ru
8de25952d5 Fix build with NETGRAPH_DEBUG. 2007-10-19 20:09:58 +00:00
mav
5aea982161 Implement new apply callback mechanism to handle item forwarding.
When item forwarded refence counter is incremented, when item
processed, counter decremented. When counter reaches zero,
apply handler is getting called.
Now it allows to report right connect() call status from user-level
at the right time.
2007-10-19 15:04:17 +00:00
mav
bc7eda2e9a Split ng_pppoe_rcvdata() function into three hook-specific ones
to simplify code and reduce stack usage.
2007-10-14 09:58:22 +00:00
mav
b5f6701dea Remove ng_pppoe_sendpacket() function to simplify code as it is called
as much times as it has cases inside of it.
2007-10-14 09:51:19 +00:00
mav
58809b0788 Protect struct seq with mutex.
Approved by:	glebius (mentor)
2007-10-12 04:56:26 +00:00
mav
2d88988487 Remove one unneded assertion. It is also checked in
ng_l2tp_seq_check().

Approved by:	glebius (mentor)
2007-10-12 04:54:43 +00:00
mav
fab370c5a1 Replace single rcvdata with 3 distinct to simplify code and
reduce stack usage.

Approved by:	mentor (glebius)
2007-10-12 04:53:23 +00:00
mav
becababb0c Remove duplicate variables. 2007-10-12 04:51:30 +00:00
mav
1f0cbf5a23 Dead code removal.
Approved by:	re (kensmith), glebius (mentor)
2007-09-21 08:25:55 +00:00
mav
01f9f69635 This is optimization of ether and debug hooks determination. It
simplifies code and should speedup pppoe_findsession() function which is
called for every incoming packet.

Approved by:	re (kensmith), glebius (mentor)
2007-09-21 08:24:08 +00:00
mav
14a66bdd4b This patch fixes thread unsafe usage of global pkt_hdr
variable. Second part is not so important, but IMO is also good.

Approved by:	re (kensmith), glebius (mentor)
2007-09-21 08:16:33 +00:00
mav
1b8fdda809 Fix typo which brokes VJ decompression
when VJC negotiated in only one direction.

Approved by:	re (bmah), glebius (mentor)
2007-09-15 16:55:44 +00:00
emax
e04fc3e9d0 Return EADDRNOTAVAIL instead of EDESTADDRREQ error when
listen(2) is called on improperly bound socket.

Suggested by:	Iain Hibbert
Approved by:	re (kensmith)
MFC after:	3 days
2007-08-23 16:55:22 +00:00
mav
093e149434 Add ng_send_fn() error handeling inside ng_con_nodes().
Without it some errors may left unnoticed and unhandeled
that will lead to hooks left in half-connected state.

Reviewed by:	julian@
Approved by:	re (kensmith), glebius (mentor)
2007-08-18 11:59:17 +00:00
emax
7473c1093a Make ng_h4(4) MPSAFE. Use similar to ng_tty(4) locking strategy.
Reconnect ng_h(4) back to the build.

Reviewed by:	kensmith
Approved by:	re (kensmith)
MFC after:	1 month
2007-08-13 17:19:28 +00:00
rwatson
23574c8673 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00
mav
86e4cc2cc9 Add 64bit statistic counters to the ng_ppp node.
64bit counters are needed to simplify traffic accounting and
reduce system load at the big PPP concentrators.

Approved by:	re (rwatson), glebius (mentor)
2007-08-01 20:49:35 +00:00
mav
7ba336020a This patch improves fine-grained locking for the ng_ppp node.
Till now node's transmit path was completely unprotected
and so wasn't thread safe in multilink mode. It's receive path was
declared as WRITER as the simpliest protection method but it
reduces performance when compression or encryption enabled.

Approved by:	re (rwatson), glebius (mentor)
2007-08-01 20:38:37 +00:00
rwatson
a62dbe240a Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove
definition of NET_CALLOUT_MPSAFE, which is no longer required now that
debug.mpsafenet has been removed.

The once over:	bz
Approved by:	re (kensmith)
2007-07-28 07:31:30 +00:00
mav
5a9e4eaaa3 Reduce stack usage by 256 bytes per call. It helps to avoid kernel
stack overflow in complicated traffic filtering setups.

There can be minor performance degradation for the MHLEN < len <= 256 case
due to additional buffer allocation, but it is a rare case.

Approved by:	re (rwatson), glebius (mentor)
MFC after:	1 week
2007-07-26 18:15:02 +00:00
glebius
78c75cc535 Honor the IFF_MONITOR flag.
PR:		kern/99500
Submitted by:	Craig Leres <leres ee.lbl.gov>
Approved by:	re (kensmith)
2007-07-26 10:54:33 +00:00
emax
7fce1d7b71 Mark ng_h4(4) as not MPSAFE and disconnect it from the build for now.
Approved by:	re (rwatson)
2007-07-10 16:38:43 +00:00
imp
f58caadafc These modules depend on usb, make that explicit
Approved by: re@
2007-06-23 04:34:38 +00:00
mjacob
384923a3f7 Fix various compilation warnings for gcc-4.2.
Approved by:	re (bruce)
2007-06-23 00:02:20 +00:00
emax
0d312fb512 Replace sosend() with direct call to .pru_send method on the
L2CAP socket. This is to avoid LOR with sx(9) lock in sblock()
called from sosend_generic().

Approved by:	re (kensmith)
MFC after:	1 week
2007-06-21 19:55:49 +00:00
delphij
db531f1bee Fix build problem caused by a set of typos.
Reported by:	tinderbox
Approved by:	re (mux)
2007-06-19 14:56:35 +00:00
imp
17a6c4798c Finish removing usb_port.h compat macros. 2007-06-18 22:23:20 +00:00
ru
7d990e0773 Remove two more instances of the USBDEV() macro. 2007-06-13 12:36:01 +00:00
mav
a015640863 Add missing ng_uncallout() on node shutdown.
Approved by:	glebius (mentor)
2007-06-13 11:01:17 +00:00
emax
ec70de06d6 Catch up with USB cleanups and fix the world 2007-06-13 00:32:00 +00:00
imp
48df7c2e40 Eliminate usb_thread_t. 2007-06-12 17:30:54 +00:00
imp
d4a700c2dc Expand USB_ATTACH_{ERROR,SUCCESS}_RETURN inline and eliminate from
usb_port.h.  They aren't needed, and are a legacy of this code's past.
2007-06-12 15:37:19 +00:00
imp
e42a97f12a Silence a gcc warning in a more canonical way (evl = NULL rather than &evl).
I saw warnings here at one point on the arm build.
2007-06-11 15:29:02 +00:00
imp
7462ef5293 Expand USB_ATTACH_SETUP inline.
Kill devinfo stuff.
2007-06-09 06:53:27 +00:00
dwmalone
771efb08f5 Despite several examples in the kernel, the third argument of
sysctl_handle_int is not sizeof the int type you want to export.
The type must always be an int or an unsigned int.

Remove the instances where a sizeof(variable) is passed to stop
people accidently cut and pasting these examples.

In a few places this was sysctl_handle_int was being used on 64 bit
types, which would truncate the value to be exported.  In these
cases use sysctl_handle_quad to export them and change the format
to Q so that sysctl(1) can still print them.
2007-06-04 18:25:08 +00:00
mav
2f12bfb53b No need to update link queue stats when round-robin algorithm enabled.
Approved by:	glebius (mentor)
2007-06-04 13:50:09 +00:00
glebius
4623bf3eb1 Partially back out rev. 1.127, to restore broken functionality. This
should be redesigned, but better enter RELENG_7 with a working ngctl(8).

Agreed by:	julian
2007-06-01 09:20:57 +00:00
rwatson
79a2e40812 Universally adopt most conventional spelling of acquire. 2007-05-27 20:50:23 +00:00
mav
e8130fb9e7 Add support for setmode and settarget messages.
Approved by:	glebius (mentor)
2007-05-22 12:23:39 +00:00
mav
519a3dd7a9 Allow node to bypass traffic while no alias address defined.
Approved by:	glebius (mentor)
2007-05-22 12:14:43 +00:00
mav
61e626a6d8 Fix build with NETGRAPH_MPPC_COMPRESSION but without NETGRAPH_MPPC_ENCRYPTION.
Approved by:	glebius (mentor)
2007-05-18 15:28:01 +00:00
dwmalone
5ad0d6d6de Help ng_fec deal with multicast addresses.
While ng_fec called the ioctl to let interfaces in the bundle know
the list of multicast addresses had changed, it never actually
updated that list on the interfaces in the bundle. Consequently,
the multicast filters could be programmed incorrectly.

if_lagg does this correctly, by maintaining a list of addresses
that it has added to interfaces in the bundle. This commit basically
takes the if_lagg code and adds it to ng_fec.

A version of this patch for RELENG_6 has fixed some problems with
IPv6 ND over ng_fec. This is probably the problem in PR 107523.

PR:		107523
Tested by:	Rob Gallagher <robert.gallagher@heanet.ie>
Obtained from:	if_lagg
MFC after:	3 weeks
2007-05-18 15:05:49 +00:00
mav
2401c2cdb8 Fix small copy/paste mistake. 2007-05-17 13:33:38 +00:00
mav
bdc8f5bc7f Style cleanup.
Approved by:	glebius (mentor)
2007-05-16 12:11:09 +00:00
mav
e2cf6367f2 A node that implements various traffic shaping and rate limiting algorithms.
Approved by:	glebius (mentor)
2007-05-15 16:09:23 +00:00
mav
f1c1aa5c90 Performance optimization of the "encryption without compression" case by
avoiding memory allocation and data copying.
Encrypting directly at the original mbuf chain.

Approved by:	glebius (mentor)
2007-05-11 14:36:02 +00:00
rwatson
47d37a80be Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddr
protocol entry points using functions named proto_getsockaddr and
proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr.
While it's true that sockaddrs are allocated and set, the net effect is
to retrieve (get) the socket address or peer address from a socket, not
set it, so align names to that intent.
2007-05-11 10:20:51 +00:00
mav
e650654da1 Avoid extra rc4_init() when ng_mppc_updatekey() going to do it anyway.
Approved by:	glebius (mentor)
2007-05-04 16:20:47 +00:00
mav
f6fc3acc90 Compact code a bit
Approved by:	glebius (mentor)
2007-05-04 16:12:54 +00:00
mav
9f72209bd2 Make coherency counter 12bit as it should
Approved by:	glebius (mentor)
2007-05-04 16:05:58 +00:00
mav
3db45e799a Fix small mistake (sizeof(pad2) instead of sizeof(pad1))
Approved by:	glebius (mentor)
2007-05-04 15:44:22 +00:00
mav
46dad86393 Remove unneded bzero().
SHA1Final() does not require clean buffer.

Approved by:	glebius (mentor)
2007-05-04 15:41:49 +00:00
mav
054a6022e4 Avoid false assertion on transmit and delayed ack timeout with enabled invariants.
Replace callout_pending() by callout_active() to remove race window.

Reviewed by:	archie
Approved by:	glebius (mentor)
2007-04-24 10:50:25 +00:00
mav
008892b71a Global xmit stats calculation fix.
Approved by:	glebius (mentor)
2007-04-23 15:25:14 +00:00
mav
ba2ac38f34 Added m_tag_copy_chain() call to copy original outgoing packet tags to all of
it's fragments.

Reviewed by:	archie
Approved by:	glebius (mentor)
2007-04-20 08:44:40 +00:00
mav
4eafee4615 Optimized packet distribution plan for the equal links case. Do not
split packet on fragments smaller then MP_MIN_FRAG_LEN to reduce total
overhead.

Reviewed by:	archie
Approved by:	glebius (mentor)
2007-04-20 08:42:08 +00:00
mav
08eaeb92b2 - Changed sequence numbers processing to avoid incorrect timeout waiting
when one of links is inactive and have stale sequence number. To avoid
this sequence numbers of all links are getting updated on every
successful packet reassembling.
- ng_ppp_bump_mseq function created to simplify code.
- ng_ppp_frag_drop function separated from ng_ppp_frag_process to
simplify code.

Reviewed by:	archie
Approved by:	glebius (mentor)
2007-04-20 08:38:18 +00:00
mav
8063ace5a8 - Fixed mistakes in latency and xmitBytes calculation math
which lead to ineffective multilink packet distribution plans.
- Changed bytesInQueue calculation math to have more precise information
about links utilization.
- Taken rough account of the link overhead. Better way to do it could be to
get exact overhead from user-level, but I have not done it to keep
binary compatibility.

Reviewed by:	archie
Approved by:	glebius (mentor)
2007-04-20 08:22:57 +00:00
wkoszek
1463edcf11 We don't need spinning locks here. Change them to the adaptive mutexes. This
change should bring no performance decrease, as it did not in my tests.

Reviewed by:	julian, glebius
Approved by:	cognet (mentor)
2007-03-31 15:43:06 +00:00
wkoszek
44897a4bf0 Instead of direct manipulation on queue and worklist mutexes, bring macros
for doing this job. This change will make it easy to migrate from using
spinning locks to adaptive ones.

Reviewed by:	glebius, julian
Approved by:	cognet (mentor)
2007-03-30 14:34:34 +00:00
emax
66f5e5f292 Try to silence Coverity by adding (void) in front of function call.
Also add a comment, explaining why return value is not being checked.

Requested by:	netchild
MFC after:	1 week
2007-03-28 21:25:56 +00:00
glebius
e0b48c67ff Bump maximum number of interface hooks to the maximum possible value.
This will increase the memory consumption for more than 1 Mb, but this
is required for operation on multiinterface access concentrators running
mpd.

Requested by:	Alexander Motin
2007-03-28 13:59:13 +00:00
maxim
f01ad312df o Update a comment: sonewconn() lives in uipc_socket.c now. 2007-03-26 18:17:57 +00:00
bms
4ffc004901 Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from:	p4 branch bms_netdev
Reviewed by:	andre
Sponsored by:	Garance A Drosehn
Idea from:	NetBSD
MFC after:	1 month
2007-03-20 00:36:10 +00:00
rwatson
c7032e862d Prefer more traditional spellings of some words in comments. 2007-03-18 16:49:50 +00:00
julian
a9c72d2621 oops committed the wrong patch.
try this one..
2007-03-10 01:02:40 +00:00
julian
3cf52ff305 ng_apply_item should be void. It is called from the interrupt source or
from whoever has dequeued the item from the queue. Generally they have
no interest in the result, and even if it is called by the queuer, it
should still pretend that it was queued. The queuer should be assuming
that the call was queued and giving them the false confidence that they
are getting status leads to hard to find bugs.

Make it a void and remove all the code that tried to return status through it.
2007-03-09 21:04:50 +00:00
ru
42f2b268fd ng_send_fn() can return with an error, the function of interest
will never be called and OACTIVE will never be reset.  Fix this.

Submitted by:	Vsevolod Lobko
MFC after:	3 days
2007-03-08 21:10:53 +00:00
emaste
c199a9f3a2 Ensure message passed to "settimestamp" and "setcounter" is the right
length.  Use NULL instead of 0.

Submitted by: glebius, ru
2007-03-02 14:36:19 +00:00
emaste
2c703c5104 Add "setcounter" and "getcounter" messages, providing the the ability
to embed up to four counters in outgoing packets.  The message specifies
the offset at which the counter should be inserted as well as the
parameters of the counter.

Example usage:

  ngctl msg src0: setcounter \
    '{ index=0 offset=0x40 flags=1 width=4 increment=1 max_val=12345 }'

Sponsored by:   Sandvine Incorporated
2007-03-02 01:44:04 +00:00
emaste
6086e6d82a Add "settimestamp" and "gettimestamp" messages, providing the the ability
to embed a timestamp (struct timeval) in outgoing packets.  The message
specifies the offset at which the timestamp should be inserted.

NG_SOURCE(4) gives an example usage that queues an ICMP packet.  Using that
example, the following command will insert a timestamp in the ICMP's data
payload:

  ngctl msg src0: settimestamp '{ offset=0x2a flags=1 }'

Sponsored by:	Sandvine Incorporated
2007-03-01 23:16:17 +00:00