Commit Graph

2724 Commits

Author SHA1 Message Date
Ed Maste
be4572c896 Add a sysctl knob to accept input packets on any link in a failover lagg. 2010-09-01 16:53:38 +00:00
Bjoern A. Zeeb
c749353940 MFp4 CH=182972:
Add explicit linkstate UP/DOWN for the epair.  This is needed by carp(4)
and other things to work.

MFC after:	5 days
2010-08-27 23:22:58 +00:00
Rui Paulo
79856499bd Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwaston [1]
2010-08-22 11:18:57 +00:00
Marko Zec
d3c351c50f When moving an ethernet ifnet from one vnet to another, destroy the
associated ng_ether netgraph node in the current vnet, and create a
new one in the target vnet.

Reviewed by:	julian
MFC after:	3 days
2010-08-13 18:17:32 +00:00
Will Andrews
9963e8a52c Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-08-11 20:18:19 +00:00
Will Andrews
54bfbd5153 Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
John Baldwin
3ba24fde11 Adjust the interface type in the link layer socket address for vlan(4)
interfaces to be a vlan (IFT_L2VLAN) rather than an Ethernet interface
(IFT_ETHER).  The code already fixed if_type in the ifnet causing some
places to report the interface as a vlan (e.g. arp -a output) and other
places to report the interface as Ethernet (getifaddrs(3)).  Now they
should all report IFT_L2VLAN.

Reviewed by:	brooks
MFC after:	1 month
2010-08-06 15:15:26 +00:00
Konstantin Belousov
04f3205755 Properly set ifi_datalen for compat32 struct if_data32.
PR:	kern/149240
Submitted by:	Stef Walter <stef memberwebs com>
MFC after:	1 weeks
2010-08-03 15:40:42 +00:00
Gleb Smirnoff
b17f26b00c Don't check malloc(M_WAITOK) result. 2010-07-27 11:56:49 +00:00
Bjoern A. Zeeb
cd292f1264 Return NULL rather than 0 for a pointer.
MFC after:	3 days
2010-07-27 11:54:01 +00:00
Gleb Smirnoff
85011246ac When installing a new ARP entry via 'arp -S', lla_lookup() will
either find an existing entry, or allocate a new one. In the latter
case an entry would have flags, that were supplied as argument to
lla_lookup(). In case of an existing entry, flags aren't modified.

This lead to losing LLE_PUB and/or LLE_PROXY flags.

We should apply these flags either in lla_rt_output() or in the
in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is
a more correct choice.

PR:		kern/148784, kern/146539
Silence from:	qingli, 5 days
2010-07-27 10:05:27 +00:00
Jung-uk Kim
82040afcf3 Fix an obvious typo from r1.1. We were acquiring an exclusive writer lock
regardless of the given flags.

MFC after:	3 days
2010-07-22 18:44:40 +00:00
Luigi Rizzo
1f6ad072ea whitespace cleanup 2010-07-15 14:41:59 +00:00
Luigi Rizzo
b62cb72c48 small portability fix to build on linux/windows 2010-07-15 14:41:06 +00:00
Jung-uk Kim
547d94bde3 Implement flexible BPF timestamping framework.
- Allow setting format, resolution and accuracy of BPF time stamps per
listener.  Previously, we were only able to use microtime(9).  Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command.  Document all supported options in bpf(4) and their uses.

- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts.  bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp.  The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries.  On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long.  On 32-bit platforms, the size may
increase by 8 bytes.  For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested.  However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.

- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver.  Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.

Reviewed by:	net@
2010-06-15 19:28:44 +00:00
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
Marko Zec
b1ae592bd4 Provide a macro for registering a virtualized sysctl handler for
VNET opaque data.

MFC after:	30 days
2010-06-02 15:29:21 +00:00
Qing Li
0ed6142b31 This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after:	3 days
2010-05-25 20:42:35 +00:00
John Baldwin
6f359e2828 Ignore failures from removing multicast addresses from the parent (trunk)
interface when tearing down a vlan interface.  If a trunk interface is
detached, all of its multicast addresses are removed before the ifnet
departure eventhandlers are invoked.  This means that all of the multicast
addresses are removed before the vlan interfaces are removed which causes
the if_delmulti() calls in the vlan teardown to fail.

In the VLAN_ARRAY case, this left vlan interfaces referencing a no longer
valid parent interface.  In the !VLAN_ARRAY case, the eventhandler gets
stuck in an infinite loop retrying vlan_unconfig_locked() forever.  In
general the callers of vlan_unconfig_locked() do not expect nor handle
failure, so I believe it is safer to ignore the errors and tear down as
much of the vlan state as possible.

Silence from:	net@
MFC after:	4 days
2010-05-17 19:36:56 +00:00
Kip Macy
83e711ec14 allocate ipv6 flows from the ipv6 flow zone
reported by: rrs@

MFC after:	3 days
2010-05-16 21:48:39 +00:00
Bjoern A. Zeeb
793f71bf2e Fix an issue with the dynamic pcpu/vnet data allocators.
We cannot expect that modspace is the last entry in the linker
set and thus that modspace + possible extra space up to PAGE_SIZE
would be contiguous.  For the moment do not support more than
*_MODMIN space and ignore the extra space (*).

(*) We know how to get it back but it'll need testing.

Discussed with:	jeff, rwatson (briefly)
Reviewed by:	jeff
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	4 days
2010-05-14 21:11:58 +00:00
Kip Macy
19d0491585 workaround bug with ipv6 where a flow can have a null rtentry 2010-05-12 04:51:20 +00:00
Alan Cox
f0c0d3998d Remove page queues locking from all sf_buf_mext()-like functions. The page
lock now suffices.

Fix a couple nearby style violations.
2010-05-06 17:43:41 +00:00
Alan Cox
a7283d3213 Add page locking to the vm_page_cow* functions.
Push down the acquisition and release of the page queues lock into
vm_page_wire().

Reviewed by:	kib
2010-05-04 15:55:41 +00:00
Maxim Sobolev
e50d35e6c6 Add new tunable 'net.link.ifqmaxlen' to set default send interface
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.

MFC after:	1 month
2010-05-03 07:32:50 +00:00
Alan Cox
913814935a This is the first step in transitioning responsibility for synchronizing
access to the page's wire_count from the page queues lock to the page lock.

Submitted by:	kmacy
2010-05-03 05:41:50 +00:00
Kip Macy
2965a45315 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
Bjoern A. Zeeb
82cea7e6f3 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
Kip Macy
3e8b572db4 need to initialize the lock before it is used
MFC after:	3 days
2010-04-27 23:48:50 +00:00
Bjoern A. Zeeb
1b610a749e MFP4: @177254
Add missing CURVNET_RESTORE() calls for multiple code paths, to stop
leaking the currently cached vnet into callers and to the process.

Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	4 days
2010-04-27 15:16:54 +00:00
Konstantin Belousov
fc0a61a401 Provide compat32 shims for bpf(4), except zero-copy facilities.
bd_compat32 field of struct bpf_d is kept unconditionally to not
impose the requirement of including "opt_compat.h" on all numerous
users of bpfdesc.h.

Submitted by:	jhb (version for 6.x)
Reviewed and tested by:	emaste
MFC after:	2 weeks
2010-04-25 16:43:41 +00:00
Konstantin Belousov
427a928af7 Provide 32bit compat shims for sysctl net.route NET_RT_IFLIST.
This allows getifaddrs(3) to work for compat32 binaries.

Submitted by:	jhb (6.x version)
Reviewed by:	emaste
Tested by:	emaste and <pluknet gmail com>
MFC after:	2 weeks
2010-04-25 16:42:47 +00:00
Julian Elischer
7a90b21212 Move two copies of the same definition to a common include file.
MFC after: 3 weeks
2010-04-14 23:06:07 +00:00
Xin LI
57d848483e When an underlying ioctl(2) handler returns an error, our ioctl(2)
interface considers that it hits a fatal error, and will not copyout
the request structure back for _IOW and _IOWR ioctls, keeping them
untouched.

The previous implementation of the SIOCGIFDESCR ioctl intends to
feed the buffer length back to userland.  However, if we return
an error, the feedback would be defeated and ifconfig(8) would
trap into an infinite loop.

This commit changes SIOCGIFDESCR to set buffer field to NULL to
indicate the previous ENAMETOOLONG case.

Reported by:	bschmidt
MFC after:	2 weeks
2010-04-14 22:02:19 +00:00
Bjoern A. Zeeb
d0088cde62 Take a reference to make sure that the interface cannot go away during
if_clone_destroy() in case parallel threads try to.

PR:		kern/116837
Submitted by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	10 days
2010-04-11 18:47:38 +00:00
Bjoern A. Zeeb
c769e1be01 Check that the interface is on the list of cloned interfaces before trying
to remove it to avoid panics in case of two threads trying to remove it in
parallel.

PR:		kern/116837
Submitted by:	Takahiro Kurosawa (takahiro.kurosawa gmail.com) (orig version)
MFC after:	10 days
2010-04-11 18:41:31 +00:00
Bjoern A. Zeeb
becba438d2 Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by:		qingli (earlier version)
MFC after:		10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
			Christian Kratzer (ck cksoft.de),
			Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-11 16:04:08 +00:00
Bjoern A. Zeeb
d8c136591a In if_detach_internal() we cannot hold the af_data lock over the
dom_ifdetach() calls as they might sleep for callout_drain().
Do as we do in if_attachdomain1() [r121470] and handle
if_afdata_initialized earlier and call dom_ifdetach() unlocked.

Discussed with:	rwatson
MFC after:	10 days
2010-04-11 11:51:44 +00:00
Bjoern A. Zeeb
318c3213e5 In if_detach_internal() only try to do the detach run if if_attachdomain1()
has actually succeeded to initialize and attach.  There is a theoretical
possibility to drop out early in if_attachdomain1() leaving the array
uninitialized if we cannot get the lock.

Discussed with:	rwatson
MFC after:	10 days
2010-04-11 11:49:24 +00:00
Jung-uk Kim
704858479c Check the pointer to JIT binary filter before its de-allocation.
Submitted by:	Alexander Sack (asack at niksun dot com)
MFC after:	3 days
2010-03-29 20:24:03 +00:00
Rui Paulo
59fe4a8ce6 Add MCS to the list of media types.
Sponsored by:	iXsystems, inc.
2010-03-23 13:15:11 +00:00
Kip Macy
3059584e2a - boot-time size the ipv4 flowtable and the maximum number of flows
- increase flow cleaning frequency and decrease flow caching time
  when near the flow limit
- stop allocating new flows when within 3% of maxflows don't start
  allocating again until below 12.5%

MFC after:	7 days
2010-03-22 23:04:12 +00:00
Ed Maste
d8564efde1 Avoid holding the VLAN_LOCK() over the parent interface SIOCGIFMEDIA
ioctl call, as it may sleep.

Reviewed by:	rwatson
2010-03-21 15:00:33 +00:00
Bjoern A. Zeeb
42eedeac00 Split eventhandler_register() into an internal part and a wrapper function
that provides the allocated and setup eventhandler entry.

Add a new wrapper for VIMAGE that allocates extra space to hold the
callback function and argument in addition to an extra wrapper function.
While the wrapper function goes as normal callback function the
argument points to the extra space allocated holding the original func
and arg that the wrapper function can then call.

Provide an iterator function for the virtual network stack (vnet) that
will call the callback function for each network stack.

Provide a new set of macros for VNET that in the non-VIMAGE case will
just call eventhandler_register() while in the VIMAGE case it will use
vimage_eventhandler_register() passing in the extra iterator function
but will only register once rather than per-vnet.
We need a special macro in case we are interested in the tag returned
as we must check for curvnet and can neither simply assign the
return value, nor not change it in the non-vnet0 case without that.

Sponsored by:	ISPsystem
Discussed with:	jhb
Reviewed by:	zec (earlier version), jhb
MFC after:	1 month
2010-03-19 19:51:03 +00:00
Bjoern A. Zeeb
335b943f8e Add ddb support to the "new" link layer code ("new-arp"):
- show all lltables [1] (optional flag to also show the llentries as well)
 - show lltable <struct lltable *>
 - show llentry <struct llentry *>

MFC after:	6 days
2010-03-18 09:09:59 +00:00
Qing Li
6b533b5ddb Verify interface up status using its link state only
if the interface has such capability. The interface
capability flag indicates whether such capability
exists. This approach is much more backward compatible.
Physical device driver changes will be part of another
commit.

Also updated the ifconfig utility to show the LINKSTATE
capability if present.

Reviewed by:	rwatson, imp, juli
MFC after:	3 days
2010-03-16 17:59:12 +00:00
Max Laier
4c71aa5890 Fix a small bug in drbr_dequeue_cond spotted while preparing MFC of r203834.
MFC after:	3 days
2010-03-15 21:15:03 +00:00
Kip Macy
8847ae28f5 flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB
pointed out by jkim@
2010-03-12 19:58:51 +00:00
Jung-uk Kim
5d7af3a1cc Fix a style(9) nit. 2010-03-12 19:42:42 +00:00
Kip Macy
a398ca9cea re-update copyright to 2010
pointed out by danfe@
2010-03-12 19:26:45 +00:00
Jung-uk Kim
9fee1bd1d8 Tidy up callout for select(2) and read timeout.
- Add a missing callout_drain(9) before the descriptor deallocation.[1]
- Prefer callout_init_mtx(9) over callout_init(9) and let the callout
subsystem handle the mutex for callout function.

PR:		kern/144453
Submitted by:	Alexander Sack (asack at niksun dot com)[1]
MFC after:	1 week
2010-03-12 19:14:58 +00:00
Qing Li
688ba6823b The flow-table module retrieves the destination and source
address as well as the transport protocol port information
from the outbound packets. The routing code is generic and
compares every byte in the given sockaddr object. Therefore
the temporary sockaddr objects must be cleared due to padding
bytes. In addition, the port information must be stripped
or the route search will either fail or return the incorrect
route entry.

Unit testing is done using OpenVPN over the if_tun interface.

MFC after:	7 days
2010-03-12 10:24:58 +00:00
Kip Macy
112125d206 fix stats reporting sysctl 2010-03-12 06:31:19 +00:00
Kip Macy
d4121a02c0 - restructure flowtable to support ipv6
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
  (e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate

MFC after:	7 days
2010-03-12 05:03:26 +00:00
Qing Li
355ad3ead4 The if_tap interface is of IFT_ETHERNET type, but it
does not set or update the if_link_state variable.
As such RT_LINK_IS_UP() fails for the if_tap interface.

Also, the RT_LINK_IS_UP() needs to bypass all loopback
interfaces because loopback interfaces are considered
up logically as long as the system is running.

This patch fixes the above issues by setting and updating
the if_link_state variable when the tap interface is
opened or closed respectively. Similary approach is
already done in the if_tun device.

MFC after:	3 days
2010-03-11 17:56:46 +00:00
Qing Li
c7ea0aa648 One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after:	5 days
2010-03-09 01:11:45 +00:00
Xin LI
13d85d4382 Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg
interface.  The check itself seems to be coming from OpenBSD but does not
seem to be useful for our code.

Discussed with:	thomasa
MFC after:	1 month
2010-03-09 00:52:16 +00:00
Bjoern A. Zeeb
e253cdd07c Not only flush the ipfw tables when unloading ipfw or tearing
down a virtual netowrk stack, but also free the Radix Node Head.

Sponsored by:	ISPsystem
Reviewed by:	julian
MFC after:	5 days
2010-03-07 15:37:58 +00:00
Bjoern A. Zeeb
1bb635b04d Introduce a function rn_detachhead() that will free the
radix table root nodes.  This is only needed (and available)
in the virtualization case to free the resources when tearing
down a virtual network stack.

Sponsored by:	ISPsystem
Reviewed by:	julian, zec
MFC after:	5 days
2010-03-06 21:27:26 +00:00
Bjoern A. Zeeb
eea3faf77b Rework reference counting in case we queue into the netisr,
or overflow the netisr queue and fall back to the interface
queue so that we can garuantee that the ifnet pointer stays
valid.   Formerly we ended up with reference counts <= 0 in
case the netisr had returned ENOBUFS.  The idea is to track
any packet in the netisr queue and only change the refount
on edge operations for the fallback interface queue. This
also avoids problems in case the if_snd.ifq_len lies to us.

Also rework refount assertions to make sure they trigger if
we go below 1. Formerly a negative refence count did not
trigger the assert as the refcount variable is u_int.

Sponsored by:	ISPsystem
MFC after:	5 days
2010-03-06 21:22:28 +00:00
Luigi Rizzo
cc4d3c30ea Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch.  This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.
2010-03-02 17:40:48 +00:00
Luigi Rizzo
7bc2288264 remove unnecessary casts leftover from a bogus fix to a previous bug 2010-03-02 16:24:16 +00:00
Alfred Perlstein
e722820434 Merge projects/enhanced_coredumps (r204346) into HEAD:
Enhanced process coredump routines.

  This brings in the following features:
  1) Limit number of cores per process via the %I coredump formatter.
  Example:
    if corefilename is set to %N.%I.core AND num_cores = 3, then
    if a process "rpd" cores, then the corefile will be named
    "rpd.0.core", however if it cores again, then the kernel will
    generate "rpd.1.core" until we hit the limit of "num_cores".

    this is useful to get several corefiles, but also prevent filling
    the machine with corefiles.

  2) Encode machine hostname in core dump name via %H.

  3) Compress coredumps, useful for embedded platforms with limited space.
    A sysctl kern.compress_user_cores is made available if turned on.

    To enable compressed coredumps, the following config options need to be set:
    options COMPRESS_USER_CORES
    device zlib   # brings in the zlib requirements.
    device gzio   # brings in the kernel vnode gzip output module.

  4) Eventhandlers are fired to indicate coredumps in progress.

  5) The imgact sv_coredump routine has grown a flag to pass in more
  state, currently this is used only for passing a flag down to compress
  the coredump or not.

  Note that the gzio facility can be used for generic output of gzip'd
  streams via vnodes.

Obtained from: Juniper Networks
Reviewed by: kan
2010-03-02 06:58:58 +00:00
Joel Dahl
7df6f59359 The NetBSD Foundation has granted permission to remove clause 3 and 4 from
their software.

Obtained from:	NetBSD
2010-03-01 17:05:46 +00:00
Robert Watson
60efbc9991 Whitespace tweak.
MFC after:	3 days
2010-03-01 00:43:05 +00:00
Robert Watson
938448cd87 Changes to support crashdump analysis of netisr:
- Rename the netisr protocol registration array, 'np' to 'netisr_proto',
  in order to reduce the chances of symbol name collisions.  It remains
  statically defined, but it will be looked up by netstat(1).

- Move certain internal structure definitions from netisr.c to
  netisr_internal.h so that netstat(1) can find them.  They remain
  private, and should not be used for any other purpose (for example,
  they should not be used by kernel modules, which must instead use the
  public interfaces in netisr.h).

- Store a kernel-compiled version of NETISR_MAXPROT in the global variable
  netisr_maxprot, and export via a sysctl, so that it is available for use
  by netstat(1).  This is especially important for crashdump
  interpretation, where the size of the workstream structure is determined
  by the maximum number of protocols compiled into the kernel.

MFC after:	1 week
Sponsored by:	Juniper Networks
2010-03-01 00:42:36 +00:00
Konstantin Belousov
22e62e7e6e In both if_tun and if_tap:
Do not do additional dev_ref() on the newly created interface in the
if_clone create method [1]. This reference is not needed and never
removed, causing struct cdevpriv leakage. Remove the setting of
SI_CHEAPCLONE flag as well, since it is unused.

For dev_clone handlers, create cdevs with the call make_dev_credf(MAKEDEV_REF)
instead of calling make_dev() and then dev_ref(), to avoid a race.

Call drain_dev_clone_events() at the module unload time after dev_clone
handler is deinstalled.

Submitted by:	Mikolaj Golub <to.my.trociny gmail com> [1]
MFC after:	1 week
2010-02-28 16:25:49 +00:00
Robert Watson
7f450feb07 Fix edge cases in several KASSERTs: use <= rather than < when testing that
counters have not gone about MAXCPU or NETISR_MAXPROT.  These problems
caused panics on UP kernels with INVARIANTS when using sysctl -a, but
would also have caused problems for 32-core boxes or if the netisr
protocol vector was fully populated.

Reported by:	nwhitehorn, Neel Natu <neelnatu@gmail.com>
MFC after:	4 days
2010-02-25 09:51:14 +00:00
Bjoern A. Zeeb
7405f23cd7 Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets'
in the db_show_all_table as 'show all ifnets' and with that follow the
convention for showing complete lists.

Submitted by:	thompsa
MFC after:	3 days
2010-02-24 15:54:24 +00:00
Robert Watson
c4fbf89fc5 Fix constant assignment for netisr protocol information sysctl.
MFC after:	1 week
Spotted by:	bz
2010-02-22 16:16:16 +00:00
Robert Watson
2d22f334ea Export netisr configuration and statistics to userspace via sysctl(9).
MFC after:	1 week
Sponsored by:	Juniper Networks
2010-02-22 15:03:16 +00:00
Robert Watson
5702371bd2 ifconfig(8) expects interface fooX to be supported by the module if_foo,
and will try to load it if it's not present.  To better meet these
expectations, change the module name for the loopback interface from
'loop' to 'if_lo'.  The loopback interface is always compiled into the
base kernel, so there are no resulting changes in kld files, etc.

Discussed with:	brooks (ages ago)
MFC after:	1 week
2010-02-21 15:25:47 +00:00
Pyun YongHyeon
8b2d91810b Add __FBSDID.
Reviewed by:	sam
2010-02-21 00:07:45 +00:00
Pyun YongHyeon
9b76d9cb3d Add TSO support on VLANs. Intentionally separated IFCAP_VLAN_HWTSO
from IFCAP_VLAN_HWTAGGING. I think some hardwares may be able to
TSO over VLAN without VLAN hardware tagging.
Driver changes and userland support will follow.

Reviewed by:	thompsa
2010-02-20 22:47:20 +00:00
Bjoern A. Zeeb
c9fdacdac8 Start to implement ifnet DDB support:
- 'show ifnets' prints a list of ifnet *s per virtual network stack,
- 'show ifnet <struct ifnet *>' prints fields matching the given ifp.

We do not yet print the complete set of fields and might want to
factor this out to an extra if_debug.c file in case this grows
a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1].

Sponsored by:	ISPsystem
Suggested by:	rwatson [1]
Reviewed by:	rwatson
MFC after:	5 days
2010-02-20 22:09:48 +00:00
Bjoern A. Zeeb
58606037c1 Enhance a panic string to contain more useful debugging information.
Sponsored by:	ISPsystem
Reviewed by:	rwatson
MFC after:	5 days
2010-02-20 21:43:36 +00:00
Jung-uk Kim
8df67d77ed Return partially filled buffer for non-blocking read(2)
in non-immediate mode.

PR:		kern/143855
2010-02-20 00:19:21 +00:00
Pawel Jakub Dawidek
784949026c Mark various sysctls also as tunables.
Reviewed by:	rwatson
MFC after:	1 week
2010-02-15 09:19:07 +00:00
Max Laier
193cbc4d24 Fix drbr and altq interaction:
- introduce drbr_needs_enqueue that returns whether the interface/br needs
   an enqueue operation: returns true if altq is enabled or there are
   already packets in the ring (as we need to maintain packet order)
 - update all drbr consumers
 - fix drbr_flush
 - avoid using the driver queue (IFQ_DRV_*) in the altq case as the
   multiqueue consumer does not provide enough protection, serialize altq
   interaction with the main queue lock
 - make drbr_dequeue_cond work with altq

Discussed with:		kmacy, yongari, jfv
MFC after:		4 weeks
2010-02-13 16:04:58 +00:00
Bjoern A. Zeeb
3e0490b3fe Add DDB support for printing vnet_sysinit and vnet_sysuninit
ordered call lists. Try to lookup function/symbol names and print
those in addition to the pointers, along with the constants for
subsystem and order.
This is useful for debugging vnet teardown ordering issues.

Make it possible to call the actual printing frunction from normal
code at runtime, ie. from vnet_sysuninit(), if DDB support is there.

Sponsored by:	ISPsystem
MFC After:	8 days
2010-02-09 22:39:34 +00:00
Bjoern A. Zeeb
61d033d436 Add an SDT provider for "vnet"s along with probes for vnet_alloc
and vnet_destroy.
Use the line number rather than NULL as dummy argument.

Note: the fbt provider does not reliably provide :return probes
(depending on optimization levels used at compile time) making
it unusable for scripts to generate complete call-traces with
well defined boundaries over allocations or destructions of
virtual network stacks.

Sponsored by:	ISPsystem
MFC After:	8 days
2010-02-09 22:15:59 +00:00
Ermal Luçi
644da90d9f Propagate the vlan eventis to the underlying interfaces/members so they can do initialization of hw related features.
PR:	kern/141646
Reviewed by:	thompsa
Approved by:	thompsa(co-mentor)
MFC after:	2 weeks
2010-02-06 13:49:35 +00:00
Marko Zec
0a705ab66f Instead of spamming the console on each curvnet recursion event, print
out each such call graph only once, along with a stack backtrace.  This
should make kernels built with VNET_DEBUG reasonably usable again in
busy / production environments.

Introduce a new DDB command "show vnetrcrs" which dumps the whole log
of distinctive curvnet recursion events.  This might be useful when
recursion reports get burried / lost too deep in the message buffer.
In the later case stack backtraces are not available.

Reviewed by:	bz
MFC after:	3 days
2010-02-04 07:55:42 +00:00
Hiroki Sato
c2a5f1a57a - Check if_type of "addm <interface>" before setting the
interface's MTU to the if_bridge(4) interface.  This fixes a
  bug that MTU value of "addm <interface>" is used even when it
  is invalid for the if_bridge(4) member:

  # ifconfig bridge0 create
  # ifconfig bridge0
  bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
  ...
  # ifconfig bridge0 addm lo0
  ifconfig: BRDGADD lo0: Invalid argument
  # ifconfig bridge0
  bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 16384
  ...

- Do not ignore MTU value of an interface even when if_type == IFT_GIF.
  This fixes MTU mismatch when an if_bridge(4) interface has a
  gif(4) interface and no other interface as the member, and it
  is directly used for L2 communication with EtherIP tunneling
  enabled.

- Implement SIOCSIFMTU ioctl.  Changing the MTU is allowed only
  when all members have the same MTU value.
2010-01-31 08:16:37 +00:00
Xin LI
215940b3fa Revised revision 199201 (add interface description capability as inspired
by OpenBSD), based on comments from many, including rwatson, jhb, brooks
and others.

Sponsored by:	iXsystems, Inc.
MFC after:	1 month
2010-01-27 00:30:07 +00:00
Shteryana Shopova
93ec7edca7 While flushing the multicast filter of an interface, do not zero the relevant
ifmultiaddr structures' reference to the parent interface, unless the parent
interface is really detaching. While here, program only link layer multicast
filters to a wlan's hardware parent interface.

PR:		kern/142391, kern/142392
Reviewed by:	sam, rpaolo, bms
MFC after:	1 week
2010-01-24 16:17:58 +00:00
Andrew Thompson
6117727b6c Do not hold the lock over if_setlladdr() as it calls into the interface driver
init routine.
2010-01-19 04:29:42 +00:00
Andrew Thompson
ea4ca115b7 Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR:		kern/142927
Submitted by:	Nikolay Denev
2010-01-18 20:34:00 +00:00
Bjoern A. Zeeb
3c20163a70 Correct a typo.
MFC after:	5 days
2010-01-10 12:03:53 +00:00
Edward Tomasz Napierala
22133b4449 Stop GCC from complaining about lagg_port_checkstacking() being unused. 2010-01-08 16:44:33 +00:00
Martin Blapp
c2ede4b379 Remove extraneous semicolons, no functional changes.
Submitted by:	Marc Balmer <marc@msys.ch>
MFC after:	1 week
2010-01-07 21:01:37 +00:00
Luigi Rizzo
0bcfa8e4b3 put ip_var before ip_fw_private.h as this will be needed in
the near future
2010-01-07 10:27:52 +00:00
Luigi Rizzo
7173b6e554 Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
  the firewall in the middle of a rulechain. On reentry, all tags
  containing reinject info are renamed to MTAG_IPFW_RULE so the
  processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
  everywhere. Conversion is done only once instead of tracking
  the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).
2010-01-04 19:01:22 +00:00
John Baldwin
fb92ad4af5 Use stricter checking to match possible vlan clones by not allowing extra
garbage characters around or within the tag.

Reviewed by:	brooks
MFC after:	3 days
2009-12-31 20:44:38 +00:00
Brooks Davis
a6fffd6cb0 The devices that supported EVFILT_NETDEV kqueue filters were removed in
r195175.  Remove all definitions, documentation, and usage.

fifo_misc.c:
	Remove all kqueue tests as fifo_io.c performs all those that
	would have remained.

Reviewed by:	rwatson
MFC after:	3 weeks
X-MFC note:	don't change vlan_link_state() function signature
2009-12-31 20:29:58 +00:00
Qing Li
9f1409057b Remove a deleted comment line that was brought back by
my previous commit.

MFC after:	5 days
2009-12-31 01:09:16 +00:00
Qing Li
c7ab66020f The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after:	5 days
2009-12-30 21:35:34 +00:00
John Baldwin
5428776e2c Change vlan interfaces to cope more usefully with the parent interface being
renamed.  Previously the vlan interfaces would lose their configuration as if
the parent interface had been physically removed.  Now vlan interfaces ignore
rename events.
- Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being
  renamed.  This flag can be checked in ifnet departure/arrival event
  handlers to treat rename events differently.
- Change the ifnet departure event handler in the if_vlan(4) driver to
  ignore departure events due to a trunk interface being renamed.

Reviewed by:	brooks, rwatson
MFC after:	1 week
2009-12-29 13:35:18 +00:00
Luigi Rizzo
830c6e2b97 bring in several cleanups tested in ipfw3-head branch, namely:
r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
  ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
  reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
  and netgraph;

r201049
- merge some common code to attach/detach hooks into
  a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
  and output processing uses almost exactly the same code so
  there is no need to use two separate hooks.
  ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
  between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
  localize variables so the compiler does not think they are uninitialized,
  do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
  this is what the radix code expects (this fixes a bug in the
  recently-introduced 'lookup' option)

No ABI changes in this commit.

MFC after:	1 week
2009-12-28 10:47:04 +00:00
Robert Watson
912f6323cd When warning about possible netisr configuration problems during boot,
report using "netisr_init" rather than "netisr2", which was the development
name for the project.

MFC after:	3 days
2009-12-23 12:33:59 +00:00