Commit Graph

3078 Commits

Author SHA1 Message Date
kib
cac2fe116f After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed.  Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by:	alc
MFC after:    2 weeks
2012-08-05 14:11:42 +00:00
glebius
abf245020a Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
  disestablish them.
  - This allows us to simplify the arptimer() and make it
    race safe.
o Consistently use ifp->if_afdata_lock to lock access to
  linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
  is attached to the hash.
  - Use LLE_LINKED to avoid double unlinking via consequent
    calls to llentry_free().
  - Mark lle with LLE_DELETED via |= operation istead of =,
    so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
  consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR:		kern/165863
Submitted by:	Andrey Zonov <andrey zonov.org>
Submitted by:	Ryan Stone <rysto32 gmail.com>
Submitted by:	Eric van Gyzen <eric_van_gyzen dell.com>
2012-08-02 13:57:49 +00:00
glebius
34fe3f296a The llentry_update() is used only by flowtable and the latter
always passes NULL pointer to it. Thus, code can be simplified
and function renamed to llentry_alloc() to match rtalloc().
2012-08-02 13:20:44 +00:00
glebius
588de42f27 Some more whitespace cleanup. 2012-08-01 09:00:26 +00:00
glebius
53cb168f80 Some style(9) and whitespace changes.
Together with:	Andrey Zonov <andrey zonov.org>
2012-07-31 11:31:12 +00:00
bz
245ecd0552 Hardcode the loopback rx/tx checkum options for IPv6 to on without
checking. This allows the FreeBSD 9.1 release process to move forward.
Work around the problem that loopback connections to local addresses
not on loopback interfaces and not on interfaces w/ IPv6 checksum offloading
enabled would not work.
A proper fix to allow us to disable the "checksum offload" on loopback
for testing, measurements, ... as we allow for IPv4 needs to put in
place later.

Reported by:	tuexen, Matthew Seaman (m.seaman infracaninophile.co.uk)
Reported by:	Mike Andrews (mandrews bit0.com), kib, ...
PR:		kern/170070
MFC after:	1 day
X-MFC after:	re approval
2012-07-28 20:31:39 +00:00
melifaro
497943ec39 Permit changing MTU in 6to4 relay.
This behavior is recommended by RFC 4213 clause 3.2.

Sometimes fragmentation is the least evil.
For example, some Linux IPVS kernels forwards
ICMPv6 checksums to real servers incorrectly.

Reviewed by:      hrs(previous version)
Approved by:      kib(mentor)
MFC after:        1 week
2012-07-15 17:44:27 +00:00
emaste
b19509b9f3 Simplify error case
Submitted by:	thompsa@
2012-07-10 20:59:35 +00:00
emaste
83fbc9d844 Plug potential mbuf leak when bridging fragments
If an error occurs when transmitting one mbuf in a chain of fragments,
free the subsequent fragments instead of leaking them.

Sponsored by:   ADARA Networks
2012-07-10 13:17:32 +00:00
trociny
f5f1e19755 In epair_clone_destroy(), when destroying the second half, we have to
switch to its vnet before calling ether_ifdetach(). Otherwise if the
second half resides in a different vnet, if_detach() silently fails
leaving a stale pointer in V_ifnet list, and the system crashes trying
to access this pointer later.

Another solution could be not to allow to destroy epair unless both
ends are in the home vnet.

Discussed with:	bz
Tested by:	delphij
2012-07-09 20:38:18 +00:00
emaste
e754993f68 Restore error handling lost in r191603
This was missed in the change from IFQ_ENQUEUE to if_transmit.

Sponsored by:   ADARA Networks
2012-07-09 14:16:49 +00:00
emaste
e91d8ed669 Implement SIOCGIFMEDIA for if_tap(4)
Appease certain if_tap(4) consumers by providing simulated Ethernet
media status.

DragonFly commit 70d9a675bf5441cc854a843ead702d08928c37f3

Obtained from:  DragonFly BSD
2012-07-06 23:17:30 +00:00
glebius
418a04b467 When ip_output()/ip6_output() is supplied a struct route *ro argument,
it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning
here: it may be supplied to provide route, and it may be supplied to
store and return to caller the route that ip_output()/ip6_output()
finds. In the latter case skipping FLOWTABLE lookup is pessimisation.

The difference between struct route filled by FLOWTABLE and filled
by rtalloc() family is that the former doesn't hold a reference on
its rtentry. Reference is hold by flow entry, and it is about to
be released in future. Thus, route filled by FLOWTABLE shouldn't
be passed to RTFREE() macro.

- Introduce new flag for struct route/route_in6, that marks route
  not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route
  depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL
  but empty route should use RO_RTFREE() to free results of
  lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when
  ro->ro_rt == NULL.

Tested by:	tuexen (SCTP part)
2012-07-04 07:37:53 +00:00
thompsa
c61cad51ad Add the same check as vlan(4) where we ignore the ifnet departure event if the
interface is just being renamed.

PR:		kern/169557
Submitted by:	Mark Johnston
MFC after:	3 days
2012-06-30 19:09:02 +00:00
jhb
722e057169 Hold GIF_LOCK() for almost all of gif_start(). It is required to be held
across in_gif_output() and in6_gif_output() anyway, and once it is held
across those it might as well be held for the entire loop.  This simplifies
the code and removes the need for the custom IFF_GIF_WANTED flag (which
belonged in the softc and not as an IFF_* flag anyway).

Tested by:	Vincent Hoffman  vince  unsane co uk
2012-06-29 15:21:34 +00:00
np
67d5f1a727 - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
rrs
7aa509cdff Fix comment to better reflect how we are
cheating and using the csum_data. Also fix
style issues with the comments.
2012-06-12 13:31:32 +00:00
rrs
13591e0a79 Note to self. Have morning coffee *before* committing things.
There is no mac_addr in the mbuf for BSD.. cheat like
we are supposed to and use the csum field since our friend
the gif tunnel itself will never use offload.
2012-06-12 12:44:17 +00:00
rrs
8fa5fc067b Opps forgot to commit the flag. 2012-06-12 12:40:15 +00:00
rrs
813f2809c5 Allow a gif tunnel to be used with ALTq.
Reviewed by:	gnn
2012-06-12 10:44:09 +00:00
thompsa
8c0678ab02 Fix a panic I introduced in r234487, the bridge softc pointer is set to null
early in the detach so rearrange things not to explode.

Reported by:	David Roffiaen, Gustau Perez Querol
Tested by:	David Roffiaen
MFC after:	3 days
2012-06-11 20:12:13 +00:00
melifaro
ca3b31c289 Fix typo introduced in r236559.
Pointed by:   bcr
Approved by:  kib(mentor)
2012-06-09 10:04:40 +00:00
trociny
e463b28f1f Sort includes.
Submitted by:	Daan Vreeken <pa4dan Bliksem.VEHosting.nl>
MFC after:	3 days
2012-06-07 19:48:45 +00:00
trociny
2d97bce56e Add VIMAGE support to if_tap.
PR:		kern/152047, kern/158686
Submitted by:	Daan Vreeken <pa4dan Bliksem.VEHosting.nl>
MFC after:	1 week
2012-06-07 19:46:46 +00:00
melifaro
511c0436a1 Fix panic introduced by r235745. Panic occurs after first packet traverse renamed interface.
Add several comments on locking

Found by:         avg
Approved by:      ae(mentor)
Tested by:        avg
MFC after:        1 week
2012-06-04 12:36:58 +00:00
tuexen
dc64091687 Seperate SCTP checksum offloading for IPv4 and IPv6.
While there: remove some trainling whitespaces.

MFC after: 3 days
X-MFC with: 236170
2012-05-30 20:56:07 +00:00
jkim
f2bf1d383e Fix style(9) nits, reduce unnecessary type castings, etc., for bpf_setf(). 2012-05-29 22:28:46 +00:00
jkim
9a4c9bf04c - Save the previous filter right before we set new one.
- Reduce duplicate code and make it little easier to read.

MFC after:	2 weeks
2012-05-29 22:21:53 +00:00
jkim
2287ab7bca Fix 32-bit shim for BIOCSETF to drop all packets buffered on the descriptor
and reset statistics as it should.

MFC after:	3 days
2012-05-29 18:44:53 +00:00
melifaro
ce59e84fbe Fix BPF_JITTER code broken by r235746.
Pointed by:       jkim
Reviewed by:      jkim (except locking changes)
Approved by:      (mentor)
MFC after:        2 weeks
2012-05-29 12:52:30 +00:00
rea
cda9e92421 if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row
Currently, 'ifconfig laggX down' does not remove members from this
lagg(4) interface.  So, 'service netif stop laggX' followed by
'service netif start laggX' will choke, because "stop" will leave
interfaces attached to the laggX and ifconfig from the "start" will
refuse to add already-existing interfaces.

The real-world case is when I am bundling together my Ethernet and
WiFi interfaces and using multiple profiles for accessing network in
different places: system being booted up with one profile, but later
this profile being exchanged to another one, followed by 'service
netif restart' will not add WiFi interface back to the lagg: the
"stop" action from 'service netif restart' will shut down my main WiFi
interface, so wlan0 that exists in the lagg0 will be destroyed and
purged from lagg0; the "start" action will try to re-add both
interfaces, but since Ethernet one is already in lagg0, ifconfig will
refuse to add the wlan0 from WiFi interface.

Since adding the interface to the lagg(4) when it is already here
should be an idempotent action: we're really not changing anything,
so this fix doesn't change the semantics of interface addition.

Approved by: thompsa
Reviewed by: emaste
MFC after: 1 week
2012-05-28 12:13:04 +00:00
bz
ac429c7044 It turns out that too many drivers are not only parsing the L2/3/4
headers for TSO but also for generic checksum offloading.  Ideally we
would only have one common function shared amongst all drivers, and
perhaps when updating them for IPv6 we should introduce that.
Eventually we should provide the meta information along with mbufs to
avoid (re-)parsing entirely.

To not break IPv6 (checksums and offload) and to be able to MFC the
changes without risking to hurt 3rd party drivers, duplicate the v4
framework, as other OSes have done as well.

Introduce interface capability flags for TX/RX checksum offload with
IPv6, to allow independent toggling (where possible).  Add CSUM_*_IPV6
flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6
fragmentation.  Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and
add an alias for CSUM_DATA_VALID_IPV6.

This pretty much brings IPv6 handling in line with IPv4.
TSO is still handled in a different way and not via if_hwassist.

Update ifconfig to allow (un)setting of the new capability flags.
Update loopback to announce the new capabilities and if_hwassist flags.

Individual driver updates will have to follow, as will SCTP.

Reported by:	gallatin, dim, ..
Reviewed by:	gallatin (glanced at?)
MFC after:	3 days
X-MFC with:	r235961,235959,235958
2012-05-28 09:30:13 +00:00
thompsa
d7eba5b13d Turn LACP debugging from a compile time option to a sysctl, it is very handy to
be able to turn it on when negotiation to a switch misbehaves.

Submitted by:	Andrew Boyer
MFC after:	3 days
2012-05-26 08:09:01 +00:00
bz
bc95289fae MFp4 bz_ipv6_fast:
Simple yet effective change enabling checksum "offload" on loopback
  for IPv6 to avoid expensive computations.

  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	iXsystems

Reviewed by:	gnn (as part of the whole)
MFC After:	3 days
2012-05-25 02:21:17 +00:00
melifaro
e145f3abac Make most BPF ioctls() SMP-safe.
Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:21:00 +00:00
melifaro
e5f61c6580 Call bpf_jitter() before acquiring BPF global lock due to malloc() being used inside bpf_jitter.
Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl.
This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock.

Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:19:19 +00:00
melifaro
34ec5c8650 Fix old panic when BPF consumer attaches to destroying interface.
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'

Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.

Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)

Approved by:      kib(mentor)
MFC in:            4 weeks
2012-05-21 22:17:29 +00:00
melifaro
1c776bfa6e Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@)
Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@)
Protect most of bpf_setf() by BPF global lock

Add several forgotten assertions (thanks to adrian@)

Document current locking model inside bpf.c
Document EVENTHANDLER(9) usage inside BPF.

Approved by:       kib(mentor)
Tested by:         gnn
MFC in:            4 weeks
2012-05-21 22:13:48 +00:00
marcel
434c53cbc3 Use the LLINDEX macro to access the link-level I/F index. This makes
it possible to work with a different type for the sdl_index field --
it only requires a recompile.

Obtained from:	Juniper Networks, Inc.
2012-05-19 02:39:43 +00:00
delphij
a17ebbd192 Sync DLTs with the latest pcap version.
MFC after:	2 weeks
2012-05-14 05:10:41 +00:00
melifaro
46b1e41aff Revert r234834 per luigi@ request.
Cleaner solution (e.g. adding another header) should be done here.

Original log:
  Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
  Remove ipfw/ip_fw_private.h header from non-ipfw code.

Requested by:      luigi
Approved by:       kib(mentor)
2012-05-03 08:56:43 +00:00
emaste
62cde8b2a2 Relax restriction on direct tx to child ports
Lagg(4) restricts the type of packet that may be sent directly to a child
port, to avoid undesired output from accidental misconfiguration.
Previously only ETHERTYPE_PAE was permitted.

BPF writes to a lagg(4) child port are presumably intentional, so just
allow them, while still blocking other packets that should take the
aggregation path.

PR:		kern/138620
Approved by:	thompsa@
2012-05-03 01:41:12 +00:00
melifaro
b600972ec6 Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
Remove ipfw/ip_fw_private.h header from non-ipfw code.

Approved by:        ae(mentor)
MFC after:          2 weeks
2012-04-30 10:22:23 +00:00
melifaro
30081abc9a Do not require radix write lock to be held while dumping route table
via sysctl(4) interface. This permits router not to stop forwarding
packets while route table is being written to user-supplied buffer.

Reported by:        Pawel Tyll <ptyll@nitronet.pl>
Approved by:        kib(mentor)

MFC after:          1 week
2012-04-22 16:13:23 +00:00
thompsa
b85b9300bc Move the interface media check to a taskqueue, some interfaces (usb) sleep
during SIOCGIFMEDIA and we were holding locks.
2012-04-20 10:06:28 +00:00
thompsa
5ffa03204c Add linkstate to bridge(4), set the link to up when at least one underlying
interface is up, otherwise the link is down.

This, among other things, allows carp to work on a bridge.

Prodded by:	glebius
Tested by:	Alexander Lunev
2012-04-20 09:55:50 +00:00
thompsa
8379680eec Remove KASSERTS, they do not add any value here since the pointer is about to
be derefernced anyway.
2012-04-18 01:39:14 +00:00
luigi
39cda1ca43 A bit of cleanup in the names of fields of netmap-related structures.
Use the name 'ring' instead of 'queue' in all fields.
Bump NETMAP_API.
2012-04-13 16:03:07 +00:00
luigi
aed1b833cf remove an unnecessary #define 2012-04-12 10:32:34 +00:00
thompsa
8d206d7279 Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets
are discarded, this is an issue because lacp drops the lock which may allow
network threads to access freed memory. Expand the lock coverage so the
detach/attach happen atomically.

Submitted by:	Andrew Boyer (earlier version)
2012-04-12 01:07:17 +00:00
jhb
4e983169d1 Add media types for 40G media that might be used with FreeBSD.
Reviewed by:	bz
MFC after:	2 weeks
2012-04-10 13:59:35 +00:00
melifaro
ff7ed324aa Fix build broken by r233938.
Pointed by:     David Wolfskill <david@catwhisker.org>
Approved by:    kib (mentor)
Pointy hat to:  melifaro
2012-04-06 13:34:19 +00:00
melifaro
85ccef88d3 - Improve performace for writer-only BPF users.
Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send
raw ethernet frames. The only FreeBSD interface that can be used to send raw frames
is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses
BPF only to send data. This leads us to the situation when software like cdpd,
being run on high-traffic-volume interface significantly reduces overall performance
since we have to acquire additional locks for every packet.

Here we add sysctl that changes BPF behavior in the following way:
If program came and opens BPF socket without explicitly specifyin read filter we
assume it to be write-only and add it to special writer-only per-interface list.
This makes bpf_peers_present() return 0, so no additional overhead is introduced.
After filter is supplied, descriptor is added to original per-interface list permitting
packets to be captured.

Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of
setting snap length.

Fortunately, most programs explicitly sets (event catch-all) filter after that.
tcpdump(1) is a good example.

So a bit hackis approach is taken: we upgrade description only after second
BIOCSETF is received.

Sysctl is named net.bpf.optimize_writers and is turned off by default.

- While here, document all sysctl variables in bpf.4

Sponsored by Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      4 weeks
2012-04-06 06:55:21 +00:00
melifaro
8b1d10268c - Improve BPF locking model.
Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greately improves performance: in most common case we need to acquire 1
reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is is temporary solution,
struct bpf_if should be made opaque for any external caller.

Found by:       Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by:   Yandex LLC

Reviewed by:    glebius (previous version)
Reviewed by:    silence on -net@
Approved by:    (mentor)

MFC after:      3 weeks
2012-04-06 06:53:58 +00:00
jhb
e86f714c83 Retire the IF_ADDR_LOCK() and IF_ADDR_UNLOCK() compat macros from HEAD.
The new [RW]LOCK macros are merged back to 8.x so should be suitable for
new code in HEAD even if it is to be MFC'd.
2012-03-19 21:09:12 +00:00
bz
a45ce69b30 Hide kernel option ROUTETABLES evaluations in the implementation
rather than the header file.  With this also move RT_MAXFIBS and
RT_NUMFIBS into the implemantion to avoid further usage in other
code. rt_numfibs is all that should be needed.

This allows users to change the number of FIBs from 1..RT_MAXFIBS(16)
dynamically using the tunable without the need to change the kernel
config for the maximum anymore.  This means that thet multi-FIB
feature is now fully available with GENERIC kernels.
The kernel option ROUTETABLES can still be used to set the default
numbers of FIBs in absence of the tunable.

Ok.ed by:	julian, hrs, melifaro
MFC after:	2 weeks
2012-03-18 11:23:40 +00:00
luigi
29d8b56ae7 - remove an extra parenthesis in a closing brace;
- add the macro NETMAP_RING_FIRST_RESERVED() which returns
  the index of the first non-released buffer in the ring
  (this is useful for code that retains buffers for some time
  instead of processing them immediately)
2012-03-11 17:35:12 +00:00
thompsa
1ead29e715 Move the vlan buffer space into the union which also fixes an unused variable
warning with !INET & !INET6.

Spotted by:	pluknet
2012-03-07 07:22:53 +00:00
thompsa
b1fbb40a93 Add the ability to set which packet layers are used for the load balance hash
calculation.
2012-03-06 22:58:13 +00:00
zec
d3c673348a Properly restore curvnet context when returning early from
ether_input_internal().

This change only affects options VIMAGE kernel builds.

PR:		kern/165643
Submitted by:	Vijay Singh
MFC after:	3 days
2012-03-04 11:11:03 +00:00
jmallett
50c253779f o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
using the o32 ABI.  This mostly follows nwhitehorn's lead in implementing
   COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
   32-bit ABI being used.  Since the MIPS port is relatively-new, even the 32-bit
   ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
   with 32-bit compatibility, then, disable some code which assumes otherwise
   wrongly when built for MIPS.  A more general macro to check in this case would
   seem like a good idea eventually.  If someone adds support for using n32
   userland with n64 kernels on MIPS, then they will have to add a variety of
   flags related to each piece of the ABI that can vary.  That's probably the
   right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
   freebsd32 compat code.  Probably this should be generalized at some point.

Reviewed by:	gonzo
2012-03-03 08:19:18 +00:00
thompsa
395d5a3c1d Use a more appropriate default for the maximum number of addresses in the
bridge forwarding table.

PR:		docs/164564
Discussed with:	brueffer
2012-02-29 20:58:21 +00:00
luigi
3ac0fcfb97 A bunch of netmap fixes:
USERSPACE:
1. add support for devices with different number of rx and tx queues;

2. add better support for zero-copy operation, adding an extra field
   to the netmap ring to indicate how many buffers we have already processed
   but not yet released (with help from Eddie Kohler);

3. The two changes above unfortunately require an API change, so while
   at it add a version field and some spares to the ioctl() argument
   to help detect mismatches.

4. update the manual page for the two changes above;

5. update sample applications in tools/tools/netmap

KERNEL:

1. simplify the internal structures moving the global wait queues
   to the 'struct netmap_adapter';

2. simplify the functions that map kring<->nic ring indexes

3. normalize device-specific code, helps mainteinance;

4. start exploring the impact of micro-optimizations (prefetch etc.)
   in the ixgbe driver.
   Use 'legacy' descriptors on the tx ring and prefetch slots gives
   about 20% speedup at 900 MHz. Another 7-10% would come from removing
   the explict calls to bus_dmamap* in the core (they are effectively
   NOPs in this case, but it takes expensive load of the per-buffer
   dma maps to figure out that they are all NULL.

   Rx performance not investigated.

I am postponing the MFC so i can import a few more improvements
before merging.
2012-02-27 19:05:01 +00:00
thompsa
064ddf5722 Only look for a usable MAC address for the bridge ID from ports within our
bridge, this allows us to have more than one independent bridge in the same
STP domain.

PR:		kern/164369
Submitted by:	Nikos Vassiliadis (earlier version)
MFC after:	2 weeks
2012-02-24 17:50:36 +00:00
thompsa
f16a14fbae Add a sysctl/tunable default value for the use_flowid sysctl in r232008. 2012-02-23 21:56:53 +00:00
thompsa
dc536a5db9 Indicate this function decrements the timer as well as testing for expiry. 2012-02-23 20:58:52 +00:00
kmacy
a99e9d281d When using flowtable llentrys can outlive the interface with which they're associated
at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.

Move the free pointer in to the llentry itself and update the initalization sites.

MFC after:	2 weeks
2012-02-23 18:21:37 +00:00
thompsa
80037c4e25 Now that network interfaces advertise if they support linkstate notifications
we do not need to perform a media ioctl every 15 seconds.
2012-02-23 06:26:16 +00:00
thompsa
51c7bc89f0 bstp_input() always consumes the packet so remove the mbuf handling dance
around it.

Obtained from:	OpenBSD (r1.37)
2012-02-23 00:59:21 +00:00
thompsa
c8215b632a Using the flowid in the mbuf assumes the network card is giving a good hash for
the traffic flow, this may not be the case giving poor traffic distribution.
Add a sysctl which allows us to fall back to our own flow hash code.

PR:		kern/164901
Submitted by:	Eugene Grosbein
MFC after:	1 week
2012-02-22 22:01:30 +00:00
bz
dcdb23291f Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by:	Cisco Systems, Inc.
Reviewed by:	melifaro (basically)
MFC after:	10 days
2012-02-17 02:39:58 +00:00
tijl
a89581c07f Change some headers such that lang/gcc* ports no longer patch them.
The lang/gcc* ports patch headers where they think something is
non-standard. These patched headers override the system headers which means
you have to rebuild these ports whenever you do installworld to make sure
they contain the latest changes.
2012-02-14 12:50:20 +00:00
bz
d05091db1d Introduce a new NET_RT_IFLISTL API to query the address list. It works
on extended and extensible structs if_msghdrl and ifa_msghdrl.  This
will allow us to extend both the msghdrl structs and eventually if_data
in the future without breaking the ABI.

Bump __FreeBSD_version to allow ports to more easily detect the new API.

Reviewed by:	glebius, brooks
MFC after:	3 days
2012-02-11 06:02:16 +00:00
bz
f55d6eed8c Backout changes from r228571. Remove if_data from struct ifa_msghdr again.
While this breaks carp on HEAD temporary, it restores the upgrade path from
stable, and head before 20111215.

Reviewed by:	glebius, brooks
2012-02-11 05:59:54 +00:00
pluknet
322f330b92 g/c last bit of old ipv6 prefix management.
Reviewed by:	bz
Obtained from:	NetBSD, net/if.h, rev 1.80
2012-02-08 22:05:26 +00:00
luigi
0290a0c2f7 - change the buffer size from a constant to a
TUNABLE variable (hw.netmap.buf_size) so we can experiment
  with values different from 2048 which may give better cache performance.

- rearrange the memory allocation code so it will be easier
  to replace it with a different implementation. The current code
  relies on a single large contiguous chunk of memory obtained through
  contigmalloc.
  The new implementation (not committed yet) uses multiple
  smaller chunks which are easier to fit in a fragmented address
  space.
2012-02-08 11:43:29 +00:00
pjd
554ffe3a63 Allow to set if_bridge(4) sysctls from /boot/loader.conf.
MFC after:	3 days
2012-02-07 14:50:33 +00:00
glebius
bedbfc31c3 Fix typo in r231010.
Submitted by:	linimon
2012-02-05 12:52:28 +00:00
glebius
0bf7c1fae8 Better comment for ifa_init(), ifa_ref(), ifa_free(). 2012-02-05 08:53:05 +00:00
glebius
8be39a40a2 In ifa_init() initialize if_data.ifi_datalen. This would be
required after upcoming changes from bz@.

Discussed with:	bz
2012-02-05 08:31:15 +00:00
kmacy
8466f3ca88 A flowtable entry can continue referencing an llentry indefinitely if the entry is repeatedly
referenced within its timeout window. This change clears the LLE_VALID flag when an llentry
is removed from an interface's hash table and adds an extra check to the flowtable code
for the LLE_VALID flag in llentry to avoid retaining and using a stale reference.

Reviewed by:	qingli@
MFC after:	2 weeks
2012-01-26 20:02:40 +00:00
bz
417a8b2daa Replace random ARIN direct assignment legacy IPs with proper RFC 5735
TEST-NET1 block for use in documentation and example code addresses.

MFC after:	3 days
2012-01-24 15:20:31 +00:00
eadler
8cde6e1e87 - Fix trivial typo
Approved by:	nwhitehorn
MFC after:	3 days
2012-01-14 17:07:52 +00:00
rwatson
ea95a90ac2 Clarify throughout the vlan(4) code the difference between a "tag" (the
802.1q-defined 16-bit VID, CFI, and PCP field in host by order) and a
VLAN ID (VID).  Tags go in packets.  VIDs identify VLANs.

No functional change is intended, so this should be safe to MFC.  Further
cleanup with functional changes will be committed separately (for example,
renaming vlan_tag/vlan_tag_p, which modify the KPI and KBI).

Reviewed by:	bz
Sponsored by:	ADARA Networks, Inc.
MFC after:	3 days
2012-01-12 18:39:37 +00:00
lstewart
15b1ff3609 Consumers of bpfdetach() expect it to remove all bpf_if structs from the
bpf_iflist list which reference the specified ifnet. The existing implementation
only removes the first matching bpf_if found in the list, effectively leaking
list entries if an ifnet has been bpfattach()ed multiple times with different
DLTs.

Fix the leak by performing the detach logic in a loop, stopping when all bpf_if
structs referencing the specified ifnet have been detached and removed from the
bpf_iflist list.

Whilst here, also:

- Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never
  exist in the list with a NULL ifnet pointer.

- Except when INVARIANTS is in the kernel config, silently ignore the case where
  no bpf_if referencing the specified ifnet is found, as it is harmless and does
  not require user attention.

Reviewed by:	csjp
MFC after:	1 week
2012-01-10 00:48:29 +00:00
jhb
0577b44f73 Convert the per-interface address list lock from a mutex to a reader/writer
lock.

Reviewed by:	bz
2012-01-09 19:34:12 +00:00
glebius
06fc76befd Copy ifa->if_data to ifam->ifam_data. This was forgotten in r228571.
Submitted by:	bz
2012-01-08 17:11:53 +00:00
glebius
f99edf0f86 Move arprequest() declaration to if_ether.h. 2012-01-08 13:34:00 +00:00
glebius
3c3da145a2 Since r228571 CARP is no longer an interface. 2012-01-06 12:05:43 +00:00
jhb
4ef366671a Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by:	bz
MFC after:	2 weeks
2012-01-05 19:00:36 +00:00
jhb
219e62f17e Add new variants of the IF_ADDR_*LOCK*() macros used for protecting
interface address lists that distinguish read locks from write locks.
To preserve the KPI, the previous operations are mapped to the write
lock macros.  The lock is still kept as a mutex for now.

Reviewed by:	bz
MFC after:	2 weeks
2012-01-05 18:35:49 +00:00
rwatson
e3196d8882 Refine last comment.
Submitted by:	joeld
Sponsored by:	ADARA Networks, Inc.
MFC after:	3 days
2012-01-05 11:42:34 +00:00
rwatson
628c91bb51 Add comment to the VLAN code about its integration with VIMAGE: we see what
the code is doing, we recognise the legitimacy of its goal, but we're not
quite sure it's going about it the right way.  More pondering is clearly
required.

Sponsored by:	ADARA Networks, Inc.
Discussed with:	bz
MFC after:	3 days
2012-01-05 11:24:22 +00:00
lstewart
8a799f2a2f Revert r228986 until it can be reworked to avoid panicing the kernel when the
same interface is attached multiple times with different DLTs, as is done in
net80211 for example.

Reported by:	adrian
2011-12-31 07:21:28 +00:00
lstewart
1b1510811a - Introduce the net.bpf.tscfg sysctl tree and associated code so as to make one
aspect of time stamp configuration per interface rather than per BPF
  descriptor. Prior to this, the order in which BPF devices were opened and the
  per descriptor time stamp configuration settings could cause non-deterministic
  and unintended behaviour with respect to time stamping. With the new scheme, a
  BPF attached interface's tscfg sysctl entry can be set to "default", "none",
  "fast", "normal" or "external". Setting "default" means use the system default
  option (set with the net.bpf.tscfg.default sysctl), "none" means do not
  generate time stamps for tapped packets, "fast" means generate time stamps for
  tapped packets using a hz granularity system clock read, "normal" means
  generate time stamps for tapped packets using a full timecounter granularity
  system clock read and "external" (currently unimplemented) means use the time
  stamp provided with the packet from an underlying source.

- Utilise the recently introduced sysclock_getsnapshot() and
  sysclock_snap2bintime() KPIs to ensure the system clock is only read once per
  packet, regardless of the number of BPF descriptors and time stamp formats
  requested. Use the per BPF attached interface time stamp configuration to
  control if sysclock_getsnapshot() is called and whether the system clock read
  is fast or normal. The per BPF descriptor time stamp configuration is then
  used to control how the system clock snapshot is converted to a bintime by
  sysclock_snap2bintime().

- Remove all FAST related BPF descriptor flag variants. Performing a "fast"
  read of the system clock is now controlled per BPF attached interface using
  the net.bpf.tscfg sysctl tree.

- Update the bpf.4 man page.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

In collaboration with:	Julien Ridoux (jridoux at unimelb edu au)
2011-12-30 08:57:58 +00:00
yongari
69a9d9792b Update if_obytes and if_omcast after successful transmit.
While I'm here update if_oerrors if parent interface of vlan is not
up and running.  Previously it updated collision counter and it was
confusing to interprete it.

PR:		kern/163478
Reviewed by:	glebius, jhb
Tested by:	Joe Holden < lists <> rewt dot org dot uk >
2011-12-29 18:40:58 +00:00
glebius
653f8c5e71 Provide ABI compatibility shim to enable configuring of addresses
with ifconfig(8) prior to r228571.

Requested by:	brooks
2011-12-21 12:39:08 +00:00
glebius
8c74bad9f3 Restore a feature that was present in 5.x and 6.x, and was cleared in
7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption, while it is running its bulk update.

However, reimplement the feature in more elegant manner, that is
partially inspired by newer OpenBSD:

- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several
  conditions, for now these are:
  - interface goes down
  - carp(4) has problems with ip_output() or ip6_output()
  - pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but
  is actual value added to advskew. The adjustment values for
  particular error conditions are also configurable, and their
  defaults are maximum advskew value, so a single failure bumps
  demotion to maximum. This is for POLA compatibility, and should
  satisfy most users.
- Demotion factor is a writable sysctl, so user can do
  foot shooting, if he desires to.
2011-12-20 13:53:31 +00:00
glebius
27a36f6ac8 A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR:		kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by:	bz
Submitted by:	bz [1]
2011-12-16 12:16:56 +00:00
glebius
c43116e67e Simplify rtrequest(RTM_ADD): ifa can't be NULL after rt_getifa_fib(). 2011-12-15 12:49:10 +00:00
brooks
9029bb4f3b Remove the unused if_free_type() function.
X-MFC after:	never
2011-12-09 23:26:28 +00:00
luigi
298ffde665 1. Fix the handling of link reset while in netmap more.
A link reset now is completely transparent for the netmap client:
   even if the NIC resets its own ring (e.g. restarting from 0),
   the client will not see any change in the current rx/tx positions,
   because the driver will keep track of the offset between the two.

2. make the device-specific code more uniform across different drivers
   There were some inconsistencies in the implementation of the netmap
   support routines, now drivers have been aligned to a common
   code structure.

3. import netmap support for ixgbe . This is implemented as a very
   small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
   in total the patch has only 54 lines of new code) , as most of
   the code is in an external file sys/dev/netmap/ixgbe_netmap.h ,
   following some initial comments from Jack Vogel about making
   changes less intrusive.
   (Note, i have emailed Jack multiple times asking if he had
   comments on this structure of the code; i got no reply so
   i assume he is fine with it).

Support for other drivers (em, lem, re, igb) will come later.

"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.

Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 Mhz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and 82599 card. Haven't tried yet more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.
2011-12-05 12:06:53 +00:00
lstewart
9ff63371a4 Revert r227778 in preparation for committing reworked patches in its place. 2011-11-29 12:55:26 +00:00
jhb
27cc3c5372 Change the if_vlan driver to use if_transmit for forwarding packets to the
parent interface.  This avoids the overhead of queueing a packet to an IFQ
only to immediately dequeue it again.

Suggested by:	np
Reviewed by:	brooks
MFC after:	1 month
2011-11-28 19:35:08 +00:00
glebius
5393e1dd70 - Use generic alloc_unr(9) allocator for if_clone, instead
of hand-made.
- When registering new cloner, check whether a cloner with
  same name already exist.
- When allocating unit, also check with help of ifunit()
  whether such interface already exist or not. [1]

PR:		kern/162789 [1]
2011-11-28 14:44:59 +00:00
glebius
ec7618f8d2 Improve logging:
- don't hardcode function name
- use LOG_DEBUG for such a debug message
- print error value
2011-11-22 19:42:17 +00:00
lstewart
1ce25155b2 - When feed-forward clock support is compiled in, change the BPF header to
contain both a regular timestamp obtained from the system clock and the
  current feed-forward ffcounter value. This enables new possibilities including
  comparison of timekeeping performance and timestamp correction during post
  processing.

- Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping
  packets using the feedback or feed-forward system clock.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by:	Julien Ridoux (jridoux at unimelb edu au)
2011-11-21 04:17:24 +00:00
luigi
b97eb69f80 Bring in support for netmap, a framework for very efficient packet
I/O from userspace, capable of line rate at 10G, see

	http://info.iet.unipi.it/~luigi/netmap/

At this time I am bringing in only the generic code (sys/dev/netmap/
plus two headers under sys/net/), and some sample applications in
tools/tools/netmap. There is also a manpage in share/man/man4 [1]

In order to make use of the framework you need to build a kernel
with "device netmap", and patch individual drivers with the code
that you can find in

	sys/dev/netmap/head.diff

The file will go away as the relevant pieces are committed to
the various device drivers, which should happen in a few days
after talking to the driver maintainers.

Netmap support is available at the moment for Intel 10G and 1G
cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re").
I have partial patches for "bge" and am starting to work on "cxgbe".
Hopefully changes are trivial enough so interested third parties
can submit their patches. Interested people can contact me
for advice on how to add netmap support to specific devices.

CREDITS:
    Netmap has been developed by Luigi Rizzo and other collaborators
    at the Universita` di Pisa, and supported by EU project CHANGE
    (http://www.change-project.eu/)
    The code is distributed under a BSD Copyright.

[1] In my opinion is a bad idea to have all manpage in one directory.
  We should place kernel documentation in the same dir that contains
  the code, which would make it much simpler to keep doc and code
  in sync, reduce the clutter in share/man/ and incidentally is
  the policy used for all of userspace code.
  Makefiles and doc tools can be trivially adjusted to find the
  manpages in the relevant subdirs.
2011-11-17 12:17:39 +00:00
rmh
ff5c11fefd Remove a few bits of FreeBSD 2.x compatibility code.
Approved by:	kib (mentor)
2011-11-14 18:21:27 +00:00
brooks
e4a4d6436f In r191367 the need for if_free_type() was removed and a new member
if_alloctype was used to store the origional interface type.  Take
advantage of this change by removing all existing uses of if_free_type()
in favor of if_free().

MFC after:	1 Month
2011-11-11 22:57:52 +00:00
ed
0c56cf839d Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
2011-11-07 15:43:11 +00:00
ed
e97eae1577 Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
2011-11-07 06:44:47 +00:00
mlaier
7aec0d1835 Fix a use-after-free/redzone issue in the routing code.
Reported by (repeatedly):	Mike Tancsa
Prodded by (repeatedly):	bz
Forgotten by (repeatedly):	mlaier
MFC after:			2 weeks
2011-11-03 18:33:30 +00:00
glebius
fb9b0b1cbe Add macro IF_DEQUEUE_ALL(ifq, m), that takes the entire mbuf chain off
the queue. It can be utilized in queue processing to avoid multiple
locking/unlocking.
2011-10-27 09:45:12 +00:00
qingli
9ae130094e The host-id/interface-id can have a specific value and is properly
masked out when adding a prefix route through the "route" command.
However, when deleting the route, simply changing the command keyword
from "add" to "delete" does not work. The failoure is observed in
both IPv4 and IPv6 route insertion. The patch makes the route command
behavior consistent between the "add" and the "delete" operation.

MFC after:	1 week
2011-10-25 00:34:39 +00:00
ed
b18bd1101c Add missing #includes.
According to POSIX, these two header files should be able to be included
by themselves, not depending on other headers. The <net/if.h> header
uses struct sockaddr when __BSD_VISIBLE=1, while <netinet/tcp.h> uses
integer datatypes (u_int32_t, u_short, etc).

MFC after:	2 months
2011-10-21 12:58:34 +00:00
ed
832b15d289 Get rid of D_PSEUDO.
It seems the D_PSEUDO flag was meant to allow make_dev() to return NULL.
Nowadays we have a different interface for that; make_dev_p(). There's
no need to keep it there.

While there, remove an unneeded D_NEEDMINOR from the gpio driver.

Discussed with:	gonzo@ (gpio)
2011-10-18 08:09:44 +00:00
bz
a13ffdabcc Pass the fibnum where we need filtering of the message on the
rtsock allowing routing daemons to filter routing updates on an
rtsock per FIB.

Adjust raw_input() and split it into wrapper and a new function
taking an optional callback argument even though we only have one
consumer [1] to keep the hackish flags local to rtsock.c.

PR:		kern/134931
Submitted by:	multiple (see PR)
Suggested by:	rwatson [1]
Reviewed by:	rwatson
MFC after:	3 days
2011-09-28 13:48:36 +00:00
kmacy
e3079e1350 Make KBI changes required for future MFCing of inpcb rtentry / llentry caching.
Reviewed by:	rwatson, bz
Approved by:	re (kib)
2011-09-20 20:27:26 +00:00
kmacy
99851f359e In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
thompsa
cb24e9343d On the first loop for generating a bridge MAC address use the local
hostid, this gives a good chance of keeping the same address over
reboots. This is intended to help IPV6 and similar which generate
their addresses from the mac.

PR:		kern/160300
Submitted by:	mdodd
Approved by:	re (kib)
2011-09-04 22:06:32 +00:00
bz
75529036c4 When adding IPv6 fwd support to ipfw in r225044 these two files were
not committed.  Initialize next_hop6 to align with the IPv4 code.

PR:		bin/117214
MFC after:	3 weeks
X-MFC with:	r225044
Approved by:	re (kib)
2011-08-27 08:49:55 +00:00
attilio
683d7a54ce Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks
2011-08-25 15:51:54 +00:00
qingli
631a8abdff When the RADIX_MPATH kernel option is enabled, the RADIX_MPATH code tries
to find the first route node of an ECMP chain before executing the route
command. If the system has a default route, and the specific route argument
to the command does not exist in the routing table, then the default route
would be reached. The current code does not verify the reached node matches
the given route argument, therefore erroneous removed the entry. This patch
fixes that bug.

Approved by:	re
MFC after:	3 days
2011-08-25 04:31:20 +00:00
kevlo
c7105822b4 In rtinit1(), before rtrequest1_fib() is called, info.rti_flags is
initialized by flags (function argument) or-ed with ifa->ifa_flags.
If both NIC has a loopback route to itself, so IFA_RTSELF is set on ifa(s).
As IFA_RTSELF is defined by RTF_HOST, rtrequest1_fib() is called with
RTF_HOST flag even if netmask is not NULL. Consequently, netmask is set
to zero in rtrequest1_fib(), and request to add network route is changed
under hands to request to add host route.

Tested by:	Andrew Boyer <aboyer at averesystems.com>
Submitted by:	Svatopluk Kraus <onwahe at gmail dot com>
Approved by:	re (hrs)
2011-08-08 05:25:51 +00:00
pluknet
957e60f904 Add missing MODULE_VERSION() definition to protect against duplicating
module loads.

PR:		kern/159345
Reported by:	Eugene Grosbein <egrosbein att rdtc ru>
Tested by:	Eugene Grosbein <egrosbein att rdtc ru>
Approved by:	re (kib)
MFC after:	1 week
2011-08-01 11:24:55 +00:00
bz
352be4e985 Add spares to the network stack for FreeBSD-9:
- TCP keep* timers
- TCP UTO (adjust from what was there already)
- netmap
- route caching
- user cookie (temporary to allow for the real fix)

Slightly re-shuffle struct ifnet moving fields out of the middle
of spares and to better align.

Discussed with:	rwatson (slightly earlier version)
2011-07-17 21:15:20 +00:00
mp
f3103cdbe2 Clear the filter memory area before using it. Leaving it uninitialized may
leak previous kernel stack contents through a malicioius BPF filter.

PR:		kern/158880
Submitted by:	Guy Harris
Obtained from:	OpenBSD
MFC after:	1 week
2011-07-14 21:06:22 +00:00
zec
99a0b299b3 Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address.  This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after:	3 days
2011-07-08 09:38:33 +00:00
thompsa
9d0e437193 Grab the rlock before checking if our interface is enabled, it could be
possible to hit a dead pointer when changing interfaces.

PR:		kern/156978
Submitted by:	Andrew Boyer
MFC after:	1 week
2011-07-07 20:02:09 +00:00
bz
300a95bf76 Tag mbufs of all incoming frames or packets with the interface's FIB
setting (either default or if supported as set by SIOCSIFFIB, e.g.
from ifconfig).

Submitted by:	Alexander V. Chernikov (melifaro ipfw.ru)
Reviewed by:	julian
MFC after:	2 weeks
2011-07-03 16:08:38 +00:00
bz
cf260e73d6 Remove extra white space to comply with style for the rest of the struct.
MFC after:	2 weeks
2011-07-03 15:34:09 +00:00
bz
9cad5bfef3 Add infrastructure to allow all frames/packets received on an interface
to be assigned to a non-default FIB instance.

You may need to recompile world or ports due to the change of struct ifnet.

Submitted by:	cjsp
Submitted by:	Alexander V. Chernikov (melifaro ipfw.ru)
		(original versions)
Reviewed by:	julian
Reviewed by:	Alexander V. Chernikov (melifaro ipfw.ru)
MFC after:	2 weeks
X-MFC:		use spare in struct ifnet
2011-07-03 12:22:02 +00:00
pluknet
29ff2b4aff Update ifc_len field of struct ifconf passed for the ioctl SIOCGIFCONF32
(i.e. under COMPAT_FREEBSD32) in case ifconf() returned success to match
the native SIOCGIFCONF behavior.

PR:		kern/158369
Reported by:	Paul Procacci <pprocacci att gmail com>
MFC after:	1 week
2011-06-28 08:41:44 +00:00
bz
a8e6967ab3 Garbage collect never used global, sysctl, externs.
MFC after:	1 week
2011-06-21 07:19:03 +00:00
bz
98020d5ece Leave an extra comment about flowtable and IPv6 support rectifying a
previous comment.

MFC after:	1 week
2011-06-20 12:35:12 +00:00
bz
f4689a8d0f gre(4) was using a field in the softc to detect possible recursion.
On MP systems this is not a usable solution anymore and could easily
lead to false positives triggering enough logging that even  using
the console was no longer usable (multiple parallel ping -f can do).

Switch to the suggested solution of using mbuf tags to carry per
packet state between gre_output() invocations.  Contrary to the
proposed solution modelled after gif(4) only allocate one mbuf tag
per packet rather than per packet and per gre_output() pass through.

As the sysctl to control the possible valid (gre in gre) nestings does
no sanity checks, make sure to always allocate space in the mbuf tag
for at least one, and at most 255 possible gre interfaces to detect
loops in addition to the counter.

Submitted by:	Cristian KLEIN (cristi net.utcluj.ro) (original version)
PR:		kern/114714
Reviewed by:	Cristian KLEIN (cristi net.utcluj.ro)
Reviewed bu:	Wooseog Choi (ben_choi hotmail.com)
Sponsored by:	Sandvine Incorporated
MFC after:	1 week
2011-06-18 09:34:03 +00:00
luigi
7cd78b912e Grab one of the ifcap bits for netmap, and enable printing in ifconfig.
Document the fact that we might want an IFCAP_CANTCHANGE mask,
even though the value is not yet used in sys/net/if.c

(asked on -current a week ago, no feedback so i assume no objection).
2011-06-14 12:40:55 +00:00
zec
5bdda7f91b Set curvnet context in a callout-trigerred code path.
MFC after:	3 days
2011-06-07 20:46:03 +00:00
jhb
b37cdd581d Properly return an ENOBUFS error if a write to a tun(4) device fails
due to m_uiotombuf() failing.

While here, trim unneeded error handling related to tuninit() since it
can never fail.

Submitted by:	Martin Birgmeier  la5lbtyi aon at
Reviewed by:	glebius
MFC after:	1 week
2011-06-03 13:47:05 +00:00
rwatson
79f582472b Add an optional netisr dispatch point at ether_input(), but set the
default dispatch method to NETISR_DISPATCH_DIRECT in order to force
direct dispatch.  This adds a fairly negligble overhead without
changing default behavior, but in the future will allow deferred or
hybrid dispatch to other worker threads before link layer processing
has taken place.

For example, this could allow redistribution using RSS hashes
without ethernet header cache line hits, if the NIC was unable to
adequately implement load balancing to too small a number of input
queues -- perhaps due to hard queueset counts of 1, 3, or 8, but in
a modern system with 16-128 threads.  This can happen on highly
threaded systems, where you want want an ithread per core,
redistributing work to other queues, but also on virtualised systems
where hardware hashing is (or is not) available, but only a single
queue has been directed to one VCPU on a VM.

Note: this adds a previously non-present assertion about the
equivalence of the ifnet from which the packet is received, and the
ifnet stamped in the mbuf header.  I believe this assertion to
generally be true, but we'll find out soon -- if it's not, we might
have to add additional overhead in some cases to add an m_tag with
the originating ifnet pointer stored in it.

Reviewed by:    bz
MFC after:      3 weeks
Sponsored by:   Juniper Networks, Inc.
2011-06-01 20:00:25 +00:00
nwhitehorn
a69e106b2f On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by:	jhb
2011-05-31 15:11:43 +00:00
rwatson
d81ad1a304 Rework netisr policy mechanism so that per-protocol dispatch policies can
be represented:

- A single policy namespace is defined, consisting of four possible
  policies: "default" to use the global default, "deferred" to force
  deferred dispatch, "direct" to employ direct dispatch where possible, and
  "hybrid" which makes a dynamic decision based on CPU affinity, ordering,
  etc.  Routines are implemented to convert between strings and an integer
  namespace.

- A new global variable, netisr_dispatch_policy, subsumes existing global
  variables for direct dispatch, forced direct dispatch, etc, and is used
  for explicit policy interpretation and composition.  Old variables remain
  so that they can be exported by legacy sysctls for use by old netstat(1)
  binaries.  A new sysctl and tunable, netisr.dispatch.policy, accepts the
  above strings for specifying a global policy default.

- The protocol registration structure, netisr_handler, grows an nh_dispatch
  field, which accepts a per-policy policy override.  The default value is
  '0', which corresponds to "default", meaning that protocols will accept
  the global default policy unless otherwise specified.

- Policies are now interpreted and composed explicitly at various points in
  packet dispatch; protocol policies override global policies.

- Protocols grow the ability to express a non-opinion about affinity even
  when implenting m2cpuid by returning NETISR_CPUID_NONE.  In that case, the
  framework falls back on source ordering, rather than simply using the
  current CPU.

These changes are in support of allowing link layer re-dispatch based on
RSS or similar hashes provided by NICs, especially in the case where the
number of hardware receive queues matches hardware core count, rather than
hardware thread count, requiring further software redistributeon.  (i.e.,
on RMI XLR).

MFC after:      3 weeks
Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-05-24 12:34:19 +00:00
zec
0e7fcf0b2c Allow for vlan(4) interfaces with MTU of 1500 bytes to be configured
on top of epair(4) virtual interfaces, since there's no physical
hardware associated with epair interfaces which would imply any
constraints on MTU sizes.

MFC after:	3 days
2011-05-24 08:02:55 +00:00
zec
3ab76c6800 Let epair(4) virtual interfaces report fake link / media status,
by borrowing the skeleton of if_media manipulation and reporting
code from if_lagg(4).  The main motivation behind this change is
to allow for epair(4) interfaces to participate in STP if_bridge(4)
configurations.

Reviewed by:	bz
MFC after:	3 days
2011-05-24 07:57:28 +00:00
qingli
a1bf1a2582 The statically configured (permanent) ARP entries are removed when an
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.

Reviewed by:	delphij
MFC after:	5 days
2011-05-20 19:12:20 +00:00
marius
03919a3303 - Add 10baseT as an alias for 10baseT/UTP.
- Add shorthand aliases for common media+option combinations as announced
  by miibus(4) so that one can actually supply the media strings found in
  the dmesg output to ifconfig(8).

Obtained from:	NetBSD (in principle)
MFC after:	2 weeks
2011-05-15 12:58:29 +00:00
yongari
0aab37190f Fix white space nits and style 2011-05-06 20:46:29 +00:00
yongari
b282e7aca8 Do not increment collision counter if transmit have failed.
Transmission error in tun(4) is queueing error(i.e. ENOBUFS) and it
has nothing to do with collision.

Reported by:	Zeus V Panchenko (zeus <> ibs dot dn dot ua)
2011-05-06 20:37:07 +00:00
thompsa
fc83a48265 LACP frames must not be send VLAN-tagged, check for that before processing.
PR:		kern/156743
Submitted by:	Dmitrij Tejblum
MFC after:	1 week
2011-04-30 20:34:52 +00:00
bz
1910487722 Make various (pseudo) interfaces compile without INET in the kernel
adding appropriate #ifdefs.  For module builds the framework needs
adjustments for at least carp.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-27 19:30:44 +00:00
glebius
8c3b1b83f9 When removing ifnets, we should first remove the reference to ifnet
from the interface index, then decrease refcount, not vice versa.

Otherwise there is a race (reproducible) when if_free_internal()
contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the
index for a long time. It may be picked by some other thread,
that runs ifnet_byindex_ref(), who takes the ifnet from index,
and bumps refcount. When reader drops the lock, if_free_internal()
proceeds with free. Then reader tries to free it a second time.
2011-04-04 07:45:08 +00:00
jeff
2d7d8c05e7 - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
dchagin
0f55e947bd Remove dead code.
MFC after:	1 Week
2011-03-20 08:35:00 +00:00
dchagin
a96a94aaf2 ouch, newrt is used on the return path, my fault.
Partialy revert the previous change.

MFC after:	1 Week.
2011-03-19 21:10:57 +00:00
dchagin
b3fb445553 A bit rearranged rtalloc1_fib() code.
Initialize a variable when it is really needed.
To avoid code duplication move the miss label to line up and jump on it.

MFC after:	1 Week
2011-03-19 19:50:36 +00:00
dchagin
9dffdca01d Remove a now unused variable.
MFC after:	1 Week
2011-03-19 16:52:06 +00:00
eri
2ad117efbd Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none.
Approved by:	thompsa(mentor)
MFC after:	1 week
2011-03-04 20:37:38 +00:00
bz
209ebad7af Hide the outer IP addresses of a tunnel interfaces (gif(4), gre(4))
from processes inside jails if the addresses do not belong to the jail.

Originally reported by: Pieter de Boer via remko
PR:		kern/151119
Tested by:	Piotr KUCHARSKI (nospam 42.pl) [gif]
MFC after:	1 week
2011-03-02 21:39:08 +00:00
brucec
6d9b42b486 Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
bz
b9b7d3e93a Mfp4 CH=177274,177280,177284-177285,177297,177324-177325
VNET socket push back:
  try to minimize the number of places where we have to switch vnets
  and narrow down the time we stay switched.  Add assertions to the
  socket code to catch possibly unset vnets as seen in r204147.

  While this reduces the number of vnet recursion in some places like
  NFS, POSIX local sockets and some netgraph, .. recursions are
  impossible to fix.

  The current expectations are documented at the beginning of
  uipc_socket.c along with the other information there.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:  jhb
  Tested by:    zec

Tested by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	2 weeks
2011-02-16 21:29:13 +00:00
bz
1289b7e264 Mfp4 CH=177255:
Resort the CURVNET_SET* macros in the non-VNET_DEBUG case to match
  the call order of the VNET_DEBUG case.

  Add the VNET_ASSERT() to the non-VNET_DEBUG case as well so that
  INVARIANTS will still catch problems.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:  jhb

MFC after:	2 weeks
2011-02-11 14:17:58 +00:00
bz
aad842be85 Mfp4 CH=177255:
Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS.

  Change the syntax to match KASSERT() to allow more flexible panic
  messages rather than having a printf with hardcoded arguments
  before panic.

  Adjust the few assertions we have to the new format (and enhance
  the output).

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:	jhb

MFC after:	2 weeks
2011-02-11 13:27:00 +00:00
bz
3bac00c1ca Mfp4 CH=177255:
Use __func__ rather than __FUNCTION__.

MFC after:	2 weeks
2011-02-11 12:56:05 +00:00
mlaier
f0ab2e35d4 As info.rti_info[RTAX_DST] can point inside of rtm we must not free the rtm
until rt_dispatch is done with the sockaddr.

Found by:	memguard
MFC after:	3 days
2011-02-10 01:24:09 +00:00
jhb
aff100c87f Fix a LOR by dropping the global ifnet locks while allocating a new ifnet
table in if_grow().  The order of the SYSINIT's for ifnet state were swapped
so that the various locks were initialized before being used.

Reviewed by:	pluknet, bz
MFC after:	2 weeks
2011-01-24 22:21:58 +00:00
mdf
6648b8cede sysctl(8) should use the CTLTYPE to determine the type of data when
reading.  (This was already done for writing to a sysctl).  This
requires all SYSCTL setups to specify a type.  Most of them are now
checked at compile-time.

Remove SYSCTL_*X* sysctl additions as the print being in hex should be
controlled by the -x flag to sysctl(8).

Succested by:	bde
2011-01-19 17:04:07 +00:00
mdf
5e41205b16 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the net* piece.
2011-01-12 19:53:50 +00:00
jhb
c17f46e472 Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by:	bde
2011-01-11 13:59:06 +00:00
bz
7864be44ec MfP4 CH=185246 [1]:
Add FEATURE() to announce optional VIMAGE.

MFC after:	3 days
[1] for the moment put it in vnet.c.
2011-01-09 20:40:21 +00:00
jhb
3a7fd5f8c7 - Restore dropping the priority of syncer down to PPAUSE when it is idle.
This was lost when it was converted to using a condition variable instead
  of lbolt.
- Drop the priority of flowtable down to PPAUSE when it is idle as well
  since it is a similar background task.

MFC after:	2 weeks
2011-01-06 22:17:07 +00:00
marius
2c3e165ed0 Teach ifconfig(8) the handy shared option shortcut aliases the NetBSD
counterpart also takes, i.e. "fdx" for "full-duplex", "flow" for
"flowcontrol", "hdx" for "half-duplex" as well as "loop" and "loopback"
for "hw-loopback".

MFC after:	1 week
2011-01-05 15:28:30 +00:00
marius
577786b402 Fix whitespace.
MFC after:	1 week
2011-01-05 14:51:04 +00:00
bz
43c37d3ddd Use NULL rather than 0 to invalidate a pointer.
Rather than duplicating the LLE_FREE_LOCKED() macro code in LLE_FREE(),
call it directly (like we do for the RT_* macros).

Sponsored by:	ISPsystem [1]
Reviewed by:	julian [1]
MFC After:	1 week

[1] Early 2010.
2010-12-31 21:57:54 +00:00
bz
2a04b52e8a Print the vnet pointer under DDB when iterating over flowtables of each
virtual network stack instance.

Sponsored by:	ISPsystem [1]
Reviewed by:	julian [1]
MFC after:	1 week

[1] Early 2010.
2010-12-31 21:20:32 +00:00
bz
558c7e035b Move the increment operation under the lock and split the condition
variable into two so that we can see on which one we are waiting.
This might also more properly propagate the update of the
flowclean_cycles flag and avoid "hangs" people were seeing.

Suggested by:	rwatson [1]
Sponsored by:	ISPsystem [1]
Reviewed by:	julian [1]
Updated by:	Mikolaj Golub (to.my.trociny gmail.com)
Tested by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC After:	1 week

[1] Early 2010, initial version.
2010-12-31 21:06:52 +00:00
alc
971b02b7bc Introduce and use a new VM interface for temporarily pinning pages. This
new interface replaces the combined use of vm_fault_quick() and
pmap_extract_and_hold() throughout the kernel.

In collaboration with:	kib@
2010-12-25 21:26:56 +00:00
weongyo
6dc48cb05c Adds IFF_CANTCONFIG to IFF_CANTCHANGE that it shouldn't happen through
ioctl(2).
2010-12-07 20:31:04 +00:00
weongyo
33417874f4 Introduces IFF_CANTCONFIG interface flag to point that the interface
isn't configurable in a meaningful way.  This is for ifconfig(8) or
other tools not to change code whenever IFT_USB-like interfaces are
registered at the interface list.

Reviewed by:	brooks
No objections:	gavin, jkim
2010-12-07 20:23:47 +00:00
maxim
c6ed377d5e o Swap descriptions for net.bpf.bufsize and net.bpf.maxbufsize.
PR:		misc/152531
MFC after:	1 week
2010-11-24 05:50:19 +00:00
zec
163ca01105 Allow for vlan(4) ifnets to have overlapping unit numbers if they are
created in separated vnets.  As a side-effect of having a separated
if_cloner instance for each vnet, all vlan ifnets created in a vnet
will be automatically destroyed when vnet teardown is initiated.

Disallow SIOCSETVLAN and SIOCGETVLAN ioctls on vlan ifnets which are
associated with physical ifnets residing in parent vnets.

This is an interim vlan-specific solution which will be superseded by a
more generic if_cloner V_irtualization change from p4.  For nooptions
VIMAGE builds, this should be a no-op change.

Discussed with:	bz
MFC after:	3 days
2010-11-22 23:35:29 +00:00
dim
fb307d7d1d After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
bz
076c86b1ea Add a missing ';' and change the debugging sysctl from xint to int.
Submitted by:		Mikolaj Golub (to.my.trociny gmail.com)
MFC after:		3 days
2010-11-21 19:33:19 +00:00
dim
02778a6df6 Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.
2010-11-14 20:40:55 +00:00
dim
fda4020a88 Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
dim
7dff36caf7 Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE. 2010-11-14 20:23:02 +00:00
marius
278d761d73 o Flesh out the generic IEEE 802.3 annex 31B full duplex flow control
support in mii(4):
  - Merge generic flow control advertisement (which can be enabled by
    passing by MIIF_DOPAUSE to mii_attach(9)) and parsing support from
    NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD,
    IFM_FLOW isn't implemented as a global option via the "don't care
    mask" but instead as a media specific option this. This has the
    following advantages:
    o allows flow control advertisement with autonegotiation to be
      turned on and off via ifconfig(8) with the default typically
      being off (though MIIF_FORCEPAUSE has been added causing flow
      control to be always advertised, allowing to easily MFC this
      changes for drivers that previously used home-grown support for
      flow control that behaved that way without breaking POLA)
    o allows to deal with PHY drivers where flow control advertisement
      with manual selection doesn't work or at least isn't implemented,
      like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4),
      by setting MIIF_NOMANPAUSE
    o the available combinations of media options are readily available
      from the `ifconfig -m` output
  - Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE
    and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so
    these are understood by ifconfig(8).
o Make the master/slave support in mii(4) actually usable:
  - Change IFM_ETH_MASTER from being implemented as a global option via
    the "don't care mask" to a media specific one as it actually is only
    applicable to IFM_1000_T to date.
  - Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to
    actually configure manually selected slave mode (like we also do in
    the PHY specific implementations).
  - Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it
    is understood by ifconfig(8).
o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4),
  e1000phy(4) and ip1000phy(4) to use the generic flow control support
  instead of home-grown solutions via IFM_FLAGs. This includes changing
  these PHY drivers and smcphy(4) to no longer unconditionally advertise
  support for flow control but only if the selected media has IFM_FLOW
  set (or MIIF_FORCEPAUSE is set) and implemented for these media variants,
  i.e. typically only for copper.
o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and
  set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0
  and some IFM_FLAGn.
o Switch brgphy(4) to add at least the the supported copper media based on
  the contents of the BMSR via mii_phy_add_media() instead of hardcoding
  them. The latter approach seems to have developed historically, besides
  causing unnecessary code duplication it was also undesirable because
  brgphy_mii_phy_auto() already based the capability advertisement on the
  contents of the BMSR though.
o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not
  just BCM5701. Apparently this was a misinterpretation of a workaround
  in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and
  BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but
  this doesn't mean we can't set these as well on other PHYs for manual
  media selection.
o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so
  IFM_1000_T master mode support now is generally available with all PHY
  drivers.
o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's
  not applicable there.

Reviewed by:	yongari (plus additional testing)
Obtained from:	NetBSD (partially), OpenBSD (partially)
MFC after:	2 weeks
2010-11-14 13:26:10 +00:00
kib
1889bb0afa Use 'z' modifier for size_t printing. 2010-11-13 11:11:51 +00:00
dim
7b0aabca30 Similar to r212647, remove the workaround in sys/net/vnet.h for an ld
bug (incorrect placement of __start_SECNAME in some cases) that was
fixed in r210245.

There is already an UPDATING entry about needing a recent ld.

MFC after:	1 month
2010-11-12 22:59:50 +00:00
gnn
c3225b5eaa Add a queue to hold packets while we await an ARP reply.
When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry.  This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.

This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out.  The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.

Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..

Reviewed by:	bz, rpaulo
MFC after: 3 weeks
2010-11-12 22:03:02 +00:00
dim
2c7257916f Use the same treatment as in linker_set.h for the __start and __stop
symbols of the set_vnet and set_pcpu sections, so those symbols will
always be emitted in kernel modules, if they use vnet.h or pcpu.h.

Also, for pcpu.h, make the __(start|stop)_set_pcpu declarations, and
associated macros invisible to userland, to prevent it picking up these
symbols.

Reviewed by:	kib
2010-11-11 19:18:52 +00:00
rpaulo
2631ae0f3d Sync DLTs with the latest pcap version. 2010-10-29 18:41:09 +00:00
bz
491af1942e Factor out DDB commands from r204145, r204279 into if_debug.c for further
enhancements (1).  Switch to a standard 2-clause BSD license for this (2).

Unfortunately we have to un-static the ifindex_table for this but do not
publicly export it.

Suggested by:	rwatson (1) a while back.
Approved by:	thompsa (2) for the change from r204279.
MFC after:	6 days
2010-10-25 08:30:19 +00:00
pluknet
8950ed8036 Reshuffle SIOCGIFCONF32 handler from r155224.
- move all the chunks into one file, which allows to hide SIOCGIFCONF32
  global definition as well.
- replace __amd64__ with proper COMPAT_FREEBSD32 around.
- handle 32bit capacity before going into the handler itself instead of
  doing internal 32bit specific changes within it (e.g. as it's done for
  SIOCGDEFIFACE32_IN6).
- use explicitely sized types for ABI compat.

Approved by:	kib (mentor)
MFC after:	2 weeks
2010-10-21 16:20:48 +00:00
bz
753b9262ae Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating
over all interfaces to make sure the address will neither change nor be
freed while we are working on it.

PR:		kern/146250
Submitted by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	1 week
2010-10-16 19:25:27 +00:00
bz
e7e5079137 lltable_drain() has never been used so far, thus #if 0 it for now.
While touching it add the missing locking to the now disabled code
for the time when we'll resurrect it.

MFC after:	3 days
2010-10-16 18:42:09 +00:00
bz
054f4fff23 Only hide the ifa and not the tp under #ifdef INET as the tp is needed
for locking evenwhen there is no INET.

MFC after:	3 days
2010-10-01 15:14:14 +00:00
jhb
b8914d3479 - Expand scope of tun/tap softc locks to cover more softc fields and
driver-maintained ifnet fields (such as if_drv_flags).
- Use soft locks as the mutex that protects each interface's knote list
  rather than using the global knote list lock.  Also, use the softc
  for kn_hook instead of the cdev.
- Use mtx_sleep() instead of tsleep() when blocking in the read routines.
  This fixes a lost wakeup race.
- Remove D_NEEDGIANT now that the cdevsw routines use the softc lock
  where locking is needed.
- Lock IFQ when calculating the result for FIONREAD in tap(4).  tun(4)
  already did this.
- Remove remaining spl calls.

Submitted by:	Marcin Cieslak  saper of saper|info (3)
MFC after:	2 weeks
2010-09-22 21:02:43 +00:00
jkim
4b1a6e6702 Fix a typo in a comment.
Submitted by:	afiveg
2010-09-16 18:37:33 +00:00
mdf
ab3a8b533a Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function.  While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
2010-09-10 16:42:16 +00:00
bz
06977c291c MFp4 CH=183259:
No reason to use if_free_type() as we don't change our type.
  Just if_free() is fine.

MFC after:	3 days
2010-09-02 16:11:12 +00:00
emaste
a9a1b47f1d Add a sysctl knob to accept input packets on any link in a failover lagg. 2010-09-01 16:53:38 +00:00
bz
70761a9ea6 MFp4 CH=182972:
Add explicit linkstate UP/DOWN for the epair.  This is needed by carp(4)
and other things to work.

MFC after:	5 days
2010-08-27 23:22:58 +00:00
rpaulo
ea11ba6788 Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwaston [1]
2010-08-22 11:18:57 +00:00
zec
e1e5264fc5 When moving an ethernet ifnet from one vnet to another, destroy the
associated ng_ether netgraph node in the current vnet, and create a
new one in the target vnet.

Reviewed by:	julian
MFC after:	3 days
2010-08-13 18:17:32 +00:00
will
d548943ae9 Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-08-11 20:18:19 +00:00
will
aa4e762c4a Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
jhb
4ab164bfd3 Adjust the interface type in the link layer socket address for vlan(4)
interfaces to be a vlan (IFT_L2VLAN) rather than an Ethernet interface
(IFT_ETHER).  The code already fixed if_type in the ifnet causing some
places to report the interface as a vlan (e.g. arp -a output) and other
places to report the interface as Ethernet (getifaddrs(3)).  Now they
should all report IFT_L2VLAN.

Reviewed by:	brooks
MFC after:	1 month
2010-08-06 15:15:26 +00:00
kib
c5df0ad3b8 Properly set ifi_datalen for compat32 struct if_data32.
PR:	kern/149240
Submitted by:	Stef Walter <stef memberwebs com>
MFC after:	1 weeks
2010-08-03 15:40:42 +00:00
glebius
b89adb17c9 Don't check malloc(M_WAITOK) result. 2010-07-27 11:56:49 +00:00
bz
0078d05705 Return NULL rather than 0 for a pointer.
MFC after:	3 days
2010-07-27 11:54:01 +00:00
glebius
43f917d899 When installing a new ARP entry via 'arp -S', lla_lookup() will
either find an existing entry, or allocate a new one. In the latter
case an entry would have flags, that were supplied as argument to
lla_lookup(). In case of an existing entry, flags aren't modified.

This lead to losing LLE_PUB and/or LLE_PROXY flags.

We should apply these flags either in lla_rt_output() or in the
in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is
a more correct choice.

PR:		kern/148784, kern/146539
Silence from:	qingli, 5 days
2010-07-27 10:05:27 +00:00
jkim
9fdab3dd15 Fix an obvious typo from r1.1. We were acquiring an exclusive writer lock
regardless of the given flags.

MFC after:	3 days
2010-07-22 18:44:40 +00:00
luigi
bbbbe1a022 whitespace cleanup 2010-07-15 14:41:59 +00:00
luigi
9f2bcd601a small portability fix to build on linux/windows 2010-07-15 14:41:06 +00:00
jkim
14f08fd627 Implement flexible BPF timestamping framework.
- Allow setting format, resolution and accuracy of BPF time stamps per
listener.  Previously, we were only able to use microtime(9).  Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command.  Document all supported options in bpf(4) and their uses.

- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts.  bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp.  The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries.  On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long.  On 32-bit platforms, the size may
increase by 8 bytes.  For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested.  However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.

- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver.  Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.

Reviewed by:	net@
2010-06-15 19:28:44 +00:00
jhb
9b74a62d73 Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
zec
66c3a596d7 Provide a macro for registering a virtualized sysctl handler for
VNET opaque data.

MFC after:	30 days
2010-06-02 15:29:21 +00:00
qingli
f6ab4a6810 This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after:	3 days
2010-05-25 20:42:35 +00:00
jhb
7df2d528cf Ignore failures from removing multicast addresses from the parent (trunk)
interface when tearing down a vlan interface.  If a trunk interface is
detached, all of its multicast addresses are removed before the ifnet
departure eventhandlers are invoked.  This means that all of the multicast
addresses are removed before the vlan interfaces are removed which causes
the if_delmulti() calls in the vlan teardown to fail.

In the VLAN_ARRAY case, this left vlan interfaces referencing a no longer
valid parent interface.  In the !VLAN_ARRAY case, the eventhandler gets
stuck in an infinite loop retrying vlan_unconfig_locked() forever.  In
general the callers of vlan_unconfig_locked() do not expect nor handle
failure, so I believe it is safer to ignore the errors and tear down as
much of the vlan state as possible.

Silence from:	net@
MFC after:	4 days
2010-05-17 19:36:56 +00:00
kmacy
a68dd336d5 allocate ipv6 flows from the ipv6 flow zone
reported by: rrs@

MFC after:	3 days
2010-05-16 21:48:39 +00:00
bz
c9d1ca826b Fix an issue with the dynamic pcpu/vnet data allocators.
We cannot expect that modspace is the last entry in the linker
set and thus that modspace + possible extra space up to PAGE_SIZE
would be contiguous.  For the moment do not support more than
*_MODMIN space and ignore the extra space (*).

(*) We know how to get it back but it'll need testing.

Discussed with:	jeff, rwatson (briefly)
Reviewed by:	jeff
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	4 days
2010-05-14 21:11:58 +00:00
kmacy
5e73b28c4d workaround bug with ipv6 where a flow can have a null rtentry 2010-05-12 04:51:20 +00:00
alc
e6c77ecaea Remove page queues locking from all sf_buf_mext()-like functions. The page
lock now suffices.

Fix a couple nearby style violations.
2010-05-06 17:43:41 +00:00
alc
c9aaa1e2a2 Add page locking to the vm_page_cow* functions.
Push down the acquisition and release of the page queues lock into
vm_page_wire().

Reviewed by:	kib
2010-05-04 15:55:41 +00:00
sobomax
213eac1f2c Add new tunable 'net.link.ifqmaxlen' to set default send interface
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.

MFC after:	1 month
2010-05-03 07:32:50 +00:00
alc
387e15c45a This is the first step in transitioning responsibility for synchronizing
access to the page's wire_count from the page queues lock to the page lock.

Submitted by:	kmacy
2010-05-03 05:41:50 +00:00
kmacy
1dc1263413 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
bz
0a90ef1728 MFP4: @176978-176982, 176984, 176990-176994, 177441
"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by:	jhb
Discussed with:	rwatson
Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	6 days
2010-04-29 11:52:42 +00:00
kmacy
e04522d342 need to initialize the lock before it is used
MFC after:	3 days
2010-04-27 23:48:50 +00:00
bz
7fcd42cfee MFP4: @177254
Add missing CURVNET_RESTORE() calls for multiple code paths, to stop
leaking the currently cached vnet into callers and to the process.

Sponsored by:	The FreeBSD Foundation
Sponsored by:	CK Software GmbH
MFC after:	4 days
2010-04-27 15:16:54 +00:00
kib
2861e20e63 Provide compat32 shims for bpf(4), except zero-copy facilities.
bd_compat32 field of struct bpf_d is kept unconditionally to not
impose the requirement of including "opt_compat.h" on all numerous
users of bpfdesc.h.

Submitted by:	jhb (version for 6.x)
Reviewed and tested by:	emaste
MFC after:	2 weeks
2010-04-25 16:43:41 +00:00
kib
1bf8e84d75 Provide 32bit compat shims for sysctl net.route NET_RT_IFLIST.
This allows getifaddrs(3) to work for compat32 binaries.

Submitted by:	jhb (6.x version)
Reviewed by:	emaste
Tested by:	emaste and <pluknet gmail com>
MFC after:	2 weeks
2010-04-25 16:42:47 +00:00
julian
e5e811b1e9 Move two copies of the same definition to a common include file.
MFC after: 3 weeks
2010-04-14 23:06:07 +00:00
delphij
58be1647f8 When an underlying ioctl(2) handler returns an error, our ioctl(2)
interface considers that it hits a fatal error, and will not copyout
the request structure back for _IOW and _IOWR ioctls, keeping them
untouched.

The previous implementation of the SIOCGIFDESCR ioctl intends to
feed the buffer length back to userland.  However, if we return
an error, the feedback would be defeated and ifconfig(8) would
trap into an infinite loop.

This commit changes SIOCGIFDESCR to set buffer field to NULL to
indicate the previous ENAMETOOLONG case.

Reported by:	bschmidt
MFC after:	2 weeks
2010-04-14 22:02:19 +00:00
bz
778f026aa2 Take a reference to make sure that the interface cannot go away during
if_clone_destroy() in case parallel threads try to.

PR:		kern/116837
Submitted by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	10 days
2010-04-11 18:47:38 +00:00
bz
35f1e4bd73 Check that the interface is on the list of cloned interfaces before trying
to remove it to avoid panics in case of two threads trying to remove it in
parallel.

PR:		kern/116837
Submitted by:	Takahiro Kurosawa (takahiro.kurosawa gmail.com) (orig version)
MFC after:	10 days
2010-04-11 18:41:31 +00:00
bz
d7a91dc6bf Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by:		qingli (earlier version)
MFC after:		10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
			Christian Kratzer (ck cksoft.de),
			Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-11 16:04:08 +00:00
bz
d5b3b3f9d9 In if_detach_internal() we cannot hold the af_data lock over the
dom_ifdetach() calls as they might sleep for callout_drain().
Do as we do in if_attachdomain1() [r121470] and handle
if_afdata_initialized earlier and call dom_ifdetach() unlocked.

Discussed with:	rwatson
MFC after:	10 days
2010-04-11 11:51:44 +00:00
bz
674a87c918 In if_detach_internal() only try to do the detach run if if_attachdomain1()
has actually succeeded to initialize and attach.  There is a theoretical
possibility to drop out early in if_attachdomain1() leaving the array
uninitialized if we cannot get the lock.

Discussed with:	rwatson
MFC after:	10 days
2010-04-11 11:49:24 +00:00
jkim
2b1849a3fc Check the pointer to JIT binary filter before its de-allocation.
Submitted by:	Alexander Sack (asack at niksun dot com)
MFC after:	3 days
2010-03-29 20:24:03 +00:00
rpaulo
e14131f581 Add MCS to the list of media types.
Sponsored by:	iXsystems, inc.
2010-03-23 13:15:11 +00:00
kmacy
01cb21605b - boot-time size the ipv4 flowtable and the maximum number of flows
- increase flow cleaning frequency and decrease flow caching time
  when near the flow limit
- stop allocating new flows when within 3% of maxflows don't start
  allocating again until below 12.5%

MFC after:	7 days
2010-03-22 23:04:12 +00:00
emaste
41b3806743 Avoid holding the VLAN_LOCK() over the parent interface SIOCGIFMEDIA
ioctl call, as it may sleep.

Reviewed by:	rwatson
2010-03-21 15:00:33 +00:00
bz
102a1f8933 Split eventhandler_register() into an internal part and a wrapper function
that provides the allocated and setup eventhandler entry.

Add a new wrapper for VIMAGE that allocates extra space to hold the
callback function and argument in addition to an extra wrapper function.
While the wrapper function goes as normal callback function the
argument points to the extra space allocated holding the original func
and arg that the wrapper function can then call.

Provide an iterator function for the virtual network stack (vnet) that
will call the callback function for each network stack.

Provide a new set of macros for VNET that in the non-VIMAGE case will
just call eventhandler_register() while in the VIMAGE case it will use
vimage_eventhandler_register() passing in the extra iterator function
but will only register once rather than per-vnet.
We need a special macro in case we are interested in the tag returned
as we must check for curvnet and can neither simply assign the
return value, nor not change it in the non-vnet0 case without that.

Sponsored by:	ISPsystem
Discussed with:	jhb
Reviewed by:	zec (earlier version), jhb
MFC after:	1 month
2010-03-19 19:51:03 +00:00
bz
abd6b4ebb7 Add ddb support to the "new" link layer code ("new-arp"):
- show all lltables [1] (optional flag to also show the llentries as well)
 - show lltable <struct lltable *>
 - show llentry <struct llentry *>

MFC after:	6 days
2010-03-18 09:09:59 +00:00
qingli
4ff4954e4e Verify interface up status using its link state only
if the interface has such capability. The interface
capability flag indicates whether such capability
exists. This approach is much more backward compatible.
Physical device driver changes will be part of another
commit.

Also updated the ifconfig utility to show the LINKSTATE
capability if present.

Reviewed by:	rwatson, imp, juli
MFC after:	3 days
2010-03-16 17:59:12 +00:00
mlaier
d62719cf37 Fix a small bug in drbr_dequeue_cond spotted while preparing MFC of r203834.
MFC after:	3 days
2010-03-15 21:15:03 +00:00
kmacy
a5e0110227 flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB
pointed out by jkim@
2010-03-12 19:58:51 +00:00
jkim
5e802872af Fix a style(9) nit. 2010-03-12 19:42:42 +00:00