Commit Graph

2804 Commits

Author SHA1 Message Date
Bjoern A. Zeeb
3d07127c64 When adding IPv6 fwd support to ipfw in r225044 these two files were
not committed.  Initialize next_hop6 to align with the IPv4 code.

PR:		bin/117214
MFC after:	3 weeks
X-MFC with:	r225044
Approved by:	re (kib)
2011-08-27 08:49:55 +00:00
Attilio Rao
6aba400a70 Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by:	Sandvine Incorporated
Reported by:	rstone
Tested by:	pluknet
Reviewed by:	jhb, kib
Approved by:	re (bz)
MFC after:	3 weeks
2011-08-25 15:51:54 +00:00
Qing Li
fc96aabef1 When the RADIX_MPATH kernel option is enabled, the RADIX_MPATH code tries
to find the first route node of an ECMP chain before executing the route
command. If the system has a default route, and the specific route argument
to the command does not exist in the routing table, then the default route
would be reached. The current code does not verify the reached node matches
the given route argument, therefore erroneous removed the entry. This patch
fixes that bug.

Approved by:	re
MFC after:	3 days
2011-08-25 04:31:20 +00:00
Kevin Lo
e9ff3d45e4 In rtinit1(), before rtrequest1_fib() is called, info.rti_flags is
initialized by flags (function argument) or-ed with ifa->ifa_flags.
If both NIC has a loopback route to itself, so IFA_RTSELF is set on ifa(s).
As IFA_RTSELF is defined by RTF_HOST, rtrequest1_fib() is called with
RTF_HOST flag even if netmask is not NULL. Consequently, netmask is set
to zero in rtrequest1_fib(), and request to add network route is changed
under hands to request to add host route.

Tested by:	Andrew Boyer <aboyer at averesystems.com>
Submitted by:	Svatopluk Kraus <onwahe at gmail dot com>
Approved by:	re (hrs)
2011-08-08 05:25:51 +00:00
Sergey Kandaurov
c94a66f8ae Add missing MODULE_VERSION() definition to protect against duplicating
module loads.

PR:		kern/159345
Reported by:	Eugene Grosbein <egrosbein att rdtc ru>
Tested by:	Eugene Grosbein <egrosbein att rdtc ru>
Approved by:	re (kib)
MFC after:	1 week
2011-08-01 11:24:55 +00:00
Bjoern A. Zeeb
d9a362862c Add spares to the network stack for FreeBSD-9:
- TCP keep* timers
- TCP UTO (adjust from what was there already)
- netmap
- route caching
- user cookie (temporary to allow for the real fix)

Slightly re-shuffle struct ifnet moving fields out of the middle
of spares and to better align.

Discussed with:	rwatson (slightly earlier version)
2011-07-17 21:15:20 +00:00
Mark Peek
a4980a95b5 Clear the filter memory area before using it. Leaving it uninitialized may
leak previous kernel stack contents through a malicioius BPF filter.

PR:		kern/158880
Submitted by:	Guy Harris
Obtained from:	OpenBSD
MFC after:	1 week
2011-07-14 21:06:22 +00:00
Marko Zec
13e255fab7 Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address.  This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after:	3 days
2011-07-08 09:38:33 +00:00
Andrew Thompson
6069a2c0bd Grab the rlock before checking if our interface is enabled, it could be
possible to hit a dead pointer when changing interfaces.

PR:		kern/156978
Submitted by:	Andrew Boyer
MFC after:	1 week
2011-07-07 20:02:09 +00:00
Bjoern A. Zeeb
a34c6aeb85 Tag mbufs of all incoming frames or packets with the interface's FIB
setting (either default or if supported as set by SIOCSIFFIB, e.g.
from ifconfig).

Submitted by:	Alexander V. Chernikov (melifaro ipfw.ru)
Reviewed by:	julian
MFC after:	2 weeks
2011-07-03 16:08:38 +00:00
Bjoern A. Zeeb
43deddcdfe Remove extra white space to comply with style for the rest of the struct.
MFC after:	2 weeks
2011-07-03 15:34:09 +00:00
Bjoern A. Zeeb
35fd7bc020 Add infrastructure to allow all frames/packets received on an interface
to be assigned to a non-default FIB instance.

You may need to recompile world or ports due to the change of struct ifnet.

Submitted by:	cjsp
Submitted by:	Alexander V. Chernikov (melifaro ipfw.ru)
		(original versions)
Reviewed by:	julian
Reviewed by:	Alexander V. Chernikov (melifaro ipfw.ru)
MFC after:	2 weeks
X-MFC:		use spare in struct ifnet
2011-07-03 12:22:02 +00:00
Sergey Kandaurov
235195988b Update ifc_len field of struct ifconf passed for the ioctl SIOCGIFCONF32
(i.e. under COMPAT_FREEBSD32) in case ifconf() returned success to match
the native SIOCGIFCONF behavior.

PR:		kern/158369
Reported by:	Paul Procacci <pprocacci att gmail com>
MFC after:	1 week
2011-06-28 08:41:44 +00:00
Bjoern A. Zeeb
f5857e2d3d Garbage collect never used global, sysctl, externs.
MFC after:	1 week
2011-06-21 07:19:03 +00:00
Bjoern A. Zeeb
b8b8e0c981 Leave an extra comment about flowtable and IPv6 support rectifying a
previous comment.

MFC after:	1 week
2011-06-20 12:35:12 +00:00
Bjoern A. Zeeb
52dcd04ba3 gre(4) was using a field in the softc to detect possible recursion.
On MP systems this is not a usable solution anymore and could easily
lead to false positives triggering enough logging that even  using
the console was no longer usable (multiple parallel ping -f can do).

Switch to the suggested solution of using mbuf tags to carry per
packet state between gre_output() invocations.  Contrary to the
proposed solution modelled after gif(4) only allocate one mbuf tag
per packet rather than per packet and per gre_output() pass through.

As the sysctl to control the possible valid (gre in gre) nestings does
no sanity checks, make sure to always allocate space in the mbuf tag
for at least one, and at most 255 possible gre interfaces to detect
loops in addition to the counter.

Submitted by:	Cristian KLEIN (cristi net.utcluj.ro) (original version)
PR:		kern/114714
Reviewed by:	Cristian KLEIN (cristi net.utcluj.ro)
Reviewed bu:	Wooseog Choi (ben_choi hotmail.com)
Sponsored by:	Sandvine Incorporated
MFC after:	1 week
2011-06-18 09:34:03 +00:00
Luigi Rizzo
c9d658e9f7 Grab one of the ifcap bits for netmap, and enable printing in ifconfig.
Document the fact that we might want an IFCAP_CANTCHANGE mask,
even though the value is not yet used in sys/net/if.c

(asked on -current a week ago, no feedback so i assume no objection).
2011-06-14 12:40:55 +00:00
Marko Zec
2fe7ca2ca6 Set curvnet context in a callout-trigerred code path.
MFC after:	3 days
2011-06-07 20:46:03 +00:00
John Baldwin
190367ef1c Properly return an ENOBUFS error if a write to a tun(4) device fails
due to m_uiotombuf() failing.

While here, trim unneeded error handling related to tuninit() since it
can never fail.

Submitted by:	Martin Birgmeier  la5lbtyi aon at
Reviewed by:	glebius
MFC after:	1 week
2011-06-03 13:47:05 +00:00
Robert Watson
6cb52192fe Add an optional netisr dispatch point at ether_input(), but set the
default dispatch method to NETISR_DISPATCH_DIRECT in order to force
direct dispatch.  This adds a fairly negligble overhead without
changing default behavior, but in the future will allow deferred or
hybrid dispatch to other worker threads before link layer processing
has taken place.

For example, this could allow redistribution using RSS hashes
without ethernet header cache line hits, if the NIC was unable to
adequately implement load balancing to too small a number of input
queues -- perhaps due to hard queueset counts of 1, 3, or 8, but in
a modern system with 16-128 threads.  This can happen on highly
threaded systems, where you want want an ithread per core,
redistributing work to other queues, but also on virtualised systems
where hardware hashing is (or is not) available, but only a single
queue has been directed to one VCPU on a VM.

Note: this adds a previously non-present assertion about the
equivalence of the ifnet from which the packet is received, and the
ifnet stamped in the mbuf header.  I believe this assertion to
generally be true, but we'll find out soon -- if it's not, we might
have to add additional overhead in some cases to add an m_tag with
the originating ifnet pointer stored in it.

Reviewed by:    bz
MFC after:      3 weeks
Sponsored by:   Juniper Networks, Inc.
2011-06-01 20:00:25 +00:00
Nathan Whitehorn
d098f93019 On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by:	jhb
2011-05-31 15:11:43 +00:00
Robert Watson
f2d2d69438 Rework netisr policy mechanism so that per-protocol dispatch policies can
be represented:

- A single policy namespace is defined, consisting of four possible
  policies: "default" to use the global default, "deferred" to force
  deferred dispatch, "direct" to employ direct dispatch where possible, and
  "hybrid" which makes a dynamic decision based on CPU affinity, ordering,
  etc.  Routines are implemented to convert between strings and an integer
  namespace.

- A new global variable, netisr_dispatch_policy, subsumes existing global
  variables for direct dispatch, forced direct dispatch, etc, and is used
  for explicit policy interpretation and composition.  Old variables remain
  so that they can be exported by legacy sysctls for use by old netstat(1)
  binaries.  A new sysctl and tunable, netisr.dispatch.policy, accepts the
  above strings for specifying a global policy default.

- The protocol registration structure, netisr_handler, grows an nh_dispatch
  field, which accepts a per-policy policy override.  The default value is
  '0', which corresponds to "default", meaning that protocols will accept
  the global default policy unless otherwise specified.

- Policies are now interpreted and composed explicitly at various points in
  packet dispatch; protocol policies override global policies.

- Protocols grow the ability to express a non-opinion about affinity even
  when implenting m2cpuid by returning NETISR_CPUID_NONE.  In that case, the
  framework falls back on source ordering, rather than simply using the
  current CPU.

These changes are in support of allowing link layer re-dispatch based on
RSS or similar hashes provided by NICs, especially in the case where the
number of hardware receive queues matches hardware core count, rather than
hardware thread count, requiring further software redistributeon.  (i.e.,
on RMI XLR).

MFC after:      3 weeks
Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-05-24 12:34:19 +00:00
Marko Zec
9f8cab7fc2 Allow for vlan(4) interfaces with MTU of 1500 bytes to be configured
on top of epair(4) virtual interfaces, since there's no physical
hardware associated with epair interfaces which would imply any
constraints on MTU sizes.

MFC after:	3 days
2011-05-24 08:02:55 +00:00
Marko Zec
2dccdd4562 Let epair(4) virtual interfaces report fake link / media status,
by borrowing the skeleton of if_media manipulation and reporting
code from if_lagg(4).  The main motivation behind this change is
to allow for epair(4) interfaces to participate in STP if_bridge(4)
configurations.

Reviewed by:	bz
MFC after:	3 days
2011-05-24 07:57:28 +00:00
Qing Li
5b84dc789a The statically configured (permanent) ARP entries are removed when an
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.

Reviewed by:	delphij
MFC after:	5 days
2011-05-20 19:12:20 +00:00
Marius Strobl
d09c5f16b0 - Add 10baseT as an alias for 10baseT/UTP.
- Add shorthand aliases for common media+option combinations as announced
  by miibus(4) so that one can actually supply the media strings found in
  the dmesg output to ifconfig(8).

Obtained from:	NetBSD (in principle)
MFC after:	2 weeks
2011-05-15 12:58:29 +00:00
Pyun YongHyeon
d2d0470dc6 Fix white space nits and style 2011-05-06 20:46:29 +00:00
Pyun YongHyeon
26b8066bce Do not increment collision counter if transmit have failed.
Transmission error in tun(4) is queueing error(i.e. ENOBUFS) and it
has nothing to do with collision.

Reported by:	Zeus V Panchenko (zeus <> ibs dot dn dot ua)
2011-05-06 20:37:07 +00:00
Andrew Thompson
627cecc5c9 LACP frames must not be send VLAN-tagged, check for that before processing.
PR:		kern/156743
Submitted by:	Dmitrij Tejblum
MFC after:	1 week
2011-04-30 20:34:52 +00:00
Bjoern A. Zeeb
a0ae8f04e8 Make various (pseudo) interfaces compile without INET in the kernel
adding appropriate #ifdefs.  For module builds the framework needs
adjustments for at least carp.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-27 19:30:44 +00:00
Gleb Smirnoff
4c506522c1 When removing ifnets, we should first remove the reference to ifnet
from the interface index, then decrease refcount, not vice versa.

Otherwise there is a race (reproducible) when if_free_internal()
contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the
index for a long time. It may be picked by some other thread,
that runs ifnet_byindex_ref(), who takes the ifnet from index,
and bumps refcount. When reader drops the lock, if_free_internal()
proceeds with free. Then reader tries to free it a second time.
2011-04-04 07:45:08 +00:00
Jeff Roberson
e4cd31dd3c - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
Dmitry Chagin
2093339ead Remove dead code.
MFC after:	1 Week
2011-03-20 08:35:00 +00:00
Dmitry Chagin
e579f1c1cf ouch, newrt is used on the return path, my fault.
Partialy revert the previous change.

MFC after:	1 Week.
2011-03-19 21:10:57 +00:00
Dmitry Chagin
523e60025b A bit rearranged rtalloc1_fib() code.
Initialize a variable when it is really needed.
To avoid code duplication move the miss label to line up and jump on it.

MFC after:	1 Week
2011-03-19 19:50:36 +00:00
Dmitry Chagin
6a873ef717 Remove a now unused variable.
MFC after:	1 Week
2011-03-19 16:52:06 +00:00
Ermal Luçi
5f82cfdf6c Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none.
Approved by:	thompsa(mentor)
MFC after:	1 week
2011-03-04 20:37:38 +00:00
Bjoern A. Zeeb
e3416ab0c0 Hide the outer IP addresses of a tunnel interfaces (gif(4), gre(4))
from processes inside jails if the addresses do not belong to the jail.

Originally reported by: Pieter de Boer via remko
PR:		kern/151119
Tested by:	Piotr KUCHARSKI (nospam 42.pl) [gif]
MFC after:	1 week
2011-03-02 21:39:08 +00:00
Rebecca Cran
6bccea7c2b Fix typos - remove duplicate "the".
PR:	bin/154928
Submitted by:	Eitan Adler <lists at eitanadler.com>
MFC after: 	3 days
2011-02-21 09:01:34 +00:00
Bjoern A. Zeeb
1fb51a12f2 Mfp4 CH=177274,177280,177284-177285,177297,177324-177325
VNET socket push back:
  try to minimize the number of places where we have to switch vnets
  and narrow down the time we stay switched.  Add assertions to the
  socket code to catch possibly unset vnets as seen in r204147.

  While this reduces the number of vnet recursion in some places like
  NFS, POSIX local sockets and some netgraph, .. recursions are
  impossible to fix.

  The current expectations are documented at the beginning of
  uipc_socket.c along with the other information there.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:  jhb
  Tested by:    zec

Tested by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	2 weeks
2011-02-16 21:29:13 +00:00
Bjoern A. Zeeb
144e6203ff Mfp4 CH=177255:
Resort the CURVNET_SET* macros in the non-VNET_DEBUG case to match
  the call order of the VNET_DEBUG case.

  Add the VNET_ASSERT() to the non-VNET_DEBUG case as well so that
  INVARIANTS will still catch problems.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:  jhb

MFC after:	2 weeks
2011-02-11 14:17:58 +00:00
Bjoern A. Zeeb
0028e52461 Mfp4 CH=177255:
Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS.

  Change the syntax to match KASSERT() to allow more flexible panic
  messages rather than having a printf with hardcoded arguments
  before panic.

  Adjust the few assertions we have to the new format (and enhance
  the output).

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:	jhb

MFC after:	2 weeks
2011-02-11 13:27:00 +00:00
Bjoern A. Zeeb
6cf986ac19 Mfp4 CH=177255:
Use __func__ rather than __FUNCTION__.

MFC after:	2 weeks
2011-02-11 12:56:05 +00:00
Max Laier
826bf287b5 As info.rti_info[RTAX_DST] can point inside of rtm we must not free the rtm
until rt_dispatch is done with the sockaddr.

Found by:	memguard
MFC after:	3 days
2011-02-10 01:24:09 +00:00
John Baldwin
5f3b301a43 Fix a LOR by dropping the global ifnet locks while allocating a new ifnet
table in if_grow().  The order of the SYSINIT's for ifnet state were swapped
so that the various locks were initialized before being used.

Reviewed by:	pluknet, bz
MFC after:	2 weeks
2011-01-24 22:21:58 +00:00
Matthew D Fleming
f8e4b4ef49 sysctl(8) should use the CTLTYPE to determine the type of data when
reading.  (This was already done for writing to a sysctl).  This
requires all SYSCTL setups to specify a type.  Most of them are now
checked at compile-time.

Remove SYSCTL_*X* sysctl additions as the print being in hex should be
controlled by the -x flag to sysctl(8).

Succested by:	bde
2011-01-19 17:04:07 +00:00
Matthew D Fleming
f88910cdf5 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the net* piece.
2011-01-12 19:53:50 +00:00
John Baldwin
58ccf5b41c Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by:	bde
2011-01-11 13:59:06 +00:00
Bjoern A. Zeeb
9269189f98 MfP4 CH=185246 [1]:
Add FEATURE() to announce optional VIMAGE.

MFC after:	3 days
[1] for the moment put it in vnet.c.
2011-01-09 20:40:21 +00:00
John Baldwin
a8f4344f08 - Restore dropping the priority of syncer down to PPAUSE when it is idle.
This was lost when it was converted to using a condition variable instead
  of lbolt.
- Drop the priority of flowtable down to PPAUSE when it is idle as well
  since it is a similar background task.

MFC after:	2 weeks
2011-01-06 22:17:07 +00:00
Marius Strobl
a0fc3825c3 Teach ifconfig(8) the handy shared option shortcut aliases the NetBSD
counterpart also takes, i.e. "fdx" for "full-duplex", "flow" for
"flowcontrol", "hdx" for "half-duplex" as well as "loop" and "loopback"
for "hw-loopback".

MFC after:	1 week
2011-01-05 15:28:30 +00:00
Marius Strobl
8b28e7e1a3 Fix whitespace.
MFC after:	1 week
2011-01-05 14:51:04 +00:00
Bjoern A. Zeeb
962be6dfb3 Use NULL rather than 0 to invalidate a pointer.
Rather than duplicating the LLE_FREE_LOCKED() macro code in LLE_FREE(),
call it directly (like we do for the RT_* macros).

Sponsored by:	ISPsystem [1]
Reviewed by:	julian [1]
MFC After:	1 week

[1] Early 2010.
2010-12-31 21:57:54 +00:00
Bjoern A. Zeeb
c9a2711a54 Print the vnet pointer under DDB when iterating over flowtables of each
virtual network stack instance.

Sponsored by:	ISPsystem [1]
Reviewed by:	julian [1]
MFC after:	1 week

[1] Early 2010.
2010-12-31 21:20:32 +00:00
Bjoern A. Zeeb
f0a56b0678 Move the increment operation under the lock and split the condition
variable into two so that we can see on which one we are waiting.
This might also more properly propagate the update of the
flowclean_cycles flag and avoid "hangs" people were seeing.

Suggested by:	rwatson [1]
Sponsored by:	ISPsystem [1]
Reviewed by:	julian [1]
Updated by:	Mikolaj Golub (to.my.trociny gmail.com)
Tested by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC After:	1 week

[1] Early 2010, initial version.
2010-12-31 21:06:52 +00:00
Alan Cox
82de724fe1 Introduce and use a new VM interface for temporarily pinning pages. This
new interface replaces the combined use of vm_fault_quick() and
pmap_extract_and_hold() throughout the kernel.

In collaboration with:	kib@
2010-12-25 21:26:56 +00:00
Weongyo Jeong
c5649739a5 Adds IFF_CANTCONFIG to IFF_CANTCHANGE that it shouldn't happen through
ioctl(2).
2010-12-07 20:31:04 +00:00
Weongyo Jeong
6e3cb00068 Introduces IFF_CANTCONFIG interface flag to point that the interface
isn't configurable in a meaningful way.  This is for ifconfig(8) or
other tools not to change code whenever IFT_USB-like interfaces are
registered at the interface list.

Reviewed by:	brooks
No objections:	gavin, jkim
2010-12-07 20:23:47 +00:00
Maxim Konovalov
57542d0481 o Swap descriptions for net.bpf.bufsize and net.bpf.maxbufsize.
PR:		misc/152531
MFC after:	1 week
2010-11-24 05:50:19 +00:00
Marko Zec
ccf7ba972c Allow for vlan(4) ifnets to have overlapping unit numbers if they are
created in separated vnets.  As a side-effect of having a separated
if_cloner instance for each vnet, all vlan ifnets created in a vnet
will be automatically destroyed when vnet teardown is initiated.

Disallow SIOCSETVLAN and SIOCGETVLAN ioctls on vlan ifnets which are
associated with physical ifnets residing in parent vnets.

This is an interim vlan-specific solution which will be superseded by a
more generic if_cloner V_irtualization change from p4.  For nooptions
VIMAGE builds, this should be a no-op change.

Discussed with:	bz
MFC after:	3 days
2010-11-22 23:35:29 +00:00
Dimitry Andric
3e288e6238 After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files.  A better long-term solution is
still being considered.  This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
2010-11-22 19:32:54 +00:00
Bjoern A. Zeeb
2c8b047c07 Add a missing ';' and change the debugging sysctl from xint to int.
Submitted by:		Mikolaj Golub (to.my.trociny gmail.com)
MFC after:		3 days
2010-11-21 19:33:19 +00:00
Dimitry Andric
c3adda9fc3 Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.
2010-11-14 20:40:55 +00:00
Dimitry Andric
31c6a0037e Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
2010-11-14 20:38:11 +00:00
Dimitry Andric
47d46d92c2 Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE. 2010-11-14 20:23:02 +00:00
Marius Strobl
efd4fc3fb3 o Flesh out the generic IEEE 802.3 annex 31B full duplex flow control
support in mii(4):
  - Merge generic flow control advertisement (which can be enabled by
    passing by MIIF_DOPAUSE to mii_attach(9)) and parsing support from
    NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD,
    IFM_FLOW isn't implemented as a global option via the "don't care
    mask" but instead as a media specific option this. This has the
    following advantages:
    o allows flow control advertisement with autonegotiation to be
      turned on and off via ifconfig(8) with the default typically
      being off (though MIIF_FORCEPAUSE has been added causing flow
      control to be always advertised, allowing to easily MFC this
      changes for drivers that previously used home-grown support for
      flow control that behaved that way without breaking POLA)
    o allows to deal with PHY drivers where flow control advertisement
      with manual selection doesn't work or at least isn't implemented,
      like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4),
      by setting MIIF_NOMANPAUSE
    o the available combinations of media options are readily available
      from the `ifconfig -m` output
  - Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE
    and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so
    these are understood by ifconfig(8).
o Make the master/slave support in mii(4) actually usable:
  - Change IFM_ETH_MASTER from being implemented as a global option via
    the "don't care mask" to a media specific one as it actually is only
    applicable to IFM_1000_T to date.
  - Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to
    actually configure manually selected slave mode (like we also do in
    the PHY specific implementations).
  - Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it
    is understood by ifconfig(8).
o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4),
  e1000phy(4) and ip1000phy(4) to use the generic flow control support
  instead of home-grown solutions via IFM_FLAGs. This includes changing
  these PHY drivers and smcphy(4) to no longer unconditionally advertise
  support for flow control but only if the selected media has IFM_FLOW
  set (or MIIF_FORCEPAUSE is set) and implemented for these media variants,
  i.e. typically only for copper.
o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and
  set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0
  and some IFM_FLAGn.
o Switch brgphy(4) to add at least the the supported copper media based on
  the contents of the BMSR via mii_phy_add_media() instead of hardcoding
  them. The latter approach seems to have developed historically, besides
  causing unnecessary code duplication it was also undesirable because
  brgphy_mii_phy_auto() already based the capability advertisement on the
  contents of the BMSR though.
o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not
  just BCM5701. Apparently this was a misinterpretation of a workaround
  in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and
  BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but
  this doesn't mean we can't set these as well on other PHYs for manual
  media selection.
o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so
  IFM_1000_T master mode support now is generally available with all PHY
  drivers.
o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's
  not applicable there.

Reviewed by:	yongari (plus additional testing)
Obtained from:	NetBSD (partially), OpenBSD (partially)
MFC after:	2 weeks
2010-11-14 13:26:10 +00:00
Konstantin Belousov
7b3b099e07 Use 'z' modifier for size_t printing. 2010-11-13 11:11:51 +00:00
Dimitry Andric
7e54af0831 Similar to r212647, remove the workaround in sys/net/vnet.h for an ld
bug (incorrect placement of __start_SECNAME in some cases) that was
fixed in r210245.

There is already an UPDATING entry about needing a recent ld.

MFC after:	1 month
2010-11-12 22:59:50 +00:00
George V. Neville-Neil
e162ea60d4 Add a queue to hold packets while we await an ARP reply.
When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry.  This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.

This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out.  The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.

Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..

Reviewed by:	bz, rpaulo
MFC after: 3 weeks
2010-11-12 22:03:02 +00:00
Dimitry Andric
4403994d7d Use the same treatment as in linker_set.h for the __start and __stop
symbols of the set_vnet and set_pcpu sections, so those symbols will
always be emitted in kernel modules, if they use vnet.h or pcpu.h.

Also, for pcpu.h, make the __(start|stop)_set_pcpu declarations, and
associated macros invisible to userland, to prevent it picking up these
symbols.

Reviewed by:	kib
2010-11-11 19:18:52 +00:00
Rui Paulo
09b6dcf968 Sync DLTs with the latest pcap version. 2010-10-29 18:41:09 +00:00
Bjoern A. Zeeb
a38de0134b Factor out DDB commands from r204145, r204279 into if_debug.c for further
enhancements (1).  Switch to a standard 2-clause BSD license for this (2).

Unfortunately we have to un-static the ifindex_table for this but do not
publicly export it.

Suggested by:	rwatson (1) a while back.
Approved by:	thompsa (2) for the change from r204279.
MFC after:	6 days
2010-10-25 08:30:19 +00:00
Sergey Kandaurov
9af74f3d68 Reshuffle SIOCGIFCONF32 handler from r155224.
- move all the chunks into one file, which allows to hide SIOCGIFCONF32
  global definition as well.
- replace __amd64__ with proper COMPAT_FREEBSD32 around.
- handle 32bit capacity before going into the handler itself instead of
  doing internal 32bit specific changes within it (e.g. as it's done for
  SIOCGDEFIFACE32_IN6).
- use explicitely sized types for ABI compat.

Approved by:	kib (mentor)
MFC after:	2 weeks
2010-10-21 16:20:48 +00:00
Bjoern A. Zeeb
ee7c7fee94 Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating
over all interfaces to make sure the address will neither change nor be
freed while we are working on it.

PR:		kern/146250
Submitted by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	1 week
2010-10-16 19:25:27 +00:00
Bjoern A. Zeeb
fc2bfb3294 lltable_drain() has never been used so far, thus #if 0 it for now.
While touching it add the missing locking to the now disabled code
for the time when we'll resurrect it.

MFC after:	3 days
2010-10-16 18:42:09 +00:00
Bjoern A. Zeeb
b6b8c0779d Only hide the ifa and not the tp under #ifdef INET as the tp is needed
for locking evenwhen there is no INET.

MFC after:	3 days
2010-10-01 15:14:14 +00:00
John Baldwin
24f481fde2 - Expand scope of tun/tap softc locks to cover more softc fields and
driver-maintained ifnet fields (such as if_drv_flags).
- Use soft locks as the mutex that protects each interface's knote list
  rather than using the global knote list lock.  Also, use the softc
  for kn_hook instead of the cdev.
- Use mtx_sleep() instead of tsleep() when blocking in the read routines.
  This fixes a lost wakeup race.
- Remove D_NEEDGIANT now that the cdevsw routines use the softc lock
  where locking is needed.
- Lock IFQ when calculating the result for FIONREAD in tap(4).  tun(4)
  already did this.
- Remove remaining spl calls.

Submitted by:	Marcin Cieslak  saper of saper|info (3)
MFC after:	2 weeks
2010-09-22 21:02:43 +00:00
Jung-uk Kim
d0d7bcdf92 Fix a typo in a comment.
Submitted by:	afiveg
2010-09-16 18:37:33 +00:00
Matthew D Fleming
4d369413e1 Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function.  While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.
2010-09-10 16:42:16 +00:00
Bjoern A. Zeeb
73e39d6137 MFp4 CH=183259:
No reason to use if_free_type() as we don't change our type.
  Just if_free() is fine.

MFC after:	3 days
2010-09-02 16:11:12 +00:00
Ed Maste
be4572c896 Add a sysctl knob to accept input packets on any link in a failover lagg. 2010-09-01 16:53:38 +00:00
Bjoern A. Zeeb
c749353940 MFp4 CH=182972:
Add explicit linkstate UP/DOWN for the epair.  This is needed by carp(4)
and other things to work.

MFC after:	5 days
2010-08-27 23:22:58 +00:00
Rui Paulo
79856499bd Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by:	The FreeBSD Foundation
Discussed with:	rwaston [1]
2010-08-22 11:18:57 +00:00
Marko Zec
d3c351c50f When moving an ethernet ifnet from one vnet to another, destroy the
associated ng_ether netgraph node in the current vnet, and create a
new one in the target vnet.

Reviewed by:	julian
MFC after:	3 days
2010-08-13 18:17:32 +00:00
Will Andrews
9963e8a52c Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by:	bz
Approved by:	ken (mentor)
2010-08-11 20:18:19 +00:00
Will Andrews
54bfbd5153 Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default.  Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by:	bz, simon
Approved by:	ken (mentor)
MFC after:	2 weeks
2010-08-11 00:51:50 +00:00
John Baldwin
3ba24fde11 Adjust the interface type in the link layer socket address for vlan(4)
interfaces to be a vlan (IFT_L2VLAN) rather than an Ethernet interface
(IFT_ETHER).  The code already fixed if_type in the ifnet causing some
places to report the interface as a vlan (e.g. arp -a output) and other
places to report the interface as Ethernet (getifaddrs(3)).  Now they
should all report IFT_L2VLAN.

Reviewed by:	brooks
MFC after:	1 month
2010-08-06 15:15:26 +00:00
Konstantin Belousov
04f3205755 Properly set ifi_datalen for compat32 struct if_data32.
PR:	kern/149240
Submitted by:	Stef Walter <stef memberwebs com>
MFC after:	1 weeks
2010-08-03 15:40:42 +00:00
Gleb Smirnoff
b17f26b00c Don't check malloc(M_WAITOK) result. 2010-07-27 11:56:49 +00:00
Bjoern A. Zeeb
cd292f1264 Return NULL rather than 0 for a pointer.
MFC after:	3 days
2010-07-27 11:54:01 +00:00
Gleb Smirnoff
85011246ac When installing a new ARP entry via 'arp -S', lla_lookup() will
either find an existing entry, or allocate a new one. In the latter
case an entry would have flags, that were supplied as argument to
lla_lookup(). In case of an existing entry, flags aren't modified.

This lead to losing LLE_PUB and/or LLE_PROXY flags.

We should apply these flags either in lla_rt_output() or in the
in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is
a more correct choice.

PR:		kern/148784, kern/146539
Silence from:	qingli, 5 days
2010-07-27 10:05:27 +00:00
Jung-uk Kim
82040afcf3 Fix an obvious typo from r1.1. We were acquiring an exclusive writer lock
regardless of the given flags.

MFC after:	3 days
2010-07-22 18:44:40 +00:00
Luigi Rizzo
1f6ad072ea whitespace cleanup 2010-07-15 14:41:59 +00:00
Luigi Rizzo
b62cb72c48 small portability fix to build on linux/windows 2010-07-15 14:41:06 +00:00
Jung-uk Kim
547d94bde3 Implement flexible BPF timestamping framework.
- Allow setting format, resolution and accuracy of BPF time stamps per
listener.  Previously, we were only able to use microtime(9).  Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command.  Document all supported options in bpf(4) and their uses.

- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts.  bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp.  The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries.  On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long.  On 32-bit platforms, the size may
increase by 8 bytes.  For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested.  However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.

- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver.  Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.

Reviewed by:	net@
2010-06-15 19:28:44 +00:00
John Baldwin
3aa6d94e0c Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
Marko Zec
b1ae592bd4 Provide a macro for registering a virtualized sysctl handler for
VNET opaque data.

MFC after:	30 days
2010-06-02 15:29:21 +00:00
Qing Li
0ed6142b31 This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after:	3 days
2010-05-25 20:42:35 +00:00
John Baldwin
6f359e2828 Ignore failures from removing multicast addresses from the parent (trunk)
interface when tearing down a vlan interface.  If a trunk interface is
detached, all of its multicast addresses are removed before the ifnet
departure eventhandlers are invoked.  This means that all of the multicast
addresses are removed before the vlan interfaces are removed which causes
the if_delmulti() calls in the vlan teardown to fail.

In the VLAN_ARRAY case, this left vlan interfaces referencing a no longer
valid parent interface.  In the !VLAN_ARRAY case, the eventhandler gets
stuck in an infinite loop retrying vlan_unconfig_locked() forever.  In
general the callers of vlan_unconfig_locked() do not expect nor handle
failure, so I believe it is safer to ignore the errors and tear down as
much of the vlan state as possible.

Silence from:	net@
MFC after:	4 days
2010-05-17 19:36:56 +00:00
Kip Macy
83e711ec14 allocate ipv6 flows from the ipv6 flow zone
reported by: rrs@

MFC after:	3 days
2010-05-16 21:48:39 +00:00