2598 Commits

Author SHA1 Message Date
Konstantin Belousov
23dfb3511d MFC r207195:
Provide compat32 shims for bpf(4), except zero-copy facilities.
2010-05-09 12:36:51 +00:00
Konstantin Belousov
b0af835660 MFC r207194:
Provide 32bit compat shims for sysctl net.route NET_RT_IFLIST.
2010-05-09 12:34:20 +00:00
Bjoern A. Zeeb
480d7c6c41 MFC r207369:
MFP4: @176978-176982, 176984, 176990-176994, 177441

  "Whitespace" churn after the VIMAGE/VNET whirls.

  Remove the need for some "init" functions within the network
  stack, like pim6_init(), icmp_init() or significantly shorten
  others like ip6_init() and nd6_init(), using static initialization
  again where possible and formerly missed.

  Move (most) variables back to the place they used to be before the
  container structs and VIMAGE_GLOBALS (before r185088) and try to
  reduce the diff to stable/7 and earlier as much as possible,
  to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

  This also removes some header file pollution for putatively
  static global variables.

  Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
  no longer needed.

  Reviewed by:	jhb
  Discussed with:	rwatson
  Sponsored by:	The FreeBSD Foundation
  Sponsored by:	CK Software GmbH
2010-05-06 06:44:19 +00:00
Bjoern A. Zeeb
6419e07f19 MFC r207278:
MFP4: @177254

  Add missing CURVNET_RESTORE() calls for multiple code paths, to stop
  leaking the currently cached vnet into callers and to the process.

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
2010-05-02 16:39:15 +00:00
Xin LI
0919a8fb99 MFC r206637:
When an underlying ioctl(2) handler returns an error, our ioctl(2)
interface considers that it hits a fatal error, and will not copyout
the request structure back for _IOW and _IOWR ioctls, keeping them
untouched.

The previous implementation of the SIOCGIFDESCR ioctl intends to
feed the buffer length back to userland.  However, if we return
an error, the feedback would be defeated and ifconfig(8) would
trap into an infinite loop.

This commit changes SIOCGIFDESCR to set buffer field to NULL to
indicate the previous ENAMETOOLONG case.

Reported by:	bschmidt
2010-04-28 00:49:24 +00:00
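
The interaction described in 0919a8fb99 can be modelled in a few lines of userland C. This is only a sketch: `struct descr_req`, `dispatch_ioctl()`, `get_descr()` and `get_descr_old()` are hypothetical names invented for illustration, not the kernel's structures or functions. It shows why a length written by a failing handler never reaches the caller, and how signalling "buffer too small" through a NULL buffer pointer on a successful return gets the required length back out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request structure; not the kernel's struct ifreq. */
struct descr_req {
	char	*buf;	/* caller's buffer; set to NULL if it was too small */
	size_t	 len;	/* in: buffer size, out: size required */
};

static const char descr[] = "uplink to core switch";

/* Old style: the updated length is lost together with the error. */
static int
get_descr_old(struct descr_req *req)
{
	if (req->len < sizeof(descr)) {
		req->len = sizeof(descr);
		return (ENAMETOOLONG);
	}
	memcpy(req->buf, descr, sizeof(descr));
	req->len = sizeof(descr);
	return (0);
}

/* Style of the fix: report "too small" via buf == NULL, return success. */
static int
get_descr(struct descr_req *req)
{
	if (req->len < sizeof(descr)) {
		req->buf = NULL;
		req->len = sizeof(descr);
		return (0);
	}
	memcpy(req->buf, descr, sizeof(descr));
	req->len = sizeof(descr);
	return (0);
}

/* Model of the ioctl layer: copy the request back only on success. */
static int
dispatch_ioctl(int (*handler)(struct descr_req *), struct descr_req *ureq)
{
	struct descr_req kreq = *ureq;	/* "copyin" */
	int error = handler(&kreq);

	if (error == 0)
		*ureq = kreq;		/* "copyout" */
	return (error);
}
```

With the old handler a caller that retries with the returned length would keep retrying with the same too-small size; with the new one it can see `buf == NULL`, read `len`, and allocate a larger buffer.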
Bjoern A. Zeeb
b6a02e249f MFC r206488:
Take a reference to make sure that the interface cannot go away during
  if_clone_destroy() in case parallel threads try to destroy it.

PR:		kern/116837
Submitted by:	Mikolaj Golub (to.my.trociny gmail.com)
2010-04-21 20:00:13 +00:00
Bjoern A. Zeeb
984c5c6804 MFC r206486:
Check that the interface is on the list of cloned interfaces before trying
  to remove it to avoid panics in case of two threads trying to remove it in
  parallel.

PR:	      kern/116837
Submitted by: Takahiro Kurosawa (takahiro.kurosawa gmail.com) (orig version)
2010-04-21 19:55:43 +00:00
Bjoern A. Zeeb
feb3a5f7df MFC r206481:
Plug reference leaks in the link-layer code ("new-arp") that previously
  prevented the link-layer entry from being freed.

  In both in.c and in6.c (though that code path seems to be basically dead)
  plug a reference leak in case of a pending callout being drained.

  In if_ether.c consistently add a reference before resetting the callout
  and in case we canceled a pending one remove the reference for that.
  In the final case in arptimer, before freeing the expired entry, remove
  the reference again and explicitly call callout_stop() to clear the active
  flag.

  In nd6.c:nd6_free() we are only ever called from the callout function and
  thus need to remove the reference there as well before calling into
  llentry_free().

  In if_llatbl.c when freeing the entire tables make sure that in case we
  cancel a pending callout to remove the reference as well.

  Reviewed by:          qingli (earlier version)
  MFC after:            10 days
  Problem observed, patch tested by: simon on ipv6gw.f.o,
                        Christian Kratzer (ck cksoft.de),
                        Evgenii Davidov (dado korolev-net.ru)
PR:			kern/144564
Configurations still affected:	with options FLOWTABLE
2010-04-21 19:51:22 +00:00
Bjoern A. Zeeb
1ed532bb3d MFC r206470:
In if_detach_internal() we cannot hold the af_data lock over the
  dom_ifdetach() calls as they might sleep for callout_drain().
  Do as we do in if_attachdomain1() [r121470] and handle
  if_afdata_initialized earlier and call dom_ifdetach() unlocked.

  Discussed with:       rwatson
2010-04-21 19:48:40 +00:00
Bjoern A. Zeeb
b0cf9f5f20 MFC r206469:
In if_detach_internal() only try to do the detach run if if_attachdomain1()
  has actually succeeded to initialize and attach.  There is a theoretical
  possibility to drop out early in if_attachdomain1() leaving the array
  uninitialized if we cannot get the lock.

  Discussed with:       rwatson
2010-04-21 19:47:19 +00:00
Bjoern A. Zeeb
407b19379c MFC r205345:
Split eventhandler_register() into an internal part and a wrapper function
  that provides the allocated and setup eventhandler entry.

  Add a new wrapper for VIMAGE that allocates extra space to hold the
  callback function and argument in addition to an extra wrapper function.
  While the wrapper function is registered as the normal callback function,
  the argument points to the extra space allocated holding the original func
  and arg that the wrapper function can then call.

  Provide an iterator function for the virtual network stack (vnet) that
  will call the callback function for each network stack.

  Provide a new set of macros for VNET that in the non-VIMAGE case will
  just call eventhandler_register() while in the VIMAGE case it will use
  vimage_eventhandler_register() passing in the extra iterator function
  but will only register once rather than per-vnet.
  We need a special macro in case we are interested in the tag returned
  as we must check for curvnet and can neither simply assign the
  return value, nor not change it in the non-vnet0 case without that.

  Discussed with:       jhb
  Reviewed by:  zec (earlier version), jhb
2010-04-21 19:45:41 +00:00
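
The wrapper scheme described in 407b19379c can be sketched in plain C. This is a simplified userland model, not the eventhandler(9) code: `struct wrapped_handler`, `vnet_handler_register()`, `vnet_iterator()`, `NVNETS`, `curvnet_id` and `count_call()` are all hypothetical names. The idea shown is that the registered callback is an iterator, and its argument points at storage holding the original function and argument, which the iterator then invokes once per network stack.

```c
#include <stdlib.h>

typedef void (*handler_fn)(void *arg);

/* Hypothetical stand-ins for the list of vnets and for "curvnet". */
#define	NVNETS	3
static int curvnet_id;

/* Extra space kept with the registration: original func and arg. */
struct wrapped_handler {
	handler_fn	 func;
	void		*arg;
};

/* Registered once; invokes the original callback for each vnet. */
static void
vnet_iterator(void *arg)
{
	struct wrapped_handler *w = arg;
	int saved = curvnet_id;

	for (int i = 0; i < NVNETS; i++) {
		curvnet_id = i;		/* analogue of CURVNET_SET() */
		w->func(w->arg);
	}
	curvnet_id = saved;		/* analogue of CURVNET_RESTORE() */
}

/* Allocate the wrapper state; the caller registers vnet_iterator with it. */
static struct wrapped_handler *
vnet_handler_register(handler_fn func, void *arg)
{
	struct wrapped_handler *w = malloc(sizeof(*w));

	if (w == NULL)
		return (NULL);
	w->func = func;
	w->arg = arg;
	return (w);
}

/* Example callback: adds (vnet index + 1) to the int behind arg. */
static void
count_call(void *arg)
{
	*(int *)arg += curvnet_id + 1;
}
```

Registering once and iterating inside the wrapper keeps the handler list the same length regardless of how many network stacks exist, which matches the "register once rather than per-vnet" note in the log.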
Xin LI
e546195f07 MFC r204901
Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg
interface.  The check itself seems to be coming from OpenBSD but does not
seem to be useful for our code.

Discussed with:	thomasa
2010-04-08 00:52:28 +00:00
Jung-uk Kim
29f7dafb4c MFC: r205095
Fix a style(9) nit.
2010-04-05 17:37:35 +00:00
Jung-uk Kim
7493cc345a MFC: r205858
Check the pointer to JIT binary filter before its de-allocation.
2010-04-05 17:32:49 +00:00
Qing Li
94190b3925 MFC 205222
Verify interface up status using its link state only
if the interface has such capability. The interface
capability flag indicates whether such capability
exists. This approach is much more backward compatible.
Physical device driver changes will be part of another
commit.

Also updated the ifconfig utility to show the LINKSTATE
capability if present.

Reviewed by:  rwatson, imp, juli
2010-04-02 05:12:46 +00:00
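
The capability-gated check described in 94190b3925 amounts to the following logic. The sketch uses hypothetical names (`struct model_ifnet`, `X_IFCAP_LINKSTATE`, `X_LINK_STATE_*`, `link_is_up()`) rather than the kernel's definitions: an interface that does not advertise link-state reporting is treated as up, and only interfaces that do advertise it are judged by their link state.

```c
/* Hypothetical capability bit and link states, for illustration only. */
#define	X_IFCAP_LINKSTATE	0x1u

enum { X_LINK_STATE_UNKNOWN, X_LINK_STATE_DOWN, X_LINK_STATE_UP };

struct model_ifnet {
	unsigned int	caps;		/* capability flags */
	int		link_state;	/* meaningful only with LINKSTATE */
};

/* Consult the link state only if the interface can report it. */
static int
link_is_up(const struct model_ifnet *ifp)
{
	if ((ifp->caps & X_IFCAP_LINKSTATE) == 0)
		return (1);
	return (ifp->link_state == X_LINK_STATE_UP);
}
```

A driver that never updates its link state therefore keeps working as before, which is the backward-compatibility point the log makes.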
Qing Li
243785f92f MFC 205024
The if_tap interface is of IFT_ETHERNET type, but it
does not set or update the if_link_state variable.
As such RT_LINK_IS_UP() fails for the if_tap interface.

Also, the RT_LINK_IS_UP() needs to bypass all loopback
interfaces because loopback interfaces are considered
up logically as long as the system is running.

This patch fixes the above issues by setting and updating
the if_link_state variable when the tap interface is
opened or closed respectively. A similar approach is
already used in the if_tun device.
2010-04-02 05:05:51 +00:00
Qing Li
c951da56b4 MFC 204902
One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix and the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.
2010-04-02 05:02:50 +00:00
Qing Li
01104d862b MFC 205077
The flow-table module retrieves the destination and source
address as well as the transport protocol port information
from the outbound packets. The routing code is generic and
compares every byte in the given sockaddr object. Therefore
the temporary sockaddr objects must be cleared due to padding
bytes. In addition, the port information must be stripped
or the route search will either fail or return the incorrect
route entry.

Unit testing is done using OpenVPN over the if_tun interface.
2010-04-01 20:23:43 +00:00
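
Why the temporary address objects in 01104d862b have to be cleared can be shown with a byte-wise comparison. The sketch below uses a hypothetical `struct addr_key` in place of a real sockaddr, and `make_route_key()` and `key_equal()` are illustrative names only. Because the lookup compares every byte, stale padding or a leftover port makes two keys for the same address compare unequal.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical key; stands in for a sockaddr with a port and padding. */
struct addr_key {
	uint8_t		len;
	uint8_t		family;
	uint16_t	port;
	uint32_t	addr;
	uint8_t		zero[8];	/* padding, included in comparisons */
};

/* Build a lookup key: clear everything first and leave the port at 0. */
static void
make_route_key(struct addr_key *k, uint32_t addr)
{
	memset(k, 0, sizeof(*k));
	k->len = (uint8_t)sizeof(*k);
	k->family = 2;		/* illustrative address family value */
	k->addr = addr;
}

/* Byte-wise comparison, as a generic lookup over the whole object does. */
static int
key_equal(const struct addr_key *a, const struct addr_key *b)
{
	return (memcmp(a, b, sizeof(*a)) == 0);
}
```

Two keys built by `make_route_key()` for the same address compare equal even if the underlying memory held garbage beforehand; setting a port on one of them breaks the match.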
Kip Macy
e952596a10 MFC 205066, 205069, 205093, 205097, 205488:
r205066:

Log:
 - restructure flowtable to support ipv6
 - add a name argument to flowtable_alloc for printing with ddb commands
 - extend ddb commands to print destination address or 4-tuples
 - don't parse ports in ulp header if FL_HASH_ALL is not passed
 - add kern_flowtable_insert to enable more generic use of flowtable
   (e.g. system calls for adding entries)
 - don't hash loopback addresses
 - cleanup whitespace
 - keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
 - add sysctls to accumulate stats and report aggregate

r205069:
Log:
 fix stats reporting sysctl

r205093:
Log:
 re-update copyright to 2010
 pointed out by danfe@

r205097:

Log:
 flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB

 pointed out by jkim@

r205488:

Log:
 - boot-time size the ipv4 flowtable and the maximum number of flows
 - increase flow cleaning frequency and decrease flow caching time
   when near the flow limit
 - stop allocating new flows when within 3% of maxflows; don't start
   allocating again until below 12.5%
2010-04-01 00:36:40 +00:00
Ed Maste
83867e0aa4 MFC r205411:
Avoid holding the VLAN_LOCK() over the parent interface SIOCGIFMEDIA
  ioctl call, as it may sleep.

  Reviewed by:    rwatson
2010-03-29 00:08:58 +00:00
Bjoern A. Zeeb
0b93ad54a2 MFC r205276:
Add ddb support to the "new" link layer code ("new-arp"):
   - show all lltables [1] (optional flag to show the llentries as well)
   - show lltable <struct lltable *>
   - show llentry <struct llentry *>
2010-03-27 17:52:56 +00:00
Bjoern A. Zeeb
a4f6889460 MFC r204805:
Rework reference counting in case we queue into the netisr,
  or overflow the netisr queue and fall back to the interface
  queue so that we can guarantee that the ifnet pointer stays
  valid.   Formerly we ended up with reference counts <= 0 in
  case the netisr had returned ENOBUFS.  The idea is to track
  any packet in the netisr queue and only change the refcount
  on edge operations for the fallback interface queue. This
  also avoids problems in case the if_snd.ifq_len lies to us.

  Also rework refcount assertions to make sure they trigger if
  we go below 1. Formerly a negative reference count did not
  trigger the assert as the refcount variable is u_int.
2010-03-27 17:48:13 +00:00
Bjoern A. Zeeb
1386abc0a7 MFC r204279:
Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets'
  in the db_show_all_table as 'show all ifnets' and with that follow the
  convention for showing complete lists.

  Submitted by: thompsa
2010-03-27 17:40:28 +00:00
Bjoern A. Zeeb
0519db7239 MFC r204145:
Start to implement ifnet DDB support:
  - 'show ifnets' prints a list of ifnet *s per virtual network stack,
  - 'show ifnet <struct ifnet *>' prints fields matching the given ifp.

  We do not yet print the complete set of fields and might want to
  factor this out to an extra if_debug.c file in case this grows
  a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1].

  Suggested by: rwatson [1]
  Reviewed by:  rwatson
2010-03-27 17:39:02 +00:00
Bjoern A. Zeeb
9bdad32791 MFC r204142:
Enhance a panic string to contain more useful debugging information.
2010-03-27 17:33:19 +00:00
Bjoern A. Zeeb
78ba8b295c MFC r203729:
Add DDB support for printing vnet_sysinit and vnet_sysuninit
  ordered call lists. Try to lookup function/symbol names and print
  those in addition to the pointers, along with the constants for
  subsystem and order.
  This is useful for debugging vnet teardown ordering issues.

  Make it possible to call the actual printing function from normal
  code at runtime, i.e. from vnet_sysuninit(), if DDB support is there.
2010-03-27 17:31:54 +00:00
Bjoern A. Zeeb
72ec67fcb7 MFC r203727:
Add an SDT provider for "vnet"s along with probes for vnet_alloc
  and vnet_destroy.
  Use the line number rather than NULL as dummy argument.

  Note: the fbt provider does not reliably provide :return probes
  (depending on optimization levels used at compile time) making
  it unusable for scripts to generate complete call-traces with
  well defined boundaries over allocations or destructions of
  virtual network stacks.
2010-03-27 17:29:50 +00:00
Luigi Rizzo
8018e843a3 MFC of a large number of ipfw and dummynet fixes and enhancements
done in CURRENT over the last 4 months.
HEAD and RELENG_8 are almost in sync now for ipfw, dummynet,
the pfil hooks and related components.

Among the most noticeable changes:
- r200855 more efficient lookup of skipto rules, and remove O(N)
  blocks from critical sections in the kernel;
- r204591 large restructuring of the dummynet module, with support
  for multiple scheduling algorithms (4 available so far)
See the original commit logs for details.

Changes in the kernel/userland ABI should be harmless because the
kernel is able to understand previous requests from RELENG_8 and
RELENG_7. For this reason, this changeset would be applicable
to RELENG_7 as well, but I am not sure if it is worthwhile.
2010-03-23 09:58:59 +00:00
Pyun YongHyeon
9832320129 MFC r204156:
Add __FBSDID.
2010-03-22 23:23:47 +00:00
Hiroki Sato
7fe6975097 MFC r203272:
- Fix a bug where adding an interface with an invalid MTU set the
  bridge's MTU, if it was the first member added, even though the
  addition itself failed.

- Allow SIOCSIFMTU only when all members have the same MTU.

- Remove IFT_GIF check when defining the bridge MTU by the
  first-added interface's one.  The MTU of the gif interface
  has to be the same as the bridge's one.
2010-03-22 22:07:19 +00:00
Jung-uk Kim
1288863fa2 MFC: r205092
Tidy up callout for select(2) and read timeout.

- Add a missing callout_drain(9) before the descriptor deallocation.[1]
- Prefer callout_init_mtx(9) over callout_init(9) and let the callout
subsystem handle the mutex for callout function.

PR:		kern/144453
Submitted by:	Alexander Sack (asack at niksun dot com)[1]
2010-03-22 19:59:00 +00:00
Pyun YongHyeon
d5eda01f75 MFC r204149:
Add TSO support on VLANs. Intentionally separated IFCAP_VLAN_HWTSO
  from IFCAP_VLAN_HWTAGGING. I think some hardware may be able to
  do TSO over VLAN without VLAN hardware tagging.
  Driver changes and userland support will follow.
2010-03-18 19:04:04 +00:00
Max Laier
29f2c008fd MFC r203834 and r205197: Make ALTQ work for drbr consumers.
2010-03-18 17:00:44 +00:00
Konstantin Belousov
272a1b6901 MFC r204464:
Several fixes for miscellaneous clone handlers in if_tun and if_tap.
2010-03-07 09:52:35 +00:00
Xin LI
a5a931b33f MFC 203052:
Add interface description capability as inspired by OpenBSD.  Thanks to
rwatson@, jhb@, brooks@ and others for feedback on the old implementation!

Sponsored by:	iXsystems, Inc.
2010-02-26 00:54:47 +00:00
Jung-uk Kim
7cfd788d1a MFC: r204105
Return partially filled buffer for non-blocking read(2)
in non-immediate mode.

PR:		kern/143855
Submitted by:	Guy Harris (guy at alum dot mit dot edu)
2010-02-26 00:11:17 +00:00
Matt Jacob
7733cf8fff MFC a number of changes from head for ISP (203478,203463,203444,202418,201758,
201408,201325,200089,198822,197373,197372,197214,196162). Since one of those
changes was a semicolon cleanup from somebody else, this touches a lot more.
2010-02-11 18:34:06 +00:00
Marko Zec
7526c9dfc7 MFC r203483:
Instead of spamming the console on each curvnet recursion event, print
  out each such call graph only once, along with a stack backtrace.  This
  should make kernels built with VNET_DEBUG reasonably usable again in
  busy / production environments.

  Introduce a new DDB command "show vnetrcrs" which dumps the whole log
  of distinctive curvnet recursion events.  This might be useful when
  recursion reports get buried / lost too deep in the message buffer.
  In the latter case stack backtraces are not available.

  Reviewed by:  bz
2010-02-10 08:50:06 +00:00
Julian Elischer
2ae7ec29fd MFC of 197952 and 198075
Virtualize the pfil hooks so that different jails may choose different
    packet filters. Also allows ipfw to be enabled on one jail and disabled
    on another. In 8.0 it's a global setting.
and
    Unbreak the VIMAGE build with IPSEC, broken with r197952 by
    virtualizing the pfil hooks.
    For consistency add the V_ to virtualize the pfil hooks in here as well.
2010-02-07 09:00:22 +00:00
Shteryana Shopova
3ddba6330c MFC r202935:
While flushing the multicast filter of an interface, do not zero the relevant
ifmultiaddr structures' reference to the parent interface, unless the parent
interface is really detaching. While here, program only link layer multicast
filters to a wlan's hardware parent interface.

PR:		kern/142391, kern/142392
Reviewed by:	sam, rpaulo, bms
2010-01-31 11:30:28 +00:00
George V. Neville-Neil
fbbbfe0ba5 MFC r196797:
Add ARP statistics to the kernel and netstat.
2010-01-28 16:48:44 +00:00
Brooks Davis
52c240aaf4 MFC r201350:
The devices that supported EVFILT_NETDEV kqueue filters were removed in
  r195175.  Remove all definitions, documentation, and usage.

The change of function signature for vlan_link_state() was not merged to
maintain the ABI.
2010-01-22 19:51:34 +00:00
Bjoern A. Zeeb
cd10550438 MFC r201995:
Correct a typo.
2010-01-17 13:38:11 +00:00
Qing Li
130fd3bc32 MFC r201319
Remove a deleted comment line that was brought back by
my previous commit.
2010-01-05 22:37:05 +00:00
Qing Li
32c5340155 MFC r201282, r201543
r201282
-------
Proxy ARP entries could not be added to the system over
IFF_POINTOPOINT link types. The reason was that the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates that the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations because multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of the loopback route to the local end
would be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

r201543
-------
The IFA_RTSELF address flag marks that a loopback route has been installed
for the interface address. This marker is necessary to properly support
PPP types of links where multiple links can have the same local end
IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which
was combined into the route flag bits during prefix installation in
IPv6. This inclusion caused the prefix route to be unusable. This
patch fixes this bug by excluding the IFA_RTSELF flag during route
installation.

PR:		ports/141342, kern/141134
2010-01-05 22:14:55 +00:00
John Baldwin
02bcb7ecc3 MFC 201196:
Change vlan interfaces to cope more usefully with the parent interface being
renamed.  Previously the vlan interfaces would lose their configuration as if
the parent interface had been physically removed.  Now vlan interfaces ignore
rename events.
- Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being
  renamed.  This flag can be checked in ifnet departure/arrival event
  handlers to treat rename events differently.
- Change the ifnet departure event handler in the if_vlan(4) driver to
  ignore departure events due to a trunk interface being renamed.
2010-01-05 18:25:41 +00:00
John Baldwin
eee4cfb98f MFC 201351:
Use stricter checking to match possible vlan clones by not allowing extra
garbage characters around or within the tag.
2010-01-04 22:44:48 +00:00
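
What "stricter checking" of a vlan clone name can look like is sketched below. This is an assumption-laden illustration, not the if_vlan(4) matcher from eee4cfb98f: `parse_vlan_tag()` is a hypothetical function, and the accepted form (`<parent>.<tag>` with one to four decimal digits and a 12-bit upper bound) is chosen for the example. The point is that any character around or within the tag causes the name to be rejected.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical strict parser for "<parent>.<tag>".  The tag must consist
 * of one to four decimal digits with nothing before or after them.
 * Returns the tag, or -1 if the name does not match.
 */
static int
parse_vlan_tag(const char *name)
{
	const char *dot = strrchr(name, '.');
	const char *p;
	int tag = 0;

	/* Need a non-empty parent name and at least one tag character. */
	if (dot == NULL || dot == name || dot[1] == '\0')
		return (-1);
	for (p = dot + 1; *p != '\0'; p++) {
		if (!isdigit((unsigned char)*p) || p - dot > 4)
			return (-1);
		tag = tag * 10 + (*p - '0');
	}
	return (tag > 4095 ? -1 : tag);
}
```

A lenient parser built on a plain numeric conversion would accept names such as `em0.100x` or `em0.+5`; the loop above rejects both.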
Bjoern A. Zeeb
950cde5085 MFC r200473:
Throughout the network stack we have a few places of
        if (jailed(cred))
  left.  If you are running with a vnet (virtual network stack) those will
  return true and defer you to classic IP-jails handling and thus things
  will be "denied" or returned with an error.

  Work around this problem by introducing another "jailed()" function,
  jailed_without_vnet(), that also takes vnets into account, and permits
  the calls, should the jail from the given cred have its own virtual
  network stack.

  We cannot change the classic jailed() call to do that,  as it is used
  outside the network stack as well.

  Discussed with:       julian, zec, jamie, rwatson (back in Sept)
2009-12-28 14:40:58 +00:00
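
The decision that 950cde5085 describes for jailed_without_vnet() can be modelled as follows. The structures here (`struct model_prison`, `struct model_cred`) and the `model_` function names are hypothetical simplifications, not the kernel's `struct prison` or `struct ucred`; the sketch only captures the rule that a jail with its own virtual network stack is not subject to the classic IP-jail restrictions.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for prison and credential state. */
struct model_prison {
	bool	is_host;	/* true for the non-jailed host environment */
	bool	owns_vnet;	/* true if the jail has its own network stack */
};

struct model_cred {
	const struct model_prison *prison;
};

/* Classic check: any jail counts as jailed. */
static bool
model_jailed(const struct model_cred *cred)
{
	return (!cred->prison->is_host);
}

/* Network-stack check: jails that own a vnet are not restricted. */
static bool
model_jailed_without_vnet(const struct model_cred *cred)
{
	if (!model_jailed(cred))
		return (false);
	if (cred->prison->owns_vnet)
		return (false);
	return (true);
}
```

Keeping the classic check unchanged and adding a second predicate mirrors the reasoning in the log: callers outside the network stack still need the original answer.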
Robert Watson
19c576c8b2 Merge r198417 from head to stable/8:
Remove unneeded blank line from bpf_drvinit().
2009-12-14 11:45:53 +00:00
Michael Tuexen
cf19fced17 MFC 197288,197326,197327,197328,197342,197914,197929,
197955,199365,199370,199371,199373,199866
This MFCs all SCTP/VNET relevant fixes from head.

Approved by: rrs (mentor)
2009-12-07 07:33:51 +00:00