2005-01-07 01:45:51 +00:00
|
|
|
/*-
|
1994-05-24 10:09:53 +00:00
|
|
|
* Copyright (c) 1982, 1986, 1988, 1993
|
|
|
|
* The Regents of the University of California. All rights reserved.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
* 4. Neither the name of the University nor the names of its contributors
|
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
|
|
|
* @(#)ip_icmp.c 8.2 (Berkeley) 1/4/94
|
|
|
|
*/
|
|
|
|
|
2007-10-07 20:44:24 +00:00
|
|
|
#include <sys/cdefs.h>
|
|
|
|
__FBSDID("$FreeBSD$");
|
|
|
|
|
2011-04-27 19:36:35 +00:00
|
|
|
#include "opt_inet.h"
|
1999-12-22 19:13:38 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/param.h>
|
|
|
|
#include <sys/systm.h>
|
|
|
|
#include <sys/mbuf.h>
|
|
|
|
#include <sys/protosw.h>
|
|
|
|
#include <sys/socket.h>
|
|
|
|
#include <sys/time.h>
|
|
|
|
#include <sys/kernel.h>
|
1995-03-16 18:17:34 +00:00
|
|
|
#include <sys/sysctl.h>
|
2010-08-14 21:04:27 +00:00
|
|
|
#include <sys/syslog.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
#include <net/if.h>
|
2013-10-26 17:58:36 +00:00
|
|
|
#include <net/if_var.h>
|
2001-09-25 18:40:52 +00:00
|
|
|
#include <net/if_types.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <net/route.h>
|
Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
2009-07-14 22:48:30 +00:00
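The accessor idea described in the commit message above can be illustrated in user space. This is a minimal sketch, not the kernel implementation: `vnet_globals`, `vnet_sketch`, `curvnet_sketch`, and `VNET_SKETCH()` are hypothetical names, and a plain struct stands in for the linker-set region. It only shows a per-instance copy being initialized from a reference copy and then reached as "current instance base + symbol offset".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * User-space sketch only. All names here are hypothetical and do not
 * correspond to the kernel's real VNET macros or data structures.
 */

/* Stand-in for the region that holds every virtualized global. */
struct vnet_globals {
	int icmplim;
	int icmplim_output;
};

/* The "reference copy", statically initialized like ordinary globals. */
static const struct vnet_globals vnet_reference = {
	.icmplim = 200,
	.icmplim_output = 1,
};

/* One instance per (simulated) virtual network stack. */
struct vnet_sketch {
	char *data;	/* private copy of the reference region */
};

/* The "current vnet" that the accessor resolves against. */
static struct vnet_sketch *curvnet_sketch;

/* Create an instance whose globals start out equal to the reference copy. */
static struct vnet_sketch *
vnet_sketch_alloc(void)
{
	struct vnet_sketch *vn = malloc(sizeof(*vn));

	if (vn == NULL)
		return (NULL);
	vn->data = malloc(sizeof(vnet_reference));
	if (vn->data == NULL) {
		free(vn);
		return (NULL);
	}
	memcpy(vn->data, &vnet_reference, sizeof(vnet_reference));
	return (vn);
}

static void
vnet_sketch_free(struct vnet_sketch *vn)
{
	free(vn->data);
	free(vn);
}

/* Accessor: current instance base + symbol offset -> per-instance lvalue. */
#define	VNET_SKETCH(type, field)					\
	(*(type *)(void *)(curvnet_sketch->data +			\
	    offsetof(struct vnet_globals, field)))
```

With two instances allocated, writing `VNET_SKETCH(int, icmplim)` while one is current leaves the other instance's value at the reference default, which is the isolation property the commit message describes.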
|
|
|
#include <net/vnet.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
#include <netinet/in.h>
|
2003-11-20 20:07:39 +00:00
|
|
|
#include <netinet/in_pcb.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <netinet/in_systm.h>
|
|
|
|
#include <netinet/in_var.h>
|
|
|
|
#include <netinet/ip.h>
|
|
|
|
#include <netinet/ip_icmp.h>
|
1995-03-16 18:17:34 +00:00
|
|
|
#include <netinet/ip_var.h>
|
2005-11-18 20:12:40 +00:00
|
|
|
#include <netinet/ip_options.h>
|
2003-11-20 20:07:39 +00:00
|
|
|
#include <netinet/tcp.h>
|
|
|
|
#include <netinet/tcp_var.h>
|
|
|
|
#include <netinet/tcpip.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <netinet/icmp_var.h>
|
|
|
|
|
2011-04-27 19:36:35 +00:00
|
|
|
#ifdef INET
|
2002-10-16 02:25:05 +00:00
|
|
|
|
2000-05-06 18:19:58 +00:00
|
|
|
#include <machine/in_cksum.h>
|
|
|
|
|
2006-10-22 11:52:19 +00:00
|
|
|
#include <security/mac/mac_framework.h>
|
2011-04-27 19:36:35 +00:00
|
|
|
#endif /* INET */
|
2006-10-22 11:52:19 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* ICMP routines: error generation, receive packet processing, and
|
|
|
|
* routines to turn around packets back to the originator, and
|
|
|
|
* host table maintenance routines.
|
|
|
|
*/
|
2011-04-27 19:36:35 +00:00
|
|
|
static VNET_DEFINE(int, icmplim) = 200;
|
|
|
|
#define V_icmplim VNET(icmplim)
|
2014-11-07 09:39:05 +00:00
|
|
|
SYSCTL_INT(_net_inet_icmp, ICMPCTL_ICMPLIM, icmplim, CTLFLAG_VNET | CTLFLAG_RW,
|
2011-04-27 19:36:35 +00:00
|
|
|
&VNET_NAME(icmplim), 0,
|
|
|
|
"Maximum number of ICMP responses per second");
|
|
|
|
|
|
|
|
static VNET_DEFINE(int, icmplim_output) = 1;
|
|
|
|
#define V_icmplim_output VNET(icmplim_output)
|
2014-11-07 09:39:05 +00:00
|
|
|
SYSCTL_INT(_net_inet_icmp, OID_AUTO, icmplim_output, CTLFLAG_VNET | CTLFLAG_RW,
|
2011-04-27 19:36:35 +00:00
|
|
|
&VNET_NAME(icmplim_output), 0,
|
2012-01-07 00:11:36 +00:00
|
|
|
"Enable logging of ICMP response rate limiting");
|
2011-04-27 19:36:35 +00:00
|
|
|
|
|
|
|
#ifdef INET
|
2013-07-09 09:50:15 +00:00
|
|
|
VNET_PCPUSTAT_DEFINE(struct icmpstat, icmpstat);
|
|
|
|
VNET_PCPUSTAT_SYSINIT(icmpstat);
|
|
|
|
SYSCTL_VNET_PCPUSTAT(_net_inet_icmp, ICMPCTL_STATS, stats, struct icmpstat,
|
|
|
|
icmpstat, "ICMP statistics (struct icmpstat, netinet/icmp_var.h)");
|
|
|
|
|
|
|
|
#ifdef VIMAGE
|
|
|
|
VNET_PCPUSTAT_SYSUNINIT(icmpstat);
|
|
|
|
#endif /* VIMAGE */
|
2009-07-14 22:48:30 +00:00
|
|
|
|
2010-11-22 19:32:54 +00:00
|
|
|
static VNET_DEFINE(int, icmpmaskrepl) = 0;
|
2010-04-29 11:52:42 +00:00
|
|
|
#define V_icmpmaskrepl VNET(icmpmaskrepl)
|
2014-11-07 09:39:05 +00:00
|
|
|
SYSCTL_INT(_net_inet_icmp, ICMPCTL_MASKREPL, maskrepl, CTLFLAG_VNET | CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(icmpmaskrepl), 0,
|
Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit
Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.
Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().
Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).
All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).
(*) netipsec/keysock.c did not validate depending on compile time options.
Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
|
|
|
"Reply to ICMP Address Mask Request packets.");
|
2003-03-21 15:43:06 +00:00
|
|
|
|
2010-11-22 19:32:54 +00:00
|
|
|
static VNET_DEFINE(u_int, icmpmaskfake) = 0;
|
2010-04-29 11:52:42 +00:00
|
|
|
#define V_icmpmaskfake VNET(icmpmaskfake)
|
2014-11-07 09:39:05 +00:00
|
|
|
SYSCTL_UINT(_net_inet_icmp, OID_AUTO, maskfake, CTLFLAG_VNET | CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(icmpmaskfake), 0,
|
|
|
|
"Fake reply to ICMP Address Mask Request packets.");
|
1995-11-14 20:34:56 +00:00
|
|
|
|
2012-10-10 19:06:11 +00:00
|
|
|
VNET_DEFINE(int, drop_redirect) = 0;
|
Kill custom in_matroute() radix matching function, removing one rte mutex lock.
Initially, in_matroute() and in_clsroute() in their current state were introduced by
r4105 20 years ago. Instead of deleting inactive routes immediately, we kept them
in the route table, setting the RTPRF_OURS flag and some expire time. After that, either
GC came or RTPRF_OURS got removed on first packet. It was a good solution
in those days (and probably another decade after that) to keep TCP metrics.
However, after moving metrics to TCP hostcache in r122922, most of in_rmx
functionality became unused. It might have been used for flushing icmp-originated
routes before rte mutexes/refcounting, but I'm not sure about that.
So it looks like it is nearly impossible to make GC do its work nowadays:
in_rtkill() ignores non-RTPRF_OURS routes.
A route can only become RTPRF_OURS after dropping the last reference via rtfree()
which calls in_clsroute(), which, in turn, ignores UP and non-RTF_DYNAMIC routes.
Dynamic routes can still be installed via received redirect, but they
have default lifetime (no specific rt_expire) and no one has another trie walker
to call RTFREE() on them.
So, the changelist:
* remove custom rnh_match / rnh_close matching function.
* remove all GC functions
* partially revert r256695 (proto3 is no more used inside kernel,
it is not possible to use rt_expire from user point of view, proto3 support
is not complete)
* Finish r241884 (similar to this commit) and remove remaining IPv6 parts
MFC after: 1 month
2014-11-11 02:52:40 +00:00
|
|
|
#define V_drop_redirect VNET(drop_redirect)
|
|
|
|
SYSCTL_INT(_net_inet_icmp, OID_AUTO, drop_redirect, CTLFLAG_VNET | CTLFLAG_RW,
|
|
|
|
&VNET_NAME(drop_redirect), 0, "Ignore ICMP redirects");
|
1999-08-10 09:45:33 +00:00
|
|
|
|
2010-11-22 19:32:54 +00:00
|
|
|
static VNET_DEFINE(int, log_redirect) = 0;
|
2010-04-29 11:52:42 +00:00
|
|
|
#define V_log_redirect VNET(log_redirect)
|
2014-11-07 09:39:05 +00:00
|
|
|
SYSCTL_INT(_net_inet_icmp, OID_AUTO, log_redirect, CTLFLAG_VNET | CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(log_redirect), 0,
|
|
|
|
"Log ICMP redirects to the console");
|
1999-09-14 16:40:28 +00:00
|
|
|
|
2010-11-22 19:32:54 +00:00
|
|
|
static VNET_DEFINE(char, reply_src[IFNAMSIZ]);
|
2010-04-29 11:52:42 +00:00
|
|
|
#define V_reply_src VNET(reply_src)
|
2014-11-07 09:39:05 +00:00
|
|
|
SYSCTL_STRING(_net_inet_icmp, OID_AUTO, reply_src, CTLFLAG_VNET | CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(reply_src), IFNAMSIZ,
|
2008-10-02 15:37:58 +00:00
|
|
|
"icmp reply source for non-local packets.");
|
2004-02-02 22:53:16 +00:00
|
|
|
|
2010-11-22 19:32:54 +00:00
|
|
|
static VNET_DEFINE(int, icmp_rfi) = 0;
|
2010-04-29 11:52:42 +00:00
|
|
|
#define V_icmp_rfi VNET(icmp_rfi)
|
2014-11-07 09:39:05 +00:00
|
|
|
SYSCTL_INT(_net_inet_icmp, OID_AUTO, reply_from_interface, CTLFLAG_VNET | CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(icmp_rfi), 0,
|
|
|
|
"ICMP reply from incoming interface for non-local packets");
|
2005-08-21 12:29:39 +00:00
|
|
|
|
2010-11-22 19:32:54 +00:00
|
|
|
static VNET_DEFINE(int, icmp_quotelen) = 8;
|
2010-04-29 11:52:42 +00:00
|
|
|
#define V_icmp_quotelen VNET(icmp_quotelen)
|
2014-11-07 09:39:05 +00:00
|
|
|
SYSCTL_INT(_net_inet_icmp, OID_AUTO, quotelen, CTLFLAG_VNET | CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(icmp_quotelen), 0,
|
|
|
|
"Number of bytes from original packet to quote in ICMP reply");
|
2005-08-21 15:09:07 +00:00
|
|
|
|
1998-12-04 04:21:25 +00:00
|
|
|
/*
|
|
|
|
* ICMP broadcast echo sysctl
|
|
|
|
*/
|
2010-11-22 19:32:54 +00:00
|
|
|
static VNET_DEFINE(int, icmpbmcastecho) = 0;
|
2010-04-29 11:52:42 +00:00
|
|
|
#define V_icmpbmcastecho VNET(icmpbmcastecho)
|
2014-11-07 09:39:05 +00:00
|
|
|
SYSCTL_INT(_net_inet_icmp, OID_AUTO, bmcastecho, CTLFLAG_VNET | CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(icmpbmcastecho), 0,
|
|
|
|
"");
|
1997-08-25 01:25:31 +00:00
|
|
|
|
2014-10-01 18:07:34 +00:00
|
|
|
static VNET_DEFINE(int, icmptstamprepl) = 1;
|
|
|
|
#define V_icmptstamprepl VNET(icmptstamprepl)
|
|
|
|
SYSCTL_INT(_net_inet_icmp, OID_AUTO, tstamprepl, CTLFLAG_VNET | CTLFLAG_RW,
|
|
|
|
&VNET_NAME(icmptstamprepl), 0, "Respond to ICMP Timestamp packets");
|
1998-12-03 20:23:21 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
#ifdef ICMPPRINTFS
|
|
|
|
int icmpprintfs = 0;
|
|
|
|
#endif
|
|
|
|
|
2002-03-19 21:25:46 +00:00
|
|
|
static void icmp_reflect(struct mbuf *);
|
2003-11-14 21:48:57 +00:00
|
|
|
static void icmp_send(struct mbuf *, struct mbuf *);
|
1995-11-14 20:34:56 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
extern struct protosw inetsw[];
|
|
|
|
|
2009-08-02 19:43:32 +00:00
|
|
|
/*
|
|
|
|
* Kernel module interface for updating icmpstat. The argument is an index
|
|
|
|
* into icmpstat treated as an array of u_long. While this encodes the
|
|
|
|
* general layout of icmpstat into the caller, it doesn't encode its
|
|
|
|
* location, so that future changes to add, for example, per-CPU stats
|
|
|
|
* support won't cause binary compatibility problems for kernel modules.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
kmod_icmpstat_inc(int statnum)
|
|
|
|
{
|
|
|
|
|
2013-07-09 09:50:15 +00:00
|
|
|
counter_u64_add(VNET(icmpstat)[statnum], 1);
|
2009-08-02 19:43:32 +00:00
|
|
|
}
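The comment above kmod_icmpstat_inc() describes addressing a statistics structure by counter index, so that callers know the layout but not the storage location. A minimal user-space sketch of that pattern follows, under stated assumptions: `stat_sketch` and its fields are hypothetical, and plain `uint64_t` counters stand in for the per-CPU counters the kernel code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * User-space sketch only. struct stat_sketch is a hypothetical stand-in;
 * the real struct icmpstat has different and more numerous fields.
 */
struct stat_sketch {
	uint64_t errors;
	uint64_t oldicmp;
	uint64_t reflected;
};

/* Number of counters, and the index of a named counter. */
#define	STAT_SKETCH_COUNT						\
	(sizeof(struct stat_sketch) / sizeof(uint64_t))
#define	STAT_SKETCH_INDEX(field)					\
	(offsetof(struct stat_sketch, field) / sizeof(uint64_t))

/* Storage is private to this file; callers only ever pass an index. */
static struct stat_sketch stat_storage;

/* Locate the counter at the given index inside the private storage. */
static uint64_t *
stat_sketch_slot(size_t statnum)
{
	assert(statnum < STAT_SKETCH_COUNT);
	return ((uint64_t *)(void *)((char *)&stat_storage +
	    statnum * sizeof(uint64_t)));
}

/* Increment the counter selected by index. */
void
stat_sketch_inc(size_t statnum)
{
	(*stat_sketch_slot(statnum))++;
}

/* Read the counter selected by index. */
uint64_t
stat_sketch_get(size_t statnum)
{
	return (*stat_sketch_slot(statnum));
}
```

A caller would write `stat_sketch_inc(STAT_SKETCH_INDEX(oldicmp))`. Because only the index crosses the interface, the storage behind `stat_sketch_slot()` could later move, for example to per-CPU memory, without recompiling callers.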
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Generate an error packet of type error
|
|
|
|
* in response to bad packet ip.
|
|
|
|
*/
|
|
|
|
void
|
2009-02-13 15:14:43 +00:00
|
|
|
icmp_error(struct mbuf *n, int type, int code, uint32_t dest, int mtu)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
|
|
|
register struct ip *oip = mtod(n, struct ip *), *nip;
|
2005-11-18 14:48:42 +00:00
|
|
|
register unsigned oiphlen = oip->ip_hl << 2;
|
1994-05-24 10:09:53 +00:00
|
|
|
register struct icmp *icp;
|
|
|
|
register struct mbuf *m;
|
2005-11-18 14:48:42 +00:00
|
|
|
unsigned icmplen, icmpelen, nlen;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2005-11-18 14:48:42 +00:00
|
|
|
KASSERT((u_int)type <= ICMP_MAXTYPE, ("%s: illegal ICMP type", __func__));
|
1994-05-24 10:09:53 +00:00
|
|
|
#ifdef ICMPPRINTFS
|
|
|
|
if (icmpprintfs)
|
1994-10-02 17:48:58 +00:00
|
|
|
printf("icmp_error(%p, %x, %d)\n", oip, type, code);
|
1994-05-24 10:09:53 +00:00
|
|
|
#endif
|
|
|
|
if (type != ICMP_REDIRECT)
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_error);
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2005-11-18 14:48:42 +00:00
|
|
|
* Don't send error:
|
|
|
|
* if the original packet was encrypted.
|
|
|
|
* if not the first fragment of message.
|
|
|
|
* in response to a multicast or broadcast packet.
|
|
|
|
* if the old packet protocol was an ICMP error message.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
2004-06-07 09:56:59 +00:00
|
|
|
if (n->m_flags & M_DECRYPTED)
|
|
|
|
goto freeit;
|
2012-10-22 21:09:03 +00:00
|
|
|
if (oip->ip_off & htons(~(IP_MF|IP_DF)))
|
2005-11-18 14:48:42 +00:00
|
|
|
goto freeit;
|
|
|
|
if (n->m_flags & (M_BCAST|M_MCAST))
|
1994-05-24 10:09:53 +00:00
|
|
|
goto freeit;
|
|
|
|
if (oip->ip_p == IPPROTO_ICMP && type != ICMP_REDIRECT &&
|
2005-11-18 14:48:42 +00:00
|
|
|
n->m_len >= oiphlen + ICMP_MINLEN &&
|
|
|
|
!ICMP_INFOTYPE(((struct icmp *)((caddr_t)oip + oiphlen))->icmp_type)) {
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_oldicmp);
|
1994-05-24 10:09:53 +00:00
|
|
|
goto freeit;
|
|
|
|
}
|
2005-11-18 14:48:42 +00:00
|
|
|
/* Drop if IP header plus 8 bytes is not contiguous in first mbuf. */
|
|
|
|
if (oiphlen + 8 > n->m_len)
|
1994-05-24 10:09:53 +00:00
|
|
|
goto freeit;
|
2005-08-21 15:09:07 +00:00
|
|
|
/*
|
|
|
|
* Calculate length to quote from original packet and
|
|
|
|
* prevent the ICMP mbuf from overflowing.
|
2005-11-18 14:48:42 +00:00
|
|
|
* Unfortunately this is non-trivial since ip_forward()
|
|
|
|
* sends us truncated packets.
|
2005-08-21 15:09:07 +00:00
|
|
|
*/
|
2005-11-18 14:48:42 +00:00
|
|
|
nlen = m_length(n, NULL);
|
2005-08-22 14:12:18 +00:00
|
|
|
if (oip->ip_p == IPPROTO_TCP) {
|
|
|
|
struct tcphdr *th;
|
|
|
|
int tcphlen;
|
|
|
|
|
2005-11-18 14:48:42 +00:00
|
|
|
if (oiphlen + sizeof(struct tcphdr) > n->m_len &&
|
|
|
|
n->m_next == NULL)
|
|
|
|
goto stdreply;
|
|
|
|
if (n->m_len < oiphlen + sizeof(struct tcphdr) &&
|
|
|
|
((n = m_pullup(n, oiphlen + sizeof(struct tcphdr))) == NULL))
|
2005-08-22 14:12:18 +00:00
|
|
|
goto freeit;
|
2005-11-18 14:48:42 +00:00
|
|
|
th = (struct tcphdr *)((caddr_t)oip + oiphlen);
|
2005-08-22 14:12:18 +00:00
|
|
|
tcphlen = th->th_off << 2;
|
|
|
|
if (tcphlen < sizeof(struct tcphdr))
|
|
|
|
goto freeit;
|
2012-10-22 21:09:03 +00:00
|
|
|
if (ntohs(oip->ip_len) < oiphlen + tcphlen)
|
2005-08-22 14:12:18 +00:00
|
|
|
goto freeit;
|
2005-11-18 14:48:42 +00:00
|
|
|
if (oiphlen + tcphlen > n->m_len && n->m_next == NULL)
|
|
|
|
goto stdreply;
|
|
|
|
if (n->m_len < oiphlen + tcphlen &&
|
|
|
|
((n = m_pullup(n, oiphlen + tcphlen)) == NULL))
|
2005-08-22 14:12:18 +00:00
|
|
|
goto freeit;
|
2012-10-22 21:09:03 +00:00
|
|
|
icmpelen = max(tcphlen, min(V_icmp_quotelen,
|
|
|
|
ntohs(oip->ip_len) - oiphlen));
|
2005-08-22 14:12:18 +00:00
|
|
|
} else
|
2012-10-22 21:09:03 +00:00
|
|
|
stdreply: icmpelen = max(8, min(V_icmp_quotelen, ntohs(oip->ip_len) - oiphlen));
|
2005-11-18 14:48:42 +00:00
|
|
|
|
|
|
|
icmplen = min(oiphlen + icmpelen, nlen);
|
It was possible for ip_forward() to supply to icmp_error()
an IP header with ip_len in network byte order. For certain
values of ip_len, this could cause icmp_error() to write
beyond the end of an mbuf, causing mbuf free-list corruption.
This problem was observed during generation of ICMP redirects.
We now make quite sure that the copy of the IP header kept
for icmp_error() is stored in a non-shared mbuf header so
that it will not be modified by ip_output().
Also:
- Calculate the correct number of bytes that need to be
retained for icmp_error(), instead of assuming that 64
is enough (it's not).
- In icmp_error(), use m_copydata instead of bcopy() to
copy from the supplied mbuf chain, in case the first 8
bytes of IP payload are not stored directly after the IP
header.
- Sanity-check ip_len in icmp_error(), and panic if it is
less than sizeof(struct ip). Incoming packets with bad
ip_len values are discarded in ip_input(), so this should
only be triggered by bugs in the code, not by bad packets.
This patch results from code and suggestions from Ruslan, Bosko,
Jonathan Lemon and Matt Dillon, with important testing by Mike
Tancsa, who could reproduce this problem at will.
Reported by: Mike Tancsa <mike@sentex.net>
Reviewed by: ru, bmilekic, jlemon, dillon
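The commit message above explains why a chain-aware copy is needed: the bytes to quote may not sit contiguously after the IP header in the first buffer. The following userland sketch shows the general shape of such a copy. The type demo_buf and the function demo_copydata are hypothetical stand-ins written for illustration, not the kernel's mbuf or m_copydata().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one buffer in a chain. */
struct demo_buf {
	struct demo_buf *next;
	const unsigned char *data;
	size_t len;
};

/*
 * Copy up to len bytes, starting off bytes into the chain, to dst.
 * Returns the number of bytes actually copied, which is smaller than
 * len when the chain ends early.
 */
size_t
demo_copydata(const struct demo_buf *b, size_t off, size_t len,
    unsigned char *dst)
{
	size_t copied = 0;

	/* Skip whole buffers that lie entirely before the offset. */
	while (b != NULL && off >= b->len) {
		off -= b->len;
		b = b->next;
	}
	/* Copy piece by piece, crossing buffer boundaries as needed. */
	while (b != NULL && len > 0) {
		size_t n = b->len - off;

		if (n > len)
			n = len;
		memcpy(dst + copied, b->data + off, n);
		copied += n;
		len -= n;
		off = 0;
		b = b->next;
	}
	return (copied);
}
```

A flat bcopy() from the first buffer would read past its end whenever the requested range spans two buffers; walking the chain avoids that.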
2001-03-08 19:03:26 +00:00
|
|
|
if (icmplen < sizeof(struct ip))
|
2005-08-22 14:12:18 +00:00
|
|
|
goto freeit;
|
2005-11-18 14:48:42 +00:00
|
|
|
|
|
|
|
if (MHLEN > sizeof(struct ip) + ICMP_MINLEN + icmplen)
|
2012-12-05 08:04:20 +00:00
|
|
|
m = m_gethdr(M_NOWAIT, MT_DATA);
|
2005-11-18 14:48:42 +00:00
|
|
|
else
|
2012-12-05 08:04:20 +00:00
|
|
|
m = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR);
|
2005-11-18 14:48:42 +00:00
|
|
|
if (m == NULL)
|
|
|
|
goto freeit;
|
|
|
|
#ifdef MAC
|
2007-10-28 17:12:48 +00:00
|
|
|
mac_netinet_icmp_reply(n, m);
|
2005-11-18 14:48:42 +00:00
|
|
|
#endif
|
|
|
|
icmplen = min(icmplen, M_TRAILINGSPACE(m) - sizeof(struct ip) - ICMP_MINLEN);
|
|
|
|
m_align(m, ICMP_MINLEN + icmplen);
|
|
|
|
m->m_len = ICMP_MINLEN + icmplen;
|
|
|
|
|
Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in the timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
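The two-dimensional table described in the preceding paragraphs can be sketched in userland C as follows. All names here (demo_rt_tables, demo_table_legacy, demo_table_fib, DEMO_AF_INET and the sizes) are hypothetical and chosen for illustration; only the "row 0 for FIB-unaware callers, row Y for IPv4" behaviour is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAXFIBS	16	/* the message mentions a 16-table limit */
#define DEMO_AF_MAX	4	/* hypothetical number of protocol families */
#define DEMO_AF_INET	2	/* hypothetical index for IPv4 */

struct demo_table {
	int id;
};

/* Row = FIB number, column = protocol family. */
static struct demo_table *demo_rt_tables[DEMO_MAXFIBS][DEMO_AF_MAX];

void
demo_table_set(int fibnum, int family, struct demo_table *t)
{
	assert(fibnum >= 0 && fibnum < DEMO_MAXFIBS);
	assert(family >= 0 && family < DEMO_AF_MAX);
	demo_rt_tables[fibnum][family] = t;
}

/* FIB-unaware entry point: always sees the first row. */
struct demo_table *
demo_table_legacy(int family)
{
	assert(family >= 0 && family < DEMO_AF_MAX);
	return (demo_rt_tables[0][family]);
}

/* FIB-aware entry point: only IPv4 honours fibnum, others use row 0. */
struct demo_table *
demo_table_fib(int family, int fibnum)
{
	assert(family >= 0 && family < DEMO_AF_MAX);
	assert(fibnum >= 0 && fibnum < DEMO_MAXFIBS);
	if (family != DEMO_AF_INET)
		fibnum = 0;
	return (demo_rt_tables[fibnum][family]);
}
```

Existing callers keep using the legacy entry point unchanged, which is what lets the extension stay ABI compatible.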
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has, interestingly enough, built in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods reached this way would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
2008-05-09 23:03:00 +00:00
|
|
|
/* XXX MRT make the outgoing packet use the same FIB
|
|
|
|
* that was associated with the incoming packet
|
|
|
|
*/
|
|
|
|
M_SETFIB(m, M_GETFIB(n));
|
1994-05-24 10:09:53 +00:00
|
|
|
icp = mtod(m, struct icmp *);
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_outhist[type]);
|
1994-05-24 10:09:53 +00:00
|
|
|
icp->icmp_type = type;
|
|
|
|
if (type == ICMP_REDIRECT)
|
|
|
|
icp->icmp_gwaddr.s_addr = dest;
|
|
|
|
else {
|
|
|
|
icp->icmp_void = 0;
|
1995-05-30 08:16:23 +00:00
|
|
|
/*
|
1994-05-24 10:09:53 +00:00
|
|
|
* The following assignments assume an overlay with the
|
2005-11-18 14:48:42 +00:00
|
|
|
* just zeroed icmp_void field.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
if (type == ICMP_PARAMPROB) {
|
|
|
|
icp->icmp_pptr = code;
|
|
|
|
code = 0;
|
|
|
|
} else if (type == ICMP_UNREACH &&
|
2005-05-04 13:09:19 +00:00
|
|
|
code == ICMP_UNREACH_NEEDFRAG && mtu) {
|
|
|
|
icp->icmp_nextmtu = htons(mtu);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
icp->icmp_code = code;
|
Fixed broken ICMP error generation, unified conversion of IP header
fields between host and network byte order. The details:
o icmp_error() now does not add IP header length. This fixes the problem
when icmp_error() is called from ip_forward(). In this case the ip_len
of the original IP datagram returned with ICMP error was wrong.
o icmp_error() expects all three fields, ip_len, ip_id and ip_off in host
byte order, so DTRT and convert these fields back to network byte order
before sending a message. This fixes the problem described in PR 16240
and PR 20877 (ip_id field was returned in host byte order).
o ip_ttl decrement operation in ip_forward() was moved down to make sure
that it does not corrupt the copy of original IP datagram passed later
to icmp_error().
o A copy of original IP datagram in ip_forward() was made a read-write,
independent copy. This fixes the problem I first reported to Garrett
Wollman and Bill Fenner and later put in audit trail of PR 16240:
ip_output() (not always) converts fields of original datagram to network
byte order, but because copy (mcopy) and its original (m) most likely
share the same mbuf cluster, ip_output()'s manipulations on original
also corrupted the copy.
o ip_output() now expects all three fields, ip_len, ip_off and (what is
significant) ip_id in host byte order. It was a headache for years that
ip_id was handled differently. The only compatibility issue here is the
raw IP socket interface with IP_HDRINCL socket option set and a non-zero
ip_id field, but ip.4 manual page was unclear on whether in this case
ip_id field should be in host or network byte order.
2000-09-01 12:33:03 +00:00
|
|
|
|
|
|
|
/*
|
2005-11-18 14:48:42 +00:00
|
|
|
* Copy the quotation into ICMP message and
|
|
|
|
* convert quoted IP header back to network representation.
|
2000-09-01 12:33:03 +00:00
|
|
|
*/
|
2005-11-18 14:48:42 +00:00
|
|
|
m_copydata(n, 0, icmplen, (caddr_t)&icp->icmp_ip);
|
|
|
|
nip = &icp->icmp_ip;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
/*
|
2005-11-18 14:48:42 +00:00
|
|
|
* Set up ICMP message mbuf and copy old IP header (without options)
|
|
|
|
* in front of ICMP message.
|
2004-07-17 05:10:06 +00:00
|
|
|
* If the original mbuf was meant to bypass the firewall, the error
|
|
|
|
* reply should bypass as well.
|
|
|
|
*/
|
|
|
|
m->m_flags |= n->m_flags & M_SKIP_FIREWALL;
|
1994-05-24 10:09:53 +00:00
|
|
|
m->m_data -= sizeof(struct ip);
|
|
|
|
m->m_len += sizeof(struct ip);
|
|
|
|
m->m_pkthdr.len = m->m_len;
|
|
|
|
m->m_pkthdr.rcvif = n->m_pkthdr.rcvif;
|
|
|
|
nip = mtod(m, struct ip *);
|
|
|
|
bcopy((caddr_t)oip, (caddr_t)nip, sizeof(struct ip));
|
2012-10-22 21:09:03 +00:00
|
|
|
nip->ip_len = htons(m->m_len);
|
2002-10-20 22:52:07 +00:00
|
|
|
nip->ip_v = IPVERSION;
|
|
|
|
nip->ip_hl = 5;
|
1994-05-24 10:09:53 +00:00
|
|
|
nip->ip_p = IPPROTO_ICMP;
|
|
|
|
nip->ip_tos = 0;
|
2014-03-31 13:00:49 +00:00
|
|
|
nip->ip_off = 0;
|
1994-05-24 10:09:53 +00:00
|
|
|
icmp_reflect(m);
|
|
|
|
|
|
|
|
freeit:
|
|
|
|
m_freem(n);
|
|
|
|
}
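The quote-length arithmetic in icmp_error() above (the non-TCP "stdreply" path followed by the clamp against the chain length) can be restated as a small standalone function. This is a sketch under stated assumptions: demo_quote_len and its parameter names are hypothetical, quotelen stands in for V_icmp_quotelen, and ip_len is assumed to be already in host byte order.

```c
#include <assert.h>

static unsigned
demo_umin(unsigned a, unsigned b)
{
	return (a < b ? a : b);
}

static unsigned
demo_umax(unsigned a, unsigned b)
{
	return (a > b ? a : b);
}

/*
 * ip_len   - total length from the original IP header (host order)
 * iphlen   - original IP header length in bytes
 * quotelen - configured limit on quoted payload bytes
 * chainlen - bytes actually present in the buffer chain
 *
 * Returns how many bytes of the original packet would be quoted.
 */
unsigned
demo_quote_len(unsigned ip_len, unsigned iphlen, unsigned quotelen,
    unsigned chainlen)
{
	unsigned payload;

	assert(ip_len >= iphlen);
	/* Quote at least 8 payload bytes, at most quotelen or what exists. */
	payload = demo_umax(8, demo_umin(quotelen, ip_len - iphlen));
	/* Never quote more than the chain really holds. */
	return (demo_umin(iphlen + payload, chainlen));
}
```

The final clamp matters because, as the comment in the function notes, the forwarding path may hand over a truncated copy whose chain is shorter than ip_len claims.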
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Process a received ICMP message.
|
|
|
|
*/
|
2014-08-08 01:57:15 +00:00
|
|
|
int
|
|
|
|
icmp_input(struct mbuf **mp, int *offp, int proto)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2003-11-13 00:32:13 +00:00
|
|
|
struct icmp *icp;
|
|
|
|
struct in_ifaddr *ia;
|
2014-08-08 01:57:15 +00:00
|
|
|
struct mbuf *m = *mp;
|
2003-11-13 00:32:13 +00:00
|
|
|
struct ip *ip = mtod(m, struct ip *);
|
|
|
|
struct sockaddr_in icmpsrc, icmpdst, icmpgw;
|
2014-08-08 01:57:15 +00:00
|
|
|
int hlen = *offp;
|
|
|
|
int icmplen = ntohs(ip->ip_len) - *offp;
|
2003-11-13 00:32:13 +00:00
|
|
|
int i, code;
|
2002-03-19 21:25:46 +00:00
|
|
|
void (*ctlfunc)(int, struct sockaddr *, void *);
|
2008-05-09 23:03:00 +00:00
|
|
|
int fibnum;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2014-08-08 01:57:15 +00:00
|
|
|
*mp = NULL;
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Locate icmp structure in mbuf, and check
|
|
|
|
* that it is not corrupted and is of at least minimum length.
|
|
|
|
*/
|
|
|
|
#ifdef ICMPPRINTFS
|
1995-08-29 17:49:04 +00:00
|
|
|
if (icmpprintfs) {
|
|
|
|
char buf[4 * sizeof "123"];
|
|
|
|
strcpy(buf, inet_ntoa(ip->ip_src));
|
|
|
|
printf("icmp_input from %s to %s, len %d\n",
|
|
|
|
buf, inet_ntoa(ip->ip_dst), icmplen);
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
#endif
|
|
|
|
if (icmplen < ICMP_MINLEN) {
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_tooshort);
|
1994-05-24 10:09:53 +00:00
|
|
|
goto freeit;
|
|
|
|
}
|
|
|
|
i = hlen + min(icmplen, ICMP_ADVLENMIN);
|
2009-10-13 20:29:14 +00:00
|
|
|
if (m->m_len < i && (m = m_pullup(m, i)) == NULL) {
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_tooshort);
|
2014-08-08 01:57:15 +00:00
|
|
|
return (IPPROTO_DONE);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
ip = mtod(m, struct ip *);
|
|
|
|
m->m_len -= hlen;
|
|
|
|
m->m_data += hlen;
|
|
|
|
icp = mtod(m, struct icmp *);
|
|
|
|
if (in_cksum(m, icmplen)) {
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_checksum);
|
1994-05-24 10:09:53 +00:00
|
|
|
goto freeit;
|
|
|
|
}
|
|
|
|
m->m_len += hlen;
|
|
|
|
m->m_data -= hlen;
|
|
|
|
|
|
|
|
#ifdef ICMPPRINTFS
|
|
|
|
if (icmpprintfs)
|
|
|
|
printf("icmp_input, type %d code %d\n", icp->icmp_type,
|
|
|
|
icp->icmp_code);
|
|
|
|
#endif
|
1995-07-10 16:16:00 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Message type specific processing.
|
|
|
|
*/
|
1994-05-24 10:09:53 +00:00
|
|
|
if (icp->icmp_type > ICMP_MAXTYPE)
|
|
|
|
goto raw;
|
2003-11-13 00:32:13 +00:00
|
|
|
|
|
|
|
/* Initialize */
|
|
|
|
bzero(&icmpsrc, sizeof(icmpsrc));
|
|
|
|
icmpsrc.sin_len = sizeof(struct sockaddr_in);
|
|
|
|
icmpsrc.sin_family = AF_INET;
|
|
|
|
bzero(&icmpdst, sizeof(icmpdst));
|
|
|
|
icmpdst.sin_len = sizeof(struct sockaddr_in);
|
|
|
|
icmpdst.sin_family = AF_INET;
|
|
|
|
bzero(&icmpgw, sizeof(icmpgw));
|
|
|
|
icmpgw.sin_len = sizeof(struct sockaddr_in);
|
|
|
|
icmpgw.sin_family = AF_INET;
|
|
|
|
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_inhist[icp->icmp_type]);
|
1994-05-24 10:09:53 +00:00
|
|
|
code = icp->icmp_code;
|
|
|
|
switch (icp->icmp_type) {
|
|
|
|
|
|
|
|
case ICMP_UNREACH:
|
|
|
|
switch (code) {
|
|
|
|
case ICMP_UNREACH_NET:
|
|
|
|
case ICMP_UNREACH_HOST:
|
|
|
|
case ICMP_UNREACH_SRCFAIL:
|
2001-02-23 20:51:46 +00:00
|
|
|
case ICMP_UNREACH_NET_UNKNOWN:
|
|
|
|
case ICMP_UNREACH_HOST_UNKNOWN:
|
|
|
|
case ICMP_UNREACH_ISOLATED:
|
|
|
|
case ICMP_UNREACH_TOSNET:
|
|
|
|
case ICMP_UNREACH_TOSHOST:
|
|
|
|
case ICMP_UNREACH_HOST_PRECEDENCE:
|
|
|
|
case ICMP_UNREACH_PRECEDENCE_CUTOFF:
|
|
|
|
code = PRC_UNREACH_NET;
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
|
|
|
|
|
|
|
case ICMP_UNREACH_NEEDFRAG:
|
|
|
|
code = PRC_MSGSIZE;
|
|
|
|
break;
|
1995-05-30 08:16:23 +00:00
|
|
|
|
2001-02-23 20:51:46 +00:00
|
|
|
/*
|
|
|
|
* RFC 1122, Sections 3.2.2.1 and 4.2.3.9.
|
|
|
|
* Treat subcodes 2,3 as immediate RST
|
|
|
|
*/
|
|
|
|
case ICMP_UNREACH_PROTOCOL:
|
|
|
|
case ICMP_UNREACH_PORT:
|
2001-03-28 14:13:19 +00:00
|
|
|
code = PRC_UNREACH_PORT;
|
2001-02-18 09:34:55 +00:00
|
|
|
break;
|
We currently do not react to ICMP administratively prohibited
messages sent by routers when they deny our traffic. This causes
a timeout when trying to connect to TCP ports/services on a remote
host that is blocked by routers or firewalls.
RFC 1122 (Requirements for Internet Hosts) section 3.2.2.1 actually
requires that, for a TCP session, we treat such a message
as if we had received a RST.
quote begin.
A Destination Unreachable message that is received MUST be
reported to the transport layer. The transport layer SHOULD
use the information appropriately; for example, see Sections
4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol
that has its own mechanism for notifying the sender that a
port is unreachable (e.g., TCP, which sends RST segments)
MUST nevertheless accept an ICMP Port Unreachable for the
same purpose.
quote end.
I've written a small extension that implements this; it also creates
a sysctl "net.inet.tcp.icmp_admin_prohib_like_rst" to control whether
this new behaviour is activated.
When it's activated (set to 1) we'll treat an ICMP administratively
prohibited message (ICMP type 3, codes 9, 10 and 13) for a TCP
session as if we had received a TCP RST, but only if the TCP session
is in SYN_SENT state.
The reason for only reacting when in SYN_SENT state, is that this
will solve the problem, and at the same time minimize the risk of
this being abused.
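The rule described above can be sketched as a small predicate. This is an illustrative sketch only, not the code from the attached diff: the function name `sketch_admin_prohib_is_rst`, the `sketch_tcp_state` enum, and the `enabled` flag standing in for the sysctl value are all hypothetical. Only the ICMP type and code numbers (type 3, codes 9, 10 and 13) and the SYN_SENT restriction come from the text.

```c
#include <stdbool.h>

/* ICMP destination unreachable type and the codes named in the text. */
enum {
	SKETCH_ICMP_UNREACH       = 3,
	SKETCH_CODE_NET_PROHIB    = 9,
	SKETCH_CODE_HOST_PROHIB   = 10,
	SKETCH_CODE_FILTER_PROHIB = 13
};

/* Hypothetical stand-ins for two TCP connection states. */
enum sketch_tcp_state {
	SKETCH_TCPS_SYN_SENT,
	SKETCH_TCPS_ESTABLISHED
};

/*
 * Return true when an incoming ICMP message should be handled like a
 * TCP RST: the knob is on, the message is "administratively
 * prohibited", and the connection is still in SYN_SENT.
 */
static bool
sketch_admin_prohib_is_rst(bool enabled, int type, int code,
    enum sketch_tcp_state state)
{
	if (!enabled || type != SKETCH_ICMP_UNREACH)
		return (false);
	if (code != SKETCH_CODE_NET_PROHIB &&
	    code != SKETCH_CODE_HOST_PROHIB &&
	    code != SKETCH_CODE_FILTER_PROHIB)
		return (false);
	return (state == SKETCH_TCPS_SYN_SENT);
}
```

Restricting the check to SYN_SENT means an established connection is never torn down by such a message, which is the abuse-limiting property the text argues for.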
I suggest that we enable this new behaviour by default, but it
would be a change of current behaviour, so if people prefer to
leave it disabled by default, at least for now, that would be OK
with me; the attached diff actually has the sysctl set to 0 by
default.
PR: 23086
Submitted by: Jesper Skriver <jesper@skriver.dk>
2000-12-16 19:42:06 +00:00
|
|
|
|
2001-02-23 20:51:46 +00:00
|
|
|
case ICMP_UNREACH_NET_PROHIB:
|
1994-05-24 10:09:53 +00:00
|
|
|
case ICMP_UNREACH_HOST_PROHIB:
|
1996-09-20 08:23:54 +00:00
|
|
|
case ICMP_UNREACH_FILTER_PROHIB:
|
2001-02-18 09:34:55 +00:00
|
|
|
code = PRC_UNREACH_ADMIN_PROHIB;
|
|
|
|
break;
|
2000-12-16 19:42:06 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
default:
|
|
|
|
goto badcode;
|
|
|
|
}
|
|
|
|
goto deliver;
|
|
|
|
|
|
|
|
case ICMP_TIMXCEED:
|
|
|
|
if (code > 1)
|
|
|
|
goto badcode;
|
|
|
|
code += PRC_TIMXCEED_INTRANS;
|
|
|
|
goto deliver;
|
|
|
|
|
|
|
|
case ICMP_PARAMPROB:
|
|
|
|
if (code > 1)
|
|
|
|
goto badcode;
|
|
|
|
code = PRC_PARAMPROB;
|
|
|
|
deliver:
|
|
|
|
/*
|
|
|
|
* Problem with datagram; advise higher level routines.
|
|
|
|
*/
|
|
|
|
if (icmplen < ICMP_ADVLENMIN || icmplen < ICMP_ADVLEN(icp) ||
|
2002-10-20 22:52:07 +00:00
|
|
|
icp->icmp_ip.ip_hl < (sizeof(struct ip) >> 2)) {
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_badlen);
|
1994-05-24 10:09:53 +00:00
|
|
|
goto freeit;
|
|
|
|
}
|
1995-07-10 16:16:00 +00:00
|
|
|
/* Discard ICMP's in response to multicast packets */
|
|
|
|
if (IN_MULTICAST(ntohl(icp->icmp_ip.ip_dst.s_addr)))
|
|
|
|
goto badcode;
|
1994-05-24 10:09:53 +00:00
|
|
|
#ifdef ICMPPRINTFS
|
|
|
|
if (icmpprintfs)
|
|
|
|
printf("deliver to protocol %d\n", icp->icmp_ip.ip_p);
|
|
|
|
#endif
|
|
|
|
icmpsrc.sin_addr = icp->icmp_ip.ip_dst;
|
1999-12-22 19:13:38 +00:00
|
|
|
/*
|
|
|
|
* XXX if the packet contains [IPv4 AH TCP], we can't make a
|
|
|
|
* notification to TCP layer.
|
|
|
|
*/
|
1994-10-02 17:48:58 +00:00
|
|
|
ctlfunc = inetsw[ip_protox[icp->icmp_ip.ip_p]].pr_ctlinput;
|
|
|
|
if (ctlfunc)
|
1994-05-24 10:09:53 +00:00
|
|
|
(*ctlfunc)(code, (struct sockaddr *)&icmpsrc,
|
1995-12-16 02:14:44 +00:00
|
|
|
(void *)&icp->icmp_ip);
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
|
|
|
|
|
|
|
badcode:
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_badcode);
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
|
|
|
|
|
|
|
case ICMP_ECHO:
|
Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit
Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.
Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().
Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).
All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).
(*) netipsec/keysock.c did not validate depending on compile time options.
Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
|
|
|
if (!V_icmpbmcastecho
|
1998-05-26 11:34:30 +00:00
|
|
|
&& (m->m_flags & (M_MCAST | M_BCAST)) != 0) {
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_bmcastecho);
|
1997-08-25 01:25:31 +00:00
|
|
|
break;
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
icp->icmp_type = ICMP_ECHOREPLY;
|
2001-02-11 07:39:51 +00:00
|
|
|
if (badport_bandlim(BANDLIM_ICMP_ECHO) < 0)
|
2000-12-15 21:45:49 +00:00
|
|
|
goto freeit;
|
|
|
|
else
|
|
|
|
goto reflect;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
case ICMP_TSTAMP:
|
2014-10-01 18:07:34 +00:00
|
|
|
if (V_icmptstamprepl == 0)
|
|
|
|
break;
|
2008-10-02 15:37:58 +00:00
|
|
|
if (!V_icmpbmcastecho
|
1998-05-26 11:34:30 +00:00
|
|
|
&& (m->m_flags & (M_MCAST | M_BCAST)) != 0) {
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_bmcasttstamp);
|
1997-08-25 16:29:27 +00:00
|
|
|
break;
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
if (icmplen < ICMP_TSLEN) {
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_badlen);
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
icp->icmp_type = ICMP_TSTAMPREPLY;
|
|
|
|
icp->icmp_rtime = iptime();
|
|
|
|
icp->icmp_ttime = icp->icmp_rtime; /* bogus, do later! */
|
2001-02-11 07:39:51 +00:00
|
|
|
if (badport_bandlim(BANDLIM_ICMP_TSTAMP) < 0)
|
2000-12-15 21:45:49 +00:00
|
|
|
goto freeit;
|
|
|
|
else
|
|
|
|
goto reflect;
|
1995-05-30 08:16:23 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
case ICMP_MASKREQ:
|
2008-10-02 15:37:58 +00:00
|
|
|
if (V_icmpmaskrepl == 0)
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
|
|
|
/*
|
|
|
|
* We are not able to respond with all ones broadcast
|
|
|
|
* unless we receive it over a point-to-point interface.
|
|
|
|
*/
|
|
|
|
if (icmplen < ICMP_MASKLEN)
|
|
|
|
break;
|
|
|
|
switch (ip->ip_dst.s_addr) {
|
|
|
|
|
|
|
|
case INADDR_BROADCAST:
|
|
|
|
case INADDR_ANY:
|
|
|
|
icmpdst.sin_addr = ip->ip_src;
|
|
|
|
break;
|
|
|
|
|
|
|
|
default:
|
|
|
|
icmpdst.sin_addr = ip->ip_dst;
|
|
|
|
}
|
|
|
|
ia = (struct in_ifaddr *)ifaof_ifpforaddr(
|
|
|
|
(struct sockaddr *)&icmpdst, m->m_pkthdr.rcvif);
|
2009-06-23 20:19:09 +00:00
|
|
|
if (ia == NULL)
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
2009-06-23 20:19:09 +00:00
|
|
|
if (ia->ia_ifp == NULL) {
|
|
|
|
ifa_free(&ia->ia_ifa);
|
1996-04-02 12:26:10 +00:00
|
|
|
break;
|
2009-06-23 20:19:09 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
icp->icmp_type = ICMP_MASKREPLY;
|
2008-10-02 15:37:58 +00:00
|
|
|
if (V_icmpmaskfake == 0)
|
2003-03-21 15:43:06 +00:00
|
|
|
icp->icmp_mask = ia->ia_sockmask.sin_addr.s_addr;
|
|
|
|
else
|
2008-10-02 15:37:58 +00:00
|
|
|
icp->icmp_mask = V_icmpmaskfake;
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ip->ip_src.s_addr == 0) {
|
|
|
|
if (ia->ia_ifp->if_flags & IFF_BROADCAST)
|
|
|
|
ip->ip_src = satosin(&ia->ia_broadaddr)->sin_addr;
|
|
|
|
else if (ia->ia_ifp->if_flags & IFF_POINTOPOINT)
|
|
|
|
ip->ip_src = satosin(&ia->ia_dstaddr)->sin_addr;
|
|
|
|
}
|
2009-06-23 20:19:09 +00:00
|
|
|
ifa_free(&ia->ia_ifa);
|
1994-05-24 10:09:53 +00:00
|
|
|
reflect:
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_reflect);
|
|
|
|
ICMPSTAT_INC(icps_outhist[icp->icmp_type]);
|
1994-05-24 10:09:53 +00:00
|
|
|
icmp_reflect(m);
|
2014-08-08 01:57:15 +00:00
|
|
|
return (IPPROTO_DONE);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
case ICMP_REDIRECT:
|
2008-10-02 15:37:58 +00:00
|
|
|
if (V_log_redirect) {
|
1999-08-10 09:45:33 +00:00
|
|
|
u_long src, dst, gw;
|
|
|
|
|
|
|
|
src = ntohl(ip->ip_src.s_addr);
|
|
|
|
dst = ntohl(icp->icmp_ip.ip_dst.s_addr);
|
|
|
|
gw = ntohl(icp->icmp_gwaddr.s_addr);
|
|
|
|
printf("icmp redirect from %d.%d.%d.%d: "
|
|
|
|
"%d.%d.%d.%d => %d.%d.%d.%d\n",
|
|
|
|
(int)(src >> 24), (int)((src >> 16) & 0xff),
|
|
|
|
(int)((src >> 8) & 0xff), (int)(src & 0xff),
|
|
|
|
(int)(dst >> 24), (int)((dst >> 16) & 0xff),
|
|
|
|
(int)((dst >> 8) & 0xff), (int)(dst & 0xff),
|
|
|
|
(int)(gw >> 24), (int)((gw >> 16) & 0xff),
|
|
|
|
(int)((gw >> 8) & 0xff), (int)(gw & 0xff));
|
|
|
|
}
|
2004-01-06 23:20:07 +00:00
|
|
|
/*
|
|
|
|
* RFC1812 says we must ignore ICMP redirects if we
|
|
|
|
* are acting as router.
|
|
|
|
*/
|
2008-10-02 15:37:58 +00:00
|
|
|
if (V_drop_redirect || V_ipforwarding)
|
1999-08-10 09:45:33 +00:00
|
|
|
break;
|
1994-05-24 10:09:53 +00:00
|
|
|
if (code > 3)
|
|
|
|
goto badcode;
|
|
|
|
if (icmplen < ICMP_ADVLENMIN || icmplen < ICMP_ADVLEN(icp) ||
|
2002-10-20 22:52:07 +00:00
|
|
|
icp->icmp_ip.ip_hl < (sizeof(struct ip) >> 2)) {
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_badlen);
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* Short circuit routing redirects to force
|
|
|
|
* immediate change in the kernel's routing
|
|
|
|
* tables. The message is also handed to anyone
|
|
|
|
* listening on a raw socket (e.g. the routing
|
|
|
|
* daemon for use in updating its tables).
|
|
|
|
*/
|
|
|
|
icmpgw.sin_addr = ip->ip_src;
|
|
|
|
icmpdst.sin_addr = icp->icmp_gwaddr;
|
|
|
|
#ifdef ICMPPRINTFS
|
1995-08-29 17:49:04 +00:00
|
|
|
if (icmpprintfs) {
|
|
|
|
char buf[4 * sizeof "123"];
|
|
|
|
strcpy(buf, inet_ntoa(icp->icmp_ip.ip_dst));
|
|
|
|
|
|
|
|
printf("redirect dst %s to %s\n",
|
|
|
|
buf, inet_ntoa(icp->icmp_gwaddr));
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
#endif
|
|
|
|
icmpsrc.sin_addr = icp->icmp_ip.ip_dst;
|
Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x).
Currently the only protocol that can make use of the multiple tables is IPv4.
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on, is "policy based routing", which allows
different packet streams to be routed by more than just the destination
address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x), but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled within the lifetime of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPv4 to use them. I have not
done the changes for IPv6 simply because I do not need it, and I do not
have enough knowledge of IPv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except IPv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
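The indexing scheme described above can be sketched as follows. This is a minimal, self-contained illustration, not the kernel code: the `sketch_` names, the table sizes, and the bounds checks are assumptions made for the example. Only the idea of a [FIB][family] array in which every family except IPv4 always uses row 0 comes from the text.

```c
#include <stddef.h>

#define SKETCH_MAXFIBS  16	/* hypothetical compile-time FIB count */
#define SKETCH_AF_MAX   4	/* hypothetical number of address families */
#define SKETCH_AF_INET  2	/* stand-in for the IPv4 family index */

/* Stand-in for the per-family routing table head. */
struct sketch_rib_head {
	int fib;
	int family;
};

/* Row = FIB number (Y), column = protocol family (X). */
static struct sketch_rib_head *sketch_rt_tables[SKETCH_MAXFIBS][SKETCH_AF_MAX];

/*
 * Pick the table for (fibnum, family).  Every family other than IPv4
 * is forced to row 0, so FIB-unaware callers keep seeing what looks
 * like the old one-dimensional array.
 */
static struct sketch_rib_head *
sketch_table_for(int fibnum, int family)
{
	if (family < 0 || family >= SKETCH_AF_MAX)
		return (NULL);
	if (family != SKETCH_AF_INET)
		fibnum = 0;
	if (fibnum < 0 || fibnum >= SKETCH_MAXFIBS)
		return (NULL);
	return (sketch_rt_tables[fibnum][family]);
}
```

Because row 0 is laid out exactly like the old array, code that never learns about FIBs continues to work unchanged, which is the ABI-compatibility argument made in the surrounding text.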
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the address family being
looked up. If the protocol is not IPv4 they call rtalloc() (and
friends), forcing the action to row 0; if it IS IPv4 (and that info
is available) they go to the appropriate row. These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us to how the correct FIB is selected for an outgoing
IPv4 packet.
Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
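The precedence among the sources listed above (default FIB 0, then the socket/PCB value, then a packet classifier's override) can be sketched like this. The structure, the field names and the `SKETCH_FIB_UNSET` marker are hypothetical, and the fallback to FIB 0 for out-of-range values is an assumption of this sketch rather than something the text specifies.

```c
#define SKETCH_FIB_UNSET (-1)	/* hypothetical "no value assigned" marker */

/* Hypothetical per-packet view of where a FIB number may come from. */
struct sketch_pkt {
	int socket_fib;		/* from the originating socket/PCB, or UNSET */
	int classifier_fib;	/* set by a classifier such as ipfw, or UNSET */
};

/*
 * Choose the FIB for a packet: start at FIB 0, let the socket's value
 * replace it, and let a classifier-assigned value override both.
 */
static int
sketch_select_fib(const struct sketch_pkt *p, int numfibs)
{
	int fib = 0;

	if (p->socket_fib != SKETCH_FIB_UNSET)
		fib = p->socket_fib;
	if (p->classifier_fib != SKETCH_FIB_UNSET)
		fib = p->classifier_fib;
	/* Assumption: an out-of-range number falls back to FIB 0. */
	if (fib < 0 || fib >= numfibs)
		fib = 0;
	return (fib);
}
```

A forwarded packet with no classifier match would have both fields unset and therefore use FIB 0, matching case 2 in the list.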
Routing messages would be associated with their
process, and thus select one FIB or another.
Messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib (not yet implemented).
In addition, netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example, it can't fully simulate a routing table because it can't
influence the socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has, interestingly enough, built-in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
one-dimensional array is a bit silly, especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure (for example, the whole idea of a netmask is foreign
to AppleTalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods reached this way would be called. The methods would have
an argument that gives the FIB number, but the protocol would be free
to ignore it.
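The proposal above (routing methods hung off each 'domain' structure, taking a FIB number the protocol may ignore) might look roughly like the following. This is a sketch of the idea only: the structure layout, the `rt_lookup` method name and its signature are hypothetical and are not taken from any committed code.

```c
#include <stddef.h>

/* Hypothetical, simplified address: family index plus an opaque value. */
struct sketch_sockaddr {
	int family;
	unsigned int addr;
};

/* Hypothetical 'domain' carrying its own routing method and data. */
struct sketch_domain {
	int family;
	void *rt_data;		/* opaque, protocol-private routing state */
	int (*rt_lookup)(void *rt_data,
	    const struct sketch_sockaddr *dst, int fibnum);
};

/* Example method for a protocol that has no use for FIB numbers. */
static int
sketch_flat_lookup(void *rt_data, const struct sketch_sockaddr *dst,
    int fibnum)
{
	(void)dst;
	(void)fibnum;		/* this protocol is free to ignore the FIB */
	return (*(int *)rt_data);
}

/*
 * Generic entry point: find the domain for the address family and call
 * its method.  Returns -1 when no domain handles the family.
 */
static int
sketch_route_lookup(struct sketch_domain *const *domains, size_t ndomains,
    const struct sketch_sockaddr *dst, int fibnum)
{
	const struct sketch_domain *d;

	if (dst->family < 0 || (size_t)dst->family >= ndomains ||
	    domains[dst->family] == NULL)
		return (-1);
	d = domains[dst->family];
	return (d->rt_lookup(d->rt_data, dst, fibnum));
}
```

The caller never inspects `rt_data`, so each protocol can keep whatever internal structure suits it, which is the opacity the text asks for.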
When the ABI can be changed, it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination and the resulting
fib entry. To make this work fully, one could add a fib number
so that, given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
2008-05-09 23:03:00 +00:00
|
|
|
for ( fibnum = 0; fibnum < rt_numfibs; fibnum++) {
|
|
|
|
in_rtredirect((struct sockaddr *)&icmpsrc,
|
|
|
|
(struct sockaddr *)&icmpdst,
|
|
|
|
(struct sockaddr *)0, RTF_GATEWAY | RTF_HOST,
|
|
|
|
(struct sockaddr *)&icmpgw, fibnum);
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
pfctlinput(PRC_REDIRECT_HOST, (struct sockaddr *)&icmpsrc);
|
|
|
|
break;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* No kernel processing for the following;
|
|
|
|
* just fall through to send to raw listener.
|
|
|
|
*/
|
|
|
|
case ICMP_ECHOREPLY:
|
|
|
|
case ICMP_ROUTERADVERT:
|
|
|
|
case ICMP_ROUTERSOLICIT:
|
|
|
|
case ICMP_TSTAMPREPLY:
|
|
|
|
case ICMP_IREQREPLY:
|
|
|
|
case ICMP_MASKREPLY:
|
2014-11-10 23:10:01 +00:00
|
|
|
case ICMP_SOURCEQUENCH:
|
1994-05-24 10:09:53 +00:00
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
raw:
|
2014-08-08 01:57:15 +00:00
|
|
|
*mp = m;
|
|
|
|
rip_input(mp, offp, proto);
|
|
|
|
return (IPPROTO_DONE);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
freeit:
|
|
|
|
m_freem(m);
|
2014-08-08 01:57:15 +00:00
|
|
|
return (IPPROTO_DONE);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Reflect the ip packet back to the source
|
|
|
|
*/
|
1995-11-14 20:34:56 +00:00
|
|
|
static void
|
2007-05-10 15:58:48 +00:00
|
|
|
icmp_reflect(struct mbuf *m)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
2001-09-29 04:34:11 +00:00
|
|
|
struct ip *ip = mtod(m, struct ip *);
|
|
|
|
struct ifaddr *ifa;
|
2009-04-20 13:45:39 +00:00
|
|
|
struct ifnet *ifp;
|
2001-09-29 04:34:11 +00:00
|
|
|
struct in_ifaddr *ia;
|
1994-05-24 10:09:53 +00:00
|
|
|
struct in_addr t;
|
1995-03-16 18:17:34 +00:00
|
|
|
struct mbuf *opts = 0;
|
2002-10-20 22:52:07 +00:00
|
|
|
int optlen = (ip->ip_hl << 2) - sizeof(struct ip);
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2008-04-17 12:50:42 +00:00
|
|
|
if (IN_MULTICAST(ntohl(ip->ip_src.s_addr)) ||
|
|
|
|
IN_EXPERIMENTAL(ntohl(ip->ip_src.s_addr)) ||
|
|
|
|
IN_ZERONET(ntohl(ip->ip_src.s_addr)) ) {
|
1994-05-24 10:09:53 +00:00
|
|
|
m_freem(m); /* Bad return address */
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_badaddr);
|
1994-05-24 10:09:53 +00:00
|
|
|
goto done; /* ip_output() will check for broadcast */
|
|
|
|
}
|
2008-04-17 12:50:42 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
t = ip->ip_dst;
|
|
|
|
ip->ip_dst = ip->ip_src;
|
2004-02-02 22:17:09 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2004-02-02 22:17:09 +00:00
|
|
|
* Source selection for ICMP replies:
|
|
|
|
*
|
|
|
|
* If the incoming packet was addressed directly to one of our
|
|
|
|
* own addresses, use dst as the src for the reply.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
2009-06-25 11:52:33 +00:00
|
|
|
IN_IFADDR_RLOCK();
|
2009-04-20 13:45:39 +00:00
|
|
|
LIST_FOREACH(ia, INADDR_HASH(t.s_addr), ia_hash) {
|
|
|
|
if (t.s_addr == IA_SIN(ia)->sin_addr.s_addr) {
|
|
|
|
t = IA_SIN(ia)->sin_addr;
|
2009-06-25 11:52:33 +00:00
|
|
|
IN_IFADDR_RUNLOCK();
|
2001-09-29 04:34:11 +00:00
|
|
|
goto match;
|
2009-04-20 13:45:39 +00:00
|
|
|
}
|
|
|
|
}
|
2009-06-25 11:52:33 +00:00
|
|
|
IN_IFADDR_RUNLOCK();
|
|
|
|
|
2004-02-02 22:17:09 +00:00
|
|
|
/*
|
|
|
|
* If the incoming packet was addressed to one of our broadcast
|
|
|
|
* addresses, use the first non-broadcast address which corresponds
|
|
|
|
* to the incoming interface.
|
|
|
|
*/
|
2009-04-20 13:45:39 +00:00
|
|
|
ifp = m->m_pkthdr.rcvif;
|
|
|
|
if (ifp != NULL && ifp->if_flags & IFF_BROADCAST) {
|
2012-01-05 19:00:36 +00:00
|
|
|
IF_ADDR_RLOCK(ifp);
|
2009-04-20 13:45:39 +00:00
|
|
|
TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
|
2001-09-29 04:34:11 +00:00
|
|
|
if (ifa->ifa_addr->sa_family != AF_INET)
|
|
|
|
continue;
|
2001-12-14 19:32:47 +00:00
|
|
|
ia = ifatoia(ifa);
|
|
|
|
if (satosin(&ia->ia_broadaddr)->sin_addr.s_addr ==
|
2009-04-20 13:45:39 +00:00
|
|
|
t.s_addr) {
|
|
|
|
t = IA_SIN(ia)->sin_addr;
|
2012-01-05 19:00:36 +00:00
|
|
|
IF_ADDR_RUNLOCK(ifp);
|
2001-12-14 19:32:47 +00:00
|
|
|
goto match;
|
2009-04-20 13:45:39 +00:00
|
|
|
}
|
2001-12-14 19:32:47 +00:00
|
|
|
}
|
2012-01-05 19:00:36 +00:00
|
|
|
IF_ADDR_RUNLOCK(ifp);
|
2001-12-14 19:32:47 +00:00
|
|
|
}
|
2005-08-21 12:29:39 +00:00
|
|
|
/*
|
|
|
|
* If the packet was transiting through us, use the address of
|
|
|
|
* the interface the packet came through in. If that interface
|
|
|
|
* doesn't have a suitable IP address, the normal selection
|
|
|
|
* criteria apply.
|
|
|
|
*/
|
2009-04-20 13:45:39 +00:00
|
|
|
if (V_icmp_rfi && ifp != NULL) {
|
2012-01-05 19:00:36 +00:00
|
|
|
IF_ADDR_RLOCK(ifp);
|
2009-04-20 13:45:39 +00:00
|
|
|
TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
|
2005-08-21 12:29:39 +00:00
|
|
|
if (ifa->ifa_addr->sa_family != AF_INET)
|
|
|
|
continue;
|
|
|
|
ia = ifatoia(ifa);
|
2009-04-20 13:45:39 +00:00
|
|
|
t = IA_SIN(ia)->sin_addr;
|
2012-01-05 19:00:36 +00:00
|
|
|
IF_ADDR_RUNLOCK(ifp);
|
2005-08-21 12:29:39 +00:00
|
|
|
goto match;
|
|
|
|
}
|
2012-01-05 19:00:36 +00:00
|
|
|
IF_ADDR_RUNLOCK(ifp);
|
2005-08-21 12:29:39 +00:00
|
|
|
}
|
2004-02-02 22:53:16 +00:00
|
|
|
/*
|
|
|
|
* If the incoming packet was not addressed directly to us, use
|
|
|
|
* designated interface for icmp replies specified by sysctl
|
|
|
|
* net.inet.icmp.reply_src (default not set). Otherwise continue
|
|
|
|
* with normal source selection.
|
|
|
|
*/
|
2009-04-20 13:45:39 +00:00
|
|
|
if (V_reply_src[0] != '\0' && (ifp = ifunit(V_reply_src))) {
|
2012-01-05 19:00:36 +00:00
|
|
|
IF_ADDR_RLOCK(ifp);
|
2009-04-20 13:45:39 +00:00
|
|
|
TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
|
2004-02-02 22:53:16 +00:00
|
|
|
if (ifa->ifa_addr->sa_family != AF_INET)
|
|
|
|
continue;
|
|
|
|
ia = ifatoia(ifa);
|
2009-04-20 13:45:39 +00:00
|
|
|
t = IA_SIN(ia)->sin_addr;
|
2012-01-05 19:00:36 +00:00
|
|
|
IF_ADDR_RUNLOCK(ifp);
|
2004-02-02 22:53:16 +00:00
|
|
|
goto match;
|
|
|
|
}
|
2012-01-05 19:00:36 +00:00
|
|
|
IF_ADDR_RUNLOCK(ifp);
|
2004-02-02 22:53:16 +00:00
|
|
|
}
|
2004-08-16 18:32:07 +00:00
|
|
|
/*
|
2004-02-02 22:17:09 +00:00
|
|
|
* If the packet was transiting through us, use the address of
|
|
|
|
* the interface that is the closest to the packet source.
|
|
|
|
* When we don't have a route back to the packet source, stop here
|
|
|
|
* and drop the packet.
|
|
|
|
*/
|
Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)
Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".
One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled within the timespan of the branch.
This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.
The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.
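The two dimensional lookup described above can be sketched in userland C. This is only an illustration of the indexing rule (row 0 for every family, extra rows for IPv4 only); the names sk_tables, sk_lookup() and sk_lookup_fib(), the family indices and the row count of 16 are all assumptions made for the example and are not the kernel's identifiers.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NUMFIBS	16	/* assumed row count (the message mentions a 16-table limit) */
#define SK_AF_MAX	4	/* hypothetical number of address families */
#define SK_AF_INET	2	/* hypothetical index standing in for IPv4 */

struct sk_table {
	int fib;	/* row this table lives in */
	int af;		/* address family it serves */
};

static struct sk_table sk_storage[SK_NUMFIBS][SK_AF_MAX + 1];
static struct sk_table *sk_tables[SK_NUMFIBS][SK_AF_MAX + 1];

/* Populate row 0 for every family and the remaining rows for IPv4 only. */
static void
sk_init(void)
{
	int fib, af;

	for (fib = 0; fib < SK_NUMFIBS; fib++) {
		for (af = 0; af <= SK_AF_MAX; af++) {
			if (fib != 0 && af != SK_AF_INET)
				continue;
			sk_storage[fib][af].fib = fib;
			sk_storage[fib][af].af = af;
			sk_tables[fib][af] = &sk_storage[fib][af];
		}
	}
}

/* FIB-aware entry point: families other than IPv4 are forced to row 0. */
static struct sk_table *
sk_lookup_fib(int af, int fibnum)
{
	assert(af >= 0 && af <= SK_AF_MAX);
	assert(fibnum >= 0 && fibnum < SK_NUMFIBS);
	if (af != SK_AF_INET)
		fibnum = 0;
	return (sk_tables[fibnum][af]);
}

/* Legacy entry point: unaware callers only ever see the first row. */
static struct sk_table *
sk_lookup(int af)
{
	return (sk_lookup_fib(af, 0));
}
```

After sk_init(), a caller asking for IPv4 in row 3 gets the row 3 table, while the same request for any other family lands on row 0, which is also what the legacy entry point returns.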
One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).
You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.
This brings us to how the correct FIB is selected for an outgoing
IPV4 packet.
Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.
2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.
6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)
In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.
For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.
Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.
SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.
When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.
Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
2008-05-09 23:03:00 +00:00
|
|
|
ia = ip_rtaddr(ip->ip_dst, M_GETFIB(m));
|
2001-11-27 19:58:09 +00:00
|
|
|
if (ia == NULL) {
|
|
|
|
m_freem(m);
|
2009-04-12 13:22:33 +00:00
|
|
|
ICMPSTAT_INC(icps_noroute);
|
2001-11-27 19:58:09 +00:00
|
|
|
goto done;
|
|
|
|
}
|
2009-04-20 13:45:39 +00:00
|
|
|
t = IA_SIN(ia)->sin_addr;
|
2009-06-23 20:19:09 +00:00
|
|
|
ifa_free(&ia->ia_ifa);
|
2001-09-29 04:34:11 +00:00
|
|
|
match:
|
2003-08-21 18:39:16 +00:00
|
|
|
#ifdef MAC
|
2007-10-28 17:12:48 +00:00
|
|
|
mac_netinet_icmp_replyinplace(m);
|
2003-08-21 18:39:16 +00:00
|
|
|
#endif
|
1994-05-24 10:09:53 +00:00
|
|
|
ip->ip_src = t;
|
Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course
of the next few weeks.
Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
2008-08-17 23:27:27 +00:00
|
|
|
ip->ip_ttl = V_ip_defttl;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
if (optlen > 0) {
|
|
|
|
register u_char *cp;
|
|
|
|
int opt, cnt;
|
|
|
|
u_int len;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Retrieve any source routing from the incoming packet;
|
|
|
|
* add on any record-route or timestamp options.
|
|
|
|
*/
|
|
|
|
cp = (u_char *) (ip + 1);
|
2004-09-15 20:13:26 +00:00
|
|
|
if ((opts = ip_srcroute(m)) == 0 &&
|
2012-12-05 08:04:20 +00:00
|
|
|
(opts = m_gethdr(M_NOWAIT, MT_DATA))) {
|
1994-05-24 10:09:53 +00:00
|
|
|
opts->m_len = sizeof(struct in_addr);
|
|
|
|
mtod(opts, struct in_addr *)->s_addr = 0;
|
|
|
|
}
|
|
|
|
if (opts) {
|
|
|
|
#ifdef ICMPPRINTFS
|
|
|
|
if (icmpprintfs)
|
|
|
|
printf("icmp_reflect optlen %d rt %d => ",
|
|
|
|
optlen, opts->m_len);
|
|
|
|
#endif
|
|
|
|
for (cnt = optlen; cnt > 0; cnt -= len, cp += len) {
|
|
|
|
opt = cp[IPOPT_OPTVAL];
|
|
|
|
if (opt == IPOPT_EOL)
|
|
|
|
break;
|
|
|
|
if (opt == IPOPT_NOP)
|
|
|
|
len = 1;
|
|
|
|
else {
|
2000-06-02 20:18:38 +00:00
|
|
|
if (cnt < IPOPT_OLEN + sizeof(*cp))
|
|
|
|
break;
|
1994-05-24 10:09:53 +00:00
|
|
|
len = cp[IPOPT_OLEN];
|
2000-06-02 20:18:38 +00:00
|
|
|
if (len < IPOPT_OLEN + sizeof(*cp) ||
|
|
|
|
len > cnt)
|
1994-05-24 10:09:53 +00:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
/*
|
|
|
|
* Should check for overflow, but it "can't happen"
|
|
|
|
*/
|
1995-05-30 08:16:23 +00:00
|
|
|
if (opt == IPOPT_RR || opt == IPOPT_TS ||
|
1994-05-24 10:09:53 +00:00
|
|
|
opt == IPOPT_SECURITY) {
|
|
|
|
bcopy((caddr_t)cp,
|
|
|
|
mtod(opts, caddr_t) + opts->m_len, len);
|
|
|
|
opts->m_len += len;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
/* Terminate & pad, if necessary */
|
1994-10-02 17:48:58 +00:00
|
|
|
cnt = opts->m_len % 4;
|
|
|
|
if (cnt) {
|
1994-05-24 10:09:53 +00:00
|
|
|
for (; cnt < 4; cnt++) {
|
|
|
|
*(mtod(opts, caddr_t) + opts->m_len) =
|
|
|
|
IPOPT_EOL;
|
|
|
|
opts->m_len++;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#ifdef ICMPPRINTFS
|
|
|
|
if (icmpprintfs)
|
|
|
|
printf("%d\n", opts->m_len);
|
|
|
|
#endif
|
|
|
|
}
|
2012-10-23 10:30:09 +00:00
|
|
|
ip_stripoptions(m);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2003-10-29 05:40:07 +00:00
|
|
|
m_tag_delete_nonpersistent(m);
|
1994-05-24 10:09:53 +00:00
|
|
|
m->m_flags &= ~(M_BCAST|M_MCAST);
|
2003-11-14 21:48:57 +00:00
|
|
|
icmp_send(m, opts);
|
1994-05-24 10:09:53 +00:00
|
|
|
done:
|
|
|
|
if (opts)
|
|
|
|
(void)m_free(opts);
|
|
|
|
}
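The option handling in icmp_reflect() above (walk the incoming options, keep record-route, timestamp and security options, then pad to a 4-byte boundary with EOL) can be sketched on plain byte buffers. This is a simplified userland illustration, not the kernel code: sk_copy_reply_opts() and the SK_* constants are names invented here, the sketch leaves out the leading struct in_addr slot that the kernel reserves in the options mbuf, and it adds an explicit capacity check where the original comment says overflow "can't happen".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IPv4 option type values (stand-ins for the IPOPT_* constants). */
enum { SK_EOL = 0, SK_NOP = 1, SK_RR = 7, SK_TS = 68, SK_SECURITY = 130 };

/*
 * Copy RR, TS and SECURITY options from in[] to out[] and pad the result
 * with EOL to a multiple of 4 bytes. Returns the number of bytes written.
 * outcap must itself be a multiple of 4 so that padding always fits.
 */
static size_t
sk_copy_reply_opts(const uint8_t *in, size_t inlen, uint8_t *out,
    size_t outcap)
{
	const uint8_t *cp = in;
	size_t cnt, len, n = 0;

	assert(outcap % 4 == 0);
	for (cnt = inlen; cnt > 0; cnt -= len, cp += len) {
		uint8_t opt = cp[0];

		if (opt == SK_EOL)
			break;
		if (opt == SK_NOP)
			len = 1;
		else {
			/* Need a length byte, and it must be sane. */
			if (cnt < 2)
				break;
			len = cp[1];
			if (len < 2 || len > cnt)
				break;
		}
		if ((opt == SK_RR || opt == SK_TS || opt == SK_SECURITY) &&
		    len <= outcap - n) {
			memcpy(out + n, cp, len);
			n += len;
		}
	}
	/* Terminate and pad, if necessary. */
	while (n % 4 != 0)
		out[n++] = SK_EOL;
	return (n);
}
```

For an input holding a NOP, a 7-byte record-route option and a 3-byte loose source route option, only the record-route option is copied and one EOL byte is appended to round the result up to 8 bytes.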
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Send an icmp packet back to the ip level,
|
|
|
|
* after supplying a checksum.
|
|
|
|
*/
|
1995-11-14 20:34:56 +00:00
|
|
|
static void
|
2007-05-10 15:58:48 +00:00
|
|
|
icmp_send(struct mbuf *m, struct mbuf *opts)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
|
|
|
register struct ip *ip = mtod(m, struct ip *);
|
|
|
|
register int hlen;
|
|
|
|
register struct icmp *icp;
|
|
|
|
|
2002-10-20 22:52:07 +00:00
|
|
|
hlen = ip->ip_hl << 2;
|
1994-05-24 10:09:53 +00:00
|
|
|
m->m_data += hlen;
|
|
|
|
m->m_len -= hlen;
|
|
|
|
icp = mtod(m, struct icmp *);
|
|
|
|
icp->icmp_cksum = 0;
|
2012-10-22 21:09:03 +00:00
|
|
|
icp->icmp_cksum = in_cksum(m, ntohs(ip->ip_len) - hlen);
|
1994-05-24 10:09:53 +00:00
|
|
|
m->m_data -= hlen;
|
|
|
|
m->m_len += hlen;
|
1999-03-06 23:10:42 +00:00
|
|
|
m->m_pkthdr.rcvif = (struct ifnet *)0;
|
1994-05-24 10:09:53 +00:00
|
|
|
#ifdef ICMPPRINTFS
|
1995-08-29 17:49:04 +00:00
|
|
|
if (icmpprintfs) {
|
|
|
|
char buf[4 * sizeof "123"];
|
|
|
|
strcpy(buf, inet_ntoa(ip->ip_dst));
|
|
|
|
printf("icmp_send dst %s src %s\n",
|
|
|
|
buf, inet_ntoa(ip->ip_src));
|
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
#endif
|
2003-11-14 21:48:57 +00:00
|
|
|
(void) ip_output(m, opts, NULL, 0, NULL, NULL);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
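icmp_send() above zeroes icmp_cksum and then stores the Internet checksum of the ICMP portion of the packet. A minimal userland sketch of that zero-then-fill pattern on a flat byte buffer follows; sk_cksum() and sk_icmp_fill_cksum() are hypothetical helpers written for this example and are much simpler than the kernel's mbuf-aware in_cksum().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of 16-bit big-endian words, complemented.
 * An odd trailing byte is treated as the high byte of a final word.
 */
static uint16_t
sk_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/*
 * Zero the checksum field (bytes 2 and 3 of the ICMP header), compute
 * the checksum over the whole message, and store it in network order.
 */
static void
sk_icmp_fill_cksum(uint8_t *icmp, size_t len)
{
	uint16_t c;

	assert(len >= 4);
	icmp[2] = 0;
	icmp[3] = 0;
	c = sk_cksum(icmp, len);
	icmp[2] = (uint8_t)(c >> 8);
	icmp[3] = (uint8_t)(c & 0xff);
}
```

A receiver can verify the message by running the same sum over the filled buffer; a correct checksum makes sk_cksum() return 0.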
|
|
|
|
|
2009-02-13 15:14:43 +00:00
|
|
|
/*
|
2014-12-21 05:07:11 +00:00
|
|
|
* Return milliseconds since 00:00 UTC in network format.
|
2009-02-13 15:14:43 +00:00
|
|
|
*/
|
|
|
|
uint32_t
|
2007-05-10 15:58:48 +00:00
|
|
|
iptime(void)
|
1994-05-24 10:09:53 +00:00
|
|
|
{
|
|
|
|
struct timeval atv;
|
|
|
|
u_long t;
|
|
|
|
|
2000-12-16 21:39:48 +00:00
|
|
|
getmicrotime(&atv);
|
1994-05-24 10:09:53 +00:00
|
|
|
t = (atv.tv_sec % (24*60*60)) * 1000 + atv.tv_usec / 1000;
|
|
|
|
return (htonl(t));
|
|
|
|
}
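The arithmetic in iptime() can be checked in userland by passing the time in as arguments instead of calling getmicrotime(). The helper names below are hypothetical and exist only for this sketch; htonl() and ntohl() come from the POSIX header arpa/inet.h.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Milliseconds elapsed since the most recent 00:00 UTC, host byte order. */
static uint32_t
sk_ms_since_midnight(int64_t sec, int32_t usec)
{
	assert(sec >= 0 && usec >= 0 && usec < 1000000);
	return ((uint32_t)((sec % (24 * 60 * 60)) * 1000 + usec / 1000));
}

/* Same value converted to network byte order, as iptime() returns it. */
static uint32_t
sk_iptime(int64_t sec, int32_t usec)
{
	return (htonl(sk_ms_since_midnight(sec, usec)));
}
```

Because a day has 86,400,000 milliseconds, the result always fits comfortably in 32 bits.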
|
|
|
|
|
1995-09-18 15:51:40 +00:00
|
|
|
/*
|
|
|
|
* Return the next larger or smaller MTU plateau (table from RFC 1191)
|
|
|
|
* given current value MTU. If DIR is less than zero, a larger plateau
|
|
|
|
* is returned; otherwise, a smaller value is returned.
|
|
|
|
*/
|
2005-04-21 14:29:34 +00:00
|
|
|
int
|
2007-05-10 15:58:48 +00:00
|
|
|
ip_next_mtu(int mtu, int dir)
|
1995-09-18 15:51:40 +00:00
|
|
|
{
|
|
|
|
static int mtutab[] = {
|
2005-05-04 13:23:54 +00:00
|
|
|
65535, 32000, 17914, 8166, 4352, 2002, 1492, 1280, 1006, 508,
|
|
|
|
296, 68, 0
|
1995-09-18 15:51:40 +00:00
|
|
|
};
|
2006-01-23 17:06:32 +00:00
|
|
|
int i, size;
|
1995-09-18 15:51:40 +00:00
|
|
|
|
2006-01-23 17:06:32 +00:00
|
|
|
size = (sizeof mtutab) / (sizeof mtutab[0]);
|
|
|
|
if (dir >= 0) {
|
2006-01-23 20:10:49 +00:00
|
|
|
for (i = 0; i < size; i++)
|
2006-01-23 17:06:32 +00:00
|
|
|
if (mtu > mtutab[i])
|
|
|
|
return mtutab[i];
|
1995-09-18 15:51:40 +00:00
|
|
|
} else {
|
2006-01-23 17:06:32 +00:00
|
|
|
for (i = size - 1; i >= 0; i--)
|
|
|
|
if (mtu < mtutab[i])
|
|
|
|
return mtutab[i];
|
|
|
|
if (mtu == mtutab[0])
|
|
|
|
return mtutab[0];
|
1995-09-18 15:51:40 +00:00
|
|
|
}
|
2006-01-23 17:06:32 +00:00
|
|
|
return 0;
|
1995-09-18 15:51:40 +00:00
|
|
|
}
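The plateau search in ip_next_mtu() is easy to exercise outside the kernel. The sketch below copies the table and the two loops into a standalone function; sk_next_mtu() and sk_mtutab are names made up for this example, and the added assert on a non-negative MTU is an assumption of the sketch, not something the kernel function checks.

```c
#include <assert.h>

/* Plateau table as it appears in ip_next_mtu() above. */
static const int sk_mtutab[] = {
	65535, 32000, 17914, 8166, 4352, 2002, 1492, 1280, 1006, 508,
	296, 68, 0
};

/*
 * dir >= 0: return the largest plateau strictly below mtu.
 * dir <  0: return the smallest plateau strictly above mtu, or the top
 *           plateau itself when mtu already equals it.
 * Returns 0 when no plateau qualifies.
 */
static int
sk_next_mtu(int mtu, int dir)
{
	const int size = (int)(sizeof(sk_mtutab) / sizeof(sk_mtutab[0]));
	int i;

	assert(mtu >= 0);
	if (dir >= 0) {
		for (i = 0; i < size; i++)
			if (mtu > sk_mtutab[i])
				return (sk_mtutab[i]);
	} else {
		for (i = size - 1; i >= 0; i--)
			if (mtu < sk_mtutab[i])
				return (sk_mtutab[i]);
		if (mtu == sk_mtutab[0])
			return (sk_mtutab[0]);
	}
	return (0);
}
```

For an Ethernet-sized MTU of 1500, stepping down gives 1492 and stepping up gives 2002.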
|
2011-04-27 19:36:35 +00:00
|
|
|
#endif /* INET */
|
1998-12-03 20:23:21 +00:00
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* badport_bandlim() - check for ICMP bandwidth limit
|
|
|
|
*
|
|
|
|
* Return 0 if it is ok to send an ICMP error response, -1 if we have
|
2004-08-16 18:32:07 +00:00
|
|
|
* hit our bandwidth limit and it is not ok.
|
1998-12-03 20:23:21 +00:00
|
|
|
*
|
|
|
|
* If icmplim is <= 0, the feature is disabled and 0 is returned.
|
|
|
|
*
|
|
|
|
* For now we separate the TCP and UDP subsystems w/ different 'which'
|
|
|
|
* values. We may eventually remove this separation (and simplify the
|
|
|
|
* code further).
|
|
|
|
*
|
|
|
|
* Note that the printing of the error message is delayed so we can
|
|
|
|
* properly print the icmp error rate that the system was trying to do
|
|
|
|
* (i.e. 22000/100 pps, etc...). This can cause long delays in printing
|
2004-08-16 18:32:07 +00:00
|
|
|
* the 'final' error, but it doesn't make sense to solve the printing
|
1998-12-03 20:23:21 +00:00
|
|
|
* delay with more complex code.
|
|
|
|
*/
|
|
|
|
|
|
|
|
int
|
|
|
|
badport_bandlim(int which)
|
|
|
|
{
|
Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit
Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.
Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().
Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).
All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).
(*) netipsec/keysock.c did not validate depending on compile time options.
Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
|
|
|
|
2002-12-21 00:08:20 +00:00
|
|
|
#define N(a) (sizeof (a) / sizeof (a[0]))
|
|
|
|
static struct rate {
|
|
|
|
const char *type;
|
|
|
|
struct timeval lasttime;
|
2004-07-13 16:06:19 +00:00
|
|
|
int curpps;
|
2002-12-21 00:08:20 +00:00
|
|
|
} rates[BANDLIM_MAX+1] = {
|
|
|
|
{ "icmp unreach response" },
|
|
|
|
{ "icmp ping response" },
|
|
|
|
{ "icmp tstamp response" },
|
|
|
|
{ "closed port RST response" },
|
2007-07-19 22:34:25 +00:00
|
|
|
{ "open port RST response" },
|
2012-06-18 17:11:24 +00:00
|
|
|
{ "icmp6 unreach response" },
|
|
|
|
{ "sctp ootb response" }
|
2002-12-21 00:08:20 +00:00
|
|
|
};
|
1998-12-03 20:23:21 +00:00
|
|
|
|
|
|
|
/*
|
2002-12-21 00:08:20 +00:00
|
|
|
* Return ok status if feature disabled or argument out of range.
|
1998-12-03 20:23:21 +00:00
|
|
|
*/
|
2008-10-02 15:37:58 +00:00
|
|
|
if (V_icmplim > 0 && (u_int) which < N(rates)) {
|
2002-12-21 00:08:20 +00:00
|
|
|
struct rate *r = &rates[which];
|
|
|
|
int opps = r->curpps;
|
1998-12-03 20:23:21 +00:00
|
|
|
|
2008-10-02 15:37:58 +00:00
|
|
|
if (!ppsratecheck(&r->lasttime, &r->curpps, V_icmplim))
|
2002-12-21 00:08:20 +00:00
|
|
|
return -1; /* discard packet */
|
|
|
|
/*
|
|
|
|
* If we've dropped below the threshold after having
|
|
|
|
* rate-limited traffic print the message. This preserves
|
|
|
|
* the previous behaviour at the expense of added complexity.
|
|
|
|
*/
|
2008-10-02 15:37:58 +00:00
|
|
|
if (V_icmplim_output && opps > V_icmplim)
|
2010-08-14 21:04:27 +00:00
|
|
|
log(LOG_NOTICE, "Limiting %s from %d to %d packets/sec\n",
|
2008-10-02 15:37:58 +00:00
|
|
|
r->type, opps, V_icmplim);
|
1998-12-03 20:23:21 +00:00
|
|
|
}
|
2002-12-21 00:08:20 +00:00
|
|
|
return 0; /* okay to send packet */
|
|
|
|
#undef N
|
1998-12-03 20:23:21 +00:00
|
|
|
}
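The return convention of badport_bandlim() (0 means "ok to send", -1 means "over the limit", and a limit of zero or less disables the feature) can be illustrated with a simplified per-second counter. This is a sketch under stated assumptions: sk_pps_check() is a stand-in written for this example and is not the kernel's ppsratecheck(); it takes the current time in whole seconds as a parameter so the behaviour is deterministic, and the logging step is left out.

```c
#include <assert.h>

struct sk_rate {
	long window_start;	/* second in which the current count began */
	int curpps;		/* packets counted in that second */
};

/* Returns 1 while the count for the current second is within maxpps. */
static int
sk_pps_check(struct sk_rate *r, long now_sec, int maxpps)
{
	if (now_sec != r->window_start) {
		/* A new second has started: restart the count. */
		r->window_start = now_sec;
		r->curpps = 0;
	}
	r->curpps++;
	return (r->curpps <= maxpps);
}

/* Returns 0 if a response may be sent, -1 if it should be suppressed. */
static int
sk_bandlim(struct sk_rate *r, long now_sec, int limit)
{
	assert(r != 0);
	if (limit <= 0)
		return (0);	/* feature disabled */
	return (sk_pps_check(r, now_sec, limit) ? 0 : -1);
}
```

With a limit of 2, the third call within the same second returns -1, and the first call in the following second returns 0 again.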
|