/**************************************************************************

Copyright (c) 2008-2010, BitGravity Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

 1. Redistributions of source code must retain the above copyright notice,
    this list of conditions and the following disclaimer.

 2. Neither the name of the BitGravity Corporation nor the names of its
    contributors may be used to endorse or promote products derived from
    this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

***************************************************************************/

#include "opt_route.h"
#include "opt_mpath.h"
#include "opt_ddb.h"
#include "opt_inet.h"
#include "opt_inet6.h"

#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

#include <sys/param.h>
#include <sys/types.h>
#include <sys/bitstring.h>
#include <sys/condvar.h>
#include <sys/callout.h>
#include <sys/hash.h>
#include <sys/kernel.h>
#include <sys/kthread.h>
#include <sys/limits.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
o Axe non-pcpu flowtable implementation. It wasn't enabled or used,
and probably is a leftover from first prototyping by Kip. The
non-pcpu implementation used mutexes, so it doubtfully worked
better than simple routing lookup.
o Use UMA_ZONE_PCPU zone for pointers instead of [MAXCPU] arrays,
use zpcpu_get() to access data in there.
o Substitute own single list implementation with SLIST(). This
has two functional side effects:
- new flows go into head of a list, before they went to tail.
- a bug when incorrect flow was deleted in flow cleaner is
fixed.
o Due to cache line alignment, there is no reason to keep
different zones for IPv4 and IPv6 flows. Both consume one
cache line, real size of allocation is equal.
o Rely on f_hash, f_rt, f_lle being stable during fle
lifetime, remove useless volatile qualifiers.
o More INET/INET6 splitting.
Reviewed by: adrian
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
2014-02-13 04:59:18 +00:00
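The SLIST() change described in the commit message above can be illustrated outside the kernel. What follows is a minimal userland sketch, assuming only the `<sys/queue.h>` macros; `struct demo_flow`, `bucket_insert()` and `bucket_remove_hash()` are hypothetical names and are not taken from flowtable.c. It shows that `SLIST_INSERT_HEAD()` places the newest entry first, and that a walk which saves the successor before freeing unlinks exactly the matching entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct flentry: only a hash and the list link. */
struct demo_flow {
	uint32_t		df_hash;
	SLIST_ENTRY(demo_flow)	df_next;
};

SLIST_HEAD(demo_bucket, demo_flow);

/* Allocate an entry and link it at the head of the bucket. */
static struct demo_flow *
bucket_insert(struct demo_bucket *b, uint32_t hash)
{
	struct demo_flow *f;

	f = calloc(1, sizeof(*f));
	if (f == NULL)
		return (NULL);
	f->df_hash = hash;
	SLIST_INSERT_HEAD(b, f, df_next);
	return (f);
}

/* Unlink and free every entry with the given hash; return how many. */
static int
bucket_remove_hash(struct demo_bucket *b, uint32_t hash)
{
	struct demo_flow *f, *next;
	int removed = 0;

	for (f = SLIST_FIRST(b); f != NULL; f = next) {
		/* Save the successor before f may be freed. */
		next = SLIST_NEXT(f, df_next);
		if (f->df_hash == hash) {
			SLIST_REMOVE(b, f, demo_flow, df_next);
			free(f);
			removed++;
		}
	}
	return (removed);
}
```

Inserting hashes 1, 2, 3 in that order leaves 3 at the head of the bucket; removing hash 2 afterwards keeps 3 and 1 linked in that order.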
#include <sys/pcpu.h>
#include <sys/proc.h>
#include <sys/queue.h>
#include <sys/sbuf.h>
#include <sys/sched.h>
#include <sys/smp.h>
#include <sys/socket.h>
#include <sys/syslog.h>
#include <sys/sysctl.h>
#include <vm/uma.h>

#include <net/if.h>
#include <net/if_llatbl.h>
#include <net/if_var.h>
#include <net/route.h>
#include <net/flowtable.h>
Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:
- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
occur each time a network stack is instantiated and destroyed. In the
!VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
For the VIMAGE case, we instead use SYSINIT's to track their order and
properties on registration, using them for each vnet when created/
destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
previously, as well as its dependency scheme: we now just use the
SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
they want init functions to be called for each virtual network stack
rather than just once at boot, compiling down to DOMAIN_SET() in the
non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
of modinfo or DOMAIN_SET() for init/uninit events. In some cases,
convert modular components from using modevent to using sysinit (where
appropriate). In some cases, do minor rejuggling of SYSINIT ordering
to make room for or better manage events.
Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup)
Discussed with: jhb, bz, julian, zec
Reviewed by: bz
Approved by: re (VIMAGE blanket)
2009-07-23 20:46:49 +00:00
#include <net/vnet.h>

#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/in_var.h>
#include <netinet/if_ether.h>
#include <netinet/ip.h>
#ifdef INET6
#include <netinet/ip6.h>
#endif
#include <netinet/tcp.h>
#include <netinet/udp.h>
#include <netinet/sctp.h>

#include <ddb/ddb.h>

#ifdef INET
struct ipv4_tuple {
	uint16_t 	ip_sport;	/* source port */
	uint16_t 	ip_dport;	/* destination port */
	in_addr_t 	ip_saddr;	/* source address */
	in_addr_t 	ip_daddr;	/* destination address */
};

union ipv4_flow {
	struct ipv4_tuple ipf_ipt;
	uint32_t 	ipf_key[3];
};
#endif

#ifdef INET6
struct ipv6_tuple {
	uint16_t 	ip_sport;	/* source port */
	uint16_t 	ip_dport;	/* destination port */
	struct in6_addr	ip_saddr;	/* source address */
	struct in6_addr	ip_daddr;	/* destination address */
};

union ipv6_flow {
	struct ipv6_tuple ipf_ipt;
	uint32_t 	ipf_key[9];
};
#endif

struct flentry {
	uint32_t		f_fhash;	/* hash flowing forward */
	uint16_t		f_flags;	/* flow flags */
	uint8_t			f_pad;
	uint8_t			f_proto;	/* protocol */
	uint32_t		f_fibnum;	/* fib index */
	uint32_t		f_uptime;	/* uptime at last access */
	SLIST_ENTRY(flentry)	f_next;		/* pointer to collision entry */
	struct rtentry		*f_rt;		/* rtentry for flow */
	struct llentry		*f_lle;		/* llentry for flow */
	union {
#ifdef INET
		union ipv4_flow	v4;
#endif
#ifdef INET6
		union ipv6_flow	v6;
#endif
	} f_flow;
#define	f_flow4	f_flow.v4
#define	f_flow6	f_flow.v6
};
#define	KEYLEN(flags)	((((flags) & FL_IPV6) ? 9 : 3) * 4)

/* Make sure f_flow begins with key. */
#ifdef INET
CTASSERT(offsetof(struct flentry, f_flow) ==
    offsetof(struct flentry, f_flow4.ipf_key));
#endif
#ifdef INET6
CTASSERT(offsetof(struct flentry, f_flow) ==
    offsetof(struct flentry, f_flow6.ipf_key));
#endif
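
The key lengths returned by KEYLEN() follow from the tuple layouts: two 16-bit ports plus two addresses. The userland sketch below re-declares the tuples under hypothetical `demo_` names and checks the arithmetic at compile time. The `DEMO_FL_IPV6` bit value is an assumption made for illustration; the real `FL_IPV6` comes from net/flowtable.h, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <netinet/in.h>

/* Hypothetical copies of the tuple layouts, for size checks only. */
struct demo_ipv4_tuple {
	uint16_t	ip_sport;
	uint16_t	ip_dport;
	in_addr_t	ip_saddr;
	in_addr_t	ip_daddr;
};

struct demo_ipv6_tuple {
	uint16_t	ip_sport;
	uint16_t	ip_dport;
	struct in6_addr	ip_saddr;
	struct in6_addr	ip_daddr;
};

/* Assumed flag bit, for illustration only. */
#define	DEMO_FL_IPV6		(1 << 9)
#define	DEMO_KEYLEN(flags)	((((flags) & DEMO_FL_IPV6) ? 9 : 3) * 4)

/* 2 + 2 + 4 + 4 = 12 bytes, i.e. three 32-bit key words. */
_Static_assert(sizeof(struct demo_ipv4_tuple) == 3 * sizeof(uint32_t),
    "IPv4 tuple must fill a 3-word key");
/* 2 + 2 + 16 + 16 = 36 bytes, i.e. nine 32-bit key words. */
_Static_assert(sizeof(struct demo_ipv6_tuple) == 9 * sizeof(uint32_t),
    "IPv6 tuple must fill a 9-word key");
```

Because each tuple exactly fills its word array, a key can be compared or hashed as `KEYLEN(flags)` bytes starting at the beginning of the union, which is the property the CTASSERT()s above guard.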

SLIST_HEAD(flist, flentry);
/* Make sure we can use pcpu_zone_ptr for struct flist. */
CTASSERT(sizeof(struct flist) == sizeof(void *));

#define	SECS_PER_HOUR		3600
#define	SECS_PER_DAY		(24*SECS_PER_HOUR)

#define	SYN_IDLE		300
#define	UDP_IDLE		300
#define	FIN_WAIT_IDLE		600
#define	TCP_IDLE		SECS_PER_DAY

struct flowtable {
	counter_u64_t	*ft_stat;
	int 		ft_size;
	uint32_t	ft_flags;
	uint32_t	ft_max_depth;

	/*
	 * ft_table is a malloc(9)ed array of pointers.  Pointers point to
	 * memory from UMA_ZONE_PCPU zone.
	 * ft_masks is per-cpu pointer itself.  Each instance points
	 * to a malloc(9)ed bitset, that is private to corresponding CPU.
	 */
	struct flist	**ft_table;
	bitstr_t 	**ft_masks;
	bitstr_t	*ft_tmpmask;

	uint32_t	ft_udp_idle;
	uint32_t	ft_fin_wait_idle;
	uint32_t	ft_syn_idle;
	uint32_t	ft_tcp_idle;
	boolean_t	ft_full;
};

#define	FLOWSTAT_ADD(ft, name, v)	\
	counter_u64_add((ft)->ft_stat[offsetof(struct flowtable_stat, name) / sizeof(uint64_t)], (v))
#define	FLOWSTAT_INC(ft, name)	FLOWSTAT_ADD(ft, name, 1)
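
FLOWSTAT_ADD() turns a field name into an array index by dividing the field's byte offset by the size of one counter. The following is a minimal userland sketch of that indexing only, assuming a hypothetical statistics structure made up entirely of `uint64_t` members and a plain `uint64_t` array standing in for the counter(9) array; none of the `demo_`/`DEMO_` names exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; every member must be a uint64_t for the index math. */
struct demo_stat {
	uint64_t	ds_lookups;
	uint64_t	ds_hits;
	uint64_t	ds_misses;
};

#define	DEMO_NSTAT	(sizeof(struct demo_stat) / sizeof(uint64_t))
#define	DEMO_STAT_IDX(name)	\
	(offsetof(struct demo_stat, name) / sizeof(uint64_t))

/* Plain array standing in for the kernel's array of counters. */
static uint64_t demo_counters[DEMO_NSTAT];

#define	DEMO_STAT_ADD(name, v)	(demo_counters[DEMO_STAT_IDX(name)] += (v))
#define	DEMO_STAT_INC(name)	DEMO_STAT_ADD(name, 1)
```

The structure itself is never instantiated for counting; it only supplies names and offsets, so the exported layout and the internal array stay in step as long as all members keep the same size.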

static struct proc *flowcleanerproc;
static uint32_t flow_hashjitter;

static struct cv 	flowclean_f_cv;
static struct cv 	flowclean_c_cv;
static struct mtx	flowclean_lock;
static uint32_t		flowclean_cycles;
static uint32_t		flowclean_freq;

/*
 * TODO:
 * - add sysctls to resize && flush flow tables
 * - Add per flowtable sysctls for statistics and configuring timeouts
 * - add saturation counter to rtentry to support per-packet load-balancing
 *   add flag to indicate round-robin flow, add list lookup from head
 *   for flows
 * - add sysctl / device node / syscall to support exporting and importing
 *   of flows with flag to indicate that a flow was imported so should
 *   not be considered for auto-cleaning
 * - support explicit connection state (currently only ad-hoc for DSR)
 * - idetach() cleanup for options VIMAGE builds.
 */
#ifdef INET
static VNET_DEFINE(struct flowtable, ip4_ft);
#define	V_ip4_ft	VNET(ip4_ft)
#endif
#ifdef INET6
static VNET_DEFINE(struct flowtable, ip6_ft);
#define	V_ip6_ft	VNET(ip6_ft)
#endif

static uma_zone_t flow_zone;

static VNET_DEFINE(int, flowtable_enable) = 1;
static VNET_DEFINE(int, flowtable_syn_expire) = SYN_IDLE;
static VNET_DEFINE(int, flowtable_udp_expire) = UDP_IDLE;
static VNET_DEFINE(int, flowtable_fin_wait_expire) = FIN_WAIT_IDLE;
static VNET_DEFINE(int, flowtable_tcp_expire) = TCP_IDLE;
Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
2009-07-14 22:48:30 +00:00
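
The accessor idea in the commit message above (a reference copy, one private copy per network stack instance, and a macro that resolves a symbol against the current instance) can be sketched in userland. This is only an analogy under stated assumptions: a struct plays the role of the linker-set region, `offsetof()` plays the role of the symbol's position within it, and `demo_vnet`, `demo_curvnet` and `DEMO_VNET()` are hypothetical names. The kernel mechanism itself relies on linker sets and the kernel linker, as the message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* All "virtualized globals" gathered in one region (stand-in for the linker set). */
struct demo_vnet_data {
	int	flowtable_enable;
	int	flowtable_udp_expire;
};

/* Reference copy: carries the static initializers. */
static const struct demo_vnet_data demo_reference = {
	.flowtable_enable = 1,
	.flowtable_udp_expire = 300,
};

struct demo_vnet {
	char	*dv_base;	/* start of this instance's private copy */
};

/* Hypothetical "current instance" pointer consulted by the accessor. */
static struct demo_vnet *demo_curvnet;

/* Create an instance whose data starts out equal to the reference copy. */
static struct demo_vnet *
demo_vnet_alloc(void)
{
	struct demo_vnet *v;

	v = malloc(sizeof(*v));
	if (v == NULL)
		return (NULL);
	v->dv_base = malloc(sizeof(demo_reference));
	if (v->dv_base == NULL) {
		free(v);
		return (NULL);
	}
	memcpy(v->dv_base, &demo_reference, sizeof(demo_reference));
	return (v);
}

/* Resolve a symbol to its address inside the current instance. */
#define	DEMO_VNET(type, name)						\
	(*(type *)(demo_curvnet->dv_base +				\
	    offsetof(struct demo_vnet_data, name)))
```

Writing through `DEMO_VNET(int, flowtable_enable)` while one instance is current leaves the same variable in every other instance, and in the reference copy, unchanged.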
|
|
|
|
2009-07-16 21:13:04 +00:00
|
|
|
#define V_flowtable_enable VNET(flowtable_enable)
|
|
|
|
#define V_flowtable_syn_expire VNET(flowtable_syn_expire)
|
|
|
|
#define V_flowtable_udp_expire VNET(flowtable_udp_expire)
|
|
|
|
#define V_flowtable_fin_wait_expire VNET(flowtable_fin_wait_expire)
|
|
|
|
#define V_flowtable_tcp_expire VNET(flowtable_tcp_expire)
|
Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
2009-07-14 22:48:30 +00:00
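The per-vnet storage scheme described in the commit message above can be modelled in userspace. What follows is a minimal sketch, not the kernel's implementation: `struct vnet_ref_sketch` is a hypothetical stand-in for the vnet linker-set region, the names `vnet_sketch_alloc`, `vnet_sketch_free` and `VNET_SKETCH` are invented for illustration, the initial values are illustrative only, and the accessor handles `int` symbols only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the vnet linker set: one "reference copy"
 * holding the statically initialized values (values are illustrative).
 */
struct vnet_ref_sketch {
	int flowtable_enable;
	int flowtable_tcp_expire;
};

static const struct vnet_ref_sketch vnet_ref = {
	.flowtable_enable = 1,
	.flowtable_tcp_expire = 86400,
};

/* One instance per virtual network stack; data is a private copy. */
struct vnet_sketch {
	char *data;
};

struct vnet_sketch *
vnet_sketch_alloc(void)
{
	struct vnet_sketch *v;

	v = malloc(sizeof(*v));
	if (v == NULL)
		return (NULL);
	v->data = malloc(sizeof(vnet_ref));
	if (v->data == NULL) {
		free(v);
		return (NULL);
	}
	/* "Static initialization" is a copy of the reference region. */
	memcpy(v->data, &vnet_ref, sizeof(vnet_ref));
	return (v);
}

void
vnet_sketch_free(struct vnet_sketch *v)
{
	assert(v != NULL);
	free(v->data);
	free(v);
}

/* Accessor: per-vnet base address plus the symbol's offset (int only). */
#define VNET_SKETCH(v, sym) \
	(*(int *)(void *)((v)->data + offsetof(struct vnet_ref_sketch, sym)))
```

Writing through `VNET_SKETCH(a, flowtable_enable)` changes only instance `a`; a second instance keeps the value copied from the reference region, which is the isolation property the commit message describes.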
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
static SYSCTL_NODE(_net, OID_AUTO, flowtable, CTLFLAG_RD, NULL,
|
2011-11-07 15:43:11 +00:00
|
|
|
"flowtable");
|
2014-02-07 15:18:23 +00:00
|
|
|
SYSCTL_VNET_INT(_net_flowtable, OID_AUTO, enable, CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(flowtable_enable), 0, "enable flowtable caching.");
|
o Axe non-pcpu flowtable implementation. It wasn't enabled or used,
and probably is a leftover from first prototyping by Kip. The
non-pcpu implementation used mutexes, so it is doubtful that it
worked better than a simple routing lookup.
o Use UMA_ZONE_PCPU zone for pointers instead of [MAXCPU] arrays,
use zpcpu_get() to access data in there.
o Substitute own single list implementation with SLIST(). This
has two functional side effects:
- new flows now go to the head of the list; before they went to the tail.
- a bug where the wrong flow was deleted in the flow cleaner is
fixed.
o Due to cache line alignment, there is no reason to keep
different zones for IPv4 and IPv6 flows. Both consume one
cache line, real size of allocation is equal.
o Rely on f_hash, f_rt, f_lle being stable during fle
lifetime, remove useless volatile qualifiers.
o More INET/INET6 splitting.
Reviewed by: adrian
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
2014-02-13 04:59:18 +00:00
|
|
|
SYSCTL_UMA_MAX(_net_flowtable, OID_AUTO, maxflows, CTLFLAG_RW,
|
|
|
|
&flow_zone, "Maximum number of flows allowed");
|
2009-04-19 00:16:04 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* XXX This does not end up updating timeouts at runtime
|
|
|
|
* and only reflects the value for the last table added :-/
|
|
|
|
*/
|
2014-02-07 15:18:23 +00:00
|
|
|
SYSCTL_VNET_INT(_net_flowtable, OID_AUTO, syn_expire, CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(flowtable_syn_expire), 0,
|
2009-06-22 21:19:24 +00:00
|
|
|
"seconds after which to remove syn allocated flow.");
|
2014-02-07 15:18:23 +00:00
|
|
|
SYSCTL_VNET_INT(_net_flowtable, OID_AUTO, udp_expire, CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(flowtable_udp_expire), 0,
|
2009-06-22 21:19:24 +00:00
|
|
|
"seconds after which to remove flow allocated to UDP.");
|
2014-02-07 15:18:23 +00:00
|
|
|
SYSCTL_VNET_INT(_net_flowtable, OID_AUTO, fin_wait_expire, CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(flowtable_fin_wait_expire), 0,
|
2009-06-22 21:19:24 +00:00
|
|
|
"seconds after which to remove a flow in FIN_WAIT.");
|
2014-02-07 15:18:23 +00:00
|
|
|
SYSCTL_VNET_INT(_net_flowtable, OID_AUTO, tcp_expire, CTLFLAG_RW,
|
2009-07-14 22:48:30 +00:00
|
|
|
&VNET_NAME(flowtable_tcp_expire), 0,
|
2009-06-22 21:19:24 +00:00
|
|
|
"seconds after which to remove flow allocated to a TCP connection.");
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
#define FL_STALE (1<<8)
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
static MALLOC_DEFINE(M_FTABLE, "flowtable", "flowtable hashes and bitstrings");
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
static struct flentry *flowtable_lookup_common(struct flowtable *,
|
|
|
|
struct sockaddr_storage *, struct sockaddr_storage *, struct mbuf *, int);
|
2010-03-12 05:03:26 +00:00
|
|
|
|
|
|
|
static __inline int
|
|
|
|
proto_to_flags(uint8_t proto)
|
|
|
|
{
|
|
|
|
int flag;
|
|
|
|
|
|
|
|
switch (proto) {
|
|
|
|
case IPPROTO_TCP:
|
|
|
|
flag = FL_TCP;
|
|
|
|
break;
|
|
|
|
case IPPROTO_SCTP:
|
|
|
|
flag = FL_SCTP;
|
2014-02-08 22:10:53 +00:00
|
|
|
break;
|
2010-03-12 05:03:26 +00:00
|
|
|
case IPPROTO_UDP:
|
|
|
|
flag = FL_UDP;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
flag = 0;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
return (flag);
|
|
|
|
}
|
|
|
|
|
|
|
|
static __inline int
|
|
|
|
flags_to_proto(int flags)
|
|
|
|
{
|
|
|
|
int proto, protoflags;
|
|
|
|
|
|
|
|
protoflags = flags & (FL_TCP|FL_SCTP|FL_UDP);
|
|
|
|
switch (protoflags) {
|
|
|
|
case FL_TCP:
|
|
|
|
proto = IPPROTO_TCP;
|
|
|
|
break;
|
|
|
|
case FL_SCTP:
|
|
|
|
proto = IPPROTO_SCTP;
|
|
|
|
break;
|
|
|
|
case FL_UDP:
|
|
|
|
proto = IPPROTO_UDP;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
proto = 0;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return (proto);
|
|
|
|
}
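The two helpers above are inverses of each other on the three protocol bits. The following is a minimal userspace sketch of that round trip; the `SK_FL_*` bit positions are assumptions made for illustration (the real values live in the flowtable header, which is not shown here), and every `sk_` name is hypothetical. The protocol numbers are the IANA-assigned ones.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define SK_FL_TCP	(1 << 11)
#define SK_FL_SCTP	(1 << 12)
#define SK_FL_UDP	(1 << 13)

/* IANA protocol numbers. */
enum { SK_IPPROTO_TCP = 6, SK_IPPROTO_UDP = 17, SK_IPPROTO_SCTP = 132 };

int
sk_proto_to_flags(uint8_t proto)
{
	switch (proto) {
	case SK_IPPROTO_TCP:
		return (SK_FL_TCP);
	case SK_IPPROTO_SCTP:
		return (SK_FL_SCTP);
	case SK_IPPROTO_UDP:
		return (SK_FL_UDP);
	default:
		return (0);	/* protocol is not tracked */
	}
}

int
sk_flags_to_proto(int flags)
{
	/* Exactly one protocol bit must be set; anything else maps to 0. */
	switch (flags & (SK_FL_TCP | SK_FL_SCTP | SK_FL_UDP)) {
	case SK_FL_TCP:
		return (SK_IPPROTO_TCP);
	case SK_FL_SCTP:
		return (SK_IPPROTO_SCTP);
	case SK_FL_UDP:
		return (SK_IPPROTO_UDP);
	default:
		return (0);
	}
}

/* Self-check: the mapping round-trips for every tracked protocol. */
void
sk_selfcheck(void)
{
	assert(sk_flags_to_proto(sk_proto_to_flags(SK_IPPROTO_TCP)) ==
	    SK_IPPROTO_TCP);
	assert(sk_flags_to_proto(sk_proto_to_flags(SK_IPPROTO_SCTP)) ==
	    SK_IPPROTO_SCTP);
	assert(sk_flags_to_proto(sk_proto_to_flags(SK_IPPROTO_UDP)) ==
	    SK_IPPROTO_UDP);
}
```

Note that a flags word with more than one protocol bit set falls into the default case and yields 0, just as in the original switch.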
|
|
|
|
|
|
|
|
#ifdef INET
|
|
|
|
static int
|
2014-02-07 15:18:23 +00:00
|
|
|
ipv4_mbuf_demarshal(struct mbuf *m, struct sockaddr_in *ssin,
|
|
|
|
struct sockaddr_in *dsin, uint16_t *flags)
|
2010-03-12 05:03:26 +00:00
|
|
|
{
|
|
|
|
struct ip *ip;
|
|
|
|
uint8_t proto;
|
2009-04-19 00:16:04 +00:00
|
|
|
int iphlen;
|
|
|
|
struct tcphdr *th;
|
|
|
|
struct udphdr *uh;
|
|
|
|
struct sctphdr *sh;
|
2010-03-12 05:03:26 +00:00
|
|
|
uint16_t sport, dport;
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
proto = sport = dport = 0;
|
|
|
|
ip = mtod(m, struct ip *);
|
|
|
|
dsin->sin_family = AF_INET;
|
|
|
|
dsin->sin_len = sizeof(*dsin);
|
|
|
|
dsin->sin_addr = ip->ip_dst;
|
|
|
|
ssin->sin_family = AF_INET;
|
|
|
|
ssin->sin_len = sizeof(*ssin);
|
2014-02-08 22:10:53 +00:00
|
|
|
ssin->sin_addr = ip->ip_src;
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
proto = ip->ip_p;
|
2014-02-07 15:18:23 +00:00
|
|
|
if ((*flags & FL_HASH_ALL) == 0)
|
2009-04-19 00:16:04 +00:00
|
|
|
goto skipports;
|
|
|
|
|
|
|
|
iphlen = ip->ip_hl << 2; /* XXX options? */
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2009-04-19 00:16:04 +00:00
|
|
|
switch (proto) {
|
|
|
|
case IPPROTO_TCP:
|
|
|
|
th = (struct tcphdr *)((caddr_t)ip + iphlen);
|
2010-03-12 05:03:26 +00:00
|
|
|
sport = th->th_sport;
|
|
|
|
dport = th->th_dport;
|
|
|
|
if ((*flags & FL_HASH_ALL) &&
|
|
|
|
(th->th_flags & (TH_RST|TH_FIN)))
|
2009-04-19 00:16:04 +00:00
|
|
|
*flags |= FL_STALE;
|
2014-02-07 10:05:12 +00:00
|
|
|
break;
|
2009-04-19 00:16:04 +00:00
|
|
|
case IPPROTO_UDP:
|
|
|
|
uh = (struct udphdr *)((caddr_t)ip + iphlen);
|
|
|
|
sport = uh->uh_sport;
|
|
|
|
dport = uh->uh_dport;
|
2014-02-07 10:05:12 +00:00
|
|
|
break;
|
2009-04-19 00:16:04 +00:00
|
|
|
case IPPROTO_SCTP:
|
|
|
|
sh = (struct sctphdr *)((caddr_t)ip + iphlen);
|
|
|
|
sport = sh->src_port;
|
|
|
|
dport = sh->dest_port;
|
2014-02-07 10:05:12 +00:00
|
|
|
break;
|
2009-04-19 00:16:04 +00:00
|
|
|
default:
|
2010-03-12 05:03:26 +00:00
|
|
|
return (ENOTSUP);
|
2009-04-19 00:16:04 +00:00
|
|
|
/* no port - hence not a protocol we care about */
|
2010-01-07 21:01:37 +00:00
|
|
|
break;
|
2014-02-08 22:10:53 +00:00
|
|
|
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
skipports:
|
|
|
|
*flags |= proto_to_flags(proto);
|
|
|
|
ssin->sin_port = sport;
|
|
|
|
dsin->sin_port = dport;
|
|
|
|
return (0);
|
|
|
|
}
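The header walk in ipv4_mbuf_demarshal() can be illustrated on a plain byte buffer. This is a sketch under stated assumptions rather than the kernel routine: it works on a contiguous buffer instead of an mbuf, it adds explicit length checks that the function above does not perform, and it decodes addresses and ports to host byte order for readability, whereas the kernel code keeps them in network byte order. The `sk_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_tuple4 {
	uint32_t src, dst;	/* host byte order in this sketch */
	uint16_t sport, dport;	/* host byte order in this sketch */
	uint8_t proto;
};

static uint16_t
sk_be16(const uint8_t *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

static uint32_t
sk_be32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Returns 0 on success, -1 on a short, malformed or untracked packet. */
int
sk_ipv4_demarshal(const uint8_t *pkt, size_t len, struct sk_tuple4 *t)
{
	size_t iphlen;

	assert(pkt != NULL && t != NULL);
	if (len < 20 || (pkt[0] >> 4) != 4)
		return (-1);
	/* IHL counts 32-bit words, so shifting by 2 skips IP options too. */
	iphlen = (size_t)(pkt[0] & 0x0f) << 2;
	if (iphlen < 20 || len < iphlen + 4)
		return (-1);
	switch (pkt[9]) {
	case 6:		/* TCP */
	case 17:	/* UDP */
	case 132:	/* SCTP */
		break;
	default:
		return (-1);	/* no ports, so not a protocol we track */
	}
	t->proto = pkt[9];
	t->src = sk_be32(pkt + 12);
	t->dst = sk_be32(pkt + 16);
	/* TCP, UDP and SCTP all start with source port, destination port. */
	t->sport = sk_be16(pkt + iphlen);
	t->dport = sk_be16(pkt + iphlen + 2);
	return (0);
}
```

Because all three tracked transports begin with the same two 16-bit port fields, one offset computation serves every case.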
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
static uint32_t
|
2014-02-07 15:18:23 +00:00
|
|
|
ipv4_flow_lookup_hash(
|
2014-02-07 10:05:12 +00:00
|
|
|
struct sockaddr_in *ssin, struct sockaddr_in *dsin,
|
2010-03-12 05:03:26 +00:00
|
|
|
uint32_t *key, uint16_t flags)
|
|
|
|
{
|
|
|
|
uint16_t sport, dport;
|
|
|
|
uint8_t proto;
|
|
|
|
int offset = 0;
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
proto = flags_to_proto(flags);
|
|
|
|
sport = dport = key[2] = key[1] = key[0] = 0;
|
|
|
|
if ((ssin != NULL) && (flags & FL_HASH_ALL)) {
|
|
|
|
key[1] = ssin->sin_addr.s_addr;
|
|
|
|
sport = ssin->sin_port;
|
|
|
|
}
|
|
|
|
if (dsin != NULL) {
|
|
|
|
key[2] = dsin->sin_addr.s_addr;
|
|
|
|
dport = dsin->sin_port;
|
|
|
|
}
|
|
|
|
if (flags & FL_HASH_ALL) {
|
|
|
|
((uint16_t *)key)[0] = sport;
|
2014-02-07 10:05:12 +00:00
|
|
|
((uint16_t *)key)[1] = dport;
|
2010-03-12 05:03:26 +00:00
|
|
|
} else
|
2014-02-07 15:18:23 +00:00
|
|
|
offset = flow_hashjitter + proto;
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2012-09-04 12:07:33 +00:00
|
|
|
return (jenkins_hash32(key, 3, offset));
|
2010-03-12 05:03:26 +00:00
|
|
|
}
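The key layout built by ipv4_flow_lookup_hash() is three 32-bit words: both ports packed into word 0, the source address in word 1, and the destination address in word 2, with the first two left zero unless all fields are hashed. The sketch below reproduces that layout in userspace. It is an illustration under assumptions: `sk_hash32` is an FNV-1a style stand-in and not the kernel's jenkins_hash32(), the `seed` argument plays the role of the offset passed to the real hash, and the `sk_` names are hypothetical. The ports are packed with memcpy() rather than a pointer cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill the 3-word key: [0] = ports, [1] = source, [2] = destination. */
void
sk_ipv4_key(uint32_t key[3], uint32_t src, uint32_t dst,
    uint16_t sport, uint16_t dport, int hash_all)
{
	uint16_t ports[2];

	key[0] = key[1] = 0;
	key[2] = dst;			/* destination is always hashed */
	if (hash_all) {
		key[1] = src;
		ports[0] = sport;
		ports[1] = dport;
		memcpy(&key[0], ports, sizeof(ports));
	}
}

/* Stand-in hash (FNV-1a over the key bytes), NOT jenkins_hash32(). */
uint32_t
sk_hash32(const uint32_t *key, size_t nwords, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;
	size_t i;
	int b;

	assert(key != NULL);
	for (i = 0; i < nwords; i++) {
		for (b = 0; b < 4; b++) {
			h ^= (key[i] >> (8 * b)) & 0xffu;
			h *= 16777619u;
		}
	}
	return (h);
}
```

With `hash_all` clear, every flow to the same destination produces an identical key, so such flows share one cache entry regardless of source address or ports.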
|
|
|
|
|
|
|
|
static struct flentry *
|
2014-02-07 15:18:23 +00:00
|
|
|
flowtable_lookup_ipv4(struct mbuf *m)
|
2010-03-12 05:03:26 +00:00
|
|
|
{
|
|
|
|
struct sockaddr_storage ssa, dsa;
|
|
|
|
uint16_t flags;
|
|
|
|
struct sockaddr_in *dsin, *ssin;
|
|
|
|
|
|
|
|
dsin = (struct sockaddr_in *)&dsa;
|
|
|
|
ssin = (struct sockaddr_in *)&ssa;
|
2010-03-12 10:24:58 +00:00
|
|
|
bzero(dsin, sizeof(*dsin));
|
|
|
|
bzero(ssin, sizeof(*ssin));
|
2014-02-07 15:18:23 +00:00
|
|
|
flags = V_ip4_ft.ft_flags;
|
|
|
|
if (ipv4_mbuf_demarshal(m, ssin, dsin, &flags) != 0)
|
2010-03-12 05:03:26 +00:00
|
|
|
return (NULL);
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
return (flowtable_lookup_common(&V_ip4_ft, &ssa, &dsa, m, flags));
|
2010-03-12 05:03:26 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
flow_to_route(struct flentry *fle, struct route *ro)
|
|
|
|
{
|
|
|
|
uint32_t *hashkey = NULL;
|
|
|
|
struct sockaddr_in *sin;
|
|
|
|
|
|
|
|
sin = (struct sockaddr_in *)&ro->ro_dst;
|
|
|
|
sin->sin_family = AF_INET;
|
|
|
|
sin->sin_len = sizeof(*sin);
|
2014-02-13 04:59:18 +00:00
|
|
|
hashkey = fle->f_flow4.ipf_key;
|
2010-03-12 05:03:26 +00:00
|
|
|
sin->sin_addr.s_addr = hashkey[2];
|
2014-02-13 04:59:18 +00:00
|
|
|
ro->ro_rt = fle->f_rt;
|
|
|
|
ro->ro_lle = fle->f_lle;
|
2012-07-04 07:37:53 +00:00
|
|
|
ro->ro_flags |= RT_NORTREF;
|
2010-03-12 05:03:26 +00:00
|
|
|
}
|
|
|
|
#endif /* INET */
|
|
|
|
|
|
|
|
#ifdef INET6
|
|
|
|
/*
|
|
|
|
* PULLUP_TO(len, p, T) checks that len + sizeof(T) bytes are contiguous (failing otherwise),
|
|
|
|
* then it sets p to point at the offset "len" in the mbuf. WARNING: the
|
|
|
|
* pointer might become stale after other pullups (but we never use it
|
|
|
|
* this way).
|
|
|
|
*/
|
|
|
|
#define PULLUP_TO(_len, p, T) \
|
|
|
|
do { \
|
|
|
|
int x = (_len) + sizeof(T); \
|
|
|
|
if ((m)->m_len < x) { \
|
|
|
|
goto receive_failed; \
|
|
|
|
} \
|
|
|
|
p = (mtod(m, char *) + (_len)); \
|
|
|
|
} while (0)
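The bounds check performed by PULLUP_TO can also be written as a function that returns NULL instead of jumping to a label. This is a minimal sketch on a plain byte buffer, assuming a contiguous buffer of known length in place of an mbuf; `sk_pullup_to` is a hypothetical name, and the subtraction is arranged so that the comparison cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return a pointer to offset "off" in buf if "need" bytes are available
 * there, or NULL if the buffer is too short.
 */
const uint8_t *
sk_pullup_to(const uint8_t *buf, size_t buflen, size_t off, size_t need)
{
	assert(buf != NULL);
	if (off > buflen || need > buflen - off)
		return (NULL);
	return (buf + off);
}
```

A caller would test the result for NULL where the macro version relies on a `receive_failed` label being present in the enclosing function.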
|
|
|
|
|
|
|
|
#define TCP(p) ((struct tcphdr *)(p))
|
|
|
|
#define SCTP(p) ((struct sctphdr *)(p))
|
|
|
|
#define UDP(p) ((struct udphdr *)(p))
|
|
|
|
|
|
|
|
static int
|
2014-02-07 15:18:23 +00:00
|
|
|
ipv6_mbuf_demarshal(struct mbuf *m, struct sockaddr_in6 *ssin6,
|
|
|
|
struct sockaddr_in6 *dsin6, uint16_t *flags)
|
2010-03-12 05:03:26 +00:00
|
|
|
{
|
|
|
|
struct ip6_hdr *ip6;
|
|
|
|
uint8_t proto;
|
|
|
|
int hlen;
|
|
|
|
uint16_t src_port, dst_port;
|
|
|
|
u_short offset;
|
|
|
|
void *ulp;
|
|
|
|
|
|
|
|
offset = hlen = src_port = dst_port = 0;
|
|
|
|
ulp = NULL;
|
|
|
|
ip6 = mtod(m, struct ip6_hdr *);
|
|
|
|
hlen = sizeof(struct ip6_hdr);
|
|
|
|
proto = ip6->ip6_nxt;
|
|
|
|
|
|
|
|
if ((*flags & FL_HASH_ALL) == 0)
|
|
|
|
goto skipports;
|
|
|
|
|
|
|
|
while (ulp == NULL) {
|
|
|
|
switch (proto) {
|
|
|
|
case IPPROTO_ICMPV6:
|
|
|
|
case IPPROTO_OSPFIGP:
|
|
|
|
case IPPROTO_PIM:
|
|
|
|
case IPPROTO_CARP:
|
|
|
|
case IPPROTO_ESP:
|
|
|
|
case IPPROTO_NONE:
|
|
|
|
ulp = ip6;
|
|
|
|
break;
|
|
|
|
case IPPROTO_TCP:
|
|
|
|
PULLUP_TO(hlen, ulp, struct tcphdr);
|
|
|
|
dst_port = TCP(ulp)->th_dport;
|
|
|
|
src_port = TCP(ulp)->th_sport;
|
|
|
|
if ((*flags & FL_HASH_ALL) &&
|
|
|
|
(TCP(ulp)->th_flags & (TH_RST|TH_FIN)))
|
|
|
|
*flags |= FL_STALE;
|
|
|
|
break;
|
|
|
|
case IPPROTO_SCTP:
|
|
|
|
PULLUP_TO(hlen, ulp, struct sctphdr);
|
|
|
|
src_port = SCTP(ulp)->src_port;
|
|
|
|
dst_port = SCTP(ulp)->dest_port;
|
|
|
|
break;
|
|
|
|
case IPPROTO_UDP:
|
|
|
|
PULLUP_TO(hlen, ulp, struct udphdr);
|
|
|
|
dst_port = UDP(ulp)->uh_dport;
|
|
|
|
src_port = UDP(ulp)->uh_sport;
|
|
|
|
break;
|
|
|
|
case IPPROTO_HOPOPTS: /* RFC 2460 */
|
|
|
|
PULLUP_TO(hlen, ulp, struct ip6_hbh);
|
|
|
|
hlen += (((struct ip6_hbh *)ulp)->ip6h_len + 1) << 3;
|
|
|
|
proto = ((struct ip6_hbh *)ulp)->ip6h_nxt;
|
|
|
|
ulp = NULL;
|
|
|
|
break;
|
|
|
|
case IPPROTO_ROUTING: /* RFC 2460 */
|
2014-02-08 22:10:53 +00:00
|
|
|
PULLUP_TO(hlen, ulp, struct ip6_rthdr);
|
2010-03-12 05:03:26 +00:00
|
|
|
hlen += (((struct ip6_rthdr *)ulp)->ip6r_len + 1) << 3;
|
|
|
|
proto = ((struct ip6_rthdr *)ulp)->ip6r_nxt;
|
|
|
|
ulp = NULL;
|
|
|
|
break;
|
|
|
|
case IPPROTO_FRAGMENT: /* RFC 2460 */
|
|
|
|
PULLUP_TO(hlen, ulp, struct ip6_frag);
|
|
|
|
hlen += sizeof (struct ip6_frag);
|
|
|
|
proto = ((struct ip6_frag *)ulp)->ip6f_nxt;
|
|
|
|
offset = ((struct ip6_frag *)ulp)->ip6f_offlg &
|
|
|
|
IP6F_OFF_MASK;
|
|
|
|
ulp = NULL;
|
|
|
|
break;
|
|
|
|
case IPPROTO_DSTOPTS: /* RFC 2460 */
|
|
|
|
PULLUP_TO(hlen, ulp, struct ip6_hbh);
|
|
|
|
hlen += (((struct ip6_hbh *)ulp)->ip6h_len + 1) << 3;
|
|
|
|
proto = ((struct ip6_hbh *)ulp)->ip6h_nxt;
|
|
|
|
ulp = NULL;
|
|
|
|
break;
|
|
|
|
case IPPROTO_AH: /* RFC 2402 */
|
|
|
|
PULLUP_TO(hlen, ulp, struct ip6_ext);
|
|
|
|
hlen += (((struct ip6_ext *)ulp)->ip6e_len + 2) << 2;
|
|
|
|
proto = ((struct ip6_ext *)ulp)->ip6e_nxt;
|
|
|
|
ulp = NULL;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
PULLUP_TO(hlen, ulp, struct ip6_ext);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (src_port == 0) {
|
|
|
|
receive_failed:
|
|
|
|
return (ENOTSUP);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
skipports:
|
|
|
|
dsin6->sin6_family = AF_INET6;
|
|
|
|
dsin6->sin6_len = sizeof(*dsin6);
|
|
|
|
dsin6->sin6_port = dst_port;
|
|
|
|
memcpy(&dsin6->sin6_addr, &ip6->ip6_dst, sizeof(struct in6_addr));
|
|
|
|
|
|
|
|
ssin6->sin6_family = AF_INET6;
|
|
|
|
ssin6->sin6_len = sizeof(*ssin6);
|
|
|
|
ssin6->sin6_port = src_port;
|
|
|
|
memcpy(&ssin6->sin6_addr, &ip6->ip6_src, sizeof(struct in6_addr));
|
|
|
|
*flags |= proto_to_flags(proto);
|
|
|
|
|
2009-04-19 00:16:04 +00:00
|
|
|
return (0);
|
|
|
|
}
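The extension-header loop in ipv6_mbuf_demarshal() advances past each header by its own length rule until it reaches an upper-layer protocol. The sketch below shows that walk on a contiguous byte buffer. It is an illustration under assumptions, not the kernel routine: it only locates the upper-layer header and does not extract ports, it treats every protocol other than the four extension types it knows as upper-layer, and `sk_ipv6_find_ulp` is a hypothetical name. The protocol numbers are the IANA-assigned values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk IPv6 extension headers. On success returns 0 and stores the
 * upper-layer protocol and its byte offset; returns -1 if truncated.
 */
int
sk_ipv6_find_ulp(const uint8_t *pkt, size_t len, uint8_t *proto, size_t *off)
{
	size_t hlen = 40;	/* fixed IPv6 header */
	uint8_t nxt, cur;

	assert(pkt != NULL && proto != NULL && off != NULL);
	if (len < hlen)
		return (-1);
	nxt = pkt[6];		/* Next Header field of the fixed header */
	for (;;) {
		cur = nxt;
		switch (cur) {
		case 0:		/* hop-by-hop options */
		case 43:	/* routing */
		case 60:	/* destination options */
			if (len - hlen < 2)
				return (-1);
			nxt = pkt[hlen];
			/* length is in 8-byte units, not counting the first */
			hlen += ((size_t)pkt[hlen + 1] + 1) << 3;
			break;
		case 44:	/* fragment: fixed 8 bytes */
			if (len - hlen < 8)
				return (-1);
			nxt = pkt[hlen];
			hlen += 8;
			break;
		case 51:	/* AH: length is in 4-byte units, minus 2 */
			if (len - hlen < 2)
				return (-1);
			nxt = pkt[hlen];
			hlen += ((size_t)pkt[hlen + 1] + 2) << 2;
			break;
		default:	/* upper-layer protocol reached */
			*proto = cur;
			*off = hlen;
			return (0);
		}
		if (hlen > len)
			return (-1);
	}
}
```

The loop terminates because every extension header moves the offset forward by at least 8 bytes, and the offset is checked against the buffer length after each step.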
|
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
#define zero_key(key) \
|
|
|
|
do { \
|
|
|
|
key[0] = 0; \
|
|
|
|
key[1] = 0; \
|
|
|
|
key[2] = 0; \
|
|
|
|
key[3] = 0; \
|
|
|
|
key[4] = 0; \
|
|
|
|
key[5] = 0; \
|
|
|
|
key[6] = 0; \
|
|
|
|
key[7] = 0; \
|
|
|
|
key[8] = 0; \
|
|
|
|
} while (0)
|
2014-02-08 22:10:53 +00:00
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
static uint32_t
|
2014-02-07 15:18:23 +00:00
|
|
|
ipv6_flow_lookup_hash(
|
2014-02-07 10:05:12 +00:00
|
|
|
struct sockaddr_in6 *ssin6, struct sockaddr_in6 *dsin6,
|
2010-03-12 05:03:26 +00:00
|
|
|
uint32_t *key, uint16_t flags)
|
|
|
|
{
|
|
|
|
uint16_t sport, dport;
|
|
|
|
uint8_t proto;
|
|
|
|
int offset = 0;
|
|
|
|
|
|
|
|
proto = flags_to_proto(flags);
|
|
|
|
zero_key(key);
|
|
|
|
sport = dport = 0;
|
|
|
|
if (dsin6 != NULL) {
|
|
|
|
memcpy(&key[1], &dsin6->sin6_addr, sizeof(struct in6_addr));
|
|
|
|
dport = dsin6->sin6_port;
|
|
|
|
}
|
|
|
|
if ((ssin6 != NULL) && (flags & FL_HASH_ALL)) {
|
|
|
|
memcpy(&key[5], &ssin6->sin6_addr, sizeof(struct in6_addr));
|
|
|
|
sport = ssin6->sin6_port;
|
|
|
|
}
|
|
|
|
if (flags & FL_HASH_ALL) {
|
|
|
|
((uint16_t *)key)[0] = sport;
|
2014-02-07 10:05:12 +00:00
|
|
|
((uint16_t *)key)[1] = dport;
|
2010-03-12 05:03:26 +00:00
|
|
|
} else
|
2014-02-07 15:18:23 +00:00
|
|
|
offset = flow_hashjitter + proto;
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2012-09-04 12:07:33 +00:00
|
|
|
return (jenkins_hash32(key, 9, offset));
|
2010-03-12 05:03:26 +00:00
|
|
|
}
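For IPv6 the key grows to nine 32-bit words: the ports in word 0, the destination address in words 1 to 4, and the source address in words 5 to 8. The following minimal sketch reproduces that layout as built by ipv6_flow_lookup_hash() and shows how the destination can be read back from a stored key. The `sk_` names are hypothetical and addresses are passed as raw 16-byte arrays rather than `struct in6_addr`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_KEY6_WORDS	9

/* Fill the 9-word key: [0] = ports, [1..4] = dst, [5..8] = src. */
void
sk_ipv6_key(uint32_t key[SK_KEY6_WORDS], const uint8_t src[16],
    const uint8_t dst[16], uint16_t sport, uint16_t dport, int hash_all)
{
	uint16_t ports[2];

	assert(key != NULL && src != NULL && dst != NULL);
	memset(key, 0, SK_KEY6_WORDS * sizeof(key[0]));
	memcpy(&key[1], dst, 16);	/* destination is always hashed */
	if (hash_all) {
		memcpy(&key[5], src, 16);
		ports[0] = sport;
		ports[1] = dport;
		memcpy(&key[0], ports, sizeof(ports));
	}
}

/* Recover the destination address from a stored key. */
void
sk_ipv6_key_dst(const uint32_t key[SK_KEY6_WORDS], uint8_t dst[16])
{
	memcpy(dst, &key[1], 16);
}
```

When only the destination is hashed, words 5 to 8 stay zero, so code that rebuilds a route destination from a key has to read words 1 to 4.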
|
|
|
|
|
|
|
|
static struct flentry *
|
2014-02-07 15:18:23 +00:00
|
|
|
flowtable_lookup_ipv6(struct mbuf *m)
|
2010-03-12 05:03:26 +00:00
|
|
|
{
|
|
|
|
struct sockaddr_storage ssa, dsa;
|
2014-02-08 22:10:53 +00:00
|
|
|
struct sockaddr_in6 *dsin6, *ssin6;
|
2010-03-12 05:03:26 +00:00
|
|
|
uint16_t flags;
|
|
|
|
|
|
|
|
dsin6 = (struct sockaddr_in6 *)&dsa;
|
|
|
|
ssin6 = (struct sockaddr_in6 *)&ssa;
|
2010-03-12 10:24:58 +00:00
|
|
|
bzero(dsin6, sizeof(*dsin6));
|
|
|
|
bzero(ssin6, sizeof(*ssin6));
|
2014-02-07 15:18:23 +00:00
|
|
|
flags = V_ip6_ft.ft_flags;
|
2014-02-08 22:10:53 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
if (ipv6_mbuf_demarshal(m, ssin6, dsin6, &flags) != 0)
|
2010-03-12 05:03:26 +00:00
|
|
|
return (NULL);
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
return (flowtable_lookup_common(&V_ip6_ft, &ssa, &dsa, m, flags));
|
2010-03-12 05:03:26 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
|
|
|
flow_to_route_in6(struct flentry *fle, struct route_in6 *ro)
|
|
|
|
{
|
|
|
|
uint32_t *hashkey = NULL;
|
|
|
|
struct sockaddr_in6 *sin6;
|
|
|
|
|
|
|
|
sin6 = (struct sockaddr_in6 *)&ro->ro_dst;
|
|
|
|
|
|
|
|
sin6->sin6_family = AF_INET6;
|
|
|
|
sin6->sin6_len = sizeof(*sin6);
|
2014-02-13 04:59:18 +00:00
|
|
|
hashkey = fle->f_flow6.ipf_key;
|
2010-03-12 05:03:26 +00:00
|
|
|
memcpy(&sin6->sin6_addr, &hashkey[1], sizeof (struct in6_addr)); /* dst is stored at key[1..4] */
|
2014-02-13 04:59:18 +00:00
|
|
|
ro->ro_rt = fle->f_rt;
|
|
|
|
ro->ro_lle = fle->f_lle;
|
2012-07-04 07:37:53 +00:00
|
|
|
ro->ro_flags |= RT_NORTREF;
|
2010-03-12 05:03:26 +00:00
|
|
|
}
|
|
|
|
#endif /* INET6 */
|
|
|
|
|
2009-04-19 00:16:04 +00:00
|
|
|
static bitstr_t *
|
|
|
|
flowtable_mask(struct flowtable *ft)
|
|
|
|
{
|
2009-08-18 20:28:58 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
/*
|
|
|
|
* flowtable_free_stale() calls this w/o a critical section, but
|
|
|
|
* with sched_bind(). Since pointer is stable throughout
|
|
|
|
* ft lifetime, it is safe, otherwise...
|
|
|
|
*
|
|
|
|
* CRITICAL_ASSERT(curthread);
|
|
|
|
*/
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
return (*(bitstr_t **)zpcpu_get(ft->ft_masks));
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
static struct flist *
|
|
|
|
flowtable_list(struct flowtable *ft, uint32_t hash)
|
2009-04-19 00:16:04 +00:00
|
|
|
{
|
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
CRITICAL_ASSERT(curthread);
|
|
|
|
return (zpcpu_get(ft->ft_table[hash % ft->ft_size]));
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
flow_stale(struct flowtable *ft, struct flentry *fle)
|
|
|
|
{
|
|
|
|
time_t idle_time;
|
|
|
|
|
|
|
|
if ((fle->f_fhash == 0)
|
|
|
|
|| ((fle->f_rt->rt_flags & RTF_HOST) &&
|
|
|
|
((fle->f_rt->rt_flags & (RTF_UP))
|
|
|
|
!= (RTF_UP)))
|
2010-03-09 01:11:45 +00:00
|
|
|
|| (fle->f_rt->rt_ifp == NULL)
|
|
|
|
|| !RT_LINK_IS_UP(fle->f_rt->rt_ifp))
|
2009-04-19 00:16:04 +00:00
|
|
|
return (1);
|
|
|
|
|
|
|
|
idle_time = time_uptime - fle->f_uptime;
|
|
|
|
|
|
|
|
if ((fle->f_flags & FL_STALE) ||
|
|
|
|
((fle->f_flags & (TH_SYN|TH_ACK|TH_FIN)) == 0
|
|
|
|
&& (idle_time > ft->ft_udp_idle)) ||
|
|
|
|
((fle->f_flags & TH_FIN)
|
|
|
|
&& (idle_time > ft->ft_fin_wait_idle)) ||
|
|
|
|
((fle->f_flags & (TH_SYN|TH_ACK)) == TH_SYN
|
|
|
|
&& (idle_time > ft->ft_syn_idle)) ||
|
|
|
|
((fle->f_flags & (TH_SYN|TH_ACK)) == (TH_SYN|TH_ACK)
|
|
|
|
&& (idle_time > ft->ft_tcp_idle)) ||
|
2014-02-07 10:05:12 +00:00
|
|
|
((fle->f_rt->rt_flags & RTF_UP) == 0 ||
|
2009-04-19 00:16:04 +00:00
|
|
|
(fle->f_rt->rt_ifp == NULL)))
|
|
|
|
return (1);
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2010-03-22 23:04:12 +00:00
|
|
|
static int
|
|
|
|
flow_full(struct flowtable *ft)
|
|
|
|
{
|
|
|
|
boolean_t full;
|
2014-02-07 15:18:23 +00:00
|
|
|
int count, max;
|
2014-02-08 22:10:53 +00:00
|
|
|
|
2010-03-22 23:04:12 +00:00
|
|
|
full = ft->ft_full;
|
2014-02-13 04:59:18 +00:00
|
|
|
count = uma_zone_get_cur(flow_zone);
|
|
|
|
max = uma_zone_get_max(flow_zone);
|
2010-03-22 23:04:12 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
if (full && (count < (max - (max >> 3))))
|
2010-03-22 23:04:12 +00:00
|
|
|
ft->ft_full = FALSE;
|
2014-02-07 15:18:23 +00:00
|
|
|
else if (!full && (count > (max - (max >> 5))))
|
2010-03-22 23:04:12 +00:00
|
|
|
ft->ft_full = TRUE;
|
2014-02-08 22:10:53 +00:00
|
|
|
|
2010-03-22 23:04:12 +00:00
|
|
|
if (full && !ft->ft_full) {
|
|
|
|
flowclean_freq = 4*hz;
|
|
|
|
if ((ft->ft_flags & FL_HASH_ALL) == 0)
|
|
|
|
ft->ft_udp_idle = ft->ft_fin_wait_idle =
|
|
|
|
ft->ft_syn_idle = ft->ft_tcp_idle = 5;
|
2010-12-31 21:06:52 +00:00
|
|
|
cv_broadcast(&flowclean_c_cv);
|
2010-03-22 23:04:12 +00:00
|
|
|
} else if (!full && ft->ft_full) {
|
|
|
|
flowclean_freq = 20*hz;
|
|
|
|
if ((ft->ft_flags & FL_HASH_ALL) == 0)
|
|
|
|
ft->ft_udp_idle = ft->ft_fin_wait_idle =
|
|
|
|
ft->ft_syn_idle = ft->ft_tcp_idle = 30;
|
|
|
|
}
|
|
|
|
|
|
|
|
return (ft->ft_full);
|
|
|
|
}
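
The two thresholds in flow_full() form a hysteresis band: the table is marked full above max - max/32 entries and is only marked not full again once the count drops below max - max/8. A minimal sketch of just that state update follows, assuming a hypothetical `struct full_state` and `full_update()` in place of the flowtable fields and the UMA zone counters; the cleaner frequency and idle timeout adjustments are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ft->ft_full; not the kernel structure. */
struct full_state {
	bool	full;
};

/*
 * Update the "full" flag with hysteresis, using the same shifts as
 * flow_full(): leave the full state below max - (max >> 3) and enter
 * it above max - (max >> 5).  Returns the new state.
 */
static bool
full_update(struct full_state *st, int count, int max)
{

	assert(max > 0);
	if (st->full && count < max - (max >> 3))
		st->full = false;
	else if (!st->full && count > max - (max >> 5))
		st->full = true;
	return (st->full);
}
```

With max = 1024 the band runs from 896 to 992 entries, so a count that hovers around either edge does not flip the state on every call.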
|
|
|
|
|
2009-04-19 00:16:04 +00:00
|
|
|
static int
|
2014-02-14 10:56:26 +00:00
|
|
|
flow_matches(struct flentry *fle, uint32_t hash, uint32_t *key, uint8_t
|
|
|
|
proto, uint32_t fibnum)
|
|
|
|
{
|
|
|
|
|
|
|
|
if (fle->f_fhash == hash &&
|
|
|
|
bcmp(&fle->f_flow, key, KEYLEN(fle->f_flags)) == 0 &&
|
|
|
|
proto == fle->f_proto && fibnum == fle->f_fibnum &&
|
|
|
|
(fle->f_rt->rt_flags & RTF_UP) &&
|
|
|
|
fle->f_rt->rt_ifp != NULL &&
|
|
|
|
(fle->f_lle->la_flags & LLE_VALID))
|
|
|
|
return (1);
|
|
|
|
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct flentry *
|
2009-04-19 00:16:04 +00:00
|
|
|
flowtable_insert(struct flowtable *ft, uint32_t hash, uint32_t *key,
|
2010-03-12 05:03:26 +00:00
|
|
|
uint32_t fibnum, struct route *ro, uint16_t flags)
|
2009-04-19 00:16:04 +00:00
|
|
|
{
|
2014-02-13 04:59:18 +00:00
|
|
|
struct flist *flist;
|
|
|
|
struct flentry *fle, *iter;
|
2009-04-19 00:16:04 +00:00
|
|
|
bitstr_t *mask;
|
2014-02-14 10:56:26 +00:00
|
|
|
int depth;
|
|
|
|
uint8_t proto;
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
fle = uma_zalloc(flow_zone, M_NOWAIT | M_ZERO);
|
|
|
|
if (fle == NULL)
|
2014-02-14 10:56:26 +00:00
|
|
|
return (NULL);
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-14 10:56:26 +00:00
|
|
|
proto = flags_to_proto(flags);
|
2014-02-13 04:59:18 +00:00
|
|
|
bcopy(key, &fle->f_flow, KEYLEN(flags));
|
|
|
|
fle->f_flags |= (flags & FL_IPV6);
|
2014-02-14 10:56:26 +00:00
|
|
|
fle->f_proto = proto;
|
2014-02-13 04:59:18 +00:00
|
|
|
fle->f_rt = ro->ro_rt;
|
|
|
|
fle->f_lle = ro->ro_lle;
|
|
|
|
fle->f_fhash = hash;
|
|
|
|
fle->f_fibnum = fibnum;
|
|
|
|
fle->f_uptime = time_uptime;
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
critical_enter();
|
2009-04-19 00:16:04 +00:00
|
|
|
mask = flowtable_mask(ft);
|
2014-02-13 04:59:18 +00:00
|
|
|
flist = flowtable_list(ft, hash);
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
if (SLIST_EMPTY(flist)) {
|
|
|
|
bit_set(mask, (hash % ft->ft_size));
|
|
|
|
SLIST_INSERT_HEAD(flist, fle, f_next);
|
2009-04-19 00:16:04 +00:00
|
|
|
goto skip;
|
2014-02-07 10:05:12 +00:00
|
|
|
}
|
2014-02-08 22:10:53 +00:00
|
|
|
|
2009-04-19 00:16:04 +00:00
|
|
|
depth = 0;
|
|
|
|
/*
|
|
|
|
* walk the list and make sure that we were not
|
|
|
|
* preempted by another thread handling this flow
|
|
|
|
*/
|
2014-02-13 04:59:18 +00:00
|
|
|
SLIST_FOREACH(iter, flist, f_next) {
|
2014-02-14 10:56:26 +00:00
|
|
|
if (flow_matches(iter, hash, key, proto, fibnum)) {
|
2009-04-19 00:16:04 +00:00
|
|
|
/*
|
2014-02-14 10:56:26 +00:00
|
|
|
* We probably migrated to another CPU after
|
|
|
|
* lookup in flowtable_lookup_common() failed.
|
|
|
|
* It turned out that this CPU already has a flow
|
|
|
|
* entry.
|
2009-04-19 00:16:04 +00:00
|
|
|
*/
|
2014-02-14 10:56:26 +00:00
|
|
|
iter->f_uptime = time_uptime;
|
|
|
|
iter->f_flags |= flags;
|
2014-02-13 04:59:18 +00:00
|
|
|
critical_exit();
|
2014-02-14 10:56:26 +00:00
|
|
|
FLOWSTAT_INC(ft, ft_collisions);
|
2014-02-13 04:59:18 +00:00
|
|
|
uma_zfree(flow_zone, fle);
|
2014-02-14 10:56:26 +00:00
|
|
|
return (iter);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
|
|
|
depth++;
|
2014-02-07 10:05:12 +00:00
|
|
|
}
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
if (depth > ft->ft_max_depth)
|
|
|
|
ft->ft_max_depth = depth;
|
2014-02-13 04:59:18 +00:00
|
|
|
|
|
|
|
SLIST_INSERT_HEAD(flist, fle, f_next);
|
2009-04-19 00:16:04 +00:00
|
|
|
skip:
|
2014-02-13 04:59:18 +00:00
|
|
|
critical_exit();
|
2014-02-14 10:56:26 +00:00
|
|
|
FLOWSTAT_INC(ft, ft_inserts);
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-14 10:56:26 +00:00
|
|
|
return (fle);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
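
The list handling in flowtable_insert() follows a common pattern: allocate the candidate first, scan the bucket for an existing match, and either return the match (freeing the candidate) or link the candidate at the head. A minimal userland sketch of that pattern with the queue(3) SLIST macros is given below. `struct entry`, `struct bucket` and `bucket_insert()` are hypothetical names, matching is reduced to a bare hash comparison, and the critical section, per-CPU lookup and bitmask update are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, much reduced stand-in for struct flentry. */
struct entry {
	uint32_t		hash;
	SLIST_ENTRY(entry)	next;
};
SLIST_HEAD(bucket, entry);

/*
 * Insert an entry for 'hash' at the head of the bucket unless one is
 * already present.  Returns the entry that ends up on the list, or
 * NULL if the allocation failed.
 */
static struct entry *
bucket_insert(struct bucket *b, uint32_t hash)
{
	struct entry *e, *iter;

	e = calloc(1, sizeof(*e));
	if (e == NULL)
		return (NULL);
	e->hash = hash;

	SLIST_FOREACH(iter, b, next) {
		if (iter->hash == hash) {
			/* Somebody inserted it already; drop our copy. */
			free(e);
			return (iter);
		}
	}
	SLIST_INSERT_HEAD(b, e, next);
	return (e);
}
```

Because new entries go to the head, the most recently created flow in a bucket is the first one a later lookup compares against.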
|
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
struct flentry *
|
2014-02-07 15:18:23 +00:00
|
|
|
flowtable_lookup(sa_family_t sa, struct mbuf *m)
|
2010-03-12 05:03:26 +00:00
|
|
|
{
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
switch (sa) {
|
2010-03-12 05:03:26 +00:00
|
|
|
#ifdef INET
|
2014-02-07 15:18:23 +00:00
|
|
|
case AF_INET:
|
|
|
|
return (flowtable_lookup_ipv4(m));
|
2010-03-12 05:03:26 +00:00
|
|
|
#endif
|
|
|
|
#ifdef INET6
|
2014-02-07 15:18:23 +00:00
|
|
|
case AF_INET6:
|
|
|
|
return (flowtable_lookup_ipv6(m));
|
|
|
|
#endif
|
|
|
|
default:
|
|
|
|
panic("%s: sa %d", __func__, sa);
|
2010-03-12 05:03:26 +00:00
|
|
|
}
|
|
|
|
}
|
2014-02-07 15:18:23 +00:00
|
|
|
|
|
|
|
static struct flentry *
|
|
|
|
flowtable_lookup_common(struct flowtable *ft, struct sockaddr_storage *ssa,
|
|
|
|
struct sockaddr_storage *dsa, struct mbuf *m, int flags)
|
2009-04-19 00:16:04 +00:00
|
|
|
{
|
2014-02-07 15:18:23 +00:00
|
|
|
struct route_in6 sro6;
|
|
|
|
struct route sro, *ro;
|
2014-02-13 04:59:18 +00:00
|
|
|
struct flist *flist;
|
2009-04-19 00:16:04 +00:00
|
|
|
struct flentry *fle;
|
|
|
|
struct rtentry *rt;
|
|
|
|
struct llentry *lle;
|
2014-02-07 15:18:23 +00:00
|
|
|
struct sockaddr_storage *l3addr;
|
|
|
|
struct ifnet *ifp;
|
|
|
|
uint32_t key[9], hash, fibnum;
|
|
|
|
uint8_t proto;
|
|
|
|
|
|
|
|
if (V_flowtable_enable == 0)
|
|
|
|
return (NULL);
|
2010-03-12 05:03:26 +00:00
|
|
|
|
|
|
|
sro.ro_rt = sro6.ro_rt = NULL;
|
|
|
|
sro.ro_lle = sro6.ro_lle = NULL;
|
|
|
|
flags |= ft->ft_flags;
|
|
|
|
proto = flags_to_proto(flags);
|
2014-02-07 15:18:23 +00:00
|
|
|
fibnum = M_GETFIB(m);
|
|
|
|
|
|
|
|
switch (ssa->ss_family) {
|
2010-03-12 05:03:26 +00:00
|
|
|
#ifdef INET
|
2014-02-07 15:18:23 +00:00
|
|
|
case AF_INET: {
|
2010-03-12 05:03:26 +00:00
|
|
|
struct sockaddr_in *ssin, *dsin;
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
KASSERT(dsa->ss_family == AF_INET,
|
|
|
|
("%s: dsa family %d\n", __func__, dsa->ss_family));
|
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
ro = &sro;
|
|
|
|
memcpy(&ro->ro_dst, dsa, sizeof(struct sockaddr_in));
|
2010-03-12 10:24:58 +00:00
|
|
|
/*
|
|
|
|
* The harvested source and destination addresses
|
2014-02-07 10:05:12 +00:00
|
|
|
* may contain port information if the packet is
|
|
|
|
* from a transport protocol (e.g. TCP/UDP). The
|
|
|
|
* port field must be cleared before performing
|
2010-03-12 10:24:58 +00:00
|
|
|
* a route lookup.
|
|
|
|
*/
|
|
|
|
((struct sockaddr_in *)&ro->ro_dst)->sin_port = 0;
|
2010-03-12 05:03:26 +00:00
|
|
|
dsin = (struct sockaddr_in *)dsa;
|
|
|
|
ssin = (struct sockaddr_in *)ssa;
|
|
|
|
if ((dsin->sin_addr.s_addr == ssin->sin_addr.s_addr) ||
|
|
|
|
(ntohl(dsin->sin_addr.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET ||
|
|
|
|
(ntohl(ssin->sin_addr.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET)
|
|
|
|
return (NULL);
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
hash = ipv4_flow_lookup_hash(ssin, dsin, key, flags);
|
|
|
|
break;
|
2010-03-12 05:03:26 +00:00
|
|
|
}
|
|
|
|
#endif
|
|
|
|
#ifdef INET6
|
2014-02-07 15:18:23 +00:00
|
|
|
case AF_INET6: {
|
2010-03-12 05:03:26 +00:00
|
|
|
struct sockaddr_in6 *ssin6, *dsin6;
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
KASSERT(dsa->ss_family == AF_INET6,
|
|
|
|
("%s: dsa family %d\n", __func__, dsa->ss_family));
|
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
ro = (struct route *)&sro6;
|
|
|
|
memcpy(&sro6.ro_dst, dsa,
|
|
|
|
sizeof(struct sockaddr_in6));
|
2010-03-12 10:24:58 +00:00
|
|
|
((struct sockaddr_in6 *)&ro->ro_dst)->sin6_port = 0;
|
2010-03-12 05:03:26 +00:00
|
|
|
dsin6 = (struct sockaddr_in6 *)dsa;
|
|
|
|
ssin6 = (struct sockaddr_in6 *)ssa;
|
|
|
|
|
|
|
|
flags |= FL_IPV6;
|
2014-02-07 15:18:23 +00:00
|
|
|
hash = ipv6_flow_lookup_hash(ssin6, dsin6, key, flags);
|
|
|
|
break;
|
2010-03-12 05:03:26 +00:00
|
|
|
}
|
|
|
|
#endif
|
2014-02-07 15:18:23 +00:00
|
|
|
default:
|
|
|
|
panic("%s: ssa family %d", __func__, ssa->ss_family);
|
|
|
|
}
|
|
|
|
|
2009-04-19 00:16:04 +00:00
|
|
|
/*
|
|
|
|
* Ports are zero and this isn't a transmit cache
|
2014-02-07 10:05:12 +00:00
|
|
|
* - thus not a protocol for which we need to keep
|
2009-04-19 00:16:04 +00:00
|
|
|
* state
|
2010-03-12 05:03:26 +00:00
|
|
|
* FL_HASH_ALL => key[0] != 0 for TCP || UDP || SCTP
|
2009-04-19 00:16:04 +00:00
|
|
|
*/
|
2014-02-07 15:18:23 +00:00
|
|
|
if (key[0] == 0 && (ft->ft_flags & FL_HASH_ALL))
|
2010-03-12 05:03:26 +00:00
|
|
|
return (NULL);
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
FLOWSTAT_INC(ft, ft_lookups);
|
2014-02-13 04:59:18 +00:00
|
|
|
|
|
|
|
critical_enter();
|
|
|
|
flist = flowtable_list(ft, hash);
|
|
|
|
SLIST_FOREACH(fle, flist, f_next)
|
2014-02-14 10:56:26 +00:00
|
|
|
if (flow_matches(fle, hash, key, proto, fibnum)) {
|
2014-02-13 04:59:18 +00:00
|
|
|
fle->f_uptime = time_uptime;
|
|
|
|
fle->f_flags |= flags;
|
|
|
|
critical_exit();
|
|
|
|
FLOWSTAT_INC(ft, ft_hits);
|
|
|
|
goto success;
|
|
|
|
}
|
|
|
|
critical_exit();
|
|
|
|
|
2014-02-13 05:19:09 +00:00
|
|
|
if (flow_full(ft))
|
2010-03-12 05:03:26 +00:00
|
|
|
return (NULL);
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
FLOWSTAT_INC(ft, ft_misses);
|
2009-04-19 00:16:04 +00:00
|
|
|
/*
|
|
|
|
* This bit of code ends up locking the
|
|
|
|
* same route 3 times (just like ip_output + ether_output)
|
|
|
|
* - at lookup
|
|
|
|
* - in rt_check when called by arpresolve
|
|
|
|
* - dropping the refcount for the rtentry
|
|
|
|
*
|
|
|
|
* This could be consolidated to one if we wrote a variant
|
|
|
|
* of arpresolve with an rt_check variant that expected to
|
|
|
|
* receive the route locked
|
|
|
|
*/
|
|
|
|
|
2014-02-08 22:12:00 +00:00
|
|
|
#ifdef RADIX_MPATH
|
|
|
|
rtalloc_mpath_fib(ro, hash, fibnum);
|
|
|
|
#else
|
|
|
|
rtalloc_ign_fib(ro, 0, fibnum);
|
|
|
|
#endif
|
|
|
|
|
2014-02-07 10:05:12 +00:00
|
|
|
if (ro->ro_rt == NULL)
|
2014-02-07 15:18:23 +00:00
|
|
|
return (NULL);
|
2009-08-28 07:01:09 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
rt = ro->ro_rt;
|
|
|
|
ifp = rt->rt_ifp;
|
|
|
|
|
|
|
|
if (ifp->if_flags & (IFF_POINTOPOINT | IFF_LOOPBACK)) {
|
|
|
|
RTFREE(rt);
|
|
|
|
return (NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (ssa->ss_family) {
|
2010-03-12 05:03:26 +00:00
|
|
|
#ifdef INET
|
2014-02-07 15:18:23 +00:00
|
|
|
case AF_INET:
|
|
|
|
if (rt->rt_flags & RTF_GATEWAY)
|
|
|
|
l3addr = (struct sockaddr_storage *)rt->rt_gateway;
|
|
|
|
else
|
|
|
|
l3addr = (struct sockaddr_storage *)&ro->ro_dst;
|
2014-02-08 22:10:53 +00:00
|
|
|
lle = llentry_alloc(ifp, LLTABLE(ifp), l3addr);
|
2014-02-07 15:18:23 +00:00
|
|
|
break;
|
2010-03-12 05:03:26 +00:00
|
|
|
#endif
|
2014-02-07 15:18:23 +00:00
|
|
|
#ifdef INET6
|
|
|
|
case AF_INET6: {
|
|
|
|
struct sockaddr_in6 *dsin6;
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-08 22:10:53 +00:00
|
|
|
dsin6 = (struct sockaddr_in6 *)dsa;
|
2014-02-07 15:18:23 +00:00
|
|
|
if (in6_localaddr(&dsin6->sin6_addr)) {
|
2009-04-19 00:16:04 +00:00
|
|
|
RTFREE(rt);
|
2014-02-08 22:10:53 +00:00
|
|
|
return (NULL);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
if (rt->rt_flags & RTF_GATEWAY)
|
|
|
|
l3addr = (struct sockaddr_storage *)rt->rt_gateway;
|
|
|
|
else
|
|
|
|
l3addr = (struct sockaddr_storage *)&ro->ro_dst;
|
|
|
|
lle = llentry_alloc(ifp, LLTABLE6(ifp), l3addr);
|
|
|
|
break;
|
|
|
|
}
|
2014-02-08 22:10:53 +00:00
|
|
|
#endif
|
2014-02-07 15:18:23 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if (lle == NULL) {
|
|
|
|
RTFREE(rt);
|
|
|
|
return (NULL);
|
2014-02-07 10:05:12 +00:00
|
|
|
}
|
2014-02-14 00:05:09 +00:00
|
|
|
|
|
|
|
/* Don't insert the entry if the ARP hasn't yet finished resolving */
|
|
|
|
if ((lle->la_flags & LLE_VALID) == 0) {
|
|
|
|
RTFREE(rt);
|
|
|
|
LLE_FREE(lle);
|
|
|
|
FLOWSTAT_INC(ft, ft_fail_lle_invalid);
|
|
|
|
return (NULL);
|
|
|
|
}
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
ro->ro_lle = lle;
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-14 10:56:26 +00:00
|
|
|
fle = flowtable_insert(ft, hash, key, fibnum, ro, flags);
|
|
|
|
if (fle == NULL) {
|
2014-02-07 15:18:23 +00:00
|
|
|
RTFREE(rt);
|
|
|
|
LLE_FREE(lle);
|
|
|
|
return (NULL);
|
|
|
|
}
|
|
|
|
|
|
|
|
success:
|
2014-02-15 07:57:01 +00:00
|
|
|
if (! (m->m_flags & M_FLOWID)) {
|
2014-02-07 15:18:23 +00:00
|
|
|
m->m_flags |= M_FLOWID;
|
|
|
|
m->m_pkthdr.flowid = fle->f_fhash;
|
|
|
|
}
|
2014-02-14 10:56:26 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
return (fle);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* used by the bit_alloc macro
|
|
|
|
*/
|
2014-02-13 04:59:18 +00:00
|
|
|
#define calloc(count, size) malloc((count)*(size), M_FTABLE, M_WAITOK | M_ZERO)
|
2014-02-07 15:18:23 +00:00
|
|
|
static void
|
|
|
|
flowtable_alloc(struct flowtable *ft)
|
2009-04-19 00:16:04 +00:00
|
|
|
{
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
ft->ft_table = malloc(ft->ft_size * sizeof(struct flist),
|
|
|
|
M_FTABLE, M_WAITOK);
|
|
|
|
for (int i = 0; i < ft->ft_size; i++)
|
|
|
|
ft->ft_table[i] = uma_zalloc(pcpu_zone_ptr, M_WAITOK | M_ZERO);
|
2009-04-19 00:16:04 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
ft->ft_masks = uma_zalloc(pcpu_zone_ptr, M_WAITOK);
|
|
|
|
for (int i = 0; i < mp_ncpus; i++) {
|
|
|
|
bitstr_t **b;
|
|
|
|
|
|
|
|
b = zpcpu_get_cpu(ft->ft_masks, i);
|
|
|
|
*b = bit_alloc(ft->ft_size);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
2014-02-07 15:18:23 +00:00
|
|
|
ft->ft_tmpmask = bit_alloc(ft->ft_size);
|
2009-04-19 00:16:04 +00:00
|
|
|
|
|
|
|
/*
|
2014-02-07 10:05:12 +00:00
|
|
|
* In the local transmit case the table truly is
|
2009-04-19 00:16:04 +00:00
|
|
|
* just a cache - so everything is eligible for
|
|
|
|
 * replacement after 30s of non-use
|
|
|
|
*/
|
2014-02-07 15:18:23 +00:00
|
|
|
if (ft->ft_flags & FL_HASH_ALL) {
|
2009-06-22 21:19:24 +00:00
|
|
|
ft->ft_udp_idle = V_flowtable_udp_expire;
|
|
|
|
ft->ft_syn_idle = V_flowtable_syn_expire;
|
|
|
|
ft->ft_fin_wait_idle = V_flowtable_fin_wait_expire;
|
|
|
|
ft->ft_tcp_idle = V_flowtable_tcp_expire;
|
2009-04-19 00:16:04 +00:00
|
|
|
} else {
|
|
|
|
ft->ft_udp_idle = ft->ft_fin_wait_idle =
|
|
|
|
ft->ft_syn_idle = ft->ft_tcp_idle = 30;
|
2014-02-08 22:10:53 +00:00
|
|
|
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
|
|
|
}
|
2014-02-13 04:59:18 +00:00
|
|
|
#undef calloc
|
2009-04-19 00:16:04 +00:00
|
|
|
|
|
|
|
static void
|
2009-10-01 20:32:29 +00:00
|
|
|
flowtable_free_stale(struct flowtable *ft, struct rtentry *rt)
|
2009-04-19 00:16:04 +00:00
|
|
|
{
|
2014-02-13 04:59:18 +00:00
|
|
|
struct flist *flist, freelist;
|
|
|
|
struct flentry *fle, *fle1, *fleprev;
|
2009-04-20 16:16:43 +00:00
|
|
|
bitstr_t *mask, *tmpmask;
|
2014-02-13 04:59:18 +00:00
|
|
|
int curbit, tmpsize;
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
SLIST_INIT(&freelist);
|
2009-04-19 00:16:04 +00:00
|
|
|
mask = flowtable_mask(ft);
|
2009-04-20 16:16:43 +00:00
|
|
|
tmpmask = ft->ft_tmpmask;
|
2013-10-15 21:28:51 +00:00
|
|
|
tmpsize = ft->ft_size;
|
2009-04-20 16:16:43 +00:00
|
|
|
memcpy(tmpmask, mask, ft->ft_size/8);
|
2014-02-13 04:59:18 +00:00
|
|
|
curbit = 0;
|
2009-04-20 16:16:43 +00:00
|
|
|
/*
|
|
|
|
* XXX Note to self, bit_ffs operates at the byte level
|
|
|
|
* and thus adds gratuitous overhead
|
|
|
|
*/
|
|
|
|
bit_ffs(tmpmask, ft->ft_size, &curbit);
|
|
|
|
while (curbit != -1) {
|
2009-04-19 04:24:56 +00:00
|
|
|
if (curbit >= ft->ft_size || curbit < -1) {
|
|
|
|
log(LOG_ALERT,
|
|
|
|
"warning: bad curbit value %d\n",
|
2009-04-19 00:16:04 +00:00
|
|
|
curbit);
|
2009-04-19 04:24:56 +00:00
|
|
|
break;
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
FLOWSTAT_INC(ft, ft_free_checks);
|
2014-02-13 04:59:18 +00:00
|
|
|
|
|
|
|
critical_enter();
|
|
|
|
flist = flowtable_list(ft, curbit);
|
2009-04-19 04:24:56 +00:00
|
|
|
#ifdef DIAGNOSTIC
|
2014-02-13 04:59:18 +00:00
|
|
|
if (SLIST_EMPTY(flist) && curbit > 0) {
|
2009-04-19 04:24:56 +00:00
|
|
|
log(LOG_ALERT,
|
|
|
|
"warning bit=%d set, but no fle found\n",
|
|
|
|
curbit);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
2014-02-08 22:10:53 +00:00
|
|
|
#endif
|
2014-02-13 04:59:18 +00:00
|
|
|
SLIST_FOREACH_SAFE(fle, flist, f_next, fle1) {
|
|
|
|
if (rt != NULL && fle->f_rt != rt) {
|
2009-04-19 00:16:04 +00:00
|
|
|
fleprev = fle;
|
|
|
|
continue;
|
|
|
|
}
|
2014-02-13 04:59:18 +00:00
|
|
|
if (!flow_stale(ft, fle)) {
|
|
|
|
fleprev = fle;
|
|
|
|
continue;
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
if (fle == SLIST_FIRST(flist))
|
|
|
|
SLIST_REMOVE_HEAD(flist, f_next);
|
|
|
|
else
|
|
|
|
SLIST_REMOVE_AFTER(fleprev, f_next);
|
|
|
|
SLIST_INSERT_HEAD(&freelist, fle, f_next);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
2014-02-13 04:59:18 +00:00
|
|
|
if (SLIST_EMPTY(flist))
|
2009-04-19 00:16:04 +00:00
|
|
|
bit_clear(mask, curbit);
|
2014-02-13 04:59:18 +00:00
|
|
|
critical_exit();
|
|
|
|
|
2009-04-20 16:16:43 +00:00
|
|
|
bit_clear(tmpmask, curbit);
|
2013-10-15 21:28:51 +00:00
|
|
|
tmpmask += (curbit / 8);
|
|
|
|
tmpsize -= (curbit / 8) * 8;
|
|
|
|
bit_ffs(tmpmask, tmpsize, &curbit);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
2014-02-13 04:59:18 +00:00
|
|
|
|
|
|
|
SLIST_FOREACH_SAFE(fle, &freelist, f_next, fle1) {
|
2014-02-07 15:18:23 +00:00
|
|
|
FLOWSTAT_INC(ft, ft_frees);
|
2014-02-13 04:59:18 +00:00
|
|
|
if (fle->f_rt != NULL)
|
|
|
|
RTFREE(fle->f_rt);
|
|
|
|
if (fle->f_lle != NULL)
|
|
|
|
LLE_FREE(fle->f_lle);
|
|
|
|
uma_zfree(flow_zone, fle);
|
|
|
|
}
|
|
|
|
}
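The stale-entry sweep in flowtable_free_stale() unlinks entries from a
bucket and pushes them onto a private free list before releasing them.
The following is a minimal user-space sketch of that idea, not the kernel
code: it swaps the SLIST macros and the fleprev bookkeeping for a plain
pointer-to-pointer walk, and struct entry, its stale flag, and
collect_stale() are hypothetical names introduced only for illustration.

```c
#include <stddef.h>

/* Hypothetical flow entry; only the fields this sketch needs. */
struct entry {
	struct entry *next;
	int stale;
};

/*
 * Move every stale entry from *head onto *freelist and return how many
 * were moved.  'link' always points at the pointer that references the
 * current entry, so unlinking needs no separate "previous" variable.
 */
static int
collect_stale(struct entry **head, struct entry **freelist)
{
	struct entry **link = head;
	struct entry *e;
	int moved = 0;

	while ((e = *link) != NULL) {
		if (!e->stale) {
			link = &e->next;	/* keep it and advance */
			continue;
		}
		*link = e->next;		/* unlink from the bucket */
		e->next = *freelist;		/* push onto the free list */
		*freelist = e;
		moved++;
	}
	return (moved);
}
```

Because the walk never stores a pointer to the previous element, a run of
consecutive stale entries cannot leave a dangling "previous" reference,
which is the kind of mistake a hand-rolled list removal is prone to.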
|
|
|
|
|
|
|
|
static void
|
|
|
|
flowtable_clean_vnet(struct flowtable *ft, struct rtentry *rt)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
CPU_FOREACH(i) {
|
|
|
|
if (smp_started == 1) {
|
|
|
|
thread_lock(curthread);
|
|
|
|
sched_bind(curthread, i);
|
|
|
|
thread_unlock(curthread);
|
|
|
|
}
|
|
|
|
|
|
|
|
flowtable_free_stale(ft, rt);
|
|
|
|
|
|
|
|
if (smp_started == 1) {
|
|
|
|
thread_lock(curthread);
|
|
|
|
sched_unbind(curthread);
|
|
|
|
thread_unlock(curthread);
|
|
|
|
}
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2009-10-01 20:32:29 +00:00
|
|
|
void
|
2014-02-07 15:18:23 +00:00
|
|
|
flowtable_route_flush(sa_family_t sa, struct rtentry *rt)
|
2009-10-01 20:32:29 +00:00
|
|
|
{
|
2014-02-07 15:18:23 +00:00
|
|
|
struct flowtable *ft;
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
switch (sa) {
|
|
|
|
#ifdef INET
|
|
|
|
case AF_INET:
|
|
|
|
ft = &V_ip4_ft;
|
|
|
|
break;
|
|
|
|
#endif
|
|
|
|
#ifdef INET6
|
|
|
|
case AF_INET6:
|
|
|
|
ft = &V_ip6_ft;
|
|
|
|
break;
|
|
|
|
#endif
|
|
|
|
default:
|
|
|
|
panic("%s: sa %d", __func__, sa);
|
|
|
|
}
|
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
flowtable_clean_vnet(ft, rt);
|
2009-06-22 21:19:24 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
flowtable_cleaner(void)
|
|
|
|
{
|
|
|
|
VNET_ITERATOR_DECL(vnet_iter);
|
2011-01-06 22:17:07 +00:00
|
|
|
struct thread *td;
|
2009-06-22 21:19:24 +00:00
|
|
|
|
2009-04-19 00:16:04 +00:00
|
|
|
if (bootverbose)
|
|
|
|
log(LOG_INFO, "flowtable cleaner started\n");
|
2011-01-06 22:17:07 +00:00
|
|
|
td = curthread;
|
2009-04-19 00:16:04 +00:00
|
|
|
while (1) {
|
2009-06-22 21:19:24 +00:00
|
|
|
VNET_LIST_RLOCK();
|
|
|
|
VNET_FOREACH(vnet_iter) {
|
|
|
|
CURVNET_SET(vnet_iter);
|
2014-02-07 15:18:23 +00:00
|
|
|
#ifdef INET
|
2014-02-13 04:59:18 +00:00
|
|
|
flowtable_clean_vnet(&V_ip4_ft, NULL);
|
2014-02-07 15:18:23 +00:00
|
|
|
#endif
|
|
|
|
#ifdef INET6
|
2014-02-13 04:59:18 +00:00
|
|
|
flowtable_clean_vnet(&V_ip6_ft, NULL);
|
2014-02-07 15:18:23 +00:00
|
|
|
#endif
|
2009-06-22 21:19:24 +00:00
|
|
|
CURVNET_RESTORE();
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
2009-06-22 21:19:24 +00:00
|
|
|
VNET_LIST_RUNLOCK();
|
|
|
|
|
2009-04-19 00:16:04 +00:00
|
|
|
/*
|
2009-08-19 20:13:09 +00:00
|
|
|
 * The 20 second interval between cleaning checks
|
2009-04-19 00:16:04 +00:00
|
|
|
* is arbitrary
|
|
|
|
*/
|
2009-08-18 20:28:58 +00:00
|
|
|
mtx_lock(&flowclean_lock);
|
2011-01-06 22:17:07 +00:00
|
|
|
thread_lock(td);
|
|
|
|
sched_prio(td, PPAUSE);
|
|
|
|
thread_unlock(td);
|
2010-12-31 21:06:52 +00:00
|
|
|
flowclean_cycles++;
|
|
|
|
cv_broadcast(&flowclean_f_cv);
|
|
|
|
cv_timedwait(&flowclean_c_cv, &flowclean_lock, flowclean_freq);
|
2009-08-18 20:28:58 +00:00
|
|
|
mtx_unlock(&flowclean_lock);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
flowtable_flush(void *unused __unused)
|
|
|
|
{
|
|
|
|
uint64_t start;
|
2010-03-12 05:03:26 +00:00
|
|
|
|
2009-08-18 20:28:58 +00:00
|
|
|
mtx_lock(&flowclean_lock);
|
|
|
|
start = flowclean_cycles;
|
|
|
|
while (start == flowclean_cycles) {
|
2010-12-31 21:06:52 +00:00
|
|
|
cv_broadcast(&flowclean_c_cv);
|
|
|
|
cv_wait(&flowclean_f_cv, &flowclean_lock);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
2009-08-18 20:28:58 +00:00
|
|
|
mtx_unlock(&flowclean_lock);
|
2009-04-19 00:16:04 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static struct kproc_desc flow_kp = {
|
|
|
|
"flowcleaner",
|
|
|
|
flowtable_cleaner,
|
|
|
|
&flowcleanerproc
|
|
|
|
};
|
|
|
|
SYSINIT(flowcleaner, SI_SUB_KTHREAD_IDLE, SI_ORDER_ANY, kproc_start, &flow_kp);
|
2009-06-09 21:55:28 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
static int
|
|
|
|
flowtable_get_size(char *name)
|
2009-08-18 20:28:58 +00:00
|
|
|
{
|
2014-02-07 15:18:23 +00:00
|
|
|
int size;
|
|
|
|
|
|
|
|
if (TUNABLE_INT_FETCH(name, &size)) {
|
|
|
|
if (size < 256)
|
|
|
|
size = 256;
|
|
|
|
if (!powerof2(size)) {
|
|
|
|
printf("%s must be power of 2\n", name);
|
|
|
|
size = 2048;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
/*
|
|
|
|
* round up to the next power of 2
|
|
|
|
*/
|
|
|
|
size = 1 << fls((1024 + maxusers * 64) - 1);
|
|
|
|
}
|
2009-08-18 20:28:58 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
return (size);
|
2009-08-18 20:28:58 +00:00
|
|
|
}
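The sizing rules in flowtable_get_size() can be tried outside the kernel.
This is a rough sketch under stated assumptions: sketch_fls() is a
hypothetical stand-in for the kernel's fls() (1-based index of the highest
set bit, 0 for no bits), and default_table_size() and tuned_table_size()
are illustrative names for the two branches, with the tunable lookup left
out.

```c
/*
 * Hypothetical stand-in for fls(): 1-based position of the most
 * significant set bit, or 0 when no bit is set.
 */
static int
sketch_fls(unsigned int v)
{
	int pos = 0;

	while (v != 0) {
		pos++;
		v >>= 1;
	}
	return (pos);
}

/* Default branch: round 1024 + maxusers * 64 up to a power of 2. */
static int
default_table_size(int maxusers)
{
	return (1 << sketch_fls((unsigned int)(1024 + maxusers * 64) - 1));
}

/* Tunable branch: clamp to at least 256, fall back to 2048 otherwise. */
static int
tuned_table_size(int requested)
{
	if (requested < 256)
		requested = 256;
	if ((requested & (requested - 1)) != 0)	/* not a power of 2 */
		requested = 2048;
	return (requested);
}
```

Subtracting 1 before taking fls() keeps a value that is already a power
of 2 unchanged, so a base of exactly 2048 stays 2048 instead of doubling.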
|
|
|
|
|
|
|
|
static void
|
|
|
|
flowtable_init(const void *unused __unused)
|
|
|
|
{
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
flow_hashjitter = arc4random();
|
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
flow_zone = uma_zcreate("flows", sizeof(struct flentry),
|
2014-02-07 15:18:23 +00:00
|
|
|
NULL, NULL, NULL, NULL, UMA_ALIGN_CACHE, UMA_ZONE_MAXBUCKET);
|
2014-02-13 04:59:18 +00:00
|
|
|
uma_zone_set_max(flow_zone, 1024 + maxusers * 64 * mp_ncpus);
|
2014-02-07 15:18:23 +00:00
|
|
|
|
2010-12-31 21:06:52 +00:00
|
|
|
cv_init(&flowclean_c_cv, "c_flowcleanwait");
|
|
|
|
cv_init(&flowclean_f_cv, "f_flowcleanwait");
|
2009-08-18 20:28:58 +00:00
|
|
|
mtx_init(&flowclean_lock, "flowclean lock", NULL, MTX_DEF);
|
|
|
|
EVENTHANDLER_REGISTER(ifnet_departure_event, flowtable_flush, NULL,
|
|
|
|
EVENTHANDLER_PRI_ANY);
|
2010-03-22 23:04:12 +00:00
|
|
|
flowclean_freq = 20*hz;
|
2009-08-18 20:28:58 +00:00
|
|
|
}
|
2014-02-07 15:18:23 +00:00
|
|
|
SYSINIT(flowtable_init, SI_SUB_PROTO_BEGIN, SI_ORDER_FIRST,
|
2009-08-18 20:28:58 +00:00
|
|
|
flowtable_init, NULL);
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
#ifdef INET
|
|
|
|
static SYSCTL_NODE(_net_flowtable, OID_AUTO, ip4, CTLFLAG_RD, NULL,
|
|
|
|
"Flowtable for IPv4");
|
|
|
|
|
|
|
|
static VNET_PCPUSTAT_DEFINE(struct flowtable_stat, ip4_ftstat);
|
|
|
|
VNET_PCPUSTAT_SYSINIT(ip4_ftstat);
|
|
|
|
VNET_PCPUSTAT_SYSUNINIT(ip4_ftstat);
|
|
|
|
SYSCTL_VNET_PCPUSTAT(_net_flowtable_ip4, OID_AUTO, stat, struct flowtable_stat,
|
|
|
|
ip4_ftstat, "Flowtable statistics for IPv4 "
|
|
|
|
"(struct flowtable_stat, net/flowtable.h)");
|
2009-08-18 20:28:58 +00:00
|
|
|
|
|
|
|
static void
|
2014-02-07 15:18:23 +00:00
|
|
|
flowtable_init_vnet_v4(const void *unused __unused)
|
2009-08-18 20:28:58 +00:00
|
|
|
{
|
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
V_ip4_ft.ft_size = flowtable_get_size("net.flowtable.ip4.size");
|
|
|
|
V_ip4_ft.ft_stat = VNET(ip4_ftstat);
|
|
|
|
flowtable_alloc(&V_ip4_ft);
|
2009-08-18 20:28:58 +00:00
|
|
|
}
|
2014-02-07 15:18:23 +00:00
|
|
|
VNET_SYSINIT(ft_vnet_v4, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY,
|
|
|
|
flowtable_init_vnet_v4, NULL);
|
|
|
|
#endif /* INET */
|
2009-08-18 20:28:58 +00:00
|
|
|
|
2014-02-07 15:18:23 +00:00
|
|
|
#ifdef INET6
|
|
|
|
static SYSCTL_NODE(_net_flowtable, OID_AUTO, ip6, CTLFLAG_RD, NULL,
|
|
|
|
"Flowtable for IPv6");
|
|
|
|
|
|
|
|
static VNET_PCPUSTAT_DEFINE(struct flowtable_stat, ip6_ftstat);
|
|
|
|
VNET_PCPUSTAT_SYSINIT(ip6_ftstat);
|
|
|
|
VNET_PCPUSTAT_SYSUNINIT(ip6_ftstat);
|
|
|
|
SYSCTL_VNET_PCPUSTAT(_net_flowtable_ip6, OID_AUTO, stat, struct flowtable_stat,
|
|
|
|
ip6_ftstat, "Flowtable statistics for IPv6 "
|
|
|
|
"(struct flowtable_stat, net/flowtable.h)");
|
|
|
|
|
|
|
|
static void
|
|
|
|
flowtable_init_vnet_v6(const void *unused __unused)
|
|
|
|
{
|
|
|
|
|
|
|
|
V_ip6_ft.ft_size = flowtable_get_size("net.flowtable.ip6.size");
|
|
|
|
V_ip6_ft.ft_stat = VNET(ip6_ftstat);
|
|
|
|
flowtable_alloc(&V_ip6_ft);
|
|
|
|
}
|
|
|
|
VNET_SYSINIT(flowtable_init_vnet_v6, SI_SUB_PROTO_IFATTACHDOMAIN, SI_ORDER_ANY,
|
|
|
|
flowtable_init_vnet_v6, NULL);
|
|
|
|
#endif /* INET6 */
|
2009-08-18 20:28:58 +00:00
|
|
|
|
|
|
|
#ifdef DDB
|
|
|
|
static bitstr_t *
|
|
|
|
flowtable_mask_pcpu(struct flowtable *ft, int cpuid)
|
|
|
|
{
|
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
return (zpcpu_get_cpu(*ft->ft_masks, cpuid));
|
2009-08-18 20:28:58 +00:00
|
|
|
}
|
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
static struct flist *
|
|
|
|
flowtable_list_pcpu(struct flowtable *ft, uint32_t hash, int cpuid)
|
2009-08-18 20:28:58 +00:00
|
|
|
{
|
2014-02-08 22:10:53 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
return (zpcpu_get_cpu(&ft->ft_table[hash % ft->ft_size], cpuid));
|
2009-08-18 20:28:58 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
flow_show(struct flowtable *ft, struct flentry *fle)
|
|
|
|
{
|
|
|
|
int idle_time;
|
2010-03-12 05:03:26 +00:00
|
|
|
int rt_valid, ifp_valid;
|
|
|
|
uint16_t sport, dport;
|
|
|
|
uint32_t *hashkey;
|
|
|
|
char saddr[4*sizeof "123"], daddr[4*sizeof "123"];
|
|
|
|
volatile struct rtentry *rt;
|
|
|
|
struct ifnet *ifp = NULL;
|
2009-08-18 20:28:58 +00:00
|
|
|
|
|
|
|
idle_time = (int)(time_uptime - fle->f_uptime);
|
2010-03-12 05:03:26 +00:00
|
|
|
rt = fle->f_rt;
|
|
|
|
rt_valid = rt != NULL;
|
2014-02-07 10:05:12 +00:00
|
|
|
if (rt_valid)
|
2010-03-12 05:03:26 +00:00
|
|
|
ifp = rt->rt_ifp;
|
|
|
|
ifp_valid = ifp != NULL;
|
2014-02-13 04:59:18 +00:00
|
|
|
hashkey = (uint32_t *)&fle->f_flow;
|
2010-03-12 05:03:26 +00:00
|
|
|
if (fle->f_flags & FL_IPV6)
|
|
|
|
goto skipaddr;
|
|
|
|
|
|
|
|
inet_ntoa_r(*(struct in_addr *) &hashkey[2], daddr);
|
|
|
|
if (ft->ft_flags & FL_HASH_ALL) {
|
2014-02-08 22:10:53 +00:00
|
|
|
inet_ntoa_r(*(struct in_addr *) &hashkey[1], saddr);
|
2010-03-12 05:03:26 +00:00
|
|
|
sport = ntohs(((uint16_t *)hashkey)[0]);
|
|
|
|
dport = ntohs(((uint16_t *)hashkey)[1]);
|
|
|
|
db_printf("%s:%d->%s:%d",
|
|
|
|
saddr, sport, daddr,
|
|
|
|
dport);
|
2014-02-07 10:05:12 +00:00
|
|
|
} else
|
2010-03-12 05:03:26 +00:00
|
|
|
db_printf("%s ", daddr);
|
2014-02-07 10:05:12 +00:00
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
skipaddr:
|
2009-08-18 20:28:58 +00:00
|
|
|
if (fle->f_flags & FL_STALE)
|
|
|
|
db_printf(" FL_STALE ");
|
2010-03-12 05:03:26 +00:00
|
|
|
if (fle->f_flags & FL_TCP)
|
|
|
|
db_printf(" FL_TCP ");
|
|
|
|
if (fle->f_flags & FL_UDP)
|
|
|
|
db_printf(" FL_UDP ");
|
|
|
|
if (rt_valid) {
|
|
|
|
if (rt->rt_flags & RTF_UP)
|
|
|
|
db_printf(" RTF_UP ");
|
|
|
|
}
|
|
|
|
if (ifp_valid) {
|
|
|
|
if (ifp->if_flags & IFF_LOOPBACK)
|
|
|
|
db_printf(" IFF_LOOPBACK ");
|
|
|
|
if (ifp->if_flags & IFF_UP)
|
2014-02-08 22:10:53 +00:00
|
|
|
db_printf(" IFF_UP ");
|
2010-03-12 05:03:26 +00:00
|
|
|
if (ifp->if_flags & IFF_POINTOPOINT)
|
2014-02-08 22:10:53 +00:00
|
|
|
db_printf(" IFF_POINTOPOINT ");
|
2010-03-12 05:03:26 +00:00
|
|
|
}
|
|
|
|
if (fle->f_flags & FL_IPV6)
|
|
|
|
db_printf("\n\tkey=%08x:%08x:%08x%08x:%08x:%08x%08x:%08x:%08x",
|
|
|
|
hashkey[0], hashkey[1], hashkey[2],
|
|
|
|
hashkey[3], hashkey[4], hashkey[5],
|
|
|
|
hashkey[6], hashkey[7], hashkey[8]);
|
|
|
|
else
|
|
|
|
db_printf("\n\tkey=%08x:%08x:%08x ",
|
|
|
|
hashkey[0], hashkey[1], hashkey[2]);
|
|
|
|
db_printf("hash=%08x idle_time=%03d"
|
|
|
|
"\n\tfibnum=%02d rt=%p",
|
|
|
|
fle->f_fhash, idle_time, fle->f_fibnum, fle->f_rt);
|
2009-08-18 20:28:58 +00:00
|
|
|
db_printf("\n");
|
|
|
|
}
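/*
 * Illustration only, not part of the original source: a minimal userland
 * sketch of how flow_show() above decodes an IPv4 key when every field is
 * hashed. The layout is an assumption inferred from that function (ports
 * packed in word 0, source address in word 1, destination address in
 * word 2, all in network byte order). sketch_format_key() is a
 * hypothetical name, and POSIX inet_ntop() replaces the kernel's
 * inet_ntoa_r().
 */

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a three-word IPv4 flow key as "saddr:sport->daddr:dport". */
static void
sketch_format_key(const uint32_t key[3], char *buf, size_t len)
{
	uint16_t ports[2];
	char saddr[INET_ADDRSTRLEN], daddr[INET_ADDRSTRLEN];

	/* Copy the ports out instead of casting, to avoid aliasing issues. */
	memcpy(ports, &key[0], sizeof(ports));
	inet_ntop(AF_INET, &key[1], saddr, sizeof(saddr));
	inet_ntop(AF_INET, &key[2], daddr, sizeof(daddr));
	snprintf(buf, len, "%s:%d->%s:%d",
	    saddr, ntohs(ports[0]), daddr, ntohs(ports[1]));
}
```

/*
 * A caller would fill the three words in network byte order (for example
 * with htons() and inet_pton()) before passing them to the formatter.
 */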
|
|
|
|
|
|
|
|
static void
|
|
|
|
flowtable_show(struct flowtable *ft, int cpuid)
|
|
|
|
{
|
|
|
|
int curbit = 0;
|
|
|
|
bitstr_t *mask, *tmpmask;
|
|
|
|
|
2010-03-12 05:03:26 +00:00
|
|
|
if (cpuid != -1)
|
|
|
|
db_printf("cpu: %d\n", cpuid);
|
2009-08-18 20:28:58 +00:00
|
|
|
mask = flowtable_mask_pcpu(ft, cpuid);
|
|
|
|
tmpmask = ft->ft_tmpmask;
|
|
|
|
memcpy(tmpmask, mask, ft->ft_size/8);
|
|
|
|
/*
|
|
|
|
* XXX Note to self, bit_ffs operates at the byte level
|
|
|
|
* and thus adds gratuitous overhead
|
|
|
|
*/
|
|
|
|
bit_ffs(tmpmask, ft->ft_size, &curbit);
|
|
|
|
while (curbit != -1) {
|
2014-02-13 04:59:18 +00:00
|
|
|
struct flist *flist;
|
|
|
|
struct flentry *fle;
|
|
|
|
|
2009-08-18 20:28:58 +00:00
|
|
|
if (curbit >= ft->ft_size || curbit < -1) {
|
|
|
|
db_printf("warning: bad curbit value %d\n",
|
|
|
|
curbit);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
flist = flowtable_list_pcpu(ft, curbit, cpuid);
|
2009-08-18 20:28:58 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
SLIST_FOREACH(fle, flist, f_next)
|
2009-08-18 20:28:58 +00:00
|
|
|
flow_show(ft, fle);
|
|
|
|
bit_clear(tmpmask, curbit);
|
|
|
|
bit_ffs(tmpmask, ft->ft_size, &curbit);
|
|
|
|
}
|
|
|
|
}
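/*
 * Illustration only, not part of the original source: a minimal userland
 * sketch of the walk performed by flowtable_show() above. It copies the
 * mask, finds the first set bit, handles it, clears it, and repeats until
 * no bit remains. The sketch_ helpers are hypothetical stand-ins that are
 * assumed to mirror bit_ffs(), bit_set() and bit_clear() from
 * bitstring(3); the fixed 64-bit capacity is also an assumption.
 */

```c
#include <assert.h>
#include <string.h>

#define	SKETCH_NBITS	64	/* capacity of the scratch mask, in bits */

/* Store the index of the first set bit in *value, or -1 if none is set. */
static void
sketch_bit_ffs(const unsigned char *bits, int nbits, int *value)
{
	int i;

	*value = -1;
	for (i = 0; i < nbits; i++) {
		if (bits[i / 8] & (1u << (i % 8))) {
			*value = i;
			return;
		}
	}
}

static void
sketch_bit_set(unsigned char *bits, int bit)
{

	bits[bit / 8] |= (unsigned char)(1u << (bit % 8));
}

static void
sketch_bit_clear(unsigned char *bits, int bit)
{

	bits[bit / 8] &= (unsigned char)~(1u << (bit % 8));
}

/*
 * Record every set bit of mask in ascending order into out[] and return
 * the count. The caller's mask is left untouched because the walk clears
 * bits in a private copy. nbits must be a multiple of 8 and must not
 * exceed SKETCH_NBITS; otherwise -1 is returned.
 */
static int
sketch_walk_mask(const unsigned char *mask, int nbits, int *out)
{
	unsigned char tmp[SKETCH_NBITS / 8];
	int curbit, n = 0;

	if (nbits < 0 || nbits > SKETCH_NBITS || nbits % 8 != 0)
		return (-1);
	memcpy(tmp, mask, (size_t)nbits / 8);
	sketch_bit_ffs(tmp, nbits, &curbit);
	while (curbit != -1) {
		out[n++] = curbit;
		sketch_bit_clear(tmp, curbit);
		sketch_bit_ffs(tmp, nbits, &curbit);
	}
	return (n);
}
```

/*
 * Each pass rescans from bit 0, which matches the overhead that the XXX
 * comment in flowtable_show() points out; resuming the scan after the bit
 * just cleared would avoid it.
 */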
|
|
|
|
|
|
|
|
static void
|
2014-02-07 15:18:23 +00:00
|
|
|
flowtable_show_vnet(struct flowtable *ft)
|
2009-08-18 20:28:58 +00:00
|
|
|
{
|
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
int i;
|
2014-02-07 15:18:23 +00:00
|
|
|
|
2014-02-13 04:59:18 +00:00
|
|
|
CPU_FOREACH(i)
|
|
|
|
flowtable_show(ft, i);
|
2009-08-18 20:28:58 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
DB_SHOW_COMMAND(flowtables, db_show_flowtables)
|
|
|
|
{
|
|
|
|
VNET_ITERATOR_DECL(vnet_iter);
|
|
|
|
|
|
|
|
VNET_FOREACH(vnet_iter) {
|
|
|
|
CURVNET_SET(vnet_iter);
|
2010-12-31 21:20:32 +00:00
|
|
|
#ifdef VIMAGE
|
|
|
|
db_printf("vnet %p\n", vnet_iter);
|
|
|
|
#endif
|
2014-02-07 15:18:23 +00:00
|
|
|
#ifdef INET
|
|
|
|
db_printf("IPv4:\n");
|
|
|
|
flowtable_show_vnet(&V_ip4_ft);
|
|
|
|
#endif
|
|
|
|
#ifdef INET6
|
|
|
|
db_printf("IPv6:\n");
|
|
|
|
flowtable_show_vnet(&V_ip6_ft);
|
|
|
|
#endif
|
2009-08-18 20:28:58 +00:00
|
|
|
CURVNET_RESTORE();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif
|