Commit Graph

490 Commits

Author SHA1 Message Date
Gleb Smirnoff
46425317db Fix compilation for 32-bit machines. 2014-03-06 02:00:01 +00:00
Gleb Smirnoff
5274e55eb3 Hide struct rtentry from userland. 2014-03-05 01:47:08 +00:00
Gleb Smirnoff
e3a7aa6f56 - Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
  removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
  mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with:		melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-03-05 01:17:47 +00:00
Gleb Smirnoff
f0e49f6631 Whenever flowtable lookup fails, we do route lookup and then try to
insert flow entry. During the route lookup the critical section is
exited. It may happen, that after route lookup we will be executed
on an other CPU that already has such flowentry. Before this change
we simply freed the flowentry and returned to ip_output() with
failure.

Actually there is nothing wrong with using previously allocated
flow entry, updating it properly. Thus, make flowentry_insert()
return the new either old fle, and make use of it.

Count reuses as "collisions" and real inserts as "inserts".

Reviewed by:	adrian
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-02-14 10:56:26 +00:00
Adrian Chadd
6d9223ac7e Reword.
Suggestion:	glebius
2014-02-14 07:43:39 +00:00
Adrian Chadd
0e778c88c9 Don't insert a flowtable entry if the lle isn't yet valid.
Some of the collisions that are occuring are due to flowtable lookups
that succeed but have an invalid lle - typically because the L2 adjacency
lookup hasn't completed.  This would lead to a follow-up insert which
would then fail (ie, collision) and the code would fall through to doing
a slow-path L2/L3 lookup in the netinet/netinet6 code.

This patch simply aborts storing a new flowtable entry if the lle isn't
yet valid.

Whilst I'm here, add a new pcpu counter for the item so the number of
failures can be tracked separately from generic "collisions."

Reviewed by:	glebius
MFC after:	10 days
Sponsored by:	Netflix, Inc.
2014-02-14 00:05:09 +00:00
Gleb Smirnoff
5d6d7e756b o Revamp API between flowtable and netinet, netinet6.
- ip_output() and ip_output6() simply call flowtable_lookup(),
    passing mbuf and address family. That's the only code under
    #ifdef FLOWTABLE in the protocols code now.
o Revamp statistics gathering and export.
  - Remove hand made pcpu stats, and utilize counter(9).
  - Snapshot of statistics is available via 'netstat -rs'.
  - All sysctls are moved into net.flowtable namespace, since
    spreading them over net.inet isn't correct.
o Properly separate at compile time INET and INET6 parts.
o General cleanup.
  - Remove chain of multiple flowtables. We simply have one for
    IPv4 and one for IPv6.
  - Flowtables are allocated in flowtable.c, symbols are static.
  - With proper argument to SYSINIT() we no longer need flowtable_ready.
  - Hash salt doesn't need to be per-VNET.
  - Removed rudimentary debugging, which use quite useless in dtrace era.

The runtime behavior of flowtable shouldn't be changed by this commit.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-02-07 15:18:23 +00:00
Bjoern A. Zeeb
a4993b252e Print the MD5 signature information introduced in r221023 in the
TCP statistics output.

MFC after:	3 weeks
2014-02-05 20:43:03 +00:00
Alexander V. Chernikov
120dc21b86 Bump dates in nestat(1) and route(8) man pages.
Fix several small errors introduced by r260524.

Suggested by:	glebius
MFC after:	2 weeks
2014-01-11 09:44:00 +00:00
Alexander V. Chernikov
17ed2e8ea8 Add -4/-6 shorthand for -finet/-finet6 in route(8) and netstat(8).
MFC after:	2 weeks
2014-01-10 23:08:18 +00:00
Alexander V. Chernikov
dbfdd46b70 Explicitly free rt_tables to please Coverity.
Reported by:	Coverity
Coverity CID:	1147174
MFC after:	2 weeks
2013-12-31 12:11:48 +00:00
Gleb Smirnoff
526b18aa8b Claim copyright since I've almost rewritten this file in r256512. 2013-12-29 19:31:49 +00:00
Alexander V. Chernikov
8e1dc13857 Further split kvm(3) and sysctl interfaces for route table printing.
MFC after:	4 weeks
Sponsored by:	Yandex LLC
2013-12-20 12:08:36 +00:00
Alexander V. Chernikov
fc47e028bb Use more fine-grained kvm(3) symbol lookup: routing code retrieves only
necessary symbols needed per subsystem. Main kvm(3) init is now delayed
as much as possbile. This finally fixes performance issues reported in
kern/167204.
Some non-working code (ng_socket.ko symbol addresses calculation) removed.
Some global variables eliminated.

PR:		kern/167204
MFC after:	4 weeks
2013-12-20 00:17:26 +00:00
Alexander V. Chernikov
11188df260 Restore corefiles handling via kvm(3).
Found by:	John-Mark Gurney <jmg at funkthat.com>
MFC after:	4 weeks
2013-12-18 20:04:04 +00:00
Alexander V. Chernikov
c49b4b8055 Switch netstat -rn to use standard API for retrieving list of routes
instead of peeking inside in-kernel radix via kget.
This permits us to change kernel structures without breaking userland.
Additionally, this change provide more reliable and faster output.

`Refs` and `Use` fields available in IPv4 by default (and via -W
for other families) were removed. `Refs` is radix-specific thing
which is not informative for users. `Use` field value is handy sometimes,
but a) current API does not support it and b) I'm not sure we will
support per-rte pcpu counters in near future.

Old method of retrieving data is still supported (either by defining
NewTree=0 or running netstat with -A). However, Refs/Use fields are
hidden.

Sponsored by:	Yandex LLC
MFC after:	4 weeks
PR:		kern/167204
2013-12-18 18:25:27 +00:00
Gleb Smirnoff
82b90729fb 'netstat -i' no longer supports working on a vmcore. 2013-10-30 08:13:42 +00:00
Gleb Smirnoff
3e4d5cd37b Make userland tools honor WITHOUT_PF build option.
Tested by:	dt71@gmx.com
2013-10-29 17:38:13 +00:00
Gleb Smirnoff
84c1edcbad Rewrite netstat/if.c to use getifaddrs(3) and getifmaddrs(3) instead of
libkvm digging in kernel memory. This is possible since r231506 made
getifaddrs(3) to supply if_data for each ifaddr.

  The pros of this change is that now netstat(1) doesn't know about kernel
struct ifnet and struct ifaddr. And these structs are about to change
significantly in head soon. New netstat binary will work well with 10.0
and any future kernel.

  The cons is that now it isn't possible to obtain interface statistics
from a vmcore.

  Functions intpr() and sidewaysintpr() were rewritten from scratch.

  The output of netstat(1) has underwent the following changes:

1) The MTU is not printed for protocol addresses, since it has no notion.
   Dash is printed instead. If there would be a strong desire to return
   previous output, it is doable.
2) Output interface queue drops are not printed. Currently this data isn't
   available to userland via any API. We plan to drop 'struct ifqueue' from
   'struct ifnet' very soon, so old kvm(3) access to queue drops is soon
   to be broken, too. The plan is that drivers would handle their queues
   theirselves and a new field in if_data would be updated in case of drops.
3) In-kernel reference count for multicast addresses isn't printed. I doubt
   that anyone used it. Anyway, netstat(1) is sysadmin tool, not kernel
   debugger.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 09:55:07 +00:00
Gleb Smirnoff
d35acb5099 Remove obtained, but never used data.
Found by:	gcc
2013-10-15 09:21:05 +00:00
Hiroki Sato
84dde578a9 - Use getnameinfo(3) instead of gethostbyaddr(3) or inet_ntop(3).
- Fill sin6_scope_id from in6p.sin6_addr.s6_addr[2].  struct inpcb has
  struct in6_addr for the endpoint addresses, so sin6_scope_id must be filled.
2013-08-17 17:23:42 +00:00
Andrey V. Elsukov
6794f46021 Remove the large part of struct ipsecstat. Only few fields of this
structure is used, but they already have equal fields in the struct
newipsecstat, that was introduced with FAST_IPSEC and then was merged
together with old ipsecstat structure.

This fixes kernel stack overflow on some architectures after migration
ipsecstat to PCPU counters.

Reported by:	Taku YAMAMOTO, Maciej Milewski
2013-07-23 14:14:24 +00:00
Gleb Smirnoff
c780f37850 Sweep unused nlist entries.
Sponsored by:	Nginx, Inc.
2013-07-16 12:22:36 +00:00
Andrey V. Elsukov
05d1f5bce0 Introduce new structure sfstat for collecting sendfile's statistics
and remove corresponding fields from struct mbstat. Use PCPU counters
and SFSTAT_INC() macro for update these statistics.

Discussed with:	glebius
2013-07-15 06:16:57 +00:00
Hiroki Sato
3fddef95af Add -F fibnum option to specify an FIB number for -r flag. 2013-07-12 17:11:30 +00:00
Andrey V. Elsukov
db8c087944 Migrate structs ahstat, espstat, ipcompstat, ipipstat, pfkeystat,
ipsec4stat, ipsec6stat to PCPU counters.
2013-07-09 10:08:13 +00:00
Andrey V. Elsukov
69edf037d7 Migrate struct carpstats to PCPU counters. 2013-07-09 10:02:51 +00:00
Andrey V. Elsukov
a786f67981 Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters. 2013-07-09 09:54:54 +00:00
Andrey V. Elsukov
5b7cb97c2b Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.
2013-07-09 09:50:15 +00:00
Andrey V. Elsukov
5da0521fce Use new macros to implement ipstat and tcpstat using PCPU counters.
Change interface of kread_counters() similar ot kread() in the netstat(1).
2013-07-09 09:43:03 +00:00
Andrey V. Elsukov
c80211e3cf Prepare network statistics structures for migration to PCPU counters.
Use uint64_t as type for all fields of structures.

Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat,
in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat,
pfkeystat, pim6stat, pimstat, rip6stat, udpstat.

Discussed with:	arch@
2013-07-09 09:32:06 +00:00
Andrey V. Elsukov
b543f3b167 Replace hardcoded numbers. Also use interface-local scope name instead
of node-local.
2013-04-16 11:25:45 +00:00
Gleb Smirnoff
29dde48df4 Use kvm_counter_u64_fetch() to fix obtaining ipstat and tcpstat from
kernel core files.

Sponsored by:	Nginx, Inc.
2013-04-10 20:29:23 +00:00
Gleb Smirnoff
5923c29332 Merge from projects/counters: TCP/IP stats.
Convert 'struct ipstat' and 'struct tcpstat' to counter(9).

  This speeds up IP forwarding at extreme packet rates, and
makes accounting more precise.

Sponsored by:	Nginx, Inc.
2013-04-08 19:57:21 +00:00
Alexander V. Chernikov
7e4f8ab64a Add forgotten .El
MFC with:	r248112
2013-03-09 20:04:47 +00:00
Alexander V. Chernikov
69cf5b29fd Document netstat -Q flags meaning.
MFC after:	1 week
2013-03-09 20:01:35 +00:00
Philippe Charnier
321ae07f36 WARNS=6 compliance 2013-02-19 13:17:16 +00:00
Gleb Smirnoff
f89bcdb03b Use pluralies() for "entry"/"entries". 2013-01-22 18:07:59 +00:00
Hiroki Sato
6bbfef9004 Fill sin6_scope_id in sockaddr_in6 before passing it from the kernel to
userland via routing socket or sysctl.  This eliminates the following
KAME-specific sin6_scope_id handling routine from each userland utility:

 sin6.sin6_scope_id = ntohs(*(u_int16_t *)&sin6.sin6_addr.s6_addr[2]);

This behavior can be controlled by net.inet6.ip6.deembed_scopeid.  This is
set to 1 by default (sin6_scope_id will be filled in the kernel).

Reviewed by:	bz
2012-11-17 20:19:00 +00:00
Alfred Perlstein
504a74fa90 Show the number of times we block waiting for mbufs.
Machines can stall out because mbufs are low, however sometimes we won't
see "requests denied", instead we see user land processes or kernel threads
blocking waiting for mbufs because they set M_WAIT.  These consumers do not
see errors, only stalling.

Unfortunately until now, netstat did not export this information
so you could have experienced an mbuf shortage and have no way of
seeing it unless you happen to run netstat at the exact time of the
shortage and see "in use" = "max".

By exporting the number of times processes are blocked, we can
effectively see how often non-interrupt context threads are effectively
"denied".

MFC after: 2 weeks
2012-10-25 02:12:05 +00:00
Eitan Adler
398de06dcb Remove unused variable. Newer versions of gcc care.
Submitted by:	Sascha Wildner <saw@online.de>
Approved by:	cperciva
MFC after:	3 days
2012-10-22 02:59:59 +00:00
Gleb Smirnoff
d6d3f01e0a Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

 o Fine grained locking, thus much better performance.
 o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

  Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by:	Florian Smeets <flo freebsd.org>
Tested by:	Chekaluk Vitaly <artemrts ukr.net>
Tested by:	Ben Wilber <ben desync.com>
Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-09-08 06:41:54 +00:00
Michael Tuexen
3dcc856b6c Allow netstat to be build if INET is not defined in the kernel.
Thanks to Garrett Cooper for reporting the issue.

MFC after: 3 days
X-MFC: 238501
2012-07-16 06:43:04 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
Xin LI
96c3073f76 Eliminate an unused parameter of static method igmp_stats_live_old().
MFC after:	1 month
2012-04-13 22:35:53 +00:00
Gleb Smirnoff
16410af7fa With pf 4.5 import the name of pfsync stats sysctl has changed, thus
'netstat -sp pfsync' got broken. Fix this.
2012-04-04 08:30:32 +00:00
Dimitry Andric
6aa83769fd After r232745, which makes sure __bswap16(), ntohs() and htons() return
__uint16_t, we can partially undo r228668.

Note the remark "Work around a clang false positive with format string
warnings and ntohs macros (see LLVM PR 11313)" was actually incorrect.

Before r232745, on some arches, the ntohs() macros did in fact return
int, not uint16_t, so clang was right in warning about the %hu format
string.

MFC after:	2 weeks
2012-03-09 20:50:15 +00:00
Dimitry Andric
07b202a847 Define several extra macros in bsd.sys.mk and sys/conf/kern.pre.mk, to
get rid of testing explicitly for clang (using ${CC:T:Mclang}) in
individual Makefiles.

Instead, use the following extra macros, for use with clang:
- NO_WERROR.clang       (disables -Werror)
- NO_WCAST_ALIGN.clang  (disables -Wcast-align)
- NO_WFORMAT.clang	(disables -Wformat and friends)
- CLANG_NO_IAS		(disables integrated assembler)
- CLANG_OPT_SMALL	(adds flags for extra small size optimizations)

As a side effect, this enables setting CC/CXX/CPP in src.conf instead of
make.conf!  For clang, use the following:

CC=clang
CXX=clang++
CPP=clang-cpp

MFC after:	2 weeks
2012-02-28 18:30:18 +00:00
Bjoern A. Zeeb
4fd5619bb1 Teach netstat -r (display contents of routing tables) about multi-FIB for
IPv6 in addition to IPv4.
While here harmonize naming of variables a bit with what we use in kernel.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 15:26:55 +00:00
Michael Tuexen
23a0e422aa Don't print a warning when using netstat to print
SCTP statistics when there is not SCTP in the kernel.
This problem was reported by Sean Mahood.

MFC after: 1 week.
2012-01-25 21:49:48 +00:00