Commit Graph

620 Commits

Author SHA1 Message Date
Andrey V. Elsukov
ccc53de916 Add the reverse part to rule #9. Also change its description in the
netstat(1) output.

MFC after:	1 week
2014-09-01 09:30:34 +00:00
Mark Johnston
d77e67e495 Suppress warnings when retrieving protocol stats from interfaces that
don't support IPv6 (e.g. pflog(4)).

Reviewed by:	hrs
MFC after:	2 weeks
2014-08-22 19:23:38 +00:00
Simon J. Gerraty
ee7b0571c2 Merge head from 7/28 2014-08-19 06:50:54 +00:00
Joel Dahl
275b78396e Minor mdoc nit. 2014-06-06 08:42:03 +00:00
Allan Jude
997a303f17 Sadly, we do not actually live in the future.
Approved by:	wblock (mentor)
2014-06-04 16:55:38 +00:00
Allan Jude
fd2c6bc9e1 Further updates to the netstat(1) man page and usage message
- Reformat the entire man page
- Create a proper synopsis section
- Use itemized-lists to describe each flag, rather than paragraphs
- Cross-reference common flags to a 'general flags' sub-section with short
inline description of the flag
- Label 'general flags' sub-section
- Apply additional fixes suggested by wblock, brueffer, and bdrewery
- Update .Dd that got undone previously
- Change the order of the .Op Fl to be alphabetical
- Add the -i | -I interface flags to the description of 'interface
display mode'
- Fix missing parameters in man page
- Fix missing parameters in usage()
- Sync man page and usage()

MFC Note: stable/9 and stable/10 do not have -R, will need to be removed
when merged

CR:		D58
Reviewed by:	brueffer, bcr
Approved by:	wblock (mentor)
MFC after:	7 days
Sponsored by:	ScaleEngine Inc.
2014-06-04 04:18:33 +00:00
Allan Jude
ada0a6dd14 Add path markup on sys/mbuf.h to previous netstat(1) man page update
Submitted by:	brueffer
Reviewed by:	eadler (mentor)
2014-05-25 08:09:55 +00:00
Allan Jude
e9fa95e9e9 Document the new -R flag of netstat(1) introduced in r266448 that tracks the
flowid for each socket.

Reviewed by:	adrian
Approved by:	eadler (mentor)
2014-05-25 07:41:12 +00:00
Hiroki Sato
c4f55e08be - Fix a bug which can make sysctl() fail when -F is specified.
- Increase WID_IF_DEFAULT() from 6 to 8 (the default for AF_INET6) because
  we have interfaces with longer names than 6 chars like epairN{a,b}.
- Style fixes.
2014-05-21 10:04:51 +00:00
Adrian Chadd
85b0f0f325 Add -R to netstat to dump RSS/flow information.
This is intended to help in diagnostics and debugging of NIC and stack
flowid support.

Eventually this will grow another column (RSS CPU ID) but
that currently isn't cached in the inpcb.

There's also no clean flowtype -> flowtype identifier string.  This is
the mbuf M_HASHTYPE_* values for RSS.

Here's some example output:

adrian@adrian-hackbox:~/work/freebsd/head/src % netstat -Rn | more
Active Internet connections
Proto Recv-Q Send-Q Local Address          Foreign Address           flowid ftype
tcp4       0      0 10.11.1.65.22          10.11.1.64.12409        29041942     2
udp4       0      0 127.0.0.1.123          *.*                     00000000     0
udp6       0      0 fe80::1%lo0.123        *.*                     00000000     0
udp6       0      0 ::1.123                *.*                     00000000     0
udp4       0      0 10.11.1.65.123         *.*                     00000000     0

Tested:

* amd64 system w/ igb NIC; local driver changes to expose RSS flowid in if_igb.
2014-05-19 17:11:43 +00:00
Simon J. Gerraty
fae50821ae Updated dependencies 2014-05-16 14:09:51 +00:00
Hiroki Sato
0e798e1faa - Do not override sin6_scope_id in LLA when it is already set to non-zero.
This fixes destination list in output of netstat -r.
- Plug a memory leak.
- Add RTM_VERSION check.
- Minor style fixes.
2014-05-15 19:26:20 +00:00
Simon J. Gerraty
76b28ad6ab Updated dependencies 2014-05-10 05:16:28 +00:00
Simon J. Gerraty
cc3f4b9965 Merge from head 2014-05-08 23:54:15 +00:00
Warner Losh
c6063d0da8 Use src.opts.mk in preference to bsd.own.mk except where we need stuff
from the latter.
2014-05-06 04:22:01 +00:00
Gleb Smirnoff
c669105d17 - Remove the net.inet.tcp.reass.overflows sysctl. It counts exactly
  the same events that tcpstat's tcps_rcvmemdrop counter counts.
- Rename tcps_rcvmemdrop to tcps_rcvreassfull and improve its
  description in netstat(1) output.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-05-06 00:00:07 +00:00
Alexander V. Chernikov
68bbdd0e71 Fix "netstat -gW" behavior broken in r259638.
netstat has two options for printing multicast tables:
sysctl (the default one for live systems) and kvm-based one (for cores).
It looks like the kvm-based one has not worked since it was introduced
in r190012, due to the absence of the mfctablesize kernel symbol.
Check for all ipv4-multicast symbols being correctly resolved was introduced
in r259638 regardless of 'live' value leading to "No IPv4 MROUTING" error
message.

Reported by:	Olivier Cochard-Labbé
MFC after:	1 week
2014-04-29 16:51:28 +00:00
Simon J. Gerraty
3b8f084595 Merge head 2014-04-28 07:50:45 +00:00
Gleb Smirnoff
55fb7d688b Now, after r263102 we have ifi_oqdrops in if_data, restore printing of
output queue drops in netstat(1).

Neither the drivers nor the kernel fill this field in if_data yet.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-19 03:33:32 +00:00
Gleb Smirnoff
66dcee729c Garbage collect long time obsoleted (or never used) stuff from routing API. 2014-03-15 06:49:32 +00:00
Gleb Smirnoff
45c203fce2 Remove AppleTalk support.
AppleTalk was a network transport protocol for Apple Macintosh devices
in the 1980s and 1990s. Starting with Mac OS X in 2000, AppleTalk became
a legacy protocol and the primary networking protocol was TCP/IP. The last
Mac OS X release to support AppleTalk was released in 2009. The same year,
routing equipment vendors (namely Cisco) ended their support.

Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 06:29:43 +00:00
Gleb Smirnoff
2c284d9395 Remove IPX support.
IPX was a network transport protocol in Novell's NetWare network operating
system from the late 1980s through the 1990s. NetWare itself switched to
TCP/IP as its default transport in 1998. Later, in this century, Novell Open
Enterprise Server became the successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.

Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.
2014-03-14 02:58:48 +00:00
Gleb Smirnoff
b245f96c44 Since a 32-bit if_baudrate isn't enough to describe the baud rate of a
10 Gbit interface, a crutch was provided in r241616. It didn't work well, and
finally we decided that it is time to break the ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data a fixed, machine-independent size. The
  data it holds (packet counters, etc.) is by no means MD. And it is a
  bug that on amd64 we have 64-bit counters, while on i386 they are 32-bit,
  which at modern speeds overflow within a second.

  This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bits to the ifi_datalen field. This field was provided to
  make future changes to if_data less ABI breaking. Unfortunately, its
  8-bit size had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with:	emax
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-03-13 03:42:24 +00:00
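The width arguments in commit b245f96c44 can be checked with a little arithmetic. The following is only a sketch: the helper names `fits_in_u32` and `seconds_to_wrap32` are hypothetical and do not come from the FreeBSD tree. It shows that a 10 Gbit/s rate does not fit in 32 bits, and estimates how quickly a 32-bit byte counter wraps at a given line rate.

```c
#include <stdint.h>

/* Hypothetical helper: does a rate in bit/s fit in a 32-bit field? */
static int
fits_in_u32(uint64_t bits_per_sec)
{
	return (bits_per_sec <= UINT32_MAX);
}

/*
 * Hypothetical helper: seconds until a 32-bit byte counter wraps when
 * traffic flows continuously at the given line rate.
 */
static double
seconds_to_wrap32(uint64_t bits_per_sec)
{
	double bytes_per_sec = (double)bits_per_sec / 8.0;

	return (4294967296.0 / bytes_per_sec);	/* 2^32 bytes */
}
```

Under these assumptions a 32-bit byte counter wraps after roughly 3.4 seconds at 10 Gbit/s and in well under a second at 40 Gbit/s and above, while a 64-bit counter is not a practical concern at such rates.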
Gleb Smirnoff
46425317db Fix compilation for 32-bit machines. 2014-03-06 02:00:01 +00:00
Gleb Smirnoff
5274e55eb3 Hide struct rtentry from userland. 2014-03-05 01:47:08 +00:00
Gleb Smirnoff
e3a7aa6f56 - Remove rt_metrics_lite and simply put its members into rtentry.
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
  removes another cache thrashing ++ from the packet forwarding path.
- Create zinit/fini methods for the rtentry UMA zone, and initialize the
  mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.

The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.

Discussed with:		melifaro
Sponsored by:		Netflix
Sponsored by:		Nginx, Inc.
2014-03-05 01:17:47 +00:00
Gleb Smirnoff
f0e49f6631 Whenever flowtable lookup fails, we do route lookup and then try to
insert flow entry. During the route lookup the critical section is
exited. It may happen, that after route lookup we will be executed
on another CPU that already has such a flowentry. Before this change
we simply freed the flowentry and returned to ip_output() with
failure.

Actually, there is nothing wrong with using the previously allocated
flow entry and updating it properly. Thus, make flowentry_insert()
return either the new or the old fle, and make use of it.

Count reuses as "collisions" and real inserts as "inserts".

Reviewed by:	adrian
Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-02-14 10:56:26 +00:00
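The insert-or-reuse behavior described in commit f0e49f6631 can be sketched in isolation. This is not the kernel's flowentry_insert(): the types `struct fle` and `struct table`, the function `table_insert`, and the counter fields are hypothetical simplifications, and locking, per-CPU state, and the update of a reused entry are all left out.

```c
#include <stddef.h>
#include <stdint.h>

#define	NBUCKETS	8

struct fle {			/* hypothetical, simplified flow entry */
	uint32_t	 key;
	struct fle	*next;
};

struct table {			/* hypothetical, simplified flow table */
	struct fle	*bucket[NBUCKETS];
	unsigned long	 inserts;	/* real inserts */
	unsigned long	 collisions;	/* reuses of an existing entry */
};

/*
 * Insert fle unless an entry with the same key is already present.
 * Returns the entry the caller should use: the new one or the old one.
 */
static struct fle *
table_insert(struct table *t, struct fle *fle)
{
	struct fle **head = &t->bucket[fle->key % NBUCKETS];
	struct fle *cur;

	for (cur = *head; cur != NULL; cur = cur->next) {
		if (cur->key == fle->key) {
			t->collisions++;
			return (cur);	/* reuse instead of failing */
		}
	}
	fle->next = *head;
	*head = fle;
	t->inserts++;
	return (fle);
}
```

A caller compares the returned pointer with the one it passed in: if they differ, its own entry was not linked into the table and can be released, and the existing entry is used instead.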
Adrian Chadd
6d9223ac7e Reword.
Suggestion:	glebius
2014-02-14 07:43:39 +00:00
Adrian Chadd
0e778c88c9 Don't insert a flowtable entry if the lle isn't yet valid.
Some of the collisions that are occurring are due to flowtable lookups
that succeed but have an invalid lle - typically because the L2 adjacency
lookup hasn't completed.  This would lead to a follow-up insert which
would then fail (ie, collision) and the code would fall through to doing
a slow-path L2/L3 lookup in the netinet/netinet6 code.

This patch simply aborts storing a new flowtable entry if the lle isn't
yet valid.

Whilst I'm here, add a new pcpu counter for the item so the number of
failures can be tracked separately from generic "collisions."

Reviewed by:	glebius
MFC after:	10 days
Sponsored by:	Netflix, Inc.
2014-02-14 00:05:09 +00:00
Gleb Smirnoff
5d6d7e756b o Revamp API between flowtable and netinet, netinet6.
  - ip_output() and ip6_output() simply call flowtable_lookup(),
    passing mbuf and address family. That's the only code under
    #ifdef FLOWTABLE in the protocols code now.
o Revamp statistics gathering and export.
  - Remove hand made pcpu stats, and utilize counter(9).
  - Snapshot of statistics is available via 'netstat -rs'.
  - All sysctls are moved into net.flowtable namespace, since
    spreading them over net.inet isn't correct.
o Properly separate at compile time INET and INET6 parts.
o General cleanup.
  - Remove chain of multiple flowtables. We simply have one for
    IPv4 and one for IPv6.
  - Flowtables are allocated in flowtable.c, symbols are static.
  - With proper argument to SYSINIT() we no longer need flowtable_ready.
  - Hash salt doesn't need to be per-VNET.
  - Removed rudimentary debugging, which is quite useless in the DTrace era.

The runtime behavior of flowtable shouldn't be changed by this commit.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2014-02-07 15:18:23 +00:00
Bjoern A. Zeeb
a4993b252e Print the MD5 signature information introduced in r221023 in the
TCP statistics output.

MFC after:	3 weeks
2014-02-05 20:43:03 +00:00
Alexander V. Chernikov
120dc21b86 Bump dates in netstat(1) and route(8) man pages.
Fix several small errors introduced by r260524.

Suggested by:	glebius
MFC after:	2 weeks
2014-01-11 09:44:00 +00:00
Alexander V. Chernikov
17ed2e8ea8 Add -4/-6 shorthand for -finet/-finet6 in route(8) and netstat(1).
MFC after:	2 weeks
2014-01-10 23:08:18 +00:00
Alexander V. Chernikov
dbfdd46b70 Explicitly free rt_tables to please Coverity.
Reported by:	Coverity
Coverity CID:	1147174
MFC after:	2 weeks
2013-12-31 12:11:48 +00:00
Gleb Smirnoff
526b18aa8b Claim copyright since I've almost rewritten this file in r256512. 2013-12-29 19:31:49 +00:00
Alexander V. Chernikov
8e1dc13857 Further split kvm(3) and sysctl interfaces for route table printing.
MFC after:	4 weeks
Sponsored by:	Yandex LLC
2013-12-20 12:08:36 +00:00
Alexander V. Chernikov
fc47e028bb Use more fine-grained kvm(3) symbol lookup: routing code retrieves only
necessary symbols needed per subsystem. Main kvm(3) init is now delayed
as much as possible. This finally fixes performance issues reported in
kern/167204.
Some non-working code (ng_socket.ko symbol addresses calculation) removed.
Some global variables eliminated.

PR:		kern/167204
MFC after:	4 weeks
2013-12-20 00:17:26 +00:00
Alexander V. Chernikov
11188df260 Restore corefiles handling via kvm(3).
Found by:	John-Mark Gurney <jmg at funkthat.com>
MFC after:	4 weeks
2013-12-18 20:04:04 +00:00
Alexander V. Chernikov
c49b4b8055 Switch netstat -rn to use standard API for retrieving list of routes
instead of peeking inside in-kernel radix via kget.
This permits us to change kernel structures without breaking userland.
Additionally, this change provides more reliable and faster output.

`Refs` and `Use` fields available in IPv4 by default (and via -W
for other families) were removed. `Refs` is radix-specific thing
which is not informative for users. `Use` field value is handy sometimes,
but a) current API does not support it and b) I'm not sure we will
support per-rte pcpu counters in near future.

Old method of retrieving data is still supported (either by defining
NewTree=0 or running netstat with -A). However, Refs/Use fields are
hidden.

Sponsored by:	Yandex LLC
MFC after:	4 weeks
PR:		kern/167204
2013-12-18 18:25:27 +00:00
Gleb Smirnoff
82b90729fb 'netstat -i' no longer supports working on a vmcore. 2013-10-30 08:13:42 +00:00
Gleb Smirnoff
3e4d5cd37b Make userland tools honor WITHOUT_PF build option.
Tested by:	dt71@gmx.com
2013-10-29 17:38:13 +00:00
Gleb Smirnoff
84c1edcbad Rewrite netstat/if.c to use getifaddrs(3) and getifmaddrs(3) instead of
libkvm digging in kernel memory. This is possible since r231506 made
getifaddrs(3) to supply if_data for each ifaddr.

  The advantage of this change is that netstat(1) no longer knows about the
kernel's struct ifnet and struct ifaddr. And these structs are about to change
significantly in head soon. The new netstat binary will work well with 10.0
and any future kernel.

  The drawback is that it is no longer possible to obtain interface statistics
from a vmcore.

  Functions intpr() and sidewaysintpr() were rewritten from scratch.

  The output of netstat(1) has undergone the following changes:

1) The MTU is not printed for protocol addresses, since it has no meaning
   there. A dash is printed instead. If there is a strong desire to restore
   the previous output, it is doable.
2) Output interface queue drops are not printed. Currently this data isn't
   available to userland via any API. We plan to drop 'struct ifqueue' from
   'struct ifnet' very soon, so old kvm(3) access to queue drops is soon
   to be broken, too. The plan is that drivers would handle their queues
   themselves and a new field in if_data would be updated in case of drops.
3) In-kernel reference count for multicast addresses isn't printed. I doubt
   that anyone used it. Anyway, netstat(1) is sysadmin tool, not kernel
   debugger.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2013-10-15 09:55:07 +00:00
Gleb Smirnoff
d35acb5099 Remove obtained, but never used data.
Found by:	gcc
2013-10-15 09:21:05 +00:00
Simon J. Gerraty
d1d0158641 Merge from head 2013-09-05 20:18:59 +00:00
Hiroki Sato
84dde578a9 - Use getnameinfo(3) instead of gethostbyaddr(3) or inet_ntop(3).
- Fill sin6_scope_id from in6p.sin6_addr.s6_addr[2].  struct inpcb has
  struct in6_addr for the endpoint addresses, so sin6_scope_id must be filled.
2013-08-17 17:23:42 +00:00
Andrey V. Elsukov
6794f46021 Remove the large part of struct ipsecstat. Only a few fields of this
structure are used, and they already have equivalent fields in struct
newipsecstat, which was introduced with FAST_IPSEC and then merged
together with the old ipsecstat structure.

This fixes a kernel stack overflow on some architectures after the migration
of ipsecstat to PCPU counters.

Reported by:	Taku YAMAMOTO, Maciej Milewski
2013-07-23 14:14:24 +00:00
Gleb Smirnoff
c780f37850 Sweep unused nlist entries.
Sponsored by:	Nginx, Inc.
2013-07-16 12:22:36 +00:00
Andrey V. Elsukov
05d1f5bce0 Introduce new structure sfstat for collecting sendfile's statistics
and remove corresponding fields from struct mbstat. Use PCPU counters
and the SFSTAT_INC() macro to update these statistics.

Discussed with:	glebius
2013-07-15 06:16:57 +00:00
Hiroki Sato
3fddef95af Add -F fibnum option to specify an FIB number for -r flag. 2013-07-12 17:11:30 +00:00
Andrey V. Elsukov
db8c087944 Migrate structs ahstat, espstat, ipcompstat, ipipstat, pfkeystat,
ipsec4stat, ipsec6stat to PCPU counters.
2013-07-09 10:08:13 +00:00
Andrey V. Elsukov
69edf037d7 Migrate struct carpstats to PCPU counters. 2013-07-09 10:02:51 +00:00
Andrey V. Elsukov
a786f67981 Migrate structs ip6stat, icmp6stat and rip6stat to PCPU counters. 2013-07-09 09:54:54 +00:00
Andrey V. Elsukov
5b7cb97c2b Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.
2013-07-09 09:50:15 +00:00
Andrey V. Elsukov
5da0521fce Use new macros to implement ipstat and tcpstat using PCPU counters.
Change the interface of kread_counters() to be similar to kread() in netstat(1).
2013-07-09 09:43:03 +00:00
Andrey V. Elsukov
c80211e3cf Prepare network statistics structures for migration to PCPU counters.
Use uint64_t as type for all fields of structures.

Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat,
in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat,
pfkeystat, pim6stat, pimstat, rip6stat, udpstat.

Discussed with:	arch@
2013-07-09 09:32:06 +00:00
Andrey V. Elsukov
b543f3b167 Replace hardcoded numbers. Also use interface-local scope name instead
of node-local.
2013-04-16 11:25:45 +00:00
Simon J. Gerraty
69e6d7b75e sync from head 2013-04-12 20:48:55 +00:00
Gleb Smirnoff
29dde48df4 Use kvm_counter_u64_fetch() to fix obtaining ipstat and tcpstat from
kernel core files.

Sponsored by:	Nginx, Inc.
2013-04-10 20:29:23 +00:00
Gleb Smirnoff
5923c29332 Merge from projects/counters: TCP/IP stats.
Convert 'struct ipstat' and 'struct tcpstat' to counter(9).

  This speeds up IP forwarding at extreme packet rates, and
makes accounting more precise.

Sponsored by:	Nginx, Inc.
2013-04-08 19:57:21 +00:00
Simon J. Gerraty
7cf3a1c6b2 Updated dependencies 2013-03-11 17:21:52 +00:00
Alexander V. Chernikov
7e4f8ab64a Add forgotten .El
MFC with:	r248112
2013-03-09 20:04:47 +00:00
Alexander V. Chernikov
69cf5b29fd Document netstat -Q flags meaning.
MFC after:	1 week
2013-03-09 20:01:35 +00:00
Philippe Charnier
321ae07f36 WARNS=6 compliance 2013-02-19 13:17:16 +00:00
Simon J. Gerraty
f5f7c05209 Updated dependencies 2013-02-16 01:23:54 +00:00
David E. O'Brien
d9a447559b Sync with HEAD. 2013-02-08 16:10:16 +00:00
Gleb Smirnoff
f89bcdb03b Use pluralies() for "entry"/"entries". 2013-01-22 18:07:59 +00:00
Hiroki Sato
6bbfef9004 Fill sin6_scope_id in sockaddr_in6 before passing it from the kernel to
userland via routing socket or sysctl.  This eliminates the following
KAME-specific sin6_scope_id handling routine from each userland utility:

 sin6.sin6_scope_id = ntohs(*(u_int16_t *)&sin6.sin6_addr.s6_addr[2]);

This behavior can be controlled by net.inet6.ip6.deembed_scopeid.  This is
set to 1 by default (sin6_scope_id will be filled in the kernel).

Reviewed by:	bz
2012-11-17 20:19:00 +00:00
Simon J. Gerraty
23090366f7 Sync from head 2012-11-04 02:52:03 +00:00
Alfred Perlstein
504a74fa90 Show the number of times we block waiting for mbufs.
Machines can stall out because mbufs are low, however sometimes we won't
see "requests denied", instead we see user land processes or kernel threads
blocking waiting for mbufs because they set M_WAIT.  These consumers do not
see errors, only stalling.

Unfortunately until now, netstat did not export this information
so you could have experienced an mbuf shortage and have no way of
seeing it unless you happen to run netstat at the exact time of the
shortage and see "in use" = "max".

By exporting the number of times processes are blocked, we can
effectively see how often non-interrupt context threads are effectively
"denied".

MFC after: 2 weeks
2012-10-25 02:12:05 +00:00
Eitan Adler
398de06dcb Remove unused variable. Newer versions of gcc care.
Submitted by:	Sascha Wildner <saw@online.de>
Approved by:	cperciva
MFC after:	3 days
2012-10-22 02:59:59 +00:00
Gleb Smirnoff
d6d3f01e0a Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

 o Fine grained locking, thus much better performance.
 o Fixes to many problems in pf, that were specific to FreeBSD port.

The new code has far fewer ifdefs and OpenBSDisms, and thus
is more attractive to our developers.

  Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by:	Florian Smeets <flo freebsd.org>
Tested by:	Chekaluk Vitaly <artemrts ukr.net>
Tested by:	Ben Wilber <ben desync.com>
Tested by:	Ian FREISLICH <ianf cloudseed.co.za>
2012-09-08 06:41:54 +00:00
Marcel Moolenaar
7750ad47a9 Sync FreeBSD's bmake branch with Juniper's internal bmake branch.
Requested by: Simon Gerraty <sjg@juniper.net>
2012-08-22 19:25:57 +00:00
Michael Tuexen
3dcc856b6c Allow netstat to be built if INET is not defined in the kernel.
Thanks to Garrett Cooper for reporting the issue.

MFC after: 3 days
X-MFC: 238501
2012-07-16 06:43:04 +00:00
Navdeep Parhar
09fe63205c - Updated TOE support in the kernel.
- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
  These are available as t3_tom and t4_tom modules that augment cxgb(4)
  and cxgbe(4) respectively.  The cxgb/cxgbe drivers continue to work as
  usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs).  T4 iWARP in the
  works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload?  Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded?  Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by:	bz, gnn
Sponsored by:	Chelsio communications.
MFC after:	~3 months (after 9.1, and after ensuring MFC is feasible)
2012-06-19 07:34:13 +00:00
Xin LI
96c3073f76 Eliminate an unused parameter of static method igmp_stats_live_old().
MFC after:	1 month
2012-04-13 22:35:53 +00:00
Gleb Smirnoff
16410af7fa With pf 4.5 import the name of pfsync stats sysctl has changed, thus
'netstat -sp pfsync' got broken. Fix this.
2012-04-04 08:30:32 +00:00
Dimitry Andric
6aa83769fd After r232745, which makes sure __bswap16(), ntohs() and htons() return
__uint16_t, we can partially undo r228668.

Note the remark "Work around a clang false positive with format string
warnings and ntohs macros (see LLVM PR 11313)" was actually incorrect.

Before r232745, on some arches, the ntohs() macros did in fact return
int, not uint16_t, so clang was right in warning about the %hu format
string.

MFC after:	2 weeks
2012-03-09 20:50:15 +00:00
Dimitry Andric
07b202a847 Define several extra macros in bsd.sys.mk and sys/conf/kern.pre.mk, to
get rid of testing explicitly for clang (using ${CC:T:Mclang}) in
individual Makefiles.

Instead, use the following extra macros, for use with clang:
- NO_WERROR.clang       (disables -Werror)
- NO_WCAST_ALIGN.clang  (disables -Wcast-align)
- NO_WFORMAT.clang	(disables -Wformat and friends)
- CLANG_NO_IAS		(disables integrated assembler)
- CLANG_OPT_SMALL	(adds flags for extra small size optimizations)

As a side effect, this enables setting CC/CXX/CPP in src.conf instead of
make.conf!  For clang, use the following:

CC=clang
CXX=clang++
CPP=clang-cpp

MFC after:	2 weeks
2012-02-28 18:30:18 +00:00
Bjoern A. Zeeb
4fd5619bb1 Teach netstat -r (display contents of routing tables) about multi-FIB for
IPv6 in addition to IPv4.
While here harmonize naming of variables a bit with what we use in kernel.

Sponsored by:	Cisco Systems, Inc.
2012-02-03 15:26:55 +00:00
Michael Tuexen
23a0e422aa Don't print a warning when using netstat to print
SCTP statistics when there is no SCTP in the kernel.
This problem was reported by Sean Mahood.

MFC after: 1 week.
2012-01-25 21:49:48 +00:00
Gleb Smirnoff
4b2b8a370c In ng_socket(4) expose fewer kernel internals to userland. This commit
breaks the ABI, but makes ABI breakage in the future less likely.
2012-01-23 15:39:45 +00:00
Eitan Adler
b4f7ea1936 Fix warning when compiling with gcc46:
error: variable 'ifnetfound' set but not used

Approved by:	dim
MFC after:      3 days
2012-01-10 02:58:36 +00:00
Ed Schouten
b3608ae18f Replace index() and rindex() calls with strchr() and strrchr().
The index() and rindex() functions were marked LEGACY in the 2001
revision of POSIX and were subsequently removed from the 2008 revision.
The strchr() and strrchr() functions are part of the C standard.

This makes the source code a lot more consistent, as most of these C
files also call into other str*() routines. In fact, about a dozen
already perform strchr() calls.
2012-01-03 18:51:58 +00:00
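The replacement made in commit b3608ae18f is mechanical, since strchr() and strrchr() take the same arguments as index() and rindex(). A minimal sketch follows; `last_component` is a hypothetical helper written for illustration, not code from the commit.

```c
#include <string.h>

/* Hypothetical helper: return the part of path after the last '/'. */
static const char *
last_component(const char *path)
{
	/* Formerly this lookup would have been rindex(path, '/'). */
	const char *slash = strrchr(path, '/');

	return (slash != NULL ? slash + 1 : path);
}
```

Because strrchr() is declared in <string.h> by the C standard, the helper needs no <strings.h> include.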
Ulrich Spörlein
487ac9ac21 Spelling fixes for usr.bin/ 2011-12-30 11:02:40 +00:00
Maxim Konovalov
d96ea877a7 o Convert IPv6 read-only stats sysctls to the read-write ones.
o Teach netstat(1) -z to reset these stats sysctls.

PR:		bin/153206
Reviewed by:	glebius
Sponsored by:	NGINX, Inc.
MFC after:	1 month
2011-12-19 05:50:34 +00:00
Dimitry Andric
d88ccef562 Revert r228650, and work around the clang false positive with printf
formats in usr.bin/netstat/atalk.c by conditionally adding NO_WFORMAT to
the Makefile instead.

MFC after:	1 week
2011-12-17 22:32:00 +00:00
Dimitry Andric
288fcda320 In usr.bin/netstat/atalk.c, work around a clang false positive with
printf format warnings and conditional operators.

MFC after:	1 week
2011-12-17 17:21:47 +00:00
Michael Tuexen
2e34c19b93 Fix the following bugs related to the SCTP support of netstat:
* Correctly handle -a.
* -A isn't supported.
* Show all closed 1-to-1 and 1-to-many style sockets.
* Show all listening 1-to-many style sockets.
* Use consistent formatting for -W.

PR: 150642
Approved by: re@
MFC after: 4 weeks.
2011-07-22 16:42:12 +00:00
Michael Tuexen
62372898d6 Truncate link addresses like it is done for any
other address type.

MFC after: 4 weeks
2011-07-12 11:47:08 +00:00
Robert Watson
8f092df025 Teach netstat(1) about the new global netisr policy sysctl,
net.isr.dispatch, and about per-protocol dispatch policies.

MFC after:	3 weeks
Reviewed by:	bz
Sponsored by:	Juniper Networks, Inc.
2011-05-24 12:38:00 +00:00
Ruslan Ermilov
afc7401525 Fixed sockets display somewhat (-L, -T, -x, -Lx, with and without -A).
(I didn't try to fix negative TCP timers with -x.)

MFC after:	3 days
2011-03-26 19:09:28 +00:00
Jeff Roberson
aa0a1e58f0 - Merge in OFED 1.5.3 from projects/ofed/head 2011-03-21 09:58:24 +00:00
Rebecca Cran
e17c4d4eb8 Fix typo. 2011-03-13 16:47:21 +00:00
Robert Watson
bc211d112d While printing out the WSID and CPU ID only the first time it appears for
each workstream, rather than on every protocol, is prettier, it makes
machine-parsing of netstat -Q output a lot harder.  Repeat the information
and hope that the user forgives us slightly dense formatting.

MFC after:	3 days
Reported by:	bz
Sponsored by:	Juniper Networks
2011-01-24 10:58:36 +00:00
Robert Watson
7ff67a2025 Fix off-by-one whitespace error in netstat -Q workstream listing.
Reported by:	bz
MFC after:	3 days
Sponsored by:	Juniper Networks
2011-01-24 10:54:09 +00:00
Hajimu UMEMOTO
cd05232a21 - Hide the internal scope address representation of the KAME IPv6
stack from the output of `netstat -ani'.
- The node-local multicast address in the output of `netstat -rn'
  should be handled as well.

Spotted by:	Bernd Walter <ticso__at__cicely7.cicely.de>
2011-01-20 15:22:01 +00:00
Joel Dahl
da52b4caaf Remove the advertising clause from UCB copyrighted files in usr.bin. This
is in accordance with the information provided at
ftp://ftp.cs.berkeley.edu/pub/4bsd/README.Impt.License.Change

Also add $FreeBSD$ to a few files to keep svn happy.

Discussed with:	imp, rwatson
2010-12-11 08:32:16 +00:00
Rebecca Cran
a709913fdd Fix typo. 2010-11-27 21:35:16 +00:00
George V. Neville-Neil
b205d03d78 Restore the (state) and \n printout when not using -T.
Pointed out by:	brucec@
MFC after:	3 weeks
2010-11-22 22:55:43 +00:00
Ryan Stone
3c578b2a05 When netstat was run with -i/-I and -w1 to produce running counters, the idrop
field printed an absolute value rather than the delta from the last value

Approved by:	emaste (mentor)
MFC after:	1 week
2010-11-18 23:46:55 +00:00
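The fix in commit 3c578b2a05 amounts to reporting the change in a cumulative counter between two passes instead of the counter itself. A minimal sketch under that assumption, with a hypothetical `struct drop_sample` and `idrops_delta` that are not taken from netstat's source:

```c
#include <stdint.h>

/* Hypothetical per-interface state kept between two interval samples. */
struct drop_sample {
	uint64_t last_idrops;	/* cumulative count seen on the previous pass */
};

/* Return the drops since the previous pass and remember the new total. */
static uint64_t
idrops_delta(struct drop_sample *s, uint64_t total)
{
	uint64_t delta = total - s->last_idrops;

	s->last_idrops = total;
	return (delta);
}
```

Each pass of an interval display would call the helper once per interface and print the returned delta.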
George V. Neville-Neil
f5d34df525 Add new, per connection, statistics for TCP, including:
Retransmitted Packets
Zero Window Advertisements
Out of Order Receives

These statistics are available via the -T argument to
netstat(1).
MFC after:	2 weeks
2010-11-17 18:55:12 +00:00
Dimitry Andric
93f8854c00 Remove superfluous cast in usr.bin/netstat/sctp.c.
Found by:	clang
Submitted by:	Norberto Lopes, nlopes dot ml at gmail dot com
Approved by:	rpaulo (mentor)
2010-10-08 20:40:05 +00:00
Ruslan Ermilov
699ef999ee Show hostcache statistics.
Submitted by:	Maxim Dounin
2010-10-05 05:15:27 +00:00
Ed Maste
5eb2aa4543 Remove more extraneous ;s. 2010-07-15 00:04:14 +00:00
Gleb Smirnoff
c1b90938b1 Fix the functionality of 'netstat -f netgraph', which has not worked
since the netgraph import in 1999.

netstat(1) used the pointer to a node as the node address, oops. That didn't
work; we need the node ID in brackets to successfully address a node.
We can't look into ng_node, because netgraph/netgraph.h cannot be included
in userland code. So let the node give a hint to userland by storing
the node ID in its private data.

MFC after:	2 weeks
2010-03-12 15:04:59 +00:00
Robert Watson
bd9e7af25e Prefer vocabulary of 'Current' and 'Limit' to 'Value' and 'Maximum' in
netstat -Q.

MFC after:	6 days
Sponsored by:	Juniper Networks
2010-03-01 12:11:37 +00:00
Robert Watson
88737be2dd Teach netstat -Q to work with -N and -M by adding libkvm versions of data
query routines.  This code is necessarily more fragile in the presence of
kernel changes than querying the kernel via sysctl (the default), but
useful when investigating crashes or live kernel state via firewire.

MFC after:	1 week
Sponsored by:	Juniper Networks
2010-03-01 00:46:45 +00:00
Robert Watson
f836657290 Update date on netstat(1) for -Q.
Suggested by:	bz
MFC after:	1 week
2010-02-22 16:05:45 +00:00
Robert Watson
0153eb6688 Teach netstat(1) to print out netisr statistics when given the -Q argument.
Currently supports only reporting on live systems via sysctl, kmem support
needs to be added.

MFC after:	1 week
Sponsored by:	Juniper Networks
2010-02-22 15:57:36 +00:00
Xin LI
bf10ffe1d3 Add a new option, -q howmany, which when used in conjunction with -w,
exits netstat after _howmany_ outputs.

Requested by:	thomasa
Reviewed by:	freebsd-net (bms, old version in early 2007)
MFC after:	1 month
2010-01-11 03:00:17 +00:00
Xin LI
821df508e8 Revert most part of 200420 as requested, as more review and polish is
needed.
2009-12-13 03:14:06 +00:00
Xin LI
6f2d322192 Remove unneeded header includes from usr.bin/ except contributed code.
Tested with:	make universe
2009-12-11 23:35:38 +00:00
John Baldwin
3ee7e0d48e Remove -t from the manpage and usage. 2009-12-01 15:18:25 +00:00
Bjoern A. Zeeb
aaae58c491 Unbreak user space after if_timer/if_watchdog removal in r199975.
Tested by:	glebius
2009-12-01 14:56:00 +00:00
Bjoern A. Zeeb
90b4c081d0 Add more statistics variables for IPcomp.
Try to version the struct in a backward compatible way.
People asked for the versioning of the stats structs in general before.

MFC after:	5 days
2009-11-29 20:37:30 +00:00
Attilio Rao
d72dc9a7eb Add the possibility to show information about dropped packets on the
input path when showing information about the interfaces.

Obtained from:	Sandvine Incorporated
Reviewed by:	emaste
Sponsored by:	Sandvine Incorporated
MFC:		2 weeks
2009-11-25 15:02:32 +00:00
Robert Watson
c8359dde47 Print routing statistics as unsigned short rather than unsigned int,
otherwise sign extension leads to unlikely values when in the negative
range of the signed short structure fields that hold the statistics.
The type used to hold routing statistics is arguably also incorrect.
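
The sign-extension effect described above can be reproduced outside the
kernel. The following is a minimal sketch in Python, not netstat's code: it
assumes a 16-bit signed field and a 32-bit unsigned int, and the helper names
are hypothetical.

```python
import struct

def as_unsigned_int(value: int) -> int:
    """Reinterpret a sign-extended 32-bit int as unsigned (like printing with %u)."""
    return struct.unpack("<I", struct.pack("<i", value))[0]

def as_unsigned_short(value: int) -> int:
    """Reinterpret a 16-bit signed value as unsigned (like printing with %hu)."""
    return struct.unpack("<H", struct.pack("<h", value))[0]

# A counter of 40000 stored in a signed 16-bit field reads back as negative.
stored = struct.unpack("<h", struct.pack("<H", 40000))[0]

print(stored)                     # -25536
print(as_unsigned_int(stored))    # 4294941760, the "unlikely value"
print(as_unsigned_short(stored))  # 40000, the intended count
```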

MFC after:	3 days
2009-10-15 10:31:24 +00:00
Robert Watson
963b7ccd3b netstat(1) support for UNIX SOCK_SEQPACKET sockets -- changes were required
only for the kvm case, as we supported SOCK_SEQPACKET via sysctl already.

Sponsored by:	Google
MFC after:	3 months
2009-10-05 15:06:14 +00:00
Mike Silbersack
aae0914135 In netstat -x, do not try to print out tcp timer status for udp sockets. 2009-09-23 05:32:33 +00:00
Mike Silbersack
b8614722ff Add the ability to see TCP timers via netstat -x. This can be a useful
feature when you have a seemingly stuck socket and want to figure
out why it has not been closed yet.

No plans to MFC this, as it changes the netstat sysctl ABI.

Reviewed by:	andre, rwatson, Eric Van Gyzen
2009-09-16 05:33:15 +00:00
George V. Neville-Neil
54fc657d59 Add ARP statistics to the kernel and netstat.
New counters now exist for:
requests sent
replies sent
requests received
replies received
packets received
total packets dropped due to no ARP entry
entries timed out
Duplicate IPs seen

The new statistics are seen in the netstat command
when it is given the -s command line switch.

MFC after:	2 weeks
In collaboration with: bz
2009-09-03 21:10:57 +00:00
Edward Tomasz Napierala
e50022797f Add manual page links to advertise procstat(1) a little better.
Approved by:	re (kib)
2009-07-09 16:40:00 +00:00
Christian S.J. Peron
0e37f3e196 Implement the -z (zero counters) option for the various bpf counters.
Add necessary changes to the kernel for this (basically introduce a
bpf_zero_counters() function).  As well, update the man page.

MFC after:	1 month
Discussed with:	rwatson
2009-06-19 20:31:44 +00:00
Bjoern A. Zeeb
c2c2a7c11e Convert the two dimensional array to be malloced and introduce
an accessor function to get the correct rnh pointer back.

Update netstat to get the correct pointer using kvm_read()
as well.

This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.

Reviewed by:	julian, rwatson, zec
X-MFC:		not possible
2009-06-01 15:49:42 +00:00
Bruce M Simpson
7d9d64ba0d Add MLDv2 statistic IDs to netstat for IPv6 stack. 2009-04-29 09:52:04 +00:00
Bruce M Simpson
86979280fc Bracket struct mfc and struct rtdetq with #ifdef _KERNEL.
Match the bracketing in netstat.
Since the cleanup of MROUTING, ports have broken because they
expect to include <netinet/ip_mroute.h> without including
<sys/queue.h>. Fix breakage at source.

The real fix, of course, is to fix the MROUTING APIs by blowing them
away and replacing them with something else...
2009-04-21 12:47:09 +00:00
Bruce M Simpson
8392059492 Fix size_t merge-o. 2009-03-19 10:23:26 +00:00
Bruce M Simpson
443fc3176d Introduce a number of changes to the MROUTING code.
This is purely a forwarding plane cleanup; no control plane
code is involved.

Summary:
 * Split IPv4 and IPv6 MROUTING support. The static compile-time
   kernel option remains the same, however, the modules may now
   be built for IPv4 and IPv6 separately as ip_mroute_mod and
   ip6_mroute_mod.
 * Clean up the IPv4 multicast forwarding code to use BSD queue
   and hash table constructs. Don't build our own timer abstractions
   when ratecheck() and timevalclear() etc will do.
 * Expose the multicast forwarding cache (MFC) and virtual interface
   table (VIF) as sysctls, to reduce netstat's dependence on libkvm
   for this information for running kernels.
   * bandwidth meters however still require libkvm.
 * Make the MFC hash table size a boot/load-time tunable ULONG,
   net.inet.ip.mfchashsize (defaults to 256).
 * Remove unused members from struct vif and struct mfc.
 * Kill RSVP support, as no current RSVP implementation uses it.
   These stubs could be moved to raw_ip.c.
 * Don't share locks or initialization between IPv4 and IPv6.
 * Don't use a static struct route_in6 in ip6_mroute.c.
   The v6 code is still using a cached struct route_in6, this is
   moved to mif6 for the time being.
 * More cleanup remains to be merged from ip_mroute.c to ip6_mroute.c.

v4 path tested using ports/net/mcast-tools.
v6 changes are mostly mechanical locking and *have not* been tested.
As these changes partially break some kernel ABIs, they will not
be MFCed. There is a lot more work to be done here.

Reviewed by:	Pavlin Radoslavov
2009-03-19 01:43:03 +00:00
Robert Watson
ad71fe3c35 Correct a number of evolved problems with inp_vflag and inp_flags:
certain flags that should have been in inp_flags ended up in inp_vflag,
meaning that they were inconsistently locked, and in one case,
interpreted.  Move the following flags from inp_vflag to gaps in the
inp_flags space (and clean up the inp_flags constants to make gaps
more obvious to future takers):

  INP_TIMEWAIT
  INP_SOCKREF
  INP_ONESBCAST
  INP_DROPPED

Some aspects of this change have no effect on kernel ABI at all, as these
are UDP/TCP/IP-internal uses; however, netstat and sockstat detect
INP_TIMEWAIT when listing TCP sockets, so any MFC will need to take this
into account.

MFC after:      1 week (or after dependencies are MFC'd)
Reviewed by:    bz
2009-03-15 09:58:31 +00:00
Bruce M Simpson
d10910e6ce Merge IGMPv3 and Source-Specific Multicast (SSM) to the FreeBSD
IPv4 stack.

Diffs are minimized against p4.
PCS has been used for some protocol verification, more widespread
testing of recorded sources in Group-and-Source queries is needed.
sizeof(struct igmpstat) has changed.

__FreeBSD_version is bumped to 800070.
2009-03-09 17:53:05 +00:00
Bruce M Simpson
57e9cb8cda Now that ifmcstat(8) does not suck, retire host-mode netstat -g.
This change will not be back-ported.
2009-02-15 16:16:38 +00:00
Bjoern A. Zeeb
09f8c3ff36 Remove the single global unlocked route cache ip6_forward_rt
from the inet6 stack along with statistics and make sure we
properly free the rt in all cases.

While the current situation is not better performance wise it
prevents panics seen more often these days.
After more inet6 and ipsec cleanup we should be able to improve
the situation again passing the rt to ip6_forward directly.

Leave the ip6_forward_rt entry in struct vinet6 but mark it
for removal.

PR:		kern/128247, kern/131038
MFC after:	25 days
Committed from:	Bugathon #6
Tested by:	Denis Ahrens <denis@h3q.com> (different initial version)
2009-02-01 21:11:08 +00:00
Maxim Konovalov
3ed817f149 o Respect -ss flags (suppress zero counters) for icmp6 "histogram
of error messages" section.

Submitted by:	naddy
MFC after:	1 week
2009-01-13 07:58:57 +00:00
Ruslan Ermilov
82d383bc96 Fix usage() with SYNOPSIS. 2009-01-10 22:49:02 +00:00
Ruslan Ermilov
8bee8d8961 Fix markup and spelling. 2009-01-10 22:48:12 +00:00
Ruslan Ermilov
83708764a7 Fix crash with "netstat -m -N foo".
PR:		bin/124724
MFC after:	3 days
2009-01-10 12:39:12 +00:00
Maxim Konovalov
f1c0a78d99 o With -L flag show unix sockets listen queues stats. It is useful
to know number of not accepted connections for monitoring purposes.

PR:		bin/128871
Submitted by:	Anton Yuzhaninov
MFC after:	1 month
2008-12-31 08:56:49 +00:00
Maxim Konovalov
c0c6601311 o Fix grammar.
PR:		bin/129938
Submitted by:	Bruce Cran
2008-12-26 07:16:20 +00:00
Qing Li
6e6b3f7cbc The main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code

The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle; Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improve/refactor the code, and
  provided valuable reviews
- Julian Elischer set up the perforce tree for me and has helped
  me maintain that branch before the svn conversion
2008-12-15 06:10:57 +00:00
George V. Neville-Neil
94f138fe60 Fix a printing problem when using the -L flag to netstat caused
by adding the -x flag earlier.

Submitted by:	Anton Yuzhaninov
MFC after:	3 days
2008-11-28 18:35:14 +00:00
Xin LI
1c10962832 Use strlcpy() when we mean it. 2008-10-17 21:14:50 +00:00
Sam Leffler
690f477d75 add new build knobs and jigger some existing controls to improve
control over the result of buildworld and installworld; this especially
helps packaging systems such as nanobsd

Reviewed by:	various (posted to arch)
MFC after:	1 month
2008-09-21 22:02:26 +00:00
David E. O'Brien
dd335a1577 Minimize changes CURRENT<->releng7. 2008-09-01 15:04:38 +00:00
Rui Paulo
4816ba93ac Add ECN stats. 2008-08-26 15:12:29 +00:00
Maksim Yevmenkin
f35a20921e Fix build 2008-07-29 21:20:03 +00:00
George V. Neville-Neil
49f287f8c5 Update the kernel to count the number of mbufs and clusters
(all types) used per socket buffer.

Add support to netstat to print out all of the socket buffer
statistics.

Update the netstat manual page to describe the new -x flag
which gives the extended output.

Reviewed by:	rwatson, julian
2008-05-15 20:18:44 +00:00
Xin LI
5d699a2889 Fix build. 2008-05-10 09:22:17 +00:00
Julian Elischer
a15370c6aa Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4.
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different packet streams to be routed by more than just the destination
address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in the timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.

Implementation method, Compatible version. (part 1)
---------------------------------------------------

For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not  always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
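
The indexing scheme above can be modelled in a few lines. This is a sketch
under stated assumptions, not the kernel code: the row width AF_SLOTS and the
helper table_for() are hypothetical, and strings stand in for the per-family
route tree heads.

```python
AF_INET = 2     # IPv4 address family number
AF_INET6 = 28   # IPv6 address family number on FreeBSD
AF_SLOTS = 40   # illustrative row width, not the kernel's real AF_MAX
NUM_FIBS = 16   # the message says the first commit is limited to 16 tables

# rt_tables[fib][family]; strings stand in for per-family route tree heads.
rt_tables = [[None] * AF_SLOTS for _ in range(NUM_FIBS)]
rt_tables[0][AF_INET] = "inet routes, FIB 0"
rt_tables[3][AF_INET] = "inet routes, FIB 3"
rt_tables[0][AF_INET6] = "inet6 routes, FIB 0"

def table_for(family, fib=0):
    """Return the table for a family; only IPv4 honours a non-zero FIB."""
    row = fib if family == AF_INET else 0
    return rt_tables[row][family]

# Code unaware of the change effectively sees only the first row.
legacy_view = rt_tables[0]

print(table_for(AF_INET, fib=3))   # inet routes, FIB 3
print(table_for(AF_INET6, fib=3))  # inet6 routes, FIB 0
print(legacy_view[AF_INET])        # inet routes, FIB 0
```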

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
   Such packets select a FIB from a number associated with the
   socket/PCB. This in turn is inherited from the process,
   but can be changed by a socket option. The process in turn
   inherits it on fork. I have written a utility called setfib
   that acts a bit like nice:

       setfib -3 ping target.example.com # will use fib 3 for ping.

   It is an obvious extension to make it a property of a jail
   but I have not done so. It can be achieved by combining the setfib and
   jail commands.

2/ packets received on an interface for forwarding.
   By default these packets would use table 0,
   (or possibly a number settable in a sysctl(not yet)).
   but prior to routing the firewall can inspect them (see below).
   (possibly in the future you may be able to associate a FIB
   with packets received on an interface..  An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
   associate a fib with it on a packet by packet basis.
   A fib assigned to a packet by a packet classifier
   (such as ipfw) would over-ride a fib associated by
   a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
   accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
   or icmp packets). These should use the FIB associated with the
   packet being responded to.

6/ Packets generated during encapsulation.
   gif, tun and other tunnel interfaces will encapsulate using the FIB
   that was in effect with the process that set up the tunnel.
   Thus setfib 1 ifconfig gif0 [tunnel instructions]
   will set the fib for the tunnel to use to be fib 1.
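
The precedence among the packet classes above can be summarised as a small
function. This is only a sketch of the rules as the message states them; the
function and parameter names are hypothetical and do not correspond to kernel
symbols.

```python
from typing import Optional

DEFAULT_FIB = 0  # "if nothing has been done to change it, it will be FIB 0"

def select_fib(socket_fib: Optional[int] = None,
               classifier_fib: Optional[int] = None) -> int:
    """Pick a packet's FIB: classifier tag, then socket/PCB value, then FIB 0."""
    if classifier_fib is not None:
        # Case 3: a packet classifier such as ipfw overrides other sources.
        return classifier_fib
    if socket_fib is not None:
        # Case 1: locally generated packets use the socket/PCB's FIB.
        return socket_fib
    # Case 2: forwarded packets use table 0 by default.
    return DEFAULT_FIB

def accepted_socket_fib(listen_fib: int) -> int:
    """Case 4: sockets created by accept take the listen socket's FIB."""
    return listen_fib

print(select_fib())                                 # 0
print(select_fib(socket_fib=3))                     # 3
print(select_fib(socket_fib=3, classifier_fib=1))   # 1
```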

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example, it can't fully simulate a routing table because it can't
influence the socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from any to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods reached this way would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

PR:
Reviewed by:	several including rwatson, bz and mlair (parts each)
Approved by:
Obtained from:	Ironport systems/Cisco
MFC after:
Security:
2008-05-09 23:00:22 +00:00
Randall Stewart
4db051c8a5 Fix typos in sctp.c 2008-04-16 17:40:30 +00:00
Christian S.J. Peron
582908b314 Catch netstat up for the new bpf stats structures. Print 64 bit values
properly.

Sponsored by:	Seccuris Inc
MFC after:	4 months
2008-03-24 13:50:39 +00:00