617 Commits

Author SHA1 Message Date
fabient
97541ca804 Add a SPD cache to speed up lookups.
When large SPDs are used, we face two problems:

- too many CPU cycles are spent during the linear searches in the SPD
  for each packet
- too much contention on multi socket systems, since we use a single
  shared lock.

Main changes:

- added the sysctl tree 'net.key.spdcache' to control the SPD cache
  (disabled by default).
- cache the sp indexes that are used to perform SP lookups.
- use a range of dedicated mutexes to protect the cache lines.

Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu>
Reviewed by: ae
Sponsored by:	Stormshield
Differential Revision: https://reviews.freebsd.org/D15050
2018-05-22 15:54:25 +00:00
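
The idea of a hashed cache in front of a linear SPD search, with a dedicated mutex per cache bucket, can be sketched in user space roughly as follows. This is an illustration only, not the code from the commit: `struct policy`, `spdcache_init()`, `spd_lookup()`, the bucket count, and the hash are hypothetical names and choices, and pthread mutexes stand in for kernel mutexes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NPOLICIES 3
#define NBUCKETS  8	/* hypothetical cache size */

static_assert((NBUCKETS & (NBUCKETS - 1)) == 0, "NBUCKETS must be a power of two");

/* Toy security policy: match on destination address only. */
struct policy {
	uint32_t dst;
	int action;
};

static const struct policy spd[NPOLICIES] = {
	{ 0x0a000001, 1 }, { 0x0a000002, 2 }, { 0x0a000003, 3 },
};

/* One cache entry per bucket, each protected by its own mutex. */
struct bucket {
	pthread_mutex_t lock;
	int valid;
	uint32_t key;
	const struct policy *sp;
};

static struct bucket cache[NBUCKETS];
static atomic_ulong slow_lookups;	/* number of linear searches performed */

void
spdcache_init(void)
{
	for (size_t i = 0; i < NBUCKETS; i++) {
		pthread_mutex_init(&cache[i].lock, NULL);
		cache[i].valid = 0;
	}
	atomic_store(&slow_lookups, 0);
}

/* Slow path: walk the whole policy list. */
static const struct policy *
spd_linear_lookup(uint32_t dst)
{
	atomic_fetch_add(&slow_lookups, 1);
	for (size_t i = 0; i < NPOLICIES; i++)
		if (spd[i].dst == dst)
			return (&spd[i]);
	return (NULL);
}

/* Look up dst, taking only the lock of the bucket it hashes to. */
const struct policy *
spd_lookup(uint32_t dst)
{
	struct bucket *b = &cache[(dst ^ (dst >> 16)) & (NBUCKETS - 1)];
	const struct policy *sp;

	pthread_mutex_lock(&b->lock);
	if (b->valid && b->key == dst) {
		sp = b->sp;			/* cache hit */
	} else {
		sp = spd_linear_lookup(dst);	/* miss: search, then cache */
		b->valid = 1;
		b->key = dst;
		b->sp = sp;
	}
	pthread_mutex_unlock(&b->lock);
	return (sp);
}
```

Two lookups that hash to different buckets never contend on the same lock, which is the point of using a range of mutexes instead of a single shared one.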
jtl
80fab35507 Bump netstat.1's .Dd after r331347.
2018-03-22 09:43:15 +00:00
jtl
a93bdf6963 Add the "TCP Blackbox Recorder" which we discussed at the developer
summits at BSDCan and BSDCam in 2017.

The TCP Blackbox Recorder allows you to capture events on a TCP connection
in a ring buffer. It stores metadata with the event. It optionally stores
the TCP header associated with an event (if the event is associated with a
packet) and also optionally stores information on the sockets.

It supports setting a log ID on a TCP connection and using this to correlate
multiple connections that share a common log ID.

You can log connections in different modes. If you are doing a coordinated
test with a particular connection, you may tell the system to put it in
mode 4 (continuous dump). Or, if you just want to monitor for errors, you
can put it in mode 1 (ring buffer) and dump all the ring buffers associated
with the connection ID when we receive an error signal for that connection
ID. You can set a default mode that will be applied to a particular ratio
of incoming connections. You can also manually set a mode using a socket
option.

This commit includes only basic probes. rrs@ has added quite an abundance
of probes in his TCP development work. He plans to commit those soon.

There are user-space programs which we plan to commit as ports. These read
the data from the log device and output pcapng files, and then let you
analyze the data (and metadata) in the pcapng files.

Reviewed by:	gnn (previous version)
Obtained from:	Netflix, Inc.
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D11085
2018-03-22 09:40:08 +00:00
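
The ring buffer mode described above (keep the most recent events, dump them when an error is signalled) can be illustrated with a small sketch. It is not the recorder's implementation: `struct ring`, `ring_log()`, `ring_dump()`, and the capacity are hypothetical, and events are reduced to integer IDs.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4	/* hypothetical capacity */

static_assert(RING_SIZE > 0, "ring must hold at least one event");

struct ring {
	int ev[RING_SIZE];	/* event IDs; real records would carry metadata */
	size_t head;		/* next slot to write */
	size_t count;		/* number of valid entries */
};

/* Record an event, overwriting the oldest one once the ring is full. */
void
ring_log(struct ring *r, int id)
{
	r->ev[r->head] = id;
	r->head = (r->head + 1) % RING_SIZE;
	if (r->count < RING_SIZE)
		r->count++;
}

/* Copy the stored events to out[], oldest first; returns how many. */
size_t
ring_dump(const struct ring *r, int *out)
{
	size_t start = (r->head + RING_SIZE - r->count) % RING_SIZE;

	for (size_t i = 0; i < r->count; i++)
		out[i] = r->ev[(start + i) % RING_SIZE];
	return (r->count);
}
```

After logging six events into this four-slot ring, a dump yields the last four in order, which is the behaviour one wants when only the events leading up to an error matter.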
pfg
7551d83c35 various: general adoption of SPDX licensing ID tags.
Mainly focus on files that use the BSD 2-Clause license; however, the tool I
was using misidentified many licenses, so this was mostly a manual,
error-prone task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

No functional change intended.
2017-11-27 15:37:16 +00:00
pfg
872b698bd4 General further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:49:47 +00:00
pfg
9da7bdde06 spdx: initial adoption of licensing ID tags.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.

Initially, only tag files that use BSD 4-Clause "Original" license.

RelNotes:	yes
Differential Revision:	https://reviews.freebsd.org/D13133
2017-11-18 14:26:50 +00:00
bdrewery
a598c4b809 DIRDEPS_BUILD: Update dependencies.
Sponsored by:	Dell EMC Isilon
2017-10-31 00:07:04 +00:00
glebius
7168fac388 Hide struct socket and struct unpcb from the userland.
Violators may define _WANT_SOCKET and _WANT_UNPCB respectively and
are not guaranteed stability of the structures.  The list of
violators is the usual one: libprocstat(3) and netstat(1) internally
and lsof in ports.

In struct xunpcb remove the inclusion of the kernel structure and add
a bunch of spare fields.  The xsocket already does not include socket,
but add spares there as well.  Embed xsockbuf into xsocket.

Sort declarations in sys/socketvar.h to separate kernel only from
userland available ones.

PR:		221820 (exp-run)
2017-10-02 23:29:56 +00:00
tuexen
c7bf4af187 The combination of IPv6 and SCTP is also supported.
MFC after:	1 week
2017-09-09 07:48:58 +00:00
bapt
32dca7a62d Don't call kresolve_list() if using netstat on live kernel
kresolve_list() calls kldsym(2) many times. Removing that call when collecting
stats for the running kernel improves the startup time and CPU usage.

Submitted by:	Nikita Kozlov (nikita.kozlov@blade-group.com)
Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	blade
Differential Revision:	https://reviews.freebsd.org/D12151
2017-08-30 07:58:33 +00:00
sbruno
d0208bfad0 Use counter(9) for PLPMTUD counters.
Remove unused PLPMTUD sysctl counters.

Bump UPDATING and FreeBSD Version to indicate a rebuild is required.

Submitted by:	kevin.bowling@kev009.com
Reviewed by:	jtl
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12003
2017-08-25 19:41:38 +00:00
bz
a0dcb7af20 After inpcb route caching was put back in place there is no need for
flowtable anymore (as flowtable was never considered to be useful in
the forwarding path).

Reviewed by:		np
Differential Revision:	https://reviews.freebsd.org/D11448
2017-07-27 13:03:36 +00:00
glebius
e35d543ec1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow acquiring two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() use atomic(9).  This allows soref() to be done
    without owning the socket lock in some situations.  There is room for
    improvement here: it is possible to make sorele() lock optionally as well.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parallelism in the UNIX sockets.
  - Now a call into uipc_attach() may happen with unp_link_lock held if we
    are accepting, or without unp_link_lock if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basically did nothing
    for a listening socket.  The vnode remained open for connections.  This
    is fixed by removing the vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
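
The point about taking a reference without holding the object's lock can be illustrated with C11 atomics. This is a user-space sketch under stated assumptions, not the kernel's soref()/sorele(): `struct obj`, `obj_init()`, `obj_ref()`, and `obj_rele()` are hypothetical names, and `<stdatomic.h>` stands in for atomic(9).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_uint refs;	/* reference count, starts at 1 for the creator */
};

void
obj_init(struct obj *o)
{
	atomic_init(&o->refs, 1);
}

/* Take an extra reference; no lock is needed, only an atomic increment. */
void
obj_ref(struct obj *o)
{
	unsigned int old;

	old = atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
	assert(old > 0);	/* the caller must already hold a reference */
	(void)old;
}

/*
 * Drop a reference.  Returns true when the last reference went away,
 * in which case the caller is responsible for tearing the object down.
 */
bool
obj_rele(struct obj *o)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel);
	assert(old > 0);	/* catches releasing more often than referencing */
	return (old == 1);
}
```

The release side uses acquire-release ordering so that whoever observes the final drop also sees all earlier writes to the object before freeing it.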
jhb
5c4f9e6e12 Add descriptions for AES-GCM IPSec authentication (AH) counters.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-05-27 16:53:39 +00:00
pkelsey
f3f1c24017 Fix userland tools that don't check the format of routing socket
messages before accessing message fields that may not be present,
removing dead/duplicate/misleading code along the way.

Document the message format for each routing socket message in
route.h.

Fix a bug in usr.bin/netstat introduced in r287351 that resulted in
pointer computation with essentially random 16-bit offsets and
dereferencing of the results.

Reviewed by:	ae
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D10330
2017-04-16 19:17:10 +00:00
asomers
bf46da77c5 usr.bin/netstat: strcpy -> strlcpy
Reported by:	Coverity
CID:		1006741, 1006744
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2017-04-07 15:15:10 +00:00
glebius
3a5c9aaf2b Hide struct inpcb, struct tcpcb from the userland.
This is a painful change, but it is needed.  On the one hand, we avoid
modifying them, and this slows down some ideas, on the other hand we still
eventually modify them and tools like netstat(1) never work on next version of
FreeBSD.  We maintain a ton of spares in them, and we already got some ifdef
hell at the end of tcpcb.

Details:
- Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO.
- Make struct xinpcb, struct xtcpcb pure API structures, not including
  kernel structures inpcb and tcpcb inside.  Export into these structures
  the fields from inpcb and tcpcb that are known to be used, and put there
  a ton of spare space.
- Make kernel and userland utilities compilable after these changes.
- Bump __FreeBSD_version.

Reviewed by:	rrs, gnn
Differential Revision:	D10018
2017-03-21 06:39:49 +00:00
glebius
22a79f89d4 Typo.
2017-03-10 19:08:31 +00:00
imp
7e6cabd06e Renumber copyright clause 4
Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
ae
0fb6ad528e Merge projects/ipsec into head/.
Small summary
 -------------

o Almost all IPsec related code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
  option IPSEC_SUPPORT added. It enables support for loading
  and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
  default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
  support was removed. Added TCP/UDP checksum handling for
  inbound packets that were decapsulated by transport mode SAs.
  setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
  built as part of the ipsec.ko module (or with an IPSEC kernel).
  It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
  methods. Only one header file, <netipsec/ipsec_support.h>,
  should be included to declare all the needed things to work
  with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
  Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be closer to the RFC.
o PF_KEY SADB was reworked:
  - now all security associations are stored in a single SPI namespace,
    and all SAs MUST have a unique SPI.
  - several hash tables added to speed up lookups in SADB.
  - SADB now uses rmlock to protect access, and concurrent threads
    can do SA lookups at the same time.
  - many PF_KEY message handlers were reworked to reflect changes
    in SADB.
  - SADB_UPDATE message was extended to support new PF_KEY headers:
    SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
    can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were substantially changed to
  avoid locking protection for ipsecrequest. Now we support
  only a limited number (4) of bundled SAs, but they are supported
  for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
  used security policies to avoid SP lookup for each packet.
o For inbound security policies, added a mode in which the kernel
  checks the full history of applied IPsec transforms.
o Reference counting rules for security policies and security
  associations were changed. Proper SA locking was added to the xform
  code.
o xform code was also changed. Now it is possible to unregister xforms.
  tdb_xxx structures were changed and renamed to reflect changes in
  SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by:	gnn, wblock
Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9352
2017-02-06 08:49:57 +00:00
smh
33b2c28852 Fix rstat: symbol not in namelist from netstat
Load kvm symbols earlier to prevent rstat: symbol not in namelist
error when running netstat -rs.

Submitted by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after:	1 week
Sponsored by:	Multiplay
2017-01-09 09:28:03 +00:00
delphij
5454c47191 Fix typo.
MFC after:	3 days
2017-01-09 07:36:31 +00:00
ume
0682ff789a When displaying netstat details with libxo in JSON
or XML modes, the value conversion for tcp6 and udp6
port numbers drops last digit.

PR:		215682
MFC after:	3 days
2017-01-05 11:44:27 +00:00
delphij
1b12c4f0ad Use strlcpy and snprintf in netstat(1).
Expand inet6name() line buffer to NI_MAXHOST and use strlcpy/snprintf
in various places.

Reported by:	Anton Yuzhaninov <citrin citrin ru>
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D8916
2017-01-05 09:23:54 +00:00
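
The reason for preferring bounded copies such as strlcpy/snprintf over strcpy can be shown with a short sketch. The helper `copy_name()` is hypothetical and is not taken from netstat; it uses only standard `snprintf()`, so it also builds on systems that lack strlcpy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Copy src into dst without ever writing past dstsize bytes.
 * The result is always NUL-terminated.  Returns true if the whole
 * string fit and false if it had to be truncated.
 */
bool
copy_name(char *dst, size_t dstsize, const char *src)
{
	int n;

	assert(dstsize > 0);
	/* snprintf() returns the length the full string would have had. */
	n = snprintf(dst, dstsize, "%s", src);
	return (n >= 0 && (size_t)n < dstsize);
}
```

With an 8-byte buffer, copying "lo0" succeeds, while copying a longer host name stores only the first seven characters plus the terminator and reports the truncation instead of overflowing the buffer.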
araujo
b35cc2e296 Print hostcache usage counts with TCP statistics.
PR:		196252
Submitted by:	Anton Yuzhaninov <citrin+pr@citrin.ru>
MFC after:	3 weeks.
2016-12-28 13:11:22 +00:00
glebius
8c35911278 Use bogus_page to properly reduce number of I/Os in sendfile(2). The new
sendfile_swapin() loop works this way:

- Find first invalid page in the request.
- Do vm_pager_has_page() and get the count of pages that can be taken in
  a single I/O.
- Trim valid pages from the end of the request.
- Cycle through the request and substitute bogus_page for all valid
  pages that are in the middle of the request.
- After the I/O is launched (the pager copies the array of pages into buf(9)),
  it is important to restore proper page pointers with the help of
  vm_page_lookup().

Count bogus pages used and report them in sendfile stats.
2016-11-17 21:02:55 +00:00
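
The page-selection steps listed in this commit can be modelled on a plain array of page states. The sketch below is a simplified user-space model, not the kernel's sendfile_swapin(): `plan_io()`, the `PG_*` states, and the `max_run` parameter (standing in for the count obtained from the pager) are hypothetical, and the final step of restoring the real page pointers is left out.

```c
#include <assert.h>
#include <stddef.h>

enum { PG_INVALID, PG_VALID, PG_BOGUS };

/*
 * Plan one I/O over pages[0..n-1].
 * - Skip leading valid pages to find the first invalid one.
 * - Limit the run to max_run pages (what a single I/O can take).
 * - Trim valid pages from the end of the run.
 * - Mark valid pages left in the middle as PG_BOGUS so the run
 *   stays contiguous without overwriting their contents.
 * Returns the number of pages in the I/O (0 if nothing needs reading)
 * and stores the index of its first page in *first.
 */
size_t
plan_io(int *pages, size_t n, size_t max_run, size_t *first)
{
	size_t start, end;

	assert(pages != NULL && first != NULL && max_run > 0);

	for (start = 0; start < n && pages[start] == PG_VALID; start++)
		continue;
	if (start == n)
		return (0);		/* every page is already valid */

	end = start + max_run;
	if (end > n)
		end = n;
	/* pages[start] is invalid, so this loop stops at start at the latest. */
	while (pages[end - 1] == PG_VALID)
		end--;

	for (size_t i = start; i < end; i++)
		if (pages[i] == PG_VALID)
			pages[i] = PG_BOGUS;

	*first = start;
	return (end - start);
}
```

For a request whose pages are valid, invalid, valid, invalid, valid, valid, the plan covers three pages starting at index 1, with the valid page in the middle replaced by the bogus marker and the trailing valid pages left alone.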
bde
75e3719bec Fix build without INET6 and with gcc. A function definition was ifdefed
for INET6, but its prototype was not, and gcc detects the error.
2016-08-27 11:06:06 +00:00
tuexen
615d47e6cd Fix the output for scope statistics.
MFC after: 3 days
2016-08-17 16:56:20 +00:00
tuexen
697fdb532f Use names for SCTP and UDPLite when reporting the input histogram.
MFC after: 3 days
2016-08-17 14:44:47 +00:00
araujo
87eb49e67e Use nitems() from sys/param.h.
MFC after:	2 weeks.
Sponsored by:	gandi.net (BSD Day Taiwan)
2016-07-30 07:06:23 +00:00
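
For readers unfamiliar with nitems(), the sketch below shows the kind of loop it simplifies. The fallback definition is an assumption for systems whose sys/param.h does not provide the macro, and the `protos` table and `proto_index()` are hypothetical examples rather than code from netstat.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fallback for systems whose <sys/param.h> lacks nitems(). */
#ifndef nitems
#define nitems(x)	(sizeof((x)) / sizeof((x)[0]))
#endif

static const char *const protos[] = { "tcp", "udp", "sctp", "udplite" };

/* Return the index of name in protos[], or -1 if it is not listed. */
int
proto_index(const char *name)
{
	assert(name != NULL);
	/* nitems() tracks the array size, so adding entries needs no edits here. */
	for (size_t i = 0; i < nitems(protos); i++)
		if (strcmp(protos[i], name) == 0)
			return ((int)i);
	return (-1);
}
```

Note that the macro only works on true arrays; applied to a pointer it silently computes the wrong value.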
tuexen
6144aff95c Don't duplicate code for SCTP, just use the ones used for UDP and TCP.
This fixes a bug with link local addresses. This will require an
upcoming change in the kernel to bring SCTP to the same behaviour
as UDP and TCP.

MFC after:	3 days
2016-07-17 11:43:27 +00:00
tuexen
3ffa3182a7 Ensure that the -a, -W, -L options for SCTP behave similarly
to those for TCP.

MFC after:	3 days
2016-07-15 23:13:57 +00:00
tuexen
7df95e052c When calling netstat -Laptcp the local address values are not aligned
with the corresponding entry in the table header.
r295136 increased the value width from 14 to 32 without the corresponding
change to the table header. This commit adds the change to the table
header width.

MFC after:	3 days
2016-07-15 17:40:34 +00:00
tuexen
12a4a4a008 Fix a bug which results in a core dump when running netstat with
the -W option and having a listening SCTP socket.
The bug was introduced in r279122 when adding support for libxo.

MFC after:	3 days
2016-07-15 15:55:36 +00:00
araujo
244576d3c3 Use macro MAX() from sys/param.h.
MFC after:	2 weeks.
2016-04-22 03:37:27 +00:00
araujo
d2d25bb0ca Use NULL instead of 0 for pointers.
Also malloc will return NULL if it cannot allocate memory.

MFC after:	2 weeks.
2016-04-18 05:46:18 +00:00
pfg
a7966dbd97 netstat: avoid returning uninitialized value in p_sockaddr().
In the case the width is less than 0, we are returning an uninitialized
value. For practical purposes the return value is ignored but initialize
it to avoid trouble.

CID:	1341619
2016-03-27 20:02:21 +00:00
glebius
c39b2fd5d1 Print running TCP connection counts with TCP statistics.
2016-03-15 00:19:30 +00:00
bdrewery
2a891f1feb DIRDEPS_BUILD: Regenerate without local dependencies.
These are no longer needed after the recent 'beforebuild: depend' changes
and hooking DIRDEPS_BUILD into a subset of FAST_DEPEND which supports
skipping 'make depend'.

Sponsored by:	EMC / Isilon Storage Division
2016-02-24 17:20:11 +00:00
alfred
b04e8a547e Increase max allowed backlog for listen sockets
from short to int.

PR: 203922
Submitted by: White Knight <white_knight@2ch.net>
MFC After: 4 weeks
2016-02-02 05:57:59 +00:00
glebius
aaa09777e1 New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and
up to now.

The new sendfile is the code that Netflix uses to send their multiple tens
of gigabits of data per second. The new implementation features asynchronous
I/O: I/O operations are launched, but their completion is not awaited. An
explanation of why such behavior is beneficial compared to the old one is
going to be too long for a commit message, so we will skip it here.

Additional features of the new syscall are extra flags, which give an
application more control over the data sent. The SF_NOCACHE flag tells the
kernel that data shouldn't be cached after it was sent. The SF_READAHEAD()
macro allows specifying the readahead size in pages.

The new syscall is a drop-in replacement. No modifications are required
to applications. One can take an nginx binary for stable/10 and run it
successfully on head. Although SF_NODISKIO lost its original sense, as now
sendfile doesn't block, and now means something completely different (tm),
using the new sendfile the old way is absolutely safe.

Celebrates:	Netflix global launch!
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
Relnotes:	yes
2016-01-08 20:34:57 +00:00
gnn
a7e4d91c23 Switch the IPsec related statistics to using the built in sysctl
variable set rather than reading from kernel memory.
This also makes the -z (zero) flag work correctly

MFC after:	1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D4591
2015-12-17 02:02:09 +00:00
rodrigc
9405dd95c9 Add more text to explain --libxo flag.
2015-12-01 19:18:53 +00:00
ume
878e05a4e4 When a destination or a gateway in `netstat -r' output
overflows its field, narrow the next field to improve
readability a bit.
2015-12-01 16:04:50 +00:00
bdrewery
44973a75bb Update dependencies after r291406 added libelf to libkvm.
Unfortunately filemon/meta mode tracks all indirect dependencies here
since ld(1) is reading libelf when linking in libkvm.  Churn would be
reduced if this was able to be limited to direct dependencies.

Sponsored by:	EMC / Isilon Storage Division
2015-12-01 05:18:48 +00:00
bdrewery
2ab2ea6fbd Replace DPSRCS that work fine in SRCS.
This is so that 'make depend' is not a required build step in these
files.

DPSRCS is overall unneeded.  DPSRCS already contains SRCS, so anything
which can safely be in SRCS should be.  DPSRCS is mostly just a way to
generate files that should not be linked into the final PROG/LIB.  For
headers and grammars it is safe for them to be in SRCS since they will
be excluded during linking and installation.

The only remaining uses of DPSRCS are for generating .c or .o files that
must be built before 'make depend' can run 'mkdep' on the SRCS c files
list.  A semi-proper example is in tests/sys/kern/acct/Makefile where a
checked-in .c file has an #include on a generated .c file.  The
generated .c file should not be linked into the final PROG though since
it is #include'd.  The more proper way here is just to build/link it in
though without DPSRCS.  Another example is in sys/modules/linux/Makefile
where a shell script runs to parse a DPSRCS .o file that should not be
linked into the module.  Beyond those, DPSRCS is largely
unneeded and redundant, and forces 'make depend' to be run.  Generally,
these Makefiles should avoid the need for DPSRCS and define proper
dependencies for their files as well.

An example of an improper usage and why this matters is in usr.bin/netstat.
nl_defs.h was only in DPSRCS and so was not generated during 'make all',
but only during 'make depend'.  The files including it lacked proper
dependencies on it, which forced running 'make depend' to work around that
bug.  The 'make depend' target should mostly be used for incremental build
help, not to produce a working build.  This specific example was broken in
the meta build until r287905 since it does not run 'make depend'.

The gnu/lib/libreadline/readline case is fine since bsd.lib.mk has 'OBJS:
SRCS:M*.h' when there is no .depend file.

Sponsored by:	EMC / Isilon Storage Division
MFC after:	1 week
2015-11-25 20:38:17 +00:00
ume
97e188cb58 Fix udp entry of `netstat -TW'.
2015-11-25 11:20:54 +00:00
ume
745eb99f16 Correct alignment of the addresses in the `netstat -aW' output.
2015-11-24 14:25:40 +00:00
ume
34233609c9 Add missing error check after xo_parse_args() in netstat(8).
Submitted by:	Oliver Pinter
Differential Revision:	https://reviews.freebsd.org/D4233
2015-11-24 11:07:37 +00:00
ume
bbdebe789d Don't truncate an interface name when -W option is specified.
Spotted by:	Jim Thompson <jim__at__netgate.com>
MFC after:	1 week
2015-11-20 12:32:49 +00:00