Commit Graph

605 Commits

Author SHA1 Message Date
Gleb Smirnoff
779f106aa1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow to acquire two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() to use atomic(9).  This allows in some situations
    to do soref() without owning socket lock.  There is place for improvement
    here, it is possible to make sorele() also to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parralelism in the UNIX sockets.
  - Now call into uipc_attach() may happen with unp_link_lock hold if, we
    are accepting, or without unp_link_lock in case if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basicly did nothing
    for a listening socket.  The vnode remained opened for connections.  This
    is fixed by removing vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
John Baldwin
708e72dc37 Add descriptions for AES-GCM IPSec authentication (AH) counters.
MFC after:	1 week
Sponsored by:	Chelsio Communications
2017-05-27 16:53:39 +00:00
Patrick Kelsey
2f8c6c0a58 Fix userland tools that don't check the format of routing socket
messages before accessing message fields that may not be present,
removing dead/duplicate/misleading code along the way.

Document the message format for each routing socket message in
route.h.

Fix a bug in usr.bin/netstat introduced in r287351 that resulted in
pointer computation with essentially random 16-bit offsets and
dereferencing of the results.

Reviewed by:	ae
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D10330
2017-04-16 19:17:10 +00:00
Alan Somers
2b8aaacea9 usr.bin/netstat: strcpy -> strlcpy
Reported by:	Coverity
CID:		1006741, 1006744
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
2017-04-07 15:15:10 +00:00
Gleb Smirnoff
cc65eb4e79 Hide struct inpcb, struct tcpcb from the userland.
This is a painful change, but it is needed.  On the one hand, we avoid
modifying them, and this slows down some ideas, on the other hand we still
eventually modify them and tools like netstat(1) never work on next version of
FreeBSD.  We maintain a ton of spares in them, and we already got some ifdef
hell at the end of tcpcb.

Details:
- Hide struct inpcb, struct tcpcb under _KERNEL || _WANT_FOO.
- Make struct xinpcb, struct xtcpcb pure API structures, not including
  kernel structures inpcb and tcpcb inside.  Export into these structures
  the fields from inpcb and tcpcb that are known to be used, and put there
  a ton of spare space.
- Make kernel and userland utilities compilable after these changes.
- Bump __FreeBSD_version.

Reviewed by:	rrs, gnn
Differential Revision:	D10018
2017-03-21 06:39:49 +00:00
Gleb Smirnoff
d9646465e3 Typo. 2017-03-10 19:08:31 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Andrey V. Elsukov
fcf596178b Merge projects/ipsec into head/.
Small summary
 -------------

o Almost all IPsec releated code was moved into sys/netipsec.
o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel
  option IPSEC_SUPPORT added. It enables support for loading
  and unloading of ipsec.ko and tcpmd5.ko kernel modules.
o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by
  default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type
  support was removed. Added TCP/UDP checksum handling for
  inbound packets that were decapsulated by transport mode SAs.
  setkey(8) modified to show run-time NAT-T configuration of SA.
o New network pseudo interface if_ipsec(4) added. For now it is
  build as part of ipsec.ko module (or with IPSEC kernel).
  It implements IPsec virtual tunnels to create route-based VPNs.
o The network stack now invokes IPsec functions using special
  methods. The only one header file <netipsec/ipsec_support.h>
  should be included to declare all the needed things to work
  with IPsec.
o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed.
  Now these protocols are handled directly via IPsec methods.
o TCP_SIGNATURE support was reworked to be more close to RFC.
o PF_KEY SADB was reworked:
  - now all security associations stored in the single SPI namespace,
    and all SAs MUST have unique SPI.
  - several hash tables added to speed up lookups in SADB.
  - SADB now uses rmlock to protect access, and concurrent threads
    can do SA lookups in the same time.
  - many PF_KEY message handlers were reworked to reflect changes
    in SADB.
  - SADB_UPDATE message was extended to support new PF_KEY headers:
    SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They
    can be used by IKE daemon to change SA addresses.
o ipsecrequest and secpolicy structures were cardinally changed to
  avoid locking protection for ipsecrequest. Now we support
  only limited number (4) of bundled SAs, but they are supported
  for both INET and INET6.
o INPCB security policy cache was introduced. Each PCB now caches
  used security policies to avoid SP lookup for each packet.
o For inbound security policies added the mode, when the kernel does
  check for full history of applied IPsec transforms.
o References counting rules for security policies and security
  associations were changed. The proper SA locking added into xform
  code.
o xform code was also changed. Now it is possible to unregister xforms.
  tdb_xxx structures were changed and renamed to reflect changes in
  SADB/SPDB, and changed rules for locking and refcounting.

Reviewed by:	gnn, wblock
Obtained from:	Yandex LLC
Relnotes:	yes
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D9352
2017-02-06 08:49:57 +00:00
Steven Hartland
5d693684b9 Fix rstat: symbol not in namelist from netstat
Load kvm symbols earlier to prevent rstat: symbol not in namelist
error when running netstat -rs.

Submitted by:	Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after:	1 week
Sponsored by:	Multiplay
2017-01-09 09:28:03 +00:00
Xin LI
f0dac7b3f3 Fix typo.
MFC after:	3 days
2017-01-09 07:36:31 +00:00
Hajimu UMEMOTO
f03f398cda When displaying netstat details with libxo in JSON
or XML modes, the value conversion for tcp6 and udp6
port numbers drops last digit.

PR:		215682
MFC after:	3 days
2017-01-05 11:44:27 +00:00
Xin LI
f193c8ce0d Use strlcpy and snprintf in netstat(1).
Expand inet6name() line buffer to NI_MAXHOST and use strlcpy/snprintf
in various places.

Reported by:	Anton Yuzhaninov <citrin citrin ru>
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D8916
2017-01-05 09:23:54 +00:00
Marcelo Araujo
e9d2c20108 Print hostcache usage counts with TCP statistics.
PR:		196252
Submitted by:	Anton Yuzhaninov <citrin+pr@citrin.ru>
MFC after:	3 weeks.
2016-12-28 13:11:22 +00:00
Gleb Smirnoff
5dba303d01 Use bogus_page to properly reduce number of I/Os in sendfile(2). The new
sendfile_swapin() loop works this way:

- Find first invalid page in the request.
- Do vm_pager_has_page() and get count of pages, that can be taken in
  single I/O.
- Trim valid pages from the end of the request.
- Cycle through the request and substitute to bogus_page all valid
  pages that are in the middle of the request.
- After I/O launched (pager copies array of pages into buf(9), it
  is important to restore proper page pointers with help vm_page_lookup().

Count bogus pages used and report them in sendfile stats.
2016-11-17 21:02:55 +00:00
Bruce Evans
de618daaf0 Fix build without INET6 and with gcc. A function definition was ifdefed
for INET6, but its protototype was not, and gcc detects the error.
2016-08-27 11:06:06 +00:00
Michael Tuexen
5ad497eead Fix the output for scope statistics.
MFC after: 3 days
2016-08-17 16:56:20 +00:00
Michael Tuexen
e0694d2f0a Use names for SCTP and UDPLite when reporting the input histogram.
MFC after: 3 days
2016-08-17 14:44:47 +00:00
Marcelo Araujo
d19ba08a16 Use nitems() from sys/param.h.
MFC after:	2 weeks.
Sponsored by:	gandi.net (BSD Day Taiwan)
2016-07-30 07:06:23 +00:00
Michael Tuexen
9bf9ce8144 Don't duplicate code for SCTP, just use the ones used for UDP and TCP.
This fixes a bug with link local addresses. This will require and
upcoming change in the kernel to bring SCTP to the same behaviour
as UDP and TCP.

MFC after:	3 days
2016-07-17 11:43:27 +00:00
Michael Tuexen
ca658db39c Ensure that the -a, -W, -L options for SCTP behave similar
as for TCP.

MFC after:	3 days
2016-07-15 23:13:57 +00:00
Michael Tuexen
db2627f4b1 When calling netstat -Laptcp the local address values are not aligned
with the corresponding entry in the table header.
r295136 increased the value width from 14 to 32 without the corresponding
change to the table header. This commit adds the change to the table
header width.

MFC after:	3 days
2016-07-15 17:40:34 +00:00
Michael Tuexen
282d1dd7ff Fix a bug which results in a core dump when running netstat with
the -W option and having a listening SCTP socket.
The bug was introduced in r279122 when adding support for libxo.

MFC after:	3 days
2016-07-15 15:55:36 +00:00
Marcelo Araujo
fcc0131f63 Use macro MAX() from sys/param.h.
MFC after:	2 weeks.
2016-04-22 03:37:27 +00:00
Marcelo Araujo
ef1cb629d8 Use NULL instead of 0 for pointers.
Also malloc will return NULL if it cannot allocate memory.

MFC after:	2 weeks.
2016-04-18 05:46:18 +00:00
Pedro F. Giffuni
cfe3da09e2 netstat: avoid returning uninitialized value in p_sockaddr().
In the case the width is less than 0, we are returning an uninitialized
value. For practical purposes the return value is ignored but initialize
it to avoid trouble.

CID:	1341619
2016-03-27 20:02:21 +00:00
Gleb Smirnoff
dbfd87087b Print running TCP connection counts with TCP statistics. 2016-03-15 00:19:30 +00:00
Bryan Drewery
bd18fd57db DIRDEPS_BUILD: Regenerate without local dependencies.
These are no longer needed after the recent 'beforebuild: depend' changes
and hooking DIRDEPS_BUILD into a subset of FAST_DEPEND which supports
skipping 'make depend'.

Sponsored by:	EMC / Isilon Storage Division
2016-02-24 17:20:11 +00:00
Alfred Perlstein
7325dfbb59 Increase max allowed backlog for listen sockets
from short to int.

PR: 203922
Submitted by: White Knight <white_knight@2ch.net>
MFC After: 4 weeks
2016-02-02 05:57:59 +00:00
Gleb Smirnoff
2bab0c5535 New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and
up to now.

The new sendfile is the code that Netflix uses to send their multiple tens
of gigabits of data per second. The new implementation features asynchronous
I/O, when I/O operations are launched, but not awaited to be complete. An
explanation of why such behavior is beneficial compared to old one is
going to be too long for a commit message, so we will skip it here.

Additional features of new syscall are extra flags, which provide an
application more control over data sent. The SF_NOCACHE flag tells
kernel that data shouldn't be cached after it was sent. The SF_READAHEAD()
macro allows to specify readahead size in pages.

The new syscalls is a drop in replacement. No modifications are required
to applications. One can take nginx binary for stable/10 and run it
successfully on head. Although SF_NODISKIO lost its original sense, as now
sendfile doesn't block, and now means something completely different (tm),
using the new sendfile the old way is absolutely safe.

Celebrates:	Netflix global launch!
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
Relnotes:	yes
2016-01-08 20:34:57 +00:00
George V. Neville-Neil
9d2d8e7bec Switch the IPsec related statistics to using the built in sysctl
variable set rather than reading from kernel memory.
This also makes the -z (zero) flag work correctly

MFC after:	1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D4591
2015-12-17 02:02:09 +00:00
Craig Rodrigues
06691045ba Add more text to explain --libxo flag. 2015-12-01 19:18:53 +00:00
Hajimu UMEMOTO
4fc31adf6a At the time a destination or a gateway of `netstat -r'
protrudes its field, narrow the next field to raise
readability bit.
2015-12-01 16:04:50 +00:00
Bryan Drewery
cf990407e1 Update dependencies after r291406 added libelf to libkvm.
Unfortunately filemon/meta mode tracks all indirect dependencies here
since ld(1) is reading libelf when linking in libkvm.  Churn would be
reduced if this was able to be limited to direct dependencies.

Sponsored by:	EMC / Isilon Storage Division
2015-12-01 05:18:48 +00:00
Bryan Drewery
114350b9de Replace DPSRCS that work fine in SRCS.
This is so that 'make depend' is not a required build step in these
files.

DPSRCS is overall unneeded.  DPSRCS already contains SRCS, so anything
which can safely be in SRCS should be.  DPSRCS is mostly just a way to
generate files that should not be linked into the final PROG/LIB.  For
headers and grammars it is safe for them to be in SRCS since they will
be excluded during linking and installation.

The only remaining uses of DPSRCS are for generating .c or .o files that
must be built before 'make depend' can run 'mkdep' on the SRCS c files
list.  A semi-proper example is in tests/sys/kern/acct/Makefile where a
checked-in .c file has an #include on a generated .c file.  The
generated .c file should not be linked into the final PROG though since
it is #include'd.  The more proper way here is just to build/link it in
though without DPSRCS.  Another example is in sys/modules/linux/Makefile
where a shell script runs to parse a DPSRCS .o file that should not be
linked into the module.  Beyond those, the need for DPSRCS is largely
unneeded, redundant, and forces 'make depend' to be ran.  Generally,
these Makefiles should avoid the need for DPSRCS and define proper
dependencies for their files as well.

An example of an improper usage and why this matters is in usr.bin/netstat.
nl_defs.h was only in DPSRCS and so was not generated during 'make all',
but only during 'make depend'.  The files including it lacked proper
depenencies on it, which forced running 'make depend' to workaround that
bug.  The 'make depend' target should mostly be used for incremental build
help, not to produce a working build.  This specific example was broken in
the meta build until r287905 since it does not run 'make depend'.

The gnu/lib/libreadline/readline case is fine since bsd.lib.mk has 'OBJS:
SRCS:M*.h' when there is no .depend file.

Sponsored by:	EMC / Isilon Storage Division
MFC after:	1 week
2015-11-25 20:38:17 +00:00
Hajimu UMEMOTO
0eaa116e1e Fix udp entry of `netstat -TW'. 2015-11-25 11:20:54 +00:00
Hajimu UMEMOTO
046aad399a Correct alignment of the addresses in the `netstat -aW' output. 2015-11-24 14:25:40 +00:00
Hajimu UMEMOTO
06ff7ccb5f Add missing error check after xo_parse_args() in netstat(8).
Submitted by:	Oliver Pinter
Differential Revision:	https://reviews.freebsd.org/D4233
2015-11-24 11:07:37 +00:00
Hajimu UMEMOTO
857357b6c9 Don't truncate an interface name when -W option is specified.
Spotted by:	Jim Thompson <jim__at__netgate.com>
MFC after:	1 week
2015-11-20 12:32:49 +00:00
Hajimu UMEMOTO
b1a302a533 Avoid core dump when output style is html. 2015-11-20 12:15:58 +00:00
Hajimu UMEMOTO
d48d68fa9e JSON doesn't permit a hexadecimal notation of an integer. 2015-11-17 12:09:57 +00:00
Hajimu UMEMOTO
b670e89ac2 Do not truncate addresses when printing in encoded format. 2015-11-06 14:50:23 +00:00
Hajimu UMEMOTO
fb3ab86f32 - Fix alignment for padding link address.
- Trim whitespace of link address.
2015-11-06 14:35:22 +00:00
Enji Cooper
f9b3502ce1 Fix compiling netstat after r290367 by substituting sys/types.h for
sys/param.h, as sys/param.h defines the MAX(..) macro

Reported by: O. Hartmann <ohartman@zedat.fu-berlin.de>
Pointyhat to: ume
2015-11-06 08:43:12 +00:00
Hajimu UMEMOTO
cca052c621 Give enough room for addresses when -W option is specified. 2015-11-05 11:06:46 +00:00
Hajimu UMEMOTO
011ae9c907 Fix alignment of `Drop' header. 2015-11-05 11:04:43 +00:00
Hajimu UMEMOTO
f3ffc9fdf9 Use returned network name from getnetbyaddr() correctly. 2015-11-05 11:02:28 +00:00
Hajimu UMEMOTO
6f53a03868 Revert previous workaround. This problem was fixed
by r290318.
2015-11-05 10:58:19 +00:00
Hajimu UMEMOTO
6ad5f7ca01 Since sa->sa_len doesn't match sizeof(struct sockaddr_dl),
getnameinfo() fails against sockaddr_dl.  This commit is workaround
for this problem.
2015-11-04 19:09:42 +00:00
Hajimu UMEMOTO
3e687cd867 Fix alignment of AF_LINK address. 2015-11-04 19:05:04 +00:00
Hajimu UMEMOTO
86a598950f Simplify r290367 using asterisk for a field width
and precision.
2015-11-04 16:59:12 +00:00