Commit Graph

466 Commits

Author SHA1 Message Date
Mark Johnston
79db6fe7aa Plug some networking sysctl leaks.
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland.  Fix a number of such handlers.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	tuexen
MFC after:	3 days
Security:	kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18301
2018-11-22 20:49:41 +00:00
Jonathan T. Looney
e77f0bdcb5 r334853 added a "socket destructor" callback. However, as implemented, it
was really a "socket close" callback.

Update the socket destructor functionality to run when a socket is
destroyed (rather than when it is closed). The original submitter has
confirmed that this change satisfies the intended use case.

Suggested by:	rwatson
Submitted by:	Michio Honda <micchie at sfc.wide.ad.jp>
Tested by:	Michio Honda <micchie at sfc.wide.ad.jp>
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17590
2018-10-18 14:20:15 +00:00
Gleb Smirnoff
ad7eb8cad5 In PR 227259, a user is reporting that they have code which is using
shutdown() to wakeup another thread blocked on a stream listen socket.
This code is failing, while it used to work on FreeBSD 10 and still
works on Linux.

It seems reasonable to add another exception to support something users are
actually doing, which used to work on FreeBSD 10, and still works on Linux.
And, it seems like it should be acceptable to POSIX, as we still return
ENOTCONN.

This patch is different to what had been committed to stable/11, since
code around listening sockets is different. Patch in D15019 is written
by jtl@, slightly modified by me.

PR:		227259
Obtained from:	jtl
Approved by:	re (kib)
Differential Revision:  D15019
2018-10-03 17:40:04 +00:00
Michael Tuexen
6b01d4d433 Add SOL_SOCKET level socket option with name SO_DOMAIN to get
the domain of a socket.

This is helpful when testing and Solaris and Linux have the same
socket option using the same name.

Reviewed by:		bcr@, rrs@
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D16791
2018-08-21 14:04:30 +00:00
Brooks Davis
3a20f06a1c Use uintptr_t alone when assigning to kvaddr_t variables.
Suggested by:	jhb
2018-07-10 13:03:06 +00:00
Brooks Davis
7524b4c14b Correct breakage on 32-bit platforms from r335979. 2018-07-06 10:03:33 +00:00
Brooks Davis
f38b68ae8a Make struct xinpcb and friends word-size independent.
Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
idenitifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.

On 64-bit bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662.  This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.

PR:		228301 (exp-run by antoine)
Reviewed by:	jtl, kib, rwatson (various versions)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15386
2018-07-05 13:13:48 +00:00
Matt Macy
0ea9d9376e limit change to fixing controlp handling pending review 2018-06-11 17:10:19 +00:00
Matt Macy
c34bf30069 soreceive_stream: correctly handle edge cases
- non NULL controlp is not an error, returning EINVAL
  would cause X forwarding to fail

- MSG_PEEK and MSG_WAITALL are fairly exceptional, but we still
  want to handle them - punt to soreceive_generic
2018-06-11 16:31:42 +00:00
Jonathan T. Looney
1fbe13cf4b Add a socket destructor callback. This allows kernel providers to set
callbacks to perform additional cleanup actions at the time a socket is
closed.

Michio Honda presented a use for this at BSDCan 2018.
(See https://www.bsdcan.org/2018/schedule/events/965.en.html .)

Submitted by:	Michio Honda <micchie at sfc.wide.ad.jp> (previous version)
Reviewed by:	lstewart (previous version)
Differential Revision:	https://reviews.freebsd.org/D15706
2018-06-08 19:35:24 +00:00
Sean Bruno
1a43cff92a Load balance sockets with new SO_REUSEPORT_LB option.
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations:
As DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).

This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967.  Thanks to rwatson@
for the substantive feedback that is included in this commit.

Submitted by:	Johannes Lundberg <johalun0@gmail.com>
Obtained from:	DragonflyBSD
Relnotes:	Yes
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003
2018-06-06 15:45:57 +00:00
Sean Bruno
7875017ca9 Revert r332894 at the request of the submitter.
Submitted by:	Johannes Lundberg <johalun0_gmail.com>
Sponsored by:	Limelight Networks
2018-04-24 19:55:12 +00:00
Sean Bruno
7b7796eea5 Load balance sockets with new SO_REUSEPORT_LB option
This patch adds a new socket option, SO_REUSEPORT_LB, which allow multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

Limitations
As DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).

Submitted by:	Johannes Lundberg <johanlun0@gmail.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D11003
2018-04-23 19:51:00 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Gleb Smirnoff
584ab65a75 Fix locking in soisconnected().
When a newborn socket moves from incomplete queue to complete
one, we need to obtain the listening socket lock after the child,
which is a wrong order.  The old code did that in potentially
endless loop of mtx_trylock().  The new one does only one attempt
of mtx_trylock(), and in case of failure references listening
socket, unlocks child and locks everything in right order.  In
case if listening socket shuts down during that, just bail out.

Reported & tested by:	Jason Eggleston <jeggleston llnw.com>
Reported & tested by:	Jason Wolfe <jason llnw.com>
2017-09-14 18:05:54 +00:00
Gleb Smirnoff
555b3e2f2c Third take on the r319685 and r320480. Actually allow for call soisconnected()
via soisdisconnected(), and in the earlier unlock earlier to avoid lock
recursion.

This fixes a situation when a socket on accept queue is reset before being
accepted.

Reported by:	Jason Eggleston <jeggleston llnw.com>
2017-08-24 20:49:19 +00:00
Michael Tuexen
27d8bea898 Fix getsockopt() for listening sockets when using SO_SNDBUF, SO_RCVBUF,
SO_SNDLOWAT, SO_RCVLOWAT. Since r31972 it only worked for non-listening
sockets.

Sponsored by:	Netflix, Inc.
2017-07-21 07:44:43 +00:00
Hans Petter Selasky
fe715b8090 After r319722 two fields were left uninitialized when transforming a
socket structure into a listening socket. This resulted in an invalid
instruction fault for all 32-bit platforms.

When INVARIANTS is set the union where the two uninitialized fields
reside gets properly zeroed. This patch ensures the two uninitialized
fields are zeroed when INVARIANTS is undefined.

For 64-bit platforms this issue was not visible because so->sol_upcall
which is uninitialized overlaps with so->so_rcv.sb_state which is
already zero during soalloc();

For 32-bit platforms this issue was visible and resulted in an invalid
instruction fault, because so->sol_upcall overlaps with
so->so_rcv.sb_sel which is always initialized to a valid data pointer
during soalloc().

Verifying the offset locations mentioned above are identical is left
as an exercise to the reader.

PR: 220452
PR: 220358
Reviewed by:	ae (network), gallatin
Differential Revision:	https://reviews.freebsd.org/D11475
Sponsored by:	Mellanox Technologies
2017-07-04 18:23:17 +00:00
Gleb Smirnoff
64290befc1 Provide sbsetopt() that handles socket buffer related socket options.
It distinguishes between data flow sockets and listening sockets, and
in case of the latter doesn't change resource limits, since listening
sockets don't hold any buffers, they only carry values to be inherited
by their children.
2017-06-25 01:41:07 +00:00
Gleb Smirnoff
2b8e036bfc Plug read(2) and write(2) on listening sockets. 2017-06-15 20:11:29 +00:00
Gleb Smirnoff
779f106aa1 Listening sockets improvements.
o Separate fields of struct socket that belong to listening from
  fields that belong to normal dataflow, and unionize them.  This
  shrinks the structure a bit.
  - Take out selinfo's from the socket buffers into the socket. The
    first reason is to support braindamaged scenario when a socket is
    added to kevent(2) and then listen(2) is cast on it. The second
    reason is that there is future plan to make socket buffers pluggable,
    so that for a dataflow socket a socket buffer can be changed, and
    in this case we also want to keep same selinfos through the lifetime
    of a socket.
  - Remove struct struct so_accf. Since now listening stuff no longer
    affects struct socket size, just move its fields into listening part
    of the union.
  - Provide sol_upcall field and enforce that so_upcall_set() may be called
    only on a dataflow socket, which has buffers, and for listening sockets
    provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
  - Add a mutex to socket, to be used instead of socket buffer lock to lock
    fields of struct socket that don't belong to a socket buffer.
  - Allow to acquire two socket locks, but the first one must belong to a
    listening socket.
  - Make soref()/sorele() to use atomic(9).  This allows in some situations
    to do soref() without owning socket lock.  There is place for improvement
    here, it is possible to make sorele() also to lock optionally.
  - Most protocols aren't touched by this change, except UNIX local sockets.
    See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
  listening sockets: provide function solisten_dequeue(), and use it in
  the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
  infiniband, rpc.

o UNIX local sockets.
  - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
    local sockets.  Most races exist around spawning a new socket, when we
    are connecting to a local listening socket.  To cover them, we need to
    hold locks on both PCBs when spawning a third one.  This means holding
    them across sonewconn().  This creates a LOR between pcb locks and
    unp_list_lock.
  - To fix the new LOR, abandon the global unp_list_lock in favor of global
    unp_link_lock.  Indeed, separating these two locks didn't provide us any
    extra parralelism in the UNIX sockets.
  - Now call into uipc_attach() may happen with unp_link_lock hold if, we
    are accepting, or without unp_link_lock in case if we are just creating
    a socket.
  - Another problem in UNIX sockets is that uipc_close() basicly did nothing
    for a listening socket.  The vnode remained opened for connections.  This
    is fixed by removing vnode in uipc_close().  Maybe the right way would be
    to do it for all sockets (not only listening), simply move the vnode
    teardown from uipc_detach() to uipc_close()?

Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D9770
2017-06-08 21:30:34 +00:00
Gleb Smirnoff
b3244df799 Provide typedef for socket upcall function.
While here change so_gen_t type to modern uint64_t.
2017-06-07 01:48:11 +00:00
Gleb Smirnoff
b94f68dc52 Remove a piece of dead code. 2017-06-07 01:21:34 +00:00
Gleb Smirnoff
971af2a311 Rename accept filter getopt/setopt functions, so that they are prefixed
with module name and match other functions in the module.  There is no
functional change.
2017-06-02 17:49:21 +00:00
Patrick Kelsey
1431521236 Remove unnecessary check for NULL mbuf in soreceive_generic().
This check has been redundant since it was introduced in r162554.

Reviewed by:	emaste, glebius
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10322
2017-04-25 19:54:34 +00:00
Maxim Sobolev
63649db042 Restore ability to shutdown DGRAM sockets, still forcing ENOTCONN to be returned
by the shutdown(2) system call. This ability has been lost as part of the svn
revision 285910.

Reviewed by:	ed, rwatson, glebius, hiren
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D10351
2017-04-14 17:23:28 +00:00
Hartmut Brandt
4b481ba0ed Merge filt_soread and filt_solisten and decide what to do when checking
for EVFILT_READ at the point of the check not when the event is registers.
This fixes a problem with asio when accepting a connection.

Reviewed by:	kib@, Scott Mitchell
2017-02-01 13:12:07 +00:00
Hans Petter Selasky
f3e7afe2d7 Implement kernel support for hardware rate limited sockets.
- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API support rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. Upon next ip_output() a new "snd_tag" will be tried allocated.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.

Reviewed by:		wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision:	https://reviews.freebsd.org/D3687
Sponsored by:		Mellanox Technologies
MFC after:		3 months
2017-01-18 13:31:17 +00:00
Maxim Sobolev
339efd75a4 Add a new socket option SO_TS_CLOCK to pick from several different clock
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:

o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).

In addition to this, this option provides unified interface to get bintime
(equivalent of using SO_BINTIME), except it also supported with IPv6 where
SO_BINTIME has never been supported. The long term plan is to depreciate
SO_BINTIME and move everything to using SO_TS_CLOCK.

Idea for this enhancement has been briefly discussed on the Net session
during dev summit in Ottawa last June and the general input was positive.

This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.

There are two regression test cases as part of this commit: one extends unix
domain test code (unix_cmsg) to test new SCM_XXX types and another one
implementis totally new test case which exchanges UDP packets between two
processes using both conventional methods (i.e. calling clock_gettime(2)
before recv(2) and after send(2)), as well as using setsockopt()+recv() in
receive path. The resulting delays are checked for sanity for all supported
clock types.

Reviewed by:    adrian, gnn
Differential Revision:  https://reviews.freebsd.org/D9171
2017-01-16 17:46:38 +00:00
Hiren Panchasara
7d03ff1fe9 Add kevent EVFILT_EMPTY for notification when a client has received all data
i.e. everything outstanding has been acked.

Reviewed by:	bz, gnn (previous version)
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D9150
2017-01-16 08:25:33 +00:00
John Baldwin
14da48cbe4 Set MORETOCOME for AIO write requests on a socket.
Add a MSG_MOREOTOCOME message flag. When this flag is set, sosend*
set PRUS_MOREOTOCOME when invoking the protocol send method. The aio
worker tasks for sending on a socket set this flag when there are
additional write jobs waiting on the socket buffer.

Reviewed by:	adrian
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D8955
2017-01-06 23:41:45 +00:00
Ruslan Bukin
dd7d4f199e Revert r306186 ("Adjust the sopt_val pointer on bigendian systems").
This logic doesn't work with bigger sopt_valsize (e.g. when ipfw
passing 2048 bytes rule).

Reported by:	adrian
Sponsored by:	DARPA, AFRL
2016-11-22 18:31:43 +00:00
Ruslan Bukin
30f3bfe58e Adjust the sopt_val pointer on bigendian systems (e.g. MIPS64EB).
sooptcopyin() checks if size of data provided by user is <= than we can
accept, else it strips down the size. On bigendian platforms we have to
move pointer as well so we copy the actual data.

Reviewed by:	gnn
Sponsored by:	DARPA, AFRL
Sponsored by:	HEIF5
Differential Revision:	https://reviews.freebsd.org/D7980
2016-09-22 12:41:53 +00:00
Ed Maste
69a2875821 Renumber license clauses in sys/kern to avoid skipping #3 2016-09-15 13:16:20 +00:00
Kevin Lo
c3bef61e58 Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead.
Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D7878
2016-09-15 07:41:48 +00:00
Baptiste Daroussin
306e53bce9 Fix typo introduced by me (not the submitter) when fixing typos 2016-05-22 13:10:48 +00:00
Baptiste Daroussin
2fd642c899 Fix typos in the comments
Submitted by:	cipherwraith666@gmail.com (via github)
2016-05-22 13:04:45 +00:00
Pedro F. Giffuni
e3043798aa sys/kern: spelling fixes in comments.
No functional change.
2016-04-29 22:15:33 +00:00
John Baldwin
8722384b22 Introduce a new protocol hook pru_aio_queue.
This allows a protocol to claim individual AIO requests instead of using
the default socket AIO handling.

Sponsored by:	Chelsio Communications
2016-04-29 20:11:09 +00:00
Maxim Konovalov
ab77175063 o "avaliable" -> "available".
PR:		208141
Submitted by:	Tyler Littlefield
2016-03-21 08:03:50 +00:00
John Baldwin
f3215338ef Refactor the AIO subsystem to permit file-type-specific handling and
improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subystem now exports library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.

A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool invoking the
fo_read or fo_write methods as before.

The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.

Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order.  Socket requests will not block indefinitely
permitting timely cancellation of all requests.

Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels.  The VFS_AIO
kernel option and aio.ko module are gone.

Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon.  This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system.  To protect against this, AIO
requests are only permitted for known "safe" files by default.  AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_usafe sysctl to a non-zero value.  The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.

Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default.  aio_mlock() is also enabled by default.

Reviewed by:	cem, jilles
Discussed with:	kib (earlier version)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D5289
2016-03-01 18:12:14 +00:00
Alfred Perlstein
7325dfbb59 Increase max allowed backlog for listen sockets
from short to int.

PR: 203922
Submitted by: White Knight <white_knight@2ch.net>
MFC After: 4 weeks
2016-02-02 05:57:59 +00:00
Ed Schouten
b114aa7959 Make shutdown() return ENOTCONN as required by POSIX, part deux.
Summary:
Back in 2005, maxim@ attempted to fix shutdown() to return ENOTCONN in case the socket was not connected (r150152). This had to be rolled back (r150155), as it broke some of the existing programs that depend on this behavior. I reapplied this change on my system and indeed, syslogd failed to start up. I fixed this back in February (279016) and MFC'ed it to the supported stable branches. Apart from that, things seem to work out all right.

Since at least Linux and Mac OS X do the right thing, I'd like to go ahead and give this another try. To keep old copies of syslogd working, only start returning ENOTCONN for recent binaries.

I took a look at the XNU sources and they seem to test against both SS_ISCONNECTED, SS_ISCONNECTING and SS_ISDISCONNECTING, instead of just SS_ISCONNECTED. That seams reasonable, so let's do the same.

Test Plan:
This issue was uncovered while writing tests for shutdown() in CloudABI:

https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/sys/socket/shutdown_test.c#L26

Reviewers: glebius, rwatson, #manpages, gnn, #network

Reviewed By: gnn, #network

Subscribers: bms, mjg, imp

Differential Revision: https://reviews.freebsd.org/D3039
2015-07-27 13:17:57 +00:00
Xin LI
1a7c14aec7 Fix a typo in comment.
Submitted by:	Yanhui Shen via twitter
MFC after:	3 days
2015-07-24 22:13:39 +00:00
Conrad Meyer
0c40f3532d Fix cleanup race between unp_dispose and unp_gc
unp_dispose and unp_gc could race to teardown the same mbuf chains, which
can lead to dereferencing freed filedesc pointers.

This patch adds an IGNORE_RIGHTS flag on unpcbs marking the unpcb's RIGHTS
as invalid/freed. The flag is protected by UNP_LIST_LOCK.

To serialize against unp_gc, unp_dispose needs the socket object. Change the
dom_dispose() KPI to take a socket object instead of an mbuf chain directly.

PR:		194264
Differential Revision:	https://reviews.freebsd.org/D3044
Reviewed by:	mjg (earlier version)
Approved by:	markj (mentor)
Obtained from:	mjg
MFC after:	1 month
Sponsored by:	EMC / Isilon Storage Division
2015-07-14 02:00:50 +00:00
Andrey V. Elsukov
e9b70483d1 soreceive_generic() still has similar KASSERT(), therefore instead of
remove KASSERT(), change it to check mbuf isn't NULL.

Suggested by:	kib
MFC after:	1 week
2015-02-23 15:24:43 +00:00
Andrey V. Elsukov
f21684bc75 In some cases soreceive_dgram() can return no data, but has control
message. This can happen when application is sending packets too big
for the path MTU and recvmsg() will return zero (indicating no data)
but there will be a cmsghdr with cmsg_type set to IPV6_PATHMTU.
Remove KASSERT() which does NULL pointer dereference in such case.
Also call m_freem() only when m isn't NULL.

PR:		197882
MFC after:	1 week
Sponsored by:	Yandex LLC
2015-02-23 13:41:35 +00:00
Davide Italiano
a76d4388e1 Don't access sockbuf fields directly, use accessor functions instead.
It is safe to move the call to socantsendmore_locked() after
sbdrop_locked() as long as we hold the sockbuf lock across the two
calls.

CR:	D1805
Reviewed by:	adrian, kmacy, julian, rwatson
2015-02-14 20:00:57 +00:00
Gleb Smirnoff
e834a84026 Revert r274494, r274712, r275955 and provide extra comments explaining
why there could appear a zero-sized mbufs in socket buffers.

A proper fix would be to divorce record socket buffers and stream
socket buffers, and divorce pru_send that accepts normal data from
pru_send that accepts control data.
2014-12-20 22:12:04 +00:00