20 Commits

Author SHA1 Message Date
Alan Cox
41bf90bb78 Address two problems with sendfile(..., SF_NOCACHE) and apply one
"optimization".  First, sendfile(..., SF_NOCACHE) frees pages without
checking whether those pages are mapped.  This can leave the system
with mappings to free or repurposed pages.  Second, a page can be
busied between the time of the current busy test and acquiring the
object lock.  Essentially, the test performed before the object lock
is acquired can only be regarded as an optimization to short-circuit
further work on the page.  It cannot, however, be relied upon to prove
that it is safe to free the page.  Third, when sendfile(..., SF_NOCACHE)
was originally implemented, vm_page_deactivate_noreuse() did not yet
exist.  Use vm_page_deactivate_noreuse() instead of vm_page_deactivate(),
because it comes closer to freeing the page.

In collaboration with:	glebius
Discussed with:	gallatin, kib, markj
X-MFC after:	r324448
2017-10-13 16:31:50 +00:00
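
The second problem above, an unlocked test that can only short-circuit work and a locked test that actually decides, can be sketched in userland C. This is a minimal illustration, not the kernel code: `demo_object`, `demo_page`, and `demo_try_free()` are hypothetical stand-ins, and a pthread mutex plays the role of the object lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's vm_object or vm_page. */
struct demo_object {
	pthread_mutex_t lock;		/* plays the role of the object lock */
};

struct demo_page {
	struct demo_object *object;
	atomic_bool busy;		/* page is busied by another thread or I/O */
	bool mapped;			/* page still has mappings */
	bool freed;			/* set once the page has been "freed" */
};

/* Returns true only if the page was freed. */
static bool
demo_try_free(struct demo_page *p)
{
	bool freed = false;

	assert(p != NULL && p->object != NULL);

	/* Unlocked test: an optimization only, it proves nothing. */
	if (atomic_load(&p->busy))
		return (false);

	pthread_mutex_lock(&p->object->lock);
	/*
	 * Authoritative test: the page may have been busied since the
	 * unlocked test, and a mapped page must never be freed.
	 */
	if (!atomic_load(&p->busy) && !p->mapped) {
		p->freed = true;
		freed = true;
	}
	pthread_mutex_unlock(&p->object->lock);
	return (freed);
}
```

When `demo_try_free()` returns false, the caller would fall back to something weaker than freeing, which is where the commit uses vm_page_deactivate_noreuse().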
Sean Bruno
1f9916ed08 Match sendfile() error handling to send().
sendfile() should match the error checking order of send(), which
is currently:

SBS_CANTSENDMORE
so_error
SS_ISCONNECTED

Submitted by:	Jason Eggleston <jason@eggnet.com>
Reviewed by:	glebius
MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12633
2017-10-10 22:21:05 +00:00
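
The checking order listed above can be sketched as a small userland function. Everything here is an assumption made for illustration: the `demo_` struct and flag values are hypothetical, and the errno mapping (EPIPE for a socket that cannot send more, ENOTCONN for an unconnected one, and reporting then clearing the pending error) reflects my understanding of send() rather than anything stated in the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for SBS_CANTSENDMORE and SS_ISCONNECTED. */
#define	DEMO_SBS_CANTSENDMORE	0x1
#define	DEMO_SS_ISCONNECTED	0x2

struct demo_socket {
	int sb_state;	/* send buffer state bits */
	int so_error;	/* pending asynchronous error, 0 if none */
	int so_state;	/* connection state bits */
};

/* Returns 0 if sending may proceed, otherwise an errno value. */
static int
demo_send_precheck(struct demo_socket *so)
{
	int error;

	assert(so != NULL);

	/* 1. SBS_CANTSENDMORE: the send side has been shut down. */
	if (so->sb_state & DEMO_SBS_CANTSENDMORE)
		return (EPIPE);

	/* 2. so_error: report a pending error (e.g. ECONNRESET) once. */
	if (so->so_error != 0) {
		error = so->so_error;
		so->so_error = 0;
		return (error);
	}

	/* 3. SS_ISCONNECTED: only now complain about a missing connection. */
	if ((so->so_state & DEMO_SS_ISCONNECTED) == 0)
		return (ENOTCONN);

	return (0);
}
```

With this order, a connection reset by the peer surfaces as ECONNRESET on the first call and as ENOTCONN only afterwards.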
Sean Bruno
009ad5724d Revert r324405 at the request of the submitter, pending a better solution.
Submitted by:	Jason Eggleston <jason@eggnet.com>
Sponsored by:	Limelight Networks
2017-10-10 00:32:21 +00:00
Gleb Smirnoff
9c82bec42d Improvements to sendfile(2) mbuf free routine.
o Fall back to default m_ext free mech, using function pointer in
  m_ext_free, and remove sf_ext_free() called directly from mbuf code.
  Testing on modern CPUs showed no regression.
o Provide internally used flag EXT_FLAG_SYNC, to mark that I/O uses
  the SF_SYNC flag.  Absence of the flag lets us avoid dereferencing
  ext_arg2, saving a cache line miss.
o Create function sendfile_free_page() that will later be used for
  multi-page mbufs.  For now the compiler will inline it into
  sendfile_free_mext().

In collaboration with:	gallatin
Differential Revision:	https://reviews.freebsd.org/D12615
2017-10-09 21:06:16 +00:00
Sean Bruno
75c8dfb6ae Check so_error early in sendfile() call. Prior to this patch, if a
connection was reset by the remote end, sendfile() would just report
ENOTCONN instead of ECONNRESET.

Submitted by:	Jason Eggleston <jason@eggnet.com>
Reviewed by:	glebius
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D12575
2017-10-07 23:30:57 +00:00
Gleb Smirnoff
d37aa3ccce Use soref() in sendfile(2) instead of fhold() to reference a socket.
The problem is that fdrop() requires syscall context, as it may
sleep in some cases.  The reason fhold() was used in the original
non-blocking sendfile implementation was to avoid taking the global
ACCEPT_LOCK() on every I/O completion.  Now in head sorele() no
longer requires this lock.
2017-09-13 22:11:05 +00:00
Mark Johnston
af0460beda Have sendfile_swapin() use vm_page_grab_pages().
Reviewed by:	alc, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D11942
2017-08-11 16:32:24 +00:00
Alan Cox
6921451dab An invalid page can't be dirty.
Reviewed by:	kib
MFC after:	1 week
2017-08-11 16:27:54 +00:00
Gleb Smirnoff
ef3266d58a Plug uninitialized stack variable leak in sendfile(2).
Reported by:	Ilja Van Sprundel <ivansprundel ioactive.com>
Submitted by:	Domagoj Stolfa <domagoj.stolfa gmail.com>
MFC after:	1 week
Security:	uninitialized stack variable leak
2017-08-09 17:48:38 +00:00
Alan Cox
d712b799b5 The data type returned by vmoff() is too narrow in its range. This could
break the transmission of files longer than 4 GB on 32-bit architectures.

Reviewed by:	glebius, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D10019
2017-06-03 16:19:33 +00:00
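
A minimal sketch of why a too-narrow return type breaks offsets past 4 GB. The two functions below are hypothetical and only mimic a page-aligned offset computation; they are not the actual vmoff() from the tree, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define	DEMO_PAGE_SIZE	4096	/* assumed page size for the illustration */

/* Page-aligned offset of the i-th page, computed in 64 bits. */
static int64_t
demo_off_wide(int i, int64_t off)
{
	assert(i >= 0 && off >= 0);
	return ((off + (int64_t)i * DEMO_PAGE_SIZE) &
	    ~(int64_t)(DEMO_PAGE_SIZE - 1));
}

/*
 * Same computation, but returned through a 32-bit type, as happens when
 * the return type is pointer-sized on a 32-bit architecture.  Bits above
 * 2^32 are silently dropped.
 */
static uint32_t
demo_off_narrow(int i, int64_t off)
{
	return ((uint32_t)demo_off_wide(i, off));
}
```

For an offset of 5 GB the wide version returns 5368709120, while the narrow one wraps around to 1073741824 (1 GB), so the wrong part of the file would be addressed.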
Gleb Smirnoff
9e3c8bd3e2 Make sendfile(2) more robust against file changes. This fixes a possible
crash when the file shrinks.  It also fixes sendfile(2) not sending more
data when the file grows and the request is open-ended or
specifies a size greater than the old file size.

PR:		217789
Reviewed by:	gallatin
MFC after:	10 days
2017-03-24 16:01:19 +00:00
Gleb Smirnoff
bfc8c24c73 Move bogus_page declaration to vm_page.h and initialization to vm_page.c.
Reviewed by:	kib
2017-01-04 22:27:19 +00:00
Gleb Smirnoff
00b5ffde8e Add flag SF_USER_READAHEAD to sendfile(2). When specified, the syscall won't
speculate about readahead and will use exactly the amount of readahead
specified by the user.  E.g. setting SF_FLAGS(0, SF_USER_READAHEAD) guarantees
that no readahead at all is performed.
2016-11-17 21:36:18 +00:00
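
The following sketch shows how a readahead count and a flag could be packed into one argument and how the flag changes which readahead value wins. The `DEMO_` macros and the flag value are assumptions modelled on my recollection of FreeBSD's `<sys/socket.h>` (readahead in the upper 16 bits, flags in the lower 16); consult the real header before relying on the encoding. `speculated` is a hypothetical input standing for whatever the kernel would otherwise choose.

```c
#include <assert.h>

/* Assumed encoding and value; illustrative only, not the real header. */
#define	DEMO_SF_USER_READAHEAD		0x00000008
#define	DEMO_SF_FLAGS(rh, flags)	(((rh) << 16) | (flags))
#define	DEMO_SF_READAHEAD(flags)	((flags) >> 16)

/*
 * Pick the readahead, in pages.  With the flag set, the user's value is
 * used exactly, even when it is 0.  Without it, the speculated value is
 * used.
 */
static int
demo_pick_readahead(int flags, int speculated)
{
	assert(flags >= 0 && speculated >= 0);

	if (flags & DEMO_SF_USER_READAHEAD)
		return (DEMO_SF_READAHEAD(flags));
	return (speculated);
}
```

Under these assumptions, `demo_pick_readahead(DEMO_SF_FLAGS(0, DEMO_SF_USER_READAHEAD), 8)` yields 0, matching the "no readahead at all" case in the commit message.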
Gleb Smirnoff
5dba303d01 Use bogus_page to properly reduce number of I/Os in sendfile(2). The new
sendfile_swapin() loop works this way:

- Find first invalid page in the request.
- Call vm_pager_has_page() to get the count of pages that can be read in a
  single I/O.
- Trim valid pages from the end of the request.
- Cycle through the request and substitute bogus_page for all valid
  pages that are in the middle of the request.
- After the I/O is launched (the pager copies the array of pages into buf(9)),
  it is important to restore the proper page pointers with the help of
  vm_page_lookup().

Count bogus pages used and report them in sendfile stats.
2016-11-17 21:02:55 +00:00
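
The first four steps of the loop described above can be modelled in userland on an array of validity flags. This is a sketch under stated assumptions, not the kernel routine: `demo_plan_io()` and `struct demo_io` are hypothetical, `max_io` stands in for the page count that vm_pager_has_page() would report, and the final step (restoring real page pointers after the I/O is launched) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_io {
	size_t first;	/* index of the first page in the I/O */
	size_t count;	/* pages covered by the I/O, 0 if none is needed */
	size_t bogus;	/* valid pages replaced by the placeholder */
};

/*
 * Plan one I/O over pages [start, npages).  valid[i] says whether page i
 * already holds valid data; use_bogus[i] is set for pages that the I/O
 * should direct into a throwaway placeholder page instead.
 */
static struct demo_io
demo_plan_io(const bool *valid, bool *use_bogus, size_t start,
    size_t npages, size_t max_io)
{
	struct demo_io io = { 0, 0, 0 };
	size_t i, j;

	assert(valid != NULL && use_bogus != NULL);

	/* Find the first invalid page in the request. */
	for (i = start; i < npages && valid[i]; i++)
		;
	if (i == npages)
		return (io);		/* everything is valid, no I/O */

	/* Limit the run to what a single I/O can cover. */
	io.first = i;
	io.count = npages - i;
	if (io.count > max_io)
		io.count = max_io;

	/* Trim valid pages from the end of the run. */
	while (io.count > 1 && valid[io.first + io.count - 1])
		io.count--;

	/* Valid pages left in the middle are read into the placeholder. */
	for (j = io.first + 1; j < io.first + io.count; j++) {
		if (valid[j]) {
			use_bogus[j] = true;
			io.bogus++;
		}
	}
	return (io);
}
```

For the validity pattern 1,0,1,1,0,1,1 this plans a single I/O over pages 1 through 4 with two placeholder substitutions, instead of two separate one-page reads.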
Gleb Smirnoff
a2d8f9d2fc Fix regression from r297400, which truncated headers when socket buffer
space is low, and add a small optimization for the low socket buffer case:

- Do not hack uio_resid, and let m_uiotombuf() properly take care of it. This
  fixes truncation of headers at low buffer.
- If headers ate all the space, jump right to the end of the cycle, to
  avoid doing single page I/O and allocating zero length mbuf.
- Clear hdr_uio only if space is positive, which indicates that all uio
  was copied in.

Reviewed by:	pluknet, jtl, emax, rrs, lstewart, gallatin, scottl
2016-09-22 20:34:44 +00:00
Mariusz Zaborski
85b0f9de11 capsicum: propagate rights on accept(2)
A descriptor returned by accept(2) should inherit capability rights from
the listening socket.

PR:		201052
Reviewed by:	emaste, jonathan
Discussed with:	many
Differential Revision:	https://reviews.freebsd.org/D7724
2016-09-22 09:58:46 +00:00
Ed Maste
69a2875821 Renumber license clauses in sys/kern to avoid skipping
2016-09-15 13:16:20 +00:00
Gleb Smirnoff
9c64cfe56c sendfile(2) allows sending extra data from userspace (headers) before
the file data.  Historically the size of the headers was not checked
against the socket buffer space, so an application could easily overcommit
the socket buffer space.

With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checks that the amount of data written to the socket matches
its space.  When the size of the headers is bigger than the socket space,
the KASSERT fires.  Without INVARIANTS the new sendfile won't panic, but
would report an incorrect number of bytes sent.

o With this change, the headers copyin is moved down into the cycle, after
  the sbspace() check.  The uio size is trimmed by socket space there,
  which fixes the overcommit problem and its consequences.
o The compatibility handling for the FreeBSD 4 sendfile headers API is pushed
  up the stack to the syscall wrappers.  This required copying and pasting
  the code, but in turn allowed removing an extra stack-carried parameter
  from fo_sendfile_t and enclosing the entire compat code in #ifdef.  If
  more fo_sendfile_t functions are added in the future, the relative amount
  of copied code will shrink.

Reviewed by:	emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by:	Vitalij Satanivskij <satan ukr.net>
Sponsored by:	Netflix
2016-03-29 19:57:11 +00:00
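
The trimming of the header size by the available socket space, described in the first bullet above, amounts to a clamp. A minimal sketch, assuming a hypothetical helper `demo_trim_headers()` and a signed `space` value of the kind sbspace() returns:

```c
#include <assert.h>
#include <stddef.h>

/*
 * How many header bytes may be copied in right now.  `space` is the free
 * socket buffer space and may be zero or negative when the buffer is
 * already full or overcommitted.
 */
static size_t
demo_trim_headers(size_t hdr_resid, long space)
{
	if (space <= 0)
		return (0);
	if (hdr_resid > (size_t)space)
		return ((size_t)space);
	return (hdr_resid);
}
```

Headers that do not fit are left for a later pass of the loop, so the byte count reported to the caller stays consistent with what was actually queued.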
Gleb Smirnoff
56a5f52e80 New way to manage reference counting of mbuf external storage.
The m_ext.ext_cnt pointer becomes a union. It can now hold the refcount
value itself. The m_ext.ext_flags flag EXT_FLAG_EMBREF indicates that it does.
The first mbuf to attach a cluster stores the refcount. Further mbufs
that reference the cluster point at the refcount in the first mbuf. The first
mbuf is freed only when the last reference is freed.

The benefit over refcounts stored in separate slabs is that now refcounts
of different, unrelated mbufs do not share a cache line.

For EXT_EXTREF mbufs the zone_ext_refcnt is no longer needed, and m_extadd()
becomes void, making widely used M_EXTADD macro safe.

For EXT_SFBUF mbufs the sf_ext_ref() is removed, which was an optimization
exactly against the cache aliasing problem with regular refcounting.

Discussed with:		rrs, rwatson, gnn, hiren, sbruno, np
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D5396
Sponsored by:		Netflix
2016-03-01 00:17:14 +00:00
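
The embedded refcount scheme can be sketched with a small union. This is a single-threaded illustration under stated assumptions: `struct demo_mbuf` and the `demo_` functions are hypothetical, the boolean `embref` stands in for EXT_FLAG_EMBREF, and the plain increments would need to be atomic operations in real concurrent code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_mbuf {
	bool embref;			/* stand-in for EXT_FLAG_EMBREF */
	union {
		unsigned int count;	/* the refcount itself (first mbuf) */
		unsigned int *cnt;	/* pointer to the first mbuf's count */
	} ref;
};

/* Locate the shared counter, wherever it lives. */
static unsigned int *
demo_refp(struct demo_mbuf *m)
{
	assert(m != NULL);
	return (m->embref ? &m->ref.count : m->ref.cnt);
}

/* First mbuf to attach the cluster: the counter is embedded in it. */
static void
demo_attach_first(struct demo_mbuf *m)
{
	m->embref = true;
	m->ref.count = 1;
}

/* A further mbuf references the same cluster through `from`. */
static void
demo_share(struct demo_mbuf *from, struct demo_mbuf *to)
{
	unsigned int *p = demo_refp(from);

	(*p)++;
	to->embref = false;
	to->ref.cnt = p;
}

/* Drop one reference; true means the last reference is gone. */
static bool
demo_release(struct demo_mbuf *m)
{
	unsigned int *p = demo_refp(m);

	assert(*p > 0);
	return (--(*p) == 0);
}
```

Because later mbufs point into the first one, the first mbuf's memory has to outlive every other reference, which is why it is freed only when the last reference is dropped.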
Gleb Smirnoff
33a2a37b86 - Separate sendfile(2) implementation from uipc_syscalls.c into
separate file.  Claim my copyright.
- Provide more comments, better function and structure names.
- Sort out unneeded includes from resulting two files.

No functional changes.
2016-01-22 02:23:18 +00:00