- Add macros to allow preinitialization of cap_rights_t.
- Convert most commonly used code paths to use preinitialized cap_rights_t.
A 3.6% speedup in fstat was measured with this change.
Reported by: mjg
Reviewed by: oshogbo
Approved by: sbruno
MFC after: 1 month
With r332974, when performing a synchronized access of a page's "queue"
field, one must first check whether the page is logically dequeued. If
so, then the page lock does not prevent the page from being removed
from its page queue. Intoduce vm_page_queue(), which returns the page's
logical queue index. In some cases, direct access to the "queue" field
is still required, but such accesses should be confined to sys/vm.
Reported and tested by: pho
Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15280
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.
Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.
Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.
Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
In many cases the page is not enqueued so the change will have no
effect. However, the change is needed to support an optimization in
the fault handler and in some cases (sendfile, the buffer cache) it
was being emulated by the caller anyway.
Reviewed by: alc
Tested by: pho
MFC after: 2 weeks
X-Differential Revision: https://reviews.freebsd.org/D14625
Previously, wiring a page would cause it to be removed from its page
queue. In the common case, unwiring causes it to be enqueued at the tail
of that page queue. This change modifies vm_page_wire() to not dequeue
the page, thus avoiding the highly contended page queue locks. Instead,
vm_page_unwire() takes care of requeuing the page as a single operation,
and the page daemon dequeues wired pages as they are encountered during
a queue scan to avoid needlessly revisiting them later. For pages in
PQ_ACTIVE we do even better, since a requeue is unnecessary.
The change improves scalability for some common workloads. For instance,
threads wiring pages into the buffer cache no longer need to modify
global page queues, and unwiring is usually done by the bufspace thread,
so concurrency is not as much of an issue. As another example, many
sysctl handlers wire the output buffer to avoid faults on copyout, and
since the buffer is likely to be in PQ_ACTIVE, we now entirely avoid
modifying the page queue in this case.
The change also adds a block comment describing some properties of
struct vm_page's reference counters, and the busy lock.
Reviewed by: jeff
Discussed with: alc, kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D11943
"optimization". First, sendfile(..., SF_NOCACHE) frees pages without
checking whether those pages are mapped. This can leave the system
with mappings to free or repurposed pages. Second, a page can be
busied between the time of the current busy test and acquiring the
object lock. Essentially, the test performed before the object lock
is acquired can only be regarded as an optimization to short-circuit
further work on the page. It cannot, however, be relied upon to prove
that it is safe to free the page. Third, when sendfile(..., SF_NOCACHE)
was originally implemented, vm_page_deactivate_noreuse() did not yet
exist. Use vm_page_deactivate_noreuse() instead of vm_page_deactivate(),
because it comes closer to freeing the page.
In collaboration with: glebius
Discussed with: gallatin, kib, markj
X-MFC after: r324448
Sendfile() should match the error checking order of send() which
is currently:
SBS_CANTSENDMORE
so_error
SS_ISCONNECTED
Submitted by: Jason Eggleston <jason@eggnet.com>
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12633
o Fall back to default m_ext free mech, using function pointer in
m_ext_free, and remove sf_ext_free() called directly from mbuf code.
Testing on modern CPUs showed no regression.
o Provide internally used flag EXT_FLAG_SYNC, to mark that I/O uses
SF_SYNC flag. Lack of the flag allows us not to dereference
ext_arg2, saving from a cache line miss.
o Create function sendfile_free_page() that later will be used, for
multi-page mbufs. For now compiler will inline it into
sendfile_free_mext().
In collaboration with: gallatin
Differential Revision: https://reviews.freebsd.org/D12615
connection was reset by the remote end, sendfile() would just report
ENOTCONN instead of ECONNRESET.
Submitted by: Jason Eggleston <jason@eggnet.com>
Reviewed by: glebius
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12575
The problem is that fdrop() requires syscall context, as it may
enter sleep in some cases. The reason to use it in the original
non-blocking sendfile implementation, was to avoid use of global
ACCEPT_LOCK() on every I/O completion. Now in head sorele() no
longer requires this lock.
crash when the file shrinks. This also fixes sendfile(2) not sending more
data in a case when the file grows, and the request is open-ended or
specifies a size that is greater than old file size.
PR: 217789
Reviewed by: gallatin
MFC after: 10 days
do any speculations about readahead, and use exactly the amount of readahead
specified by user. E.g. setting SF_FLAGS(0, SF_USER_READAHEAD) will guarantee
that no readahead at all will be performed.
sendfile_swapin() loop works this way:
- Find first invalid page in the request.
- Do vm_pager_has_page() and get count of pages, that can be taken in
single I/O.
- Trim valid pages from the end of the request.
- Cycle through the request and substitute to bogus_page all valid
pages that are in the middle of the request.
- After I/O launched (pager copies array of pages into buf(9), it
is important to restore proper page pointers with help vm_page_lookup().
Count bogus pages used and report them in sendfile stats.
buffer and put a small optimization for low socket buffer case:
- Do not hack uio_resid, and let m_uiotombuf() properly take care of it. This
fixes truncation of headers at low buffer.
- If headers ate all the space, jump right to the end of the cycle, to
avoid doing single page I/O and allocating zero length mbuf.
- Clear hdr_uio only if space is positive, which indicates that all uio
was copied in.
Reviewed by: pluknet, jtl, emax, rrs, lstewart, emax, gallatin, scottl
Descriptor returned by accept(2) should inherits capabilities rights from
the listening socket.
PR: 201052
Reviewed by: emaste, jonathan
Discussed with: many
Differential Revision: https://reviews.freebsd.org/D7724
data (headers). Historically the size of the headers was not checked
against the socket buffer space. Application could easily overcommit the
socket buffer space.
With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space. In case when size of headers is bigger that socket space,
KASSERT fires. Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.
o With this change, the headers copyin is moved down into the cycle, after
the sbspace() check. The uio size is trimmed by socket space there,
which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
up the stack to syscall wrappers. This required a copy and paste of the
code, but in turn this allowed to remove extra stack carried parameter
from fo_sendfile_t, and embrace entire compat code into #ifdef. If in
future we got more fo_sendfile_t function, the copy and paste level would
even reduce.
Reviewed by: emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by: Vitalij Satanivskij <satan ukr.net>
Sponsored by: Netflix
The m_ext.ext_cnt pointer becomes a union. It can now hold the refcount
value itself. To tell that m_ext.ext_flags flag EXT_FLAG_EMBREF is used.
The first mbuf to attach a cluster stores the refcount. The further mbufs
to reference the cluster point at refcount in the first mbuf. The first
mbuf is freed only when the last reference is freed.
The benefit over refcounts stored in separate slabs is that now refcounts
of different, unrelated mbufs do not share a cache line.
For EXT_EXTREF mbufs the zone_ext_refcnt is no longer needed, and m_extadd()
becomes void, making widely used M_EXTADD macro safe.
For EXT_SFBUF mbufs the sf_ext_ref() is removed, which was an optimization
exactly against the cache aliasing problem with regular refcounting.
Discussed with: rrs, rwatson, gnn, hiren, sbruno, np
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D5396
Sponsored by: Netflix
separate file. Claim my copyright.
- Provide more comments, better function and structure names.
- Sort out unneeded includes from resulting two files.
No functional changes.