handler. For roughly twenty years, the page fault handler has used the
same basic strategy: Fetch a fixed number of non-resident pages both ahead
and behind the virtual page that was faulted on. Over the years,
alternative strategies have been implemented for optimizing the handling
of random and sequential access patterns, but the only change to the
default strategy has been to increase the number of pages read ahead to 7
and behind to 8.
The problem with the default page clustering strategy becomes apparent
when you look at how it behaves on the code section of an executable or
shared library. (To simplify the following explanation, I'm going to
ignore the read that is performed to obtain the header and assume that no
pages are resident at the start of execution.) Suppose that we have a
code section consisting of 32 pages. Further, suppose that we access
pages 4, 28, and 16 in that order. Under the default page clustering
strategy, we page fault three times and perform three I/O operations,
because the first and second page faults only read a truncated cluster of
12 pages. In contrast, if we access pages 8, 24, and 16 in that order, we
only fault twice and perform two I/O operations, because the first and
second page faults read a full cluster of 16 pages. In general, truncated
clusters are more common than full clusters.
To address this problem, this revision changes the default page clustering
strategy to align the start of the cluster to a page offset within the vm
object that is a multiple of the cluster size. This results in many fewer
truncated clusters. Returning to our example, if we now access pages 4,
28, and 16 in that order, the cluster that is read to satisfy the page
fault on page 28 will now include page 16. So, the access to page 16 will
no longer page fault and perform an I/O operation.
Since the revised default page clustering strategy is typically reading
more pages at a time, we are likely to read a few more pages that are
never accessed. However, for the various programs that we looked at,
including clang, emacs, firefox, and openjdk, the reduction in the number
of page faults and I/O operations far outweighed the increase in the
number of pages that are never accessed. Moreover, the extra resident
pages allowed for many more superpage mappings. For example, if we look
at the execution of clang during a buildworld, the number of (hard) page
faults on the code section drops by 26%, the number of superpage mappings
increases by about 29,000, but the number of never accessed pages only
increases from 30.38% to 33.66%. Finally, this leads to a small but
measureable reduction in execution time.
In collaboration with: Emily Pettigrew <ejp1@rice.edu>
Differential Revision: https://reviews.freebsd.org/D1500
Reviewed by: jhb, kib
MFC after: 6 weeks
If we aggregated status sending with data move and got error, allow status
to be updated and resent again separately. Without this command may stuck
without status sent at all.
MFC after: 2 weeks
This special case prevented locating vdevs which start with c[0-9] e.g.
gptid/c6cde092-504b-11e4-ba52-c45444453598 hence it was impossible to
online a vdev via its path.
Submitted by: Peter Xu <xzpeter@gmail.com>
MFC after: 2 weeks
Sponsored by: Multiplay
Quoting 19 years bpf.4 manual from bpf-1.2a1:
"
(SIOCGIFADDR is obsolete under BSD systems. SIOCGIFCONF should be
used to query link-level addresses.)
"
* SIOCGIFADDR was not imported in NetBSD (bpf.c 1.36) and OpenBSD.
* Last bits (e.g. manpage claiming SIOCGIFADDR exists) was cleaned
from NetBSD via kern/21513 5 years ago,
from OpenBSD via documentation/6352 5 years ago.
== SIG_DFL or SIG_IGN. Sloppy code does not fully initialize struct
sigaction for such cases, and being too demanding in the case of
default handler does not catch anything.
Reported and tested by: Alex Tutubalin <lexa@lexa.ru>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
resume sometimes (but not others). On powerup, other wierd issues show
up (sometimes the card comes up, but with really bogus pci config
space stuff. There may be more, but given my experience of historical
fussiness, stick to what works and make more minimal changes to that.
C++14 support[1], adds more C++1z features[2], and fixes the following
LWG issues[3]:
1450: Contradiction in regex_constants
2003: String exception inconsistency in erase.
2075: Progress guarantees, lock-free property, and scheduling
assumptions
2104: unique_lock move-assignment should not be noexcept
2112: User-defined classes that cannot be derived from
2132: std::function ambiguity
2135: Unclear requirement for exceptions thrown in
condition_variable::wait()
2142: packaged_task::operator() synchronization too broad?
2182: Container::[const_]reference types are misleadingly specified
2186: Incomplete action on async/launch::deferred
2188: Reverse iterator does not fully support targets that overload
operator&
2193: Default constructors for standard library containers are explicit
2205: Problematic postconditions of regex_match and regex_search
2213: Return value of std::regex_replace
2240: Probable misuse of term "function scope" in [thread.condition]
2252: Strong guarantee on vector::push_back() still broken with C++11?
2257: Simplify container requirements with the new algorithms
2258: a.erase(q1, q2) unable to directly return q2
2263: Comparing iterators and allocator pointers with different
const-character
2268: Setting a default argument in the declaration of a member
function assign of std::basic_string
2271: regex_traits::lookup_classname specification unclear
2272: quoted should use char_traits::eq for character comparison
2278: User-defined literals for Standard Library types
2280: begin / end for arrays should be constexpr and noexcept
2285: make_reverse_iterator
2288: Inconsistent requirements for shared mutexes
2291: std::hash is vulnerable to collision DoS attack
2293: Wrong facet used by num_put::do_put
2299: Effects of inaccessible key_compare::is_transparent type are not
clear
2301: Why is std::tie not constexpr?
2304: Complexity of count in unordered associative containers
2306: match_results::reference should be value_type&, not const
value_type&
2308: Clarify container destructor requirements w.r.t. std::array
2313: tuple_size should always derive from integral_constant<size_t, N>
2314: apply() should return decltype(auto) and use decay_t before
tuple_size
2315: weak_ptr should be movable
2316: weak_ptr::lock() should be atomic
2317: The type property queries should be UnaryTypeTraits returning
size_t
2320: select_on_container_copy_construction() takes allocators, not
containers
2322: Associative(initializer_list, stuff) constructors are
underspecified
2323: vector::resize(n, t)'s specification should be simplified
2324: Insert iterator constructors should use addressof()
2329: regex_match()/regex_search() with match_results should forbid
temporary strings
2330: regex("meow", regex::icase) is technically forbidden but should
be permitted
2332: regex_iterator/regex_token_iterator should forbid temporary
regexes
2339: Wording issue in nth_element
2341: Inconsistency between basic_ostream::seekp(pos) and
basic_ostream::seekp(off, dir)
2344: quoted()'s interaction with padding is unclear
2346: integral_constant's member functions should be marked noexcept
2350: min, max, and minmax should be constexpr
2356: Stability of erasure in unordered associative containers
2357: Remaining "Assignable" requirement
2359: How does regex_constants::nosubs affect basic_regex::mark_count()?
2360: reverse_iterator::operator*() is unimplementable
[1] http://libcxx.llvm.org/cxx1y_status.html
[2] http://libcxx.llvm.org/cxx1z_status.html
[3] http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html
Exp-run: antoine
MFC after: 1 month
periodic(8) run, taken from uname(1) '-U' and '-K'
flags.
Reviewed by: allanjude, dvl
Differential Revision: https://reviews.freebsd.org/D1541
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Implement a subset of the multiboot specification in order to boot Xen
and a FreeBSD Dom0 from the FreeBSD bootloader. This multiboot
implementation is tailored to boot Xen and FreeBSD Dom0, and it will
most surely fail to boot any other multiboot compilant kernel.
In order to detect and boot the Xen microkernel, two new file formats
are added to the bootloader, multiboot and multiboot_obj. Multiboot
support must be tested before regular ELF support, since Xen is a
multiboot kernel that also uses ELF. After a multiboot kernel is
detected, all the other loaded kernels/modules are parsed by the
multiboot_obj format.
The layout of the loaded objects in memory is the following; first the
Xen kernel is loaded as a 32bit ELF into memory (Xen will switch to
long mode by itself), after that the FreeBSD kernel is loaded as a RAW
file (Xen will parse and load it using it's internal ELF loader), and
finally the metadata and the modules are loaded using the native
FreeBSD way. After everything is loaded we jump into Xen's entry point
using a small trampoline. The order of the multiboot modules passed to
Xen is the following, the first module is the RAW FreeBSD kernel, and
the second module is the metadata and the FreeBSD modules.
Since Xen will relocate the memory position of the second
multiboot module (the one that contains the metadata and native
FreeBSD modules), we need to stash the original modulep address inside
of the metadata itself in order to recalculate its position once
booted. This also means the metadata must come before the loaded
modules, so after loading the FreeBSD kernel a portion of memory is
reserved in order to place the metadata before booting.
In order to tell the loader to boot Xen and then the FreeBSD kernel the
following has to be added to the /boot/loader.conf file:
xen_cmdline="dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga"
xen_kernel="/boot/xen"
The first argument contains the command line that will be passed to the Xen
kernel, while the second argument is the path to the Xen kernel itself. This
can also be done manually from the loader command line, by for example
typing the following set of commands:
OK unload
OK load /boot/xen dom0_mem=1024M dom0_max_vcpus=2 dom0pvh=1 console=com1,vga
OK load kernel
OK load zfs
OK load if_tap
OK load ...
OK boot
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D517
For the Forth bits:
Submitted by: Julien Grall <julien.grall AT citrix.com>
- Close a migration race where callout_reset() failed to set the
CALLOUT_ACTIVE flag.
- Callout callback functions are now allowed to be protected by
spinlocks.
- Switching the callout CPU number cannot always be done on a
per-callout basis. See the updated timeout(9) manual page for more
information.
- The timeout(9) manual page has been updated to reflect how all the
functions inside the callout API are working. The manual page has
been made function oriented to make it easier to deduce how each of
the functions making up the callout API are working without having
to first read the whole manual page. Group all functions into a
handful of sections which should give a quick top-level overview
when the different functions should be used.
- The CALLOUT_SHAREDLOCK flag and its functionality has been removed
to reduce the complexity in the callout code and to avoid problems
about atomically stopping callouts via callout_stop(). If someone
needs it, it can be re-added. From my quick grep there are no
CALLOUT_SHAREDLOCK clients in the kernel.
- A new callout API function named "callout_drain_async()" has been
added. See the updated timeout(9) manual page for a complete
description.
- Update the callout clients in the "kern/" folder to use the callout
API properly, like cv_timedwait(). Previously there was some custom
sleepqueue code in the callout subsystem, which has been removed,
because we now allow callouts to be protected by spinlocks. This
allows us to tear down the callout like done with regular mutexes,
and a "td_slpmutex" has been added to "struct thread" to atomically
teardown the "td_slpcallout". Further the "TDF_TIMOFAIL" and
"SWT_SLEEPQTIMO" states can now be completely removed. Currently
they are marked as available and will be cleaned up in a follow up
commit.
- Bump the __FreeBSD_version to indicate kernel modules need
recompilation.
- There has been several reports that this patch "seems to squash a
serious bug leading to a callout timeout and panic".
Kernel build testing: all architectures were built
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D1438
Sponsored by: Mellanox Technologies
Reviewed by: jhb, adrian, sbruno and emaste
While in theory this should have been a transparent change (and was for all
other drivers), cpsw(4) never used the proper accessor macros in a few
places but spelt the indirect m_hdr.mh_* out itself. Convert those to
use m_len and m_data and unbreak the driver build.
associating an optional PNP hint table with this module. In the
future, when these are added, these changes will silently ignore the
new type they would otherwise warn about. It will always be safe to
ignore this data. Get this into the builds today for some future
proofing.
MFC After: 3 days
only compile in those options in GENERIC that cannot be loaded as
modules. ufs is still included because many of its options aren't
present in the kernel module. There's some other exceptions documented
in the file. This is part of some work to get more things
automatically loading in the hopes of obsoleting GENERIC one day.
more generally make it easier to extend 'struct mbuf in the future', make
a number of changes to the data structure:
- As we anticipate embedding mbufs headers within variable-size regions of
memory in the future, change the definitions of byte arrays embedded in
mbufs to be of size [0] rather than [MLEN] and [MHLEN]. In fact, the
cxgbe driver already uses 'struct mbuf' on the front of other storage
sizes, but we would like the global mbuf allocator do be able to do this
as well.
- Fold 'struct m_hdr' into 'struct mbuf' itself, eliminating a set of
macros that aliased 'mh_foo' field names to 'm_foo' names such as
'm_next'. These present a particular problem as we would like to add
new mbuf-header fields -- e.g., 'm_size' -- that, if similarly named via
macros, would introduce collisions with many other variable names in the
kernel.
- Rename 'struct m_ext' to 'struct struct_m_ext' so that we can add
compile-time assertions without bumping into the still-extant 'm_ext'
macro.
- Remove the MSIZE compile-time assertion for 'struct mbuf', but add new
assertions for alignment of embedded data arrays (64-bit alignment even
on 32-bit platforms), and for the sizes the mbuf header, packet header,
and m_ext structure.
- Document that these assertions exist in comments in mbuf.h.
This change is not intended to cause (non-trivial) behavioural
differences, but is a precursor to further mbuf-allocator work.
Differential Revision: https://reviews.freebsd.org/D1483
Reviewed by: bz, gnn, np, glebius ("go ahead, I trust you")
Sponsored by: EMC / Isilon Storage Division
sanitizer sources. It is apparently unnecessary, and causes trouble for
people using WITHOUT_IPFILTER.
Reported by: Pawel Biernacki <pawel.biernacki@gmail.com>, Kurt Lidl <lidl@pix.net>
"delist_dev()" function. Make sure the character device structure
doesn't go away until the end of the "destroy_dev()" function due to
concurrently running cleanup code inside "devfs_populate()".
MFC after: 1 week
Reported by: dchagin@
According to ELF ABI, alignment 0 and 1 has the same meaning: the
section has no alignment constraints.
PR: 196715
Sponsored by: The FreeBSD Foundation
This makes sure that file descriptors of opened directories will
actually get these capabilities. Without this change, bindat() and
connectat() don't seem to work for me.
MFC after: 2 weeks
Reviewed by: rwatson, pjd
go back through HASWELL, IVY_BRIDGE, IVY_BRIDGE_XEON and SANDY_BRIDGE
to straighten out all the missing PMCs. We also add a new pmc tool
pmcstudy, this allows one to run the various formulas from
the documents "Using Intel Vtune Amplifier XE on XXX Generation platforms" for
IB/SB and Haswell. The tool also allows one to postulate your own
formulas with any of the various PMC's. At some point I will enahance
this to work with Brendan Gregg's flame-graphs so we can flamegraph
various PMC interactions. Note the manual page also needs some
work (lots of work) but gnn has committed to help me with that ;-)
Reviewed by: gnn
MFC after:1 month
Sponsored by: Netflix Inc.