This is based on work done by jeff@ and jhb@, as well as the numa.diff
patch that has been circulating when someone asks for first-touch NUMA
on -10 or -11.
* Introduce a simple set of VM policy and iterator types.
* tie the policy types into the vm_phys path for now, mirroring how
the initial first-touch allocation work was enabled.
* add syscalls to control changing thread and process defaults.
* add a global NUMA VM domain policy.
* implement a simple cascade policy order - if a thread policy exists, use it;
if a process policy exists, use it; use the default policy.
* processes inherit policies from their parent processes, threads inherit
policies from their parent threads.
* add a simple tool (numactl) to query and modify default thread/process
policities.
* add documentation for the new syscalls, for numa and for numactl.
* re-enable first touch NUMA again by default, as now policies can be
set in a variety of methods.
This is only relevant for very specific workloads.
This doesn't pretend to be a final NUMA solution.
The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by
'sysctl vm.default_policy=rr'.
This is only relevant if MAXMEMDOM is set to something other than 1.
Ie, if you're using GENERIC or a modified kernel with non-NUMA, then
this is a glorified no-op for you.
Thank you to Norse Corp for giving me access to rather large
(for FreeBSD!) NUMA machines in order to develop and verify this.
Thank you to Dell for providing me with dual socket sandybridge
and westmere v3 hardware to do NUMA development with.
Thank you to Scott Long at Netflix for providing me with access
to the two-socket, four-domain haswell v3 hardware.
Thank you to Peter Holm for running the stress testing suite
against the NUMA branch during various stages of development!
Tested:
* MIPS (regression testing; non-NUMA)
* i386 (regression testing; non-NUMA GENERIC)
* amd64 (regression testing; non-NUMA GENERIC)
* westmere, 2 socket (thankyou norse!)
* sandy bridge, 2 socket (thankyou dell!)
* ivy bridge, 2 socket (thankyou norse!)
* westmere-EX, 4 socket / 1TB RAM (thankyou norse!)
* haswell, 2 socket (thankyou norse!)
* haswell v3, 2 socket (thankyou dell)
* haswell v3, 2x18 core (thankyou scott long / netflix!)
* Peter Holm ran a stress test suite on this work and found one
issue, but has not been able to verify it (it doesn't look NUMA
related, and he only saw it once over many testing runs.)
* I've tested bhyve instances running in fixed NUMA domains and cpusets;
all seems to work correctly.
Verified:
* intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different
NUMA policies for processes under test.
Review:
This was reviewed through phabricator (https://reviews.freebsd.org/D2559)
as well as privately and via emails to freebsd-arch@. The git history
with specific attributes is available at https://github.com/erikarn/freebsd/
in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy).
This has been reviewed by a number of people (stas, rpaulo, kib, ngie,
wblock) but not achieved a clear consensus. My hope is that with further
exposure and testing more functionality can be implemented and evaluated.
Notes:
* The VM doesn't handle unbalanced domains very well, and if you have an overly
unbalanced memory setup whilst under high memory pressure, VM page allocation
may fail leading to a kernel panic. This was a problem in the past, but it's
much more easily triggered now with these tools.
* This work only controls the path through vm_phys; it doesn't yet strongly/predictably
affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory
isn't really guaranteed in any way. That's next on my plate.
Sponsored by: Norse Corp, Inc.; Dell
really need it can find it in the devel/fmake port or pkg install fmake.
Note: This commit is orthogonal to the question 'can we fmake buildworld'.
Differential Revision: https://reviews.freebsd.org/D2840
This change among other things improve search capabilities over the manpages
allowing fine grain query.
A new build option WITHOUT_MANDOCDB has been added to keep the ancient version
of the database and the tools. The plan is to entirely remove this option before
11.0-RELEASE.
Differential Revision: https://reviews.freebsd.org/D2603
ELF toolchain readelf lacked some functionality at the time other tools
(like size, strip, nm, etc.) were switched over to the ELF toolchain
versions. That has been addressed as of the last update, so we can add
it to the list.
PR: 198950 [exp-run]
Reviewed by: bapt, imp, rpaulo
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2156
- Compatiblity with existing manpages has been improved
- Now support ".so" directive with compressed manpages (which fixes a regression
we have since we have new man(1))
Set WITH_ELFTOOLCHAIN_TOOLS in src.conf to use the elftoolchain version
of the following tools:
* addr2line
* elfcopy (strip / mcs)
* nm
* size
* strings
Reviewed by: bapt (earlier version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D1224
mandoc(1) does not provide an equivalent of the GNU groff's soelim(1) as an
external binary. It does provide the funcitonnality but internally.
Lots if manpages in ports uses ".so" directives to include the content of
another manpage, which works properly if the manpages are not compressed.
With compressed manpages it will fail. So we need to preprocess those manpages
with soelim(1) before compressing them.
soeliminate(1) add the minimum functionnality from soelim(1) required for that
task, in order to still be able to prepare properly those manpages in case we
ship the base system only with mandoc as a manpage renderer.
soeliminate(1) accept all the arguments from soelim(1) for compatibility but
only '-I dir' is really functionnal.
Name it soeliminate and not soelim, so groff from base or ports can still call
soelim(1) for its internal use and avoid potential incompatibilities
MFC after: 1 month
shortly thereafter via r274124 until I could get the right recipe
down w/respect to SUBDIR_DEPEND.
Thanks to: ngie, ian
Reviewed by: ian
MFC after: 21 days
X-MFC-to: stable/10 stable/9
X-MFC-with: 274116 274120 274121 274123 274144 274146
it fully passes the GNU timeout regression tests, it is written in a mostly
portable way (only signal parsing is relying on non portable structures)
Phabric: D377
The _SUPPORT knobs have a consistent meaning which differs from the
behaviour controlled by this knob. As the knob is opt-out and has not
appeared in a release the impact should be low.
Suggested by: imp, wblock
MFC after: 1 week
With the move by the FreeBSD Project away from CVSup as a distribution
mechanism, there is no longer a need to keep this in base.
Approved by: mux (around a year ago), silence on -hackers
X-MFC-after: never
vtfontcvt is useful for end users to convert arbitrary bitmap fonts
for use by vt(4). It can also be used as a build tool, allowing us
to keep the source font data in the src tree rather than uuencoded
binaries.
Reviewed by: ray, wblock (D183)
Sponsored by: The FreeBSD Foundation
In r266650, we made libatf-c and libatf-c++ private libraries so that no
components outside of the source tree could unintendedly depend on them.
This change does the same for the "atf-sh library" by moving the atf-sh
interpreter from its public location in /usr/bin/ to the private location
in /usr/libexec/. Our build system will ensure that our own test programs
use the right binary, but users won't be able to depend on atf-sh by
"mistake".
Committing this now to ride the UPDATING notice added with r267172 today.
install it as fmake. This defaults to no. This should be viewed as the
first step towards evental migration of this historic code to ports
and removal from the tree.
build world, so it is the only make we build or install. fmake is
still in the tree, but disconnected, and upgrades from older systems
that still have bmake has not been removed, but its state has not been
tested (it should work given how minimal the work to upgrade to bmake
is).
all the SUBDIR entries in parallel, instead of serially. Apply this
option to a selected number of Makefiles, which can greatly speed up the
build on multi-core machines, when using make -j.
This can be extended to more Makefiles later on, whenever they are
verified to work correctly with parallel building.
I tested this on a 24-core machine, with make -j48 buildworld (N = 6):
before stddev after stddev
======= ====== ======= ======
real time 1741.1 16.5 959.8 2.7
user time 12468.7 16.4 14393.0 16.8
sys time 1825.0 54.8 2110.6 22.8
(user+sys)/real 8.2 17.1
E.g. the build was approximately 45% faster in real time. On machines
with less cores, or with lower -j settings, the speedup will not be as
impressive. But at least you can now almost max out a machine with
buildworld!
Submitted by: jilles
MFC after: 2 weeks
There is no reason to keep the two knobs separate: if tests are
enabled, the ATF libraries are required; and if tests are disabled,
the ATF libraries are not necessary. Keeping the two just serves
to complicate the build.
Reviewed by: freebsd-testing
Approved by: rpaulo (mentor)
architectures where they are known not to work. For SVN itself, use
the least common denominator and disable them across the board. This
allows svnlite to build and run on all FreeBSD architectures.
Approved by: re (gjb)
exhausted.
- Add a new protect(1) command that can be used to set or revoke protection
from arbitrary processes. Similar to ktrace it can apply a change to all
existing descendants of a process as well as future descendants.
- Add a new procctl(2) system call that provides a generic interface for
control operations on processes (as opposed to the debugger-specific
operations provided by ptrace(2)). procctl(2) uses a combination of
idtype_t and an id to identify the set of processes on which to operate
similar to wait6().
- Add a PROC_SPROTECT control operation to manage the protection status
of a set of processes. MADV_PROTECT still works for backwards
compatability.
- Add a p_flag2 to struct proc (and a corresponding ki_flag2 to kinfo_proc)
the first bit of which is used to track if P_PROTECT should be inherited
by new child processes.
Reviewed by: kib, jilles (earlier version)
Approved by: re (delphij)
MFC after: 1 month