branch.
This function is used to drain a callout via a callback instead of
blocking the caller until the drain is complete. Refer to the
callout_drain_async() manual page for a detailed description.
Limitation: If a lock is used with the callout, the callout can only
be drained asynchronously one time unless the callout_init_mtx()
function is called again. This limitation is not present in
projects/hps_head and will require more invasive changes to the
timeout code, which was not in the scope of this patch.
Differential Revision: https://reviews.freebsd.org/D3521
Reviewed by: wblock
MFC after: 1 month
Coredump notes depend on being able to invoke dump routines twice; once
in a dry-run mode to get the size of the note, and another to actually
emit the note to the corefile.
When a note helper emits a different length section the second time
around than the length it requested the first time, the kernel produces
a corrupt coredump.
NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' fd table
via vn_fullpath. As vnodes may move around during dump, this is racy.
So:
- Detect badly behaved notes in putnote() and pad underfilled notes.
- Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to
exercise the NT_PROCSTAT_FILES corruption. It simply picks random
lengths to expand or truncate paths to in fo_fill_kinfo_vnode().
- Add a sysctl, kern.coredump_pack_fileinfo, to allow users to
disable kinfo packing for PROCSTAT_FILES notes. This should avoid
both FILES note corruption and truncation, even if filenames change,
at the cost of about 1 kiB in padding bloat per open fd. Document
the new sysctl in core.5.
- Fix note_procstat_files to self-limit in the 2nd pass. Since
sometimes this will result in a short write, pad up to our advertised
size. This addresses note corruption, at the risk of sometimes
truncating the last several fd info entries.
- Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the
zero padding.
With suggestions from: bjk, jhb, kib, wblock
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3548
go asking what debug flags to set for GEOM to make it work. Advice
them to use gpart(8) instead.
Something similar should probably done with disklabel,
but I need to rewrite the disklabel examples first.
Reviewed by: wblock@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3315
only gpiobus configured via FDT is supported. Bus enumeration is
supported. Devices are created for each device found. 1-Wire
temperature controllers are supported, but other drivers could be
written. Temperatures are polled and reported via a sysctl. Errors
are reported via sysctl counters. Mis-wired bus detection is included
for more trouble shooting. See ow(4), owc(4) and ow_temp(4) for
details of what's supported and known issues.
This has been tested on Raspberry Pi-B, Pi2 and Beagle Bone Black
with up to 7 devices.
Differential Revision: https://reviews.freebsd.org/D2956
Relnotes: yes
MFC after: 2 weeks
Reviewed by: loos@ (with many insightful comments)
The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse
users into making poor choices.
It's also a fairly large amount of complex code, so just remove the support
altogether.
Users who have 'scrub fragment crop|drop-ovl' in their pf configuration will be
implicitly converted to 'scrub fragment reassemble'.
Reviewed by: gnn, eri
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D3466
To make it easier to understand how Capsicum interacts with linkat() and
renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}.
This also addresses a shortcoming in Capsicum, where it isn't possible
to disable linking to files stored in a directory. Creating hardlinks
essentially makes it possible to access files with additional rights.
Reviewed by: rwatson, wblock
Differential Revision: https://reviews.freebsd.org/D3411
I/OAT is also referred to as Crystal Beach DMA and is a Platform Storage
Extension (PSE) on some Intel server platforms.
This driver currently supports DMA descriptors only and is part of a
larger effort to upstream an interconnect between multiple systems using
the Non-Transparent Bridge (NTB) PSE.
For now, this driver is only built on AMD64 platforms. It may be ported
to work on i386 later, if that is desired. The hardware is exclusive to
x86.
Further documentation on ioat(4), including API documentation and usage,
can be found in the new manual page.
Bring in a test tool, ioatcontrol(8), in tools/tools/ioat. The test
tool is not hooked up to the build and is not intended for end users.
Submitted by: jimharris, Carl Delsey <carl.r.delsey@intel.com>
Reviewed by: jimharris (reviewed my changes)
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: Intel
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3456
Provide and document the RANDOM_ENABLE_UMA option.
Change RANDOM_FAST to RANDOM_UMA to clarify the harvesting.
Remove RANDOM_DEBUG option, replace with SDT probes. These will be of
use to folks measuring the harvesting effect when deciding whether to
use RANDOM_ENABLE_UMA.
Requested by: scottl and others.
Approved by: so (/dev/random blanket)
Differential Revision: https://reviews.freebsd.org/D3197
CoDel is a parameterless queue discipline that handles variable bandwidth
and RTT.
It can be used as the single queue discipline on an interface or as a sub
discipline of existing queue disciplines such as PRIQ, CBQ, HFSC, FAIRQ.
Differential Revision: https://reviews.freebsd.org/D3272
Reviewd by: rpaulo, gnn (previous version)
Obtained from: pfSense
Sponsored by: Rubicon Communications (Netgate)
This driver allows read the software reset switch state and control the
status LEDs.
The GPIO pins have their direction (input/output) locked down to prevent
possible short circuits.
Note that most people get a reset button that is a hardware reset. The
software reset button is available on boards from Netgate.
Sponsored by: Rubicon Communications (Netgate)
if desired.
Retire randomdev_none.c and introduce random_infra.c for resident
infrastructure. Completely stub out random(4) calls in the "without
DEV_RANDOM" case.
Add RANDOM_LOADABLE option to allow loadable Yarrow/Fortuna/LocallyWritten
algorithm. Add a skeleton "other" algorithm framework for folks
to add their own processing code. NIST, anyone?
Retire the RANDOM_DUMMY option.
Build modules for Yarrow, Fortuna and "other".
Use atomics for the live entropy rate-tracking.
Convert ints to bools for the 'seeded' logic.
Move _write() function from the algorithm-specific areas to randomdev.c
Get rid of reseed() function - it is unused.
Tidy up the opt_*.h includes.
Update documentation for random(4) modules.
Fix test program (reviewers, please leave this).
Differential Revision: https://reviews.freebsd.org/D3354
Reviewed by: wblock,delphij,jmg,bjk
Approved by: so (/dev/random blanket)
- Add
nvlist_{add,get,take,move,exists,free}_{number,bool,string,nvlist,
descriptor} functions.
- Add support for (un)packing arrays.
- Add the nvl_array_next field to the nvlist structure.
If an array is added by the nvlist_{move,add}_nvlist_array function
this field will contains next element in the array.
- Add the nitems field to the nvpair and nvpair_header structure.
This field contains number of elements in the array.
- Add special flag (NV_FLAG_IN_ARRAY) which is set if nvlist is a part of
an array.
- Add special type (NV_TYPE_NVLIST_ARRAY_NEXT).This type is used only
on packing/unpacking.
- Add new API for traversing arrays (nvlist_get_array_next).
- Add the nvlist_get_pararr function which combines the
nvlist_get_array_next and nvlist_get_parent functions. If nvlist is in
the array it will return next element from array. If nvlist is last
element in array or it isn't in array it will return his
container (parent). This function should simplify traveling over nvlist.
- Add tests for new features.
- Add documentation for new functions.
- Add my copyright.
- Regenerate the sys/cddl/compat/opensolaris/sys/nvpair.h file.
PR: 191083
Reviewed by: allanjude (doc)
Approved by: pjd (mentor)
operation as a write barrier. That description has never been correct,
and it has caused confusion. An acquire operation orders writes as well
as reads, and a release operation orders reads as well as writes.
Also, explicitly say that a thread doesn't see its own accesses being
reordered. The reordering of a thread's accesses is only (potentially)
visible to another thread. Thus, memory barriers need only be used to
control the ordering of accesses between threads, not within a thread.
Reviewed by: bde, kib
Discussed with: jhb
MFC after: 1 week
with higher quality registers (presumably in a module that has just been
loaded), do not undo the user's choice by switching to the new timecounter.
Document that behavior, and also the fact that there is no way to unregister
a timecounter (and thus no way to unload a module containing one).
eliminating the need to build a custom kernel to use the CTS signal.
The historical UART_PPS_ON_CTS kernel option is still honored, but now it
can be overridden at runtime using a tunable to configure all uart devices
(hw.uart.pps_mode) or specific devices (dev.uart.#.pps_mode). The per-
device config is both a tunable and a writable sysctl.
This syncs the PPS capabilities of uart(4) with the enhancements recently
recently added to ucom(4) for capturing from USB serial devices.
Relnotes: yes
multiple processors. In particular, clearly state that the operations
are always atomic when they are applied to the default memory type
that is used by the kernel (and applications).
Reviewed by: kib, jhb (an earlier version)
MFC after: 1 week
There are still several bugs, but I've been using it for a while now.
Thanks to all the testers and to Adrian for his help with this
driver.
This driver isn't connected to the build yet, but it will be soon.
There's no MFC planned because the driver isn't very stable yet.
Reviewed by: adrian
Obtained from: https://github.com/rpaulo/iwm
Tested by: adrian, gjb, dumbbell (others that I forgot).
Relnotes: yes
CloudABI has two separate kernel modules: cloudabi and cloudabi64. The
first module contains all the pointer size independent code, whereas
cloudabi64 contains the actual 64-bits specific system calls and the ELF
loader.
Reviewed by: wblock
Obtained from: https://github.com/NuxiNL/freebsd
Differential Revision: https://reviews.freebsd.org/D3258
doubt most people will read to the end... Note the use of sys/cdefs.h
for pre-C11 compilers...
I didn't included a note about being compatibile w/ userland since a
C11 feature should be obviously usable in userland...
Suggested by: imp
to no longer claim they are experimental.
Reviewed by: rwatson@, wblock@
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D2985
The random_get() system call works similar to getentropy()/getrandom()
on OpenBSD/Linux. It fills a buffer with random data.
This change introduces a new function, read_random_uio(), that is used
to implement read() on the random devices. We can call into this
function from within the CloudABI compatibility layer.
Approved by: secteam
Reviewed by: jmg, markm, wblock
Obtained from: https://github.com/NuxiNL/freebsd
Differential Revision: https://reviews.freebsd.org/D3053
- Tweek man page.
- Remove all mention of RANDOM_FORTUNA. If the system owner wants YARROW or DUMMY, they ask for it, otherwise they get FORTUNA.
- Tidy up headers a bit.
- Tidy up declarations a bit.
- Make static in a couple of places where needed.
- Move Yarrow/Fortuna SYSINIT/SYSUNINIT to randomdev.c, moving us towards a single file where the algorithm context is used.
- Get rid of random_*_process_buffer() functions. They were only used in one place each, and are better subsumed into those places.
- Remove *_post_read() functions as they are stubs everywhere.
- Assert against buffer size illegalities.
- Clean up some silly code in the randomdev_read() routine.
- Make the harvesting more consistent.
- Make some requested argument name changes.
- Tidy up and clarify a few comments.
- Make some requested comment changes.
- Make some requested macro changes.
* NOTE: the thing calling itself a 'unit test' is not yet a proper
unit test, but it helps me ensure things work. It may be a proper
unit test at some time in the future, but for now please don't make
any assumptions or hold any expectations.
Differential Revision: https://reviews.freebsd.org/D2025
Approved by: so (/dev/random blanket)
This is based on work done by jeff@ and jhb@, as well as the numa.diff
patch that has been circulating when someone asks for first-touch NUMA
on -10 or -11.
* Introduce a simple set of VM policy and iterator types.
* tie the policy types into the vm_phys path for now, mirroring how
the initial first-touch allocation work was enabled.
* add syscalls to control changing thread and process defaults.
* add a global NUMA VM domain policy.
* implement a simple cascade policy order - if a thread policy exists, use it;
if a process policy exists, use it; use the default policy.
* processes inherit policies from their parent processes, threads inherit
policies from their parent threads.
* add a simple tool (numactl) to query and modify default thread/process
policities.
* add documentation for the new syscalls, for numa and for numactl.
* re-enable first touch NUMA again by default, as now policies can be
set in a variety of methods.
This is only relevant for very specific workloads.
This doesn't pretend to be a final NUMA solution.
The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by
'sysctl vm.default_policy=rr'.
This is only relevant if MAXMEMDOM is set to something other than 1.
Ie, if you're using GENERIC or a modified kernel with non-NUMA, then
this is a glorified no-op for you.
Thank you to Norse Corp for giving me access to rather large
(for FreeBSD!) NUMA machines in order to develop and verify this.
Thank you to Dell for providing me with dual socket sandybridge
and westmere v3 hardware to do NUMA development with.
Thank you to Scott Long at Netflix for providing me with access
to the two-socket, four-domain haswell v3 hardware.
Thank you to Peter Holm for running the stress testing suite
against the NUMA branch during various stages of development!
Tested:
* MIPS (regression testing; non-NUMA)
* i386 (regression testing; non-NUMA GENERIC)
* amd64 (regression testing; non-NUMA GENERIC)
* westmere, 2 socket (thankyou norse!)
* sandy bridge, 2 socket (thankyou dell!)
* ivy bridge, 2 socket (thankyou norse!)
* westmere-EX, 4 socket / 1TB RAM (thankyou norse!)
* haswell, 2 socket (thankyou norse!)
* haswell v3, 2 socket (thankyou dell)
* haswell v3, 2x18 core (thankyou scott long / netflix!)
* Peter Holm ran a stress test suite on this work and found one
issue, but has not been able to verify it (it doesn't look NUMA
related, and he only saw it once over many testing runs.)
* I've tested bhyve instances running in fixed NUMA domains and cpusets;
all seems to work correctly.
Verified:
* intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different
NUMA policies for processes under test.
Review:
This was reviewed through phabricator (https://reviews.freebsd.org/D2559)
as well as privately and via emails to freebsd-arch@. The git history
with specific attributes is available at https://github.com/erikarn/freebsd/
in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy).
This has been reviewed by a number of people (stas, rpaulo, kib, ngie,
wblock) but not achieved a clear consensus. My hope is that with further
exposure and testing more functionality can be implemented and evaluated.
Notes:
* The VM doesn't handle unbalanced domains very well, and if you have an overly
unbalanced memory setup whilst under high memory pressure, VM page allocation
may fail leading to a kernel panic. This was a problem in the past, but it's
much more easily triggered now with these tools.
* This work only controls the path through vm_phys; it doesn't yet strongly/predictably
affect contigmalloc, KVA placement, UMA, etc. So, driver placement of memory
isn't really guaranteed in any way. That's next on my plate.
Sponsored by: Norse Corp, Inc.; Dell
OCF w/o documentation...
Document the new (8+ year old) device_t way of handling things, that
_unregister_all will leave no threads in newsession, the _SYNC flag,
the requirement that a flag be specified...
Other minor changes like breaking up a wall of text into paragraphs...
line statements inside of braces is recognized as an acceptable
style.
http://reviews.freebsd.org/V3
As always, this isn't license for wholesale change, etc.
mean what you think it should... This will be fixed in the future
with a flag rename, but document what the flag really does and make
the _IV_ flags clear what their presents (or lack there of) means...
Reviewed by: gnn, eri (both earlier version)
* GENERAL
- Update copyright.
- Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set
neither to ON, which means we want Fortuna
- If there is no 'device random' in the kernel, there will be NO
random(4) device in the kernel, and the KERN_ARND sysctl will
return nothing. With RANDOM_DUMMY there will be a random(4) that
always blocks.
- Repair kern.arandom (KERN_ARND sysctl). The old version went
through arc4random(9) and was a bit weird.
- Adjust arc4random stirring a bit - the existing code looks a little
suspect.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Redo read_random(9) so as to duplicate random(4)'s read internals.
This makes it a first-class citizen rather than a hack.
- Move stuff out of locked regions when it does not need to be
there.
- Trim RANDOM_DEBUG printfs. Some are excess to requirement, some
behind boot verbose.
- Use SYSINIT to sequence the startup.
- Fix init/deinit sysctl stuff.
- Make relevant sysctls also tunables.
- Add different harvesting "styles" to allow for different requirements
(direct, queue, fast).
- Add harvesting of FFS atime events. This needs to be checked for
weighing down the FS code.
- Add harvesting of slab allocator events. This needs to be checked for
weighing down the allocator code.
- Fix the random(9) manpage.
- Loadable modules are not present for now. These will be re-engineered
when the dust settles.
- Use macros for locks.
- Fix comments.
* src/share/man/...
- Update the man pages.
* src/etc/...
- The startup/shutdown work is done in D2924.
* src/UPDATING
- Add UPDATING announcement.
* src/sys/dev/random/build.sh
- Add copyright.
- Add libz for unit tests.
* src/sys/dev/random/dummy.c
- Remove; no longer needed. Functionality incorporated into randomdev.*.
* live_entropy_sources.c live_entropy_sources.h
- Remove; content moved.
- move content to randomdev.[ch] and optimise.
* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
- Remove; plugability is no longer used. Compile-time algorithm
selection is the way to go.
* src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
- Add early (re)boot-time randomness caching.
* src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
- Remove; no longer needed.
* src/sys/dev/random/uint128.h
- Provide a fake uint128_t; if a real one ever arrived, we can use
that instead. All that is needed here is N=0, N++, N==0, and some
localised trickery is used to manufacture a 128-bit 0ULLL.
* src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
- Improve unit tests; previously the testing human needed clairvoyance;
now the test will do a basic check of compressibility. Clairvoyant
talent is still a good idea.
- This is still a long way off a proper unit test.
* src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
- Improve messy union to just uint128_t.
- Remove unneeded 'static struct fortuna_start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
* src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
- Improve messy union to just uint128_t.
- Remove unneeded 'staic struct start_cache'.
- Tighten up up arithmetic.
- Provide a method to allow eternal junk to be introduced; harden
it against blatant by compress/hashing.
- Assert that locks are held correctly.
- Fix the nasty pre- and post-read overloading by providing explictit
functions to do these tasks.
- Turn into self-sufficient module (no longer requires randomdev_soft.[ch])
- Fix some magic numbers elsewhere used as FAST and SLOW.
Differential Revision: https://reviews.freebsd.org/D2025
Reviewed by: vsevolod,delphij,rwatson,trasz,jmg
Approved by: so (delphij)
adding macros to define class lists.
This change is backwards compatible for all use within C and C++
programs. Only C++ programs will have added support to use the queue
macros within classes. Previously the queue macros could only be used
within structures.
The queue.3 manual page has been updated to describe the new
functionality and some alphabetic sorting has been done while
at it.
Differential Revision: https://reviews.freebsd.org/D2745
PR: 200827 (exp-run)
MFC after: 2 weeks
release barriers, not read and write barriers. They fence all memory
accesses from the respective side, not limited by the kind of
operation.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week