Commit Graph

90 Commits

Author SHA1 Message Date
Mark Johnston
29bb6c19f0 domainset: Define additional global policies
Add global definitions for first-touch and interleave policies.  The
former may be useful for UMA, which implements a similar policy without
using domainset iterators.

No functional change intended.

Reviewed by:	mav
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29104
2021-04-14 13:03:33 -04:00
Kyle Evans
60c4ec806d jail: allow root to implicitly widen its cpuset to attach
The default behavior for attaching processes to jails is that the jail's
cpuset augments the attaching processes, so that it cannot be used to
escalate a user's ability to take advantage of more CPUs than the
administrator wanted them to.

This is problematic when root needs to manage jails that have disjoint
sets with whatever process is attaching, as this would otherwise result
in a deadlock. Therefore, if we did not have an appropriate common
subset of cpus/domains for our new policy, we now allow the process to
simply take on the jail set *if* it has the privilege to widen its mask
anyways.

With the new logic, root can still usefully cpuset a process that
attaches to a jail with the desire of maintaining the set it was given
pre-attachment while still retaining the ability to manage child jails
without jumping through hoops.

A test has been added to demonstrate the issue; cpuset of a process
down to just the first CPU and attempting to attach to a jail without
access to any of the same CPUs previously resulted in EDEADLK and now
results in taking on the jail's mask for privileged users.

PR:		253724
Reviewed by:	jamie (also discussed with)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D28952
2021-03-01 12:38:31 -06:00
Kyle Evans
54a837c8cc kern: cpuset: allow jails to modify child jails' roots
This partially lifts a restriction imposed by r191639 ("Prevent a superuser
inside a jail from modifying the dedicated root cpuset of that jail") that's
perhaps beneficial after r192895 ("Add hierarchical jails."). Jails still
cannot modify their own cpuset, but they can modify child jails' roots to
further restrict them or widen them back to the modifying jails' own mask.

As a side effect of this, the system root may once again widen the mask of
jails as long as they're still using a subset of the parent jails' mask.
This was previously prevented by the fact that cpuset_getroot of a root set
will return that root, rather than the root's parent -- cpuset_modify uses
cpuset_getroot since it was introduced in r327895, previously it was just
validating against set->cs_parent which allowed the system root to widen
jail masks.

Reviewed by:	jamie
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27352
2020-12-19 03:30:06 +00:00
Kyle Evans
f1b18a668d cpuset_set{affinity,domain}: do not allow empty masks
cpuset_modify() would not currently catch this, because it only checks that
the new mask is a subset of the root set and circumvents the EDEADLK check
in cpuset_testupdate().

This change both directly validates the mask coming in since we can
trivially detect an empty mask, and it updates cpuset_testupdate to catch
stuff like this going forward by always ensuring we don't end up with an
empty mask.

The check_mask argument has been renamed because the 'check' verbiage does
not imply to me that it's actually doing a different operation. We're either
augmenting the existing mask, or we are replacing it entirely.

Reported by:	syzbot+4e3b1009de98d2fabcda@syzkaller.appspotmail.com
Discussed with:	andrew
Reviewed by:	andrew, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27511
2020-12-08 18:47:22 +00:00
Kyle Evans
b2780e8537 kern: cpuset: resolve race between cpuset_lookup/cpuset_rel
The race plays out like so between threads A and B:

1. A ref's cpuset 10
2. B does a lookup of cpuset 10, grabs the cpuset lock and searches
   cpuset_ids
3. A rel's cpuset 10 and observes the last ref, waits on the cpuset lock
   while B is still searching and not yet ref'd
4. B ref's cpuset 10 and drops the cpuset lock
5. A proceeds to free the cpuset out from underneath B

Resolve the race by only releasing the last reference under the cpuset lock.
Thread A now picks up the spinlock and observes that the cpuset has been
revived, returning immediately for B to deal with later.

Reported by:	syzbot+92dff413e201164c796b@syzkaller.appspotmail.com
Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27498
2020-12-08 18:45:47 +00:00
Kyle Evans
9c83dab96c kern: cpuset: plug a unr leak
cpuset_rel_defer() is supposed to be functionally equivalent to
cpuset_rel() but with anything that might sleep deferred until
cpuset_rel_complete -- this setup is used specifically for cpuset_setproc.

Add in the missing unr free to match cpuset_rel. This fixes a leak that
was observed when I wrote a small userland application to try and debug
another issue, which effectively did:

cpuset(&newid);
cpuset(&scratch);

newid gets leaked when scratch is created; it's off the list, so there's
no mechanism for anything else to relinquish it. A more realistic reproducer
would likely be a process that inherits some cpuset that it's the only ref
for, but it creates a new one to modify. Alternatively, administratively
reassigning a process' cpuset that it's the last ref for will have the same
effect.

Discovered through D27498.

MFC after:	1 week
2020-12-08 18:44:06 +00:00
Kyle Evans
e07e3fa3c9 kern: cpuset: drop the lock to allocate domainsets
Restructure the loop a little bit to make it a little more clear how it
really operates: we never allocate any domains at the beginning of the first
iteration, and it will run until we've satisfied the amount we need or we
encounter an error.

The lock is now taken outside of the loop to make stuff inside the loop
easier to evaluate w.r.t. locking.

This fixes it to not try and allocate any domains for the freelist under the
spinlock, which would have happened before if we needed any new domains.

Reported by:	syzbot+6743fa07b9b7528dc561@syzkaller.appspotmail.com
Reviewed by:	markj
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D27371
2020-11-28 01:21:11 +00:00
Kyle Evans
d431dea5ac kern: cpuset: properly rebase when attaching to a jail
The current logic is a fine choice for a system administrator modifying
process cpusets or a process creating a new cpuset(2), but not ideal for
processes attaching to a jail.

Currently, when a process attaches to a jail, it does exactly what any other
process does and loses any mask it might have applied in the process of
doing so because cpuset_setproc() is entirely based around the assumption
that non-anonymous cpusets in the process can be replaced with the new
parent set.

This approach slightly improves the jail attach integration by modifying
cpuset_setproc() callers to indicate if they should rebase their cpuset to
the indicated set or not (i.e. cpuset_setproc_update_set).

If we're rebasing and the process currently has a cpuset assigned that is
not the containing jail's root set, then we will now create a new base set
for it hanging off the jail's root with the existing mask applied instead of
using the jail's root set as the new base set.

Note that the common case will be that the process doesn't have a cpuset
within the jail root, but the system root can freely assign a cpuset from
a jail to a process outside of the jail with no restriction. We assume that
that may have happened or that it could happen due to a race when we drop
the proc lock, so we must recheck both within the loop to gather up
sufficient freed cpusets and after the loop.

To recap, here's how it worked before in all cases:

0     4 <-- jail              0      4 <-- jail / process
|                             |
1                 ->          1
|
3 <-- process

Here's how it works now:

0     4 <-- jail             0       4 <-- jail
|                            |       |
1                 ->         1       5 <-- process
|
3 <-- process

or

0     4 <-- jail             0       4 <-- jail / process
|                            |
1 <-- process     ->         1

More importantly, in both cases, the attaching process still retains the
mask it had prior to attaching or the attach fails with EDEADLK if it's
left with no CPUs to run on or the domain policy is incompatible. The
author of this patch considers this almost a security feature, because a MAC
policy could grant PRIV_JAIL_ATTACH to an unprivileged user that's
restricted to some subset of available CPUs the ability to attach to a jail,
which might lift the user's restrictions if they attach to a jail with a
wider mask.

In most cases, it's anticipated that admins will use this to be able to,
for example, `cpuset -c -l 1 jail -c path=/ command=/long/running/cmd`,
and avoid the need for contortions to spawn a command inside a jail with a
more limited cpuset than the jail.

Reviewed by:	jamie
MFC after:	1 month (maybe)
Differential Revision:	https://reviews.freebsd.org/D27298
2020-11-25 03:14:25 +00:00
Kyle Evans
30b7c6f977 kern: cpuset: rename _cpuset_create() to cpuset_init()
cpuset_init() is better descriptor for what the function actually does. The
name was previously taken by a sysinit that setup cpuset_zero's mask
from all_cpus, it was removed in r331698 before stable/12 branched.

A comment referencing the removed sysinit has now also been removed, since
the setup previously done was moved into cpuset_thread0().

Suggested by:	markj
MFC after:	1 week
2020-11-25 02:12:24 +00:00
Kyle Evans
29d04ea8c3 kern: cpuset: allow cpuset_create() to take an allocated *setp
Currently, it must always allocate a new set to be used for passing to
_cpuset_create, but it doesn't have to. This is purely kern_cpuset.c
internal and it's sparsely used, so just change it to use *setp if it's
not-NULL and modify the two consumers to pass in the address of a NULL
cpuset.

This paves the way for consumers that want the unr allocation without the
possibility of sleeping as long as they've done their due diligence to
ensure that the mask will properly apply atop the supplied parent
(i.e. avoiding the free_unr() in the last failure path).

Reviewed by:	jamie, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D27297
2020-11-25 01:42:32 +00:00
Kyle Evans
dac521ebcf cpuset_setproc: use the appropriate parent for new anonymous sets
As far as I can tell, this has been the case since initially committed in
2008.  cpuset_setproc is the executor of cpuset reassignment; note this
excerpt from the description:

* 1) Set is non-null.  This reparents all anonymous sets to the provided
*    set and replaces all non-anonymous td_cpusets with the provided set.

However, reviewing cpuset_setproc_setthread() for some jail related work
unearthed the error: if tdset was not anonymous, we were replacing it with
`set`. If it was anonymous, then we'd rebase it onto `set` (i.e. copy the
thread's mask over and AND it with `set`) but give the new anonymous set
the original tdset as the parent (i.e. the base of the set we're supposed to
be leaving behind).

The primary visible consequences were that:

1.) cpuset_getid() following such assignment returns the wrong result, the
    setid that we left behind rather than the one we joined.
2.) When a process attached to the jail, the base set of any anonymous
    threads was a set outside of the jail.

This was initially bundled in D27298, but it's a minor fix that's fairly
easy to verify the correctness of.

A test is included in D27307 ("badparent"), which demonstrates the issue
with, effectively:

osetid = cpuset_getid()
newsetid = cpuset()
cpuset_setaffinity(thread)
cpuset_setid(osetid)
cpuset_getid(thread) -> observe that it matches newsetid instead of osetid.

MFC after:	1 week
2020-11-23 02:49:53 +00:00
Mateusz Guzik
1a7bb89629 cpuset: refcount-clean 2020-11-17 00:04:05 +00:00
Mateusz Guzik
6fed89b179 kern: clean up empty lines in .c and .h files 2020-09-01 22:12:32 +00:00
Mark Johnston
69b565d7c0 Allow accesses of the caller's CPU and domain sets in capability mode.
cpuset_(get|set)(affinity|domain)(2) permit a get or set of the calling
thread or process' CPU and domain set in capability mode, but only when
the thread or process ID is specified as -1.  Extend this to cover the
case where the ID actually matches the caller's TID or PID, since some
code, such as our pthread_attr_get_np() implementation, always provides
an explicit ID.

It was not and still is not permitted to access CPU and domain sets for
other threads in the same process when the process is in capability
mode.  This might change in the future.

Submitted by:	Greg V <greg@unrelenting.technology> (original version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25552
2020-07-06 16:34:09 +00:00
Mark Johnston
9eb997cb48 Lift cpuset Capsicum checks into a subroutine.
Otherwise the same checks are duplicated across four different system
call implementations, cpuset_(get|set)(affinity|domain)().  No
functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:33:21 +00:00
Ryan Libby
9825eadf2c bitset: rename confusing macro NAND to ANDNOT
s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too.  The actual
implementation is "and not" (or "but not"), i.e. A but not B.
Fortunately this does appear to be what all existing callers want.

Don't supply a NAND (not (A and B)) operation at this time.

Discussed with:	jeff
Reviewed by:	cem
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D22791
2019-12-13 09:32:16 +00:00
Mark Johnston
45cdd437ae Remove a redundant NULL pointer check in cpuset_modify_domain().
cpuset_getroot() is guaranteed to return a non-NULL pointer.

Reported by:	Mark Millard <marklmi@yahoo.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-09-12 16:47:38 +00:00
Stephen J. Kiernan
d57cd5ccd3 Bump up the low range of cpuset numbers to account for the kernel cpuset.
Reviewed by:	jeff
Obtained from:	Juniper Networks, Inc.
2019-09-05 17:48:39 +00:00
Mark Johnston
63cdd18e40 Restrict the input domain set in cpuset_setdomain(2) to all_domains.
To permit larger values of MAXMEMDOM, which is currently 8 on amd64,
cpuset_setdomain(2) accepts a mask of size 256.  In the kernel, domain
set masks are 64 bits wide, but can only represent a set of MAXMEMDOM
domains due to the use of the ds_order table.

Domain sets passed to cpuset_setdomain(2) are restricted to a subset
of their parent set, which is typically the root set, but before this
happens we modify the input set to exclude empty domains.
domainset_empty_vm() and other code which manipulates domain sets
expect the mask to be a subset of all_domains, so enforce that when
performing validation of cpuset_setdomain(2) parameters.

Reported and tested by:	pho
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21477
2019-09-01 21:38:08 +00:00
Mark Johnston
44e4def73b Remove an extraneous + 1 in _domainset_create().
DOMAINSET_FLS, like our fls(), is 1-indexed.

Reported by:	alc
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2019-08-27 15:42:08 +00:00
Mark Johnston
8e6975047e Fix several logic issues in domainset_empty_vm().
- Don't add 1 to the result of DOMAINSET_FLS.
- Do not modify domainsets containing only empty domains.
- Always flatten a _PREFER policy to _ROUNDROBIN if the preferred
  domain is empty.  Previously we were doing this only when ds_cnt > 1.

These bugs could cause hangs during boot if a VM domain is empty.

Tested by:	hselasky
Reviewed by:	hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21420
2019-08-27 14:06:34 +00:00
Mark Johnston
9978bd996b Add malloc_domainset(9) and _domainset variants to other allocator KPIs.
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain.  The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted.  Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.

This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.

Discussed with:	gallatin, jeff
Reported and tested by:	pho (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17418
2018-10-30 18:26:34 +00:00
Mark Johnston
920239efde Fix some problems that manifest when NUMA domain 0 is empty.
- In uma_prealloc(), we need to check for an empty domain before the
  first allocation attempt, not after.  Fix this by switching
  uma_prealloc() to use a vm_domainset iterator, which addresses the
  secondary issue of using a signed domain identifier in round-robin
  iteration.
- Don't automatically create a page daemon for domain 0.
- In domainset_empty_vm(), recompute ds_cnt and ds_order after
  excluding empty domains; otherwise we may frequently specify an empty
  domain when calling in to the page allocator, wasting CPU time.
  Convert DOMAINSET_PREF() policies for empty domains to round-robin.
- When freeing bootstrap pages, don't count them towards the per-domain
  total page counts for now: some vm_phys segments are created before
  the SRAT is parsed and are thus always identified as being in domain 0
  even when they are not.  Then, when bootstrap pages are freed, they
  are added to a domain that we had previously thought was empty.  Until
  this is corrected, we simply exclude them from the per-domain page
  count.

Reported and tested by:	Rajesh Kumar <rajfbsd@gmail.com>
Reviewed by:	gallatin
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17704
2018-10-30 17:57:40 +00:00
Mark Johnston
b61f314290 Make it possible to disable NUMA support with a tunable.
This provides a chicken switch for anyone negatively impacted by
enabling NUMA in the amd64 GENERIC kernel configuration.  With
NUMA disabled at boot-time, information about the NUMA topology
is not exposed to the rest of the kernel, and all of physical
memory is viewed as coming from a single domain.

This method still has some performance overhead relative to disabling
NUMA support at compile time.

PR:		231460
Reviewed by:	alc, gallatin, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17439
2018-10-22 20:13:51 +00:00
Mark Johnston
662e7fa8d9 Create some global domainsets and refactor NUMA registration.
Pre-defined policies are useful when integrating the domainset(9)
policy machinery into various kernel memory allocators.

The refactoring will make it easier to add NUMA support for other
architectures.

No functional change intended.

Reviewed by:	alc, gallatin, jeff, kib
Tested by:	pho (part of a larger patch)
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17416
2018-10-20 17:36:00 +00:00
Andrew Gallatin
30c5525b3c Allow empty NUMA memory domains to support Threadripper2
The AMD Threadripper 2990WX is basically a slightly crippled Epyc.
Rather than having 4 memory controllers, one per NUMA domain, it has
only 2  memory controllers enabled. This means that only 2 of the
4 NUMA domains can be populated with physical memory, and the
others are empty.

Add support to FreeBSD for empty NUMA domains by:

- creating empty memory domains when parsing the SRAT table,
    rather than failing to parse the table
- not running the pageout deamon threads in empty domains
- adding defensive code to UMA to avoid allocating from empty domains
- adding defensive code to cpuset to avoid binding to an empty domain
    Thanks to Jeff for suggesting this strategy.

Reviewed by:	alc, markj
Approved by:	re (gjb@)
Differential Revision:	https://reviews.freebsd.org/D1683
2018-10-01 14:14:21 +00:00
Eric van Gyzen
70d66bcf28 kern_cpuset: fix small leak on error path
The "mask" was leaked on some error paths.

Reported by:	Coverity
CID:		1384683
Sponsored by:	Dell EMC
2018-05-26 14:23:11 +00:00
Matt Macy
a6c7423a92 cpuset: revert and annotate instead 2018-05-19 05:07:31 +00:00
Matt Macy
39eef2f45a cpuset_thread0: avoid unused assignment on non debug build 2018-05-19 04:14:00 +00:00
Jeff Roberson
e5818a53db Implement several enhancements to NUMA policies.
Add a new "interleave" allocation policy which stripes pages across
domains with a stride or width keeping contiguity within a multi-page
region.

Move the kernel to the dedicated numbered cpuset #2 making it possible
to assign kernel threads and memory policy separately from user.  This
also eliminates the need for the complicated interrupt binding code.

Add a sysctl API for viewing and manipulating domainsets.  Refactor some
of the cpuset_t manipulation code using the generic bitset type so that
it can be used for both.  This probably belongs in a dedicated subr file.

Attempt to improve the include situation.

Reviewed by:	kib
Discussed with:	jhb (cpuset parts)
Tested by:	pho (before review feedback)
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D14839
2018-03-29 02:54:50 +00:00
Jeff Roberson
27a3c9d710 Restore r331606 with a bugfix to setup cpuset_domain[] earlier on all
platforms.  Original commit message as follows:

Only use CPUs in the domain the device is attached to for default
assignment.  Device drivers are able to override the default assignment
if they bind directly.  There are severe performance penalties for
handling interrupts on remote CPUs and this should only be done in
very controlled circumstances.

Reviewed by:    jhb, kib
Tested by:      pho
Sponsored by:   Netflix, Dell/EMC Isilon
Differential Revision:  https://reviews.freebsd.org/D14838
2018-03-28 18:47:35 +00:00
Brooks Davis
dd51fec3b9 Copyout a whole int to cpuset_domain's policy pointer.
The previous code only copied 16-bits and corrupted the target int.

Reviewed by:	kib, markj
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14611
2018-03-09 00:50:40 +00:00
Jeff Roberson
3f289c3fcf Implement 'domainset', a cpuset based NUMA policy mechanism. This allows
userspace to control NUMA policy administratively and programmatically.

Implement domainset based iterators in the page layer.

Remove the now legacy numa_* syscalls.

Cleanup some header polution created by having seq.h in proc.h.

Reviewed by:	markj, kib
Discussed with:	alc
Tested by:	pho
Sponsored by:	Netflix, Dell/EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13403
2018-01-12 22:48:23 +00:00
Pedro F. Giffuni
8a36da99de sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
2017-11-27 15:20:12 +00:00
Jung-uk Kim
a1d0659ca9 Fix size to copyout(9) for cpuset_getid(2).
MFC after:	3 days
2017-08-22 20:46:29 +00:00
Allan Jude
f299c47b52 Allow cpuset_{get,set}affinity in capabilities mode
bhyve was recently sandboxed with capsicum, and needs to be able to
control the CPU sets of its vcpu threads

Reviewed by:	emaste, oshogbo, rwatson
MFC after:	2 weeks
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D10170
2017-05-24 00:58:30 +00:00
Conrad Meyer
29dfb631d8 Extend cpuset_get/setaffinity() APIs
Add IRQ placement-only and ithread-only API variants. intr_event_bind
has been extended with sibling methods, as it has many more callsites in
existing code.

Reviewed by:	kib@, adrian@ (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D10586
2017-05-03 18:41:08 +00:00
Gleb Smirnoff
6286dc78d4 Remove unneeded include of vm_phys.h. 2017-04-17 16:51:04 +00:00
Edward Tomasz Napierala
96ee43103d Add kern_cpuset_getaffinity() and kern_cpuset_getaffinity(),
and use it in compats instead of their sys_*() counterparts.

Reviewed by:	kib, jhb, dchagin
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9383
2017-02-05 13:24:54 +00:00
Edward Tomasz Napierala
ea2ebdc19e Add kern_cpuset_getid() and kern_cpuset_setid(), and use them
in compat32 instead of their sub_*() counterparts.

Reviewed by:	jhb@, kib@
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D9382
2017-01-31 15:11:23 +00:00
John Baldwin
62d70a8174 Add more fine-grained kernel options for NUMA support.
VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in
the virtual memory system.  DEVICE_NUMA is used to enable affinity
reporting for devices such as bus_get_domain().

MAXMEMDOM must still be set to a value greater than for any NUMA support
to be effective.  Note that 'cpuset -gd' always works if MAXMEMDOM is
enabled and the system supports NUMA.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D5782
2016-04-09 13:58:04 +00:00
Adrian Chadd
5bbb2169d2 Un-static cpuset_which() - it's useful in other contexts, such as some
CPU set operations in my upcoming NUMA work.

Tested/compiled:

* i386 (run)
* amd64 (run)
* mips (run)
* mips64 (run)
* armv6 (built)

Sponsored by:	Norse Corp, Inc.
2015-06-26 04:14:05 +00:00
Jonathan Anderson
60aa2c85fa Allow sizeof(cpuset_t) to be queried in capability mode.
This allows functions that retrieve and inspect pthread_attr_t objects to
work correctly: querying the cpuset_t size is part of querying CPU
affinity information, which is part of creating a complete pthread_attr_t.

Approved by: rwatson (mentor)
Reviewed by: pjd
Sponsored by: NSERC
2015-05-14 15:14:03 +00:00
John Baldwin
bbf686ed60 Reject attempts to read the cpuset mask of a negative domain ID. 2015-01-08 19:11:14 +00:00
John Baldwin
c0ae66888b Create a cpuset mask for each NUMA domain that is available in the
kernel via the global cpuset_domain[] array. To export these to userland,
add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a
specific domain. Add a -d flag to cpuset(1) that can be used to fetch
the mask for a given domain.

Differential Revision:	https://reviews.freebsd.org/D1232
Submitted by:	jeff (kernel bits)
Reviewed by:	adrian, jeff
2015-01-08 15:53:13 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Adrian Chadd
7f7528fc79 Modify cpuset_setithread() to take a CPU ID as an integer, not a char.
We're going to end up having > 254 CPUs at some point.
2014-09-16 01:21:47 +00:00
Alexander V. Chernikov
c1d9ecf2be Fix error handling in cpuset_setithread() introduced in r267716.
Noted by:	kib
MFC after:	1 week
2014-09-13 13:46:16 +00:00
Alexander V. Chernikov
811985398d Permit changing cpu mask for cpu set 1 in presence of drivers
binding their threads to particular CPU.

Changing ithread cpu mask is now performed by special cpuset_setithread().
It creates additional cpuset root group on first bind invocation.

No objection:	jhb
Tested by:	hiren
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2014-06-22 11:32:23 +00:00
John Baldwin
cd32bd7ad1 Several improvements to rmlock(9). Many of these are based on patches
provided by Isilon.
- Add an rm_assert() supporting various lock assertions similar to other
  locking primitives.  Because rmlocks track readers the assertions are
  always fully accurate unlike rw_assert() and sx_assert().
- Flesh out the lock class methods for rmlocks to support sleeping via
  condvars and rm_sleep() (but only while holding write locks), rmlock
  details in 'show lock' in DDB, and the lc_owner method used by
  dtrace.
- Add an internal destroyed cookie so that API functions can assert
  that an rmlock is not destroyed.
- Make use of rm_assert() to add various assertions to the API (e.g.
  to assert locks are held when an unlock routine is called).
- Give RM_SLEEPABLE locks their own lock class and always use the
  rmlock's own lock_object with WITNESS.
- Use THREAD_NO_SLEEPING() / THREAD_SLEEPING_OK() to disallow sleeping
  while holding a read lock on an rmlock.

Submitted by:	andre
Obtained from:	EMC/Isilon
2013-06-25 18:44:15 +00:00