freebsd-skq

Author	SHA1	Message	Date
vangyzen	7293d43c89	kern_cpuset: fix small leak on error path The "mask" was leaked on some error paths. Reported by: Coverity CID: 1384683 Sponsored by: Dell EMC	2018-05-26 14:23:11 +00:00
mmacy	5eda5d6711	cpuset: revert and annotate instead	2018-05-19 05:07:31 +00:00
mmacy	a42e239a05	cpuset_thread0: avoid unused assignment on non debug build	2018-05-19 04:14:00 +00:00
jeff	5e244328ad	Implement several enhancements to NUMA policies. Add a new "interleave" allocation policy which stripes pages across domains with a stride or width keeping contiguity within a multi-page region. Move the kernel to the dedicated numbered cpuset #2 making it possible to assign kernel threads and memory policy separately from user. This also eliminates the need for the complicated interrupt binding code. Add a sysctl API for viewing and manipulating domainsets. Refactor some of the cpuset_t manipulation code using the generic bitset type so that it can be used for both. This probably belongs in a dedicated subr file. Attempt to improve the include situation. Reviewed by: kib Discussed with: jhb (cpuset parts) Tested by: pho (before review feedback) Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14839	2018-03-29 02:54:50 +00:00
jeff	bfe01083f9	Restore r331606 with a bugfix to setup cpuset_domain[] earlier on all platforms. Original commit message as follows: Only use CPUs in the domain the device is attached to for default assignment. Device drivers are able to override the default assignment if they bind directly. There are severe performance penalties for handling interrupts on remote CPUs and this should only be done in very controlled circumstances. Reviewed by: jhb, kib Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14838	2018-03-28 18:47:35 +00:00
brooks	efdbf71b92	Copyout a whole int to cpuset_domain's policy pointer. The previous code only copied 16-bits and corrupted the target int. Reviewed by: kib, markj Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14611	2018-03-09 00:50:40 +00:00
jeff	94c7af8ca2	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Clean up some header pollution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403	2018-01-12 22:48:23 +00:00
pfg	cc22a86800	sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-27 15:20:12 +00:00
jkim	cc8928bcd3	Fix size to copyout(9) for cpuset_getid(2). MFC after: 3 days	2017-08-22 20:46:29 +00:00
allanjude	56f722576f	Allow cpuset_{get,set}affinity in capabilities mode bhyve was recently sandboxed with capsicum, and needs to be able to control the CPU sets of its vcpu threads Reviewed by: emaste, oshogbo, rwatson MFC after: 2 weeks Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D10170	2017-05-24 00:58:30 +00:00
cem	8638763f7e	Extend cpuset_get/setaffinity() APIs Add IRQ placement-only and ithread-only API variants. intr_event_bind has been extended with sibling methods, as it has many more callsites in existing code. Reviewed by: kib@, adrian@ (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10586	2017-05-03 18:41:08 +00:00
glebius	fef9613ffa	Remove unneeded include of vm_phys.h.	2017-04-17 16:51:04 +00:00
trasz	2751ab501f	Add kern_cpuset_getaffinity() and kern_cpuset_setaffinity(), and use them in compats instead of their sys_*() counterparts. Reviewed by: kib, jhb, dchagin MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9383	2017-02-05 13:24:54 +00:00
trasz	49f712a8ef	Add kern_cpuset_getid() and kern_cpuset_setid(), and use them in compat32 instead of their sys_*() counterparts. Reviewed by: jhb@, kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9382	2017-01-31 15:11:23 +00:00
jhb	6beb82443a	Add more fine-grained kernel options for NUMA support. VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in the virtual memory system. DEVICE_NUMA is used to enable affinity reporting for devices such as bus_get_domain(). MAXMEMDOM must still be set to a value greater than for any NUMA support to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is enabled and the system supports NUMA. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5782	2016-04-09 13:58:04 +00:00
adrian	bd7f5ebf0a	Un-static cpuset_which() - it's useful in other contexts, such as some CPU set operations in my upcoming NUMA work. Tested/compiled: * i386 (run) * amd64 (run) * mips (run) * mips64 (run) * armv6 (built) Sponsored by: Norse Corp, Inc.	2015-06-26 04:14:05 +00:00
jonathan	1288a9c619	Allow sizeof(cpuset_t) to be queried in capability mode. This allows functions that retrieve and inspect pthread_attr_t objects to work correctly: querying the cpuset_t size is part of querying CPU affinity information, which is part of creating a complete pthread_attr_t. Approved by: rwatson (mentor) Reviewed by: pjd Sponsored by: NSERC	2015-05-14 15:14:03 +00:00
jhb	8189659be8	Reject attempts to read the cpuset mask of a negative domain ID.	2015-01-08 19:11:14 +00:00
jhb	06e75f0dba	Create a cpuset mask for each NUMA domain that is available in the kernel via the global cpuset_domain[] array. To export these to userland, add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a specific domain. Add a -d flag to cpuset(1) that can be used to fetch the mask for a given domain. Differential Revision: https://reviews.freebsd.org/D1232 Submitted by: jeff (kernel bits) Reviewed by: adrian, jeff	2015-01-08 15:53:13 +00:00
hselasky	49c137f7be	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies	2014-10-21 07:31:21 +00:00
adrian	d3fedbed40	Modify cpuset_setithread() to take a CPU ID as an integer, not a char. We're going to end up having > 254 CPUs at some point.	2014-09-16 01:21:47 +00:00
melifaro	0a46d9d7d5	Fix error handling in cpuset_setithread() introduced in r267716. Noted by: kib MFC after: 1 week	2014-09-13 13:46:16 +00:00
melifaro	3f08add0c8	Permit changing cpu mask for cpu set 1 in presence of drivers binding their threads to particular CPU. Changing ithread cpu mask is now performed by special cpuset_setithread(). It creates additional cpuset root group on first bind invocation. No objection: jhb Tested by: hiren MFC after: 2 weeks Sponsored by: Yandex LLC	2014-06-22 11:32:23 +00:00
jhb	0807c44cdd	Several improvements to rmlock(9). Many of these are based on patches provided by Isilon. - Add an rm_assert() supporting various lock assertions similar to other locking primitives. Because rmlocks track readers the assertions are always fully accurate unlike rw_assert() and sx_assert(). - Flesh out the lock class methods for rmlocks to support sleeping via condvars and rm_sleep() (but only while holding write locks), rmlock details in 'show lock' in DDB, and the lc_owner method used by dtrace. - Add an internal destroyed cookie so that API functions can assert that an rmlock is not destroyed. - Make use of rm_assert() to add various assertions to the API (e.g. to assert locks are held when an unlock routine is called). - Give RM_SLEEPABLE locks their own lock class and always use the rmlock's own lock_object with WITNESS. - Use THREAD_NO_SLEEPING() / THREAD_SLEEPING_OK() to disallow sleeping while holding a read lock on an rmlock. Submitted by: andre Obtained from: EMC/Isilon	2013-06-25 18:44:15 +00:00
jeff	7ee88fb112	- Add a BIT_FFS() macro and use it to replace cpusetffs_obj() Discussed with: attilio Sponsored by: EMC / Isilon Storage Division	2013-06-13 20:46:03 +00:00
jhb	190d5ac85b	Do not compare the existing mask of a cpuset with a new mask when changing the mask of a cpuset. Also, change the cpuset's mask before updating the masks of all children. Previously changing a cpuset's mask first required setting the mask to a super-set of both the old and new masks and then changing it a second time to the new mask.	2013-06-06 14:43:19 +00:00
attilio	ec1be6b9d0	Post r222812 KTR_CPUMASK started being initialized only as a tunable handler and no longer statically. Unfortunately, it seems that this is not ideal for new platform bringup and boot low level development (which needs ktr_cpumask to be effective before tunables can be set up). Because of this, add a way to statically initialize cpusets, by passing a list of initializers, divided by commas. Also, provide a way to enforce an all-set mask, for the above mentioned initializers. This imposes some differences on how KTR_CPUMASK is set up now as a kernel option, and in particular this makes the words specifications backward wrt. what is currently in -CURRENT. In order to avoid mismatches between the KTR_CPUMASK definition and other ways to set up the mask (tunable, sysctl) and to print it, change the order in which cpusetobj_print() and cpusetobj_scan() acquire the words belonging to the set. Please take a look at sys/conf/NOTES in order to understand how the new format is supposed to work. Also, ktr manpages will be updated shortly by gjb, who volunteered for this. This patch won't be merged because it changes a POLA (at least from the theoretical standpoint) and this is however a patch that proves to be effective only in development environments. Requested by: rpaulo Reviewed by: jeff, rpaulo	2012-08-30 21:22:47 +00:00
kevlo	07781bc7da	Add a missing curly bracket	2011-12-05 10:34:52 +00:00
kmacy	99851f359e	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)	2011-09-16 13:58:51 +00:00
attilio	a924571ff7	Fix KTR_CPUMASK in order to accept a string representing a cpuset_t. This introduces all the underlying support for making this possible (via the function cpusetobj_strscan()) and keeps ktr_cpumask exported. sparc64 implements its own assembly primitives for tracing events and needs to properly check it. Anyway the sparc64 logic is not implemented yet due to lack of knowledge (by me) and time (by marius), but it is just a matter of using ktr_cpumask when possible. Tested and fixed by: pluknet Reviewed by: marius	2011-05-31 20:48:58 +00:00
attilio	066c7ac96c	Revert a change that crept in during MFC.	2011-05-31 20:23:33 +00:00
attilio	b1bf71d3c5	MFC	2011-05-31 14:18:10 +00:00
attilio	66305282ac	Revert a patch that involuntarily sneaked in while I was MFCing.	2011-05-23 23:50:21 +00:00
attilio	6d7371f950	MFC	2011-05-23 01:17:30 +00:00
attilio	a8b367d89d	Merge r221912 from largeSMP project branch: Fix a long-standing bug in cpuset_thread0() where only the first part of cs_mask is set full. Submitted by: anonymous MFC after: 1 week	2011-05-22 21:35:03 +00:00
attilio	08bcb681d2	Make cpusetobj_strprint() build the string so that the least significant cpuset_t word is printed at the rightmost part of the string (farthest from its beginning). This follows the natural bit representation within the words.	2011-05-22 20:29:47 +00:00
attilio	c5a5c48e70	Fix a longstanding bug where only the first part of the cpumask was correctly set full. Submitted by: anonymous	2011-05-14 19:36:12 +00:00
attilio	9309cc63ed	Simplify the code here. Submitted by: jhb	2011-05-14 18:22:08 +00:00
attilio	fe4de567b5	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t, on the other hand, is implemented as an array of longs, and easily extensible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures different from amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - size of cpuset_t objects differs between kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to example in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno	2011-05-05 14:39:14 +00:00
jhb	269e1daa8f	When constructing a new cpuset, apply the parent cpuset's mask to the new set's mask rather than the root mask. This was causing the root mask to be modified incorrectly. Reviewed by: jeff MFC after: 1 week	2011-03-08 14:18:21 +00:00
davidxu	4c899bcdf5	Use integer for size of cpuset, as it won't be bigger than INT_MAX, This is requested by bge. Also move the sysctl into file kern_cpuset.c, because it should always be there, it is independent of thread scheduler.	2010-11-01 00:42:25 +00:00
davidxu	f8f25f57e2	- Revert r214409. - Use long word to figure out sizeof kernel cpuset, hope it works.	2010-10-27 09:29:03 +00:00
davidxu	29be5dcd22	If input parameter cpusetsize is zero, give userland size of cpuset mask kernel is using.	2010-10-27 02:32:54 +00:00
davidxu	bc55e49455	Use function tdfind() to find a thread.	2010-10-25 13:13:16 +00:00
jhb	1e218dfa91	Another nit that both I and ispell missed. Submitted by: Ben Kaduk minimarmot of gmail	2009-10-26 18:32:06 +00:00
jhb	81dc521c47	Fix some spelling nits.	2009-10-26 17:42:03 +00:00
jamie	4405625484	Remove unnecessary/redundant includes. Approved by: bz (mentor)	2009-06-23 14:39:21 +00:00
jamie	a013e0afcb	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)	2009-05-27 14:11:23 +00:00
bz	1507f5bd4d	Prevent a superuser inside a jail from modifying the dedicated root cpuset of that jail. Processes inside the jail will still be able to change child sets. A superuser outside of a jail will still be able to change the jail cpuset and thus limit the number of cpus available to the jail. Problem reported by: 000.fbsd@quip.cz (Miroslav Lachman) PR: kern/134050 Reviewed by: jeff MFC after: 3 weeks X-MFC: backout r191596	2009-04-28 21:00:50 +00:00
bz	cb075fbf99	Correct a comment: the function name given had never existed in any (relevant) version of this file or any of my patches. MFC after: 1 month	2009-04-22 20:49:54 +00:00

64 Commits