Commit Graph

36 Commits

Author SHA1 Message Date
Kip Macy
8451d0dd78 In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by:	rwatson
Approved by:	re (bz)
2011-09-16 13:58:51 +00:00
Attilio Rao
e370959707 Fix KTR_CPUMASK in order to accept a string representing a cpuset_t.
This introduce all the underlying support for making this possible (via
the function cpusetobj_strscan() and keeps ktr_cpumask exported.  sparc64
implements its own assembly primitives for tracing events and needs to
properly check it.  Anyway the sparc64 logic is not implemented yet due
to lack of knowledge (by me) and time (by marius), but it is just a
matter of using ktr_cpumask when possible.

Tested and fixed by:	pluknet
Reviewed by:		marius
2011-05-31 20:48:58 +00:00
Attilio Rao
d0984adc98 Revert a change that crept in during MFC. 2011-05-31 20:23:33 +00:00
Attilio Rao
5b6ea0b538 MFC 2011-05-31 14:18:10 +00:00
Attilio Rao
217e1c0ebc Revert a patch that unvolountary sneaked in while I was MFCing. 2011-05-23 23:50:21 +00:00
Attilio Rao
a9ff18a210 MFC 2011-05-23 01:17:30 +00:00
Attilio Rao
e3071102d6 Merge r221912 from largeSMP project branch:
Fix a long-standing bug in cpuset_thread0() where only the first part
of cs_mask is set full.

Submitted by:	anonymous
MFC after:	1 week
2011-05-22 21:35:03 +00:00
Attilio Rao
34a1e065bd Make cpusetobj_strprint() prepare the string in order to print the
least significant cpuset_t word at the outmost right part of the string
(more far from the beginning of it).  This follows the natural build of
bits rappresentation in the words.
2011-05-22 20:29:47 +00:00
Attilio Rao
f27aed53d0 Fix a longstanding bug where only the first part of the cpumask was
correctly set full.

Submitted by:	anonymous
2011-05-14 19:36:12 +00:00
Attilio Rao
faa0e911fb Simplify the code here.
Submitted by:	jhb
2011-05-14 18:22:08 +00:00
Attilio Rao
71a19bdc64 Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
  different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
  accessed avoiding migration, because the size of cpuset_t should be
  considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
  primirally done in order to leave some more space in userland to cope
  with KBI extensions). If you need to access kernel cpuset_t from the
  userland please refer to example in this patch on how to do that
  correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by:	pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by:	jeff, jhb, sbruno
2011-05-05 14:39:14 +00:00
John Baldwin
e84c2db137 When constructing a new cpuset, apply the parent cpuset's mask to the new
set's mask rather than the root mask.  This was causing the root mask to
be modified incorrectly.

Reviewed by:	jeff
MFC after:	1 week
2011-03-08 14:18:21 +00:00
David Xu
444528c026 Use integer for size of cpuset, as it won't be bigger than INT_MAX,
This is requested by bge.
Also move the sysctl into file kern_cpuset.c, because it should
always be there, it is independent of thread scheduler.
2010-11-01 00:42:25 +00:00
David Xu
4a5478709b - Revert r214409.
- Use long word to figure out sizeof kernel cpuset, hope it works.
2010-10-27 09:29:03 +00:00
David Xu
1676b42546 If input parameter cpusetsize is zero, give userland size of cpuset mask
kernel is using.
2010-10-27 02:32:54 +00:00
David Xu
42fe684c1a Use function tdfind() to find a thread. 2010-10-25 13:13:16 +00:00
John Baldwin
86855bf549 Another nit that both I and ispell missed.
Submitted by:	Ben Kaduk  minimarmot of gmail
2009-10-26 18:32:06 +00:00
John Baldwin
9390262576 Fix some spelling nits. 2009-10-26 17:42:03 +00:00
Jamie Gritton
399645e1b9 Remove unnecessary/redundant includes.
Approved by:	bz (mentor)
2009-06-23 14:39:21 +00:00
Jamie Gritton
0304c73163 Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails.  Child jails may be restricted more than their parents,
but never less.  Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system.  Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings.  The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by:	bz (mentor)
2009-05-27 14:11:23 +00:00
Bjoern A. Zeeb
6aaa0b3cf1 Prevent a superuser inside a jail from modifying the dedicated
root cpuset of that jail.
Processes inside the jail will still be able to change child sets.
A superuser outside of a jail will still be able to change the jail cpuset
and thus limit the number of cpus available to the jail.

Problem reported by: 000.fbsd@quip.cz (Miroslav Lachman)
PR:		kern/134050
Reviewed by:	jeff
MFC after:	3 weeks
X-MFC:		backout r191596
2009-04-28 21:00:50 +00:00
Bjoern A. Zeeb
47479a8ceb Correct a comment: the function name given had never existed in any
(relevant) version of this file orany of my patches.

MFC after:	1 month
2009-04-22 20:49:54 +00:00
Bjoern A. Zeeb
413628a7e3 MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the afore mentioned and in kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
  and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
  help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
  suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
  on cluster machines as well as all the testers and people
  who provided feedback the last months on freebsd-jail and
  other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by:	(see above)
MFC after:	3 months (this is just so that I get the mail)
X-MFC Before:   7.2-RELEASE if possible
2008-11-29 14:32:14 +00:00
Bjoern A. Zeeb
dea0ed6690 Add a `show cpusets' DDB command to print numbered root and
assigned CPU affinity sets.

Reviewed by:	brooks
2008-07-07 21:32:02 +00:00
Bjoern A. Zeeb
7a8f695a21 Move cpuset_refroot and cpuset_refbase functions up, grouping the
cpuset_ref* functions together. Will make it easier to read and
add code without forward declarations.
No functional changes.
2008-07-07 20:45:55 +00:00
Bjoern A. Zeeb
ba931c0855 Add a new priv 'PRIV_SCHED_CPUSET' to check if manipulating cpusets is
allowed and replace the suser() call. Do not allow it in jails.

Reviewed by:	rwatson
2008-06-29 17:58:16 +00:00
Konstantin Belousov
887aedc64e Take into account possible overflow when multiplying. The casuality is
the malloc call later, panicing kernel due to the oversized allocation.

Reported by:	pho
Reviewed by:	jeff
2008-05-26 10:01:13 +00:00
Jeff Roberson
9b33b154b5 - Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number.  Ignore the irq# for soft intrs.
 - Add support to cpuset for binding hardware interrupts.  This has the
   side effect of binding any ithread associated with the hard interrupt.
   As per restrictions imposed by MD code we can only bind interrupts to
   a single cpu presently.  Interrupts can be 'unbound' by binding them
   to all cpus.

Reviewed by:	jhb
Sponsored by:	Nokia
2008-04-11 03:26:41 +00:00
Jeff Roberson
3bc8c68d9f - Add a Nokia copyright to cpuset to reflect their generous
contribution to this work.
2008-04-04 01:22:04 +00:00
Jeff Roberson
a03ee0000e - Consistently return EDEADLK when presented with a new set that is
incompatible with existing bindings.
 - Try to copyout the setid in cpuset() before migrating the proc to the
   setid in case the user has supplied a bad buffer.
 - Rename cpuset_root() and cpuset_base() to cpuset_ref{root,base} to
   be more descriptive and free cpuset_root to be used as a different
   type of symbol.
 - Make cpuset_root the cpuset_t set of all cpus in the system.  This
   should contain the same bitmask as all_cpus presently.
 - Add a CPU_CMP() macro to compare two sets.
2008-03-30 11:31:14 +00:00
Ruslan Ermilov
7f64829a5e Fixed type of the fourth argument of cpuset_{get,set}affinity(2) to be size_t.
Prodded by:	davidxu
2008-03-25 09:11:53 +00:00
Jeff Roberson
374ae2a393 - Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from
requiring the per-process spinlock to only requiring the process lock.
 - Reflect these changes in the proc.h documentation and consumers throughout
   the kernel.  This is a substantial reduction in locking cost for these
   fields and was made possible by recent changes to threading support.
2008-03-19 06:19:01 +00:00
Jeff Roberson
c6440f72b6 - Add a missing unlock to cpuset_setaffinity(CPU_LEVEL_CPUSET, CPU_WHICH_PID)
Found by:	gallatin
2008-03-06 20:11:24 +00:00
Jeff Roberson
8bd75bdde4 - Don't overwrite the recently allocated 'nset' in cpuset_setthread() by
passing it to cpuset_which().  Pass in 'set' instead.  This argument
   is not used but for convenience cpuset_which() nulls all incoming
   parameters.

Submitted by:	davidxu
2008-03-05 08:08:32 +00:00
Jeff Roberson
73c40187fd - Verify that when a user supplies a mask that is bigger than the kernel
mask none of the upper bits are set.
 - Be more careful about enforcing the boundaries of masks and child sets.
 - Introduce a few more CPU_* macros for implementing these tests.
 - Change the cpusetsize argument to be bytes rather than bits to match
   other apis.

Sponsored by:	Nokia
2008-03-05 01:49:20 +00:00
Jeff Roberson
d7f687fc9b Add cpuset, an api for thread to cpu binding and cpu resource grouping
and assignment.
 - Add a reference to a struct cpuset in each thread that is inherited from
   the thread that created it.
 - Release the reference when the thread is destroyed.
 - Add prototypes for syscalls and macros for manipulating cpusets in
   sys/cpuset.h
 - Add syscalls to create, get, and set new numbered cpusets:
   cpuset(), cpuset_{get,set}id()
 - Add syscalls for getting and setting affinity masks for cpusets or
   individual threads: cpuid_{get,set}affinity()
 - Add types for the 'level' and 'which' parameters for the cpuset.  This
   will permit expansion of the api to cover cpu masks for other objects
   identifiable with an id_t integer.  For example, IRQs and Jails may be
   coming soon.
 - The root set 0 contains all valid cpus.  All thread initially belong to
   cpuset 1.  This permits migrating all threads off of certain cpus to
   reserve them for special applications.

Sponsored by:	Nokia
Discussed with:	arch, rwatson, brooks, davidxu, deischen
Reviewed by:	antoine
2008-03-02 07:39:22 +00:00