Commit Graph

156 Commits

Author SHA1 Message Date
Konstantin Belousov
02c6fc2114 Add a sysctl kern.pid_max, which limits the maximum pid the system is
allowed to allocate, and corresponding tunable with the same
name. Note that existing processes with higher pids are left intact.

MFC after:	1 week
2012-08-15 15:56:21 +00:00
Robert Watson
ff66f6a404 Define two new sysctl node flags: CTLFLAG_CAPRD and CTLFLAG_CAPRW, which
may be jointly referenced via the mask CTLFLAG_CAPRW.  Sysctls with these
flags are available in Capsicum's capability mode; other sysctl nodes are
not.

Flag several useful sysctls as available in capability mode, such as memory
layout sysctls required by the run-time linker and malloc(3).  Also expose
access to randomness and available kernel features.

A few sysctls are enabled to support name->MIB conversion; these may leak
information to capability mode by virtue of providing resolution on names
not flagged for access in capability mode.  This is, generally, not a huge
problem, but might be something to resolve in the future.  Flag these cases
with XXX comments.

Submitted by:	jonathan
Sponsored by:	Google, Inc.
2011-07-17 23:05:24 +00:00
Matthew D Fleming
fbbb13f962 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the kernel changes.
2011-01-12 19:54:19 +00:00
Konstantin Belousov
87d45a0392 When compat32 binary asks for the value of hw.machine_arch, report the
name of 32bit sibling architecture instead of the host one. Do the
same for hw.machine on amd64.

Add a safety belt debug.adaptive_machine_arch sysctl, to turn the
substitution off.

Reviewed by:	jhb, nwhitehorn
MFC after:	2 weeks
2010-07-22 09:13:49 +00:00
Brooks Davis
93833c1db6 Declare the kern.ngroups sysctl to be read-only, but tunable at boot for
better error reporting.

Submitted by:	Matthew Fleming <matthew dot fleming at isilon dot com>
MFC After:	1 month
2010-01-12 18:20:20 +00:00
Brooks Davis
412f9500e2 Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1.  kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1.  Given that the Windows group limit is 1024, this range
should be sufficient for most applications.

MFC after:	1 month
2010-01-12 07:49:34 +00:00
Brooks Davis
5feedc2575 Correct the explination text for the kern.ngroups. It reflects the
number of supplemental groups, not the total number of groups.

MFC after:	3 days
2010-01-09 23:22:31 +00:00
Alan Cox
fe105d45a2 Add a new sysctl for reporting all of the supported page sizes.
Reviewed by:	jhb
MFC after:	3 weeks
2009-09-18 17:04:57 +00:00
Jamie Gritton
c1f192193d Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by:	bz (mentor)
2009-06-13 15:39:12 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Jamie Gritton
0304c73163 Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails.  Child jails may be restricted more than their parents,
but never less.  Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system.  Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings.  The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by:	bz (mentor)
2009-05-27 14:11:23 +00:00
Marko Zec
29b02909eb Introduce a new virtualization container, provisionally named vprocg, to hold
virtualized instances of hostname and domainname, as well as a new top-level
virtualization struct vimage, which holds pointers to struct vnet and struct
vprocg.  Struct vprocg is likely to become replaced in the near future with
a new jail management API import.

As a consequence of this change, change struct ucred to point to a struct
vimage, instead of directly pointing to a vnet.

Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage
branch.

Permit kldload / kldunload operations to be executed only from the default
vimage context.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by:	bz
Approved by:	julian (mentor)
2009-05-08 14:11:06 +00:00
Marko Zec
f6dfe47a14 Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance.  Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables.  As an example, V_ifnet becomes:

    options VIMAGE:          ((struct vnet_net *) vnet_net)->_ifnet
    default build:           vnet_net_0._ifnet
    options VIMAGE_GLOBALS:  ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

    INIT_VNET_NET(ifp->if_vnet); becomes

    struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals.  If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet.  options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod.  SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by:	bz, rwatson
Approved by:	julian (mentor)
2009-04-30 13:36:26 +00:00
Ed Schouten
f3b86a5fd7 Mark most often used sysctl's as MPSAFE.
After running a `make buildkernel', I noticed most of the Giant locks in
sysctl are only caused by a very small amount of sysctl's:

- sysctl.name2oid. This one is locked by SYSCTL_LOCK, just like
  sysctl.oidfmt.

- kern.ident, kern.osrelease, kern.version, etc. These are just constant
  strings.

- kern.arandom, used by the stack protector. It is already protected by
  arc4_mtx.

I also saw the following sysctl's show up. Not as often as the ones
above, but still quite often:

- security.jail.jailed. Also mark security.jail.list as MPSAFE. They
  don't need locking or already use allprison_lock.

- kern.devname, used by devname(3), ttyname(3), etc.

This seems to reduce Giant locking inside sysctl by ~75% in my primitive
test setup.
2009-01-28 19:58:05 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Tom Rhodes
1e018d99f2 Fix a typo in r180291
"NAme of the current YP/NIS domain" -> "Name of the current YP/NIS domain"
2008-08-28 23:52:34 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Robert Watson
4f7d1876d5 Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables.  Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout().  A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after:	3 weeks
2008-07-05 13:10:10 +00:00
Antoine Brodin
370f990d30 Make sysctl_kern_arnd return a random buffer instead of a random long,
as it is expected by userland (stack protector guard setup for example).

PR:		119129
Approved by:	rwatson (mentor)
MFC after:	1 month
2008-02-17 16:44:48 +00:00
John Baldwin
2c17901060 Add 'compat_freebsd[4567]' features corresponding to the kernel options
COMPAT_FREEBSD[4567].

MFC after:	1 week
Requested by:	kris
2008-01-17 22:46:32 +00:00
John Baldwin
0deabe7e53 Actually declare the kern.features sysctl node.
Pointy hat to:	jhb
2007-12-31 22:03:57 +00:00
Konstantin Belousov
f231de478e Implement fetching of the __FreeBSD_version from the ELF ABI-tag note.
The value is read into the p_osrel member of the struct proc. p_osrel
is set to 0 for the binaries without the note.

MFC after:	3 days
2007-12-04 12:28:07 +00:00
Attilio Rao
2feb50bf7d Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00
Warner Losh
cfa7a8beea Simplify the kernel configuration file return code.
Reviewed by: wkoszek
2007-05-28 20:41:10 +00:00
Alexander Kabaev
ee9f46615e Add kern.arnd sysctl. SSP code uses it to initialize the stack guard
magic value.

Submitted by: Jeremie Le Hen <jeremie@le-hen.org>
2007-05-19 04:53:14 +00:00
Jeff Roberson
222d01951f - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating
vmcnts.  This can be used to abstract away pcpu details but also changes
   to use atomics for all counters now.  This means sched lock is no longer
   responsible for protecting counts in the switch routines.

Contributed by:		Attilio Rao <attilio@FreeBSD.org>
2007-05-18 07:10:50 +00:00
Warner Losh
3627f73782 Don't export a kern.conftxt sysctl, except when INCLUDE_CONF_FILE is
defined.  This restores the old behavior, and eliminates the
dependency on the kernconf.tmpl when INCLUDE_CONFIG_FILE isn't
included in the kernel config.  There were many people in the terminal
room that had almost, but not quite, up-to-date config files that this
helps.  I don't know if this is the result of skew among the cvsup
servers, or some other more subtle problem.  However, this fix should
work for any config of recent vintage (I tested with the latest, and
one before the recent changes, and eye-balled the intermediate
versions).

Reviewed by: the terminal room crew
2007-05-17 05:05:12 +00:00
Wojciech A. Koszek
5f9974ae57 Handle !INCLUDE_CONFIG_FILE entirely in the kernel. This should make some
developers happy, since it will let them to use old config(8) with newer
kernels.

Reviewed by:	imp
Approved by:	imp
2007-05-16 16:08:04 +00:00
Wojciech A. Koszek
744b947ef8 Improve INCLUDE_CONFIG_FILE support.
This change will let us to have full configuration of a running kernel
available in sysctl:

	sysctl -b kern.conftxt

The same configuration is also contained within the kernel image. It can be
obtained with:

	config -x <kernelfile>

Current functionality lets you to quickly recover kernel configuration, by
simply redirecting output from commands presented above and starting kernel
build procedure. "include" statements are also honored, which means options
and devices from included files are also included.

Please note that comments from configuration files are not preserved by
default. In order to preserve them, you can use -C flag for config(8). This
will bring configuration file and included files literally; however,
redirection to a file no longer works directly.

This commit was followed by discussion, that took place on freebsd-current@.
For more details, look here:

	http://lists.freebsd.org/pipermail/freebsd-current/2007-March/069994.html
	http://lists.freebsd.org/pipermail/freebsd-current/2007-May/071844.html

Development of this patch took place in Perforce, hierarchy:

	//depot/user/wkoszek/wkoszek_kconftxt/

Support from:	freebsd-current@ (links above)
Reviewed by:	imp@
Approved by:	imp@
2007-05-12 19:38:18 +00:00
Pawel Jakub Dawidek
82068fe7a9 Add kern.hostuuid sysctl, which will be used to keep host's UUID.
Reviewed by:	mlaier, rink, brooks, rwatson
2007-04-09 19:18:09 +00:00
Pawel Jakub Dawidek
4e4aa37e75 mp_ncpus is always (properly) initialized, even on UP kernels, so just use it. 2005-08-21 18:03:31 +00:00
Poul-Henning Kamp
31cc57cdbd Collect the devfs related sysctls in one place 2005-08-16 19:25:02 +00:00
Wes Peters
a09150446d Add a sysctl that records the amount of physical memory in the machine.
Submitted by:	Nicko Dehaine <nicko@stbernard.com>
MFC after:	1 day
2005-02-28 21:42:56 +00:00
Robert Watson
78bb1895ab Fix spelling of integer in a comment.
Beady eyes:	ceri
2005-01-30 00:31:19 +00:00
Robert Watson
4261ed50fd When retrieving the current per-jails securelevel for a sysctl read,
don't acquire the prison mutex, as it's an integer read and races
here don't make a difference.

MFC after:	1 week
2005-01-23 20:59:19 +00:00
Warner Losh
7f8a436ff2 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-05 21:03:37 +00:00
Mike Silbersack
184dcdc7c8 Change all SYSCTLS which are readonly and have a related TUNABLE
from CTLFLAG_RD to CTLFLAG_RDTUN so that sysctl(8) can provide
more useful error messages.
2003-10-21 18:28:36 +00:00
Eivind Eklund
effb9ebd01 Change description of kern.osreldate from "Operating system release date" to
"Kernel release date" - userland version is in /usr/include/osreldate.h
2003-08-21 14:47:08 +00:00
David E. O'Brien
677b542ea2 Use __FBSDID(). 2003-06-11 00:56:59 +00:00
Juli Mallett
c02d762181 Attempt to fix Alpha build by renaming ident[] to kern_ident[]. 2003-06-09 18:19:33 +00:00
Juli Mallett
da1186f2c7 Expose kern.ident by way of OID_AUTO.
Requested by:	phk
2003-06-09 10:54:23 +00:00
Mark Murray
51da11a27a Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.
2003-04-30 12:57:40 +00:00
Jake Burkholder
e548a1d4c8 - Provide backwards compatibility for kern.fallback_elf_brand.
- Use the generic elf type macros in imgact_elf.h instead of ifdefing the
  entire contents of the header.
2003-01-05 03:48:14 +00:00
Jake Burkholder
a360a43dd5 Improve the way that an elf image activator for an alternate word size is
included in the kernel.  Include imgact_elf.c in conf/files,  instead of
both imgact_elf32.c and imgact_elf64.c, which will use the default word
size for an architecture as defined in machine/elf.h.  Architectures that
wish to build an additional image activator for an alternate word size can
include either imgact_elf32.c or imgact_elf64.c in files.${ARCH}, which
allows it to be dependent on MD options instead of solely on architecture.

Glanced at by:	peter
2003-01-04 22:07:48 +00:00
Thomas Moestl
0fca57b8b8 Move the definitions of the hw.physmem, hw.usermem and hw.availpages
sysctls to MI code; this reduces code duplication and makes all of them
available on sparc64, and the latter two on powerpc.
The semantics by the i386 and pc98 hw.availpages is slightly changed:
previously, holes between ranges of available pages would be included,
while they are excluded now. The new behaviour should be more correct
and brings i386 in line with the other architectures.

Move physmem to vm/vm_init.c, where this variable is used in MI code.
2002-11-07 23:57:17 +00:00
Mike Barcroft
eeea998c3c Update a sysctl to use _POSIX_VERSION from <sys/unistd.h>, instead of
the kernel option _KPOSIX_VERSION.
2002-10-13 14:26:29 +00:00
Mike Barcroft
9e020cdab9 Include <sys/_posix.h> directly instead of depending on <sys/proc.h>
to include <sys/signal.h> to include <sys/_posix.h>.
2002-10-13 11:54:16 +00:00
Poul-Henning Kamp
ca916247cd Rename struct specinfo to the more appropriate struct cdev.
Agreed on:	jake, rwatson, jhb
2002-09-27 18:27:10 +00:00
Andrew R. Reiter
72a492cacf - Add a mutex to lock the global securelevel value.
- Make use of MTX_SYSINIT() as the means to initialize our mutex lock.
2002-04-02 17:43:17 +00:00
John Baldwin
a854ed9893 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
Andrew R. Reiter
d0615c64a5 - Attempt to help declutter kern. sysctl by moving security out from
beneath it.

Reviewed by: rwatson
2002-01-16 06:55:30 +00:00
Luigi Rizzo
af1408e33f Add/correct description for some sysctl variables where it was missing.
The description field is unused in -stable, so the MFC there is equivalent
to a comment. It can be done at any time, i am just setting a reminder
in 45 days when hopefully we are past 4.5-release.

MFC after: 45 days
2001-12-16 16:07:20 +00:00
Robert Watson
9147519a91 o Remove unnecessary inclusion of opt_global.h.
Submitted by:	bde
2001-12-06 21:55:41 +00:00
Robert Watson
011376308f o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
  pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
  so as to enforce these protections, in particular, in kern_mib.c
  protection sysctl access to the hostname and securelevel, as well as
  kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
  mib entries (osname, osrelease, osversion) so that they don't return
  a pointer to the text in the struct linux_prison, rather, a copy
  to an array passed into the calls.  Likewise, update linprocfs to
  use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
  accessing struct prison.

Reviewed by:	jhb
2001-12-03 16:12:27 +00:00
Robert Watson
1e4b531bb6 o Cache req->td->td_proc->p_ucred->cr_prison in pr to improve
readability.
o Conditionalize only the SYSCTL definitions for the regression
  tree, not the variables itself, decreasing the number of #ifdef
  REGRESSIONs scattered in kern_mib.c, and making the code more
  readable.

Sponsored by:	DARPA, NAI Labs
2001-11-28 21:22:05 +00:00
Robert Watson
eacb362f8a o General style improvemnts.
Submitted by:	bde
2001-11-08 15:31:19 +00:00
Robert Watson
44a280a67e o Trim trailing whitespace from kern_mib.c, as suggested by bde. Good
grief.
2001-11-08 15:20:00 +00:00
Robert Watson
ce17880650 o Replace reference to 'struct proc' with 'struct thread' in 'struct
sysctl_req', which describes in-progress sysctl requests.  This permits
  sysctl handlers to have access to the current thread, permitting work
  on implementing td->td_ucred, migration of suser() to using struct
  thread to derive the appropriate ucred, and allowing struct thread to be
  passed down to other code, such as network code where td is not currently
  available (and curproc is used).

o Note: netncp and netsmb are not updated to reflect this change, as they
  are not currently KSE-adapted.

Reviewed by:		julian
Obtained from:	TrustedBSD Project
2001-11-08 02:13:18 +00:00
Robert Watson
d3c9fa0463 o Cache the process's struct prison so as to create a more visually
appealing code structure.  In particular, s/req->p->p_ucred->cr_prison/pr/

Requested by:	imp, jhb, jake, other hangers on
2001-11-06 20:09:33 +00:00
Robert Watson
5c0c46c684 o Remove a tab missed in the previous whitespace commit. 2001-11-06 19:58:43 +00:00
Robert Watson
9afc1eee4f o Remove double-indentation of sysctl_kern_securelvl. This change is
consistent with the one other function in the file, and prevents long
  lines in up-coming changes.  This nominally pulls kern_mib.c a little
  further down the long path to style(9) compliance.
2001-11-06 19:56:58 +00:00
Robert Watson
c175d2226f o Introduce an 'options REGRESSION'-dependant sysctl namespaces,
'regression.*'.
o Add 'regression.securelevel_nonmonotonic', conditional on 'options
  REGRESSION', which allows the securelevel to be lowered for the purposes
  of efficient regression testing of securelevel policy decisions.
  Regression tests for securelevels will be committed shortly.

NOTE: 'options REGRESSION' should never be used on production machines, as
it permits violation of system invariants so as to improve the ability to
effectively test edge cases, and improve testing efficiency.
2001-10-07 03:51:22 +00:00
Robert Watson
8a528812a0 o Modify kern.securelevel MIB entry to return a local securelevel, if
one is present in the current jail, otherwise, to return the global
  securelevel.
o If the securelevel is being updated, require that it be greater than
  the maximum of local and global, if a local securelevel exists,
  otherwise, just maximum of the global.  If there is a local
  securelevel, update the local one instead of the global one.
o Note: this does allow local securelevels to lag behind the global one
  as long as the local one is not updated following a global increase.

Obtained from:	TrustedBSD Project
2001-09-26 20:39:48 +00:00
Peter Wemm
24a590a074 Fix cut/paste blunder. Serves me right for doing a last minute tweak
to what I had for some time.

Submitted by:	bde
2001-07-27 15:52:49 +00:00
Peter Wemm
ee342e1bf1 Move param.c out of the conf directory and make it fully dynamic.
Tunables are now derived at boot time from maxusers.  ie: change maxusers
via a tunable and all the derivative settings change.  You can change
the other tunables individually as well.  Even hz etc is tunable.
2001-07-26 23:04:03 +00:00
Jim Pirzyk
f83ae79fbe changed hostid from long to unsigned long to be able to store values > 2GB
on i386 platforms.  Also changed SYSCTL type from INT to ULONG and removed
comment about it.

PR:		kern/21132
MFC after:	1 month
2001-06-22 16:03:14 +00:00
John Baldwin
6caa8a1501 Overhaul of the SMP code. Several portions of the SMP kernel support have
been made machine independent and various other adjustments have been made
to support Alpha SMP.

- It splits the per-process portions of hardclock() and statclock() off
  into hardclock_process() and statclock_process() respectively.  hardclock()
  and statclock() call the *_process() functions for the current process so
  that UP systems will run as before.  For SMP systems, it is simply necessary
  to ensure that all other processors execute the *_process() functions when the
  main clock functions are triggered on one CPU by an interrupt.  For the alpha
  4100, clock interrupts are delievered in a staggered broadcast fashion, so
  we simply call hardclock/statclock on the boot CPU and call the *_process()
  functions on the secondaries.  For x86, we call statclock and hardclock as
  usual and then call forward_hardclock/statclock in the MD code to send an IPI
  to cause the AP's to execute forwared_hardclock/statclock which then call the
  *_process() functions.
- forward_signal() and forward_roundrobin() have been reworked to be MI and to
  involve less hackery.  Now the cpu doing the forward sets any flags, etc. and
  sends a very simple IPI_AST to the other cpu(s).  AST IPIs now just basically
  return so that they can execute ast() and don't bother with setting the
  astpending or needresched flags themselves.  This also removes the loop in
  forward_signal() as sched_lock closes the race condition that the loop worked
  around.
- need_resched(), resched_wanted() and clear_resched() have been changed to take
  a process to act on rather than assuming curproc so that they can be used to
  implement forward_roundrobin() as described above.
- Various other SMP variables have been moved to a MI subr_smp.c and a new
  header sys/smp.h declares MI SMP variables and API's.   The IPI API's from
  machine/ipl.h have moved to machine/smp.h which is included by sys/smp.h.
- The globaldata_register() and globaldata_find() functions as well as the
  SLIST of globaldata structures has become MI and moved into subr_smp.c.
  Also, the globaldata list is only available if SMP support is compiled in.

Reviewed by:	jake, peter
Looked over by:	eivind
2001-04-27 19:28:25 +00:00
Robert Watson
91421ba234 o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
  pr_free(), invoked by the similarly named credential reference
  management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
  of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
  rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
  flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
  mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
  credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
  required to protect the reference count plus some fields in the
  structure.

Reviewed by:	freebsd-arch
Obtained from:	TrustedBSD Project
2001-02-21 06:39:57 +00:00
Jake Burkholder
d5a08a6065 Implement a unified run queue and adjust priority levels accordingly.
- All processes go into the same array of queues, with different
  scheduling classes using different portions of the array.  This
  allows user processes to have their priorities propogated up into
  interrupt thread range if need be.
- I chose 64 run queues as an arbitrary number that is greater than
  32.  We used to have 4 separate arrays of 32 queues each, so this
  may not be optimal.  The new run queue code was written with this
  in mind; changing the number of run queues only requires changing
  constants in runq.h and adjusting the priority levels.
- The new run queue code takes the run queue as a parameter.  This
  is intended to be used to create per-cpu run queues.  Implement
  wrappers for compatibility with the old interface which pass in
  the global run queue structure.
- Group the priority level, user priority, native priority (before
  propogation) and the scheduling class into a struct priority.
- Change any hard coded priority levels that I found to use
  symbolic constants (TTIPRI and TTOPRI).
- Remove the curpriority global variable and use that of curproc.
  This was used to detect when a process' priority had lowered and
  it should yield.  We now effectively yield on every interrupt.
- Activate propogate_priority().  It should now have the desired
  effect without needing to also propogate the scheduling class.
- Temporarily comment out the call to vm_page_zero_idle() in the
  idle loop.  It interfered with propogate_priority() because
  the idle process needed to do a non-blocking acquire of Giant
  and then other processes would try to propogate their priority
  onto it.  The idle process should not do anything except idle.
  vm_page_zero_idle() will return in the form of an idle priority
  kernel thread which is woken up at apprioriate times by the vm
  system.
- Update struct kinfo_proc to the new priority interface.  Deliberately
  change its size by adjusting the spare fields.  It remained the same
  size, but the layout has changed, so userland processes that use it
  would parse the data incorrectly.  The size constraint should really
  be changed to an arbitrary version number.  Also add a debug.sizeof
  sysctl node for struct kinfo_proc.
2001-02-12 00:20:08 +00:00
Poul-Henning Kamp
77978ab8bc Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by:	bde
2000-07-04 11:25:35 +00:00
Poul-Henning Kamp
82d9ae4e32 Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

        -sysctl_vm_zone SYSCTL_HANDLER_ARGS
        +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
2000-07-03 09:35:31 +00:00
Robert Watson
e812e4917d Dammit.
Trimmed an extra sysctl when I moved kern.suser_permitted from kern_mib.c
to kern_prot.c.  This commit should restore it, as well as fix the
resulting build problems.

Submitted by:	asmodai
2000-06-07 18:54:41 +00:00
Robert Watson
579f4eb4cd o bde suggested moving the SYSCTL from kern_mib to the more appropriate
kern_prot, which cleans up some namespace issues
o Don't need a special handler to limit un-setting, as suser is used to
  protect suser_permitted, making it one-way by definition.

Suggested by:	bde
2000-06-05 18:30:55 +00:00
Robert Watson
0309554711 o Introduce kern.suser_permitted, a sysctl that disables the suser_xxx()
returning anything but EPERM.
o suser is enabled by default; once disabled, cannot be reenabled
o To be used in alternative security models where uid0 does not connote
  additional privileges
o Should be noted that uid0 still has some additional powers as it
  owns many important files and executables, so suffers from the same
  fundamental security flaws as securelevels.  This is fixed with
  MAC integrity protection code (in progress)
o Not safe for consumption unless you are *really* sure you don't want
  things like shutdown to work, et al :-)

Obtained from:	TrustedBSD Project
2000-06-05 14:53:55 +00:00
Poul-Henning Kamp
9626b608de Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
Poul-Henning Kamp
8c125869a9 Draw the outline of "struct bio".
Struct bio is the future carrier of I/O requests for "struct buf".
2000-04-02 09:26:51 +00:00
Matthew Dillon
db6a426158 The SMP cleanup commit broke UP compiles. Make UP compiles work again. 2000-03-28 18:06:49 +00:00
Robert Watson
83f1e257e0 Yet-another-update: rename ``kern.prison'' to a new sysctl root entry,
``jail'', and move the set_hostname_allowed sysctl there, as well as
fixing a bug in the sysctl that resulted in jails being over-limited
(preventing them from reading as well as writing the hostname).  Also,
correct some formatting issues, courtesy bde :-).

Reviewed by:	phk
Approved by:	jkh
2000-02-12 13:41:56 +00:00
Robert Watson
5bdee2c5d5 Fix sysctl namespace for jail: move the kern.jailcansethostname to
kern.prison.set_hostname_allowed, off of the kern.prison node.  Future
jail twiddles should be placed in this namespace.
2000-02-10 18:51:58 +00:00
Robert Watson
6c144e7521 Introduce a new sysctl, kern.jailcansethostname, which determines whether
or not a process in a jail, with privilege, may set the jail's hostname.
Defaults to 1, which permits this.  May be set to 0 by a process with
appropriate privilege outside of jail.  Preventing hostname renaming
from within a jail is currently required to make jails manageable, as they
a currently identifiable only by hostname using /proc, which may be
modified without this sysctl being set to 0.  This will be documented
in upcoming man commits.

Authorized by:	jkh, the ever-patient
2000-02-10 05:32:03 +00:00
Peter Wemm
d1f088dab5 Trim unused options (or #ifdef for undoc options).
Submitted by:	phk
1999-10-11 15:19:12 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Marcel Moolenaar
c6dfea0ebd Add sysctl variables for the Linuxulator. These reside under `compat.linux' as
discussed on current.

The following variables are defined (for now):

    osname (defaults to "Linux")
        Allow users to change the name of the OS as returned by uname(2),
        specially added for all those Linux Netscape users and statistics
        maniacs :-) We now have what we all wanted!

    osrelease (defaults to "2.2.5")
        Allow users to change the version of the OS as returned by uname(2).
        Since -current supports glibc2.1 now, change the default to 2.2.5
        (was 2.0.36).

    oss_version (defaults to 198144 [0x030600])
        This one will be used by the OSS_GETVERSION ioctl (PR 12917) which I
        can commit now that we have the MIB. The default version number is the
        lowest version possible with the current 'encoding'.

A note about imprisoned processes (see jail(2)):
  These variables are copy-on-write (as suggested by phk). This means that
  imprisoned processes will use the system wide value unless it is written/set
  by the process. From that moment on, a copy local to the prison will be
  used.

A note about the implementation:
  I choose to add a single pointer to struct prison, because I didn't like the
  idea of changing struct prison every time I come up with a new variable. As
  a side effect, the extra storage is only needed when a variable is set from
  within the prison. This also minimizes kernel bloat when the Linuxulator is
  not used; both compiled in or as a module.

Reviewed by: bde (first version only) and phk
1999-08-27 19:47:41 +00:00
Poul-Henning Kamp
0ef1c82630 Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.
1999-08-08 18:43:05 +00:00
Poul-Henning Kamp
d7bf417de7 add debug.sizeof.specinfo 1999-07-20 07:19:32 +00:00
Poul-Henning Kamp
6f13bfc261 Add sysctl tree debug.sizeof to tell us how big things are. First two
entries are struct proc and struct vnode.
1999-07-19 09:13:12 +00:00
Bill Fumerola
3d177f465a Add sysctl descriptions to many SYSCTL_XXXs
PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)
1999-05-03 23:57:32 +00:00
Poul-Henning Kamp
75c1354190 This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing.  The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact:  "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

   I have no scripts for setting up a jail, don't ask me for them.

   The IP number should be an alias on one of the interfaces.

   mount a /proc in each jail, it will make ps more useable.

   /proc/<pid>/status tells the hostname of the prison for
   jailed processes.

   Quotas are only sensible if you have a mountpoint per prison.

   There are no privisions for stopping resource-hogging.

   Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by:   http://www.rndassociates.com/
Run for almost a year by:       http://www.servetheweb.com/
1999-04-28 11:38:52 +00:00
Matthew Dillon
56319e3a58 Ok, people didn't like kern.conf_dir. Poof, backed out. 1999-01-26 07:37:11 +00:00
Matthew Dillon
b1cba377b0 Add kern.conf_dir sysctl. This is a R+W string used to specify the
directory containing rc.conf.local and rc.local, and possibly other
    things in the future.

    This sysctl is used by the diskless startup code and new rc.conf.  If
    it cannot be found or is empty, the system should revert to using /etc.
1999-01-25 18:26:09 +00:00
KATO Takenori
582e52862a - hw.machine_arch returns cpu architecture type.
- moved definition of MACHINE_ARCH from cpu.h to parm.h as alpha.
- Added definitions of _MACHINE and _MACHINE_ARCH.
- Added hw.ispc98. The hw.ispc98 is 1 in PC98 kernel and is 0 in
  IBM-PC kernel.

Discussed with:	John Birrell <jb@FreeBSD.ORG>
1998-08-31 08:41:58 +00:00
Peter Dufault
8a6472b723 Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and
_KPOSIX_PRIORITY_SCHEDULING options to work.  Changes:

Change all "posix4" to "p1003_1b".  Misnamed files are left
as "posix4" until I'm told if I can simply delete them and add
new ones;

Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;

Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;

Add options to LINT;

Minor fixes to P1003_1B code during testing.
1998-03-28 11:51:01 +00:00
Peter Dufault
644d85f4ca Reviewed by: msmith, bde long ago
Fix for RTPRIO scheduler to eliminate invalid context switches.

POSIX.4 headers and sysctl variables.  Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
1998-03-04 10:25:55 +00:00
Gary Palmer
b3b84d9b17 Make kern.ncpu reports the number of detected processors when running
with a SMP kernel.
1997-12-25 13:14:21 +00:00
David Greenman
916ca17535 kern.maxproc is not writable since there are tables that are statically
sized at startup.
PR: 4675
1997-10-19 18:45:59 +00:00
KATO Takenori
662f9a6987 Move MACHINE_ARCH definition from <machine/param.h> to <machine/cpu.h>.
Submitted by:	Bruce Evans <bde@zeta.org.au>
1997-08-30 02:52:04 +00:00
KATO Takenori
664f85174a Added a sysctl arg, hw.machine_arch. The hw.machine_arch is "ibm-pc"
on IBM-PC box and is "pc-98" on NEC PC-98 box.  Userland program can
distinguish architecture on which the program runs.
1997-08-29 09:03:40 +00:00
Joerg Wunsch
e16ed08126 Don't ever allow lowering the securelevel at all. Allowing it does
nothing good except of opening a can of (potential or real) security
holes.  People maintaining a machine with higher security requirements
need to be on the console anyway, so there's no point in not forcing
them to reboot before starting maintenance.

Agreed by:	hackers, guido
1997-06-25 07:31:47 +00:00
Bruce Evans
4a8b966013 Attach vfs_sysctl() one level lower so that only the levels below
VFS_GENERIC aren't done in the FreeBSD way.  The previous commit
broke the nfs sysctls.
1997-03-04 18:31:56 +00:00
Bruce Evans
3a76a5949b Merged Lite2's vfs_sysctl(). It doesn't fit very well into FreeBSD's
(phk's) sysctl framework, and I needed special code to disambiguate
the VFS_GENERIC node from the VFS_VFSCONF leaf, so I only converted
the leaves to the FreeBSD framework.  The error handling isn't quite
right.  CSRGS's sysctls seem to return ENOTDIR too much and FreeBSD's
sysctls don't agree with the man page.
1997-03-03 12:58:20 +00:00