4344 Commits

Author SHA1 Message Date
luigi
e39284a688 Add code to export and print the description associated to sysctl
variables. Use the -d flag in sysctl(8) to see this information.

Possible extensions to sysctl:
 + report variables that do not have a description
 + given a name, report the oid it maps to.

Note to developers: have a look at your code, there are a number of
	variables which do not have a description.

Note to developers: do we want this in 4.5 ? It is a very small change
	and very useful for documentation purposes.

Suggested by: Orion Hodson
2001-12-16 02:55:41 +00:00
jhb
0c1bfda974 Fix some nits in fork_exit() so it more properly duplicates the backend
of mi_switch:
- Set the oncpu value for the current thread.
- Always set switchticks, not just in the SMP case.
- Add a KTR entry for fork_exit that is the same as the "new proc"
  entry in mi_switch().
- Release sched_lock a bit later like we do with mi_switch().
2001-12-14 23:37:35 +00:00
jlemon
f7af5c6f92 When removing kqueue descriptors from the descriptor table during a fork,
update fd_freefile and fd_lastfile as well, to keep things in sync.

Pointed out by: Debbie Chu <dchu@juniper.net>
2001-12-14 19:02:57 +00:00
luigi
f8ad22919e Device Polling code for -current.
Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

        options DEVICE_POLLING

and at runtime enable polling with

        sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

        sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

	http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
        new file kern/kern_poll.c
sys/conf/options.i386
        new option
sys/i386/i386/trap.c
        poll in trap (disabled by default)
sys/kern/kern_clock.c
        initialization and hardclock hooks.
sys/kern/kern_intr.c
        minor swi_net changes
sys/kern/kern_poll.c
        the bulk of the code.
sys/net/if.h
        new flag
sys/net/if_var.h
        declaration for functions used in device drivers.
sys/net/netisr.h
        NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
        device driver modifications
2001-12-14 17:56:12 +00:00
peter
dd0f3c5ca2 Proper fix for old config setting maxusers to 8. 2001-12-14 09:39:29 +00:00
dillon
8e6d2fbcbd A slightly different version of the vlrureclaim fix.
Reported by: peter, ps
2001-12-14 07:18:31 +00:00
mckusick
d3b383005d Add disk I/O scheduling for positively niced processes.
When a positively niced process requests a disk I/O, make
it wait for its nice value of ticks before scheduling its
I/O request if there are any other processes with I/O
requests in the disk queue. For all the gory details, see
the ``Running fsck in the Background'' paper in the Usenix
BSDCon 2002 Conference Proceedings, pages 55-64.
2001-12-14 05:50:44 +00:00
dillon
62f062ea62 Too many people are compiling kernels with maxusers set to 0 without the new
config.  Hack the kernel to force auto-sizing if the old config is used.
2001-12-14 04:01:08 +00:00
dillon
cd4d323ad3 This fixes a large number of bugs in our NFS client side code. A recent
commit by Kirk also fixed a softupdates bug that could easily be triggered
by server side NFS.

	* An edge case with shared R+W mmap()'s and truncate whereby
	  the system would inappropriately clear the dirty bits on
	  still-dirty data.  (applicable to all filesystems)

	  THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING.
	  see vm/vm_page.c line 1641

	* The straddle case for VM pages and buffer cache buffers when
	  truncating.  (applicable to NFS client side)

	* Possible SMP database corruption due to vm_pager_unmap_page()
	  not clearing the TLB for the other cpu's.  (applicable to NFS
	  client side but could effect all filesystems).  Note: not
	  considered serious since the corruption occurs beyond the file
	  EOF.

	* When flusing a dirty buffer due to B_CACHE getting cleared,
	  we were accidently setting B_CACHE again (that is, bwrite() sets
	  B_CACHE), when we really want it to stay clear after the write
	  is complete.  This resulted in a corrupt buffer.  (applicable
	  to all filesystems but probably only triggered by NFS)

	* We have to call vtruncbuf() when ftruncate()ing to remove
	  any buffer cache buffers.  This is still tentitive, I may
	  be able to remove it due to the second bug fix.  (applicable
	  to NFS client side)

	* vnode_pager_setsize() race against nfs_vinvalbuf()... we have
	  to set n_size before calling nfs_vinvalbuf or the NFS code
	  may recursively vnode_pager_setsize() to the original value
	  before the truncate.  This is what was causing the user mmap
	  bus faults in the nfs tester program.  (applicable to NFS
	  client side)

	* Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made
	  by Kirk).

Testing program written by: Avadis Tevanian, Jr.
Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step')
MFC after:	1 week
2001-12-14 01:16:57 +00:00
rwatson
9e2b770a8f o Wording fix in comment.
Submitted by:	tanimura via p4
2001-12-14 00:38:01 +00:00
peter
a194c44001 If we were called to allocate a vnode that is not associated with a
mount point, do not dereference the NULL mp argument.
2001-12-13 23:46:01 +00:00
rwatson
36784fd2c4 o Back out portions of 1.50 and 1.47, eliminating sonewconn3() and
always deriving the credential for a newly accepted connection from
  the listen socket.  Previously, the selection of the credential
  depended on the protocol: UNIX domain sockets would use the
  connecting process's credential, and protocols supporting a creation
  of the socket before the receiving end called accept() would use
  the listening socket.  After this change, it is always the listening
  credential.

Reviewed by:	green
2001-12-13 22:09:37 +00:00
silby
dc4fed395a Limit maxprocperuid to 9/10 maxproc, and limit maxfilesperproc to 9/10
maxfiles.  This should make local resource exhaustion attacks easier
to handle with a non-tweaked setup.

MFC after:	3 days
2001-12-13 20:00:45 +00:00
jhb
66ac46bd15 Use a per-thread variable for keeping state when a thread is processing
a KTR log entry.  Any KTR requests made while working on an entry are
ignored/discarded to prevent recursion.  This is a better fix for the
hack to futz with the CPU mask and call getnanotime() if KTR_LOCK or
KTR_WITNESS was on.  It also covers the actual formatting of the log entry
including dumping it to the display which the earlier hacks did not.
2001-12-13 10:33:20 +00:00
arr
e55fee2143 - Move _jail sysctl node underneath _kern_security in order to standardize
where our security related sysctl tuneables are located.  Also, this
  will help if/when we move _security node out from under _kern as to help
  make _kern less cluttered.

Approved by:	rwatson
Review by:	rwatson
2001-12-12 05:23:20 +00:00
jhb
21b6b26912 Overhaul the per-CPU support a bit:
- The MI portions of struct globaldata have been consolidated into a MI
  struct pcpu.  The MD per-CPU data are specified via a macro defined in
  machine/pcpu.h.  A macro was chosen over a struct mdpcpu so that the
  interface would be cleaner (PCPU_GET(my_md_field) vs.
  PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead.  In a UP kernel,
  this data was stored as global variables which is where the original name
  came from.  In an SMP world this data is per-CPU and ideally private to each
  CPU outside of the context of debuggers.  This also included combining
  machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
  npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
  fields.
- The globaldata_register() function was renamed to pcpu_init() and now
  init's MI fields of a struct pcpu in addition to registering it with
  the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
  internal array and list.

Tested on:	alpha, i386
Reviewed by:	peter, jake
2001-12-11 23:33:44 +00:00
guido
2e77fc4d02 Fix boot -p for DDBless kernels
Pointed out by: John Hay <jhay@icomtek.csir.co.za>
2001-12-11 10:21:26 +00:00
peter
46c0ef263e Wrap Dangerously Dedicated printf under if (bootverbose) 2001-12-11 05:35:43 +00:00
obrien
41ac252611 Missed an assignment of arg6 in previous commit. 2001-12-10 20:58:39 +00:00
obrien
806dd95941 Adjust for the addition of CTR6. 2001-12-10 20:18:17 +00:00
guido
d779575f78 Add new boot flag to i386 boot: -p.
This flag adds a pausing utility. When ran with -p, during the kernel
probing phase, the kernel will pause after each line of output.
This pausing can be ended with the '.' key, and is automatically
suspended when entering ddb.

This flag comes in handy at systems without a serial port that either hang
during booting or reser.
Reviewed by:	(partly by jlemon)
MFC after:	1 week
2001-12-10 20:02:22 +00:00
obrien
330a1032c1 Update to C99, s/__FUNCTION__/__func__/. 2001-12-10 05:51:45 +00:00
obrien
cca4f7b2d9 Repeat after me -- "Use of ANSI string concatenation can be bad."
In this case, C99's __func__ is properly defined as:

	static const char __func__[] = "function-name";

and GCC 3.1 will not allow it to be used in bogus string concatenation.
2001-12-10 05:40:12 +00:00
alc
a49c1c9183 o Eliminate compilation warnings on 64-bit architectures. 2001-12-10 03:34:06 +00:00
alc
be4cbfd029 o Eliminate unnecessary synchronization from filt_aiodetach().
o The manual page for kevent says that EVFILT_AIO returns under the same
   conditions as aio_error().  With that in mind, set the data field
   of the returned struct kevent to the value that would be returned
   by aio_error().
 o Fix two compilation warnings.
2001-12-09 08:16:36 +00:00
dillon
6fe4980d43 Allow maxusers to be specified as 0 in the kernel config, which will
cause the system to auto-size to between 32 and 512 depending on the
amount of memory.

MFC after:	1 week
2001-12-09 01:57:09 +00:00
dillon
6e9238ff3f The nbuf calculation was assuming that PAGE_SIZE = 4096 bytes, which is
bogus.  The calculation has been adjusted to use units of kilobytes.

Noticed by: Chad David <davidc@acns.ab.ca>
MFC after:	1 week
2001-12-08 20:37:08 +00:00
davidc
1d1054c88d Update the comment about System initialization to reflect the use of
DOMAIN_SET(9) instead of SYSINIT for adding domains at startup.

Reviewed by: alfred
2001-12-08 04:20:54 +00:00
rwatson
7769631069 o A few more minor whitespace and other style fixes.
Submitted by:	bde
2001-12-06 21:58:47 +00:00
rwatson
751c41df3a o Remove unnecessary inclusion of opt_global.h.
Submitted by:	bde
2001-12-06 21:55:41 +00:00
rwatson
754ad10054 o Make kern.security.bsd.suser_enabled TUNABLE.
Requested by:	green
2001-12-05 18:49:20 +00:00
mckusick
f62c954d2f Update pathnames for creation of tags file. 2001-12-05 01:23:21 +00:00
rwatson
fb311b7cce o Update an instance of 'unprivileged_procdebug_permitted' missed
in the previous commit: the comment should also call it
  'unprivileged_proc_debug'.
2001-12-03 19:10:21 +00:00
rwatson
b5de442911 o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
  pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
  so as to enforce these protections, in particular, in kern_mib.c
  protection sysctl access to the hostname and securelevel, as well as
  kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
  mib entries (osname, osrelease, osversion) so that they don't return
  a pointer to the text in the struct linux_prison, rather, a copy
  to an array passed into the calls.  Likewise, update linprocfs to
  use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
  accessing struct prison.

Reviewed by:	jhb
2001-12-03 16:12:27 +00:00
rwatson
de0f8b15da o Uniformly copy uap arguments into local variables before grabbing
giant, and make whitespace more consistent around giant-frobbing.
2001-12-02 15:22:56 +00:00
rwatson
dbe003dc3e o Remove KSE race in setuid() in which oldcred was preserved before giant
was grabbed.  This was introduced in 1.101 when the giant pushdown
  for kern_prot.c was originally performed.
2001-12-02 15:15:29 +00:00
rwatson
8b2ab77900 o General style, formatting, etc, improvements:
- uid's -> uids
	- whitespace improvements, linewrap improvements
	- reorder copyright more appropriately
	- remove redundant MP SAFE comments, add one "NOT MPSAFE?"
	  for setgroups(), which seems to be the sole un-changed system
	  call in the file.
	- clean up securelevel_g?() functions, improve comments.

Largely submitted by:	bde
2001-12-02 15:07:10 +00:00
alfred
77b8f8139c make LOCKF_DEBUG kernel option work (sorta)
Submitted by: Maxim Konovalov <maxim@macomnet.ru>
PR: kern/32267
2001-12-02 12:47:25 +00:00
luigi
0d72b82e2e vm/vm_kern.c: rate limit (to once per second) diagnostic printf when
you run out of mbuf address space.

kern/subr_mbuf.c: print a warning message when mb_alloc fails, again
	rate-limited to at most once per second. This covers other
	cases of mbuf allocation failures. Probably it also overlaps the
	one handled in vm/vm_kern.c, so maybe the latter should go away.

This warning will let us gradually remove the printf that are scattered
across most network drivers to report mbuf allocation failures.
Those are potentially dangerous, in that they are not rate-limited and
can easily cause systems to panic.

Unless there is disagreement (which does not seem to be the case
judging from the discussion on -net so far), and because this is
sort of a safety bugfix, I plan to commit a similar change to STABLE
during the weekend (it affects kern/uipc_mbuf.c there).

Discussed-with: jlemon, silby and -net
2001-12-01 00:21:30 +00:00
rwatson
aa8360c1cd o Introduce kern.security.bsd.unprivileged_read_msgbuf, which allows
the administrator to restrict access to the kernel message buffer.
  It defaults to '1', which permits access, but if set to '0', requires
  that the process making the sysctl() have appropriate privilege.
o Note that for this to be effective, access to this data via system
  logs derived from /dev/klog must also be limited.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2001-11-30 21:40:52 +00:00
rwatson
68b9d3708b o Further sysctl name simplification, generally stripping 'permitted',
using '_'s more consistently.

Discussed with:	bde, jhb
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2001-11-30 21:33:16 +00:00
rwatson
e92874bd10 o Move current inhabitants of kern.security to kern.security.bsd, so
that new models can inhabit kern.security.<modelname>.
o While I'm there, shorten somewhat excessive variable names, and clean
  things up a little.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2001-11-30 20:58:31 +00:00
rwatson
5682f21557 o Cache req->td->td_proc->p_ucred->cr_prison in pr to improve
readability.
o Conditionalize only the SYSCTL definitions for the regression
  tree, not the variables itself, decreasing the number of #ifdef
  REGRESSIONs scattered in kern_mib.c, and making the code more
  readable.

Sponsored by:	DARPA, NAI Labs
2001-11-28 21:22:05 +00:00
jwd
2a6f1a68f9 Return a more meaningful errno when the length of the interpreter
exceeds MAXSHELLCMDLEN to avoid secondary /bin/sh execution.

Update execve man page to reflect change.

Increase MAXSHELLCMDLEN to a slightly more meaningful value.

PR:		kern/32106
Submitted by:	b@etek.chalmers.se
Reviewed by:	bsd
MFC after:	2 weeks
2001-11-28 03:26:58 +00:00
peter
ca5b2bc739 Dont print the sysctl node tree unless you're root.
Found by:	jkb (Yahoo OS troublemaker)
2001-11-28 03:11:16 +00:00
bmilekic
0dbfbc0131 Context:
For an object type, we maintain a variable mb_mapfull. It is 0 by default
and is only raised to 1 in one place: when an mb_pop_cont() fails for
the first time, on the assumption that the reason for the failure is
due to the underlying map for the object (e.g. clust_map, mbuf_map) being
exhausted.

Problem and Changes:
Change how we define "mb_mapfull." It now means: "set to 1 when the first
mb_pop_cont() fails only in the kmem_malloc()-ing of the object, and
only if the call was with the M_TRYWAIT flag." This is a more conservative
definition and should avoid odd [but theoretically possible] situations
from occuring. i.e. we had set mb_mapfull to 1 thinking the map for the
object was actually exhausted when we _actually_ failed in malloc()ing
the space for the bucket structure managing the objects in the page
we're allocating.
2001-11-25 04:42:54 +00:00
dfr
194b963c4d Since we used '#ifdef __i386__', don't close with '#endif /* !__alpha__ */' 2001-11-24 10:11:14 +00:00
obrien
8c5542cd12 Remove the use of _PATH_DEV in the example.
The kernel certainly doesn't use _PATH_DEV or even /dev/ to find the device.
It cannot, since "/" has not been mounted.  Maybe the only affect of using
/dev/ is that it gets put in the mounted-from name for "/", so that mount(8),
etc., display an absolute path before "/" has been remounted.  Many have
never bothered typing the full path, and code that constructs a path in
rootdevnames[] never bothered to construct a full path, so the example
shouldn't have it.

Submitted by:	bde
2001-11-24 01:34:12 +00:00
peter
43edb17438 Recognize the "fixed" geometry in boot1 so that DD disks are not
interpreted as real fdisk tables (and fail).
2001-11-21 08:31:45 +00:00
obrien
33425adab6 We only have slices on i386 and IA-64. 2001-11-20 23:48:00 +00:00