Commit Graph

4579 Commits

Author SHA1 Message Date
Peter Wemm
d5c6775903 Fix forward_roundrobin(). It was mistakenly using the cpu number as
though it was a mask.  As a result, we sent AST IPI's to the wrong
cpu and/or left out some.

Spotted by: jake
2002-01-05 09:38:47 +00:00
Peter Wemm
ab8061d84c Add a per-cpu variable, cpumask, the preshifted equivalent of 1 << cpuid.
We use this around the place a lot.
2002-01-05 09:35:50 +00:00
John Baldwin
c86b6ff551 Change the preemption code for software interrupt thread schedules and
mutex releases to not require flags for the cases when preemption is
not allowed:

The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent
switching to a higher priority thread on mutex releease and swi schedule,
respectively when that switch is not safe.  Now that the critical section
API maintains a per-thread nesting count, the kernel can easily check
whether or not it should switch without relying on flags from the
programmer.  This fixes a few bugs in that all current callers of
swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
fast interrupt handlers and the swi_sched of softclock needed this flag.
Note that to ensure that swi_sched()'s in clock and fast interrupt
handlers do not switch, these handlers have to be explicitly wrapped
in critical_enter/exit pairs.  Presently, just wrapping the handlers is
sufficient, but in the future with the fully preemptive kernel, the
interrupt must be EOI'd before critical_exit() is called.  (critical_exit()
can switch due to a deferred preemption in a fully preemptive kernel.)

I've tested the changes to the interrupt code on i386 and alpha.  I have
not tested ia64, but the interrupt code is almost identical to the alpha
code, so I expect it will work fine.  PowerPC and ARM do not yet have
interrupt code in the tree so they shouldn't be broken.  Sparc64 is
broken, but that's been ok'd by jake and tmm who will be fixing the
interrupt code for sparc64 shortly.

Reviewed by:	peter
Tested on:	i386, alpha
2002-01-05 08:47:13 +00:00
John Baldwin
422f61655f Remove brain damaged code in witness_lock(). We could have easily
just used PCPU_GET(spinlocks) w/o needing the w_mtx held.  It is more
correct to just check td_critnest now though.
2002-01-05 08:29:54 +00:00
John Baldwin
9d234f99f7 Axe a stale comment. Holding sched_lock across both setrunqueue() and
mi_switch() is sufficient.
2002-01-04 10:55:51 +00:00
Mike Silbersack
a262ae8267 Throw the $FreeBSD$s back in, properly escaping them. 2002-01-04 05:27:47 +00:00
Mike Silbersack
91ea78c52a Remove $FreeBSD$s from previous commit; perl thinks that they're
something to be interpreted.  Urk.
2002-01-04 01:40:50 +00:00
Mike Silbersack
cd6fdcb9ac Solve vnode_if.pl's identity crisis; make sure that it refers to itself
as vnode_if.pl instead of vnode_if.sh.

PR:		33509
MFC after:	3 weeks
2002-01-03 21:53:09 +00:00
Stefan Eßer
10cc6dff87 Return EBADF in case some vnode field has been reset to a NULL pointer.
(There has been some discussion, whether ENOENT or EBADF is more
appropriate. I choose the latter, since the operation is not supported
on the file descriptor at that time, even if it was, immediately before.)

PR:		32681
Reviewed by:	dillon, iedowse, ...
Approved by:	nectar
MFC after:	3 days
		(pending RE approval)
2002-01-03 09:54:24 +00:00
Alan Cox
23f139432e o Properly check the file descriptor passed to aio_cancel(2). (Previously,
no out-of-bounds check was performed on the file descriptor.)
 o Eliminate some excessive white space from aio_cancel(2).
2002-01-02 07:04:38 +00:00
Jake Burkholder
5e8af3b31d Print parm6 too in the !KTR_EXTEND case. 2002-01-01 21:47:38 +00:00
Alan Cox
eae43d0e56 o Some style(9)-motivated changes to white space. 2002-01-01 00:40:29 +00:00
Robert Watson
9c4d63da6d o Make the credential used by socreate() an explicit argument to
socreate(), rather than getting it implicitly from the thread
  argument.

o Make NFS cache the credential provided at mount-time, and use
  the cached credential (nfsmount->nm_cred) when making calls to
  socreate() on initially connecting, or reconnecting the socket.

This fixes bugs involving NFS over TCP and ipfw uid/gid rules, as well
as bugs involving NFS and mandatory access control implementations.

Reviewed by:	freebsd-arch
2001-12-31 17:45:16 +00:00
Alan Cox
5ca50a4bc9 o Correct an off-by-one error in aio_suspend(2).
PR:		18350
2001-12-31 03:13:24 +00:00
Alan Cox
516d256401 o Use "td->td_proc" instead of "curproc" where possible.
o Eliminate the unnecessary initialization of several static variables
   to zero.
2001-12-31 02:03:39 +00:00
Alan Cox
477b78a0df Eliminate semexit_hook using at_exit(9) and rm_at_exit(9).
Reviewed by:	alfred
2001-12-30 18:55:09 +00:00
Jake Burkholder
c9f4877d7c Change traces in hardclock and statclock to use the KTR_CLK trace
facility, rather than KTR_INTR.
2001-12-29 08:39:57 +00:00
Alfred Perlstein
21d56e9c33 Make AIO a loadable module.
Remove the explicit call to aio_proc_rundown() from exit1(), instead AIO
will use at_exit(9).

Add functions at_exec(9), rm_at_exec(9) which function nearly the
same as at_exec(9) and rm_at_exec(9), these functions are called
on behalf of modules at the time of execve(2) after the image
activator has run.

Use a modified version of tegge's suggestion via at_exec(9) to close
an exploitable race in AIO.

Fix SYSCALL_MODULE_HELPER such that it's archetecuterally neutral,
the problem was that one had to pass it a paramater indicating the
number of arguments which were actually the number of "int".  Fix
it by using an inline version of the AS macro against the syscall
arguments.  (AS should be available globally but we'll get to that
later.)

Add a primative system for dynamically adding kqueue ops, it's really
not as sophisticated as it should be, but I'll discuss with jlemon when
he's around.
2001-12-29 07:13:47 +00:00
Bruce Evans
25bf7324c8 Fixed an apparent typo ("-" before ":") and an English error (comma
splice) in the "already exists" message.

Fixed some minor style bugs (KNFization to "return (foo)" had rotted
in 2 out of 177 cases).
2001-12-28 18:32:13 +00:00
Alfred Perlstein
58e5d6695d brace by itself after function declaration.
Mandated by: style(9)
Pointed out by: rwatson
2001-12-27 20:16:21 +00:00
Matthew Dillon
9dd4281db8 Fix type-o in previous commit (tsleep was using wrong rendezvous point) 2001-12-25 01:23:25 +00:00
Bosko Milekic
56b602dd6a On the first day of Christmas bde gave to me:
A [hopefully] conforming style(9) revamp of mb_alloc and related code.
(This was possible due to bde's remarkable patience.)

Submitted by: (in large part) bde
Reviewed by: (the other part) bde
2001-12-23 22:04:08 +00:00
Bosko Milekic
4878b75e6c Move prototype of _mext_free to mbuf.h, where it belongs, because it is
used in MEXTFREE and needs to be in scope for external MEXTFREE users.

Pointed out by: Chad David <davidc@acns.ab.ca>
Confirmed by: bde
2001-12-22 20:09:08 +00:00
Thomas Moestl
87b1520ae4 Add a generic __BUS_ACCESSOR macro to construct ivar accessor functions,
and a generic resource_list_print_type() function to print all resouces
of a certain type in a resource list.
Use ulmin()/ulmax() instead of min()/max() in two places to handle
u_longs correctly.
2001-12-21 21:45:09 +00:00
Thomas Moestl
13fb665772 Add a rman_reserve_resource_bound() function that takes an additional
argument specifying the boundary for the resource allocation.
Use ulmin()/ulmax() instead of min()/max() in some places to correctly
deal with the u_long resource range specifications.
2001-12-21 21:40:55 +00:00
Peter Wemm
205b2b6107 Avoid an interaction between syncache and accept filters. The syncache
code only passed up the connection to the tcp stack when it was complete,
so it went directly into the so_comp (complete) queue.  However, with
accept filters, there is an additional phase before calling it "complete".

Reviewed by: jlemon
2001-12-21 04:30:49 +00:00
John Baldwin
98f9879242 Introduce a standard name for the lock protecting an interrupt controller
and it's associated state variables: icu_lock with the name "icu".  This
renames the imen_mtx for x86 SMP, but also uses the lock to protect
access to the 8259 PIC on x86 UP.  This also adds an appropriate lock to
the various Alpha chipsets which fixes problems with Alpha SMP machines
dropping interrupts with an SMP kernel.
2001-12-20 23:48:31 +00:00
Matthew Dillon
23b590188f Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget()
against VM_WAIT in the pageout code.  Both fixes involve adjusting
the lockmgr's timeout capability so locks obtained with timeouts do not
interfere with locks obtained without a timeout.

Hopefully MFC: before the 4.5 release
2001-12-20 22:42:27 +00:00
Matthew Dillon
a57094a011 Calculate whether the sbuf is dynamic *before* bzero()ing the
structure.  This fixes a serious memory leak in the sbuf code.

MFC after:	3 days
2001-12-19 19:04:57 +00:00
Peter Wemm
9f2f52d695 Do not initialize static/global variables to 0. Use bss instead of
taking up space in the data section.
2001-12-19 01:35:18 +00:00
Peter Wemm
8f0d41d324 Use a different mechanism to get the vnlru process to wake up and notice
the shutdown request at reboot/halt time.
Disable the printf 'vnlru process getting nowhere, pausing...' and instead
export the count to the debug.vnlru_nowhere sysctl.
2001-12-19 01:31:12 +00:00
Luigi Rizzo
d105c784d5 Complete the device polling support by adding a thread in charge
of polling interfaces at the lowest possible priority
(this might result in softnetisr being scheduled, but there is
no risk of livelock because they have a higher priority than
this thread).
2001-12-19 00:53:24 +00:00
John Baldwin
885ccc61f2 Return EINVAL if kernel only flags are passed to the rfork syscall rather
than silently masking them.
2001-12-19 00:53:23 +00:00
Matthew Dillon
fdb33f08ef This is a forward port of Peter's vlrureclaim() fix, with some minor mods
by me to make it more efficient.  The original code had serious balancing
problems and could also deadlock easily.  This code relegates the vnode
reclamation to its own kproc and relaxes the vnode reclamation requirements
to better maintain kern.maxvnodes.  This code still doesn't balance as well
as it could, but it does a much better job then the original code.

Approved by:	re@freebsd.org
Obtained from:	ps, peter, dillon
MFS Assuming:	Assuming no problems crop up in Yahoo testing
MFC after:	7 days
2001-12-18 20:48:54 +00:00
John Baldwin
48fd1f38ee - Change all callers of addupc_task() to check PS_PROFIL explicitly and
remove the check from addupc_task().  It would need sched_lock while
  testing the flag anyways.
- Always read sticks while holding sched_lock using a temporary variable
  where needed.
- Always init prticks to 0 in ast() to quiet a warning.
2001-12-18 09:06:10 +00:00
John Baldwin
7e1f6dfe9d Modify the critical section API as follows:
- The MD functions critical_enter/exit are renamed to start with a cpu_
  prefix.
- MI wrapper functions critical_enter/exit maintain a per-thread nesting
  count and a per-thread critical section saved state set when entering
  a critical section while at nesting level 0 and restored when exiting
  to nesting level 0.  This moves the saved state out of spin mutexes so
  that interlocking spin mutexes works properly.
- Most low-level MD code that used critical_enter/exit now use
  cpu_critical_enter/exit.  MI code such as device drivers and spin
  mutexes use the MI wrappers.  Note that since the MI wrappers store
  the state in the current thread, they do not have any return values or
  arguments.
- mtx_intr_enable() is replaced with a constant CRITICAL_FORK which is
  assigned to curthread->td_savecrit during fork_exit().

Tested on:	i386, alpha
2001-12-18 00:27:18 +00:00
Mark Peek
bf43c504c9 Remove whitespace at end of line. 2001-12-16 17:21:16 +00:00
Luigi Rizzo
af1408e33f Add/correct description for some sysctl variables where it was missing.
The description field is unused in -stable, so the MFC there is equivalent
to a comment. It can be done at any time, i am just setting a reminder
in 45 days when hopefully we are past 4.5-release.

MFC after: 45 days
2001-12-16 16:07:20 +00:00
Luigi Rizzo
6105f81565 Add code to export and print the description associated to sysctl
variables. Use the -d flag in sysctl(8) to see this information.

Possible extensions to sysctl:
 + report variables that do not have a description
 + given a name, report the oid it maps to.

Note to developers: have a look at your code, there are a number of
	variables which do not have a description.

Note to developers: do we want this in 4.5 ? It is a very small change
	and very useful for documentation purposes.

Suggested by: Orion Hodson
2001-12-16 02:55:41 +00:00
John Baldwin
201b0ea8fd Fix some nits in fork_exit() so it more properly duplicates the backend
of mi_switch:
- Set the oncpu value for the current thread.
- Always set switchticks, not just in the SMP case.
- Add a KTR entry for fork_exit that is the same as the "new proc"
  entry in mi_switch().
- Release sched_lock a bit later like we do with mi_switch().
2001-12-14 23:37:35 +00:00
Jonathan Lemon
2b846bd3a5 When removing kqueue descriptors from the descriptor table during a fork,
update fd_freefile and fd_lastfile as well, to keep things in sync.

Pointed out by: Debbie Chu <dchu@juniper.net>
2001-12-14 19:02:57 +00:00
Luigi Rizzo
e4fc250c15 Device Polling code for -current.
Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

        options DEVICE_POLLING

and at runtime enable polling with

        sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

        sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

	http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
        new file kern/kern_poll.c
sys/conf/options.i386
        new option
sys/i386/i386/trap.c
        poll in trap (disabled by default)
sys/kern/kern_clock.c
        initialization and hardclock hooks.
sys/kern/kern_intr.c
        minor swi_net changes
sys/kern/kern_poll.c
        the bulk of the code.
sys/net/if.h
        new flag
sys/net/if_var.h
        declaration for functions used in device drivers.
sys/net/netisr.h
        NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
        device driver modifications
2001-12-14 17:56:12 +00:00
Peter Wemm
f6916f666c Proper fix for old config setting maxusers to 8. 2001-12-14 09:39:29 +00:00
Matthew Dillon
873a490449 A slightly different version of the vlrureclaim fix.
Reported by: peter, ps
2001-12-14 07:18:31 +00:00
Kirk McKusick
d8bddaa85d Add disk I/O scheduling for positively niced processes.
When a positively niced process requests a disk I/O, make
it wait for its nice value of ticks before scheduling its
I/O request if there are any other processes with I/O
requests in the disk queue. For all the gory details, see
the ``Running fsck in the Background'' paper in the Usenix
BSDCon 2002 Conference Proceedings, pages 55-64.
2001-12-14 05:50:44 +00:00
Matthew Dillon
7ca592e093 Too many people are compiling kernels with maxusers set to 0 without the new
config.  Hack the kernel to force auto-sizing if the old config is used.
2001-12-14 04:01:08 +00:00
Matthew Dillon
3ebeaf5984 This fixes a large number of bugs in our NFS client side code. A recent
commit by Kirk also fixed a softupdates bug that could easily be triggered
by server side NFS.

	* An edge case with shared R+W mmap()'s and truncate whereby
	  the system would inappropriately clear the dirty bits on
	  still-dirty data.  (applicable to all filesystems)

	  THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING.
	  see vm/vm_page.c line 1641

	* The straddle case for VM pages and buffer cache buffers when
	  truncating.  (applicable to NFS client side)

	* Possible SMP database corruption due to vm_pager_unmap_page()
	  not clearing the TLB for the other cpu's.  (applicable to NFS
	  client side but could effect all filesystems).  Note: not
	  considered serious since the corruption occurs beyond the file
	  EOF.

	* When flusing a dirty buffer due to B_CACHE getting cleared,
	  we were accidently setting B_CACHE again (that is, bwrite() sets
	  B_CACHE), when we really want it to stay clear after the write
	  is complete.  This resulted in a corrupt buffer.  (applicable
	  to all filesystems but probably only triggered by NFS)

	* We have to call vtruncbuf() when ftruncate()ing to remove
	  any buffer cache buffers.  This is still tentitive, I may
	  be able to remove it due to the second bug fix.  (applicable
	  to NFS client side)

	* vnode_pager_setsize() race against nfs_vinvalbuf()... we have
	  to set n_size before calling nfs_vinvalbuf or the NFS code
	  may recursively vnode_pager_setsize() to the original value
	  before the truncate.  This is what was causing the user mmap
	  bus faults in the nfs tester program.  (applicable to NFS
	  client side)

	* Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made
	  by Kirk).

Testing program written by: Avadis Tevanian, Jr.
Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step')
MFC after:	1 week
2001-12-14 01:16:57 +00:00
Robert Watson
48f1ba5b0d o Wording fix in comment.
Submitted by:	tanimura via p4
2001-12-14 00:38:01 +00:00
Peter Wemm
9446b36bab If we were called to allocate a vnode that is not associated with a
mount point, do not dereference the NULL mp argument.
2001-12-13 23:46:01 +00:00
Robert Watson
f8cf411e49 o Back out portions of 1.50 and 1.47, eliminating sonewconn3() and
always deriving the credential for a newly accepted connection from
  the listen socket.  Previously, the selection of the credential
  depended on the protocol: UNIX domain sockets would use the
  connecting process's credential, and protocols supporting a creation
  of the socket before the receiving end called accept() would use
  the listening socket.  After this change, it is always the listening
  credential.

Reviewed by:	green
2001-12-13 22:09:37 +00:00
Mike Silbersack
ebacce5e99 Limit maxprocperuid to 9/10 maxproc, and limit maxfilesperproc to 9/10
maxfiles.  This should make local resource exhaustion attacks easier
to handle with a non-tweaked setup.

MFC after:	3 days
2001-12-13 20:00:45 +00:00
John Baldwin
69e9495750 Use a per-thread variable for keeping state when a thread is processing
a KTR log entry.  Any KTR requests made while working on an entry are
ignored/discarded to prevent recursion.  This is a better fix for the
hack to futz with the CPU mask and call getnanotime() if KTR_LOCK or
KTR_WITNESS was on.  It also covers the actual formatting of the log entry
including dumping it to the display which the earlier hacks did not.
2001-12-13 10:33:20 +00:00
Andrew R. Reiter
83aee5a8d5 - Move _jail sysctl node underneath _kern_security in order to standardize
where our security related sysctl tuneables are located.  Also, this
  will help if/when we move _security node out from under _kern as to help
  make _kern less cluttered.

Approved by:	rwatson
Review by:	rwatson
2001-12-12 05:23:20 +00:00
John Baldwin
0bbc882680 Overhaul the per-CPU support a bit:
- The MI portions of struct globaldata have been consolidated into a MI
  struct pcpu.  The MD per-CPU data are specified via a macro defined in
  machine/pcpu.h.  A macro was chosen over a struct mdpcpu so that the
  interface would be cleaner (PCPU_GET(my_md_field) vs.
  PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead.  In a UP kernel,
  this data was stored as global variables which is where the original name
  came from.  In an SMP world this data is per-CPU and ideally private to each
  CPU outside of the context of debuggers.  This also included combining
  machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
  npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
  fields.
- The globaldata_register() function was renamed to pcpu_init() and now
  init's MI fields of a struct pcpu in addition to registering it with
  the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
  internal array and list.

Tested on:	alpha, i386
Reviewed by:	peter, jake
2001-12-11 23:33:44 +00:00
Guido van Rooij
f4029c1446 Fix boot -p for DDBless kernels
Pointed out by: John Hay <jhay@icomtek.csir.co.za>
2001-12-11 10:21:26 +00:00
Peter Wemm
b21d3f5c61 Wrap Dangerously Dedicated printf under if (bootverbose) 2001-12-11 05:35:43 +00:00
David E. O'Brien
071087f3d7 Missed an assignment of arg6 in previous commit. 2001-12-10 20:58:39 +00:00
David E. O'Brien
b45df7b4ae Adjust for the addition of CTR6. 2001-12-10 20:18:17 +00:00
Guido van Rooij
28703190c5 Add new boot flag to i386 boot: -p.
This flag adds a pausing utility. When ran with -p, during the kernel
probing phase, the kernel will pause after each line of output.
This pausing can be ended with the '.' key, and is automatically
suspended when entering ddb.

This flag comes in handy at systems without a serial port that either hang
during booting or reser.
Reviewed by:	(partly by jlemon)
MFC after:	1 week
2001-12-10 20:02:22 +00:00
David E. O'Brien
a48740b6c5 Update to C99, s/__FUNCTION__/__func__/. 2001-12-10 05:51:45 +00:00
David E. O'Brien
91f9161737 Repeat after me -- "Use of ANSI string concatenation can be bad."
In this case, C99's __func__ is properly defined as:

	static const char __func__[] = "function-name";

and GCC 3.1 will not allow it to be used in bogus string concatenation.
2001-12-10 05:40:12 +00:00
Alan Cox
604035c5f2 o Eliminate compilation warnings on 64-bit architectures. 2001-12-10 03:34:06 +00:00
Alan Cox
91369fc768 o Eliminate unnecessary synchronization from filt_aiodetach().
o The manual page for kevent says that EVFILT_AIO returns under the same
   conditions as aio_error().  With that in mind, set the data field
   of the returned struct kevent to the value that would be returned
   by aio_error().
 o Fix two compilation warnings.
2001-12-09 08:16:36 +00:00
Matthew Dillon
66a11b9fb1 Allow maxusers to be specified as 0 in the kernel config, which will
cause the system to auto-size to between 32 and 512 depending on the
amount of memory.

MFC after:	1 week
2001-12-09 01:57:09 +00:00
Matthew Dillon
a4233d5dc3 The nbuf calculation was assuming that PAGE_SIZE = 4096 bytes, which is
bogus.  The calculation has been adjusted to use units of kilobytes.

Noticed by: Chad David <davidc@acns.ab.ca>
MFC after:	1 week
2001-12-08 20:37:08 +00:00
Chad David
995a2227c5 Update the comment about System initialization to reflect the use of
DOMAIN_SET(9) instead of SYSINIT for adding domains at startup.

Reviewed by: alfred
2001-12-08 04:20:54 +00:00
Robert Watson
5a92ee3c00 o A few more minor whitespace and other style fixes.
Submitted by:	bde
2001-12-06 21:58:47 +00:00
Robert Watson
9147519a91 o Remove unnecessary inclusion of opt_global.h.
Submitted by:	bde
2001-12-06 21:55:41 +00:00
Robert Watson
65bbadfbbc o Make kern.security.bsd.suser_enabled TUNABLE.
Requested by:	green
2001-12-05 18:49:20 +00:00
Kirk McKusick
dd58224e31 Update pathnames for creation of tags file. 2001-12-05 01:23:21 +00:00
Robert Watson
5d476e73ce o Update an instance of 'unprivileged_procdebug_permitted' missed
in the previous commit: the comment should also call it
  'unprivileged_proc_debug'.
2001-12-03 19:10:21 +00:00
Robert Watson
011376308f o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
  pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
  so as to enforce these protections, in particular, in kern_mib.c
  protection sysctl access to the hostname and securelevel, as well as
  kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
  mib entries (osname, osrelease, osversion) so that they don't return
  a pointer to the text in the struct linux_prison, rather, a copy
  to an array passed into the calls.  Likewise, update linprocfs to
  use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
  accessing struct prison.

Reviewed by:	jhb
2001-12-03 16:12:27 +00:00
Robert Watson
4f5a4612d3 o Uniformly copy uap arguments into local variables before grabbing
giant, and make whitespace more consistent around giant-frobbing.
2001-12-02 15:22:56 +00:00
Robert Watson
f605567c24 o Remove KSE race in setuid() in which oldcred was preserved before giant
was grabbed.  This was introduced in 1.101 when the giant pushdown
  for kern_prot.c was originally performed.
2001-12-02 15:15:29 +00:00
Robert Watson
eb725b4e6a o General style, formatting, etc, improvements:
- uid's -> uids
	- whitespace improvements, linewrap improvements
	- reorder copyright more appropriately
	- remove redundant MP SAFE comments, add one "NOT MPSAFE?"
	  for setgroups(), which seems to be the sole un-changed system
	  call in the file.
	- clean up securelevel_g?() functions, improve comments.

Largely submitted by:	bde
2001-12-02 15:07:10 +00:00
Alfred Perlstein
59aff5fcf3 make LOCKF_DEBUG kernel option work (sorta)
Submitted by: Maxim Konovalov <maxim@macomnet.ru>
PR: kern/32267
2001-12-02 12:47:25 +00:00
Luigi Rizzo
60363fb9f7 vm/vm_kern.c: rate limit (to once per second) diagnostic printf when
you run out of mbuf address space.

kern/subr_mbuf.c: print a warning message when mb_alloc fails, again
	rate-limited to at most once per second. This covers other
	cases of mbuf allocation failures. Probably it also overlaps the
	one handled in vm/vm_kern.c, so maybe the latter should go away.

This warning will let us gradually remove the printf that are scattered
across most network drivers to report mbuf allocation failures.
Those are potentially dangerous, in that they are not rate-limited and
can easily cause systems to panic.

Unless there is disagreement (which does not seem to be the case
judging from the discussion on -net so far), and because this is
sort of a safety bugfix, I plan to commit a similar change to STABLE
during the weekend (it affects kern/uipc_mbuf.c there).

Discussed-with: jlemon, silby and -net
2001-12-01 00:21:30 +00:00
Robert Watson
6f3933fa6f o Introduce kern.security.bsd.unprivileged_read_msgbuf, which allows
the administrator to restrict access to the kernel message buffer.
  It defaults to '1', which permits access, but if set to '0', requires
  that the process making the sysctl() have appropriate privilege.
o Note that for this to be effective, access to this data via system
  logs derived from /dev/klog must also be limited.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2001-11-30 21:40:52 +00:00
Robert Watson
e409590d0e o Further sysctl name simplification, generally stripping 'permitted',
using '_'s more consistently.

Discussed with:	bde, jhb
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2001-11-30 21:33:16 +00:00
Robert Watson
48713bdc3c o Move current inhabitants of kern.security to kern.security.bsd, so
that new models can inhabit kern.security.<modelname>.
o While I'm there, shorten somewhat excessive variable names, and clean
  things up a little.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, NAI Labs
2001-11-30 20:58:31 +00:00
Robert Watson
1e4b531bb6 o Cache req->td->td_proc->p_ucred->cr_prison in pr to improve
readability.
o Conditionalize only the SYSCTL definitions for the regression
  tree, not the variables itself, decreasing the number of #ifdef
  REGRESSIONs scattered in kern_mib.c, and making the code more
  readable.

Sponsored by:	DARPA, NAI Labs
2001-11-28 21:22:05 +00:00
John W. De Boskey
a5f75648d8 Return a more meaningful errno when the length of the interpreter
exceeds MAXSHELLCMDLEN to avoid secondary /bin/sh execution.

Update execve man page to reflect change.

Increase MAXSHELLCMDLEN to a slightly more meaningful value.

PR:		kern/32106
Submitted by:	b@etek.chalmers.se
Reviewed by:	bsd
MFC after:	2 weeks
2001-11-28 03:26:58 +00:00
Peter Wemm
023a0e6100 Dont print the sysctl node tree unless you're root.
Found by:	jkb (Yahoo OS troublemaker)
2001-11-28 03:11:16 +00:00
Bosko Milekic
a705398be0 Context:
For an object type, we maintain a variable mb_mapfull. It is 0 by default
and is only raised to 1 in one place: when an mb_pop_cont() fails for
the first time, on the assumption that the reason for the failure is
due to the underlying map for the object (e.g. clust_map, mbuf_map) being
exhausted.

Problem and Changes:
Change how we define "mb_mapfull." It now means: "set to 1 when the first
mb_pop_cont() fails only in the kmem_malloc()-ing of the object, and
only if the call was with the M_TRYWAIT flag." This is a more conservative
definition and should avoid odd [but theoretically possible] situations
from occuring. i.e. we had set mb_mapfull to 1 thinking the map for the
object was actually exhausted when we _actually_ failed in malloc()ing
the space for the bucket structure managing the objects in the page
we're allocating.
2001-11-25 04:42:54 +00:00
Doug Rabson
c36e48514d Since we used '#ifdef __i386__', don't close with '#endif /* !__alpha__ */' 2001-11-24 10:11:14 +00:00
David E. O'Brien
d970bcc9db Remove the use of _PATH_DEV in the example.
The kernel certainly doesn't use _PATH_DEV or even /dev/ to find the device.
It cannot, since "/" has not been mounted.  Maybe the only affect of using
/dev/ is that it gets put in the mounted-from name for "/", so that mount(8),
etc., display an absolute path before "/" has been remounted.  Many have
never bothered typing the full path, and code that constructs a path in
rootdevnames[] never bothered to construct a full path, so the example
shouldn't have it.

Submitted by:	bde
2001-11-24 01:34:12 +00:00
Peter Wemm
fef8392d99 Recognize the "fixed" geometry in boot1 so that DD disks are not
interpreted as real fdisk tables (and fail).
2001-11-21 08:31:45 +00:00
David E. O'Brien
cabb03fc76 We only have slices on i386 and IA-64. 2001-11-20 23:48:00 +00:00
Maxim Sobolev
783c41d432 Make kevents on pipes work as described in the manpage - when the last
reader/writer disconnects, ensure that anybody who is waiting for the
kevent on the other end of the pipe gets EV_EOF.

MFC after:	2 weeks
2001-11-19 09:25:30 +00:00
Matthew Dillon
849948a7cd cast hashing index to (int)(intptr_t) for calculation.
mtx_init() with MTX_QUIET and MTX_NOWITNESS to avoid bogus warnings
2001-11-19 00:20:36 +00:00
Andrew R. Reiter
b489b4075c - Ensure that linker file id's are unique, rather than blindly
incrementing the value.

Reviewed by: dfr, peter
2001-11-18 18:19:35 +00:00
Matthew Dillon
b1e4abd246 Give struct socket structures a ref counting interface similar to
vnodes.  This will hopefully serve as a base from which we can
expand the MP code.  We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
2001-11-17 03:07:11 +00:00
Peter Wemm
1b27b1ad08 Fix some warnings on 64 bit platforms. 2001-11-17 00:42:02 +00:00
Peter Wemm
857ff6155b utime/stime.tv_sec are elapsed times, not relative to 1970. We can
safely print them as longs.  Even if ^T overflows after a process
has accumulated 68 years of user or system time, it is no big deal.
2001-11-17 00:26:57 +00:00
Peter Wemm
aa89942676 You cannot cast a time_t to quad_t and printf it with %lld. quad_t is
64 bits, not long long.
2001-11-16 23:53:48 +00:00
Ian Dowse
7b9716bad2 Fix a number of misspellings of "dependency" and "dependencies" in
comments and function names.

PR:		kern/8589
Submitted by:	Rajesh Vaidheeswarran <rv@fore.com>
2001-11-16 21:08:40 +00:00
Poul-Henning Kamp
24d5c95471 Back out the previous fix to the leading zero problem, I hadn't
noticed it in there already.  That should teach me to check exit
code from cvsup.
2001-11-16 17:07:47 +00:00
Poul-Henning Kamp
10786074c5 Reject leading zeros in dev_stdclone().
PR:		32019
Submitted by:	fenner
2001-11-16 17:05:07 +00:00
Josef Karthauser
9ea6d9ef6a Switch warnings and strict back on again in a way that's compatible
with -stable as well as -current.

Reviewed by:	imp
2001-11-16 02:02:42 +00:00
Bill Fenner
b519852a02 Do not allow leading zeros on device names in dev_stdclone().
PR:		kern/32019
Reviewed by:	phk
2001-11-15 23:27:46 +00:00
John Baldwin
21a7a9aeb6 Use MTX_QUIET for the lock operations during clock interrupts so their logs
don't drown out more useful log messages.
2001-11-15 19:54:48 +00:00
John Baldwin
f4076cc158 Add a couple of returns to making recovering from a failed witness_assert()
more sane in the RESTARTABLE_PANICS case.
2001-11-15 19:46:36 +00:00
John Baldwin
ba48b69a13 Remove definition of witness and comment stating that this file implements
witness.  Witness moved off to subr_witness.c a while ago.
2001-11-15 19:08:55 +00:00
Matthew Dillon
b064d43d8f remove holdfp()
Replace uses of holdfp() with fget*() or fgetvp*() calls as appropriate

introduce fget(), fget_read(), fget_write() - these functions will take
a thread and file descriptor and return a file pointer with its ref
count bumped.

introduce fgetvp(), fgetvp_read(), fgetvp_write() - these functions will
take a thread and file descriptor and return a vref()'d vnode.

*_read() requires that the file pointer be FREAD, *_write that it be
FWRITE.

This continues the cleanup of struct filedesc and struct file access
routines which, when are all through with it, will allow us to then
make the API calls MP safe and be able to move Giant down into the fo_*
functions.
2001-11-14 06:30:36 +00:00
Matthew Dillon
f286003909 Create a mutex pool API for short term leaf mutexes.
Replace the manual mutex pool in kern_lock.c (lockmgr locks) with the new API.
Replace the mutexes embedded in sxlocks with the new API.
2001-11-13 21:55:13 +00:00
John Baldwin
00f13cb353 As a followup to the previous fixes to inferior, revert some of the
changes in 1.80 that were needed for locking that are no longer needed now
that a lock is simply asserted.

Submitted by:	bde
2001-11-13 16:55:54 +00:00
Paul Saab
817805d9c9 Fix a signed bug in the crashdump code for systems with > 2GB of ram.
Reviewed by:	peter
2001-11-13 01:08:54 +00:00
Giorgos Keramidas
7377f0d190 Remove EOL whitespace.
Reviewed by:	alfred
2001-11-12 20:51:40 +00:00
Giorgos Keramidas
074df01866 Make KASSERT's print the values that triggered a panic.
Reviewed by:	alfred
2001-11-12 20:50:06 +00:00
John Baldwin
5b29d6e906 Clean up breakage in inferior() I introduced in 1.92 of kern_proc.c:
- Restore inferior() to being iterative rather than recursive.
- Assert that the proctree_lock is held in inferior() and change the one
  caller to get a shared lock of it.  This also ensures that we hold the
  lock after performing the check so the check can't be made invalid out
  from under us after the check but before we act on it.

Requested by:	bde
2001-11-12 18:56:49 +00:00
Peter Wemm
658c434d90 Commit the better version that I had a while ago. This has only one
reference to curthread.  (#define curproc (curthread->td_proc)).
2001-11-12 08:53:34 +00:00
Matthew Dillon
5b1927bc01 When curproc is used repeatedly store curproc into a local
variable to reduce generated code.  This is a test case.
2001-11-12 08:42:20 +00:00
Alfred Perlstein
f03e89de68 turn vn_open() into a wrapper around vn_open_cred() which allows
one to perform a vn_open using temporary/other/fake credentials.

Modify the nfs client side locking code to use vn_open_cred() passing
proc0's ucred instead of the old way which was to temporary raise
privs while running vn_open().  This should close the race hopefully.
2001-11-11 22:39:07 +00:00
Andrew R. Reiter
b49c67f03f - No need for resetting values to 0 when M_ZERO flag is used.
Approved: jhb
2001-11-10 21:36:56 +00:00
Ian Dowse
cca8f9808b Properly sanity-check the old msgbuf structure before we accept it
as being valid. Previously only the magic number and the virtual
address were checked, but it makes little sense to require that
the virtual address is the same (the message buffer is located at
the end of physical memory), and checks on the msg_bufx and msg_bufr
indices were missing.

Submitted by:	Bodo Rueskamp <br@clabsms.de>
Tripped over during a kernel debugging tutorial given by: grog
Reviewed by:	grog, dwmalone
MFC after:	1 week
2001-11-09 23:58:07 +00:00
Matthew Dillon
8ba1f55b49 Placemark an interrupt race in -current which is currently protected by
Giant.  -stable will get spl*() fixes for the race.

Reported by: Rob Anderson <rob@isilon.com>
MFC after:	0 days
2001-11-08 18:09:18 +00:00
Robert Watson
eacb362f8a o General style improvemnts.
Submitted by:	bde
2001-11-08 15:31:19 +00:00
Robert Watson
44a280a67e o Trim trailing whitespace from kern_mib.c, as suggested by bde. Good
grief.
2001-11-08 15:20:00 +00:00
Robert Watson
ce17880650 o Replace reference to 'struct proc' with 'struct thread' in 'struct
sysctl_req', which describes in-progress sysctl requests.  This permits
  sysctl handlers to have access to the current thread, permitting work
  on implementing td->td_ucred, migration of suser() to using struct
  thread to derive the appropriate ucred, and allowing struct thread to be
  passed down to other code, such as network code where td is not currently
  available (and curproc is used).

o Note: netncp and netsmb are not updated to reflect this change, as they
  are not currently KSE-adapted.

Reviewed by:		julian
Obtained from:	TrustedBSD Project
2001-11-08 02:13:18 +00:00
Peter Wemm
c3699b5f63 For what its worth, sync up the type of ps_arg_cache_max (unsigned long)
with the sysctl type (signed long).
2001-11-08 00:24:48 +00:00
Robert Watson
d3c9fa0463 o Cache the process's struct prison so as to create a more visually
appealing code structure.  In particular, s/req->p->p_ucred->cr_prison/pr/

Requested by:	imp, jhb, jake, other hangers on
2001-11-06 20:09:33 +00:00
Robert Watson
5c0c46c684 o Remove a tab missed in the previous whitespace commit. 2001-11-06 19:58:43 +00:00
Robert Watson
9afc1eee4f o Remove double-indentation of sysctl_kern_securelvl. This change is
consistent with the one other function in the file, and prevents long
  lines in up-coming changes.  This nominally pulls kern_mib.c a little
  further down the long path to style(9) compliance.
2001-11-06 19:56:58 +00:00
Andrew R. Reiter
22524ad0e2 o No need to set values to 0 when we utilize M_ZERO
Approved by: peter
2001-11-05 22:27:46 +00:00
Matthew Dillon
7e76bb562e Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking
in wdrain during a write.  This flag needs to be used in devices whos
strategy routines turn-around and issue another high level I/O, such as
when MD turns around and issues a VOP_WRITE to vnode backing store, in order
to avoid deadlocking the dirty buffer draining code.

Remove a vprintf() warning from MD when the backing vnode is found to be
in-use.  The syncer of buf_daemon could be flushing the backing vnode at
the time of an MD operation so the warning is not correct.

MFC after:	1 week
2001-11-05 18:48:54 +00:00
Robert Watson
149e39ea9e Update copyrights to include Thomas Moestl.
Submitted by:	"Ilmar S. Habibulin" <ilmar@watson.org>
Obtained from:	TrustedBSD Project
2001-11-05 15:36:24 +00:00
Poul-Henning Kamp
751a2cd05b Define a new mount flag "MNT_JAILDEVFS"
Collect the magic combination of flags which can be updated into
a macro in sys/mount.h rather than inlining them (twice!) in
vfs_syscalls.c
2001-11-05 10:33:45 +00:00
Matthew Dillon
6b8bd2efc1 Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount
structure changes now rather then piecemeal later on.  mnt_nvnodelist
currently holds all the vnodes under the mount point.  This will eventually
be split into a 'dirty' and 'clean' list.  This way we only break kld's once
rather then twice.  nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.
2001-11-04 18:55:42 +00:00
Peter Wemm
9aefe36fa6 *** empty log message *** 2001-11-04 18:22:48 +00:00
Poul-Henning Kamp
3165f068f3 Don't call cdevsw_add(). 2001-11-04 11:56:22 +00:00
Poul-Henning Kamp
20a3b67cb2 Rename the top 7 bits if disk minors to spare bits, rather than type bits. 2001-11-04 09:01:07 +00:00
Poul-Henning Kamp
b456f7e6b3 Don't choke on old sd%d.ctl devices.
Tripped over by:	Jos Backus <josb@cncdsl.com>
2001-11-03 23:21:00 +00:00
Peter Wemm
6c1534a73e _SIG_MAXSIG (128) is the highest legal signal. The arrays are offset
by one - see _SIG_IDX().  Revert part of my mis-correction in kern_sig.c
(but signal 0 still has to be allowed) and fix _SIG_VALID() (it was
rejecting ignal 128).
2001-11-03 13:26:15 +00:00
Peter Wemm
049954de94 Partial reversion of rev 1.138. kill and killpg allow a signal
argument of 0.  You cannot return EINVAL for signal 0.  This broke
(in 5 minutes of testing) at least ssh-agent and screen.

However, there was a bug in the original code.  Signal 128 is not
valid.

Pointy-hat to: des, jhb
2001-11-03 12:36:16 +00:00
Peter Wemm
e0234e53c6 FreeBSD/tahoe is not likely for a while. 2001-11-03 08:19:21 +00:00
Dag-Erling Smørgrav
2899d60638 We have a _SIG_VALID() macro, so use it instead of duplicating the test all
over the place.  Also replace a printf() + panic() with a KASSERT().

Reviewed by:	jhb
2001-11-02 23:50:00 +00:00
Robert Watson
fad8096565 o Remove (struct proc *p = td->td_proc) indirection in ipcperm(),
as suser_td(td) works as well as suser_xxx(NULL, p->p_ucred, 0);
  This simplifies upcoming changes to suser(), and causes this code
  to use the right credential (well, largely) once the td->td_ucred
  changes are complete.  There remains some redundancy and oddness
  in this code, which should be rethought after the next batch of
  suser and credential changes.
2001-11-02 21:20:05 +00:00
Warner Losh
bc5fc9140e Back out the -w, option strict and our($...). They don't work for me and
have broken the kernel build.
2001-11-02 21:14:17 +00:00
Robert Watson
cd778f0244 o Remove the local temporary variable "struct proc *p" from vfs_mount()
in vfs_syscalls.c.  Although it did save some indirection, many of
  those savings will be obscured with the impending commit of suser()
  changes, and the result is increased code complexity.  Also, once
  p->p_ucred and td->td_ucred are distinguished, this will make
  vfs_mount() use the correct thread credential, rather than the
  process credential.
2001-11-02 21:11:41 +00:00
Poul-Henning Kamp
0bd1a2d087 Argh!
patch added the nmount at the bottom first time around.

Take 3!
2001-11-02 19:12:06 +00:00
Robert Watson
db42a33d81 o Introduce group subset test, which limits the ability of a process to
debug another process based on their respective {effective,additional,
  saved,real} gid's.  p1 is only permitted to debug p2 if its effective
  gids (egid + additional groups) are a strict superset of the gids of
  p2.  This implements properly the security test previously incorrectly
  implemented in kern_ktrace.c, and is consistent with the kernel
  security policy (although might be slightly confusing for those more
  familiar with the userland policy).
o Restructure p_candebug() logic so that various results are generated
  comparing uids, gids, credential changes, and then composed in a
  single check before testing for privilege.  These tests encapsulate
  the "BSD" inter-process debugging policy.  Other non-BSD checks remain
  seperate.  Additional comments are added.

Submitted by:   tmm, rwatson
Obtained from:  TrustedBSD Project
Reviewed by:    petef, tmm, rwatson
2001-11-02 18:44:50 +00:00
Poul-Henning Kamp
bad699770a Add empty shell for nmount syscall (take 2!) 2001-11-02 18:35:54 +00:00
Poul-Henning Kamp
06d133c475 Add nmount() stub function and regenerate the syscall-glue which should
not need to check in generated files.
2001-11-02 17:59:23 +00:00
Poul-Henning Kamp
c60693dbd3 Reserve 378 for the new mount syscall Maxime Henrion <mux@qualys.com>
is working on.  (This is to get us more than 32 mountoptions).
2001-11-02 17:58:26 +00:00
Warner Losh
89bbe0cd1e Don't hide the failure to allocate device behind boot verbose. It is
still telling us of real problems so should remain until it stops
doing that.

Submitted by: OGAWA Takaya <t-ogawa@triaez.kaisei.org>
2001-11-02 17:33:06 +00:00
Jonathan Lemon
198475ebeb + Fix another possible vn_close race, in the same fashion as r1.95.
+ Check that the cached vnode type != VBAD before calling devsw(),
   this can happen if the vnode has been revoked.
2001-11-02 17:04:32 +00:00
Robert Watson
5fab7614f4 o Add a comment to p_candebug() noting that the P_INEXEC check should
really be moved elsewhere: p_candebug() encapsulates the security
  policy decision, whereas the P_INEXEC check has to do with "correctness"
  regarding race conditions, rather than security policy.

  Example: even if no security protections were enforced (the "uids are
  advisory" model), removing P_INEXEC could result in incorrect operation
  due to races on credential evaluation and modification during execve().

Obtained from:	TrustedBSD Project
2001-11-02 16:41:06 +00:00
Robert Watson
bb51af2816 Merge from POSIX.1e Capabilities development tree:
o Reorder and synchronize #include's, including moving "opt_cap.h" to
  above system includes.
o Introduce #ifdef'd kern.security.capabilities sysctl tree, including
  kern.security.capabilities.enabled, which defaults to 0.

The rest of the file remains stubs for the time being.

Obtained from:	TrustedBSD Project
2001-11-02 15:22:32 +00:00
Robert Watson
bcc0dc3dc7 Merge from POSIX.1e Capabilities development tree:
o POSIX.1e capabilities authorize overriding of VEXEC for VDIR based
  on CAP_DAC_READ_SEARCH, but of !VDIR based on CAP_DAC_EXECUTE.  Add
  appropriate conditionals to vaccess() to take that into account.
o Synchronization cap_check_xxx() -> cap_check() change.

Obtained from:	TrustedBSD Project
2001-11-02 15:16:59 +00:00
Robert Watson
4df571b101 o Capabilities cap_check() interface revised to remove _xxx, so rename
in p_cansched().  Also, replace '0' with 'NULL' for the ucred * pointer.

Obtained from:	TrustedBSD Project
2001-11-02 15:08:08 +00:00
Robert Watson
a76789e7df o Since kern_acl.c uses #ifdef CAPABILITIES to control
capability-specific semantics, #include "opt_cap.h".

Obtained from:	TrustedBSD Project
2001-11-02 14:53:04 +00:00
Poul-Henning Kamp
8dd72bc887 #ifdef KTRACE a variable to silence a warning.
Submitted by:	Maxime "mux" Henrion <mux@qualys.com>
2001-11-02 09:55:01 +00:00
Poul-Henning Kamp
a2d7281c5a Turn the symlinks around, instead of ad0s1 -> ad0s1c, make it ad0s1c -> ad0s1.
Requested by:	peter
2001-11-02 09:16:25 +00:00
Robert Watson
6d8785434f o Update copyright dates.
o Add reference to TrustedBSD Project in license header.
o Update dated comments, including comment in extattr.h claiming that
  no file systems support extended attributes.
o Improve comment consistency.
2001-11-01 21:37:07 +00:00
Robert Watson
fc5d29ef7d o Move suser() calls in kern/ to using suser_xxx() with an explicit
credential selection, rather than reference via a thread or process
  pointer.  This is part of a gradual migration to suser() accepting
  a struct ucred instead of a struct proc, simplifying the reference
  and locking semantics of suser().

Obtained from:	TrustedBSD Project
2001-11-01 20:56:57 +00:00
Mitsuru IWASAKI
f9390180fe Some fix for the recent apm module changes.
- Now that apm loadable module can inform its existence to other kernel
   components  (e.g. i386/isa/clock.c:startrtclock()'s TCS hack).
 - Exchange priority of SI_SUB_CPU and SI_SUB_KLD for above purpose.
 - Add simple arbitration mechanism for APM vs. ACPI.  This prevents
   the kernel enables both of them.
 - Remove obsolete `#ifdef DEV_APM' related code.
 - Add abstracted interface for Powermanagement operations.  Public apm(4)
   functions, such as apm_suspend(), should be replaced new interfaces.
   Currently only power_pm_suspend (successor of apm_suspend) is implemented.

Reviewed by:	peter, arch@ and audit@
2001-11-01 16:34:07 +00:00
Josef Karthauser
0c5d0f0eff Tidy up the variable declarations and switch on warnings and strict.
Reviewed by:	diffing the generated files from before and after the change.
2001-11-01 12:46:08 +00:00
Andrey A. Chernov
82849b4dfe Add new interface function
int	devclass_find_free_unit(devclass_t dc, int unit);
which return first free unit in given class starting from 'unit'.
2001-11-01 05:07:28 +00:00
Marcel Moolenaar
1245202150 Don't remove the tentative declaration. It's the only one...
Pointy hat: marcel (self-sponsoring)
2001-10-31 20:43:38 +00:00
Marcel Moolenaar
8b3e7871bc Make smp_started volatile in sys/smp.h and remove the volatile
declaration in subr_smp.c. This solves a compile problem with
gcc 3.0.1 (ia64 cross-build).

Reviewed: jhb
2001-10-31 09:03:05 +00:00
Brian Feldman
bb9fe9dd9e Add the sysctl "kern.function_list", which currently exports all
function symbols in the kernel in a list of C strings, with an extra
nul-termination at the end.

This sysctl requires addition of a new linker operation.  Now,
linker_file_t's need to respond to "each_function_name" to export
their function symbols.

Note that the sysctl doesn't currently allow distinguishing multiple
symbols with the same name from different modules, but could quite
easily without a change to the linker operation.  This will be a nicety
to have when it can be used.

Obtained from:	NAI Labs CBOSS project
Funded by:	DARPA
2001-10-30 15:21:45 +00:00
Brian Feldman
08d68dda08 Also, machine/profile.h should be necessary for the function prototype
of kmupetext().
2001-10-30 15:10:16 +00:00
Brian Feldman
f99502a4d4 Use kmupetext() for ELF KLDs to allow for increased text segment size.
Obtained from:	NAI Labs CBOSS project
Funded by:	DARPA
2001-10-30 15:08:51 +00:00
Brian Feldman
4a44bd4b4a Add kmupetext(), a function that expands the range of memory covered
by the profiler on a running system.  This is not done sparsely, as
memory is cheaper than processor speed and each gprof mcount() and
mexitcount() operation is already very expensive.

Obtained from:	NAI Labs CBOSS project
Funded by:	DARPA
2001-10-30 15:04:57 +00:00
Julian Elischer
48810023a3 Use the thread we have instead of finding another
that may be the wrong one.
2001-10-30 07:15:46 +00:00
David Malone
12396bdca7 When scanning for control messages, don't process the data mbufs.
This could cause hangs if a unix domain socket was closed with data
still to be read from it.

Tested by:	Andrea Campi <andrea@webcom.it>
2001-10-29 20:04:03 +00:00
Matthew Dillon
434d21ccbf Make ttyprintf() of tv_sec value type agnostic. 2001-10-29 01:23:28 +00:00
Andrey A. Chernov
e9c044bd9e 1) In devclass_alloc_unit(), skip duplicated wired devices (i.e. with fixed
number) instead of allocating next free unit for them.  If someone needs
fixed place, he must specify it correctly. "Allocating next" is especially bad
because leads to double device detection and to "repeat make_dev panic" as
result.  This can happens if the same devices present somewhere on PCI bus,
hints and  ACPI.  Making them present in one place only not always
possible, "sc" f.e.  can't be removed from hints, it results to no console at
all.

2) In make_device(), detect when devclass_add_device() fails, free dev and
return. I.e. add missing error checking. This part needed to finish fix in 1),
but must be done this way in anycase, with old variant too.
2001-10-28 23:32:35 +00:00
Matthew Dillon
0e9fe2127c Adjust printfs to be time_t agnostic. 2001-10-28 22:53:45 +00:00
Poul-Henning Kamp
4e13006747 Fix a problem in the disk related hack where device nodes for a physically
non-existent disk in a legacy /dev on a DEVFS system would panic the system
if stat(2)'ed.

Do not whine about anonymous device nodes not having a si_devsw, they're
not supposed to.
2001-10-28 09:39:28 +00:00
Michael Reifenberger
491dec936c Introduce [IPC|SHM]_[INFO|STAT] to shmctl to make
`/compat/linux/usr/bin/ipcs -m` happy.
2001-10-28 09:29:10 +00:00
Matthew Dillon
4ffa210b94 syncdelay, filedelay, dirdelay, metadelay are ints, not time_t's,
and can also be made static.
2001-10-27 19:58:56 +00:00
Poul-Henning Kamp
4e4a76633b Nudge the axe a bit closer to cdevsw[]:
Make it a panic to repeat make_dev() or destroy_dev(), this check
   should maybe be neutered when -current goes -stable.

   Whine if devsw() is called on anon dev_t's in a devfs system.

   Make a hack to avoid our lazy-eval disk code triggering the above whine.

   Fix the multiple make_dev() in disk code by making ${disk}${unit}s${slice}
   an alias/symlink to ${disk}${unit}s${slice}c
2001-10-27 17:44:21 +00:00
Dag-Erling Smørgrav
9ca45e813c Add a P_INEXEC flag that indicates that the process has called execve() and
it has not yet returned.  Use this flag to deny debugging requests while
the process is execve()ing, and close once and for all any race conditions
that might occur between execve() and various debugging interfaces.

Reviewed by:	jhb, rwatson
2001-10-27 11:11:25 +00:00
Robert Watson
48be932ac0 o Update copyright dates.
Obtained from:	TrustedBSD Project
2001-10-27 05:46:43 +00:00
Robert Watson
fdba6d3a1e o Improve style(9) compliance following KSE modifications. In particular,
strip the space from '( struct thread *...', wrap long lines.
o Remove an unneeded comment on the topic of no lock being required as
  part of the NDINIT() in __acl_get_file(), as it's really not required
  there.

Obtained from:	TrustedBSD Project
2001-10-27 05:45:42 +00:00
Matthew Dillon
d23f5958bc Add mtx_lock_giant() and mtx_unlock_giant() wrappers for sysctl management
of Giant during the Giant unwinding phase, and start work on instrumenting
Giant for the file and proc mutexes.

These wrappers allow developers to turn on and off Giant around various
subsystems.  DEVELOPERS SHOULD NEVER TURN OFF GIANT AROUND A SUBSYSTEM JUST
BECAUSE THE SYSCTL EXISTS!  General developers should only considering
turning on Giant for a subsystem whos default is off (to help track down
bugs).  Only developers working on particular subsystems who know what
they are doing should consider turning off Giant.

These wrappers will greatly improve our ability to unwind Giant and test
the kernel on a (mostly) subsystem by subsystem basis.   They allow Giant
unwinding developers (GUDs) to emplace appropriate subsystem and structural
mutexes in the main tree and then request that the larger community test
the work by turning off Giant around the subsystem(s), without the larger
community having to mess around with patches.  These wrappers also allow
GUDs to boot into a (more likely to be) working system in the midst of
their unwinding work and to test that work under more controlled
circumstances.

There is a master sysctl, kern.giant.all, which defaults to 0 (off).  If
turned on it overrides *ALL* other kern.giant sysctls and forces Giant to
be turned on for all wrapped subsystems.  If turned off then Giant around
individual subsystems are controlled by various other kern.giant.XXX sysctls.

Code which overlaps multiple subsystems must have all related subsystem Giant
sysctls turned off in order to run without Giant.
2001-10-26 20:48:04 +00:00
John Baldwin
282873e2c0 - Change the taskqueue locking to protect the necessary parts of a task
while it is on a queue with the queue lock and remove the per-task locks.
- Remove TASK_DESTROY now that it is no longer needed.
- Go back to inlining TASK_INIT now that it is short again.

Inspired by:	dfr
2001-10-26 18:46:48 +00:00
Poul-Henning Kamp
5f7806ab69 Make cdevsw[] static. 2001-10-26 15:31:22 +00:00
John Baldwin
8e2e767b1f Add a per-thread ucred reference for syscalls and synchronous traps from
userland.  The per thread ucred reference is immutable and thus needs no
locks to be read.  However, until all the proc locking associated with
writes to p_ucred are completed, it is still not safe to use the per-thread
reference.

Tested on:	x86 (SMP), alpha, sparc64
2001-10-26 08:12:54 +00:00
John Baldwin
1de1c550b1 Add locking to taskqueues. There is one mutex per task, one mutex per
queue, and a mutex to protect the global list of taskqueues.  The only
visible change is that a TASK_DESTROY() macro has been added to mirror
the TASK_INIT() macro to destroy a task before it is free'd.

Submitted by:	Andrew Reiter <awr@watson.org>
2001-10-26 06:32:21 +00:00
John Baldwin
40c6d2be16 Use msleep() to avoid lost wakeup's instead of doing an ineffective
splhigh() before the mtx_unlock and tsleep().  The splhigh() was probably
correct in the original code using simplelocks but is not correct in
5.0-current.

Noticed by:	Andrew Reiter <awr@FreeBSD.org>
2001-10-26 06:09:01 +00:00
Matthew Dillon
245df27cee Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a
real effect.

Optimize vfs_msync().  Avoid having to continually drop and re-obtain
mutexes when scanning the vnode list.  Improves looping case by 500%.

Optimize ffs_sync().  Avoid having to continually drop and re-obtain
mutexes when scanning the vnode list.  This makes a couple of assumptions,
which I believe are ok, in regards to vnode stability when the mount list
mutex is held.  Improves looping case by 500%.

(more optimization work is needed on top of these fixes)

MFC after:	1 week
2001-10-26 00:08:05 +00:00
Matthew Dillon
f92dcd3e4a Add missing TAILQ_INSERT_TAIL's which somehow didn't get comitted with
the recent vnode cleanup.
2001-10-25 23:13:56 +00:00
Matthew Dillon
f02098e59c In cluster_rbuild(), 'size' had better match buf->b_bcount and buf->b_bufsize
or the cluster will not be properly merged.  Dup the code from
cluster_wbuild() and add some printf()s to see if bad cases are present.

MFC after:	2 weeks
2001-10-25 22:49:48 +00:00
John Baldwin
5a08b84f83 Fix an inverted test csae. Success of getenv() is determined by a return
value of !NUL rather than NUL.

Submitted by:	luigi
Pointy hat to:	jhb
2001-10-25 17:22:31 +00:00
Jonathan Lemon
18bfd58110 cnclose() can potentially race against itself. To avoid vn_close() races,
NULL-out cnd_vp before calling the latter, as it may block.

Submitted by: dillon
2001-10-25 04:51:37 +00:00
Jonathan Lemon
7ce26133ea Force FWRITE on when opening the console, so that the flags passed to
vn_close match those from vn_open.  This fixes the panic some people
were seeing about "vrele: missed vn_close".
2001-10-25 00:14:16 +00:00
John Baldwin
882bcf5879 Document the requirements and nature of the logical CPU IDs. It isn't
very strict and leaves much up to the platform so that it can define a
convenient mapping.

Requested by:	mjacob
2001-10-24 22:15:38 +00:00
Matthew Dillon
a06fe5111e unwind v_writecount in fhopen() if we are unable to allocate the
descriptor.

MFC after:	3 days
2001-10-24 18:32:17 +00:00
John Baldwin
781a35df6b Fix this to actually compile in the !INVARIANTS case.
Reported by:	Maxime Henrion <mux@qualys.com>
2001-10-24 14:18:33 +00:00
Robert Drehmel
9a024fc559 Use vm_offset_t instead of caddr_t to fix a warning and remove
two casts.
2001-10-24 14:15:28 +00:00
Matthew Dillon
79deba82cd Fix ktrace enablement/disablement races that can result in a vnode
ref count panic.

Bug noticed by:	ps
Reviewed by:	ps
MFC after:	1 day
2001-10-24 01:05:39 +00:00
John Baldwin
4e5e677bc0 Change the sx(9) assertion API to use a sx_assert() function similar to
mtx_assert(9) rather than several SX_ASSERT_* macros.
2001-10-23 22:39:11 +00:00
John Baldwin
21cbf0cc8b - Change getenv_quad() to return an int instead of a quad_t since it
returns an success/failure code rather than the actual value.
- Add getenv_string() which copies a string from the environment to another
  string and returns true on success.
2001-10-23 22:34:36 +00:00
Jonathan Lemon
991f976036 Implement multiple low-level console support. 2001-10-23 20:25:50 +00:00
Robert Watson
fc2749a40c o vn_open() fails to call VOP_CLOSE() if vfs_object_create fails. Ideally
all successful calls to VOP_OPEN() might be reflected in a call to
  VOP_CLOSE().  For now, simply add a comment reflecting this problem;
  this should be fixed at some point.
2001-10-23 19:09:01 +00:00
John Baldwin
ac9a258074 Assert that Giant is not held in mi_switch() unless the process state
is SMTX or SRUN.
2001-10-23 17:52:49 +00:00
Matthew Dillon
4f467cb8c1 Fix incorrect double-termination of vm_object. When a vm_object is
terminated and flushes pending dirty pages it is possible for the
object to be ref'd (0->1) and then deref'd (1->0) during termination.
We do not terminate the object a second time.

Document vop_stdgetvobject() to explicitly allow it to be called without
the vnode interlock held (for upcoming sync_msync() and ffs_sync()
performance optimizations)

MFC after:	3 days
2001-10-23 01:23:41 +00:00
Matthew Dillon
c72ccd014d Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after:	3 days
2001-10-23 01:21:29 +00:00
Poul-Henning Kamp
5015bb7f85 disk_clone() was a bit too eager to please: "md0s1ec" is not a valid
device.

Noticed by:	Chad David <davidc@acns.ab.ca>
2001-10-22 10:18:45 +00:00
Dag-Erling Smørgrav
7c62990641 Move procfs_* from procfs_machdep.c into sys_process.c, and rename them to
proc_* in the process; procfs_machdep.c is no longer needed.

Run-tested on i386, build-tested on Alpha, untested on other platforms.
2001-10-21 23:57:24 +00:00
Dag-Erling Smørgrav
45fb069ac9 Convert textvp_fullpath() into the more generic vn_fullpath() which takes a
struct thread * and a struct vnode * instead of a struct proc *.

Temporarily add a textvp_fullpath macro for compatibility.
2001-10-21 15:52:51 +00:00
Matthew Dillon
5eb13f768c Documentation
MFC after:	1 day
2001-10-21 06:26:55 +00:00
Matthew Dillon
57601bcb5d Syntax cleanup and documentation, no operational changes.
MFC after:	1 day
2001-10-21 06:12:06 +00:00
Ian Dowse
72ec63a53d Introduce some jitter to the timing of the samples that determine
the system load average. Previously, the load average measurement
was susceptible to synchronisation with processes that run at
regular intervals such as the system bufdaemon process.

Each interval is now chosen at random within the range of 4 to 6
seconds. This large variation is chosen so that over the shorter
5-minute load average timescale there is a good dispersion of
samples across the 5-second sample period (the time to perform 60
5-second samples now has a standard deviation of approx 4.5 seconds).
2001-10-20 16:07:17 +00:00
Ian Dowse
0eb6ce3169 Move the code that computes the system load average from vm_meter.c
to kern_synch.c in preparation for adding some jitter to the
inter-sample time.

Note that the "vm.loadavg" sysctl still lives in vm_meter.c which
isn't the right place, but it is appropriate for the current (bad)
name of that sysctl.

Suggested by:	jhb (some time ago)
Reviewed by:	bde
2001-10-20 13:10:43 +00:00
John Baldwin
7ada587697 The mtx_init() and sx_init() functions bzero'd locks before handing them
off to witness_init() making the check for double intializating a lock by
testing the LO_INITIALIZED flag moot.  Workaround this by checking the
LO_INITIALIZED flag ourself before we bzero the lock structure.
2001-10-20 01:22:42 +00:00
Peter Wemm
259ed91740 Add a sysctl for preventing the sync() in panic() recovery. This can
be so dangerous it isn't funny.  eg: if you panic inside NFS or softdep,
and then try and sync you run into held locks and cause either deadlocks,
recursive panics or other interesting chaos.  Default is unchanged.
2001-10-19 23:32:03 +00:00
Jonathan Lemon
7e7c3f3f33 Add dev_named(dev, name), which is similar in spirit to devtoname().
This function returns success if the device is known by either 'name'
or any of its aliases.
2001-10-17 18:47:12 +00:00
Matthew Dillon
2210e5d9fa fix minor bug in kern.minvnodes sysctl. Use OID_AUTO. 2001-10-16 23:08:09 +00:00
Robert Watson
ab323a7d45 o Update init_sysent.c and friends for allocation of afs_syscall. 2001-10-13 13:30:21 +00:00
Robert Watson
b55abfd929 o Reserve system call 377 for afs_syscall; by reserving a system call
number, portable OpenAFS applications don't have to attempt to determine
  what system call number was dynamically allocated.  No system call
  prototype or implementation is defined.

Requested by:	Tom Maher <tardis@watson.org>
2001-10-13 13:19:34 +00:00
Poul-Henning Kamp
ce9d2b59b2 Regenerate syscall stuff.
Remove syscall-hide.h
2001-10-13 09:18:28 +00:00
Poul-Henning Kamp
5ab1bfacb1 Don't generate <sys/syscalls-hide.h> it has never had any users anywhere in
the source tree.
2001-10-13 09:17:49 +00:00
Peter Pentchev
88fbb423d4 Remove the panic when trying to register a sysctl with an oid too high.
This stops panics on unloading modules which define their own sysctl sets.

However, this also removes the protection against somebody actually
defining a static sysctl with an oid in the range of the dynamic ones,
which would break badly if there is already a dynamic sysctl with
the requested oid.

Apparently, the algorithm for removing sysctl sets needs a bit more work.
For the present, the panic I introduced only leads to Bad Things (tm).

Submitted by:	many users of -current :(
Pointy hat to:	roam (myself) for not testing rev. 1.112 enough.
2001-10-12 09:16:36 +00:00
John Baldwin
a2f2b3afcd - Catch up to the new ucred API.
- Add proc locking to the jail() syscall.  This mostly involved shuffling
  a few things around so that blockable things like malloc and copyin
  were performed before acquiring the lock and checking the existing
  ucred and then updating the ucred as one "atomic" change under the proc
  lock.
2001-10-11 23:39:43 +00:00
John Baldwin
bd78cece5d Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to
  another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's
  refcount is > 1 and false (0) otherwise.
2001-10-11 23:38:17 +00:00
John Baldwin
698166ca55 Whitespace fixes. 2001-10-11 22:49:27 +00:00
John Baldwin
6a90c862d3 Rework some code to be a bit simpler by inverting a few tests and using
else clauses instead of goto's.
2001-10-11 22:48:37 +00:00
John Baldwin
61d80e90a9 Add missing includes of sys/ktr.h. 2001-10-11 17:53:43 +00:00
John Baldwin
7106ca0d1a Add missing includes of sys/lock.h. 2001-10-11 17:52:20 +00:00
Michael Reifenberger
91a701cd13 Fix SysV Semaphore Handling.
Updated by peter following KSE and Giant pushdown.
I've running with this patch for two week with no ill side effects.

PR:		kern/12014: Fix SysV Semaphore handling
Submitted by:	Peter Jeremy <peter.jeremy@alcatel.com.au>
2001-10-11 08:15:14 +00:00
Paul Saab
cbc89bfbfe Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader
tunable.

Reviewed by:	peter
MFC after:	2 weeks
2001-10-10 23:06:54 +00:00
John Baldwin
f21fc12736 Add a temporary hack that will go away with the ucred API update to bzero
the duplicated mutex before initializing it to avoid triggering the check
for init'ing an already initialized mutex.
2001-10-10 20:45:40 +00:00
John Baldwin
6a40eccec3 Malloc mutexes pre-zero'd as random garbage (including 0xdeadcode) my
trigget the check to make sure we don't initalize a mutex twice.
2001-10-10 20:43:50 +00:00
Doug Rabson
e913ca22e2 Move setregs() out from under the PROC_LOCK so that it can use functions
list suword() which may trap.
2001-10-10 20:04:57 +00:00
Robert Watson
8a7d8cc675 - Combine kern.ps_showallprocs and kern.ipc.showallsockets into
a single kern.security.seeotheruids_permitted, describes as:
  "Unprivileged processes may see subjects/objects with different real uid"
  NOTE: kern.ps_showallprocs exists in -STABLE, and therefore there is
  an API change.  kern.ipc.showallsockets does not.
- Check kern.security.seeotheruids_permitted in cr_cansee().
- Replace visibility calls to socheckuid() with cr_cansee() (retain
  the change to socheckuid() in ipfw, where it is used for rule-matching).
- Remove prison_unpcb() and make use of cr_cansee() against the UNIX
  domain socket credential instead of comparing root vnodes for the
  UDS and the process.  This allows multiple jails to share the same
  chroot() and not see each others UNIX domain sockets.
- Remove unused socheckproc().

Now that cr_cansee() is used universally for socket visibility, a variety
of policies are more consistently enforced, including uid-based
restrictions and jail-based restrictions.  This also better-supports
the introduction of additional MAC models.

Reviewed by:	ps, billf
Obtained from:	TrustedBSD Project
2001-10-09 21:40:30 +00:00
John Baldwin
8688bb9383 proces -> process in a comment. 2001-10-09 17:25:30 +00:00
Robert Watson
32d186043b o Recent addition of (p1==p2) exception in p_candebug() permitted
processes to attach debugging to themselves even though the
  global kern_unprivileged_procdebug_permitted policy might disallow
  this.
o Move the kern_unprivileged_procdebug_permitted check above the
  (p1==p2) check.

Reviewed by:	des
2001-10-09 16:56:29 +00:00
John Baldwin
74e4502e62 Replace 'curproc' with 'td->td_proc'. 2001-10-08 21:05:46 +00:00
Matthew Dillon
917efbaaba WS Cleanup 2001-10-08 19:51:13 +00:00
Dag-Erling Smørgrav
3da3249106 Dissociate ptrace from procfs.
Until now, the ptrace syscall was implemented as a wrapper that called
various functions in procfs depending on which ptrace operation was
requested.  Most of these functions were themselves wrappers around
procfs_{read,write}_{,db,fp}regs(), with only some extra error checks,
which weren't necessary in the ptrace case anyway.

This commit moves procfs_rwmem() from procfs_mem.c into sys_process.c
(renaming it to proc_rwmem() in the process), and implements ptrace()
directly in terms of procfs_{read,write}_{,db,fp}regs() instead of
having it fake up a struct uio and then call procfs_do{,db,fp}regs().

It also moves the prototypes for procfs_{read,write}_{,db,fp}regs()
and proc_rwmem() from proc.h to ptrace.h, and marks all procfs files
except procfs_machdep.c as "optional procfs" instead of "standard".
2001-10-07 20:08:42 +00:00
Dag-Erling Smørgrav
23fad5b6c9 Always succeed if the target process is the same as the requesting process. 2001-10-07 20:06:03 +00:00
Ian Dowse
80f42b555d Fix a typo in do_sigaction() where sa_sigaction and sa_handler were
confused. Since sa_sigaction and sa_handler alias each other in a
union, the bug was completely harmless. This had been fixed as part
of the SIGCHLD changes in revision 1.125, but it was reverted when
they were backed out in revision 1.126.
2001-10-07 16:11:37 +00:00
Robert Watson
c175d2226f o Introduce an 'options REGRESSION'-dependant sysctl namespaces,
'regression.*'.
o Add 'regression.securelevel_nonmonotonic', conditional on 'options
  REGRESSION', which allows the securelevel to be lowered for the purposes
  of efficient regression testing of securelevel policy decisions.
  Regression tests for securelevels will be committed shortly.

NOTE: 'options REGRESSION' should never be used on production machines, as
it permits violation of system invariants so as to improve the ability to
effectively test edge cases, and improve testing efficiency.
2001-10-07 03:51:22 +00:00
Marcel Moolenaar
49ead724c6 Fix breakage caused by previous commit. The lkmnosys and lkmressys
syscalls are of type NODEF but not in a way that fits the given
definition of that type. The exact difference of lkmressys and
lkmnosys is unclear, which makes it all the more confusing. A
reevaluation of what we have and what we really need is in order.

Spotted by: Maxime Henrion <mux@qualys.com>
Pointy hat: marcel
2001-10-07 00:16:31 +00:00
Matthew Dillon
845bd795c9 vinvalbuf() was only waiting for write-I/O to complete. It really has to
wait for both read AND write I/O to complete.  Only NFS calls vinvalbuf()
on an active vnode (when the server indicates that the file is stale), so
this bug fix only effects NFS clients.

MFC after:	3 days
2001-10-05 20:10:32 +00:00
John Baldwin
43150722c9 The aio kthreads start off with a root credential just like all other
kthreads, so don't malloc a ucred just so we can create a duplicate of the
one we already have.
2001-10-05 17:55:11 +00:00
Paul Saab
4787fd37af Only allow users to see their own socket connections if
kern.ipc.showallsockets is set to 0.

Submitted by:	billf (with modifications by me)
Inspired by:	Dave McKay (aka pm aka Packet Magnet)
Reviewed by:	peter
MFC after:	2 weeks
2001-10-05 07:06:32 +00:00
Dag-Erling Smørgrav
50f74e92b8 Final style(9) commit: placement of opening brace; a continuation indent I
missed in the previous commit; a line that exceeded 80 characters.  No
functional changes, but the object file's md5 checksum changes because some
lines have been displaced.
2001-10-04 16:35:44 +00:00
Dag-Erling Smørgrav
8a8d4e459c More style(9) fixes: no spaces between function name and parameter list;
some indentation fixes (particularly continuation lines).

Reviewed by:	md5(1)
2001-10-04 16:29:45 +00:00
Dag-Erling Smørgrav
c5799337ea This file had a mixture of "return foo;" and "return (foo);"; standardize
on "return (foo);" as mandated by style(9).

Reviewed by:	md5(1)
2001-10-04 16:09:22 +00:00
David Malone
2bc21ed985 Hopefully improve control message passing over Unix domain sockets.
1) Allow the sending of more than one control message at a time
over a unix domain socket. This should cover the PR 29499.

2) This requires that unp_{ex,in}ternalize and unp_scan understand
mbufs with more than one control message at a time.

3) Internalize and externalize used to work on the mbuf in-place.
This made life quite complicated and the code for sizeof(int) <
sizeof(file *) could end up doing the wrong thing. The patch always
create a new mbuf/cluster now. This resulted in the change of the
prototype for the domain externalise function.

4) You can now send SCM_TIMESTAMP messages.

5) Always use CMSG_DATA(cm) to determine the start where the data
in unp_{ex,in}ternalize. It was using ((struct cmsghdr *)cm + 1)
in some places, which gives the wrong alignment on the alpha.
(NetBSD made this fix some time ago).

This results in an ABI change for discriptor passing and creds
passing on the alpha. (Probably on the IA64 and Spare ports too).

6) Fix userland programs to use CMSG_* macros too.

7) Be more careful about freeing mbufs containing (file *)s.
This is made possible by the prototype change of externalise.

PR:		29499
MFC after:	6 weeks
2001-10-04 13:11:48 +00:00
David Malone
59bdd40568 Allow sbcreatecontrol to make cluster sized control messages. 2001-10-04 12:59:53 +00:00
John Baldwin
0479e3d339 Move the ap boot spin lock earlier in the lock order before the sio(4)
lock since we occasionally call printf() while holding the ap boot lock
which can call down into the sio(4) driver if using a serial console.
2001-10-01 22:50:30 +00:00
Robert Watson
c6ab2f6b4e o Complete the migration from suser error checking in the following form
in vfs_syscalls.c:

    if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid &&
        (error = suser_td(td)) != 0) {
            unwrap_lots_of_stuff();
            return (error);
    }

  to:

    if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid) {
            error = suser_td(td);
            if (error) {
                unwrap_lots_of_stuff();
                return (error);
            }
    }

  This makes the code more readable when complex clauses are in use,
  and minimizes conflicts for large outstanding patchsets modifying the
  kernel authorization code (of which I have several), especially where
  existing authorization and context code are combined in the same if()
  conditional.

Obtained from:	TrustedBSD Project
2001-10-01 20:01:07 +00:00
Matthew Dillon
b5810bab2d After extensive testing it has been determined that adding complexity
to avoid removing higher level directory vnodes from the namecache has
no perceivable effect and will be removed.  This is especially true
when vmiodirenable is turned on, which it is by default now.  ( vmiodirenable
makes a huge difference in directory caching ).  The vfs.vmiodirenable and
vfs.nameileafonly sysctls have been left in to allow further testing, but
I expect to rip out vfs.nameileafonly soon too.

I have also determined through testing that the real problem with numvnodes
getting too large is due to the VM Page cache preventing the vnode from
being reclaimed.  The directory stuff made only a tiny dent relative
to Poul's original code, enough so that some tests succeeded.  But tests
with several million small files show that the bigger problem is the VM Page
cache.  This will have to be addressed by a future commit.

MFC after:	3 days
2001-10-01 04:33:35 +00:00
Jonathan Lemon
1a6fc8ef63 When FREE()ing kqueue related structures, charge them to the correct bucket.
Submitted by: iedowse
Forgotten by: jlemon
2001-09-30 17:00:56 +00:00
Bosko Milekic
70a61707f6 Re-enable mbtypes statistics in the mbuf allocator. I disabled these
when I changed the allocator bits. This implements per-CPU mbtypes
stats by keeping net number of decrements/increments of a given mbtype
per-CPU and then summing all of the per-CPU mbtypes to produce the total
net number of allocated mbufs of the given mbtype.
Counters are carefully balanced to avoid/prevent underflows/overflows.

mbtypes stats are re-enabled with the idea that we may occasionally
(although very rarely) observe slight inconsistencies in the stat
reporting. Most of the time, we should be fine, though.

Also make appropriate modifications to netstat(1) and systat(1) to do
the necessary reporting.

Submitted by: Jiangyi Liu <jyliu@163.net>
2001-09-30 01:58:39 +00:00