Commit Graph

6109 Commits

Author SHA1 Message Date
Jeffrey Hsu
67c0ddef59 Remove vestiges of no longer needed unp_rvnode field.
Approved by:	phk (who originally added it in rev 1.8 of unpcb.h)
2003-02-06 01:34:43 +00:00
Julian Elischer
822ded67fe The lockmanager has to keep track of locks per thread, not per process.
Submitted by:	david Xu (davidxu@)
Reviewed by:	jhb@
2003-02-05 19:36:58 +00:00
Dag-Erling Smørgrav
c524b1a8cf Correct grammatical error in previous commit. 2003-02-04 18:47:17 +00:00
Dag-Erling Smørgrav
91dd013b1e Extra precautions before trying to start init(8). 2003-02-04 18:16:50 +00:00
Poul-Henning Kamp
6334a66358 Implement proper bounds-checking and truncation of device names, this has
become an issue now that end-user controlable attributes can become devices
names with the geom_vol_ffs class.
2003-02-04 11:04:26 +00:00
Poul-Henning Kamp
237d2765f9 Pave the road to removing the fixed size limit on device nodes:
Change the si_name of dev_t's to be a char * and put a private buffer for
holding the name at then end of the struct.

Initialize si_name to point to the private buffer.

Put a KASSERT in geom_disk to prevent overrun on the fake dev_t we still
have to generate for the disk_drivers.
2003-02-04 10:32:40 +00:00
Poul-Henning Kamp
8751a8c73b Add vsnrprintf() which is just like vsnprintf() but takes a "radix"
argument for the kernel-special %r format.
2003-02-04 10:00:34 +00:00
Poul-Henning Kamp
91f1c2b3cc Split the global timezone structure into two integer fields to
prevent the compiler from optimizing assignments into byte-copy
operations which might make access to the individual fields non-atomic.

Use the individual fields throughout, and don't bother locking them with
Giant: it is no longer needed.

Inspired by:    tjr
2003-02-03 19:49:35 +00:00
Jake Burkholder
238dd3209a Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway.  This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by:	jhb, tmm
Tested on:	i386, sparc64
2003-02-03 17:53:15 +00:00
Hajimu UMEMOTO
12e4397ea3 Break out the bind and connect syscalls to intend to make calling
these syscalls internally easy.
This is preparation for force coming IPv6 support for Linuxlator.

Submitted by:	dwmalone
MFC after:	10 days
2003-02-03 17:36:52 +00:00
Tim J. Robbins
b338d59fef No need to lock Giant around call to nanosleep1() in nanosleep(). 2003-02-03 15:31:57 +00:00
Tim J. Robbins
411c25edae Avoid holding Giant across copyout() in gettimeofday() and getitimer(). 2003-02-03 14:47:22 +00:00
Hartmut Brandt
1b978d453b Make the variable types, the sysctl macros and the sysctl handler for
kern.ipc.{maxsockbuf,sockbuf_waste_factor} to agree that those variables
are of type unsigned long.

PR:		sparc64/47389
Approved by:	jake (mentor)
2003-02-03 06:50:59 +00:00
Jeff Roberson
5d7ef00cfe - Make some context switches conditional on SCHED_STRICT_RESCHED. This may
have some negative effect on interactivity but it yields great perf. gains.
   This also brings the conditions under which ULE context switches inline
   with SCHED_4BSD.
 - Define some new kseq_* functions for manipulating the run queue.
 - Add a new kseq member ksq_rslices and ksq_bload.  rslices is the sum of
   the slices of runnable kses.  This will be used for push load balance
   decisions.  bload is the number of threads blocked waiting on IO.
2003-02-03 05:30:07 +00:00
Jeff Roberson
cd6e33df1c - Stop abusing oncpu for our cpu binding. Define a scheduler local element
in the kse datastructure called ke_cpu.  This is the cpu which we are
   currently bound to.  Some flags may be added later to support hard binding.
2003-02-03 02:26:28 +00:00
Alfred Perlstein
04738e99b5 Catch more uses of MIN(). 2003-02-02 13:30:00 +00:00
Alfred Perlstein
8deebb0160 Consolidate MIN/MAX macros into one place (param.h).
Submitted by: Hiten Pandya <hiten@unixdaemons.com>
2003-02-02 13:17:30 +00:00
Scott Long
7121cce58a Use hz if stathz is zero. Adopted from sched_4bsd. 2003-02-02 08:24:32 +00:00
Julian Elischer
6f8132a867 Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.
2003-02-01 12:17:09 +00:00
Poul-Henning Kamp
4db4f5c87f Under #ifdef DIAGNOSTIC, fill malloc(9) allocations which do not have
M_ZERO specified with 0x70.  (malloc_flags=J for the kernel :-)
2003-02-01 10:07:49 +00:00
Poul-Henning Kamp
33bef83cc6 Under DIAGNOSTIC, only report expensive timeouts if they are more expensive
than the last on we reported.
2003-02-01 10:06:40 +00:00
Julian Elischer
ff92b12dce Only add one tick per tick to the thread stats, instead of some random number. 2003-01-31 22:14:46 +00:00
Robert Watson
565211b27f Correct handling of locking for chroot() and chdir() cases: rather
than having change_dir() release the vnode lock on success, hold the
lock so that we can use it later when invoking MAC checks and
VOP_ACCESS() in the chroot() code.  Update the comment to reflect
this calling convention.  Update callers to unlock the vnode
lock.  Correct a typo regarding vnode naming in the MAC case that
crept in via the previous patch applied.
2003-01-31 21:13:25 +00:00
Robert Watson
7278944df1 Clean up vnode handling on return from chroot() in certain error
cases: we might multiply vrele() a vnode when certain classes of
failures occur.  This appears to stem from earlier Giant/file
descriptor lock pushdown and restructuring.

Submitted by:	maxim
2003-01-31 18:57:04 +00:00
Tim J. Robbins
48ed1432c5 Use a local variable to store the number of ticks that elapsed in
kernel mode instead of (unintentionally) using the global `ticks'.
This error completely broke profiling.
2003-01-31 11:22:31 +00:00
Poul-Henning Kamp
6e1203e558 NO_GEOM cleanup: unifdef; 2003-01-30 19:22:27 +00:00
Poul-Henning Kamp
c5cab5b2fa NO_GEOM cleanup: retire to attic. 2003-01-30 12:58:55 +00:00
Poul-Henning Kamp
c9834aa961 NODEVFS cleanup: Unifdef. 2003-01-30 12:51:32 +00:00
Poul-Henning Kamp
8e67075792 NO_GEOM cleanup: remove #ifdef 2003-01-30 12:36:30 +00:00
Poul-Henning Kamp
4af0d0c21f NODEVFS cleanup: remove #ifdefs 2003-01-30 12:35:40 +00:00
Poul-Henning Kamp
f6a1852dcc NODEVFS cleanup: remove #ifdefs. 2003-01-30 12:35:17 +00:00
Poul-Henning Kamp
34189c035b NODEVFS cleanup: Remove cdevsw[].
This implicitly removes the need for major numbers, but a number of
drivers still know things they shouldn't need to, and we need to
consider if there are applications which cache major(+minor) gleaned
from stat(2) and rely on it being constant over reboots before we
start assigning random majors.
2003-01-29 21:54:03 +00:00
Tim J. Robbins
af7cbce89c Fix two fatal signedness errors introduced when i and j in semop()
were changed from int to size_t in the previous revision.

PR:		47625
2003-01-29 12:30:59 +00:00
Poul-Henning Kamp
60ca399653 Move timecounters notion of frequency to 64 bits.
[WARNING: CPUs in the distant future may be closer than they appear!]
2003-01-29 11:29:22 +00:00
Jeff Roberson
0a016a05a4 - Use ksq_load as the authoritive count of kses on the pair of kseqs for
sched_runnable() et all.
 - Remove some dead code in sched_clock().
 - Define two macros KSEQ_SELF() and KSEQ_CPU() for getting the kseq of the
   current cpu or some alternate cpu.
 - Start introducing kseq_() functions, such as kseq_choose() and kseq_setup().
2003-01-29 07:00:51 +00:00
Jeff Roberson
bf857e69a2 - Remove debugging code that didn't work on UP. 2003-01-29 00:26:47 +00:00
Jeff Roberson
d465fb9589 - Allow idle's pctcpu time to be calculated. 2003-01-28 09:30:17 +00:00
Jeff Roberson
c9f25d8f92 - Fix the ksq_load calculation. It now reflects the number of entries on the
run queue for each cpu.
 - Introduce kse stealing into the sched_choose() code.  This helps balance
   cpus better in cases where process turnover is high.  This implementation
   is fairly trivial and will likely be only a temporary measure until
   something more sophisticated has been written.
2003-01-28 09:28:20 +00:00
Peter Wemm
bf2053cad6 No longer force COMPAT_FREEBSD4 to be on. 2003-01-27 23:01:03 +00:00
Poul-Henning Kamp
109751d28c Don't dereference null vnode pointer if controling terminal was revoked.
Submitted by:	"Peter Edwards" <pmedwards@eircom.net>
2003-01-27 16:54:17 +00:00
David Xu
ba07d97e62 Use kg_numupcalls to see if we are closing a thread group,
not kg_kses which is not changed when a group is still working.
2003-01-26 23:39:33 +00:00
Alfred Perlstein
ca315837c7 fix warnings 2003-01-26 23:25:00 +00:00
Alfred Perlstein
b17c9cfa5e Add const qualifier to data argument for msgsnd.
PR: standards/45274
Submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-01-26 20:09:34 +00:00
David Xu
0dbb100b9b Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian
2003-01-26 11:41:35 +00:00
Jeff Roberson
35e6168fcd - Add the ule scheduler. This is intended to be a general purpose process
scheduler with many SMP benefits.  It is still very experimental and should
   be used only in test environments.
2003-01-26 05:23:15 +00:00
Jeff Roberson
4e997f4b87 - Call sched_sleep() instead of rolling our own in cv_waitq_add(). 2003-01-26 04:00:39 +00:00
Alfred Perlstein
e1d7d0bb60 Bring shm functions closer the the opengroup standards.
PR: 47469
Submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-01-25 21:33:05 +00:00
Alfred Perlstein
3beb32709d Bring semop() closer the the opengroup standards.
PR: 47471
Submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-01-25 21:27:37 +00:00
Poul-Henning Kamp
4394f4767d Add sysctl kern.timecounter.nsetclock which indicates the number of
potential discontinuities in our UTC timescale.

Applications can monitor this variable if they want to be informed
about steps in the timescale.  Slews (ntp and adjtime(2)) and
frequency adjustments (ntp) will not increment this counter, only
operations which set the clock.  No attempt is made to classify
size or direction of the step.
2003-01-25 07:51:09 +00:00
Jeffrey Hsu
afb0573a12 Remove extraneous FILEDESC_LOCKs around atomic reads.
Reviewed by:	jhb
2003-01-24 22:49:52 +00:00
Hajimu UMEMOTO
bdc5f6a345 Added comment why this workaround is required.
Suggested by:	sam
MFC after:	1 week
2003-01-22 18:03:06 +00:00
Hajimu UMEMOTO
56b3905f15 getpeername() returns with no error but didn't fill struct sockaddr
correctly against PF_LOCAL.  It seems that the test always fails then
sockaddr was not filled.  So, I added else clause for workaround.
I doubt if it is right fix.  However, it is better than nothing.  I
found that NetBSD has same potential problem.  But, fortunately,
NetBSD has equivalent else clause.

MFC after:	1 week
2003-01-22 13:13:13 +00:00
Dag-Erling Smørgrav
ecf031c9ad There's absolutely no need for a struct-within-a-struct, so move the
counters out of the inner struct and remove it.
2003-01-21 20:33:27 +00:00
Jeffrey Hsu
a448a15bc1 Add missing SMP file locks around read-modify-write operations on
the flag field.

Reviewed by:	rwatson
2003-01-21 20:20:48 +00:00
Thomas Moestl
72aeb19aba Correct an off-by-one in the boundary check. Otherwise, resource
allocations would fail if the desired allocation size was equal to
the boundary.
2003-01-21 17:02:21 +00:00
Poul-Henning Kamp
a63935c3f6 #ifdef NO_GEOM all of this file. 2003-01-21 10:40:46 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Sam Leffler
07ff231fcb preserve the order of tags copied by m_tag_copy_chain
Obtained from:	OpenBSD
2003-01-21 06:14:38 +00:00
Jeffrey Hsu
34c54d9f74 Rewrite the SMP filedesc locking in knote_attach() in order to
1.  eliminate unnecessary loop which frees and re-allocates
	the just allocated array
  2.  eliminate the newsize recomputation
  3.  eliminate unnecessary unlock and relock around free
  4.  correctly match the free with the malloc into M_KQUEUE instead of M_TEMP
  5.  eliminate conditional assignment of oldlist, which is equivalent to a
	simple assignment
  6.  eliminate the oldlist temporary variable completely

Reviewed by:    jhb
2003-01-21 04:05:49 +00:00
Robert Watson
ec35c2af68 Perform VOP_GETATTR() before mac_check_vnode_exec() so that
the cached attributes are available to MAC modules.

Submitted by:   mike halderman <mrh@nosc.mil>
Obtained from:	TrustedBSD Project
2003-01-21 03:26:28 +00:00
Jake Burkholder
7251b4bf93 Resolve relative relocations in klds before trying to parse the module's
metadata.  This fixes module dependency resolution by the kernel linker on
sparc64, where the relocations for the metadata are different than on other
architectures; the relative offset is in the addend of an Elf_Rela record
instead of the original value of the location being patched.
Also fix printf formats in debug code.

Submitted by:	Hartmut Brandt <brandt@fokus.gmd.de>
PR:		46732
Tested on:	alpha (obrien), i386, sparc64
2003-01-21 02:42:44 +00:00
Matthew Dillon
2d5c7e4506 Close the remaining user address mapping races for physical
I/O, CAM, and AIO.  Still TODO: streamline useracc() checks.

Reviewed by:	alc, tegge
MFC after:	7 days
2003-01-20 17:46:48 +00:00
Poul-Henning Kamp
c0805171aa disk_dev_synth() is a NO_GEOM hack. 2003-01-20 11:29:07 +00:00
Poul-Henning Kamp
0b4583e873 Only include <sys/diskslice.h> ifdef NO_GEOM 2003-01-20 11:28:37 +00:00
Alan Cox
28ec30cd9f - Hold the page queues lock around vm_page_hold().
- Assert that the page queues lock rather than Giant is held in
   vm_page_hold().
2003-01-20 09:24:03 +00:00
Julian Elischer
67f7c1bbe1 Remove a KASSERT that can now happen and add a missing setrunnable. 2003-01-20 03:41:04 +00:00
Poul-Henning Kamp
8a5c54f72d #ifdef NO_GEOM these files entirely. When NO_GEOM is removed as an
option the files can be removed.
2003-01-19 11:51:35 +00:00
Tim J. Robbins
5cb6b2cada Remove unnecessary locking of Giant around nanotime() in clock_gettime(). 2003-01-19 11:28:22 +00:00
Poul-Henning Kamp
5ecd6fd411 Mark more code #ifdef NODEVFS 2003-01-19 11:26:13 +00:00
Poul-Henning Kamp
7e760e148a Originally when DEVFS was added, a global variable "devfs_present"
was used to control code which were conditional on DEVFS' precense
since this avoided the need for large-scale source pollution with
#include "opt_geom.h"

Now that we approach making DEVFS standard, replace these tests
with an #ifdef to facilitate mechanical removal once DEVFS becomes
non-optional.

No functional change by this commit.
2003-01-19 11:03:07 +00:00
Poul-Henning Kamp
ec2c4225ce When we use DEVFS, we don't need the /dev/tty pseudo-driver to do
more than return ENXIO from its open routine, so most of this file
is unneeded.

A straight #ifdef'ing would look quite messy, and make the file
quite unreadable, so instead I have simply added the DEVFS version
of the file at the top, protected by #ifndef NODEVFS.

Once we have removed NODEVFS option, we can retain 86 the 86 lines at
the top and drop the other 287 lines.
2003-01-19 10:23:47 +00:00
Alfred Perlstein
31f3e2ad8e useracc() is mpsafe so we only need to hold Giant
over the call to nanosleep1()

Pointed out by: tjr
2003-01-19 06:51:10 +00:00
Warner Losh
b47d073500 Fix comment about what we do when there are no listeners. 2003-01-19 00:34:17 +00:00
Poul-Henning Kamp
ffffe9203f Move alpha_fix_srm_checksum() from subr_diskmbr.c to subr_disklabel.c 2003-01-17 19:37:55 +00:00
Poul-Henning Kamp
40f683a443 Remove the unused DSO_* options. 2003-01-17 19:36:14 +00:00
Thomas Moestl
6f7cab9301 Disallow listen() on sockets which are in the SS_ISCONNECTED or
SS_ISCONNECTING state, returning EINVAL (which is what POSIX mandates
in this case).
listen() on connected or connecting sockets would cause them to enter
a bad state; in the TCP case, this could cause sockets to go
catatonic or panics, depending on how the socket was connected.

Reviewed by:	-net
MFC after:	2 weeks
2003-01-17 19:20:00 +00:00
Poul-Henning Kamp
e948321c7a Move dkmodpart() from subr_diskslice.c to subr_disklabel.c. 2003-01-17 19:05:58 +00:00
Poul-Henning Kamp
ce9fac0072 Move a local variable to avoid the compiler warning about it being unused. 2003-01-16 20:06:45 +00:00
John Hay
b1e7e2019e hardpps() wants the raw hardware counter value converted to nanoseconds. 2003-01-16 19:22:13 +00:00
Alan Cox
6eb07b4ac2 Fix two long-standing, but likely harmless, errors in the use of
vm_pageout_deficit:
1. Update vm_pageout_deficit before VM_WAIT.  There is no sense in
   delaying the update; the sooner the pageout daemon receives this
   information the better.  Reviewed by: tegge
2. Update vm_pageout_deficit according to the number of pages still
   needed to complete the allocation, not the original size of the
   allocation.  Submitted by: tegge

(These errors have existed since the introduction of vm_pageout_deficit
in revision 1.144.)
2003-01-16 08:14:56 +00:00
Matthew Dillon
f597900329 Merge all the various copies of vmapbuf() and vunmapbuf() into a single
portable copy.  Note that pmap_extract() must be used instead of
pmap_kextract().

This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).
2003-01-15 23:54:35 +00:00
David Xu
4e77d3c6a2 Don't forget to disconnect object from class. 2003-01-15 14:58:07 +00:00
Matthew Dillon
fe41ca530c Introduce the ability to flag a sysctl for operation at secure level 2 or 3
in addition to secure level 1.  The mask supports up to a secure level of 8
but only add defines through CTLFLAG_SECURE3 for now.

As per the missif in the log entry for 1.11 of ip_fw2.c which added the
secure flag to the IPFW sysctl's in the first place, change the secure
level requirement from 1 to 3 now that we have support for it.

Reviewed by:	imp
With Design Suggestions by:	imp
2003-01-14 19:35:33 +00:00
Alan Cox
b0ef8c5fe4 - Update vm_pageout_deficit using atomic operations. It's a simple
counter outside the scope of existing locks.
 - Eliminate a redundant clearing of vm_pageout_deficit.
2003-01-14 06:57:03 +00:00
Matthew Dillon
3db161e079 It is possible for an active aio to prevent shared memory from being
dereferenced when a process exits due to the vmspace ref-count being
bumped.  Change shmexit() and shmexit_myhook() to take a vmspace instead
of a process and call it in vmspace_dofree().  This way if it is missed
in exit1()'s early-resource-free it will still be caught when the zombie is
reaped.

Also fix a potential race in shmexit_myhook() by NULLing out
vmspace->vm_shm prior to calling shm_delete_mapping() and free().

MFC after:	7 days
2003-01-13 23:04:32 +00:00
Alfred Perlstein
ac41f2ef0b style(9) fixes, mostly add parens around return arguments. 2003-01-13 15:06:05 +00:00
Jeff Roberson
8fb913face - Unbreak world. I did not notice that libkvm was still used in some places
to access the pctcpu.  This will have to be sorted out more later as the
   new scheduler requires a procedural interface for this data.  A more
   complete solution will follow.
2003-01-13 03:42:41 +00:00
Matthew Dillon
48e3128b34 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
Jeff Roberson
bcb06d5980 - Move ke_pctcpu and ke_cpticks into the scheduler specific datastructure.
This will prevent access through mechanisms other than the published
   interfaces.
2003-01-12 19:04:49 +00:00
Tim J. Robbins
ae3b195fcf Allowing nent < 0 in aio_suspend() and lio_listio() is just asking for
trouble. Return EINVAL instead.
2003-01-12 09:40:23 +00:00
Tim J. Robbins
44a2c818de Remove "XXX undocumented" comment from lio_listio(). 2003-01-12 09:33:16 +00:00
Alan Cox
8febaa4df0 vm_hold_load_pages() needn't clear PG_ZERO because it didn't pass
VM_ALLOC_ZERO to vm_page_alloc(). (PG_ZERO is clear by default.)
2003-01-12 06:30:15 +00:00
Matthew Dillon
cd72f2180b Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
Maxime Henrion
5d9155dfee Fix kernel build.
Pointy hats to:	dillon, Hiten Pandya <hiten@unixdaemons.com>
2003-01-11 12:39:45 +00:00
Tim J. Robbins
b3f1af6b8e Don't count mbufs with m_type == MT_HEADER or MT_OOBDATA as control data
in sballoc(), sbcompress(), sbdrop() and sbfree(). Fixes fstat() st_size
reporting and kevent() EVFILT_READ on TCP sockets.
2003-01-11 07:51:52 +00:00
Matthew Dillon
57e6d29b1e Remove all use of the LOG2() macro/inline, undoing some non-optimal cruft
that crept in recently.  GCC will optimize the divides and multiplies for us.

Submitted by:	David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after:	1 day
2003-01-11 01:09:51 +00:00
Alfred Perlstein
b3890a1c42 make sem_leave return a usable errno instead of -1.
make ksem_close return that usable errno instead of -1 (ERESTART).

PR: 46957
2003-01-10 23:13:16 +00:00
David Xu
7be6584678 Don't record thread pointer, it's not permanent in process life cycle,
use process pointer instead.
2003-01-10 09:54:51 +00:00
Jake Burkholder
be0800e85e Oops, add zstty to the witness order list.
Noticed by:	benno
2003-01-09 15:45:28 +00:00
David Xu
b47679ccff Some KSE syscalls are MPSAFE. 2003-01-08 04:57:53 +00:00
Peter Wemm
2488f30980 Move the MOD_SHUTDOWN event from shutdown_post_sync to shutdown_final,
so that entities that want to use the post_sync hook to write stuff
to devices and other tidy-up can do so before the device tree is
shot down.  eg: da doing a SYNC_CACHE etc.  This should get crashdumps
working on mpt devices again, and stops the ia64 boxes locking up
on regular shutdown when da tries to issue the scsi commands to mpt.

Obtained from:  njl, gibbs
2003-01-07 22:24:13 +00:00
Brian Feldman
c579babe54 In vn_open(), unset ndp->ni_vp when returning failure so that code
which expects it to be NULL unless the return value was 0 will work.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-01-07 20:59:55 +00:00
Alfred Perlstein
a11acc6f8a Use copyout to access user memory.
Submittted by: pho
MFC After: 2 days
2003-01-07 20:10:04 +00:00
Alan Cox
1f17965656 Make bogus_offset local to bufinit(). 2003-01-07 19:55:08 +00:00
Poul-Henning Kamp
c49bddce94 Fix warnings & errors caused by my last commit. 2003-01-07 19:09:10 +00:00
John Baldwin
377a66bc40 Cast the integer read as the first argument for %b to an unsigned integer
so it's value is not sign extended when assigned to the uintmax_t variable
used internally by printf.  For example, if bit 31 is set in the cpuid
feature word, then %b would print out the initial value as a 16 character
hexadecimal value.  Now it only prints out an 8 character value.

Reviewed by:	bde
2003-01-07 18:17:18 +00:00
David Xu
45f603e21c Clear some KSE fields after kse mode was turned off. 2003-01-07 06:56:43 +00:00
David Xu
b81c4d1e8c Forgot to call setrunnable() for un-idled thread. 2003-01-07 06:04:33 +00:00
David Xu
ea5ab16eba Check signals for idled threads. 2003-01-07 05:56:38 +00:00
Jacques Vidrine
f0c093284d Correct file descriptor leaks in lseek and do_dup.
The leak in lseek was introduced in vfs_syscalls.c revision 1.218.
The leak in do_dup was introduced in kern_descrip.c revision 1.158.

Submitted by:	iedowse
2003-01-06 13:19:05 +00:00
Poul-Henning Kamp
578c478621 This is all "#if defined(__i386__) && __GNUC__ >= 2":
Add support for GCC's --test-coverage --profile-arcs options.

Add code to call the functions listed in the .ctors section, these are
used to string the per .o file counter blocks into a linked list.

Add empty __bb_fork_func() to cope with GCC magic gandling of exec*()
named functions.

To add support for other platforms should be trivial, but involves
determining the exact data-types gcc uses on that platform.
2003-01-06 07:40:49 +00:00
Peter Wemm
ff29255673 Explicitly have the timecounter init happen after the cpu_initclocks is
called.  Otherwise (depending on a non-deterministic sort), the timecounter
code can be initialized before the clock rate has been set (on ia64) and it
assumes hz = 100, rather than the real value of 1024.  I'm not sure how much
gets upset by this.

Glanced at by:	phk
2003-01-06 01:01:08 +00:00
Poul-Henning Kamp
ea4804130a Fix cut&paste bug which would result in a panic because buffer was
being biodone'ed multiple times.
2003-01-05 22:01:08 +00:00
Alan Cox
9ce904432a Allocate bogus_page with VM_ALLOC_WIRED. (Previously, bogus_page's
allocation incremented the global count of wired pages, but not the
page's own wire count.  This inconsistency was introduced in
revision 1.230.)
2003-01-05 18:46:13 +00:00
Alfred Perlstein
a09de2f7cd In sodealloc(), if there is an accept filter present on the socket
then call do_setopt_accept_filter(so, NULL) which will free the filter
instead of duplicating the code in do_setopt_accept_filter().

Pointed out by: Hiten Pandya <hiten@angelica.unixdaemons.com>
2003-01-05 11:14:04 +00:00
Jake Burkholder
e548a1d4c8 - Provide backwards compatibility for kern.fallback_elf_brand.
- Use the generic elf type macros in imgact_elf.h instead of ifdefing the
  entire contents of the header.
2003-01-05 03:48:14 +00:00
Poul-Henning Kamp
f5b11b6e2d Temporarily introduce a new VOP_SPECSTRATEGY operation while I try
to sort out disk-io from file-io in the vm/buffer/filesystem space.

The intent is to sort VOP_STRATEGY calls into those which operate
on "real" vnodes and those which operate on VCHR vnodes.  For
the latter kind, the call will be changed to VOP_SPECSTRATEGY,
possibly conditionally for those places where dual-use happens.

Add a default VOP_SPECSTRATEGY method which will call the normal
VOP_STRATEGY.  First time it is called it will print debugging
information.  This will only happen if a normal vnode is passed
to VOP_SPECSTRATEGY by mistake.

Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY
does on a VCHR vnode today.

Add a new VOP_STRATEGY method in specfs to catch instances where
the conversion to VOP_SPECSTRATEGY has not yet happened.  Handle
the request just like we always did, but first time called print
debugging information.

Apart up to two instances of console messages per boot, this amounts
to a glorified no-op commit.

If you get any of the messages on your console I would very much
like a copy of them mailed to phk@freebsd.org
2003-01-04 22:10:36 +00:00
Jake Burkholder
a360a43dd5 Improve the way that an elf image activator for an alternate word size is
included in the kernel.  Include imgact_elf.c in conf/files,  instead of
both imgact_elf32.c and imgact_elf64.c, which will use the default word
size for an architecture as defined in machine/elf.h.  Architectures that
wish to build an additional image activator for an alternate word size can
include either imgact_elf32.c or imgact_elf64.c in files.${ARCH}, which
allows it to be dependent on MD options instead of solely on architecture.

Glanced at by:	peter
2003-01-04 22:07:48 +00:00
Poul-Henning Kamp
3c3871e5e6 Introduce the
void backtrace(void);
function which will print a backtrace if DDB is in the kernel and an
explanation if not.

This is useful for recording backtraces in non-fatal circumstances and
does not require pollution with DDB #includes in the files where it
is used.

It would of course be nice to have a non-DDB dependent version too,
but since the meat of a backtrace is MD it is probably not worth it.
2003-01-04 20:54:58 +00:00
Poul-Henning Kamp
c7fb6fd1b8 resort the vnode ops list. 2003-01-04 20:31:27 +00:00
Poul-Henning Kamp
3ae5950529 Move #include of ddb/ddb.h up with the rest. 2003-01-04 20:15:32 +00:00
Poul-Henning Kamp
b3ed130c42 Export tc_tick with sysctl, not tick.
Spotted by:	bde
2003-01-04 17:33:55 +00:00
Jeffrey Hsu
98ab1489e4 Remove unnecessary lock assertion. 2003-01-04 11:45:50 +00:00
David Xu
cac3fba0ce Some KSE syscalls are MPSAFE. 2003-01-04 11:41:12 +00:00
Poul-Henning Kamp
7b330b22b6 Don't call VOP_BMAP on VCHR vnodes when the logical and physical block
numbers are identical: it cannot even hope to accomplish anything.
2003-01-04 09:37:42 +00:00
Jake Burkholder
5dadd17b08 Add a sysctl to get the vm protections for the stack of the current process.
On architectures with a non-executable stack, eg sparc64, this is used by
libgcc to determine at runtime if its necessary to enable execute permissions
on a region of the stack which will be used to execute code, allowing the
call to mprotect to be avoided if the kernel is configured to map the stack
executable.
2003-01-04 07:54:23 +00:00
David Xu
450c38d016 Set kse mailbox pointer to NULL when P_KSES is turned off. 2003-01-04 05:59:25 +00:00
Julian Elischer
a98c9b8604 White space fixes 2003-01-03 20:55:52 +00:00
Julian Elischer
03ea472080 Make an explicit flag to indicate that a KSE has a reason to upcall,
and use that flag when there is a kse_wakeup() call. It will probably
be used with signal delivery as well eventually.

Submitted by:	davidxu@
2003-01-03 20:41:49 +00:00
Julian Elischer
3f5f24287f Don't need to set retvals to 0 in the non error case. They
are set to a good default anyhow.

Submitted by: davidxu@
2003-01-03 19:38:54 +00:00
Poul-Henning Kamp
862702306b Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since
all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
2003-01-03 06:32:15 +00:00
Poul-Henning Kamp
e2a3ea1c45 Remove unused second argument from DEV_STRATEGY(). 2003-01-03 05:57:35 +00:00
Andrew Gallatin
1f88bad30a o Introduce a new external mbuf type, EXT_EXTREF.
o Allow callers of m_extadd() to allocate their own reference
m_ext.ref_cnt pointer, rather than having the mbuf system allocate it
with a malloc() in the critical path.  This speeds m_extadd() up, and
also simplifies locking (malloc() may need Giant).

A driver or subsystem wishing to take use its own ref counter must
initialize m_ext.ref_cnt to point to its ref counter prior to
calling m_extadd(), and it must use EXT_EXTREF as its external type.

Eg:
	 m->m_ext.ref_cnt =  my_ref_cnt_ptr;
	 m_extadd(.....,EXT_EXTREF);

Reviewed by: bosko
2003-01-02 21:16:50 +00:00
Alan Cox
49bf855d20 Lock the vm object when performing back-to-back vm_object_clear_flag() and
vm_object_set_flag().
2003-01-02 18:32:13 +00:00
David Xu
42f67bd752 Adjust code for Julian's last commit. use td_mailbox to detect if
a syscall is from UTS kernel.
2003-01-02 02:48:03 +00:00
Jens Schweikhardt
9d5abbddbf Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
2003-01-01 18:49:04 +00:00
Warner Losh
62c8b32c71 Use 0600 for permissions for /dev/devctl until it is cloneable.
Use UID_ROOT and GID_WHEEL rather than 0.

Prompted by: rwatson
2003-01-01 03:43:58 +00:00
Alfred Perlstein
13438f6823 When compiling the kernel do not implicitly include filedesc.h from proc.h,
this was causing filedesc work to be very painful.
In order to make this work split out sigio definitions to thier own header
(sigio.h) which is included from proc.h for the time being.
2003-01-01 01:56:19 +00:00
Alfred Perlstein
c522c1bf4b fdcopy() only needs a filedesc pointer. 2003-01-01 01:19:31 +00:00
Alfred Perlstein
03282e6e3d purge 'register'. 2003-01-01 01:05:54 +00:00
Alfred Perlstein
c7f1c11b20 Since fdshare() and fdinit() only operate on filedescs, make them
take pointers to filedesc structures instead of threads.  This makes
it more clear that they do not do any voodoo with the thread/proc
or anything other than the filedesc passed in or returned.

Remove some XXX KSE's as this resolves the issue.
2003-01-01 01:01:14 +00:00
Alfred Perlstein
59c97598d3 fdinit() does not need to lock the filedesc it is creating as no one
besideds itself has access until the function returns.
2003-01-01 00:35:46 +00:00
Sam Leffler
addea9d4d7 o reduce the overhead of calling ppsratecheck by using ticks instead of
calling getmicrouptime (but maintain the struct timeval-based calling
  convention for compatibility)
o eliminate the use of timersub in ratecheck

Note that flood ping tests indicate ppsratecheck is inaccurate (but on the
conservative side) with this revised implementation.  If more accuracy is
needed we'll have to introduce an alternate interface or increase the
overhead.

Reviewed by:	silby, dillon, bde
2002-12-31 18:22:12 +00:00
Jens Schweikhardt
d64ada501a Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.
2002-12-30 21:18:15 +00:00
Sam Leffler
9967cafc49 Correct mbuf packet header propagation. Previously, packet headers
were sometimes propagated using M_COPY_PKTHDR which actually did
something between a "move" and a  "copy" operation.  This is replaced
by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it
from the source mbuf) and m_dup_pkthdr which copies the packet
header contents including any m_tag chain.  This corrects numerous
problems whereby mbuf tags could be lost during packet manipulations.

These changes also introduce arguments to m_tag_copy and m_tag_copy_chain
to specify if the tag copy work should potentially block.  This
introduces an incompatibility with openbsd which we may want to revisit.

Note that move/dup of packet headers does not handle target mbufs
that have a cluster bound to them.  We may want to support this;
for now we watch for it with an assert.

Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.

Supported by:	Vernier Networks
Reviewed by:	Robert Watson <rwatson@FreeBSD.org>
2002-12-30 20:22:40 +00:00
Robert Watson
3c67c23bcf Implement new ACL system calls which do not follow symbolic links:
__acl_get_link(), __acl_set_link(), acl_delete_link(), and
__acl_aclcheck_link(), with almost identical implementations to
the existing __acl_*_file() variants on these calls.  Update
copyright.

Obtained from:	TrustedBSD Project
2002-12-29 20:28:44 +00:00
Robert Watson
6f123c35a0 Regen from syscalls.master:1.139 2002-12-29 20:26:41 +00:00
Robert Watson
b1f4acd8ac Add definitions for four new system calls:
__acl_get_link()	Retrieve an ACL by name without following
			symbolic links.
__acl_set_link()	Set an ACL by name without following
			symbolic links.
__acl_delete_link()	Delete an ACL by name without following
			symbolic links.
__acl_aclcheck_link()	Check an ACL against a file by name without
			following symbolic links.

These calls are similar in spirit to lstat(), lchown(), lchmod(), etc,
and will be used under similar circumstances.

Obtained from:	TrustedBSD Project
2002-12-29 20:25:54 +00:00
Ian Dowse
6a1b2a22ef Add a new vnode flag VI_DOINGINACT to indicate that a VOP_INACTIVE
call is in progress on the vnode. When vput() or vrele() sees a
1->0 reference count transition, it now return without any further
action if this flag is set. This flag is necessary to avoid recursion
into VOP_INACTIVE if the filesystem inactive routine causes the
reference count to increase and then drop back to zero. It is also
used to guarantee that an unlocked vnode will not be recycled while
blocked in VOP_INACTIVE().

There are at least two cases where the recursion can occur: one is
that the softupdates code called by ufs_inactive() via ffs_truncate()
can call vput() on the vnode. This has been reported by many people
as "lockmgr: draining against myself" panics. The other case is
that nfs_inactive() can call vget() and then vrele() on the vnode
to clean up a sillyrename file.

Reviewed by:	mckusick (an older version of the patch)
2002-12-29 18:30:49 +00:00
Poul-Henning Kamp
371400cf2e Use a timeout of one second while we wait for the vnode washer,
this prevents a potential race and makes the system a little bit
less jerky under extreme loads.
2002-12-29 11:18:25 +00:00
Poul-Henning Kamp
851a87ea1a Vnodes pull in 800-900 bytes these days, all things counted, so we need
to treat desiredvnodes much more like a limit than as a vague concept.

On a 2GB RAM machine where desired vnodes is 130k, we run out of
kmem_map space when we hit about 190k vnodes.

If we wake up the vnode washer in getnewvnode(), sleep until it is done,
so that it has a chance to offer us a washed vnode.  If we don't sleep
here we'll just race ahead and allocate yet a vnode which will never
get freed.

In the vnodewasher, instead of doing 10 vnodes per mountpoint per
rotation, do 10% of the vnodes distributed evenly across the
mountpoints.
2002-12-29 10:39:05 +00:00
Alan Cox
a28cc55e5b Reduce the number of times that we acquire and release the page queues
lock by making vm_page_rename()'s caller, rather than vm_page_rename(),
responsible for acquiring it.
2002-12-29 07:17:06 +00:00
Jake Burkholder
24fbeaf9c3 Don't put a newline in KTR traces. 2002-12-28 23:22:22 +00:00
Jake Burkholder
dcc4093c7a Add a tunable kern.smp.disabled for disabling explicitly smp on an smp
kernel.
2002-12-28 23:21:13 +00:00
Poul-Henning Kamp
9f16282798 KASSERT that vop_revoke() gets a VCHR. 2002-12-28 22:27:14 +00:00
Poul-Henning Kamp
f53c6e5c9a Remove unused cdevsw_ALLOCSTART macro. 2002-12-28 21:47:43 +00:00
Poul-Henning Kamp
7068a01c6f Remove cdevsw_add calls, they are deprecated. 2002-12-28 21:39:46 +00:00
Matthew Dillon
45587e2514 Abstract-out the constants for the sequential heuristic.
No operational changes.

MFC after:	1 day
2002-12-28 20:28:10 +00:00
Julian Elischer
93a7aa79d6 Add code to ddb to allow backtracing an arbitrary thread.
(show thread {address})

Remove the IDLE kse state and replace it with a change in
the way threads sahre KSEs. Every KSE now has a thread, which is
considered its "owner" however a KSE may also be lent to other
threads in the same group to allow completion of in-kernel work.
n this case the owner remains the same and the KSE will revert to the
owner when the other work has been completed.

All creations of upcalls etc. is now done from
kse_reassign() which in turn is called from mi_switch or
thread_exit(). This means that special code can be removed from
msleep() and cv_wait().

kse_release() does not leave a KSE with no thread any more but
converts the existing thread into teh KSE's owner, and sets it up
for doing an upcall. It is just inhibitted from being scheduled until
there is some reason to do an upcall.

Remove all trace of the kse_idle queue since it is no-longer needed.
"Idle" KSEs are now on the loanable queue.
2002-12-28 01:23:07 +00:00
Robert Watson
f0bc12ee8d Improve consistency between devfs and MAKEDEV: use UID_ROOT and
GID_WHEEL instead of UID_BIN and GID_BIN for /dev/fd/* entries.

Submitted by:	kris
2002-12-27 16:54:44 +00:00
Alfred Perlstein
5590e7fdf0 Lock filedesc while performing a range check on the file descriptor.
Reviewed by: alc
2002-12-27 08:39:42 +00:00
Alan Cox
d746789347 Hold the page queues lock when calling vm_page_flag_clear(). 2002-12-27 06:52:32 +00:00
Jeffrey Hsu
6f782c4636 Ensure that the made-up inode number for a Unix domain socket is persistent. 2002-12-25 07:59:39 +00:00
Robert Watson
79191eca57 Flush vop_refreshlabel() definition, since it is no longer used.
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-12-24 19:47:13 +00:00
Poul-Henning Kamp
a7010ee2f4 White-space changes. 2002-12-24 09:44:51 +00:00
Jeffrey Hsu
956b0b653c SMP locking for radix nodes. 2002-12-24 03:03:39 +00:00
Poul-Henning Kamp
08c7670a8b Move the declaration of the socket fileops from socketvar.h to file.h.
This allows us to use the new typedefs and removes the needs for a number
of forward struct declarations in socketvar.h
2002-12-23 22:46:47 +00:00
Poul-Henning Kamp
f3a682116c Detediousficate declaration of fileops array members by introducing
typedefs for them.
2002-12-23 21:53:20 +00:00
Poul-Henning Kamp
6ce9c72c30 s/sokqfilter/soo_kqfilter/ for consistency with the naming of all
other socket/file operations.
2002-12-23 21:37:28 +00:00
Alan Cox
0cb6c00463 - Hold the kernel_object's lock around vm_page_alloc(kernel_object,...).
- Hold the page queues lock around vm_page_wakeup().
2002-12-23 20:10:47 +00:00
Jake Burkholder
c3c2862df4 - Add a spin lock to single thread cache invalidation and tlb flush ipis,
which allows ipis to be sent outside of Giant.
- Remove the ap boot mutex, which is unused.
2002-12-22 20:50:23 +00:00
Kris Kennaway
4ef3d7a27b Enforce correct ordering of the filedesc structure and pipe mutex, because
WITNESS can get the order wrong if it guesses based on first use.

Reviewed by:	jhb, alfred
2002-12-22 16:32:34 +00:00
Jeffrey Hsu
b30a244c34 SMP locking for ifnet list. 2002-12-22 05:35:03 +00:00
Marcel Moolenaar
551d79e177 Fix multiple registration of the elf_legacy_coredump sysctl variable.
The duplication is caused by the fact that imgact_elf.c is included
by both imgact_elf32.c and imgact_elf64.c and both are compiled by
default on ia64. Consequently, we have two seperate copies of the
elf_legacy_coredump variable due to them being declared static, and
two entries for the same sysctl in the linker set, both referencing
the unique copy of the elf_legacy_coredump variable. Since the second
sysctl cannot be registered, one of the elf_legacy_coredump variables
can not be tuned (if ordering still holds, it's the ELF64 related one).

The only solution is to create two different sysctl variables, just
like the elf<32|64>_trace sysctl variables. This unfortunately is an
(user) interface change, but unavoidable. Thus, on ELF32 platforms
the sysctl variable is called elf32_legacy_coredump and on ELF64
platforms it is called elf64_legacy_coredump. Platforms that have
both ELF formats have both sysctl variables.

These variables should probably be retired sooner rather than later.
2002-12-21 01:15:39 +00:00
Sam Leffler
91974ce10b add generic rate limiting support from netbsd; ratelimit is purely time based,
ppsratecheck is for controlling packets/second

Obtained from:	netbsd
2002-12-20 23:54:47 +00:00
Alan Cox
2952e1fb58 Extend the scope of the page queues lock in vm_pgmoveco(). 2002-12-20 21:18:29 +00:00
Maxime Henrion
894db7b01f Don't forget to destroy the mutex if an error occurs
in the jail() system call.

Submitted by:	Pawel Jakub Dawidek <nick@garage.freebsd.pl>
2002-12-20 14:32:20 +00:00
Alan Cox
ee113343eb Hold the page queues lock when performing vm_page_busy(). 2002-12-18 20:16:22 +00:00
Poul-Henning Kamp
4d99ef8d55 Indent properly. 2002-12-17 19:31:26 +00:00
Poul-Henning Kamp
126c7e29fe Remove unused variable cn_devfsdev. 2002-12-17 19:30:50 +00:00
Poul-Henning Kamp
d321df47c3 Don't cast a pointer to (intptr_t) and then on to (int) when we cannot
be sure that (int) is large enough.  Instead cast only to (intptr_t) and
cast the switch/case values to (intptr_t) as well.
2002-12-17 19:13:03 +00:00
Matthew Dillon
fa7dd9c5bc Change the way ELF coredumps are handled. Instead of unconditionally
skipping read-only pages, which can result in valuable non-text-related
data not getting dumped, the ELF loader and the dynamic loader now mark
read-only text pages NOCORE and the coredump code only checks (primarily) for
complete inaccessibility of the page or NOCORE being set.

Certain applications which map large amounts of read-only data will
produce much larger cores.  A new sysctl has been added,
debug.elf_legacy_coredump, which will revert to the old behavior.

This commit represents collaborative work by all parties involved.
The PR contains a program demonstrating the problem.

PR:		kern/45994
Submitted by:	"Peter Edwards" <pmedwards@eircom.net>, Archie Cobbs <archie@dellroad.org>
Reviewed by:	jdp, dillon
MFC after:	7 days
2002-12-16 19:24:43 +00:00
Robert Drehmel
0adb6d7a49 Remove the hto(be|le)[slq] and (be|le)toh[slq] macros defined in
_KERNEL scope from "src/sys/sys/mchain.h".

Replace each occurrence of the above in _KERNEL scope with the
appropriate macro from the set of hto(be|le)(16|32|64) and
(be|le)toh(16|32|64) from "src/sys/sys/endian.h".

Tested by:		tjr
Requested by:		comment marked with XXX
2002-12-16 16:20:06 +00:00
Matthew Dillon
72e7f3ddc2 Regenerate system calls (swapoff added) 2002-12-15 19:19:15 +00:00
Matthew Dillon
92da00bb24 This is David Schultz's swapoff code which I am finally able to commit.
This should be considered highly experimental for the moment.

Submitted by:	David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after:	3 weeks
2002-12-15 19:17:57 +00:00
Matthew Dillon
389d2b6e21 Fix a refcount race with the vmspace structure. In order to prevent
resource starvation we clean-up as much of the vmspace structure as we
can when the last process using it exits.  The rest of the structure
is cleaned up when it is reaped.  But since exit1() decrements the ref
count it is possible for a double-free to occur if someone else, such as
the process swapout code, references and then dereferences the structure.
Additionally, the final cleanup of the structure should not occur until
the last process referencing it is reaped.

This commit solves the problem by introducing a secondary reference count,
calling 'vm_exitingcnt'.  The normal reference count is decremented on exit
and vm_exitingcnt is incremented.  vm_exitingcnt is decremented when the
process is reaped.  When both vm_exitingcnt and vm_refcnt are 0, the
structure is freed for real.

MFC after:	3 weeks
2002-12-15 18:50:04 +00:00
Maxim Konovalov
9f59c468f3 o Clear a high bit of ipc_perm.seq so msgget(3) never returns a
negative message queue id.

PR:		kern/46122
Submitted by:	Vladimir B.Grebenschikov <vova@sw.ru>
MFC after:	2 weeks
2002-12-15 09:41:46 +00:00
Alan Cox
475e8011ab Perform vm_object_lock() and vm_object_unlock() around
vm_object_page_remove().
2002-12-15 05:41:56 +00:00
Alfred Perlstein
f97182acf8 unwrap lines made short enough by SCARGS removal 2002-12-14 08:18:06 +00:00
Alfred Perlstein
b80521fee5 remove syscallarg().
Suggested by: peter
2002-12-14 02:07:32 +00:00
Alfred Perlstein
d1e405c5ce SCARGS removal take II. 2002-12-14 01:56:26 +00:00
Kirk McKusick
0f5f789c0d The buffer daemon cannot skip over buffers owned by locked inodes as
they may be the only viable ones to flush. Thus it will now wait for
an inode lock if the other alternatives will result in rollbacks (and
immediate redirtying of the buffer). If only buffers with rollbacks
are available, one will be flushed, but then the buffer daemon will
wait briefly before proceeding. Failing to wait briefly effectively
deadlocks a uniprocessor since every other process writing to that
filesystem will wait for the buffer daemon to clean up which takes
close enough to forever to feel like a deadlock.

Reported by:	Archie Cobbs <archie@dellroad.org>
Sponsored by:   DARPA & NAI Labs.
Approved by:	re
2002-12-14 01:35:30 +00:00
Alfred Perlstein
bc9e75d7ca Backout removal SCARGS, the code freeze is only "selectively" over. 2002-12-13 22:41:47 +00:00
Alfred Perlstein
0bbe7292e1 Remove SCARGS.
Reviewed by: md5
2002-12-13 22:27:25 +00:00
Tim J. Robbins
9d0fffd3ca Drop filedesc lock and acquire Giant around calls to malloc() and free().
These call uma_large_malloc() and uma_large_free() which require Giant.
Fixes panic when descriptor table is larger than KMEM_ZMAX bytes
noticed by kkenn.

Reviewed by:	jhb
2002-12-13 09:59:40 +00:00
Julian Elischer
696058c3c5 Unbreak the KSE code. Keep track of zobie threads using the Per-CPU storage
during the context switch. Rearrange thread cleanups
to avoid problems with Giant. Clean threads when freed or
when recycled.

Approved by:	re (jhb)
2002-12-10 02:33:45 +00:00
Robert Watson
990b4b2dc5 Remove dm_root entry from struct devfs_mount. It's never set, and is
unused.  Replace it with a dm_mount back-pointer to the struct mount
that the devfs_mount is associated with.  Export that pointer to MAC
Framework entry points, where all current policies don't use the
pointer.  This permits the SEBSD port of SELinux's FLASK/TE to compile
out-of-the-box on 5.0-CURRENT with full file system labeling support.

Approved by:	re (murray)
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-12-09 03:44:28 +00:00
Alan Cox
2e29a1f21f To avoid lock order reversals in getnewvnode(), the call to uma_zfree()
must be delayed until the vnode interlock is released.

Reported by:	kris@
Approved by:	re (jhb)
2002-12-08 05:06:50 +00:00
Giorgos Keramidas
0c920c0de8 Fix typo in comment. It's SYSINIT, not SYSINT.
Approved by:	re (murray)
2002-11-30 22:15:30 +00:00
Kirk McKusick
c6964d3bc9 Remove a race condition / deadlock from snapshots. When
converting from individual vnode locks to the snapshot
lock, be sure to pass any waiting processes along to the
new lock as well. This transfer is done by a new function
in the lock manager, transferlockers(from_lock, to_lock);
Thanks to Lamont Granquist <lamont@scriptkiddie.org> for
his help in pounding on snapshots beyond all reason and
finding this deadlock.

Sponsored by:   DARPA & NAI Labs.
2002-11-30 19:00:51 +00:00
Warner Losh
304f10ce4a devd kernel improvements:
1) Record all device events when devctl is enabled, rather than just when
   devd has devctl open.  This is necessary to prevent races between when
   a device arrives, and when devd starts.
2) Add hw.bus.devctl_disable to disable devctl, this can also be set as a
   tunable.
3) Fix async support. Reset nonblocking and async_td in open.  remove
   async flags.
4) Free all memory when devctl is disabled.

Approved by: re (blanket)
2002-11-30 00:49:43 +00:00
Alan Cox
fdff30d256 Use pmap_remove_all() instead of pmap_remove() before freeing the page
in vm_pgmoveco(); the page may have more than one mapping.  Hold the page
queues lock when calling pmap_remove_all().

Approved by:	re (blanket)
2002-11-28 08:44:26 +00:00
Robert Drehmel
f85a961930 Do not set a variable (vp->p_pollinfo) to NULL if we know
it already has that value.

Approved by:	re
2002-11-27 16:45:54 +00:00
Maxim Konovalov
8819f45b51 Small SO_RCVTIMEO and SO_SNDTIMEO values are mistakenly taken to be zero.
PR:		kern/32827
Submitted by:	Hartmut Brandt <brandt@fokus.gmd.de>
Approved by:	re (jhb)
MFC after:	2 weeks
2002-11-27 13:34:04 +00:00
Tim J. Robbins
fef82663b8 o Initialise each mbuf's m_len to 0 in m_getm(); mb_put_mem() depends
on this.
o Update the `cur' pointer in the cluster loop in m_getm() to avoid
  incorrect truncation and leaked mbufs.

Reviewed by:	bmilekic
Approved by:	re
2002-11-27 04:26:00 +00:00
Warner Losh
647501a046 Make the rman_{get,set}_* macros into real functions. The macros
create an ABI that encodes offsets and sizes of structures into client
drivers.  The functions isolate the ABI from changes to the resource
structure.  Since these are used very rarely (once at startup), the
speed penalty will be down in the noise.

Also, add r_rid to the structure so that clients can save the 'rid' of
the resource in the struct resource, plus accessor functions.  Future
additions to newbus will make use of this to present a simplified
interface for resource specification.

Approved by: re (jhb)
Reviewed by: jhb, jake
2002-11-27 03:55:22 +00:00
Bill Fenner
8b5f8b061a Don't hold acct_mtx over limcopy(), since it's unnecessary and
limcopy() can sleep.

Approved by:	re
2002-11-26 18:04:12 +00:00
Sam Leffler
c8f43965d6 correct function names in KASSERT's for 2 m_tag routines
Submitted by:	rwatson
Approved by:	re
2002-11-26 17:59:16 +00:00
Robert Drehmel
d1989db545 To avoid sleeping with all sorts of resources acquired (the reported
problem was a locked directory vnode), do not give the process a chance
to sleep in state "stopevent" (depends on the S_EXEC bit being set in
p_stops) until most resources have been released again.

Approved by:	re
2002-11-26 17:30:55 +00:00
John Baldwin
04f4a16448 If the file descriptors passed into do_dup() are negative, return EBADF
instead of panicing.  Also, perform some of the simpler sanity checks on
the fds before acquiring the filedesc lock.

Approved by:	re
Reported by:	Dan Nelson <dan@emsphone.com> and others
2002-11-26 17:22:15 +00:00
Robert Watson
4d10c0ce5f Un-staticize mac_cred_mmapped_drop_perms() so that it may be used
by policy modules making use of downgrades in the MAC AST event.  This
is required by the mac_lomac port of LOMAC to the MAC Framework.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-26 17:11:57 +00:00
Alan Cox
2d21129db2 Acquire and release the page queues lock around pmap_remove_pages() because
it updates several of vm_page's fields.
2002-11-25 04:37:44 +00:00
Alan Cox
178949e021 Hold the page queues/flags lock when calling vm_page_set_validclean().
Approved by:	re
2002-11-23 19:10:31 +00:00
Maxime Henrion
b19d9defef Under certain circumstances, we were calling kmem_free() from
i386 cpu_thread_exit().  This resulted in a panic with WITNESS
since we need to hold Giant to call kmem_free(), and we weren't
helding it anymore in cpu_thread_exit().  We now do this from a
new MD function, cpu_thread_dtor(), called by thread_dtor().

Approved by:	re@
Suggested by:	jhb
2002-11-22 23:57:02 +00:00
Jeff Roberson
79acfc497b - Add the new sched_pctcpu() function to the sched_* api.
- Provide a routine in sched_4bsd to add this functionality.
 - Use sched_pctcpu() in kern_proc, which is the one place outside of
   sched_4bsd where the old pctcpu value was accessed directly.

Approved by:	re
2002-11-21 09:30:55 +00:00
Jeff Roberson
06439a04a1 - Move scheduler specific macros and defines out of proc.h
Approved by:	re
2002-11-21 09:14:13 +00:00
Jeff Roberson
148302c9c9 - Move FSCALE back to kern_sync. This is not scheduler specific.
- Create a new callout for lbolt and move it out of schedcpu().  This is not
   scheduler specific either.

Approved by:	re
2002-11-21 08:57:08 +00:00
Jeff Roberson
de028f5a4a - Implement a mechanism for allowing schedulers to place scheduler dependant
data in the scheduler independant structures (proc, ksegrp, kse, thread).
 - Implement unused stubs for this mechanism in sched_4bsd.

Approved by:	re
Reviewed by:	luigi, trb
Tested on:	x86, alpha
2002-11-21 01:22:38 +00:00
Robert Watson
2555374c4f Introduce p_label, extensible security label storage for the MAC framework
in struct proc.  While the process label is actually stored in the
struct ucred pointed to by p_ucred, there is a need for transient
storage that may be used when asynchronous (deferred) updates need to
be performed on the "real" label for locking reasons.  Unlike other
label storage, this label has no locking semantics, relying on policies
to provide their own protection for the label contents, meaning that
a policy leaf mutex may be used, avoiding lock order issues.  This
permits policies that act based on historical process behavior (such
as audit policies, the MAC Framework port of LOMAC, etc) can update
process properties even when many existing locks are held without
violating the lock order.  No currently committed policies implement use
of this label storage.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-20 15:41:25 +00:00
Robert Watson
a3df768b04 Merge kld access control checks from the MAC tree: these access control
checks permit policy modules to augment the system policy for permitting
kld operations.  This permits policies to limit access to kld operations
based on credential (and other) properties, as well as to perform checks
on the kld being loaded (integrity, etc).

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-19 22:12:42 +00:00
Robert Watson
293d2d2261 We leaked a process lock reference in the event an RFTHREAD process
leader wasn't exiting during a fork; instead, do remember to release
the lock avoiding lock order reversals and recursion panic.

Reported by:	"Joel M. Baldwin" <qumqats@outel.org>
2002-11-18 14:23:21 +00:00
David Xu
bfd8325073 Make sure only update wall clock at upcall time, slightly reformat
code in kse_relase().
2002-11-18 12:28:15 +00:00
Alfred Perlstein
ec63e12a03 During shutdown explain what the numbers following the 'syncing
disks' message mean, specifically, 'buffers remaining...'.
2002-11-18 02:41:03 +00:00
David Xu
8798d4f9c8 1. Support versioning and wall clock in kse mailbox,
also add rusage time in thread mailbox.
2. Minor change for thread limit code in thread_user_enter(),
   fix typo in kse_release() last I committed.

Reviewed by: deischen, mini
2002-11-18 01:59:31 +00:00
Julian Elischer
904f1b77cc include smp.h.
it is required by some code that was commented out until david's
last commit.
2002-11-17 23:26:42 +00:00
David Xu
fdc5ecd24f 1.Add sysctls to control KSE resource allocation.
kern.threads.max_threads_per_proc
  kern.threads.max_groups_per_proc
2.Temporary disable borrower thread stash itself as
  owner thread's spare thread in thread_exit(). there
  is a race between owner thread and borrow thread:
  an owner thread may allocate a spare thread as this:
	if (td->td_standin == NULL)
		td->standin = thread_alloc();
  but thread_alloc() can block the thread, then a borrower
  thread would possible stash it self as owner's spare
  thread in thread_exit(), after owner is resumed, result
  is a thread leak in kernel, double check in owner can
  avoid the race, but it may be ugly and not worth to do.
2002-11-17 11:47:03 +00:00
David Xu
db9b0729fc Rework last exiting thread in kse_release(), wait a signal and then
schedule an upcall and call thread_exit().
2002-11-17 10:12:00 +00:00
Jeff Roberson
a9a088823e - Release the imgp vnode prior to freeing exec_map resources to avoid
deadlock.
2002-11-17 09:33:00 +00:00
Alfred Perlstein
f51c1e897d Rework the sysconf(3) interaction with aio:
sysconf.c:
  Use 'break' rather than 'goto yesno' in sysconf.c so that we report a '0'
  return value from the kernel sysctl.

vfs_aio.c:
  Make aio reset its configuration parameters to -1 after unloading
  instead of 0.

posix4_mib.c:
  Initialize the aio configuration parameters to -1
  to indicate that it is not loaded.
  Add a facility (p31b_iscfg()) to determine if a posix4 facility has been
  initialized to avoid having to re-order the SYSINITs.
  Use p31b_iscfg() to determine if aio has had a chance to run yet which
  is likely if it is compiled into the kernel and avoid spamming its
  values.
  Introduce a macro P31B_VALID() instead of doing the same comparison over
  and over.

posix4.h:
  Prototype p31b_iscfg().
2002-11-17 04:15:34 +00:00
Alan Cox
4fec79bef8 Now that pmap_remove_all() is exported by our pmap implementations
use it directly.
2002-11-16 07:44:25 +00:00
Alfred Perlstein
86d52125a2 Export the values for _SC_AIO_MAX and _SC_AIO_PRIO_DELTA_MAX via the p1003b
sysctl interface.
2002-11-16 06:38:07 +00:00
Daniel Eischen
f3ec9000e9 Regenerate after adding system calls. 2002-11-16 06:36:56 +00:00
Daniel Eischen
2be05b70c9 Add getcontext, setcontext, and swapcontext as system calls.
Previously these were libc functions but were requested to
be made into system calls for atomicity and to coalesce what
might be two entrances into the kernel (signal mask setting
and floating point trap) into one.

A few style nits and comments from bde are also included.

Tested on alpha by: gallatin
2002-11-16 06:35:53 +00:00
Alfred Perlstein
c844abc920 Call 'p31b_setcfg(CTL_P1003_1B_AIO_LISTIO_MAX, AIO_LISTIO_MAX)'
when AIO is initialized so that sysconf() gives correct results.

Reported by: Craig Rodrigues <rodrigc@attbi.com>
2002-11-16 04:22:55 +00:00
Alfred Perlstein
b565fb9e6f headers should not really include "opt_foo.h" (in this case opt_posix.h).
remove it from the header and add it to the files that require it.
2002-11-15 22:55:06 +00:00
David Xu
1d2c5bd519 Return EWOULDBLOCK for last thread in kse_release().
Requested by: archie
2002-11-15 00:53:59 +00:00
Thomas Moestl
01ee43955c Make the msg_size, msg_bufx and msg_bufr memebers of struct msgbuf
signed, since they describe a ring buffer and signed arithmetic is
performed on them. This avoids some evilish casts.

Since this changes all but two members of this structure, style(9)
those remaining ones, too.

Requested by:	bde
Reviewed by:	bde (earlier version)
2002-11-14 16:11:12 +00:00
David Xu
ca161eb6e9 In kse_release(), check if current thread is bound
and current kse mailbox was already initialized, also
prevent last thread from exiting unless we figure out
how to safely support null thread proc.
2002-11-14 06:06:45 +00:00
Robert Watson
a96acd1ace Introduce a condition variable to avoid returning EBUSY when
the MAC policy list is busy during a load or unload attempt.
We assert no locks held during the cv wait, meaning we should
be fairly deadlock-safe.  Because of the cv model and busy
count, it's possible for a cv waiter waiting for exclusive
access to the policy list to be starved by active and
long-lived access control/labeling events.  For now, we
accept that as a necessary tradeoff.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-13 15:47:09 +00:00
Maxime Henrion
2bb95458bd Add support for the C99 %t format modifier. 2002-11-13 15:15:59 +00:00
Robert Watson
63b6f478ec Garbage collect mac_create_devfs_vnode() -- it hasn't been used since
we brought in the new cache and locking model for vnode labels.  We
now rely on mac_associate_devfs_vnode().

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-12 04:20:36 +00:00
John Baldwin
d2b28e078a Correct an assertion in the code to traverse the list of locks to find an
earlier acquired lock with the same witness as the lock currently being
acquired.  If we had released several earlier acquired locks after
acquiring enough locks to require another lock_list_entry bucket in the
lock list, then subsequent lock_list_entry buckets could contain only one
lock instance in which case i would be zero.

Reported by:	Joel M. Baldwin <qumqats@outel.org>
2002-11-11 16:36:20 +00:00
Robert Watson
2d43d24ed4 Garbage collect definition of M_MACOPVEC -- we no longer perform a
dynamic mapping of an operation vector into an operation structure,
rather, we rely on C99 sparse structure initialization.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-11 14:15:58 +00:00
Alan Cox
d154fb4fe6 When prot is VM_PROT_NONE, call pmap_page_protect() directly rather than
indirectly through vm_page_protect().  The one remaining page flag that
is updated by vm_page_protect() is already being updated by our various
pmap implementations.

Note: A later commit will similarly change the VM_PROT_READ case and
eliminate vm_page_protect().
2002-11-10 07:12:04 +00:00
Alfred Perlstein
29f194457c Fix instances of macros with improperly parenthasized arguments.
Verified by: md5
2002-11-09 12:55:07 +00:00
Robert Watson
6d7bdc8def Assign value of NULL to imgp->execlabel when imgp is initialized
in the ELF code.  Missed in earlier merge from the MAC tree.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-08 20:49:50 +00:00
Robert Watson
52378b8acd To reduce per-return overhead of userret(), call into
mac_thread_userret() only if PS_MACPEND is set in the process AST mask.
This avoids the cost of the entry point in the common case, but
requires policies interested in the userret event to set the flag
(protected by the scheduler lock) if they do want the event.  Since
all the policies that we're working with which use mac_thread_userret()
use the entry point only selectively to perform operations deferred
for locking reasons, this maintains the desired semantics.

Approved by:	re
Requested by:	bde
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-08 19:00:17 +00:00
Robert Watson
9fa3506ecd Add an explicit execlabel argument to exec-related MAC policy entry
points, rather than relying on policies to grub around in the
image activator instance structure.

Approved by:	re
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2002-11-08 18:04:00 +00:00
Thomas Moestl
0fca57b8b8 Move the definitions of the hw.physmem, hw.usermem and hw.availpages
sysctls to MI code; this reduces code duplication and makes all of them
available on sparc64, and the latter two on powerpc.
The semantics by the i386 and pc98 hw.availpages is slightly changed:
previously, holes between ranges of available pages would be included,
while they are excluded now. The new behaviour should be more correct
and brings i386 in line with the other architectures.

Move physmem to vm/vm_init.c, where this variable is used in MI code.
2002-11-07 23:57:17 +00:00
John Baldwin
6274bdda4c - Use %j to print intmax_t values.
- Cast more daddr_t values to intmax_t when printing to quiet warnings.
2002-11-07 22:41:08 +00:00