5866 Commits

Author SHA1 Message Date
David Xu
0dbb100b9b Move UPCALL-related data structures out of kse and introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code is gone.

A thread that owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and take those contexts back
to userland. Any thread without an upcall structure has to export its
context and exit at the user boundary.

Any thread running in user mode owns an upcall structure. When it enters
the kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread blocks in the kernel, a new UPCALL thread is created and
the upcall structure is transferred to the new UPCALL thread. If the kse
mailbox's current thread pointer is NULL, then no UPCALL thread is created
when the thread blocks in the kernel.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit; when all upcalls in a ksegrp are removed, the group is
automatically shut down. An upcall owner thread also exits when the process is
in the exiting state. When an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. It represents a virtual CPU. When a thread
is running, it always has a KSE associated with it. The scheduler is free to
assign a KSE to a thread according to thread priority; if thread priority changes,
the KSE can be moved from one thread to another.

When a ksegrp is created, N KSEs are always created in the group, where
N is the number of physical CPUs in the current system. This makes it
possible for threads in the kernel to execute on different CPUs in parallel
even if the userland UTS is only single-CPU safe. Userland calls kse_create to add more
upcall structures to the ksegrp to increase concurrency in userland itself; the kernel
is not restricted by the number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian
2003-01-26 11:41:35 +00:00
Jeff Roberson
35e6168fcd - Add the ule scheduler. This is intended to be a general purpose process
   scheduler with many SMP benefits.  It is still very experimental and should
   be used only in test environments.
2003-01-26 05:23:15 +00:00
Jeff Roberson
4e997f4b87 - Call sched_sleep() instead of rolling our own in cv_waitq_add(). 2003-01-26 04:00:39 +00:00
Alfred Perlstein
e1d7d0bb60 Bring shm functions closer to the Open Group standards.
PR: 47469
Submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-01-25 21:33:05 +00:00
Alfred Perlstein
3beb32709d Bring semop() closer to the Open Group standards.
PR: 47471
Submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-01-25 21:27:37 +00:00
Poul-Henning Kamp
4394f4767d Add sysctl kern.timecounter.nsetclock which indicates the number of
potential discontinuities in our UTC timescale.

Applications can monitor this variable if they want to be informed
about steps in the timescale.  Slews (ntp and adjtime(2)) and
frequency adjustments (ntp) will not increment this counter, only
operations which set the clock.  No attempt is made to classify
size or direction of the step.
2003-01-25 07:51:09 +00:00
Jeffrey Hsu
afb0573a12 Remove extraneous FILEDESC_LOCKs around atomic reads.
Reviewed by:	jhb
2003-01-24 22:49:52 +00:00
Hajimu UMEMOTO
bdc5f6a345 Added comment why this workaround is required.
Suggested by:	sam
MFC after:	1 week
2003-01-22 18:03:06 +00:00
Hajimu UMEMOTO
56b3905f15 getpeername() returned with no error but didn't fill struct sockaddr
correctly for PF_LOCAL.  It seems that the test always fails when
sockaddr was not filled.  So, I added an else clause as a workaround.
I doubt that it is the right fix.  However, it is better than nothing.  I
found that NetBSD has the same potential problem.  But, fortunately,
NetBSD has an equivalent else clause.

MFC after:	1 week
2003-01-22 13:13:13 +00:00
Dag-Erling Smørgrav
ecf031c9ad There's absolutely no need for a struct-within-a-struct, so move the
counters out of the inner struct and remove it.
2003-01-21 20:33:27 +00:00
Jeffrey Hsu
a448a15bc1 Add missing SMP file locks around read-modify-write operations on
the flag field.

Reviewed by:	rwatson
2003-01-21 20:20:48 +00:00
Thomas Moestl
72aeb19aba Correct an off-by-one in the boundary check. Otherwise, resource
allocations would fail if the desired allocation size was equal to
the boundary.
2003-01-21 17:02:21 +00:00
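The off-by-one described in 72aeb19aba can be sketched as follows. This is a minimal userland model, not the kernel code: the function names are hypothetical, and it assumes the boundary is a power of two and count is at least 1. The point is that the last address of a range is start + count - 1, so a range whose size equals the boundary and that starts on the boundary does not cross it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the inclusive range
 * [start, start + count - 1] straddles a multiple of bound.
 * Assumes bound is a power of two and count >= 1.
 */
bool
crosses_boundary(uint64_t start, uint64_t count, uint64_t bound)
{
	uint64_t bmask = ~(bound - 1);
	uint64_t last = start + count - 1;	/* inclusive end of the range */

	/* Start and end differ in a bit at or above the boundary bit. */
	return (((start ^ last) & bmask) != 0);
}

/*
 * The same test with the off-by-one: it uses the first address past
 * the range, so an allocation exactly one boundary long is rejected.
 */
bool
crosses_boundary_off_by_one(uint64_t start, uint64_t count, uint64_t bound)
{
	uint64_t bmask = ~(bound - 1);

	return (((start ^ (start + count)) & bmask) != 0);
}
```

With start 0, count 0x1000 and bound 0x1000, the first function reports no crossing while the off-by-one variant reports one, which matches the failure mode the commit message describes.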
Poul-Henning Kamp
a63935c3f6 #ifdef NO_GEOM all of this file. 2003-01-21 10:40:46 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Sam Leffler
07ff231fcb preserve the order of tags copied by m_tag_copy_chain
Obtained from:	OpenBSD
2003-01-21 06:14:38 +00:00
Jeffrey Hsu
34c54d9f74 Rewrite the SMP filedesc locking in knote_attach() in order to
  1.  eliminate unnecessary loop which frees and re-allocates
      the just allocated array
  2.  eliminate the newsize recomputation
  3.  eliminate unnecessary unlock and relock around free
  4.  correctly match the free with the malloc into M_KQUEUE instead of M_TEMP
  5.  eliminate conditional assignment of oldlist, which is equivalent to a
      simple assignment
  6.  eliminate the oldlist temporary variable completely

Reviewed by:    jhb
2003-01-21 04:05:49 +00:00
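The general shape of the simplification in 34c54d9f74 (compute the new size once, allocate once, copy, free the old array) might look like the sketch below. It is a userland model under stated assumptions, not knote_attach() itself: struct slot_table, table_ensure() and GROW_STEP are hypothetical names, calloc()/free() stand in for the kernel allocator, and no locking is shown.

```c
#include <stdlib.h>
#include <string.h>

#define GROW_STEP 4	/* hypothetical growth increment */

/* Hypothetical stand-in for a per-descriptor pointer table. */
struct slot_table {
	void	**slots;
	int	nslots;
};

/*
 * Make sure slots[index] exists. The new size is computed once and
 * the array is allocated once; the old array is freed after the copy.
 * Returns 0 on success, -1 on a bad index or allocation failure.
 */
int
table_ensure(struct slot_table *t, int index)
{
	void **newlist;
	int newsize;

	if (index < 0)
		return (-1);
	if (index < t->nslots)
		return (0);		/* already large enough */

	newsize = t->nslots;
	while (newsize <= index)
		newsize += GROW_STEP;

	newlist = calloc((size_t)newsize, sizeof(*newlist));
	if (newlist == NULL)
		return (-1);
	if (t->nslots > 0)
		memcpy(newlist, t->slots,
		    (size_t)t->nslots * sizeof(*newlist));
	free(t->slots);			/* free(NULL) is a no-op */
	t->slots = newlist;
	t->nslots = newsize;
	return (0);
}
```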
Robert Watson
ec35c2af68 Perform VOP_GETATTR() before mac_check_vnode_exec() so that
the cached attributes are available to MAC modules.

Submitted by:   mike halderman <mrh@nosc.mil>
Obtained from:	TrustedBSD Project
2003-01-21 03:26:28 +00:00
Jake Burkholder
7251b4bf93 Resolve relative relocations in klds before trying to parse the module's
metadata.  This fixes module dependency resolution by the kernel linker on
sparc64, where the relocations for the metadata are different than on other
architectures; the relative offset is in the addend of an Elf_Rela record
instead of the original value of the location being patched.
Also fix printf formats in debug code.

Submitted by:	Hartmut Brandt <brandt@fokus.gmd.de>
PR:		46732
Tested on:	alpha (obrien), i386, sparc64
2003-01-21 02:42:44 +00:00
Matthew Dillon
2d5c7e4506 Close the remaining user address mapping races for physical
I/O, CAM, and AIO.  Still TODO: streamline useracc() checks.

Reviewed by:	alc, tegge
MFC after:	7 days
2003-01-20 17:46:48 +00:00
Poul-Henning Kamp
c0805171aa disk_dev_synth() is a NO_GEOM hack. 2003-01-20 11:29:07 +00:00
Poul-Henning Kamp
0b4583e873 Only include <sys/diskslice.h> ifdef NO_GEOM 2003-01-20 11:28:37 +00:00
Alan Cox
28ec30cd9f - Hold the page queues lock around vm_page_hold().
- Assert that the page queues lock rather than Giant is held in
   vm_page_hold().
2003-01-20 09:24:03 +00:00
Julian Elischer
67f7c1bbe1 Remove a KASSERT that can now happen and add a missing setrunnable. 2003-01-20 03:41:04 +00:00
Poul-Henning Kamp
8a5c54f72d #ifdef NO_GEOM these files entirely. When NO_GEOM is removed as an
option the files can be removed.
2003-01-19 11:51:35 +00:00
Tim J. Robbins
5cb6b2cada Remove unnecessary locking of Giant around nanotime() in clock_gettime(). 2003-01-19 11:28:22 +00:00
Poul-Henning Kamp
5ecd6fd411 Mark more code #ifdef NODEVFS 2003-01-19 11:26:13 +00:00
Poul-Henning Kamp
7e760e148a Originally when DEVFS was added, a global variable "devfs_present"
was used to control code which was conditional on DEVFS' presence
since this avoided the need for large-scale source pollution with
#include "opt_geom.h"

Now that we approach making DEVFS standard, replace these tests
with an #ifdef to facilitate mechanical removal once DEVFS becomes
non-optional.

No functional change by this commit.
2003-01-19 11:03:07 +00:00
Poul-Henning Kamp
ec2c4225ce When we use DEVFS, we don't need the /dev/tty pseudo-driver to do
more than return ENXIO from its open routine, so most of this file
is unneeded.

A straight #ifdef'ing would look quite messy, and make the file
quite unreadable, so instead I have simply added the DEVFS version
of the file at the top, protected by #ifndef NODEVFS.

Once we have removed the NODEVFS option, we can retain the 86 lines at
the top and drop the other 287 lines.
2003-01-19 10:23:47 +00:00
Alfred Perlstein
31f3e2ad8e useracc() is mpsafe so we only need to hold Giant
over the call to nanosleep1()

Pointed out by: tjr
2003-01-19 06:51:10 +00:00
Warner Losh
b47d073500 Fix comment about what we do when there are no listeners. 2003-01-19 00:34:17 +00:00
Poul-Henning Kamp
ffffe9203f Move alpha_fix_srm_checksum() from subr_diskmbr.c to subr_disklabel.c 2003-01-17 19:37:55 +00:00
Poul-Henning Kamp
40f683a443 Remove the unused DSO_* options. 2003-01-17 19:36:14 +00:00
Thomas Moestl
6f7cab9301 Disallow listen() on sockets which are in the SS_ISCONNECTED or
SS_ISCONNECTING state, returning EINVAL (which is what POSIX mandates
in this case).
listen() on connected or connecting sockets would cause them to enter
a bad state; in the TCP case, this could cause sockets to go
catatonic or cause panics, depending on how the socket was connected.

Reviewed by:	-net
MFC after:	2 weeks
2003-01-17 19:20:00 +00:00
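The state check described in 6f7cab9301 can be modelled in a few lines. This is only a sketch: the MODEL_* flag values and model_listen_check() are hypothetical stand-ins for the kernel's SS_ISCONNECTED and SS_ISCONNECTING bits and for the check inside listen(), and the real code operates on a struct socket rather than a bare integer.

```c
#include <errno.h>

/* Hypothetical stand-ins for the socket state bits named in the commit. */
#define MODEL_SS_ISCONNECTED	0x1
#define MODEL_SS_ISCONNECTING	0x2

/*
 * Returns EINVAL if the socket is connected or connecting,
 * 0 if listen() may proceed.
 */
int
model_listen_check(int so_state)
{
	if (so_state & (MODEL_SS_ISCONNECTED | MODEL_SS_ISCONNECTING))
		return (EINVAL);
	return (0);
}
```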
Poul-Henning Kamp
e948321c7a Move dkmodpart() from subr_diskslice.c to subr_disklabel.c. 2003-01-17 19:05:58 +00:00
Poul-Henning Kamp
ce9fac0072 Move a local variable to avoid the compiler warning about it being unused. 2003-01-16 20:06:45 +00:00
John Hay
b1e7e2019e hardpps() wants the raw hardware counter value converted to nanoseconds. 2003-01-16 19:22:13 +00:00
Alan Cox
6eb07b4ac2 Fix two long-standing, but likely harmless, errors in the use of
vm_pageout_deficit:
1. Update vm_pageout_deficit before VM_WAIT.  There is no sense in
   delaying the update; the sooner the pageout daemon receives this
   information the better.  Reviewed by: tegge
2. Update vm_pageout_deficit according to the number of pages still
   needed to complete the allocation, not the original size of the
   allocation.  Submitted by: tegge

(These errors have existed since the introduction of vm_pageout_deficit
in revision 1.144.)
2003-01-16 08:14:56 +00:00
Matthew Dillon
f597900329 Merge all the various copies of vmapbuf() and vunmapbuf() into a single
portable copy.  Note that pmap_extract() must be used instead of
pmap_kextract().

This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).
2003-01-15 23:54:35 +00:00
David Xu
4e77d3c6a2 Don't forget to disconnect object from class. 2003-01-15 14:58:07 +00:00
Matthew Dillon
fe41ca530c Introduce the ability to flag a sysctl for operation at secure level 2 or 3
in addition to secure level 1.  The mask supports up to a secure level of 8
but this only adds defines through CTLFLAG_SECURE3 for now.

As per the missive in the log entry for 1.11 of ip_fw2.c which added the
secure flag to the IPFW sysctls in the first place, change the secure
level requirement from 1 to 3 now that we have support for it.

Reviewed by:	imp
With Design Suggestions by:	imp
2003-01-14 19:35:33 +00:00
Alan Cox
b0ef8c5fe4 - Update vm_pageout_deficit using atomic operations. It's a simple
   counter outside the scope of existing locks.
 - Eliminate a redundant clearing of vm_pageout_deficit.
2003-01-14 06:57:03 +00:00
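A counter that sits outside any lock, as b0ef8c5fe4 describes, can be updated safely with atomic read-modify-write operations. The sketch below uses C11 <stdatomic.h> as a stand-in for the kernel's own atomic primitives; the variable and function names are hypothetical and this is not the VM code.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a lock-free deficit counter. */
atomic_int model_pageout_deficit;

/* Producer side: record that npages more pages are wanted. */
void
model_deficit_add(int npages)
{
	atomic_fetch_add(&model_pageout_deficit, npages);
}

/*
 * Consumer side: read the accumulated deficit and reset it to zero
 * in one atomic step, so no concurrent update is lost and no
 * separate clearing store is needed.
 */
int
model_deficit_take(void)
{
	return (atomic_exchange(&model_pageout_deficit, 0));
}
```

Because the exchange both reads and clears the counter, a second explicit clear elsewhere would be redundant in this model.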
Matthew Dillon
3db161e079 It is possible for an active aio to prevent shared memory from being
dereferenced when a process exits due to the vmspace ref-count being
bumped.  Change shmexit() and shmexit_myhook() to take a vmspace instead
of a process and call it in vmspace_dofree().  This way if it is missed
in exit1()'s early-resource-free it will still be caught when the zombie is
reaped.

Also fix a potential race in shmexit_myhook() by NULLing out
vmspace->vm_shm prior to calling shm_delete_mapping() and free().

MFC after:	7 days
2003-01-13 23:04:32 +00:00
Alfred Perlstein
ac41f2ef0b style(9) fixes, mostly add parens around return arguments. 2003-01-13 15:06:05 +00:00
Jeff Roberson
8fb913face - Unbreak world. I did not notice that libkvm was still used in some places
   to access the pctcpu.  This will have to be sorted out more later as the
   new scheduler requires a procedural interface for this data.  A more
   complete solution will follow.
2003-01-13 03:42:41 +00:00
Matthew Dillon
48e3128b34 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
Jeff Roberson
bcb06d5980 - Move ke_pctcpu and ke_cpticks into the scheduler specific datastructure.
   This will prevent access through mechanisms other than the published
   interfaces.
2003-01-12 19:04:49 +00:00
Tim J. Robbins
ae3b195fcf Allowing nent < 0 in aio_suspend() and lio_listio() is just asking for
trouble. Return EINVAL instead.
2003-01-12 09:40:23 +00:00
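One reason a negative count is "asking for trouble" is that it is typically multiplied by an element size to get a copy length. The sketch below is an illustration of the early EINVAL check from ae3b195fcf, not the aio code: model_list_bytes() is a hypothetical helper, and the element size of one pointer is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: validate nent before turning it into a byte
 * count. Without the check, a negative nent converted to size_t
 * would become a very large length.
 */
int
model_list_bytes(int nent, size_t *bytes)
{
	if (nent < 0)
		return (EINVAL);
	*bytes = (size_t)nent * sizeof(void *);
	return (0);
}
```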
Tim J. Robbins
44a2c818de Remove "XXX undocumented" comment from lio_listio(). 2003-01-12 09:33:16 +00:00
Alan Cox
8febaa4df0 vm_hold_load_pages() needn't clear PG_ZERO because it didn't pass
VM_ALLOC_ZERO to vm_page_alloc(). (PG_ZERO is clear by default.)
2003-01-12 06:30:15 +00:00
Matthew Dillon
cd72f2180b Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
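The difference between a void * data pointer and a union of typed pointers, as in cd72f2180b, can be sketched as below. All struct and member names here are hypothetical models, not the real struct file layout. Note that the later commit 48e3128b34 in this log changed the union back into void *.

```c
/* Hypothetical stand-ins for the objects a file can point at. */
struct socket_model { int so_type; };
struct vnode_model  { int v_type; };

/* Model of a file whose data pointer is a union of typed pointers. */
struct file_model {
	union {
		void			*generic;
		struct socket_model	*socket;
		struct vnode_model	*vnode;
	} un_data;
};

/* With the union, the typed member is used directly; no cast needed. */
int
file_socket_type(const struct file_model *fp)
{
	return (fp->un_data.socket->so_type);
}

/* With a plain void *, every use site has to cast. */
int
file_socket_type_voidp(void *f_data)
{
	return (((struct socket_model *)f_data)->so_type);
}
```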