Commit Graph

6253 Commits

Author SHA1 Message Date
John Baldwin
9722121a3c Respect any passed in external lockmgr flags such as LK_NOWAIT in the
default implementations of VOP_LOCK() and VOP_UNLOCK().

Tested by:	jlemon, phk
Glanced at by:	jeffr
2003-03-07 20:45:07 +00:00
John Baldwin
9da590b49b Oops, fix the double faults people were seeing with the recent changes to
witness.  Sleepable locks such as sx locks always come before all mutexes
including Giant.  However, the static lock order list placed Giant before
the proctree and allproc sx locks.  This resulted in witness creating a
cycle in its lock order "tree" (real trees don't have cycles) leading to
infinite recursion and eventually a double fault.  To fix, put Giant after
sx locks in the lock order list.
2003-03-06 17:25:06 +00:00
Alan Cox
7c4351aabd Remove GIANT_REQUIRED from sf_buf_free(). 2003-03-06 04:48:19 +00:00
Robert Watson
9283578946 Instrument sysarch() MD privileged I/O access interfaces with a MAC
check, mac_check_sysarch_ioperm(), permitting MAC security policy
modules to control access to these interfaces.  Currently, they
protect access to IOPL on i386, and setting HAE on Alpha.
Additional checks might be required on other platforms to prevent
bypass of kernel security protections by unauthorized processes.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-06 04:47:47 +00:00
Alan Cox
09c80124a3 Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.
Discussed on:	arch@
2003-03-06 03:41:02 +00:00
Robert Watson
1b2c2ab29a Provide a mac_check_system_swapoff() entry point, which permits MAC
modules to authorize disabling of swap against a particular vnode.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-05 23:50:15 +00:00
Robert Watson
a184d471e2 Move the initialization of the vattr flags field in setfflags() to
before the MAC check so that we pass the flags field into the MAC
check properly initialized.  This didn't affect any current MAC
modules since they didn't care what the flags argument was (as
they were primarily interested in the fact that it was a meta-data
write, not the contents of the write), but would be relevant to
future modules relying on that field.

Submitted by:	Mike Halderman <mrh@spawar.navy.mil>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-05 23:15:23 +00:00
Peter Wemm
3c6b084e96 Finish driving a stake through the heart of netns and the associated
ifdefs scattered around the place - its dead Jim!

The SMB stuff had stolen AF_NS, make it official.
2003-03-05 19:24:24 +00:00
David Schultz
9c62b3ee7c Make TTYHOG tunable.
Reviewed by:	mike (mentor)
2003-03-05 08:16:29 +00:00
Jonathan Lemon
1cafed3941 Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
John Baldwin
c141c242ac Bah, fix a bogon in the last commit: get the sense of a compare test right
so that we allow a sleepable lock to be acquired with Giant held rather
than allowing a sleepable lock to be acquired with anything but Giant held.
2003-03-04 22:34:07 +00:00
Jeff Roberson
24deed1aaa - Hold the buf lock while manipulating and inspecting its fields.
- Use gbincore() and not incore() so that we can drop the vnode interlock
   as we acquire the buflock.
 - Use GB_LOCK_NOWAIT when getting bufs for read ahead clusters so that we
   don't block on locked bufs.
 - Convert a while loop to a howmany() that will most likely be faster on
   modern processors.  There is another while loop divide that was left
   near by because it is operating on a 64bit int and is most likely faster.
 - Cleanup the cluster_read() code a little to get rid of a goto and make
   the logic clearer.

Tested on:	x86, alpha
Tested by:	Steve Kargl <sgk@troutmask.apl.washington.edu>
Reviewd by:	arch
2003-03-04 21:35:28 +00:00
John Baldwin
1106937d99 Remove safety belt: it is now ok to do a mtx_trylock() on a mutex you
already own.  The mtx_trylock() will fail however.  Enhance the comment
at the top of the try lock function to explain this.

Requested by:	jlemon and his evil netisr locking
2003-03-04 21:32:25 +00:00
John Baldwin
263067951a Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().
2003-03-04 21:03:05 +00:00
John Baldwin
9b4982bfed Add a WITNESS_WARN() call to verify that we hold no locks after running
a handler from an interrupt thread.
2003-03-04 21:01:42 +00:00
John Baldwin
35580ede37 A small overhaul of witness:
- Add a comment about special lock order rules and Giant near the top of
  subr_witness.c.  Specifically, this documents and explains the real lock
  order relationship between Giant and sleepable locks (i.e. lockmgr locks
  and sx locks).  Basically, Giant can be safely acquired either before or
  after sleepable locks and the case of Giant before a sleepable lock is
  exempted as a special case.
- Add a new static function 'witness_list_lock()' that displays a single
  line of information about a struct lock_instance.  This is used to
  make the output of witness messages more consistent and reduce some code
  duplication.
- Fixup a few comments in witness_lock().
- Properly handle the Giant-before-sleepable-lock lock order exception in
  a more general fashion and remove the no longer needed LI_SLEPT flag.
- Break up the last condition before assuming a reversal a bit to try
  and make the logic less confusing in witness_lock().
- Axe WITNESS_SLEEP() now that LI_SLEPT is no longer needed and replace it
  with a more general WITNESS_WARN() macro/function combination.
  WITNESS_WARN() allows you to output a customized message out to the
  console along with a list of held locks.  It will optionally drop into
  the debugger as well.  You can exempt a single lock from the check by
  passing it in as the second argument.  You can also use flags to specify
  if Giant should be exempt from the check, if all sleepable locks should
  be exempt from the check, and if witness should panic if any non-exempt
  locks are found.
- Make the witness_list() function static.  Other areas of the kernel
  should use the new WITNESS_WARN() instead.
2003-03-04 20:56:39 +00:00
John Baldwin
5fa8dd90f9 Miscellaneous cleanups to _mtx_lock_sleep():
- Declare some local variables at the top of the function instead of in a
  nested block.
- Use mtx_owned() instead of masking off bits from mtx_lock manually.
- Read the value of mtx_lock into 'v' as a separate line rather than inside
  an if statement for clarity.  This code is hairy enough as it is.
2003-03-04 20:32:41 +00:00
John Baldwin
6b869595c5 Properly assert that mtx_trylock() is not called on a mutex we already
owned.  Previously the KASSERT would only trigger if we successfully
acquired a lock that we already held.  However, _obtain_lock() fails to
acquire locks that we already hold, so the KASSERT was never checked in
the case it was supposed to fail.
2003-03-04 20:30:30 +00:00
Jeff Roberson
e1f89c222b - Create a function sched_interact_score() which decides on the
interactivity of a kseg and assigns it a value of 0 through 100.
 - Use sched_interact_score() to determine the dynamic priority.
 - Define SCHED_CURR() in terms of sched_interact_score().
 - Adjust the maximum slice back down to 100ms.
 - Remove redundant clearing of ke_runq in sched_wakeup()
 - Clean up #defines and comment them.
2003-03-04 02:45:59 +00:00
Jeff Roberson
7261f5f68e - Add a new 'flags' parameter to getblk().
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
   flag to the initial BUF_LOCK().  This will eventually be used in cases
   were we want to use a buffer only if it is not currently in use.
 - Convert all consumers of the getblk() api to use this extra parameter.

Reviwed by:	arch
Not objected to by:	mckusick
2003-03-04 00:04:44 +00:00
Jeff Roberson
f727171140 - Correct the wchan in vop_stdfsync()
This is almost what bde asked for.  There is some desire to have per fs wchans
still but that is difficult giving the current arrangement of the code.
2003-03-03 23:37:50 +00:00
Ruslan Ermilov
6de61153e8 FreeBSD 5.0 has stopped shipping /modules 2.5 years ago. Catch
up with this further by excluding /modules from the (default)
kern.module_path.
2003-03-03 22:53:35 +00:00
Nate Lawson
7dc9111650 Pick up one file missed in the previous vprint() cleanup 2003-03-03 19:50:36 +00:00
Nate Lawson
99648386d3 Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead defering to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.
2003-03-03 19:15:40 +00:00
Poul-Henning Kamp
182a9f7455 Make nokqfilter() return the correct return value.
Ditch the D_KQFILTER flag which was used to prevent calling NULL pointers.
2003-03-03 16:24:47 +00:00
Poul-Henning Kamp
7ac40f5f59 Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by:    re(scottl)
2003-03-03 12:15:54 +00:00
Poul-Henning Kamp
a9463ba804 Don't pick up a name from the dev_t if it is not there. 2003-03-03 11:14:36 +00:00
Jeff Roberson
65c8760dbf - Shift the tick count by 10 and back around sched_pctcpu_update()
calculations.  Keep this changes local to the function so the tick count
   is in its natural form otherwise.  Previously 1000 was added each time
   a tick fired and we divided by 1000 when it was reported.  This is done
   to reduce rounding errors.
2003-03-03 05:29:09 +00:00
Jeff Roberson
a6ed41865b - In sched_add() special case PRI_TIMESHARE and PRI_ITHD|PRI_REALTIME. We
always place ITHD & REALTIME threads on the current queue of the current
   cpu.  Prior to this change an interrupt thread would only ever run on one
   cpu.
2003-03-03 04:28:07 +00:00
Jeff Roberson
f1e8dc4a3b - Refrain from setting the td_priority in sched_wakeup(). It will be reset
before we return to user space.
2003-03-03 04:11:40 +00:00
Poul-Henning Kamp
f16304aaf0 Explicitly initialize all cdevsw methods with the relevant nofoo() function
if they are NULL.
2003-03-02 19:46:45 +00:00
Dag-Erling Smørgrav
521f364b80 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
Dag-Erling Smørgrav
8994a245e0 Clean up whitespace, s/register //, refrain from strong urge to ANSIfy. 2003-03-02 15:56:49 +00:00
Dag-Erling Smørgrav
c952458814 uiomove-related caddr_t -> void * (just the low-hanging fruit) 2003-03-02 15:50:23 +00:00
Dag-Erling Smørgrav
d5279f20c5 Convert one of our main caddr_t consumers, uiomove(9), to void *. 2003-03-02 15:29:13 +00:00
Dag-Erling Smørgrav
34ca14c687 Clean up whitespace, unregisterize, ANSIfy, remove prototypes made
superfluous by ANSIfication.
2003-03-02 15:08:33 +00:00
Poul-Henning Kamp
9c486c30e2 NO_GEOM cleanup:
Remove cdevsw->d_size() implementation.  No longer needed.
2003-03-02 14:43:46 +00:00
Poul-Henning Kamp
9285a87efd NODEVFS cleanup:
Replace devfs_{create,destroy} hooks with direct function calls.
2003-03-02 13:35:30 +00:00
Jeff Roberson
491081fabf - Hold the vnode interlock across calls to bgetvp instead of acquiring it
internally.  This is required to stop multiple bufs from being associated
   with a single lblkno.
2003-03-02 06:05:23 +00:00
Tor Egge
c6faf3bf1d Remove unneeded code added in revision 1.188. 2003-03-01 17:18:28 +00:00
Jeff Roberson
bff5362bf2 - gc USE_BUFHASH. The smp locking of the buf cache renders this useless. 2003-03-01 05:55:03 +00:00
David Xu
9948c47f0e Check kse group limit before linking new ksegrp. 2003-02-28 15:57:33 +00:00
Poul-Henning Kamp
85f19dccdb Add the flip-side check: If a driver wants a particular major#, make
sure it is marked as allocated in reserved_majors[].  Whine if it wasn't.
2003-02-27 15:17:37 +00:00
Maxime Henrion
bca0668a92 We can now properly return ENODEV in nommap(), so do it.
Remove the now wrong comment which says we can't.
2003-02-27 14:48:53 +00:00
Poul-Henning Kamp
beea48b254 Add support for allocating a device driver major number on demand.
To do this, initialize the d_maj member of the cdevsw to MAJOR_AUTO.
When the cdevsw is first passed to make_dev() a free major number
will be assigned.

Until we have a bit more experience with this a printf will announce
this fact.

Major numbers are not reclaimed, so loading/unloading the same
device driver which uses MAJOR_AUTO will eventually deplete the
pool of free major numbers and the system will panic when it can
not allocate one.  Still undecided who to invonvenience with the
solution to this.
2003-02-27 14:46:51 +00:00
Hartmut Brandt
b89bc9e62b When a process has been waiting on a condition variable or mutex the
td_wmesg field in the thread structure points to the description string of
the condition variable or mutex. If the condvar or the mutex had been
initialized from a loadable module that was unloaded in the meantime,
td_wmesg may now point to invalid memory. Retrieving the process table now
may panic the kernel (or access junk). Setting the td_wmesg field to NULL
after unblocking on the condvar/mutex prevents this panic.

PR:		kern/47408
Approved by:	jake (mentor)
2003-02-27 08:43:27 +00:00
Poul-Henning Kamp
f477b4fd53 NODEVFS cleanup:
Remove cdevsw_add() and cdevsw_remove(), they served us well for a long time.
Bump __FreeBSD_version to 500104 to mark this.
2003-02-27 07:40:44 +00:00
David Xu
3b3df40fc4 Release sched_lock before calling upcall_free. 2003-02-27 05:42:01 +00:00
Julian Elischer
ac2e415327 Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.
2003-02-27 02:05:19 +00:00
Sam Leffler
893bec8059 o fix ppsratecheck to interpret a maxpps of zero as "ignore everything"
o add a comment explaining the significance of using 0 or -1 (actually
  any negative value) for maxpps
2003-02-26 17:16:38 +00:00
David Xu
426269b2c2 Fix a bug when handling SIGCONT.
Reported By: Mike Makonnen <mtm@identd.net>
2003-02-26 12:47:46 +00:00
Scott Long
7874f606d5 Introduce a new taskqueue that runs completely free of Giant, and in
turns runs its tasks free of Giant too.  It is intended that as drivers
become locked down, they will move out of the old, Giant-bound taskqueue
and into this new one.  The old taskqueue has been renamed to
taskqueue_swi_giant, and the new one keeps the name taskqueue_swi.
2003-02-26 03:15:42 +00:00
David Xu
5614648e5e Add a missing '!'. 2003-02-26 01:56:14 +00:00
David Xu
4b4866ed42 Add a simple facility to allow round roubin in userland.
Reviewed by:	julain
2003-02-26 00:58:23 +00:00
Kirk McKusick
7e734c4149 When doing cleanup of excessive buffers in bdwrite (see kern/vfs_bio.c
delta 1.371) we must ensure that we do not get ourselves into a
recursive trap endlessly trying to clean up after ourselves.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 23:59:09 +00:00
Mike Makonnen
0bd5f7979d Unbreak mutex profiling (at least for me).
o Always check for null when dereferencing the filename component.
	o Implement a try-and-backoff method for allocating memory to
	  dump stats to avoid a spin-lock -> sleep-lock mutex lock order
	  panic with WITNESS.

Approved by:	des, markm (mentor)
Not objected:	jhb
2003-02-25 22:28:46 +00:00
Jeff Roberson
2e3981a70c - Add the missing NULL interlock argument to a recently added BUF_LOCK. 2003-02-25 08:23:11 +00:00
Kirk McKusick
3a7053cb60 Prevent large files from monopolizing the system buffers. Keep
track of the number of dirty buffers held by a vnode. When a
bdwrite is done on a buffer, check the existing number of dirty
buffers associated with its vnode. If the number rises above
vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one
of the other (hopefully older) dirty buffers associated with
the vnode is written (using bawrite). In the event that this
approach fails to curb the growth in it the vnode's number of
dirty buffers (due to soft updates rollback dependencies),
the more drastic approach of doing a VOP_FSYNC on the vnode
is used. This code primarily affects very large and actively
written files such as snapshots. This change should eliminate
hanging when taking snapshots or doing background fsck on
very large filesystems.

Hopefully, one day it will be possible to cache filesystem
metadata in the VM cache as is done with file data. As it
stands, only the buffer cache can be used which limits total
metadata storage to about 20Mb no matter how much memory is
available on the system. This rather small memory gets badly
thrashed causing a lot of extra I/O. For example, taking a
snapshot of a 1Tb filesystem minimally requires about 35,000
write operations, but because of the cache thrashing (we only
have about 350 buffers at our disposal) ends up doing about
237,540 I/O's thus taking twenty-five minutes instead of four
if it could run entirely in the cache.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 06:44:42 +00:00
David Xu
d4b570f053 Remove a bogus comment. 2003-02-25 05:17:18 +00:00
David Xu
768298d8c4 Remove a never true condition. 2003-02-25 05:14:18 +00:00
Jeff Roberson
17661e5ac4 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
Maxime Henrion
07159f9c56 Cleanup of the d_mmap_t interface.
- Get rid of the useless atop() / pmap_phys_address() detour.  The
  device mmap handlers must now give back the physical address
  without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
  int.  Now we properly pass a vm_offset_t * and expect it to be
  filled by the mmap handler when the mapping was successful.  The
  mmap handler must now return 0 when successful, any other value
  is considered as an error.  Previously, returning -1 was the only
  way to fail.  This change thus accidentally fixes some devices
  which were bogusly returning errno constants which would have been
  considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
  no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure wheter we need a __FreeBSD_version bump for this,
since and we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with:		alc, phk, jake
Reviewed by:		peter
Compile-tested on:	LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on:	i386
2003-02-25 03:21:22 +00:00
Scott Long
3303c14b57 Don't NULL out p_fd until after closefd() has been called. This isn't
totally correct, but it has caused breakage for too long.  I welcome
someone with more fd fu to fix it correctly.
2003-02-24 05:46:55 +00:00
David Xu
0fccb684d1 Remove a XXXKSE. kg_completed now needs proc lock. 2003-02-24 01:28:10 +00:00
David Xu
f5878f69df Backout last surplus commit. That day just wasn't my day. 2003-02-24 00:49:55 +00:00
Tor Egge
6a07a13944 Sync new socket nonblocking/async state with file flags in accept().
PR:		1775
Reviewed by:	mbr
2003-02-23 23:00:28 +00:00
Poul-Henning Kamp
acb18acfec Bracket the kern.vnode sysctl in #ifdef notyet because it results
in massive locking issues on diskless systems.

It is also not clear that this sysctl is non-dangerous in its
requirements for locked down memory on large RAM systems.
2003-02-23 18:09:05 +00:00
Poul-Henning Kamp
5cb3dc8fa3 OK, I was too sleepy there...
Pointy hat over here!
2003-02-23 13:45:55 +00:00
Poul-Henning Kamp
8f5ef1a9fa Implement CLOCK_MONOTONIC. 2003-02-23 10:18:31 +00:00
Jake Burkholder
fc718df7d0 Add a /a modifier to the show ktr ddb command, which prints the whole trace
buffer without stopping.  Useful if you just want to capture the output but
can't run ktrdump.
2003-02-22 23:30:37 +00:00
Robert Watson
90623e1a9e Don't panic when enumerating SYSCTL_NODE() nodes without any children
nodes.

Submitted by:	green, Hiten Pandya <hiten@unixdaemons.com>
2003-02-22 17:58:06 +00:00
Mike Makonnen
750a91d8b1 Remove a comment which hasn't been true since rev. 1.158
Approved by:	jhb, markm (mentor)(implicit)
2003-02-22 05:59:48 +00:00
Robert Watson
838a6d03e8 Export the name of the device used to mount the root file system as
kern.rootdev.  If rootdev is undefined (NFS mount, etc), export an
empty string.

Desired by:	peter
2003-02-22 05:01:12 +00:00
Peter Wemm
86bb731626 Missing M_TRYWAIT from so_upcall third argument. 2003-02-21 22:23:40 +00:00
Poul-Henning Kamp
2c6b49f6af NO_GEOM cleanup:
Retire the "d_dump_t" and use the "dumper_t" type instead.

Dumper_t takes a void * as first arg which is more general than the
dev_t taken by d_dump_t.  (Remember: we could have net-dumpers if
somebody wrote us one!)

Define the convention for GEOM controlled disk devices to be that the
first argument to the dumper function is the struct disk pointer.

Change device drivers accordingly.
2003-02-21 19:00:48 +00:00
David Xu
34ada4b3bb If UTS kernel is calling kse_wakeup for itself, do nothing. 2003-02-21 07:11:38 +00:00
Poul-Henning Kamp
263444cfbf Change the console interface to pass a "struct consdev *" instead of a
dev_t to the method functions.

The dev_t can still be found at struct consdev *->cn_dev.

Add a void *cn_arg element to struct consdev which the drivers can use
for retrieving their softc.
2003-02-20 20:54:45 +00:00
Poul-Henning Kamp
02574b19e1 Add a dead_cdevsw which does its best to return ENXIO if at all possible.
In devsw() return dead_cdevsw instead of NULL in case the dev_t does not
have a si_devsw.

This may improve our survival chances with devices which go away unexpectedly.
2003-02-20 15:35:54 +00:00
David Xu
ab7d94f7eb Forgot to set KU_DOUPCALL in kse_wakeup. 2003-02-20 08:22:04 +00:00
David Xu
eb117d5cb0 Add a timeout parameter to kse_release. 2003-02-20 08:18:15 +00:00
Bosko Milekic
025b4be197 o Allow "buckets" in mb_alloc to be differently sized (according to
compile-time constants).  That is, a "bucket" now is not necessarily
  a page-worth of mbufs or clusters, but it is MBUF_BUCK_SZ, CLUS_BUCK_SZ
  worth of mbufs, clusters.
o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce
  {mbuf,clust}_lowm, which currently has no effect but will be used
  to set the low watermarks.
o Fix netstat so that it can deal with the differently-sized buckets
  and teach it about the low watermarks too.
o Make sure the per-cpu stats for an absent CPU has mb_active set to 0,
  explicitly.
o Get rid of the allocate refcounts from mbuf map mess.  Instead,
  just malloc() the refcounts in one shot from mbuf_init()
o Clean up / update comments in subr_mbuf.c
2003-02-20 04:26:58 +00:00
Tim J. Robbins
27e39ae4d8 Remove the PL_SHAREMOD flag from struct plimit, which could have been
used to share resource limits between rfork threads, but never was.
Removing it makes resource limit locking much simpler -- only the current
process can change the contents of the structure that p_limit points to.
2003-02-20 04:18:42 +00:00
Olivier Houchard
d6bf23783f Remove duplicate includes.
Submitted by:	Cyril Nguyen-Huu <cyril@ci0.org>
2003-02-20 03:26:11 +00:00
Bosko Milekic
ec73437395 Fix a serious bug when computing the index for the
reference counter array for mbuf clusters.  I don't know
how this got past early testing nor how it survived so long
without getting caught.  If anyone was seeing really really
bizarre memory corruption in a few mbufs this would be why.
2003-02-20 03:01:04 +00:00
David Xu
a87891ee9e Move thread limits testing code up a bit. This let UPCALLING thread
takes possible accumulated contexts away.
2003-02-20 01:11:17 +00:00
Poul-Henning Kamp
0c977c9c53 Add M_WAITOK 2003-02-19 22:51:33 +00:00
David Xu
fc8cdd87d2 Count non-threaded group. 2003-02-19 13:40:24 +00:00
David Xu
4f6cfa4520 Update comments to reflect new KSE code. 2003-02-19 13:36:51 +00:00
Tim J. Robbins
a44a414e11 The "m = m->m_next" that was removed in the revision 1.12 was necessary
for the m->m_next != NULL case to avoid looping infinitely when the first
mbuf in the chain becomes full.
2003-02-19 10:12:42 +00:00
David Xu
30621e142d M_WAITOK and remove an useless comment. 2003-02-19 09:59:12 +00:00
Warner Losh
a163d034fa Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
David Xu
0252d20369 Optimize the case when max threads number was hit. 2003-02-19 04:01:55 +00:00
Peter Wemm
af3d516f55 Initiate de-orbit burn for USE_PCI_BIOS_FOR_READ_WRITE. This has been
#if'ed out for a while.  Complete the deed and tidy up some other bits.

We need to be able to call this stuff from outer edges of interrupt
handlers for devices that have the ISR bits in pci config space.  Making
the bios code mpsafe was just too hairy.  We had also stubbed it out some
time ago due to there simply being too much brokenness in too many systems.
This adds a leaf lock so that it is safe to use pci_read_config() and
pci_write_config() from interrupt handlers.  We still will use pcibios
to do interrupt routing if there is no acpi.. [yes, I tested this]

Briefly glanced at by:  imp
2003-02-18 03:36:49 +00:00
David Xu
88aba94cdc Further fix PS_NEEDSIGCHK 2003-02-17 14:54:57 +00:00
David Xu
02bbffaf3c Move code for detecting PS_NEEDSIGCHK into thread_schedule_upcall,
I think it is a better place to handle it.
2003-02-17 14:41:22 +00:00
Tim J. Robbins
96d7f8ef46 Use the proc lock to protect p_realtimer instead of Giant, and obtain
sched_lock around accesses to p_stats->p_timer[] to avoid a potential
race with hardclock. getitimer(), setitimer() and the realitexpire()
callout are now Giant-free.
2003-02-17 10:03:02 +00:00
Jeff Roberson
58a3c27384 - Add a new function, thread_signal_add(), that is called from postsig to
add a signal to a mailbox's pending set.
 - Add a new function, thread_signal_upcall(), this causes the current thread
   to upcall so that we can deliver pending signals.

Reviewed by:	mini
2003-02-17 09:58:11 +00:00
Julian Elischer
4a338afd7a Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by:	 parts by davidxu@
Reviewed by:	jeff@ mini@
2003-02-17 09:55:10 +00:00
Jeff Roberson
5215b1872f - Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers.  This greatly simplifies much the
   KSE code.

Submitted by:	davidxu
2003-02-17 05:14:26 +00:00
Jeff Roberson
e4625663c9 - Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc.  These counters are only examined through calcru.

Submitted by:	davidxu
Tested on:	x86, alpha, UP/SMP
2003-02-17 02:19:58 +00:00
Alfred Perlstein
9d4156aed3 Fix logic in loop so it actually executes.
Pointed out by: fjoe
2003-02-16 16:12:10 +00:00
Poul-Henning Kamp
f341ca9891 Remove #include <sys/dkstat.h> 2003-02-16 14:13:23 +00:00
Poul-Henning Kamp
3abd4ccf87 Move the tty related statistics counters to live with the tty code. 2003-02-16 13:22:15 +00:00
Jeff Roberson
71146186a1 - Introduce a new function bremfreel() that does a bremfree with the buf
queue lock already held.
 - In getblk() and flushbufqueues() use bremfreel() while we still have the
   buf queue lock held to keep the lists consistent.
 - Add LK_NOWAIT to two cases where we're essentially asserting that the bufs
   are not locked while acquiring the locks.  This will make sure that we get
   the appropriate panic() and not another one for sleeping with a lock held.
2003-02-16 10:43:06 +00:00
Jeff Roberson
5e8feb5bed - Add a WITNESS_SLEEP() for the appropriate cases in lockmgr(). 2003-02-16 10:39:49 +00:00
Alfred Perlstein
5015c68a3c prevent overflow in shminfo.shmmax 2003-02-16 06:08:55 +00:00
Jeffrey Hsu
a44009e07d Remove extraneous FILEDESC_LOCK around atomic read. 2003-02-16 02:15:15 +00:00
Andrew R. Reiter
1f5a94d5f6 - Update a couple of comments to make sense with what today's code is
doing (stale comments make arr something something ;)).
2003-02-15 23:25:12 +00:00
Tor Egge
218a01e062 Avoid file lock leakage when linuxthreads port or rfork is used:
- Mark the process leader as having an advisory lock
  - Check if process leader is marked as having advisory lock when
    closing file
  - Check that file is still open after lock has been obtained
  - Don't allow file descriptor table sharing between processes
    with different leaders

PR:		10265
Reviewed by:	alfred
2003-02-15 22:43:05 +00:00
Andrew R. Reiter
da8f0c8429 - Remove old comment for PURGE() as it no longer exists and implied it
was a comment to cache_zap().
- Add a comment to quickly state what cache_zap() does.

Reviewed by:	phk, mux
2003-02-15 18:58:06 +00:00
Tim J. Robbins
4444375710 Acquire Giant around calls to kern_sigaction() in sigaction(),
freebsd4_sigaction() and osigaction() instead of around the whole
body of those functions. They now no longer hold Giant around calls
to copyin() and copyout(), and it is slightly more obvious what
Giant is protecting.
2003-02-15 09:56:09 +00:00
Tim J. Robbins
c41c566c4a osigpending() no longer needs Giant, for the same reason sigpending()
does not.
2003-02-15 09:15:30 +00:00
Tim J. Robbins
48e8f774cb All uses of p_siglist are protected by the proc lock now, so there's
no need to acquire Giant in sigpending() anymore.
2003-02-15 08:42:02 +00:00
Alfred Perlstein
e7d6662f1b Do not allow kqueues to be passed via unix domain sockets. 2003-02-15 06:04:55 +00:00
Alfred Perlstein
edf6699ae6 Fix LOR with PROC/filedesc. Introduce fdesc_mtx that will be used as a
barrier between free'ing filedesc structures.  Basically if you want to
access another process's filedesc, you want to hold this mutex over the
entire operation.
2003-02-15 05:52:56 +00:00
Bosko Milekic
9e7225808e Make m_getm() always return the top of the newly allocated chain, as
opposed to returning the top of the old chain when there was one and
the top of the newly allocated chain if there was no old chain.

Actually, it should be noted that prior to this fix, although the
comment above m_getm() advertised that m_getm() would return the
top of the old chain (if an old chain was being passed in) it
actually [wrongly] was returning the tail mbuf in the old chain
instead.  This is a bug but since the one use of m_getm() in
the tree luckily did not depend on the behavior, it happened
to work out without notice.

Harti Brandt pointed out that the advertised behavior was actually
not the real behavior and so this change makes m_getm() ALWAYS
return the newly allocated chain (and fixes the comment).  This
is less confusing and is the best course of action as then the
caller is always able to have both a reference to the top of
the original chain (because it's passing it in in the call) and
a reference to the newly attached chain.  Although the API is
slightly modified, I don't think that any third-party code uses
m_getm() and if it does, it surely can't be working properly
because the old behavior was bogus.

API bug pointed out by: Harti Brandt <brandt@fokus.fraunhofer.de>
2003-02-14 16:50:13 +00:00
Dag-Erling Smørgrav
af2eed6648 Style nit. 2003-02-14 13:30:25 +00:00
Alfred Perlstein
3dc593c895 KASSERT format string does not need newline termination 2003-02-14 13:28:44 +00:00
Alfred Perlstein
0c5f7aaab5 Add kasserts to catch bad API usage.
Submitted by: Hiten Pandya <hiten@unixdaemons.com>
2003-02-14 13:18:51 +00:00
Alfred Perlstein
c11110eabe Fix crash dumps on ata and scsi.
To fix scsi, don't wait for ithreads if we're dumping, it makes the
debugger sad.

To fix ata, use what appears to be a polling method if we're dumping,
I stole this from tmm but added code to ensure that this change is
only in effect while dumping.

Tested by: des
2003-02-14 13:10:40 +00:00
Alfred Perlstein
e95499bd4c style. 2003-02-14 12:44:48 +00:00
Alfred Perlstein
aae87a3681 Print a backtrace in case we tsleep from inside of DDB. 2003-02-14 12:44:07 +00:00
Alan Cox
2bd63062b5 Use atomic ops to update amountpipekva. Amountpipekva represents the
total kernel virtual address space used by all pipes.  It is, thus, outside
the scope of any individual pipe lock.
2003-02-13 19:39:54 +00:00
Dag-Erling Smørgrav
f6cebd7310 It seems the extra precautions are no longer needed. 2003-02-13 10:05:20 +00:00
Tim J. Robbins
5ce623b8e0 Add an XXX comment noting that getrusage() accesses p_stats->p_ru
and p_stats->p_cru without holding the appropriate locks.
2003-02-13 09:53:59 +00:00
Peter Wemm
1c425b874c Add a 'debug.witness_trace' sysctl (and tunable) when DDB is present.
This causes LOR and could-sleep messages to come with a stack trace.
2003-02-13 01:35:56 +00:00
Peter Wemm
891e066864 Print "Stack backtrace:" right before dumping the backtrace. We cannot
expect end users to automatically recognize a stack trace for what it is.
2003-02-13 01:33:59 +00:00
Warner Losh
b235704d7c Implement rman_get_device
# I though this was alredy implemented

Pointy hat on my head shown by: peter
2003-02-12 07:00:59 +00:00
Alfred Perlstein
42e1b74af2 Don't lock FILEDESC under PROC.
The locking here needs to be revisited, but this ought to get rid of the
LOR messages that people are complaining about for now.  I imagine either
I or someone else interested with smp will eventually clear this up.
2003-02-11 07:20:52 +00:00
Jeff Roberson
25c4325446 - Add a comment about a race that will happen without Giant. 2003-02-10 22:47:34 +00:00
Jeff Roberson
c7b716cc2a - Unlock the nblock after the loop in bwillwrite(). 2003-02-10 22:33:59 +00:00
Jeff Roberson
783caefbbf - Enable STRICT_RESCHED until code that dynamically decides on resched
strictness based on the current workload is finished.
2003-02-10 14:11:23 +00:00
Jeff Roberson
407b015791 - Add a new variable 'kg_runtime' that tracks the amount of time we've run.
- Use the ratio of kg_runtime / kg_slptime to determine our dynamic priority.
 - Scale kg_runtime and kg_slptime back when the sum of the two exceeds
   SCHED_SLP_RUN_MAX.  This allows us to slowly forget old behavior.
 - Scale back the runtime and slptime in fork so that the new process has the
   same ratio but much less accumulated time.  This causes new behavior to be
   noticed more quickly.
2003-02-10 14:03:45 +00:00
Tim J. Robbins
fbf70de6b0 Lock the proc around accessing p_siglist in ttycheckoutq() in the
unused wait != 0 case.
2003-02-10 06:06:46 +00:00
Jeff Roberson
7137d635ac - In getnewbuf() unlock the bq lock prior to sleeping when we're out of
buffers.

Submitted by:	tegge
2003-02-10 06:02:51 +00:00
Jake Burkholder
3749dff3f9 Remove mtx_lock_giant from functions which are mp-safe. 2003-02-10 04:42:20 +00:00
Jeff Roberson
3306adcfcf - Correct another atomic op.
Spotted by:	alc
2003-02-09 22:39:51 +00:00
Jeff Roberson
08883c8a85 - Claim we're 'fsync' and not 'spec_fsync' in vop_stdfsync. 2003-02-09 12:29:38 +00:00
Jeff Roberson
69953c8435 - Move some code out from #ifdef INVARIANTS. 2003-02-09 12:11:37 +00:00
Jeff Roberson
05e393f0cd - Update a printf format for b_flags. 2003-02-09 11:56:13 +00:00
Jeff Roberson
767b9a529d - Cleanup unlocked accesses to buf flags by introducing a new b_vflag member
that is protected by the vnode lock.
 - Move B_SCANNED into b_vflags and call it BV_SCANNED.
 - Create a vop_stdfsync() modeled after spec's sync.
 - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some
   fs specific processing.  This gives all of these filesystems proper
   behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag.
 - Annotate the locking in buf.h
2003-02-09 11:28:35 +00:00
Jeff Roberson
15553af710 - spell add 'add' and not 'subtract' in an atomic op.
Spotted by:	alc
Pointy hat to:	jeff
2003-02-09 11:21:40 +00:00
Jeff Roberson
d85be48243 - Lock down the buffer cache's infrastructure code. This includes locks on
buf lists, synchronization variables, and atomic ops for the counters.
   This change does not remove giant from any code although some pushdown
   may be possible.
 - In vfs_bio_awrite() don't access buf fields without the buf lock.
2003-02-09 09:47:31 +00:00
Julian Elischer
a282253a29 A little infrastructure, preceding some upcoming changes
to the profiling and statistics code.

Submitted by:	DavidXu@
Reviewed by:	peter@
2003-02-08 02:58:16 +00:00
Jeffrey Hsu
67c0ddef59 Remove vestiges of no longer needed unp_rvnode field.
Approved by:	phk (who originally added it in rev 1.8 of unpcb.h)
2003-02-06 01:34:43 +00:00
Julian Elischer
822ded67fe The lockmanager has to keep track of locks per thread, not per process.
Submitted by:	david Xu (davidxu@)
Reviewed by:	jhb@
2003-02-05 19:36:58 +00:00
Dag-Erling Smørgrav
c524b1a8cf Correct grammatical error in previous commit. 2003-02-04 18:47:17 +00:00
Dag-Erling Smørgrav
91dd013b1e Extra precautions before trying to start init(8). 2003-02-04 18:16:50 +00:00
Poul-Henning Kamp
6334a66358 Implement proper bounds-checking and truncation of device names, this has
become an issue now that end-user controlable attributes can become devices
names with the geom_vol_ffs class.
2003-02-04 11:04:26 +00:00
Poul-Henning Kamp
237d2765f9 Pave the road to removing the fixed size limit on device nodes:
Change the si_name of dev_t's to be a char * and put a private buffer for
holding the name at then end of the struct.

Initialize si_name to point to the private buffer.

Put a KASSERT in geom_disk to prevent overrun on the fake dev_t we still
have to generate for the disk_drivers.
2003-02-04 10:32:40 +00:00
Poul-Henning Kamp
8751a8c73b Add vsnrprintf() which is just like vsnprintf() but takes a "radix"
argument for the kernel-special %r format.
2003-02-04 10:00:34 +00:00
Poul-Henning Kamp
91f1c2b3cc Split the global timezone structure into two integer fields to
prevent the compiler from optimizing assignments into byte-copy
operations which might make access to the individual fields non-atomic.

Use the individual fields throughout, and don't bother locking them with
Giant: it is no longer needed.

Inspired by:    tjr
2003-02-03 19:49:35 +00:00
Jake Burkholder
238dd3209a Split statclock into statclock and profclock, and made the method for driving
statclock based on profhz when profiling is enabled MD, since most platforms
don't use this anyway.  This removes the need for statclock_process, whose
only purpose was to subdivide profhz, and gets the profiling clock running
outside of sched_lock on platforms that implement suswintr.
Also changed the interface for starting and stopping the profiling clock to
do just that, instead of changing the rate of statclock, since they can now
be separate.

Reviewed by:	jhb, tmm
Tested on:	i386, sparc64
2003-02-03 17:53:15 +00:00
Hajimu UMEMOTO
12e4397ea3 Break out the bind and connect syscalls to intend to make calling
these syscalls internally easy.
This is preparation for force coming IPv6 support for Linuxlator.

Submitted by:	dwmalone
MFC after:	10 days
2003-02-03 17:36:52 +00:00
Tim J. Robbins
b338d59fef No need to lock Giant around call to nanosleep1() in nanosleep(). 2003-02-03 15:31:57 +00:00
Tim J. Robbins
411c25edae Avoid holding Giant across copyout() in gettimeofday() and getitimer(). 2003-02-03 14:47:22 +00:00
Hartmut Brandt
1b978d453b Make the variable types, the sysctl macros and the sysctl handler for
kern.ipc.{maxsockbuf,sockbuf_waste_factor} to agree that those variables
are of type unsigned long.

PR:		sparc64/47389
Approved by:	jake (mentor)
2003-02-03 06:50:59 +00:00
Jeff Roberson
5d7ef00cfe - Make some context switches conditional on SCHED_STRICT_RESCHED. This may
have some negative effect on interactivity but it yields great perf. gains.
   This also brings the conditions under which ULE context switches inline
   with SCHED_4BSD.
 - Define some new kseq_* functions for manipulating the run queue.
 - Add a new kseq member ksq_rslices and ksq_bload.  rslices is the sum of
   the slices of runnable kses.  This will be used for push load balance
   decisions.  bload is the number of threads blocked waiting on IO.
2003-02-03 05:30:07 +00:00
Jeff Roberson
cd6e33df1c - Stop abusing oncpu for our cpu binding. Define a scheduler local element
in the kse datastructure called ke_cpu.  This is the cpu which we are
   currently bound to.  Some flags may be added later to support hard binding.
2003-02-03 02:26:28 +00:00
Alfred Perlstein
04738e99b5 Catch more uses of MIN(). 2003-02-02 13:30:00 +00:00
Alfred Perlstein
8deebb0160 Consolidate MIN/MAX macros into one place (param.h).
Submitted by: Hiten Pandya <hiten@unixdaemons.com>
2003-02-02 13:17:30 +00:00
Scott Long
7121cce58a Use hz if stathz is zero. Adopted from sched_4bsd. 2003-02-02 08:24:32 +00:00
Julian Elischer
6f8132a867 Reversion of commit by Davidxu plus fixes since applied.
I'm not convinced there is anything major wrong with the patch but
them's the rules..

I am using my "David's mentor" hat to revert this as he's
offline for a while.
2003-02-01 12:17:09 +00:00
Poul-Henning Kamp
4db4f5c87f Under #ifdef DIAGNOSTIC, fill malloc(9) allocations which do not have
M_ZERO specified with 0x70.  (malloc_flags=J for the kernel :-)
2003-02-01 10:07:49 +00:00
Poul-Henning Kamp
33bef83cc6 Under DIAGNOSTIC, only report expensive timeouts if they are more expensive
than the last on we reported.
2003-02-01 10:06:40 +00:00
Julian Elischer
ff92b12dce Only add one tick per tick to the thread stats, instead of some random number. 2003-01-31 22:14:46 +00:00
Robert Watson
565211b27f Correct handling of locking for chroot() and chdir() cases: rather
than having change_dir() release the vnode lock on success, hold the
lock so that we can use it later when invoking MAC checks and
VOP_ACCESS() in the chroot() code.  Update the comment to reflect
this calling convention.  Update callers to unlock the vnode
lock.  Correct a typo regarding vnode naming in the MAC case that
crept in via the previous patch applied.
2003-01-31 21:13:25 +00:00
Robert Watson
7278944df1 Clean up vnode handling on return from chroot() in certain error
cases: we might multiply vrele() a vnode when certain classes of
failures occur.  This appears to stem from earlier Giant/file
descriptor lock pushdown and restructuring.

Submitted by:	maxim
2003-01-31 18:57:04 +00:00
Tim J. Robbins
48ed1432c5 Use a local variable to store the number of ticks that elapsed in
kernel mode instead of (unintentionally) using the global `ticks'.
This error completely broke profiling.
2003-01-31 11:22:31 +00:00
Poul-Henning Kamp
6e1203e558 NO_GEOM cleanup: unifdef; 2003-01-30 19:22:27 +00:00
Poul-Henning Kamp
c5cab5b2fa NO_GEOM cleanup: retire to attic. 2003-01-30 12:58:55 +00:00
Poul-Henning Kamp
c9834aa961 NODEVFS cleanup: Unifdef. 2003-01-30 12:51:32 +00:00
Poul-Henning Kamp
8e67075792 NO_GEOM cleanup: remove #ifdef 2003-01-30 12:36:30 +00:00
Poul-Henning Kamp
4af0d0c21f NODEVFS cleanup: remove #ifdefs 2003-01-30 12:35:40 +00:00
Poul-Henning Kamp
f6a1852dcc NODEVFS cleanup: remove #ifdefs. 2003-01-30 12:35:17 +00:00
Poul-Henning Kamp
34189c035b NODEVFS cleanup: Remove cdevsw[].
This implicitly removes the need for major numbers, but a number of
drivers still know things they shouldn't need to, and we need to
consider if there are applications which cache major(+minor) gleaned
from stat(2) and rely on it being constant over reboots before we
start assigning random majors.
2003-01-29 21:54:03 +00:00
Tim J. Robbins
af7cbce89c Fix two fatal signedness errors introduced when i and j in semop()
were changed from int to size_t in the previous revision.

PR:		47625
2003-01-29 12:30:59 +00:00
Poul-Henning Kamp
60ca399653 Move timecounters notion of frequency to 64 bits.
[WARNING: CPUs in the distant future may be closer than they appear!]
2003-01-29 11:29:22 +00:00
Jeff Roberson
0a016a05a4 - Use ksq_load as the authoritive count of kses on the pair of kseqs for
sched_runnable() et all.
 - Remove some dead code in sched_clock().
 - Define two macros KSEQ_SELF() and KSEQ_CPU() for getting the kseq of the
   current cpu or some alternate cpu.
 - Start introducing kseq_() functions, such as kseq_choose() and kseq_setup().
2003-01-29 07:00:51 +00:00
Jeff Roberson
bf857e69a2 - Remove debugging code that didn't work on UP. 2003-01-29 00:26:47 +00:00
Jeff Roberson
d465fb9589 - Allow idle's pctcpu time to be calculated. 2003-01-28 09:30:17 +00:00
Jeff Roberson
c9f25d8f92 - Fix the ksq_load calculation. It now reflects the number of entries on the
run queue for each cpu.
 - Introduce kse stealing into the sched_choose() code.  This helps balance
   cpus better in cases where process turnover is high.  This implementation
   is fairly trivial and will likely be only a temporary measure until
   something more sophisticated has been written.
2003-01-28 09:28:20 +00:00
Peter Wemm
bf2053cad6 No longer force COMPAT_FREEBSD4 to be on. 2003-01-27 23:01:03 +00:00
Poul-Henning Kamp
109751d28c Don't dereference null vnode pointer if controling terminal was revoked.
Submitted by:	"Peter Edwards" <pmedwards@eircom.net>
2003-01-27 16:54:17 +00:00
David Xu
ba07d97e62 Use kg_numupcalls to see if we are closing a thread group,
not kg_kses which is not changed when a group is still working.
2003-01-26 23:39:33 +00:00
Alfred Perlstein
ca315837c7 fix warnings 2003-01-26 23:25:00 +00:00
Alfred Perlstein
b17c9cfa5e Add const qualifier to data argument for msgsnd.
PR: standards/45274
Submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-01-26 20:09:34 +00:00
David Xu
0dbb100b9b Move UPCALL related data structure out of kse, introduce a new
data structure called kse_upcall to manage UPCALL. All KSE binding
and loaning code are gone.

A thread owns an upcall can collect all completed syscall contexts in
its ksegrp, turn itself into UPCALL mode, and takes those contexts back
to userland. Any thread without upcall structure has to export their
contexts and exit at user boundary.

Any thread running in user mode owns an upcall structure, when it enters
kernel, if the kse mailbox's current thread pointer is not NULL, then
when the thread is blocked in kernel, a new UPCALL thread is created and
the upcall structure is transfered to the new UPCALL thread. if the kse
mailbox's current thread pointer is NULL, then when a thread is blocked
in kernel, no UPCALL thread will be created.

Each upcall always has an owner thread. Userland can remove an upcall by
calling kse_exit, when all upcalls in ksegrp are removed, the group is
atomatically shutdown. An upcall owner thread also exits when process is
in exiting state. when an owner thread exits, the upcall it owns is also
removed.

KSE is a pure scheduler entity. it represents a virtual cpu. when a thread
is running, it always has a KSE associated with it. scheduler is free to
assign a KSE to thread according thread priority, if thread priority is changed,
KSE can be moved from one thread to another.

When a ksegrp is created, there is always N KSEs created in the group. the
N is the number of physical cpu in the current system. This makes it is
possible that even an userland UTS is single CPU safe, threads in kernel still
can execute on different cpu in parallel. Userland calls kse_create to add more
upcall structures into ksegrp to increase concurrent in userland itself, kernel
is not restricted by number of upcalls userland provides.

The code hasn't been tested under SMP by author due to lack of hardware.

Reviewed by: julian
2003-01-26 11:41:35 +00:00
Jeff Roberson
35e6168fcd - Add the ule scheduler. This is intended to be a general purpose process
scheduler with many SMP benefits.  It is still very experimental and should
   be used only in test environments.
2003-01-26 05:23:15 +00:00
Jeff Roberson
4e997f4b87 - Call sched_sleep() instead of rolling our own in cv_waitq_add(). 2003-01-26 04:00:39 +00:00
Alfred Perlstein
e1d7d0bb60 Bring shm functions closer the the opengroup standards.
PR: 47469
Submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-01-25 21:33:05 +00:00
Alfred Perlstein
3beb32709d Bring semop() closer the the opengroup standards.
PR: 47471
Submitted by: Craig Rodrigues <rodrigc@attbi.com>
2003-01-25 21:27:37 +00:00
Poul-Henning Kamp
4394f4767d Add sysctl kern.timecounter.nsetclock which indicates the number of
potential discontinuities in our UTC timescale.

Applications can monitor this variable if they want to be informed
about steps in the timescale.  Slews (ntp and adjtime(2)) and
frequency adjustments (ntp) will not increment this counter, only
operations which set the clock.  No attempt is made to classify
size or direction of the step.
2003-01-25 07:51:09 +00:00
Jeffrey Hsu
afb0573a12 Remove extraneous FILEDESC_LOCKs around atomic reads.
Reviewed by:	jhb
2003-01-24 22:49:52 +00:00
Hajimu UMEMOTO
bdc5f6a345 Added comment why this workaround is required.
Suggested by:	sam
MFC after:	1 week
2003-01-22 18:03:06 +00:00
Hajimu UMEMOTO
56b3905f15 getpeername() returns with no error but didn't fill struct sockaddr
correctly against PF_LOCAL.  It seems that the test always fails then
sockaddr was not filled.  So, I added else clause for workaround.
I doubt if it is right fix.  However, it is better than nothing.  I
found that NetBSD has same potential problem.  But, fortunately,
NetBSD has equivalent else clause.

MFC after:	1 week
2003-01-22 13:13:13 +00:00
Dag-Erling Smørgrav
ecf031c9ad There's absolutely no need for a struct-within-a-struct, so move the
counters out of the inner struct and remove it.
2003-01-21 20:33:27 +00:00
Jeffrey Hsu
a448a15bc1 Add missing SMP file locks around read-modify-write operations on
the flag field.

Reviewed by:	rwatson
2003-01-21 20:20:48 +00:00
Thomas Moestl
72aeb19aba Correct an off-by-one in the boundary check. Otherwise, resource
allocations would fail if the desired allocation size was equal to
the boundary.
2003-01-21 17:02:21 +00:00
Poul-Henning Kamp
a63935c3f6 #ifdef NO_GEOM all of this file. 2003-01-21 10:40:46 +00:00
Alfred Perlstein
44956c9863 Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
Sam Leffler
07ff231fcb preserve the order of tags copied by m_tag_copy_chain
Obtained from:	OpenBSD
2003-01-21 06:14:38 +00:00
Jeffrey Hsu
34c54d9f74 Rewrite the SMP filedesc locking in knote_attach() in order to
1.  eliminate unnecessary loop which frees and re-allocates
	the just allocated array
  2.  eliminate the newsize recomputation
  3.  eliminate unnecessary unlock and relock around free
  4.  correctly match the free with the malloc into M_KQUEUE instead of M_TEMP
  5.  eliminate conditional assignment of oldlist, which is equivalent to a
	simple assignment
  6.  eliminate the oldlist temporary variable completely

Reviewed by:    jhb
2003-01-21 04:05:49 +00:00
Robert Watson
ec35c2af68 Perform VOP_GETATTR() before mac_check_vnode_exec() so that
the cached attributes are available to MAC modules.

Submitted by:   mike halderman <mrh@nosc.mil>
Obtained from:	TrustedBSD Project
2003-01-21 03:26:28 +00:00
Jake Burkholder
7251b4bf93 Resolve relative relocations in klds before trying to parse the module's
metadata.  This fixes module dependency resolution by the kernel linker on
sparc64, where the relocations for the metadata are different than on other
architectures; the relative offset is in the addend of an Elf_Rela record
instead of the original value of the location being patched.
Also fix printf formats in debug code.

Submitted by:	Hartmut Brandt <brandt@fokus.gmd.de>
PR:		46732
Tested on:	alpha (obrien), i386, sparc64
2003-01-21 02:42:44 +00:00
Matthew Dillon
2d5c7e4506 Close the remaining user address mapping races for physical
I/O, CAM, and AIO.  Still TODO: streamline useracc() checks.

Reviewed by:	alc, tegge
MFC after:	7 days
2003-01-20 17:46:48 +00:00
Poul-Henning Kamp
c0805171aa disk_dev_synth() is a NO_GEOM hack. 2003-01-20 11:29:07 +00:00
Poul-Henning Kamp
0b4583e873 Only include <sys/diskslice.h> ifdef NO_GEOM 2003-01-20 11:28:37 +00:00
Alan Cox
28ec30cd9f - Hold the page queues lock around vm_page_hold().
- Assert that the page queues lock rather than Giant is held in
   vm_page_hold().
2003-01-20 09:24:03 +00:00
Julian Elischer
67f7c1bbe1 Remove a KASSERT that can now happen and add a missing setrunnable. 2003-01-20 03:41:04 +00:00
Poul-Henning Kamp
8a5c54f72d #ifdef NO_GEOM these files entirely. When NO_GEOM is removed as an
option the files can be removed.
2003-01-19 11:51:35 +00:00
Tim J. Robbins
5cb6b2cada Remove unnecessary locking of Giant around nanotime() in clock_gettime(). 2003-01-19 11:28:22 +00:00
Poul-Henning Kamp
5ecd6fd411 Mark more code #ifdef NODEVFS 2003-01-19 11:26:13 +00:00
Poul-Henning Kamp
7e760e148a Originally when DEVFS was added, a global variable "devfs_present"
was used to control code which were conditional on DEVFS' precense
since this avoided the need for large-scale source pollution with
#include "opt_geom.h"

Now that we approach making DEVFS standard, replace these tests
with an #ifdef to facilitate mechanical removal once DEVFS becomes
non-optional.

No functional change by this commit.
2003-01-19 11:03:07 +00:00
Poul-Henning Kamp
ec2c4225ce When we use DEVFS, we don't need the /dev/tty pseudo-driver to do
more than return ENXIO from its open routine, so most of this file
is unneeded.

A straight #ifdef'ing would look quite messy, and make the file
quite unreadable, so instead I have simply added the DEVFS version
of the file at the top, protected by #ifndef NODEVFS.

Once we have removed NODEVFS option, we can retain 86 the 86 lines at
the top and drop the other 287 lines.
2003-01-19 10:23:47 +00:00
Alfred Perlstein
31f3e2ad8e useracc() is mpsafe so we only need to hold Giant
over the call to nanosleep1()

Pointed out by: tjr
2003-01-19 06:51:10 +00:00
Warner Losh
b47d073500 Fix comment about what we do when there are no listeners. 2003-01-19 00:34:17 +00:00
Poul-Henning Kamp
ffffe9203f Move alpha_fix_srm_checksum() from subr_diskmbr.c to subr_disklabel.c 2003-01-17 19:37:55 +00:00
Poul-Henning Kamp
40f683a443 Remove the unused DSO_* options. 2003-01-17 19:36:14 +00:00
Thomas Moestl
6f7cab9301 Disallow listen() on sockets which are in the SS_ISCONNECTED or
SS_ISCONNECTING state, returning EINVAL (which is what POSIX mandates
in this case).
listen() on connected or connecting sockets would cause them to enter
a bad state; in the TCP case, this could cause sockets to go
catatonic or panics, depending on how the socket was connected.

Reviewed by:	-net
MFC after:	2 weeks
2003-01-17 19:20:00 +00:00
Poul-Henning Kamp
e948321c7a Move dkmodpart() from subr_diskslice.c to subr_disklabel.c. 2003-01-17 19:05:58 +00:00
Poul-Henning Kamp
ce9fac0072 Move a local variable to avoid the compiler warning about it being unused. 2003-01-16 20:06:45 +00:00
John Hay
b1e7e2019e hardpps() wants the raw hardware counter value converted to nanoseconds. 2003-01-16 19:22:13 +00:00
Alan Cox
6eb07b4ac2 Fix two long-standing, but likely harmless, errors in the use of
vm_pageout_deficit:
1. Update vm_pageout_deficit before VM_WAIT.  There is no sense in
   delaying the update; the sooner the pageout daemon receives this
   information the better.  Reviewed by: tegge
2. Update vm_pageout_deficit according to the number of pages still
   needed to complete the allocation, not the original size of the
   allocation.  Submitted by: tegge

(These errors have existed since the introduction of vm_pageout_deficit
in revision 1.144.)
2003-01-16 08:14:56 +00:00
Matthew Dillon
f597900329 Merge all the various copies of vmapbuf() and vunmapbuf() into a single
portable copy.  Note that pmap_extract() must be used instead of
pmap_kextract().

This is precursor work to a reorganization of vmapbuf() to close remaining
user/kernel races (which can lead to a panic).
2003-01-15 23:54:35 +00:00
David Xu
4e77d3c6a2 Don't forget to disconnect object from class. 2003-01-15 14:58:07 +00:00
Matthew Dillon
fe41ca530c Introduce the ability to flag a sysctl for operation at secure level 2 or 3
in addition to secure level 1.  The mask supports up to a secure level of 8
but only add defines through CTLFLAG_SECURE3 for now.

As per the missif in the log entry for 1.11 of ip_fw2.c which added the
secure flag to the IPFW sysctl's in the first place, change the secure
level requirement from 1 to 3 now that we have support for it.

Reviewed by:	imp
With Design Suggestions by:	imp
2003-01-14 19:35:33 +00:00
Alan Cox
b0ef8c5fe4 - Update vm_pageout_deficit using atomic operations. It's a simple
counter outside the scope of existing locks.
 - Eliminate a redundant clearing of vm_pageout_deficit.
2003-01-14 06:57:03 +00:00
Matthew Dillon
3db161e079 It is possible for an active aio to prevent shared memory from being
dereferenced when a process exits due to the vmspace ref-count being
bumped.  Change shmexit() and shmexit_myhook() to take a vmspace instead
of a process and call it in vmspace_dofree().  This way if it is missed
in exit1()'s early-resource-free it will still be caught when the zombie is
reaped.

Also fix a potential race in shmexit_myhook() by NULLing out
vmspace->vm_shm prior to calling shm_delete_mapping() and free().

MFC after:	7 days
2003-01-13 23:04:32 +00:00
Alfred Perlstein
ac41f2ef0b style(9) fixes, mostly add parens around return arguments. 2003-01-13 15:06:05 +00:00
Jeff Roberson
8fb913face - Unbreak world. I did not notice that libkvm was still used in some places
to access the pctcpu.  This will have to be sorted out more later as the
   new scheduler requires a procedural interface for this data.  A more
   complete solution will follow.
2003-01-13 03:42:41 +00:00
Matthew Dillon
48e3128b34 Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
2003-01-13 00:33:17 +00:00
Jeff Roberson
bcb06d5980 - Move ke_pctcpu and ke_cpticks into the scheduler specific datastructure.
This will prevent access through mechanisms other than the published
   interfaces.
2003-01-12 19:04:49 +00:00
Tim J. Robbins
ae3b195fcf Allowing nent < 0 in aio_suspend() and lio_listio() is just asking for
trouble. Return EINVAL instead.
2003-01-12 09:40:23 +00:00
Tim J. Robbins
44a2c818de Remove "XXX undocumented" comment from lio_listio(). 2003-01-12 09:33:16 +00:00
Alan Cox
8febaa4df0 vm_hold_load_pages() needn't clear PG_ZERO because it didn't pass
VM_ALLOC_ZERO to vm_page_alloc(). (PG_ZERO is clear by default.)
2003-01-12 06:30:15 +00:00
Matthew Dillon
cd72f2180b Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary.  There are no operational changes in this
commit.
2003-01-12 01:37:13 +00:00
Maxime Henrion
5d9155dfee Fix kernel build.
Pointy hats to:	dillon, Hiten Pandya <hiten@unixdaemons.com>
2003-01-11 12:39:45 +00:00
Tim J. Robbins
b3f1af6b8e Don't count mbufs with m_type == MT_HEADER or MT_OOBDATA as control data
in sballoc(), sbcompress(), sbdrop() and sbfree(). Fixes fstat() st_size
reporting and kevent() EVFILT_READ on TCP sockets.
2003-01-11 07:51:52 +00:00
Matthew Dillon
57e6d29b1e Remove all use of the LOG2() macro/inline, undoing some non-optimal cruft
that crept in recently.  GCC will optimize the divides and multiplies for us.

Submitted by:	David Schultz <dschultz@uclink.Berkeley.EDU>
MFC after:	1 day
2003-01-11 01:09:51 +00:00
Alfred Perlstein
b3890a1c42 make sem_leave return a usable errno instead of -1.
make ksem_close return that usable errno instead of -1 (ERESTART).

PR: 46957
2003-01-10 23:13:16 +00:00
David Xu
7be6584678 Don't record thread pointer, it's not permanent in process life cycle,
use process pointer instead.
2003-01-10 09:54:51 +00:00
Jake Burkholder
be0800e85e Oops, add zstty to the witness order list.
Noticed by:	benno
2003-01-09 15:45:28 +00:00
David Xu
b47679ccff Some KSE syscalls are MPSAFE. 2003-01-08 04:57:53 +00:00
Peter Wemm
2488f30980 Move the MOD_SHUTDOWN event from shutdown_post_sync to shutdown_final,
so that entities that want to use the post_sync hook to write stuff
to devices and other tidy-up can do so before the device tree is
shot down.  eg: da doing a SYNC_CACHE etc.  This should get crashdumps
working on mpt devices again, and stops the ia64 boxes locking up
on regular shutdown when da tries to issue the scsi commands to mpt.

Obtained from:  njl, gibbs
2003-01-07 22:24:13 +00:00
Brian Feldman
c579babe54 In vn_open(), unset ndp->ni_vp when returning failure so that code
which expects it to be NULL unless the return value was 0 will work.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-01-07 20:59:55 +00:00
Alfred Perlstein
a11acc6f8a Use copyout to access user memory.
Submittted by: pho
MFC After: 2 days
2003-01-07 20:10:04 +00:00
Alan Cox
1f17965656 Make bogus_offset local to bufinit(). 2003-01-07 19:55:08 +00:00
Poul-Henning Kamp
c49bddce94 Fix warnings & errors caused by my last commit. 2003-01-07 19:09:10 +00:00
John Baldwin
377a66bc40 Cast the integer read as the first argument for %b to an unsigned integer
so it's value is not sign extended when assigned to the uintmax_t variable
used internally by printf.  For example, if bit 31 is set in the cpuid
feature word, then %b would print out the initial value as a 16 character
hexadecimal value.  Now it only prints out an 8 character value.

Reviewed by:	bde
2003-01-07 18:17:18 +00:00