Commit Graph

6054 Commits

Author SHA1 Message Date
davidxu
193655459b Add a missing '!'. 2003-02-26 01:56:14 +00:00
davidxu
3766220237 Add a simple facility to allow round roubin in userland.
Reviewed by:	julain
2003-02-26 00:58:23 +00:00
mckusick
0309ffd2e7 When doing cleanup of excessive buffers in bdwrite (see kern/vfs_bio.c
delta 1.371) we must ensure that we do not get ourselves into a
recursive trap endlessly trying to clean up after ourselves.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 23:59:09 +00:00
mtm
3a9a3e5e4d Unbreak mutex profiling (at least for me).
o Always check for null when dereferencing the filename component.
	o Implement a try-and-backoff method for allocating memory to
	  dump stats to avoid a spin-lock -> sleep-lock mutex lock order
	  panic with WITNESS.

Approved by:	des, markm (mentor)
Not objected:	jhb
2003-02-25 22:28:46 +00:00
jeff
1228dbd648 - Add the missing NULL interlock argument to a recently added BUF_LOCK. 2003-02-25 08:23:11 +00:00
mckusick
6e9f6f2d6d Prevent large files from monopolizing the system buffers. Keep
track of the number of dirty buffers held by a vnode. When a
bdwrite is done on a buffer, check the existing number of dirty
buffers associated with its vnode. If the number rises above
vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one
of the other (hopefully older) dirty buffers associated with
the vnode is written (using bawrite). In the event that this
approach fails to curb the growth in it the vnode's number of
dirty buffers (due to soft updates rollback dependencies),
the more drastic approach of doing a VOP_FSYNC on the vnode
is used. This code primarily affects very large and actively
written files such as snapshots. This change should eliminate
hanging when taking snapshots or doing background fsck on
very large filesystems.

Hopefully, one day it will be possible to cache filesystem
metadata in the VM cache as is done with file data. As it
stands, only the buffer cache can be used which limits total
metadata storage to about 20Mb no matter how much memory is
available on the system. This rather small memory gets badly
thrashed causing a lot of extra I/O. For example, taking a
snapshot of a 1Tb filesystem minimally requires about 35,000
write operations, but because of the cache thrashing (we only
have about 350 buffers at our disposal) ends up doing about
237,540 I/O's thus taking twenty-five minutes instead of four
if it could run entirely in the cache.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 06:44:42 +00:00
davidxu
5bb30740ab Remove a bogus comment. 2003-02-25 05:17:18 +00:00
davidxu
ad34180f0e Remove a never true condition. 2003-02-25 05:14:18 +00:00
jeff
9e4c9a6ce9 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
mux
541937cf73 Cleanup of the d_mmap_t interface.
- Get rid of the useless atop() / pmap_phys_address() detour.  The
  device mmap handlers must now give back the physical address
  without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
  int.  Now we properly pass a vm_offset_t * and expect it to be
  filled by the mmap handler when the mapping was successful.  The
  mmap handler must now return 0 when successful, any other value
  is considered as an error.  Previously, returning -1 was the only
  way to fail.  This change thus accidentally fixes some devices
  which were bogusly returning errno constants which would have been
  considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
  no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure wheter we need a __FreeBSD_version bump for this,
since and we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with:		alc, phk, jake
Reviewed by:		peter
Compile-tested on:	LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on:	i386
2003-02-25 03:21:22 +00:00
scottl
d3806508a8 Don't NULL out p_fd until after closefd() has been called. This isn't
totally correct, but it has caused breakage for too long.  I welcome
someone with more fd fu to fix it correctly.
2003-02-24 05:46:55 +00:00
davidxu
075f8dfffd Remove a XXXKSE. kg_completed now needs proc lock. 2003-02-24 01:28:10 +00:00
davidxu
6e6d69e258 Backout last surplus commit. That day just wasn't my day. 2003-02-24 00:49:55 +00:00
tegge
2072a48a93 Sync new socket nonblocking/async state with file flags in accept().
PR:		1775
Reviewed by:	mbr
2003-02-23 23:00:28 +00:00
phk
af9c7adfc3 Bracket the kern.vnode sysctl in #ifdef notyet because it results
in massive locking issues on diskless systems.

It is also not clear that this sysctl is non-dangerous in its
requirements for locked down memory on large RAM systems.
2003-02-23 18:09:05 +00:00
phk
699d720e80 OK, I was too sleepy there...
Pointy hat over here!
2003-02-23 13:45:55 +00:00
phk
4ddaec005d Implement CLOCK_MONOTONIC. 2003-02-23 10:18:31 +00:00
jake
696c85b9bd Add a /a modifier to the show ktr ddb command, which prints the whole trace
buffer without stopping.  Useful if you just want to capture the output but
can't run ktrdump.
2003-02-22 23:30:37 +00:00
rwatson
2383991687 Don't panic when enumerating SYSCTL_NODE() nodes without any children
nodes.

Submitted by:	green, Hiten Pandya <hiten@unixdaemons.com>
2003-02-22 17:58:06 +00:00
mtm
8ccfd800bd Remove a comment which hasn't been true since rev. 1.158
Approved by:	jhb, markm (mentor)(implicit)
2003-02-22 05:59:48 +00:00
rwatson
9f4f9a305c Export the name of the device used to mount the root file system as
kern.rootdev.  If rootdev is undefined (NFS mount, etc), export an
empty string.

Desired by:	peter
2003-02-22 05:01:12 +00:00
peter
4d4115d2f3 Missing M_TRYWAIT from so_upcall third argument. 2003-02-21 22:23:40 +00:00
phk
02e550fabb NO_GEOM cleanup:
Retire the "d_dump_t" and use the "dumper_t" type instead.

Dumper_t takes a void * as first arg which is more general than the
dev_t taken by d_dump_t.  (Remember: we could have net-dumpers if
somebody wrote us one!)

Define the convention for GEOM controlled disk devices to be that the
first argument to the dumper function is the struct disk pointer.

Change device drivers accordingly.
2003-02-21 19:00:48 +00:00
davidxu
1838912108 If UTS kernel is calling kse_wakeup for itself, do nothing. 2003-02-21 07:11:38 +00:00
phk
72688ad7fe Change the console interface to pass a "struct consdev *" instead of a
dev_t to the method functions.

The dev_t can still be found at struct consdev *->cn_dev.

Add a void *cn_arg element to struct consdev which the drivers can use
for retrieving their softc.
2003-02-20 20:54:45 +00:00
phk
fb70d313d7 Add a dead_cdevsw which does its best to return ENXIO if at all possible.
In devsw() return dead_cdevsw instead of NULL in case the dev_t does not
have a si_devsw.

This may improve our survival chances with devices which go away unexpectedly.
2003-02-20 15:35:54 +00:00
davidxu
b4106bbfef Forgot to set KU_DOUPCALL in kse_wakeup. 2003-02-20 08:22:04 +00:00
davidxu
d08eff5aaa Add a timeout parameter to kse_release. 2003-02-20 08:18:15 +00:00
bmilekic
26ba0eb55c o Allow "buckets" in mb_alloc to be differently sized (according to
compile-time constants).  That is, a "bucket" now is not necessarily
  a page-worth of mbufs or clusters, but it is MBUF_BUCK_SZ, CLUS_BUCK_SZ
  worth of mbufs, clusters.
o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce
  {mbuf,clust}_lowm, which currently has no effect but will be used
  to set the low watermarks.
o Fix netstat so that it can deal with the differently-sized buckets
  and teach it about the low watermarks too.
o Make sure the per-cpu stats for an absent CPU has mb_active set to 0,
  explicitly.
o Get rid of the allocate refcounts from mbuf map mess.  Instead,
  just malloc() the refcounts in one shot from mbuf_init()
o Clean up / update comments in subr_mbuf.c
2003-02-20 04:26:58 +00:00
tjr
569e9d1a86 Remove the PL_SHAREMOD flag from struct plimit, which could have been
used to share resource limits between rfork threads, but never was.
Removing it makes resource limit locking much simpler -- only the current
process can change the contents of the structure that p_limit points to.
2003-02-20 04:18:42 +00:00
cognet
8d83e0054a Remove duplicate includes.
Submitted by:	Cyril Nguyen-Huu <cyril@ci0.org>
2003-02-20 03:26:11 +00:00
bmilekic
78e7b5f0c6 Fix a serious bug when computing the index for the
reference counter array for mbuf clusters.  I don't know
how this got past early testing nor how it survived so long
without getting caught.  If anyone was seeing really really
bizarre memory corruption in a few mbufs this would be why.
2003-02-20 03:01:04 +00:00
davidxu
365e5f6c2a Move thread limits testing code up a bit. This let UPCALLING thread
takes possible accumulated contexts away.
2003-02-20 01:11:17 +00:00
phk
05d8d6e23f Add M_WAITOK 2003-02-19 22:51:33 +00:00
davidxu
e0c2153011 Count non-threaded group. 2003-02-19 13:40:24 +00:00
davidxu
0d18aacb0c Update comments to reflect new KSE code. 2003-02-19 13:36:51 +00:00
tjr
95727dd513 The "m = m->m_next" that was removed in the revision 1.12 was necessary
for the m->m_next != NULL case to avoid looping infinitely when the first
mbuf in the chain becomes full.
2003-02-19 10:12:42 +00:00
davidxu
dfa9741d3b M_WAITOK and remove an useless comment. 2003-02-19 09:59:12 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
davidxu
4fa19fe46a Optimize the case when max threads number was hit. 2003-02-19 04:01:55 +00:00
peter
c8ccde8063 Initiate de-orbit burn for USE_PCI_BIOS_FOR_READ_WRITE. This has been
#if'ed out for a while.  Complete the deed and tidy up some other bits.

We need to be able to call this stuff from outer edges of interrupt
handlers for devices that have the ISR bits in pci config space.  Making
the bios code mpsafe was just too hairy.  We had also stubbed it out some
time ago due to there simply being too much brokenness in too many systems.
This adds a leaf lock so that it is safe to use pci_read_config() and
pci_write_config() from interrupt handlers.  We still will use pcibios
to do interrupt routing if there is no acpi.. [yes, I tested this]

Briefly glanced at by:  imp
2003-02-18 03:36:49 +00:00
davidxu
6a2061cb8b Further fix PS_NEEDSIGCHK 2003-02-17 14:54:57 +00:00
davidxu
0330c19021 Move code for detecting PS_NEEDSIGCHK into thread_schedule_upcall,
I think it is a better place to handle it.
2003-02-17 14:41:22 +00:00
tjr
6ebeaa8ec8 Use the proc lock to protect p_realtimer instead of Giant, and obtain
sched_lock around accesses to p_stats->p_timer[] to avoid a potential
race with hardclock. getitimer(), setitimer() and the realitexpire()
callout are now Giant-free.
2003-02-17 10:03:02 +00:00
jeff
5c29a640b8 - Add a new function, thread_signal_add(), that is called from postsig to
add a signal to a mailbox's pending set.
 - Add a new function, thread_signal_upcall(), this causes the current thread
   to upcall so that we can deliver pending signals.

Reviewed by:	mini
2003-02-17 09:58:11 +00:00
julian
af55753a06 Move a bunch of flags from the KSE to the thread.
I was in two minds as to where to put them in the first case..
I should have listenned to the other mind.

Submitted by:	 parts by davidxu@
Reviewed by:	jeff@ mini@
2003-02-17 09:55:10 +00:00
jeff
590a39e29b - Split the struct kse into struct upcall and struct kse. struct kse will
soon be visible only to schedulers.  This greatly simplifies much the
   KSE code.

Submitted by:	davidxu
2003-02-17 05:14:26 +00:00
jeff
aa384c931f - Move ke_sticks, ke_iticks, ke_uticks, ke_uu, ke_su, and ke_iu back into
the proc.  These counters are only examined through calcru.

Submitted by:	davidxu
Tested on:	x86, alpha, UP/SMP
2003-02-17 02:19:58 +00:00
alfred
81da3fbe2a Fix logic in loop so it actually executes.
Pointed out by: fjoe
2003-02-16 16:12:10 +00:00
phk
4bfb37f22e Remove #include <sys/dkstat.h> 2003-02-16 14:13:23 +00:00
phk
811b1cae1c Move the tty related statistics counters to live with the tty code. 2003-02-16 13:22:15 +00:00
jeff
de87d496d3 - Introduce a new function bremfreel() that does a bremfree with the buf
queue lock already held.
 - In getblk() and flushbufqueues() use bremfreel() while we still have the
   buf queue lock held to keep the lists consistent.
 - Add LK_NOWAIT to two cases where we're essentially asserting that the bufs
   are not locked while acquiring the locks.  This will make sure that we get
   the appropriate panic() and not another one for sleeping with a lock held.
2003-02-16 10:43:06 +00:00
jeff
bdd1053074 - Add a WITNESS_SLEEP() for the appropriate cases in lockmgr(). 2003-02-16 10:39:49 +00:00
alfred
4e5b966b93 prevent overflow in shminfo.shmmax 2003-02-16 06:08:55 +00:00
hsu
762f64befa Remove extraneous FILEDESC_LOCK around atomic read. 2003-02-16 02:15:15 +00:00
arr
8e7322af29 - Update a couple of comments to make sense with what today's code is
doing (stale comments make arr something something ;)).
2003-02-15 23:25:12 +00:00
tegge
f360480a68 Avoid file lock leakage when linuxthreads port or rfork is used:
- Mark the process leader as having an advisory lock
  - Check if process leader is marked as having advisory lock when
    closing file
  - Check that file is still open after lock has been obtained
  - Don't allow file descriptor table sharing between processes
    with different leaders

PR:		10265
Reviewed by:	alfred
2003-02-15 22:43:05 +00:00
arr
e1e48e3d34 - Remove old comment for PURGE() as it no longer exists and implied it
was a comment to cache_zap().
- Add a comment to quickly state what cache_zap() does.

Reviewed by:	phk, mux
2003-02-15 18:58:06 +00:00
tjr
c831929bbb Acquire Giant around calls to kern_sigaction() in sigaction(),
freebsd4_sigaction() and osigaction() instead of around the whole
body of those functions. They now no longer hold Giant around calls
to copyin() and copyout(), and it is slightly more obvious what
Giant is protecting.
2003-02-15 09:56:09 +00:00
tjr
a000ef163a osigpending() no longer needs Giant, for the same reason sigpending()
does not.
2003-02-15 09:15:30 +00:00
tjr
f12b647e3e All uses of p_siglist are protected by the proc lock now, so there's
no need to acquire Giant in sigpending() anymore.
2003-02-15 08:42:02 +00:00
alfred
29fb7c2bce Do not allow kqueues to be passed via unix domain sockets. 2003-02-15 06:04:55 +00:00
alfred
d9a7e5d627 Fix LOR with PROC/filedesc. Introduce fdesc_mtx that will be used as a
barrier between free'ing filedesc structures.  Basically if you want to
access another process's filedesc, you want to hold this mutex over the
entire operation.
2003-02-15 05:52:56 +00:00
bmilekic
49e9cba72e Make m_getm() always return the top of the newly allocated chain, as
opposed to returning the top of the old chain when there was one and
the top of the newly allocated chain if there was no old chain.

Actually, it should be noted that prior to this fix, although the
comment above m_getm() advertised that m_getm() would return the
top of the old chain (if an old chain was being passed in) it
actually [wrongly] was returning the tail mbuf in the old chain
instead.  This is a bug but since the one use of m_getm() in
the tree luckily did not depend on the behavior, it happened
to work out without notice.

Harti Brandt pointed out that the advertised behavior was actually
not the real behavior and so this change makes m_getm() ALWAYS
return the newly allocated chain (and fixes the comment).  This
is less confusing and is the best course of action as then the
caller is always able to have both a reference to the top of
the original chain (because it's passing it in in the call) and
a reference to the newly attached chain.  Although the API is
slightly modified, I don't think that any third-party code uses
m_getm() and if it does, it surely can't be working properly
because the old behavior was bogus.

API bug pointed out by: Harti Brandt <brandt@fokus.fraunhofer.de>
2003-02-14 16:50:13 +00:00
des
c872f41146 Style nit. 2003-02-14 13:30:25 +00:00
alfred
f69f06f4e6 KASSERT format string does not need newline termination 2003-02-14 13:28:44 +00:00
alfred
766548f91e Add kasserts to catch bad API usage.
Submitted by: Hiten Pandya <hiten@unixdaemons.com>
2003-02-14 13:18:51 +00:00
alfred
268bc18ef2 Fix crash dumps on ata and scsi.
To fix scsi, don't wait for ithreads if we're dumping, it makes the
debugger sad.

To fix ata, use what appears to be a polling method if we're dumping,
I stole this from tmm but added code to ensure that this change is
only in effect while dumping.

Tested by: des
2003-02-14 13:10:40 +00:00
alfred
6c48c14c49 style. 2003-02-14 12:44:48 +00:00
alfred
952ec160dd Print a backtrace in case we tsleep from inside of DDB. 2003-02-14 12:44:07 +00:00
alc
d3cfca777c Use atomic ops to update amountpipekva. Amountpipekva represents the
total kernel virtual address space used by all pipes.  It is, thus, outside
the scope of any individual pipe lock.
2003-02-13 19:39:54 +00:00
des
0fb179fc08 It seems the extra precautions are no longer needed. 2003-02-13 10:05:20 +00:00
tjr
729d326fb2 Add an XXX comment noting that getrusage() accesses p_stats->p_ru
and p_stats->p_cru without holding the appropriate locks.
2003-02-13 09:53:59 +00:00
peter
30c571736e Add a 'debug.witness_trace' sysctl (and tunable) when DDB is present.
This causes LOR and could-sleep messages to come with a stack trace.
2003-02-13 01:35:56 +00:00
peter
e6756cd99a Print "Stack backtrace:" right before dumping the backtrace. We cannot
expect end users to automatically recognize a stack trace for what it is.
2003-02-13 01:33:59 +00:00
imp
003f2e27d6 Implement rman_get_device
# I though this was alredy implemented

Pointy hat on my head shown by: peter
2003-02-12 07:00:59 +00:00
alfred
2cadfd181f Don't lock FILEDESC under PROC.
The locking here needs to be revisited, but this ought to get rid of the
LOR messages that people are complaining about for now.  I imagine either
I or someone else interested with smp will eventually clear this up.
2003-02-11 07:20:52 +00:00
jeff
4d663017dc - Add a comment about a race that will happen without Giant. 2003-02-10 22:47:34 +00:00
jeff
2492221864 - Unlock the nblock after the loop in bwillwrite(). 2003-02-10 22:33:59 +00:00
jeff
1564002a4b - Enable STRICT_RESCHED until code that dynamically decides on resched
strictness based on the current workload is finished.
2003-02-10 14:11:23 +00:00
jeff
fbd09df3eb - Add a new variable 'kg_runtime' that tracks the amount of time we've run.
- Use the ratio of kg_runtime / kg_slptime to determine our dynamic priority.
 - Scale kg_runtime and kg_slptime back when the sum of the two exceeds
   SCHED_SLP_RUN_MAX.  This allows us to slowly forget old behavior.
 - Scale back the runtime and slptime in fork so that the new process has the
   same ratio but much less accumulated time.  This causes new behavior to be
   noticed more quickly.
2003-02-10 14:03:45 +00:00
tjr
ed08307335 Lock the proc around accessing p_siglist in ttycheckoutq() in the
unused wait != 0 case.
2003-02-10 06:06:46 +00:00
jeff
2de830f8f6 - In getnewbuf() unlock the bq lock prior to sleeping when we're out of
buffers.

Submitted by:	tegge
2003-02-10 06:02:51 +00:00
jake
d3a0473d61 Remove mtx_lock_giant from functions which are mp-safe. 2003-02-10 04:42:20 +00:00
jeff
6bab19f3ac - Correct another atomic op.
Spotted by:	alc
2003-02-09 22:39:51 +00:00
jeff
404c49d980 - Claim we're 'fsync' and not 'spec_fsync' in vop_stdfsync. 2003-02-09 12:29:38 +00:00
jeff
528cceebc4 - Move some code out from #ifdef INVARIANTS. 2003-02-09 12:11:37 +00:00
jeff
33fe54e557 - Update a printf format for b_flags. 2003-02-09 11:56:13 +00:00
jeff
87e306ad71 - Cleanup unlocked accesses to buf flags by introducing a new b_vflag member
that is protected by the vnode lock.
 - Move B_SCANNED into b_vflags and call it BV_SCANNED.
 - Create a vop_stdfsync() modeled after spec's sync.
 - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some
   fs specific processing.  This gives all of these filesystems proper
   behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag.
 - Annotate the locking in buf.h
2003-02-09 11:28:35 +00:00
jeff
75e9ed76e4 - spell add 'add' and not 'subtract' in an atomic op.
Spotted by:	alc
Pointy hat to:	jeff
2003-02-09 11:21:40 +00:00
jeff
734283166f - Lock down the buffer cache's infrastructure code. This includes locks on
buf lists, synchronization variables, and atomic ops for the counters.
   This change does not remove giant from any code although some pushdown
   may be possible.
 - In vfs_bio_awrite() don't access buf fields without the buf lock.
2003-02-09 09:47:31 +00:00
julian
cf07da2f1a A little infrastructure, preceding some upcoming changes
to the profiling and statistics code.

Submitted by:	DavidXu@
Reviewed by:	peter@
2003-02-08 02:58:16 +00:00
hsu
62a61b18df Remove vestiges of no longer needed unp_rvnode field.
Approved by:	phk (who originally added it in rev 1.8 of unpcb.h)
2003-02-06 01:34:43 +00:00
julian
cae0aa62ce The lockmanager has to keep track of locks per thread, not per process.
Submitted by:	david Xu (davidxu@)
Reviewed by:	jhb@
2003-02-05 19:36:58 +00:00
des
79e32cc4b8 Correct grammatical error in previous commit. 2003-02-04 18:47:17 +00:00
des
3038c34616 Extra precautions before trying to start init(8). 2003-02-04 18:16:50 +00:00
phk
f313c57d47 Implement proper bounds-checking and truncation of device names, this has
become an issue now that end-user controlable attributes can become devices
names with the geom_vol_ffs class.
2003-02-04 11:04:26 +00:00
phk
9d6d3f1673 Pave the road to removing the fixed size limit on device nodes:
Change the si_name of dev_t's to be a char * and put a private buffer for
holding the name at then end of the struct.

Initialize si_name to point to the private buffer.

Put a KASSERT in geom_disk to prevent overrun on the fake dev_t we still
have to generate for the disk_drivers.
2003-02-04 10:32:40 +00:00
phk
1abb95308b Add vsnrprintf() which is just like vsnprintf() but takes a "radix"
argument for the kernel-special %r format.
2003-02-04 10:00:34 +00:00
phk
3692879cc8 Split the global timezone structure into two integer fields to
prevent the compiler from optimizing assignments into byte-copy
operations which might make access to the individual fields non-atomic.

Use the individual fields throughout, and don't bother locking them with
Giant: it is no longer needed.

Inspired by:    tjr
2003-02-03 19:49:35 +00:00