Commit Graph

6011 Commits

Author SHA1 Message Date
harti
e30134bc39 When a process has been waiting on a condition variable or mutex the
td_wmesg field in the thread structure points to the description string of
the condition variable or mutex. If the condvar or the mutex had been
initialized from a loadable module that was unloaded in the meantime,
td_wmesg may now point to invalid memory. Retrieving the process table now
may panic the kernel (or access junk). Setting the td_wmesg field to NULL
after unblocking on the condvar/mutex prevents this panic.

PR:		kern/47408
Approved by:	jake (mentor)
2003-02-27 08:43:27 +00:00
phk
f0a6146e92 NODEVFS cleanup:
Remove cdevsw_add() and cdevsw_remove(), they served us well for a long time.
Bump __FreeBSD_version to 500104 to mark this.
2003-02-27 07:40:44 +00:00
davidxu
b821d0ec30 Release sched_lock before calling upcall_free. 2003-02-27 05:42:01 +00:00
julian
3fc9836d46 Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.
2003-02-27 02:05:19 +00:00
sam
ebbf7468f1 o fix ppsratecheck to interpret a maxpps of zero as "ignore everything"
o add a comment explaining the significance of using 0 or -1 (actually
  any negative value) for maxpps
2003-02-26 17:16:38 +00:00
davidxu
53f79e941f Fix a bug when handling SIGCONT.
Reported By: Mike Makonnen <mtm@identd.net>
2003-02-26 12:47:46 +00:00
scottl
9317dd9841 Introduce a new taskqueue that runs completely free of Giant, and in
turns runs its tasks free of Giant too.  It is intended that as drivers
become locked down, they will move out of the old, Giant-bound taskqueue
and into this new one.  The old taskqueue has been renamed to
taskqueue_swi_giant, and the new one keeps the name taskqueue_swi.
2003-02-26 03:15:42 +00:00
davidxu
193655459b Add a missing '!'. 2003-02-26 01:56:14 +00:00
davidxu
3766220237 Add a simple facility to allow round roubin in userland.
Reviewed by:	julain
2003-02-26 00:58:23 +00:00
mckusick
0309ffd2e7 When doing cleanup of excessive buffers in bdwrite (see kern/vfs_bio.c
delta 1.371) we must ensure that we do not get ourselves into a
recursive trap endlessly trying to clean up after ourselves.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 23:59:09 +00:00
mtm
3a9a3e5e4d Unbreak mutex profiling (at least for me).
o Always check for null when dereferencing the filename component.
	o Implement a try-and-backoff method for allocating memory to
	  dump stats to avoid a spin-lock -> sleep-lock mutex lock order
	  panic with WITNESS.

Approved by:	des, markm (mentor)
Not objected:	jhb
2003-02-25 22:28:46 +00:00
jeff
1228dbd648 - Add the missing NULL interlock argument to a recently added BUF_LOCK. 2003-02-25 08:23:11 +00:00
mckusick
6e9f6f2d6d Prevent large files from monopolizing the system buffers. Keep
track of the number of dirty buffers held by a vnode. When a
bdwrite is done on a buffer, check the existing number of dirty
buffers associated with its vnode. If the number rises above
vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one
of the other (hopefully older) dirty buffers associated with
the vnode is written (using bawrite). In the event that this
approach fails to curb the growth in it the vnode's number of
dirty buffers (due to soft updates rollback dependencies),
the more drastic approach of doing a VOP_FSYNC on the vnode
is used. This code primarily affects very large and actively
written files such as snapshots. This change should eliminate
hanging when taking snapshots or doing background fsck on
very large filesystems.

Hopefully, one day it will be possible to cache filesystem
metadata in the VM cache as is done with file data. As it
stands, only the buffer cache can be used which limits total
metadata storage to about 20Mb no matter how much memory is
available on the system. This rather small memory gets badly
thrashed causing a lot of extra I/O. For example, taking a
snapshot of a 1Tb filesystem minimally requires about 35,000
write operations, but because of the cache thrashing (we only
have about 350 buffers at our disposal) ends up doing about
237,540 I/O's thus taking twenty-five minutes instead of four
if it could run entirely in the cache.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 06:44:42 +00:00
davidxu
5bb30740ab Remove a bogus comment. 2003-02-25 05:17:18 +00:00
davidxu
ad34180f0e Remove a never true condition. 2003-02-25 05:14:18 +00:00
jeff
9e4c9a6ce9 - Add an interlock argument to BUF_LOCK and BUF_TIMELOCK.
- Remove the buftimelock mutex and acquire the buf's interlock to protect
   these fields instead.
 - Hold the vnode interlock while locking bufs on the clean/dirty queues.
   This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
   BUF_LOCK with a LK_TIMEFAIL to a single lock.

Reviewed by:	arch, mckusick
2003-02-25 03:37:48 +00:00
mux
541937cf73 Cleanup of the d_mmap_t interface.
- Get rid of the useless atop() / pmap_phys_address() detour.  The
  device mmap handlers must now give back the physical address
  without atop()'ing it.
- Don't borrow the physical address of the mapping in the returned
  int.  Now we properly pass a vm_offset_t * and expect it to be
  filled by the mmap handler when the mapping was successful.  The
  mmap handler must now return 0 when successful, any other value
  is considered as an error.  Previously, returning -1 was the only
  way to fail.  This change thus accidentally fixes some devices
  which were bogusly returning errno constants which would have been
  considered as addresses by the device pager.
- Garbage collect the poorly named pmap_phys_address() now that it's
  no longer used.
- Convert all the d_mmap_t consumers to the new API.

I'm still not sure wheter we need a __FreeBSD_version bump for this,
since and we didn't guarantee API/ABI stability until 5.1-RELEASE.

Discussed with:		alc, phk, jake
Reviewed by:		peter
Compile-tested on:	LINT (i386), GENERIC (alpha and sparc64)
Runtime-tested on:	i386
2003-02-25 03:21:22 +00:00
scottl
d3806508a8 Don't NULL out p_fd until after closefd() has been called. This isn't
totally correct, but it has caused breakage for too long.  I welcome
someone with more fd fu to fix it correctly.
2003-02-24 05:46:55 +00:00
davidxu
075f8dfffd Remove a XXXKSE. kg_completed now needs proc lock. 2003-02-24 01:28:10 +00:00
davidxu
6e6d69e258 Backout last surplus commit. That day just wasn't my day. 2003-02-24 00:49:55 +00:00
tegge
2072a48a93 Sync new socket nonblocking/async state with file flags in accept().
PR:		1775
Reviewed by:	mbr
2003-02-23 23:00:28 +00:00
phk
af9c7adfc3 Bracket the kern.vnode sysctl in #ifdef notyet because it results
in massive locking issues on diskless systems.

It is also not clear that this sysctl is non-dangerous in its
requirements for locked down memory on large RAM systems.
2003-02-23 18:09:05 +00:00
phk
699d720e80 OK, I was too sleepy there...
Pointy hat over here!
2003-02-23 13:45:55 +00:00
phk
4ddaec005d Implement CLOCK_MONOTONIC. 2003-02-23 10:18:31 +00:00
jake
696c85b9bd Add a /a modifier to the show ktr ddb command, which prints the whole trace
buffer without stopping.  Useful if you just want to capture the output but
can't run ktrdump.
2003-02-22 23:30:37 +00:00
rwatson
2383991687 Don't panic when enumerating SYSCTL_NODE() nodes without any children
nodes.

Submitted by:	green, Hiten Pandya <hiten@unixdaemons.com>
2003-02-22 17:58:06 +00:00
mtm
8ccfd800bd Remove a comment which hasn't been true since rev. 1.158
Approved by:	jhb, markm (mentor)(implicit)
2003-02-22 05:59:48 +00:00
rwatson
9f4f9a305c Export the name of the device used to mount the root file system as
kern.rootdev.  If rootdev is undefined (NFS mount, etc), export an
empty string.

Desired by:	peter
2003-02-22 05:01:12 +00:00
peter
4d4115d2f3 Missing M_TRYWAIT from so_upcall third argument. 2003-02-21 22:23:40 +00:00
phk
02e550fabb NO_GEOM cleanup:
Retire the "d_dump_t" and use the "dumper_t" type instead.

Dumper_t takes a void * as first arg which is more general than the
dev_t taken by d_dump_t.  (Remember: we could have net-dumpers if
somebody wrote us one!)

Define the convention for GEOM controlled disk devices to be that the
first argument to the dumper function is the struct disk pointer.

Change device drivers accordingly.
2003-02-21 19:00:48 +00:00
davidxu
1838912108 If UTS kernel is calling kse_wakeup for itself, do nothing. 2003-02-21 07:11:38 +00:00
phk
72688ad7fe Change the console interface to pass a "struct consdev *" instead of a
dev_t to the method functions.

The dev_t can still be found at struct consdev *->cn_dev.

Add a void *cn_arg element to struct consdev which the drivers can use
for retrieving their softc.
2003-02-20 20:54:45 +00:00
phk
fb70d313d7 Add a dead_cdevsw which does its best to return ENXIO if at all possible.
In devsw() return dead_cdevsw instead of NULL in case the dev_t does not
have a si_devsw.

This may improve our survival chances with devices which go away unexpectedly.
2003-02-20 15:35:54 +00:00
davidxu
b4106bbfef Forgot to set KU_DOUPCALL in kse_wakeup. 2003-02-20 08:22:04 +00:00
davidxu
d08eff5aaa Add a timeout parameter to kse_release. 2003-02-20 08:18:15 +00:00
bmilekic
26ba0eb55c o Allow "buckets" in mb_alloc to be differently sized (according to
compile-time constants).  That is, a "bucket" now is not necessarily
  a page-worth of mbufs or clusters, but it is MBUF_BUCK_SZ, CLUS_BUCK_SZ
  worth of mbufs, clusters.
o Rename {mbuf,clust}_limit to {mbuf,clust}_hiwm and introduce
  {mbuf,clust}_lowm, which currently has no effect but will be used
  to set the low watermarks.
o Fix netstat so that it can deal with the differently-sized buckets
  and teach it about the low watermarks too.
o Make sure the per-cpu stats for an absent CPU has mb_active set to 0,
  explicitly.
o Get rid of the allocate refcounts from mbuf map mess.  Instead,
  just malloc() the refcounts in one shot from mbuf_init()
o Clean up / update comments in subr_mbuf.c
2003-02-20 04:26:58 +00:00
tjr
569e9d1a86 Remove the PL_SHAREMOD flag from struct plimit, which could have been
used to share resource limits between rfork threads, but never was.
Removing it makes resource limit locking much simpler -- only the current
process can change the contents of the structure that p_limit points to.
2003-02-20 04:18:42 +00:00
cognet
8d83e0054a Remove duplicate includes.
Submitted by:	Cyril Nguyen-Huu <cyril@ci0.org>
2003-02-20 03:26:11 +00:00
bmilekic
78e7b5f0c6 Fix a serious bug when computing the index for the
reference counter array for mbuf clusters.  I don't know
how this got past early testing nor how it survived so long
without getting caught.  If anyone was seeing really really
bizarre memory corruption in a few mbufs this would be why.
2003-02-20 03:01:04 +00:00
davidxu
365e5f6c2a Move thread limits testing code up a bit. This let UPCALLING thread
takes possible accumulated contexts away.
2003-02-20 01:11:17 +00:00
phk
05d8d6e23f Add M_WAITOK 2003-02-19 22:51:33 +00:00
davidxu
e0c2153011 Count non-threaded group. 2003-02-19 13:40:24 +00:00
davidxu
0d18aacb0c Update comments to reflect new KSE code. 2003-02-19 13:36:51 +00:00
tjr
95727dd513 The "m = m->m_next" that was removed in the revision 1.12 was necessary
for the m->m_next != NULL case to avoid looping infinitely when the first
mbuf in the chain becomes full.
2003-02-19 10:12:42 +00:00
davidxu
dfa9741d3b M_WAITOK and remove an useless comment. 2003-02-19 09:59:12 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
davidxu
4fa19fe46a Optimize the case when max threads number was hit. 2003-02-19 04:01:55 +00:00
peter
c8ccde8063 Initiate de-orbit burn for USE_PCI_BIOS_FOR_READ_WRITE. This has been
#if'ed out for a while.  Complete the deed and tidy up some other bits.

We need to be able to call this stuff from outer edges of interrupt
handlers for devices that have the ISR bits in pci config space.  Making
the bios code mpsafe was just too hairy.  We had also stubbed it out some
time ago due to there simply being too much brokenness in too many systems.
This adds a leaf lock so that it is safe to use pci_read_config() and
pci_write_config() from interrupt handlers.  We still will use pcibios
to do interrupt routing if there is no acpi.. [yes, I tested this]

Briefly glanced at by:  imp
2003-02-18 03:36:49 +00:00
davidxu
6a2061cb8b Further fix PS_NEEDSIGCHK 2003-02-17 14:54:57 +00:00
davidxu
0330c19021 Move code for detecting PS_NEEDSIGCHK into thread_schedule_upcall,
I think it is a better place to handle it.
2003-02-17 14:41:22 +00:00