Commit Graph

6044 Commits

Author SHA1 Message Date
Jonathan Lemon
1cafed3941 Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point.  Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs
2003-03-04 23:19:55 +00:00
John Baldwin
c141c242ac Bah, fix a bogon in the last commit: get the sense of a compare test right
so that we allow a sleepable lock to be acquired with Giant held rather
than allowing a sleepable lock to be acquired with anything but Giant held.
2003-03-04 22:34:07 +00:00
Jeff Roberson
24deed1aaa - Hold the buf lock while manipulating and inspecting its fields.
- Use gbincore() and not incore() so that we can drop the vnode interlock
   as we acquire the buflock.
 - Use GB_LOCK_NOWAIT when getting bufs for read ahead clusters so that we
   don't block on locked bufs.
 - Convert a while loop to a howmany() that will most likely be faster on
   modern processors.  There is another while loop divide that was left
   near by because it is operating on a 64bit int and is most likely faster.
 - Cleanup the cluster_read() code a little to get rid of a goto and make
   the logic clearer.

Tested on:	x86, alpha
Tested by:	Steve Kargl <sgk@troutmask.apl.washington.edu>
Reviewd by:	arch
2003-03-04 21:35:28 +00:00
John Baldwin
1106937d99 Remove safety belt: it is now ok to do a mtx_trylock() on a mutex you
already own.  The mtx_trylock() will fail however.  Enhance the comment
at the top of the try lock function to explain this.

Requested by:	jlemon and his evil netisr locking
2003-03-04 21:32:25 +00:00
John Baldwin
263067951a Replace calls to WITNESS_SLEEP() and witness_list() with equivalent calls
to WITNESS_WARN().
2003-03-04 21:03:05 +00:00
John Baldwin
9b4982bfed Add a WITNESS_WARN() call to verify that we hold no locks after running
a handler from an interrupt thread.
2003-03-04 21:01:42 +00:00
John Baldwin
35580ede37 A small overhaul of witness:
- Add a comment about special lock order rules and Giant near the top of
  subr_witness.c.  Specifically, this documents and explains the real lock
  order relationship between Giant and sleepable locks (i.e. lockmgr locks
  and sx locks).  Basically, Giant can be safely acquired either before or
  after sleepable locks and the case of Giant before a sleepable lock is
  exempted as a special case.
- Add a new static function 'witness_list_lock()' that displays a single
  line of information about a struct lock_instance.  This is used to
  make the output of witness messages more consistent and reduce some code
  duplication.
- Fixup a few comments in witness_lock().
- Properly handle the Giant-before-sleepable-lock lock order exception in
  a more general fashion and remove the no longer needed LI_SLEPT flag.
- Break up the last condition before assuming a reversal a bit to try
  and make the logic less confusing in witness_lock().
- Axe WITNESS_SLEEP() now that LI_SLEPT is no longer needed and replace it
  with a more general WITNESS_WARN() macro/function combination.
  WITNESS_WARN() allows you to output a customized message out to the
  console along with a list of held locks.  It will optionally drop into
  the debugger as well.  You can exempt a single lock from the check by
  passing it in as the second argument.  You can also use flags to specify
  if Giant should be exempt from the check, if all sleepable locks should
  be exempt from the check, and if witness should panic if any non-exempt
  locks are found.
- Make the witness_list() function static.  Other areas of the kernel
  should use the new WITNESS_WARN() instead.
2003-03-04 20:56:39 +00:00
John Baldwin
5fa8dd90f9 Miscellaneous cleanups to _mtx_lock_sleep():
- Declare some local variables at the top of the function instead of in a
  nested block.
- Use mtx_owned() instead of masking off bits from mtx_lock manually.
- Read the value of mtx_lock into 'v' as a separate line rather than inside
  an if statement for clarity.  This code is hairy enough as it is.
2003-03-04 20:32:41 +00:00
John Baldwin
6b869595c5 Properly assert that mtx_trylock() is not called on a mutex we already
owned.  Previously the KASSERT would only trigger if we successfully
acquired a lock that we already held.  However, _obtain_lock() fails to
acquire locks that we already hold, so the KASSERT was never checked in
the case it was supposed to fail.
2003-03-04 20:30:30 +00:00
Jeff Roberson
e1f89c222b - Create a function sched_interact_score() which decides on the
interactivity of a kseg and assigns it a value of 0 through 100.
 - Use sched_interact_score() to determine the dynamic priority.
 - Define SCHED_CURR() in terms of sched_interact_score().
 - Adjust the maximum slice back down to 100ms.
 - Remove redundant clearing of ke_runq in sched_wakeup()
 - Clean up #defines and comment them.
2003-03-04 02:45:59 +00:00
Jeff Roberson
7261f5f68e - Add a new 'flags' parameter to getblk().
- Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT
   flag to the initial BUF_LOCK().  This will eventually be used in cases
   were we want to use a buffer only if it is not currently in use.
 - Convert all consumers of the getblk() api to use this extra parameter.

Reviwed by:	arch
Not objected to by:	mckusick
2003-03-04 00:04:44 +00:00
Jeff Roberson
f727171140 - Correct the wchan in vop_stdfsync()
This is almost what bde asked for.  There is some desire to have per fs wchans
still but that is difficult giving the current arrangement of the code.
2003-03-03 23:37:50 +00:00
Ruslan Ermilov
6de61153e8 FreeBSD 5.0 has stopped shipping /modules 2.5 years ago. Catch
up with this further by excluding /modules from the (default)
kern.module_path.
2003-03-03 22:53:35 +00:00
Nate Lawson
7dc9111650 Pick up one file missed in the previous vprint() cleanup 2003-03-03 19:50:36 +00:00
Nate Lawson
99648386d3 Finish cleanup of vprint() which was begun with changing v_tag to a string.
Remove extraneous uses of vop_null, instead defering to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.
2003-03-03 19:15:40 +00:00
Poul-Henning Kamp
182a9f7455 Make nokqfilter() return the correct return value.
Ditch the D_KQFILTER flag which was used to prevent calling NULL pointers.
2003-03-03 16:24:47 +00:00
Poul-Henning Kamp
7ac40f5f59 Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by:    re(scottl)
2003-03-03 12:15:54 +00:00
Poul-Henning Kamp
a9463ba804 Don't pick up a name from the dev_t if it is not there. 2003-03-03 11:14:36 +00:00
Jeff Roberson
65c8760dbf - Shift the tick count by 10 and back around sched_pctcpu_update()
calculations.  Keep this changes local to the function so the tick count
   is in its natural form otherwise.  Previously 1000 was added each time
   a tick fired and we divided by 1000 when it was reported.  This is done
   to reduce rounding errors.
2003-03-03 05:29:09 +00:00
Jeff Roberson
a6ed41865b - In sched_add() special case PRI_TIMESHARE and PRI_ITHD|PRI_REALTIME. We
always place ITHD & REALTIME threads on the current queue of the current
   cpu.  Prior to this change an interrupt thread would only ever run on one
   cpu.
2003-03-03 04:28:07 +00:00
Jeff Roberson
f1e8dc4a3b - Refrain from setting the td_priority in sched_wakeup(). It will be reset
before we return to user space.
2003-03-03 04:11:40 +00:00
Poul-Henning Kamp
f16304aaf0 Explicitly initialize all cdevsw methods with the relevant nofoo() function
if they are NULL.
2003-03-02 19:46:45 +00:00
Dag-Erling Smørgrav
521f364b80 More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9). 2003-03-02 16:54:40 +00:00
Dag-Erling Smørgrav
8994a245e0 Clean up whitespace, s/register //, refrain from strong urge to ANSIfy. 2003-03-02 15:56:49 +00:00
Dag-Erling Smørgrav
c952458814 uiomove-related caddr_t -> void * (just the low-hanging fruit) 2003-03-02 15:50:23 +00:00
Dag-Erling Smørgrav
d5279f20c5 Convert one of our main caddr_t consumers, uiomove(9), to void *. 2003-03-02 15:29:13 +00:00
Dag-Erling Smørgrav
34ca14c687 Clean up whitespace, unregisterize, ANSIfy, remove prototypes made
superfluous by ANSIfication.
2003-03-02 15:08:33 +00:00
Poul-Henning Kamp
9c486c30e2 NO_GEOM cleanup:
Remove cdevsw->d_size() implementation.  No longer needed.
2003-03-02 14:43:46 +00:00
Poul-Henning Kamp
9285a87efd NODEVFS cleanup:
Replace devfs_{create,destroy} hooks with direct function calls.
2003-03-02 13:35:30 +00:00
Jeff Roberson
491081fabf - Hold the vnode interlock across calls to bgetvp instead of acquiring it
internally.  This is required to stop multiple bufs from being associated
   with a single lblkno.
2003-03-02 06:05:23 +00:00
Tor Egge
c6faf3bf1d Remove unneeded code added in revision 1.188. 2003-03-01 17:18:28 +00:00
Jeff Roberson
bff5362bf2 - gc USE_BUFHASH. The smp locking of the buf cache renders this useless. 2003-03-01 05:55:03 +00:00
David Xu
9948c47f0e Check kse group limit before linking new ksegrp. 2003-02-28 15:57:33 +00:00
Poul-Henning Kamp
85f19dccdb Add the flip-side check: If a driver wants a particular major#, make
sure it is marked as allocated in reserved_majors[].  Whine if it wasn't.
2003-02-27 15:17:37 +00:00
Maxime Henrion
bca0668a92 We can now properly return ENODEV in nommap(), so do it.
Remove the now wrong comment which says we can't.
2003-02-27 14:48:53 +00:00
Poul-Henning Kamp
beea48b254 Add support for allocating a device driver major number on demand.
To do this, initialize the d_maj member of the cdevsw to MAJOR_AUTO.
When the cdevsw is first passed to make_dev() a free major number
will be assigned.

Until we have a bit more experience with this a printf will announce
this fact.

Major numbers are not reclaimed, so loading/unloading the same
device driver which uses MAJOR_AUTO will eventually deplete the
pool of free major numbers and the system will panic when it can
not allocate one.  Still undecided who to invonvenience with the
solution to this.
2003-02-27 14:46:51 +00:00
Hartmut Brandt
b89bc9e62b When a process has been waiting on a condition variable or mutex the
td_wmesg field in the thread structure points to the description string of
the condition variable or mutex. If the condvar or the mutex had been
initialized from a loadable module that was unloaded in the meantime,
td_wmesg may now point to invalid memory. Retrieving the process table now
may panic the kernel (or access junk). Setting the td_wmesg field to NULL
after unblocking on the condvar/mutex prevents this panic.

PR:		kern/47408
Approved by:	jake (mentor)
2003-02-27 08:43:27 +00:00
Poul-Henning Kamp
f477b4fd53 NODEVFS cleanup:
Remove cdevsw_add() and cdevsw_remove(), they served us well for a long time.
Bump __FreeBSD_version to 500104 to mark this.
2003-02-27 07:40:44 +00:00
David Xu
3b3df40fc4 Release sched_lock before calling upcall_free. 2003-02-27 05:42:01 +00:00
Julian Elischer
ac2e415327 Change the process flags P_KSES to be P_THREADED.
This is just a cosmetic change but I've been meaning to do it for about a year.
2003-02-27 02:05:19 +00:00
Sam Leffler
893bec8059 o fix ppsratecheck to interpret a maxpps of zero as "ignore everything"
o add a comment explaining the significance of using 0 or -1 (actually
  any negative value) for maxpps
2003-02-26 17:16:38 +00:00
David Xu
426269b2c2 Fix a bug when handling SIGCONT.
Reported By: Mike Makonnen <mtm@identd.net>
2003-02-26 12:47:46 +00:00
Scott Long
7874f606d5 Introduce a new taskqueue that runs completely free of Giant, and in
turns runs its tasks free of Giant too.  It is intended that as drivers
become locked down, they will move out of the old, Giant-bound taskqueue
and into this new one.  The old taskqueue has been renamed to
taskqueue_swi_giant, and the new one keeps the name taskqueue_swi.
2003-02-26 03:15:42 +00:00
David Xu
5614648e5e Add a missing '!'. 2003-02-26 01:56:14 +00:00
David Xu
4b4866ed42 Add a simple facility to allow round roubin in userland.
Reviewed by:	julain
2003-02-26 00:58:23 +00:00
Kirk McKusick
7e734c4149 When doing cleanup of excessive buffers in bdwrite (see kern/vfs_bio.c
delta 1.371) we must ensure that we do not get ourselves into a
recursive trap endlessly trying to clean up after ourselves.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 23:59:09 +00:00
Mike Makonnen
0bd5f7979d Unbreak mutex profiling (at least for me).
o Always check for null when dereferencing the filename component.
	o Implement a try-and-backoff method for allocating memory to
	  dump stats to avoid a spin-lock -> sleep-lock mutex lock order
	  panic with WITNESS.

Approved by:	des, markm (mentor)
Not objected:	jhb
2003-02-25 22:28:46 +00:00
Jeff Roberson
2e3981a70c - Add the missing NULL interlock argument to a recently added BUF_LOCK. 2003-02-25 08:23:11 +00:00
Kirk McKusick
3a7053cb60 Prevent large files from monopolizing the system buffers. Keep
track of the number of dirty buffers held by a vnode. When a
bdwrite is done on a buffer, check the existing number of dirty
buffers associated with its vnode. If the number rises above
vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one
of the other (hopefully older) dirty buffers associated with
the vnode is written (using bawrite). In the event that this
approach fails to curb the growth in it the vnode's number of
dirty buffers (due to soft updates rollback dependencies),
the more drastic approach of doing a VOP_FSYNC on the vnode
is used. This code primarily affects very large and actively
written files such as snapshots. This change should eliminate
hanging when taking snapshots or doing background fsck on
very large filesystems.

Hopefully, one day it will be possible to cache filesystem
metadata in the VM cache as is done with file data. As it
stands, only the buffer cache can be used which limits total
metadata storage to about 20Mb no matter how much memory is
available on the system. This rather small memory gets badly
thrashed causing a lot of extra I/O. For example, taking a
snapshot of a 1Tb filesystem minimally requires about 35,000
write operations, but because of the cache thrashing (we only
have about 350 buffers at our disposal) ends up doing about
237,540 I/O's thus taking twenty-five minutes instead of four
if it could run entirely in the cache.

Reported by:	Attila Nagy <bra@fsn.hu>
Sponsored by:   DARPA & NAI Labs.
2003-02-25 06:44:42 +00:00
David Xu
d4b570f053 Remove a bogus comment. 2003-02-25 05:17:18 +00:00