Commit Graph

6125 Commits

Author SHA1 Message Date
Jake Burkholder
227f9a1c58 - Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
  with PAE.
- Use this to represent physical addresses in the MI vm system and in the
  i386 pmap code.  This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
  detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by:	DARPA, Network Associates Laboratories
Discussed with:	re, phk (cdevsw change)
2003-03-25 00:07:06 +00:00
John Baldwin
75b8b3b25c Replace the at_fork, at_exec, and at_exit functions with the slightly more
flexible process_fork, process_exec, and process_exit eventhandlers.  This
reduces code duplication and also means that I don't have to go duplicate
the eventhandler locking three more times for each of at_fork, at_exec, and
at_exit.

Reviewed by:	phk, jake, almost complete silence on arch@
2003-03-24 21:15:35 +00:00
John Baldwin
959d22329a - Remove witness_dead and just use witness_watch instead. If witness_watch
is set to 0, it now has the same affect as setting witness_dead used to
  have.
- Added a sysctl handler that allows root to change witness_watch from a
  non-zero value to zero to disable witness at runtime.  Note that you
  can't turn witness back on once it is off.  You can only turn it off as
  a one-way switch.
- Added a comment describing the possible values of witness_watch.
2003-03-24 21:03:53 +00:00
Maxime Henrion
4974b53e31 Remove a trailing semicolon in SCHED_QUANTUM definition.
Luckily this didn't cause any bugs.

Spotted by:	Samy Al Bahra <samy@kerneled.com>
2003-03-24 15:16:21 +00:00
Olivier Houchard
e2f9a08bb0 s/discriptors/descriptors/ 2003-03-23 19:41:34 +00:00
Tim J. Robbins
f949f795aa Remove unused mtx_lock_giant(), mtx_unlock_giant(), related globals
and sysctls.
2003-03-23 11:26:11 +00:00
Yaroslav Tykhiy
17ce5b94d6 We shouldn't assert that a vode is locked in vop_lock_post()
if VOP_LOCK() has failed.

Reviewed by:	jeff
2003-03-22 13:21:54 +00:00
John Baldwin
b254666064 Use td_ucred of curthread instead of p_ucred of curproc. This required
changing sem_perm() and sem_hasopen() to take a thread instead of a proc
for the first argument.
2003-03-20 21:12:31 +00:00
Poul-Henning Kamp
cc34e37e5b Backout the getcwd changes, a more comprehensive effort will be needed. 2003-03-20 10:40:45 +00:00
David Xu
6ce75196ce Adjust code for userland preemptive. Userland can set a quantum in
kse_mailbox to schedule an upcall, this is useful for userland timeout
routine, for example pthread_cond_timedwait().

Also extract upcall scheduling code from kse_reassign and create
a new function called thread_switchout to include these code.

Reviewed by: julain
2003-03-19 05:49:38 +00:00
Dag-Erling Smørgrav
830c3153c6 Unregisterize, ansify. 2003-03-19 00:49:40 +00:00
Dag-Erling Smørgrav
4e8074eba2 Whitespace cleanup. 2003-03-19 00:33:38 +00:00
Jake Burkholder
84513ca201 long != int. Use SYSCTL_UINT for kern.devstat.generation. Fixes booting
on sparc64.
2003-03-18 23:32:27 +00:00
Andrew Gallatin
daa949b66e Fix a race condition in socow_setup(): The page must be wired before
sf_buf_alloc() is called, as sf_buf_alloc() may sleep.  If it does sleep,
the page might be reclaimed before wiring occurs.

Reported by: alc
2003-03-18 18:27:33 +00:00
Poul-Henning Kamp
c967bee755 If devstat_new_entry() is passed a unit number of -1 assume that
the devstat is for an "interior" GEOM node and register using the
name argument as a geom identity pointer.  Do not put these devstat
structures on the list returned by the sysctl.

This gives us the ability to tell the two kinds of nodes apart and
leave the current "strictly physical" view of devstat intact without
modifications, yet be able to use devstat for both kinds of devices.

It also saves us bloating struct devstat with another 48 bytes of
space for the name.  At least for now.

Reviewed by:    ken
2003-03-18 09:30:31 +00:00
Poul-Henning Kamp
224d5539a9 Make devstat fully Giant agnostic:
Add a mutex and protect the allocation and traversal of the list with it.

When we allocate a page for devstat use we drop the mutex and use
M_WAITOK this is not nice, but under the given circumstances the
best we can do.

In the sysctl handler for returning the devstat entries we do not want to
hold the mutex across copyout(9) calls, so we keep a very careful eye on
the devstat_generation count, and abandon with EBUSY if it changes under
our feet.

Specifically test for BIO_WRITE, rather than default non-read,non-deletes
as write.  Make the default be DEVSTAT_NO_DATA.

Add atomic increments of the sequence[01] fields so applications using the
mmap'ed view stand a chance of detecting updates in progress.

Reviewed by:    ken
2003-03-18 09:20:20 +00:00
Poul-Henning Kamp
b4b138c27f Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
Poul-Henning Kamp
538aabaad9 Make devstat_new_entry() take a const void * rather than const char *
argument, GEOM nodes are not identified by ascii string.
2003-03-18 07:52:59 +00:00
Jeff Roberson
5d952c1b59 - Unlock the target bp and not the pager buf bp in a failure case in
cluster_wbuild().  This was causing strange panics that were widely
   reported on current@.

Big Pointy Hat to:	jeff
2003-03-17 18:38:49 +00:00
Poul-Henning Kamp
9eaf5abceb (This commit certainly increases the need for a wash&clean of vfs_cache.c,
but I decided that it was important for this patch to not bit-rot, and
since it is mainly moving code around, the total amount of entropy is
epsilon /phk)

This is a patch to move the common parts of linux_getcwd() back into
kern/vfs_cache.c so that the standard FreeBSD libc getcwd() can use it's
extended functionality.  The linux syscall linux_getcwd() in
compat/linux/linux_getcwd.c has been rewritten to use it too.  It should
be possible to simplify libc's getcwd() after this.  No doubt this code
needs some cleaning up, since I've left in the sysctl variables I used
for debugging.

PR:	48169
Submitted by:	James Whitwell <abacau@yahoo.com.au>
2003-03-17 12:21:08 +00:00
Poul-Henning Kamp
5fa5746d3d Add a #define for the device name of the mmap device for devstat.
Constify the geom identification pointer.
2003-03-16 23:20:05 +00:00
Alan Cox
42de97a50a Pass the sf buf to MEXTADD() as the optional argument. This permits
the simplification of socow_iodone() and sf_buf_free(); they don't
have to reverse engineer the sf buf from the data's address.
2003-03-16 07:19:12 +00:00
Poul-Henning Kamp
d15cd51001 One devstat_start_transaction_bio() is enough. 2003-03-15 22:20:38 +00:00
Poul-Henning Kamp
7194d335cf Run a revision of the devstat interface:
Kernel:

Change statistics to use the *uptime() timescale (ie: relative to
boottime) rather than the UTC aligned timescale.  This makes the
device statistics code oblivious to clock steps.

Change timestamps to bintime format, they are cheaper.

Remove the "busy_count", and replace it with two counter fields:
"start_count" and "end_count", which are updated in the down and
up paths respectively.  This removes the locking constraint on
devstat.

Add a timestamp argument to devstat_start_transaction(), this will
normally be a timestamp set by the *_bio() function in bp->bio_t0.
Use this field to calculate duration of I/O operations.

Add two timestamp arguments to devstat_end_transaction(), one is
the current time, a NULL pointer means "take timestamp yourself",
the other is the timestamp of when this transaction started (see
above).

Change calculation of busy_time to operate on "the salami principle":
Only when we are idle, which we can determine by the start+end
counts being identical, do we update the "busy_from" field in the
down path.  In the up path we accumulate the timeslice in busy_time
and update busy_from.

Change the byte_* and num_* fields into two arrays: bytes[] and
operations[].

Userland:

Change the misleading "busy_time" name to be called "snap_time" and
make the time long double since that is what most users need anyway,
fill it using clock_gettime(CLOCK_MONOTONIC) to put it on the same
timescale as the kernel fields.

Change devstat_compute_etime() to operate on struct bintime.

Remove the version 2 legacy interface: the change to bintime makes
compatibility far too expensive.

Fix a bug in systat's "vm" page where boot relative busy times would
be bogus.

Bump __FreeBSD_version to 500107

Review & Collaboration by:	ken
2003-03-15 21:59:06 +00:00
Poul-Henning Kamp
9fa85de269 Add a devstat_start_transaction_bio() to match the
devstat_end_transaction_bio() we already have.

For now it just calls devstat_start_transaction(), but that will change
shortly.
2003-03-15 10:33:32 +00:00
David Xu
9a4b78c9da Export current time when returning from never blocked syscall. 2003-03-14 03:52:16 +00:00
John Baldwin
2d055ab20f Trim some trailing whitespace. 2003-03-13 23:07:09 +00:00
John Baldwin
75768576cc Add a new userland-visible ktrace flag KTR_DROP and an internal ktrace flag
KTRFAC_DROP to track instances when ktrace events are dropped due to the
request pool being exhausted.  When a thread tries to post a ktrace event
and is unable to due to no available ktrace request objects, it sets
KTRFAC_DROP in its process' p_traceflag field.  The next trace event to
successfully post from that process will set the KTR_DROP flag in the
header of the request going out and clear KTRFAC_DROP.

The KTR_DROP flag is the high bit in the type field of the ktr_header
structure.  Older kdump binaries will simply complain about an unknown type
when seeing an entry with KTR_DROP set.  Note that KTR_DROP being set on a
record in a ktrace file does not tell you anything except that at least one
event from this process was dropped prior to this event.  The user has no
way of knowing what types of events were dropped nor how many were dropped.

Requested by:	phk
2003-03-13 18:31:15 +00:00
John Baldwin
a5881ea55a - Cache a reference to the credential of the thread that starts a ktrace in
struct proc as p_tracecred alongside the current cache of the vnode in
  p_tracep.  This credential is then used for all later ktrace operations on
  this file rather than using the credential of the current thread at the
  time of each ktrace event.
- Now that we have multiple ktrace-related items in struct proc that are
  pointers, rename p_tracep to p_tracevp to make it less ambiguous.

Requested by:	rwatson (1)
2003-03-13 18:24:22 +00:00
Ian Dowse
a80cc4e104 In m_dup_pkthdr(), convert the supplied `how' argument into malloc
flags when passing it into m_tag_copy_chain(), as m_tag* functions
use malloc, not mbuf flags.
2003-03-13 09:02:19 +00:00
Jeff Roberson
749ffa4ecd - Add a lock for protecting against msleep(bp, ...) wakeup(bp) races.
- Create a new function bdone() which sets B_DONE and calls wakup(bp). This
   is suitable for use as b_iodone for buf consumers who are not going
   through the buf cache.
 - Create a new function bwait() which waits for the buf to be done at a set
   priority and with a specific wmesg.
 - Replace several cases where the above functionality was implemented
   without locking with the new functions.
2003-03-13 07:31:45 +00:00
Jeff Roberson
e99215a614 - Remove a dead check for bp->b_vp == vp in vtruncbuf(). This has not been
possible for some time.
 - Lock the buf before accessing fields.  This should very rarely be locked.
 - Assert that B_DELWRI is set after we acquire the buf.  This should always
   be the case now.
2003-03-13 07:22:53 +00:00
Jeff Roberson
09f11da5a3 - Remove a race between fsync like functions and flushbufqueues() by
requiring locked bufs in vfs_bio_awrite().  Previously the buf could
   have been written out by fsync before we acquired the buf lock if it
   weren't for giant.  The cluster_wbuild() handles this race properly but
   the single write at the end of vfs_bio_awrite() would not.
 - Modify flushbufqueues() so there is only one copy of the loop.  Pass a
   parameter in that says whether or not we should sync bufs with deps.
 - Call flushbufqueues() a second time and then break if we couldn't find
   any bufs without deps.
2003-03-13 07:19:23 +00:00
Alfred Perlstein
569d3c4bf0 Make sure we actually have a dev before dereferencing in case someone
botches and sends us a NULL pointer.  The other code in this file seems
to expect it to be able to handle it behaving this way.
2003-03-13 06:29:44 +00:00
Jeff Roberson
de950c003c - Tune down read_max. For single disks we get no gain out of reading more
than a MAXPHYS size block ahead.  Having this set too high just leaves
   other processes starved for IO and screws up interactive response.  Let the
   users with RAID set it higher when they need it.
2003-03-13 06:17:59 +00:00
Tim J. Robbins
6ec62361c8 Tidy up previous change: move comment about obtaining an exclusive
reference where it belongs, and remove a blank line to make it more
obvious what the comment applies to.
2003-03-13 00:57:47 +00:00
Tim J. Robbins
262c27b846 Back out previous. The locking here needs a rethink. 2003-03-13 00:54:53 +00:00
John Baldwin
8510f2a833 - Various little style fixes.
- If SYSCTL_OUT() fails in sysctl_kern_proc_args(), return the error
  instead of ignoring it if we have new arguments for the process.
- If the new arguments for a process are too long, return ENOMEM instead of
  returning success but not doing the actual copy.

Submitted by:	bde
2003-03-12 20:17:40 +00:00
John Baldwin
4bc6471b53 - Avoid dropping the proc lock around a simple permissions check and just
hold hold it across the check to avoid extra lock operations in the
  common case.
- Copy in the new args to a temporary pargs structure before we drop the
  reference to the old one.  Thus, if the copyin() fails, the process
  arguments are unchanged rather than being deleted.  Also, p_args is no
  longer NULL during the sysctl operation.
2003-03-12 16:14:55 +00:00
Tim J. Robbins
a7cbe87a5e Acquire sched_lock around use of FOREACH_KSEGRP_IN_PROC, accesses
to kg_nice and calls to sched_nice() in getpriority() and setpriority()
(really donice()).
2003-03-12 11:24:41 +00:00
Tim J. Robbins
3890793e9c In wait1(), remove the zombie process from zombproc before removing
it from its pgrp to avoid leaving zombies around with p_pgrp == NULL.
This bug was apparent as a NULL-dereference in the pid selection code
in fork1().
2003-03-12 11:10:04 +00:00
John Baldwin
2ca9461a05 Trim an extra blank line that snuck into the last commit. 2003-03-11 22:33:42 +00:00
Alexander Kabaev
c162e9c2eb Rename vfs_stdsync function to vfs_stdnosync which matches more
closely what function is really doing. Update all existing consumers
to use the new name.

Introduce a new vfs_stdsync function, which iterates over mount
point's vnodes and call FSYNC on each one of them in turn.

Make nwfs and smbfs use this new function instead of rolling their
own identical sync implementations.

Reviewed by:	jeff
2003-03-11 22:15:10 +00:00
John Baldwin
427b3a6549 - Change witness_displaydescendants() to accept the indentation level as
a parameter instead of using the level of a given witness.  When
  recursing, pass an indent level of indent + 1.
- Make use of the information witness_levelall() provides in
  witness_display_list() to use an O(n) algorithm instead of an O(n^2)
  algo to decide which witnesses to display hierarchies from.  Basically,
  we only display a hierarchy for witnesses with a level of 0.
- Add a new per-witness flag that is reset at the start of
  witness_display() for all witness's and is set the first time a witness
  is displayed in witness_displaydescendants().  If a witness is
  encountered more than once in the lock order tree (which happens often),
  witness_displaydescendants() marks the later occurrences with the string
  "(already displayed)" and doesn't display the subtree under that
  witness.  This avoids duplicating large amounts of the lock order tree
  in the 'show witness' output in DDB.

All these changes serve to make 'show witness' a lot more readable and
useful than it was previously.
2003-03-11 22:14:21 +00:00
John Baldwin
f82c6950be - Split the itismychild() function into two functions: insertchild()
adds a witness to the child list of a parent witness.  rebalancetree()
  runs through the entire tree removing direct descendants of witnesses
  who already have said child witness as an indirect descendant through
  another direct descendant.  itismychild() now calls insertchild()
  followed by rebalancetree() and no longer needs the evil hack of
  having static recursed variable.
- Add a function reparentchildren() that adds all the direct descendants
  of one witness as direct descendants of another witness.
- Change the return value of itismychild() and similar functions so that
  they return 0 in the case of failure due to lack of resources instead
  of 1.  This makes the return value more intuitive.
- Check the return value of itismychild() when defining the static lock
  order in witness_initialize().
- Don't try to setup a lock instance in witness_lock() if itismychild()
  fails.  Witness is hosed anyways so no need to do any more witness
  related activity at that point.  It also makes the code flow easier to
  understand.
- Add a new depart() function as the opposite of enroll().  When the
  reference count of a witness drops to 0 in witness_destroy(), this
  function is called on that witness.  First, it runs through the
  lock order tree using reparentchildren() to reparent direct descendants
  of the departing witness to each of the witness' parents in the tree.
  Next, it releases it's own child list and other associated resources.
  Finally it calls rebalanacetree() to rebalance the lock order tree.
- Sort function prototypes into something closer to alphabetical order.

As a result of these changes, there should no longer be 'dead' witnesses
in the order tree, and repeatedly loading and unloading a module should no
longer exhaust witness of its internal resources.

Inspired by:	gallatin
2003-03-11 22:07:35 +00:00
John Baldwin
d5b13ee082 Trim useless "../" leading strings from filenames passed into witness. 2003-03-11 21:53:12 +00:00
John Baldwin
28e4d137a2 Adjust style of #ifdef's and #endif's to be more consistent and in line
with recent additions to style(9).
2003-03-11 21:38:49 +00:00
John Baldwin
d278a7f9ba Do the lock order check skip for the LOP_TRYLOCK case after the check for
recursing on a lock instead of before.  This fixes a bug where WITNESS
could get a little confused if you did an sx_tryslock() on a sx lock that
you already had an slock on.  WITNESS would still function correctly but
it could result in weirdness in the output of 'show locks'.  This also
makes it possible for mtx_trylock() to recurse on a lock.
2003-03-11 20:54:37 +00:00
John Baldwin
ecdf4409f9 Rework the eventhandler locking for hopefully the last time. The scheme
used popped into my head during my morning commute a few weeks ago, but
it is also very similar (though a bit simpler) to a patch that mini@
developed a while ago.  Basically, each eventhandler list has a mutex and
a run count.  During an eventhandler invocation, the mutex is held while
we traverse the list but is dropped while we execute actual handlers.  Also,
a runcount counter is incremented at the start of an invocation and
decremented at the end of an invocation.  Adding to the list is not a big
deal since the reference of a thread currently executing the handlers
remains valid across an add operation.  Whether or not new handlers are
executed by threads currently executing the handlers for a given list is
indeterminate however.  The harder case is when a handler is removed from
the list.  If the runcount is zero, the handler is simply removed from the
list directly.  If the runcount is not zero, then another thread is
currently executing the handlers of this list, so the priority of this
handler is set to a magic value (currently -1) to mark it as dead.  Dead
handlers are not executed during an invocation.  If the runcount is zero
after it is decremented at the end of an invocation, then a new
eventhandler_prune_list() function is called to remove dead handlers from
the list.

Additional minor notes:
- All the common parts of EVENTHANDLER_INVOKE() and
  EVENTHANDLER_FAST_INVOKE() have been merged into a common
  _EVENTHANDLER_INVOKE() macro to reduce duplication and ease maintenance.
- KTR logging for eventhandlers is now available via the KTR_EVH mask.
- The global eventhander_mutex is no longer recursive.

Tested by:	scottl (SMP i386)
2003-03-11 20:17:00 +00:00
John Baldwin
75d468ee12 Axe the useless MTX_SLEEPABLE flag. mutexes are not sleepable locks.
Nothing used this flag and WITNESS would have panic'd during mtx_init()
if anything had.
2003-03-11 20:02:57 +00:00