6099 Commits

Author SHA1 Message Date
iedowse
db8dd4828e In m_dup_pkthdr(), convert the supplied `how' argument into malloc
flags when passing it into m_tag_copy_chain(), as m_tag* functions
use malloc, not mbuf flags.
2003-03-13 09:02:19 +00:00
jeff
459181e3ed - Add a lock for protecting against msleep(bp, ...) wakeup(bp) races.
- Create a new function bdone() which sets B_DONE and calls wakup(bp). This
   is suitable for use as b_iodone for buf consumers who are not going
   through the buf cache.
 - Create a new function bwait() which waits for the buf to be done at a set
   priority and with a specific wmesg.
 - Replace several cases where the above functionality was implemented
   without locking with the new functions.
2003-03-13 07:31:45 +00:00
jeff
ec5374265b - Remove a dead check for bp->b_vp == vp in vtruncbuf(). This has not been
possible for some time.
 - Lock the buf before accessing fields.  This should very rarely be locked.
 - Assert that B_DELWRI is set after we acquire the buf.  This should always
   be the case now.
2003-03-13 07:22:53 +00:00
jeff
ae3c8799da - Remove a race between fsync like functions and flushbufqueues() by
requiring locked bufs in vfs_bio_awrite().  Previously the buf could
   have been written out by fsync before we acquired the buf lock if it
   weren't for giant.  The cluster_wbuild() handles this race properly but
   the single write at the end of vfs_bio_awrite() would not.
 - Modify flushbufqueues() so there is only one copy of the loop.  Pass a
   parameter in that says whether or not we should sync bufs with deps.
 - Call flushbufqueues() a second time and then break if we couldn't find
   any bufs without deps.
2003-03-13 07:19:23 +00:00
alfred
91e561ec03 Make sure we actually have a dev before dereferencing in case someone
botches and sends us a NULL pointer.  The other code in this file seems
to expect it to be able to handle it behaving this way.
2003-03-13 06:29:44 +00:00
jeff
814703b2a4 - Tune down read_max. For single disks we get no gain out of reading more
than a MAXPHYS size block ahead.  Having this set too high just leaves
   other processes starved for IO and screws up interactive response.  Let the
   users with RAID set it higher when they need it.
2003-03-13 06:17:59 +00:00
tjr
2230824221 Tidy up previous change: move comment about obtaining an exclusive
reference where it belongs, and remove a blank line to make it more
obvious what the comment applies to.
2003-03-13 00:57:47 +00:00
tjr
a9d877b4c2 Back out previous. The locking here needs a rethink. 2003-03-13 00:54:53 +00:00
jhb
954b82f293 - Various little style fixes.
- If SYSCTL_OUT() fails in sysctl_kern_proc_args(), return the error
  instead of ignoring it if we have new arguments for the process.
- If the new arguments for a process are too long, return ENOMEM instead of
  returning success but not doing the actual copy.

Submitted by:	bde
2003-03-12 20:17:40 +00:00
jhb
7510e91aa2 - Avoid dropping the proc lock around a simple permissions check and just
hold hold it across the check to avoid extra lock operations in the
  common case.
- Copy in the new args to a temporary pargs structure before we drop the
  reference to the old one.  Thus, if the copyin() fails, the process
  arguments are unchanged rather than being deleted.  Also, p_args is no
  longer NULL during the sysctl operation.
2003-03-12 16:14:55 +00:00
tjr
679efe569a Acquire sched_lock around use of FOREACH_KSEGRP_IN_PROC, accesses
to kg_nice and calls to sched_nice() in getpriority() and setpriority()
(really donice()).
2003-03-12 11:24:41 +00:00
tjr
59d5730195 In wait1(), remove the zombie process from zombproc before removing
it from its pgrp to avoid leaving zombies around with p_pgrp == NULL.
This bug was apparent as a NULL-dereference in the pid selection code
in fork1().
2003-03-12 11:10:04 +00:00
jhb
0c3ac305c8 Trim an extra blank line that snuck into the last commit. 2003-03-11 22:33:42 +00:00
kan
378cd3b05d Rename vfs_stdsync function to vfs_stdnosync which matches more
closely what function is really doing. Update all existing consumers
to use the new name.

Introduce a new vfs_stdsync function, which iterates over mount
point's vnodes and call FSYNC on each one of them in turn.

Make nwfs and smbfs use this new function instead of rolling their
own identical sync implementations.

Reviewed by:	jeff
2003-03-11 22:15:10 +00:00
jhb
b2bb08b487 - Change witness_displaydescendants() to accept the indentation level as
a parameter instead of using the level of a given witness.  When
  recursing, pass an indent level of indent + 1.
- Make use of the information witness_levelall() provides in
  witness_display_list() to use an O(n) algorithm instead of an O(n^2)
  algo to decide which witnesses to display hierarchies from.  Basically,
  we only display a hierarchy for witnesses with a level of 0.
- Add a new per-witness flag that is reset at the start of
  witness_display() for all witness's and is set the first time a witness
  is displayed in witness_displaydescendants().  If a witness is
  encountered more than once in the lock order tree (which happens often),
  witness_displaydescendants() marks the later occurrences with the string
  "(already displayed)" and doesn't display the subtree under that
  witness.  This avoids duplicating large amounts of the lock order tree
  in the 'show witness' output in DDB.

All these changes serve to make 'show witness' a lot more readable and
useful than it was previously.
2003-03-11 22:14:21 +00:00
jhb
736396fc05 - Split the itismychild() function into two functions: insertchild()
adds a witness to the child list of a parent witness.  rebalancetree()
  runs through the entire tree removing direct descendants of witnesses
  who already have said child witness as an indirect descendant through
  another direct descendant.  itismychild() now calls insertchild()
  followed by rebalancetree() and no longer needs the evil hack of
  having static recursed variable.
- Add a function reparentchildren() that adds all the direct descendants
  of one witness as direct descendants of another witness.
- Change the return value of itismychild() and similar functions so that
  they return 0 in the case of failure due to lack of resources instead
  of 1.  This makes the return value more intuitive.
- Check the return value of itismychild() when defining the static lock
  order in witness_initialize().
- Don't try to setup a lock instance in witness_lock() if itismychild()
  fails.  Witness is hosed anyways so no need to do any more witness
  related activity at that point.  It also makes the code flow easier to
  understand.
- Add a new depart() function as the opposite of enroll().  When the
  reference count of a witness drops to 0 in witness_destroy(), this
  function is called on that witness.  First, it runs through the
  lock order tree using reparentchildren() to reparent direct descendants
  of the departing witness to each of the witness' parents in the tree.
  Next, it releases it's own child list and other associated resources.
  Finally it calls rebalanacetree() to rebalance the lock order tree.
- Sort function prototypes into something closer to alphabetical order.

As a result of these changes, there should no longer be 'dead' witnesses
in the order tree, and repeatedly loading and unloading a module should no
longer exhaust witness of its internal resources.

Inspired by:	gallatin
2003-03-11 22:07:35 +00:00
jhb
5aeddebb51 Trim useless "../" leading strings from filenames passed into witness. 2003-03-11 21:53:12 +00:00
jhb
42e39caaab Adjust style of #ifdef's and #endif's to be more consistent and in line
with recent additions to style(9).
2003-03-11 21:38:49 +00:00
jhb
f7f4727b44 Do the lock order check skip for the LOP_TRYLOCK case after the check for
recursing on a lock instead of before.  This fixes a bug where WITNESS
could get a little confused if you did an sx_tryslock() on a sx lock that
you already had an slock on.  WITNESS would still function correctly but
it could result in weirdness in the output of 'show locks'.  This also
makes it possible for mtx_trylock() to recurse on a lock.
2003-03-11 20:54:37 +00:00
jhb
ed498389de Rework the eventhandler locking for hopefully the last time. The scheme
used popped into my head during my morning commute a few weeks ago, but
it is also very similar (though a bit simpler) to a patch that mini@
developed a while ago.  Basically, each eventhandler list has a mutex and
a run count.  During an eventhandler invocation, the mutex is held while
we traverse the list but is dropped while we execute actual handlers.  Also,
a runcount counter is incremented at the start of an invocation and
decremented at the end of an invocation.  Adding to the list is not a big
deal since the reference of a thread currently executing the handlers
remains valid across an add operation.  Whether or not new handlers are
executed by threads currently executing the handlers for a given list is
indeterminate however.  The harder case is when a handler is removed from
the list.  If the runcount is zero, the handler is simply removed from the
list directly.  If the runcount is not zero, then another thread is
currently executing the handlers of this list, so the priority of this
handler is set to a magic value (currently -1) to mark it as dead.  Dead
handlers are not executed during an invocation.  If the runcount is zero
after it is decremented at the end of an invocation, then a new
eventhandler_prune_list() function is called to remove dead handlers from
the list.

Additional minor notes:
- All the common parts of EVENTHANDLER_INVOKE() and
  EVENTHANDLER_FAST_INVOKE() have been merged into a common
  _EVENTHANDLER_INVOKE() macro to reduce duplication and ease maintenance.
- KTR logging for eventhandlers is now available via the KTR_EVH mask.
- The global eventhander_mutex is no longer recursive.

Tested by:	scottl (SMP i386)
2003-03-11 20:17:00 +00:00
jhb
97c1e71ca2 Axe the useless MTX_SLEEPABLE flag. mutexes are not sleepable locks.
Nothing used this flag and WITNESS would have panic'd during mtx_init()
if anything had.
2003-03-11 20:02:57 +00:00
jhb
8c7bd00836 Use a shorter and less redundant name for the sysctl tree lock. 2003-03-11 20:01:51 +00:00
jhb
a8c7b4178e Use the KTR_LOCK mask for logging events via KTR in lockmgr() rather
than KTR_LOCKMGR.  lockmgr locks are locks just like other locks.
2003-03-11 20:00:37 +00:00
jhb
559c644740 Trim leading "../" sequences from filenames. 2003-03-11 19:56:16 +00:00
jeff
1e928d5480 - Regularize variable usage in cluster_read().
- Issue the io that we will later block on prior to doing cluster read ahead
   so that it is more likely to be ready when we block.
 - Loop issuing clustered reads until we've exhausted the seq count supplied
   by the file system.
 - Use a sysctl tunable "vfs.read_max" to determine the maximum number of
   blocks that we'll read ahead.
2003-03-11 06:14:03 +00:00
davidxu
f453b04b6d Lock proc lock before changing p_flag. 2003-03-11 03:16:02 +00:00
davidxu
b47a4be33e Fix signal delivering bug for threaded process. 2003-03-11 02:59:50 +00:00
davidxu
bb4f70ad77 Fix threaded process job control bug. SMP tested.
Reviewed by: julian
2003-03-11 00:07:53 +00:00
kan
5fc13b9f67 Remove trainling whitespace. 2003-03-10 21:55:00 +00:00
phk
780911f210 PHCC[1]:
I had commented the #ifdef INVARIANTS checks out to make sure I ran this
code in all kernels and forgot to comment the #ifdefs back in before I
committed.

Spotted by:	bmilekic

[1] PHCC = Pointy Hat Correction Commit
2003-03-10 20:24:54 +00:00
phk
b3deffb7e1 Make malloc and mbuf allocation mode flags nonoverlapping.
Under INVARIANTS whine if we get incompatible flags.

Submitted by:   imp
2003-03-10 19:39:53 +00:00
jhb
cb710fa095 Now that we have WITNESS_WARN(), we only call witness_list() from the
ddb 'show locks' command.  Thus, move witness_list() to the #ifdef DDB
section and remove extra checks for calling this function outside of
DDB.  Also, witness_list() now returns void instead of returning an int.

Reported by:	Steve Ames <steve@energistic.com>
Prodded by:	davidxu
2003-03-10 17:03:57 +00:00
phk
ef078ef6a1 Don't call make_dev() before we are ready for it. 2003-03-09 20:42:49 +00:00
alc
3a50d97c5c Remove some unnecessary actions by the zero-copy setup and teardown code.
Remove an incorrect comment.  (Incrementing an object's reference count
does not prevent a process from exiting.  The real concern here is that the
physical page must not be deleted until transmission is complete.  That is
already handled by the VM system and sf_buf_free().)

Tested by:	ken
2003-03-09 20:38:56 +00:00
phk
2fe1f6204d Note that MAJOR_AUTO is now the default if d_maj is not initialized. This
is more robust and prevents the hijacking of /dev/console for the typical
mistake.

Remove unneeded MAJOR_AUTO uses, it is only needed explicitly now if the
driver source has cross-branch compatibility to old releases.
2003-03-09 11:03:45 +00:00
phk
0d99ca7a9d Add one little hack to allow us to make MAJOR_AUTO be zero:
Let the console driver ask for major 256 and magically change this to
mean zero.
2003-03-09 10:28:05 +00:00
davidxu
26081cfe06 Cosmetic change, make it QUEUE_MACRO_DEBUG friendly 2003-03-09 04:27:46 +00:00
tjr
0b60094f80 Hold the proc lock while accessing p_procsig in trapsignal(). 2003-03-09 01:40:55 +00:00
phk
cb28d5d218 Retire devstat_add_entry() as a public function and bump __FreeBSD_version
to mark this act.
2003-03-08 21:46:43 +00:00
phk
e39377fdd0 Introduce a device driver for /dev/devstat, this will allow us to mmap
the device statistics structures into userland instead of using sysctl.

Introduce new devstat_new_entry() function which allocates the devstat
structure an calls devstat_add_entry() on it.
2003-03-08 19:58:57 +00:00
ken
471eab1868 Zero copy send and receive fixes:
- On receive, vm_map_lookup() needs to trigger the creation of a shadow
  object.  To make that happen, call vm_map_lookup() with PROT_WRITE
  instead of PROT_READ in vm_pgmoveco().

- On send, a shadow object will be created by the vm_map_lookup() in
  vm_fault(), but vm_page_cowfault() will delete the original page from
  the backing object rather than simply letting the legacy COW mechanism
  take over.  In other words, the new page should be added to the shadow
  object rather than replacing the old page in the backing object.  (i.e.
  vm_page_cowfault() should not be called in this case.)  We accomplish
  this by making sure fs.object == fs.first_object before calling
  vm_page_cowfault() in vm_fault().

Submitted by:	gallatin, alc
Tested by:	ken
2003-03-08 06:58:18 +00:00
davidxu
dd4ead08fe Lock sched_lock before modifying td_flags. 2003-03-08 04:09:04 +00:00
bbraun
801f005cb2 Fix a spelling error.
Submitted by:	jkh
Reviewed by:	zarzycki
2003-03-07 22:47:32 +00:00
jhb
a189180b83 Respect any passed in external lockmgr flags such as LK_NOWAIT in the
default implementations of VOP_LOCK() and VOP_UNLOCK().

Tested by:	jlemon, phk
Glanced at by:	jeffr
2003-03-07 20:45:07 +00:00
jhb
7e6cfbee4e Oops, fix the double faults people were seeing with the recent changes to
witness.  Sleepable locks such as sx locks always come before all mutexes
including Giant.  However, the static lock order list placed Giant before
the proctree and allproc sx locks.  This resulted in witness creating a
cycle in its lock order "tree" (real trees don't have cycles) leading to
infinite recursion and eventually a double fault.  To fix, put Giant after
sx locks in the lock order list.
2003-03-06 17:25:06 +00:00
alc
e5b46f193e Remove GIANT_REQUIRED from sf_buf_free(). 2003-03-06 04:48:19 +00:00
rwatson
7974609efe Instrument sysarch() MD privileged I/O access interfaces with a MAC
check, mac_check_sysarch_ioperm(), permitting MAC security policy
modules to control access to these interfaces.  Currently, they
protect access to IOPL on i386, and setting HAE on Alpha.
Additional checks might be required on other platforms to prevent
bypass of kernel security protections by unauthorized processes.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-06 04:47:47 +00:00
alc
c50367da67 Remove ENABLE_VFS_IOOPT. It is a long unfinished work-in-progress.
Discussed on:	arch@
2003-03-06 03:41:02 +00:00
rwatson
9ecf925a7d Provide a mac_check_system_swapoff() entry point, which permits MAC
modules to authorize disabling of swap against a particular vnode.

Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-05 23:50:15 +00:00
rwatson
3158a8710a Move the initialization of the vattr flags field in setfflags() to
before the MAC check so that we pass the flags field into the MAC
check properly initialized.  This didn't affect any current MAC
modules since they didn't care what the flags argument was (as
they were primarily interested in the fact that it was a meta-data
write, not the contents of the write), but would be relevant to
future modules relying on that field.

Submitted by:	Mike Halderman <mrh@spawar.navy.mil>
Obtained from:	TrustedBSD Project
Sponsored by:	DARPA, Network Associates Laboratories
2003-03-05 23:15:23 +00:00