2225 Commits

Author SHA1 Message Date
jhb
d535a5cb81 Change msleep() and tsleep() to not alter the calling thread's priority
if the specified priority is zero.  This avoids a race where the calling
thread could read a snapshot of it's current priority, then a different
thread could change the first thread's priority, then the original thread
would call sched_prio() inside msleep() undoing the change made by the
second thread.  I used a priority of zero as no thread that calls msleep()
or tsleep() should be specifying a priority of zero anyway.

The various places that passed 'curthread->td_priority' or some variant
as the priority now pass 0.
2006-04-17 18:20:38 +00:00
pjd
ca3f23ca34 On shutdown try to turn off all swap devices. This way GEOM providers are
properly closed on shutdown.

Requested by:	ru
Reviewed by:	alc
MFC after:	2 weeks
2006-04-10 10:03:41 +00:00
peter
0f363b7d24 Remove the unused sva and eva arguments from pmap_remove_pages(). 2006-04-03 21:16:10 +00:00
jkoshy
48e5e4792d MFP4: Support for profiling dynamically loaded objects.
Kernel changes:

  Inform hwpmc of executable objects brought into the system by
  kldload() and mmap(), and of their removal by kldunload() and
  munmap().  A helper function linker_hwpmc_list_objects() has been
  added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve
  the list of currently loaded kernel modules.

  The unused `MAPPINGCHANGE' event has been deprecated in favour
  of separate `MAP_IN' and `MAP_OUT' events; this change reduces
  space wastage in the log.

  Bump the hwpmc's ABI version to "2.0.00".  Teach hwpmc(4) to
  handle the map change callbacks.

  Change the default per-cpu sample buffer size to hold
  32 samples (up from 16).

  Increment __FreeBSD_version.

libpmc(3) changes:

  Update libpmc(3) to deal with the new events in the log file; bring
  the pmclog(3) manual page in sync with the code.

pmcstat(8) changes:

  Introduce new options to pmcstat(8): "-r" (root fs path), "-M"
  (mapfile name), "-q"/"-v" (verbosity control).  Option "-k" now
  takes a kernel directory as its argument but will also work with
  the older invocation syntax.

  Rework string handling in pmcstat(8) to use an opaque type for
  interned strings.  Clean up ELF parsing code and add support for
  tracking dynamic object mappings reported by a v2.0.00 hwpmc(4).

  Report statistics at the end of a log conversion run depending
  on the requested verbosity level.

Reviewed by:	jhb, dds (kernel parts of an earlier patch)
Tested by:	gallatin (earlier patch)
2006-03-26 12:20:54 +00:00
imp
f1947eff71 Remove leading __ from __(inline|const|signed|volatile). They are
obsolete.  This should reduce diffs to NetBSD as well.
2006-03-08 06:31:46 +00:00
tegge
7d4fc1e0d4 Ignore dirty pages owned by "dead" objects. 2006-03-08 00:51:00 +00:00
tegge
774f51ad2c Eliminate a deadlock when creating snapshots. Blocking vn_start_write() must
be called without any vnode locks held.  Remove calls to vn_start_write() and
vn_finished_write() in vnode_pager_putpages() and add these calls before the
vnode lock is obtained to most of the callers that don't already have them.
2006-03-02 22:13:28 +00:00
tegge
29fc266ddd Hold extra reference to vm object while cleaning pages. 2006-03-02 21:38:38 +00:00
jhb
55b3bd61d2 Lock the vm_object while checking its type to see if it is a vnode-backed
object that requires Giant in vm_object_deallocate().  This is somewhat
hairy in that if we can't obtain Giant directly, we have to drop the
object lock, then lock Giant, then relock the object lock and verify that
we still need Giant.  If we don't (because the object changed to OBJT_DEAD
for example), then we drop Giant before continuing.

Reviewed by:	alc
Tested by:	kris
2006-02-21 22:09:54 +00:00
tegge
d0ed011fe8 Expand scope of marker to reduce the number of page queue scan restarts. 2006-02-17 21:02:39 +00:00
tegge
3cf2c022c2 Check return value from nonblocking call to vn_start_write(). 2006-02-17 18:22:19 +00:00
ups
ad011f6a36 When the VM needs to allocated physical memory pages (for non interrupt use)
and it has not plenty of free pages it tries to free pages in the cache queue.
Unfortunately freeing a cached page requires the locking of the object that
owns the page. However in the context of allocating pages we may not be able
to lock the object and thus can only TRY to lock the object. If the locking try
fails the cache page can not be freed and is activated to move it out of the way
so that we may try to free other cache pages.

If all pages in the cache belong to objects that are currently locked the
cache queue can be emptied without freeing a single page. This scenario caused
two problems:

    1)  vm_page_alloc always failed allocation when it tried freeing pages from
        the cache queue and failed to do so. However if there are more than
        cnt.v_interrupt_free_min pages on the free list it should return pages
        when requested with priority VM_ALLOC_SYSTEM. Failure to do so can cause
        resource exhaustion deadlocks.

    2)  Threads than need to allocate pages spend a lot of time cleaning up the
        page queue without really getting anything done while the pagedaemon
         needs to work overtime to refill the cache.

This change fixes the first problem. (1)

Reviewed by:	tegge@
2006-02-15 22:29:53 +00:00
rwatson
2c7a66af35 Skip per-cpu caches associated with absent CPUs when generating a
memory statistics record stream via sysctl.

MFC after:	3 days
2006-02-11 19:20:56 +00:00
jeff
547663461e - Fix silly VI locking that is used to check a single flag. The vnode
lock also protects this flag so it is not necessary.
 - Don't rely on v_mount to detect whether or not we've been recycled, use
   the more appropriate VI_DOOMED instead.

Sponsored by:	Isilon Systems, Inc.
MFC After:	1 week
2006-02-06 10:14:12 +00:00
alc
83970835a0 Remove an unnecessary call to pmap_remove_all(). The given page is not
mapped because its contents are invalid.
2006-02-04 22:37:10 +00:00
tegge
724ef57f1f Adjust old comment (present in rev 1.1) to match changes in rev 1.82.
PR:	kern/92509
Submitted by:   "Bryan Venteicher" <bryanv@daemoninthecloset.org>
2006-02-02 21:55:38 +00:00
yar
5a09437b55 Use off_t for file size passed to vnode_create_vobject().
The former type, size_t, was causing truncation to 32 bits on i386,
which immediately led to undersizing of VM objects backed by
files >4GB.  In particular, sendfile(2) was broken for such files.

PR:		kern/92243
MFC after:	5 days
2006-02-01 12:43:13 +00:00
jeff
4126297e17 - Install a temporary bandaid in vm_object_reference() that will stop
mtx_assert()s from triggering until I find a real long-term solution.
2006-02-01 09:47:02 +00:00
alc
2283c4343f Change #if defined(DIAGNOSTIC) to KASSERT. 2006-01-31 19:06:51 +00:00
pjd
645dd7b662 Add buffer corruption protection (RedZone) for kernel's malloc(9).
It detects both: buffer underflows and buffer overflows bugs at runtime
(on free(9) and realloc(9)) and prints backtraces from where memory was
allocated and from where it was freed.

Tested by:	kris
2006-01-31 11:09:21 +00:00
scottl
a3330c2d56 The change a few years ago of having contigmalloc start its scan at the top
of physical RAM instead of the bottom was a sound idea, but the implementation
left a lot to be desired.  Scans would spend considerable time looking at
pages that are above of the address range given by the caller, and multiple
calls (like what happens in busdma) would spend more time on top of that
rescanning the same pages over and over.

Solve this, at least for now, with two simple optimizations.  The first is
to not bother scanning high ordered pages that are outside of the provided
address range.  Second is to cache the page index from the last successful
operation so that subsequent scans don't have to restart from the top.  This
is conditional on the numpages argument being the same or greater between
calls.

MFC After: 2 weeks
2006-01-29 08:24:54 +00:00
jhb
9b143bcf62 Add a new macro wrapper WITNESS_CHECK() around the witness_warn() function.
The difference between WITNESS_CHECK() and WITNESS_WARN() is that
WITNESS_CHECK() should be used in the places that the return value of
witness_warn() is checked, whereas WITNESS_WARN() should be used in places
where the return value is ignored.  Specifically, in a kernel without
WITNESS enabled, WITNESS_WARN() evaluates to an empty string where as
WITNESS_CHECK evaluates to 0.  I also updated the one place that was
checking the return value of WITNESS_WARN() to use WITNESS_CHECK.
2006-01-27 22:20:15 +00:00
cognet
c2e2425399 Make sure b_vp and b_bufobj are NULL before calling relpbuf(), as it asserts
they are. They should be NULL at this point, except if we're coming from
swapdev_strategy().
It should only affect the case where we're swapping directly on a file over
NFS.
2006-01-27 21:11:50 +00:00
alc
c5764c76fc Style: Add blank line after local variable declarations. 2006-01-27 21:06:37 +00:00
alc
8db5cab2c2 Use the new macros abstracting the page coloring/queues implementation.
(There are no functional changes.)
2006-01-27 08:35:32 +00:00
alc
d2566b1431 Use the new macros abstracting the page coloring/queues implementation.
(There are no functional changes.)
2006-01-27 07:28:51 +00:00
alc
e32ac719d6 Plug a leak in the newer contigmalloc() implementation. Specifically, if
a multipage allocation was aborted midway, the pages that were already
allocated were not always returned to the free list.

Submitted by: tegge
2006-01-26 05:51:26 +00:00
jeff
4981e49c6b - Avoid calling vm_object_backing_scan() when collapsing an object when
the resident page count matches the object size.  We know it fully backs
   its parent in this case.

Reviewed by:	acl, tegge
Sponsored by:	Isilon Systems, Inc.
2006-01-25 08:42:58 +00:00
alc
d1c7af5900 The previous revision incorrectly changed a switch statement into an if
statement.  Specifically, a break statement that previously broke out of
the enclosing switch was not changed.  Consequently, the enclosing loop
terminated prematurely.

This could result in "vm_page_insert: page already inserted" panics.

Submitted by: tegge
2006-01-25 06:45:57 +00:00
alc
e8d6ec993a With the recent changes to the implementation of page coloring, the
the option PQ_NOOPT is used exclusively by vm_pageq.c.  Thus, the
include of opt_vmpage.h can be removed from vm_page.h.
2006-01-24 19:24:54 +00:00
alc
0ba6836a1e In vm_page_set_invalid() invalidate all of the page's mappings as soon as
any part of the page's contents is invalidated.

Submitted by: tegge
2006-01-24 07:21:38 +00:00
alc
a880fbca84 Make vm_object_vndeallocate() static. The external calls to it were
eliminated in ufs/ffs/ffs_vnops.c's revision 1.125.
2006-01-22 23:56:20 +00:00
jhb
b183f7d5ee Reduce the scope of one #ifdef to avoid duplicating a SYSCTL_INT() macro
and trim another unneeded #ifdef (it was just around a macro that is
already conditionally defined).
2006-01-06 18:03:45 +00:00
netchild
8852f0af37 Convert the PAGE_SIZE check into a CTASSERT.
Suggested by:	jhb
2006-01-04 19:19:42 +00:00
netchild
fbbb1e238e Prevent divide by zero, use default values in case one of the divisor's
is zero.

Tested by:	Randy Bush <randy@psg.com>
2006-01-04 18:26:54 +00:00
netchild
507a9b3e93 MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
   this allows to try different coloring algorithms without the need to
   touch every file [1]
 - make the page queue tuning values readable: sysctl vm.stats.pagequeue
 - autotuning of the page coloring values based upon the cache size instead
   of options in the kernel config (disabling of the page coloring as a
   kernel option is still possible)

MD changes:
 - detection of the cache size: only IA32 and AMD64 (untested) contains
   cache size detection code, every other arch just comes with a dummy
   function (this results in the use of default values like it was the
   case without the autotuning of the page coloring)
 - print some more info on Intel CPU's (like we do on AMD and Transmeta
   CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by:	Chad David <davidc@acns.ab.ca> [1]
Reviewed by:		alc, arch (in 2004)
Discussed with:		alc, Chad David, arch (in 2004)
2005-12-31 14:39:20 +00:00
pjd
3cc29e6ebf Improve memguard a bit:
- Provide tunable vm.memguard.desc, so one can specify memory type without
  changing the code and recompiling the kernel.
- Allow to use memguard for kernel modules by providing sysctl
  vm.memguard.desc, which can be changed to short description of memory
  type before module is loaded.
- Move as much memguard code as possible to memguard.c.
- Add sysctl node vm.memguard. and move memguard-specific sysctl there.
- Add malloc_desc2type() function for finding memory type based on its
  short description (ks_shortdesc field).
- Memory type can be changed (via vm.memguard.desc sysctl) only if it
  doesn't exist (will be loaded later) or when no memory is allocated yet.
  If there is allocated memory for the given memory type, return EBUSY.
- Implement two ways of memory types comparsion and make safer/slower the
  default.
2005-12-30 11:45:07 +00:00
tegge
7245d518e8 Don't access fs->first_object after dropping reference to it.
The result could be a missed or extra giant unlock.

Reviewed by:	alc
2005-12-20 12:27:59 +00:00
alc
f69d4d5fa8 Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the
ephemeral mappings that are used as the source for three copy
operations from kernel space to user space.  There are two reasons for
making this change: (1) Under heavy load exec_map can fill up causing
vm_map_find() to fail.  When it fails, the nascent process is aborted
(SIGABRT).  Whereas, this reimplementation using sf_buf_alloc()
sleeps.  (2) Although it is possible to sleep on vm_map_find()'s
failure until address space becomes available (see kmem_alloc_wait()),
using sf_buf_alloc() is faster.  Furthermore, the reimplementation
uses a CPU private mapping, avoiding a TLB shootdown on
multiprocessors.

Problem uncovered by: kris@
Reviewed by: tegge@
MFC after: 3 weeks
2005-12-16 18:34:14 +00:00
alc
9ecb89a9e0 Assert that the page that is given to vm_page_free_toq() does not have any
managed mappings.
2005-12-13 19:59:09 +00:00
alc
a5d0ac5faf Remove unneeded calls to pmap_remove_all(). The given page is not mapped.
Reviewed by: tegge
2005-12-11 22:06:57 +00:00
alc
034f1727af Simplify vmspace_dofree(). 2005-12-04 22:55:41 +00:00
alc
55eca1baec Eliminate unneeded preallocation at initialization.
Reviewed by: tegge
2005-12-03 22:41:15 +00:00
alc
8f06802ece Eliminate unneeded preallocation at initialization.
Reviewed by: tegge
2005-12-03 19:37:29 +00:00
alc
b77df1e33a Eliminate pmap_init2(). It's no longer used. 2005-11-20 06:09:49 +00:00
alc
8852c8f9e2 Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon.  Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails.  Previously, reclamation
destroyed mappings to both inactive and active pages.  get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs.  The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time.  This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge
2005-11-09 08:19:21 +00:00
alc
bf6d4253ee If a physical page is mapped by two or more virtual addresses, transmitted
by the zero-copy sockets method, and written to before the transmission
completes, we need to destroy all of the existing mappings to the page,
not just the one that we fault on.  Otherwise, the mappings will no longer
be to the same page and changes made through one of the mappings will not
be visible through the others.

Observed by: tegge
2005-11-08 06:33:21 +00:00
ps
c171de528f Rate limit vnode_pager_putpages printfs to once a second. 2005-11-01 23:00:24 +00:00
alc
e3ab6622c5 Consider the zero-copy transmission of a page that was wired by mlock(2).
If a copy-on-write fault occurs on the page, the new copy should inherit
a part of the original page's wire count.

Submitted by: tegge
MFC after: 1 week
2005-11-01 04:30:21 +00:00
rwatson
be4f357149 Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00