2206 Commits

Author SHA1 Message Date
pjd
645dd7b662 Add buffer corruption protection (RedZone) for kernel's malloc(9).
It detects both: buffer underflows and buffer overflows bugs at runtime
(on free(9) and realloc(9)) and prints backtraces from where memory was
allocated and from where it was freed.

Tested by:	kris
2006-01-31 11:09:21 +00:00
scottl
a3330c2d56 The change a few years ago of having contigmalloc start its scan at the top
of physical RAM instead of the bottom was a sound idea, but the implementation
left a lot to be desired.  Scans would spend considerable time looking at
pages that are above of the address range given by the caller, and multiple
calls (like what happens in busdma) would spend more time on top of that
rescanning the same pages over and over.

Solve this, at least for now, with two simple optimizations.  The first is
to not bother scanning high ordered pages that are outside of the provided
address range.  Second is to cache the page index from the last successful
operation so that subsequent scans don't have to restart from the top.  This
is conditional on the numpages argument being the same or greater between
calls.

MFC After: 2 weeks
2006-01-29 08:24:54 +00:00
jhb
9b143bcf62 Add a new macro wrapper WITNESS_CHECK() around the witness_warn() function.
The difference between WITNESS_CHECK() and WITNESS_WARN() is that
WITNESS_CHECK() should be used in the places that the return value of
witness_warn() is checked, whereas WITNESS_WARN() should be used in places
where the return value is ignored.  Specifically, in a kernel without
WITNESS enabled, WITNESS_WARN() evaluates to an empty string where as
WITNESS_CHECK evaluates to 0.  I also updated the one place that was
checking the return value of WITNESS_WARN() to use WITNESS_CHECK.
2006-01-27 22:20:15 +00:00
cognet
c2e2425399 Make sure b_vp and b_bufobj are NULL before calling relpbuf(), as it asserts
they are. They should be NULL at this point, except if we're coming from
swapdev_strategy().
It should only affect the case where we're swapping directly on a file over
NFS.
2006-01-27 21:11:50 +00:00
alc
c5764c76fc Style: Add blank line after local variable declarations. 2006-01-27 21:06:37 +00:00
alc
8db5cab2c2 Use the new macros abstracting the page coloring/queues implementation.
(There are no functional changes.)
2006-01-27 08:35:32 +00:00
alc
d2566b1431 Use the new macros abstracting the page coloring/queues implementation.
(There are no functional changes.)
2006-01-27 07:28:51 +00:00
alc
e32ac719d6 Plug a leak in the newer contigmalloc() implementation. Specifically, if
a multipage allocation was aborted midway, the pages that were already
allocated were not always returned to the free list.

Submitted by: tegge
2006-01-26 05:51:26 +00:00
jeff
4981e49c6b - Avoid calling vm_object_backing_scan() when collapsing an object when
the resident page count matches the object size.  We know it fully backs
   its parent in this case.

Reviewed by:	acl, tegge
Sponsored by:	Isilon Systems, Inc.
2006-01-25 08:42:58 +00:00
alc
d1c7af5900 The previous revision incorrectly changed a switch statement into an if
statement.  Specifically, a break statement that previously broke out of
the enclosing switch was not changed.  Consequently, the enclosing loop
terminated prematurely.

This could result in "vm_page_insert: page already inserted" panics.

Submitted by: tegge
2006-01-25 06:45:57 +00:00
alc
e8d6ec993a With the recent changes to the implementation of page coloring, the
the option PQ_NOOPT is used exclusively by vm_pageq.c.  Thus, the
include of opt_vmpage.h can be removed from vm_page.h.
2006-01-24 19:24:54 +00:00
alc
0ba6836a1e In vm_page_set_invalid() invalidate all of the page's mappings as soon as
any part of the page's contents is invalidated.

Submitted by: tegge
2006-01-24 07:21:38 +00:00
alc
a880fbca84 Make vm_object_vndeallocate() static. The external calls to it were
eliminated in ufs/ffs/ffs_vnops.c's revision 1.125.
2006-01-22 23:56:20 +00:00
jhb
b183f7d5ee Reduce the scope of one #ifdef to avoid duplicating a SYSCTL_INT() macro
and trim another unneeded #ifdef (it was just around a macro that is
already conditionally defined).
2006-01-06 18:03:45 +00:00
netchild
8852f0af37 Convert the PAGE_SIZE check into a CTASSERT.
Suggested by:	jhb
2006-01-04 19:19:42 +00:00
netchild
fbbb1e238e Prevent divide by zero, use default values in case one of the divisor's
is zero.

Tested by:	Randy Bush <randy@psg.com>
2006-01-04 18:26:54 +00:00
netchild
507a9b3e93 MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
   this allows to try different coloring algorithms without the need to
   touch every file [1]
 - make the page queue tuning values readable: sysctl vm.stats.pagequeue
 - autotuning of the page coloring values based upon the cache size instead
   of options in the kernel config (disabling of the page coloring as a
   kernel option is still possible)

MD changes:
 - detection of the cache size: only IA32 and AMD64 (untested) contains
   cache size detection code, every other arch just comes with a dummy
   function (this results in the use of default values like it was the
   case without the autotuning of the page coloring)
 - print some more info on Intel CPU's (like we do on AMD and Transmeta
   CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by:	Chad David <davidc@acns.ab.ca> [1]
Reviewed by:		alc, arch (in 2004)
Discussed with:		alc, Chad David, arch (in 2004)
2005-12-31 14:39:20 +00:00
pjd
3cc29e6ebf Improve memguard a bit:
- Provide tunable vm.memguard.desc, so one can specify memory type without
  changing the code and recompiling the kernel.
- Allow to use memguard for kernel modules by providing sysctl
  vm.memguard.desc, which can be changed to short description of memory
  type before module is loaded.
- Move as much memguard code as possible to memguard.c.
- Add sysctl node vm.memguard. and move memguard-specific sysctl there.
- Add malloc_desc2type() function for finding memory type based on its
  short description (ks_shortdesc field).
- Memory type can be changed (via vm.memguard.desc sysctl) only if it
  doesn't exist (will be loaded later) or when no memory is allocated yet.
  If there is allocated memory for the given memory type, return EBUSY.
- Implement two ways of memory types comparsion and make safer/slower the
  default.
2005-12-30 11:45:07 +00:00
tegge
7245d518e8 Don't access fs->first_object after dropping reference to it.
The result could be a missed or extra giant unlock.

Reviewed by:	alc
2005-12-20 12:27:59 +00:00
alc
f69d4d5fa8 Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the
ephemeral mappings that are used as the source for three copy
operations from kernel space to user space.  There are two reasons for
making this change: (1) Under heavy load exec_map can fill up causing
vm_map_find() to fail.  When it fails, the nascent process is aborted
(SIGABRT).  Whereas, this reimplementation using sf_buf_alloc()
sleeps.  (2) Although it is possible to sleep on vm_map_find()'s
failure until address space becomes available (see kmem_alloc_wait()),
using sf_buf_alloc() is faster.  Furthermore, the reimplementation
uses a CPU private mapping, avoiding a TLB shootdown on
multiprocessors.

Problem uncovered by: kris@
Reviewed by: tegge@
MFC after: 3 weeks
2005-12-16 18:34:14 +00:00
alc
9ecb89a9e0 Assert that the page that is given to vm_page_free_toq() does not have any
managed mappings.
2005-12-13 19:59:09 +00:00
alc
a5d0ac5faf Remove unneeded calls to pmap_remove_all(). The given page is not mapped.
Reviewed by: tegge
2005-12-11 22:06:57 +00:00
alc
034f1727af Simplify vmspace_dofree(). 2005-12-04 22:55:41 +00:00
alc
55eca1baec Eliminate unneeded preallocation at initialization.
Reviewed by: tegge
2005-12-03 22:41:15 +00:00
alc
8f06802ece Eliminate unneeded preallocation at initialization.
Reviewed by: tegge
2005-12-03 19:37:29 +00:00
alc
b77df1e33a Eliminate pmap_init2(). It's no longer used. 2005-11-20 06:09:49 +00:00
alc
8852c8f9e2 Reimplement the reclamation of PV entries. Specifically, perform
reclamation synchronously from get_pv_entry() instead of
asynchronously as part of the page daemon.  Additionally, limit the
reclamation to inactive pages unless allocation from the PV entry zone
or reclamation from the inactive queue fails.  Previously, reclamation
destroyed mappings to both inactive and active pages.  get_pv_entry()
still, however, wakes up the page daemon when reclamation occurs.  The
reason being that the page daemon may move some pages from the active
queue to the inactive queue, making some new pages available to future
reclamations.

Print the "reclaiming PV entries" message at most once per minute, but
don't stop printing it after the fifth time.  This way, we do not give
the impression that the problem has gone away.

Reviewed by: tegge
2005-11-09 08:19:21 +00:00
alc
bf6d4253ee If a physical page is mapped by two or more virtual addresses, transmitted
by the zero-copy sockets method, and written to before the transmission
completes, we need to destroy all of the existing mappings to the page,
not just the one that we fault on.  Otherwise, the mappings will no longer
be to the same page and changes made through one of the mappings will not
be visible through the others.

Observed by: tegge
2005-11-08 06:33:21 +00:00
ps
c171de528f Rate limit vnode_pager_putpages printfs to once a second. 2005-11-01 23:00:24 +00:00
alc
e3ab6622c5 Consider the zero-copy transmission of a page that was wired by mlock(2).
If a copy-on-write fault occurs on the page, the new copy should inherit
a part of the original page's wire count.

Submitted by: tegge
MFC after: 1 week
2005-11-01 04:30:21 +00:00
rwatson
be4f357149 Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
alc
80f50568e4 Use of the ZERO_COPY_SOCKETS options can result in an unusual state that
vm_object_backing_scan() was not written to handle.  Specifically, a wired
page within a backing object that is shadowed by a page within the shadow
object.  Handle this state by removing the wired page from the backing
object.  The wired page will be freed by socow_iodone().

Stop masking errors: If a page is being freed by vm_object_backing_scan(),
assert that it is no longer mapped rather than quietly destroying any
mappings.

Tested by: Harald Schmalzbauer
2005-10-22 18:46:38 +00:00
rwatson
f604e28c4e Change format string for u_int64_t to %ju from %llu, in order to use the
correct format string on 64-bit systems.

Pointed out by:	pjd
2005-10-20 21:28:31 +00:00
rwatson
f568b9c967 Add a "show uma" command to DDB, which prints out the current stats for
available UMA zones.  Quite useful for post-mortem debugging of memory
leaks without a dump device configured on a panicked box.

MFC after:	2 weeks
2005-10-20 16:39:33 +00:00
dds
0fb2e655fd Move execve's access time update functionality into a new
vfs_mark_atime() function, and use the new function for
performing efficient atime updates in mmap().

Reviewed by:	bde
MFC after:	2 weeks
2005-10-12 06:56:00 +00:00
des
c66d4a275d As alc pointed out to me, vm_page.c 1.305 was incomplete: uma_startup()
still uses the constant UMA_BOOT_PAGES.  Change it to accept boot_pages
as an additional argument.

MFC after:	2 weeks
2005-10-08 21:03:54 +00:00
dds
c066055957 Update the vnode's access time after an mmap operation on it.
Before this change a copy operation with cp(1) would not update the
file access times.

According to the POSIX mmap(2) documentation: the st_atime field
of the mapped file may be marked for update at any time between the
mmap() call and the corresponding munmap() call. The initial read
or write reference to a mapped region shall cause the file's st_atime
field to be marked for update if it has not already been marked for
update.
2005-10-04 14:58:58 +00:00
jhb
7758b042a5 Trim a couple of unneeded includes. 2005-09-29 19:13:52 +00:00
cognet
2ccdc16083 Make sure we have a bufobj before calling bstrategy().
I'm not sure this is the right thing to do, but at least I don't panic
anymore when swapping on a NFS file without using md(4).

X-MFC after:      proper review
2005-09-21 15:01:09 +00:00
peter
eb46ffb067 Remove unused (but initialized) variable 'objsize' from vm_mmap() 2005-09-20 22:08:27 +00:00
alc
d03626c7d6 Introduce a new lock for the purpose of synchronizing access to the
UMA boot pages.

Disable recursion on the general UMA lock now that startup_alloc() no
longer uses it.

Eliminate the variable uma_boot_free.  It serves no purpose.

Note: This change eliminates a lock-order reversal between a system
map mutex and the UMA lock.  See
http://sources.zabbadoz.net/freebsd/lor.html#109 for details.

MFC after: 3 days
2005-09-09 06:03:08 +00:00
alc
dd6197a72f Eliminate an incorrect cast. 2005-09-07 01:42:30 +00:00
alc
39788de49e Pass a value of type vm_prot_t to pmap_enter_quick() so that it determine
whether the mapping should permit execute access.
2005-09-03 18:20:20 +00:00
kan
51355225d4 Do not use vm_pager_init() to initialize vnode_pbuf_freecnt variable.
vm_pager_init() is run before required nswbuf variable has been set
to correct value. This caused system to run with single pbuf available
for vnode_pager. Handle both cluster_pbuf_freecnt and vnode_pbuf_freecnt
variable in the same way.

Reported by:	ade
Obtained from:	alc
MFC after:	2 days
2005-08-13 20:21:33 +00:00
tegge
f13480db93 Check for marker pages when scanning active and inactive page queues.
Reviewed by:	alc
2005-08-12 18:17:40 +00:00
des
1722544f93 Introduce the vm.boot_pages tunable and sysctl, which controls the number
of pages reserved to bootstrap the kernel memory allocator.

MFC after:	2 weeks
2005-08-12 12:24:19 +00:00
tegge
a73af1b81d Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE
due to the vm object being locked.

When a process writes large amounts of data to a file, the vm object associated
with that file can contain most of the physical pages on the machine.  If the
process is preempted while holding the lock on the vm object, pagedaemon would
be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE
to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to
other vm objects.

Temporarily unlock the page queues lock while locking vm objects to avoid lock
order violation.  Detect and handle relevant page queue changes.

This change depends on both the lock portion of struct vm_object and normal
struct vm_page being type stable.

Reviewed by:	alc
2005-08-10 00:17:36 +00:00
ssouhlal
871cf7b33c Use atomic operations on runningbufspace.
PR:		kern/84318
Submitted by:	ade
MFC after:	3 days
2005-08-08 22:44:10 +00:00
rwatson
149e4a2dca Don't perform a nested include of opt_vmpage.h if LIBMEMSTAT is defined,
as opt_vmpage.h will not be available to user space library builds.  A
similar existing check is present for KLD_MODULE for similar reasons.

MFC after:	3 days
2005-08-04 10:05:11 +00:00
rwatson
65053d65ad Wrap inlines in uma_int.h in #ifdef _KERNEL so that uma_int.h can be
used from memstat_uma.c for the purposes of kvm access without lots
of additional unsafe includes.

MFC after:	3 days
2005-08-04 10:03:53 +00:00