Commit Graph

2265 Commits

Author SHA1 Message Date
Alan Cox
57b5187b16 Eliminate an incorrect cast. 2005-09-07 01:42:30 +00:00
Alan Cox
ba8bca610c Pass a value of type vm_prot_t to pmap_enter_quick() so that it determine
whether the mapping should permit execute access.
2005-09-03 18:20:20 +00:00
Alexander Kabaev
857b66d505 Do not use vm_pager_init() to initialize vnode_pbuf_freecnt variable.
vm_pager_init() is run before required nswbuf variable has been set
to correct value. This caused system to run with single pbuf available
for vnode_pager. Handle both cluster_pbuf_freecnt and vnode_pbuf_freecnt
variable in the same way.

Reported by:	ade
Obtained from:	alc
MFC after:	2 days
2005-08-13 20:21:33 +00:00
Tor Egge
1113a8b44a Check for marker pages when scanning active and inactive page queues.
Reviewed by:	alc
2005-08-12 18:17:40 +00:00
Dag-Erling Smørgrav
cfa22bcc4c Introduce the vm.boot_pages tunable and sysctl, which controls the number
of pages reserved to bootstrap the kernel memory allocator.

MFC after:	2 weeks
2005-08-12 12:24:19 +00:00
Tor Egge
8dbca793a9 Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE
due to the vm object being locked.

When a process writes large amounts of data to a file, the vm object associated
with that file can contain most of the physical pages on the machine.  If the
process is preempted while holding the lock on the vm object, pagedaemon would
be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE
to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to
other vm objects.

Temporarily unlock the page queues lock while locking vm objects to avoid lock
order violation.  Detect and handle relevant page queue changes.

This change depends on both the lock portion of struct vm_object and normal
struct vm_page being type stable.

Reviewed by:	alc
2005-08-10 00:17:36 +00:00
Suleiman Souhlal
4f12e0acb0 Use atomic operations on runningbufspace.
PR:		kern/84318
Submitted by:	ade
MFC after:	3 days
2005-08-08 22:44:10 +00:00
Robert Watson
8be9063ea1 Don't perform a nested include of opt_vmpage.h if LIBMEMSTAT is defined,
as opt_vmpage.h will not be available to user space library builds.  A
similar existing check is present for KLD_MODULE for similar reasons.

MFC after:	3 days
2005-08-04 10:05:11 +00:00
Robert Watson
af17e9a922 Wrap inlines in uma_int.h in #ifdef _KERNEL so that uma_int.h can be
used from memstat_uma.c for the purposes of kvm access without lots
of additional unsafe includes.

MFC after:	3 days
2005-08-04 10:03:53 +00:00
Robert Watson
cbbb4a0089 Rename UMA_MAX_NAME to UTH_MAX_NAME, since it's a maximum in the
monitoring API, which might or might not be the same as the internal
maximum (currently none).

Export flag information on UMA zones -- in particular, whether or
not this is a secondary zone, and so the keg free count should be
considered in that light.

MFC after:	1 day
2005-07-25 00:47:32 +00:00
Alan Cox
ec9c9e7363 Eliminate inconsistency in the setting of the B_DONE flag. Specifically,
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function.  Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().

Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.

Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
2005-07-20 19:06:06 +00:00
Robert Watson
f4ff923bdd Further UMA statistics related changes:
- Add a new uma_zfree_internal() flag, ZFREE_STATFREE, which causes it to
  to update the zone's uz_frees statistic.  Previously, the statistic was
  updated unconditionally.

- Use the flag in situations where a "real" free occurs: i.e., one where
  the caller is freeing an allocated item, to be differentiated from
  situations where uma_zfree_internal() is used to tear down the item
  during slab teardown in order to invoke its fini() method.  Also use
  the flag when UMA is freeing its internal objects.

- When exchanging a bucket with the zone from the per-CPU cache when
  freeing an item, flush cache statistics back to the zone (since the
  zone lock and critical section are both held) to match the allocation
  case.

MFC after: 3 days
2005-07-20 18:47:42 +00:00
Alan Cox
15d2d31372 Eliminate an incorrect (and unnecessary) cast. 2005-07-20 18:41:08 +00:00
Robert Watson
ab3a57c04d Use mp_maxid in preference to MAXCPU when creating exports of UMA
per-CPU cache statistics.  UMA sizes the cache array based on the
number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU
could read off the end of the array (into the next zone).

Reported by:	yongari
MFC after:	1 week
2005-07-16 11:03:06 +00:00
Robert Watson
08ecce74bc Improve canonicalization of copyrights. Order copyrights by order of
assertion (jeff, bmilekic, rwatson).

Suggested ages ago by:	bde
MFC after:		1 week
2005-07-16 09:51:52 +00:00
Robert Watson
2450bbb872 Move the unlocking of the zone mutex in sysctl_vm_zone_stats() so that
it covers the following of the uc_alloc/freebucket cache pointers.
Originally, I felt that the race wasn't helped by holding the mutex,
hence a comment in the code and not holding it across the cache access.
However, it does improve consistency, as while it doesn't prevent
bucket exchange, it does prevent bucket pointer invalidation.  So a
race in gathering cache free space statistics still can occur, but not
one that follows an invalid bucket pointer, if the mutex is held.

Submitted by:	yongari
MFC after:	1 week
2005-07-16 09:40:34 +00:00
Mike Silbersack
2018f30c01 Increase the flags field for kegs from a 16 to a 32 bit value;
we have exhausted all 16 flags.
2005-07-16 02:23:41 +00:00
Robert Watson
2019094a35 Track UMA(9) allocation failures by zone, and export via sysctl.
Requested by:	victor cruceru <victor dot cruceru at gmail dot com>
MFC after:	1 week
2005-07-15 23:34:39 +00:00
John Baldwin
b9d3b80521 Convert a remaining !fs.map->system_map to
fs.first_object->flags & OBJ_NEEDGIANT test that was missed in an earlier
revision.  This fixes mutex assertion failures in the debug.mpsafevm=0
case.

Reported by:	ps
MFC after:	3 days
2005-07-14 21:18:07 +00:00
Robert Watson
7a52a97eb3 Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator
statistics via a binary structure stream:

- Add structure 'uma_stream_header', which defines a stream version,
  definition of MAXCPUs used in the stream, and the number of zone
  records in the stream.

- Add structure 'uma_type_header', which defines the name, alignment,
  size, resource allocation limits, current pages allocated, preferred
  bucket size, and central zone + keg statistics.

- Add structure 'uma_percpu_stat', which, for each per-CPU cache,
  includes the number of allocations and frees, as well as the number
  of free items in the cache.

- When the sysctl is queried, return a stream header, followed by a
  series of type descriptions, each consisting of a type header
  followed by a series of MAXCPUs uma_percpu_stat structures holding
  per-CPU allocation information.  Typical values of MAXCPU will be
  1 (UP compiled kernel) and 16 (SMP compiled kernel).

This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.

While here, also export the number of UMA zones as a sysctl
vm.uma_count, in order to assist in sizing user swpace buffers to
receive the stream.

A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days.  This change directly
supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats
rather than separately maintained mbuf allocator statistics.

MFC after:	1 week
2005-07-14 16:35:13 +00:00
Robert Watson
773df9ab16 In addition to tracking allocs in the zone, also track frees. Add
a zone free counter, as well as a cache free counter.

MFC after:	1 week
2005-07-14 16:17:21 +00:00
Robert Watson
2c743d361a In an earlier world order, UMA would flush per-CPU statistics to the
zone whenever it was moving buckets between the zone and the cache,
or when coalescing statistics across the CPU.  Remove flushing of
statistics to the zone when coalescing statistics as part of sysctl,
as we won't be running on the right CPU to write to the cache
statistics.

Add a missed gathering of statistics: when uma_zalloc_internal()
does a special case allocation of a single item, make sure to update
the zone statistics to represent this.  Previously this case wasn't
accounted for in user-visible statistics.

MFC after:	1 week
2005-07-14 16:13:46 +00:00
Mike Silbersack
cb6e5c1aba Change the panic in trash_ctor into just a printf for now. Once the reports
of panics in trash_ctor relating to mbufs have been examined and a fix
found, this will be turned back into a panic.

Approved by: re (rwatson)
2005-06-26 23:44:07 +00:00
Alan Cox
eafc7b549a Increase UMA_BOOT_PAGES to prevent a crash during initialization. See
http://docs.FreeBSD.org/cgi/mid.cgi?42AD8270.8060906 for a detailed
description of the crash.

Reported by: Eric Anderson
Approved by: re (scottl)
MFC after: 3 days
2005-06-16 17:06:34 +00:00
Brian Feldman
a534973af4 The new contigmalloc(9) has a bad degenerate case where there were
many regions checked again and again despite knowing the pages
contained were not usable and only satisfied the alignment constraints
This case was compounded, especially for large allocations, by the
practice of looping from the top of memory so as to keep out of the
important low-memory regions.  While the old contigmalloc(9) has the
same problem, it is not as noticeable due to looping from the low
memory to high.

This degenerate case is fixed, as well as reversing the sense of the
rest of the loops within it, to provide a tremendous speed increase.
This makes the best case O(n * VM overhead) much more likely than the
worst case O(4 * VM overhead).  For comparison, the worst case for old
contigmalloc would be O(5 * VM overhead) in addition to its strategy
of turning used memory into free being highly pessimal.

Also, fix a bug that in practice most likely couldn't have been triggered,
int the new contigmalloc(9): it walked backwards from the end of memory
without accounting for how many pages it needed.  Potentially, nonexistant
pages could have been mapped.  This hasn't occurred because the kernel
generally requests as its first contigmalloc(9) a single page.

Reported by: Nicolas Dehaine <nicko@stbernard.com>, wes
MFC After: 1 month
More testing by: Nicolas Dehaine <nicko@stbernard.com>, wes
2005-06-11 00:05:16 +00:00
Alan Cox
25f2e1c8cc Add a comment to the effect that fictitious pages do not require the
initialization of their machine-dependent fields.
2005-06-10 17:27:54 +00:00
Alan Cox
1c245ae7d1 Introduce a procedure, pmap_page_init(), that initializes the
vm_page's machine-dependent fields.  Use this function in
vm_pageq_add_new_page() so that the vm_page's machine-dependent and
machine-independent fields are initialized at the same time.

Remove code from pmap_init() for initializing the vm_page's
machine-dependent fields.

Remove stale comments from pmap_init().

Eliminate the Boolean variable pmap_initialized from the alpha, amd64,
i386, and ia64 pmap implementations.  Its use is no longer required
because of the above changes and earlier changes that result in physical
memory that is being mapped at initialization time being mapped without
pv entries.

Tested by: cognet, kensmith, marcel
2005-06-10 03:33:36 +00:00
Alan Cox
5f7679afd0 Update some comments to reflect the change from spl-based to lock-based
synchronization.
2005-05-28 17:56:18 +00:00
Stephan Uphoff
d13ec71369 Use low level constructs borrowed from interrupt threads to wait for
work in proc0.
Remove the TDP_WAKEPROC0 workaround.
2005-05-23 23:01:53 +00:00
Alan Cox
10c447fac2 Swap in can occur safely without Giant. Release Giant on entry to
scheduler().
2005-05-22 21:06:07 +00:00
Alan Cox
35cf2323f8 Remove GIANT_REQUIRED from swapout_procs(). 2005-05-22 00:30:50 +00:00
Alan Cox
071a1710d1 Reduce the number of times that we acquire and release locks in
swap_pager_getpages().

MFC after: 1 week
2005-05-20 21:26:05 +00:00
Alan Cox
1aececdb5a Remove calls to spl*(). 2005-05-19 06:11:13 +00:00
Alan Cox
d2d9c9aca7 Remove a stale comment concerning spl* usage. 2005-05-19 03:53:07 +00:00
Alan Cox
cf51adc0a1 Update some comments to reflect the change from spl-based to lock-based
synchronization.
2005-05-18 22:08:52 +00:00
Alan Cox
ab973359d5 Remove calls to spl*(). 2005-05-18 20:45:33 +00:00
Alan Cox
2e2a6fa28a Revert revision 1.270: swp_pager_async_iodone() need not perform
VM_LOCK_GIANT().

Discussed with: jeff
2005-05-18 17:48:04 +00:00
Bjoern A. Zeeb
f3aad9a6bb Correct 32 vs 64 bit signedness issues.
Approved by:	pjd (mentor)
MFC after:	2 weeks
2005-05-18 08:57:31 +00:00
Peter Grehan
10b00dd4f3 The final test in unlock_and_deallocate() to determine if GIANT needs to be
unlocked wasn't updated to check for OBJ_NEEDGIANT. This caused a WITNESS
panic when debug_mpsafevm was set to 0.

Approved by:	jeffr
2005-05-12 04:09:41 +00:00
Marcel Moolenaar
23110bedcd Enable debug_mpsafevm on ia64 due to the severe functional regression
caused by recent locking changes when it's off. Revert the logic to
trim down the conditional.

Clued-in by: alc@
2005-05-08 23:56:16 +00:00
Jeff Roberson
b8a0b997fd - We need to inhert the OBJ_NEEDGIANT flag from the original object in
vm_object_split().

Spotted by:	alc
2005-05-04 20:54:16 +00:00
Jeff Roberson
ed4fe4f4f5 - Add a new object flag "OBJ_NEEDSGIANT". We set this flag if the
underlying vnode requires Giant.
 - In vm_fault only acquire Giant if the underlying object has NEEDSGIANT
   set.
 - In vm_object_shadow inherit the NEEDSGIANT flag from the backing object.
2005-05-03 11:11:26 +00:00
Alan Cox
b7903e65fb Remove GIANT_REQUIRED from vmspace_exec().
Prodded by: jeff
2005-05-02 07:05:20 +00:00
Jeff Roberson
382a601cd7 - VM_LOCK_GIANT in the swap pager's iodone routine as VFS will soon call it
without Giant.

Sponsored by:	Isilon Systems, Inc.
2005-04-30 11:25:49 +00:00
Robert Watson
5d1ae027f0 Modify UMA to use critical sections to protect per-CPU caches, rather than
mutexes, which offers lower overhead on both UP and SMP.  When allocating
from or freeing to the per-cpu cache, without INVARIANTS enabled, we now
no longer perform any mutex operations, which offers a 1%-3% performance
improvement in a variety of micro-benchmarks.  We rely on critical
sections to prevent (a) preemption resulting in reentrant access to UMA on
a single CPU, and (b) migration of the thread during access.  In the event
we need to go back to the zone for a new bucket, we release the critical
section to acquire the global zone mutex, and must re-acquire the critical
section and re-evaluate which cache we are accessing in case migration has
occured, or circumstances have changed in the current cache.

Per-CPU cache statistics are now gathered lock-free by the sysctl, which
can result in small races in statistics reporting for caches.

Reviewed by:	bmilekic, jeff (somewhat)
Tested by:	rwatson, kris, gnn, scottl, mike at sentex dot net, others
2005-04-29 18:56:36 +00:00
Jeff Roberson
7625cbf3cc - Pass the ISOPEN flag to namei so filesystems will know we're about to
open them or otherwise access the data.
2005-04-27 09:05:19 +00:00
Kris Kennaway
f5fca0d8be Add the vm.exec_map_entries tunable and read-only sysctl, which controls
the number of entries in exec_map (maximum number of simultaneous execs
that can be handled by the kernel).  The default value of 16 is
insufficient on heavily loaded machines (particularly SMP machines), and
if it is exceeded then executing further processes will generate a SIGABRT.

This is a workaround until a better solution can be implemented.

Reviewed by:	alc
MFC after:	3 days
2005-04-25 19:22:05 +00:00
Dag-Erling Smørgrav
02dcaf2fd1 Unbreak the build on 64-bit architectures. 2005-04-16 12:37:16 +00:00
John Baldwin
3c3edcb445 Add a vm.blacklist tunable which can hold a space or comma seperated list
of physical addresses.  The pages containing these physical addresses will
not be added to the free list and thus will effectively be ignored by the
VM system.  This is mostly useful for the case when one knows of specific
physical addresses that have bit errors (such as from a memtest run) so
that one can blacklist the bad pages while waiting for the new sticks of
RAM to arrive.  The physical addresses of any ignored pages are listed in
the message buffer as well.
2005-04-15 21:45:02 +00:00
Christian S.J. Peron
c92163dcad Move MAC check_vnode_mmap entry point out from being exclusive to
MAP_SHARED so that the entry point gets executed un-conditionally.
This may be useful for security policies which want to perform access
control checks around run-time linking.

-add the mmap(2) flags argument to the check_vnode_mmap entry point
 so that we can make access control decisions based on the type of
 mapped object.
-update any dependent API around this parameter addition such as
 function prototype modifications, entry point parameter additions
 and the inclusion of sys/mman.h header file.
-Change the MLS, BIBA and LOMAC security policies so that subject
 domination routines are not executed unless the type of mapping is
 shared. This is done to maintain compatibility between the old
 vm_mmap_vnode(9) and these policies.

Reviewed by:	rwatson
MFC after:	1 month
2005-04-14 16:03:30 +00:00
John Baldwin
9fd0669542 Tidy vcnt() by moving a duplicated line above #ifdef and removing a useless
variable.
2005-04-12 23:15:28 +00:00
John Baldwin
a5c7ea5b9e Flip the switch and turn mpsafevm on by default for sparc64.
Approved by:	alc
2005-04-04 20:59:02 +00:00
Jeff Roberson
6e4b282039 - Don't NULL the vnode's v_object pointer until after the object is torn
down.  If we have dirty pages, the putpages routine will need to know
   what the vnode's object is so that it may write out dirty pages.

Pointy hat:	phk
Found by:	obrien
2005-04-03 22:56:58 +00:00
John Baldwin
98df9218da - Change the vm_mmap() function to accept an objtype_t parameter specifying
the type of object represented by the handle argument.
- Allow vm_mmap() to map device memory via cdev objects in addition to
  vnodes and anonymous memory.  Note that mmaping a cdev directly does not
  currently perform any MAC checks like mapping a vnode does.
- Unbreak the DRM getbufs ioctl by having it call vm_mmap() directly on the
  cdev the ioctl is acting on rather than trying to find a suitable vnode
  to map from.

Reviewed by:	alc, arch@
2005-04-01 20:00:11 +00:00
Jeff Roberson
f247a5240d - LK_NOPAUSE is a nop now.
Sponsored by:   Isilon Systems, Inc.
2005-03-31 04:37:09 +00:00
Alan Cox
c6ec6a7cae Eliminate (now) unnecessary acquisition and release of the global page
queues lock in vm_object_backing_scan().  Updates to the page's PG_BUSY
flag and busy field are synchronized by the containing object's lock.

Testing the page's hold_count and wire_count in vm_object_backing_scan()'s
OBSC_COLLAPSE_NOWAIT case is unnecessary.  There is no reason why the held
or wired pages cannot be migrated to the shadow object.

Reviewed by: tegge
2005-03-30 05:40:02 +00:00
David Schultz
010b1ca16e Move the swap_zone == NULL check earlier (i.e. before we dereference
the pointer.)

Found by:	Coverity Prevent analysis tool
2005-03-18 21:22:48 +00:00
Jeff Roberson
ee39666a76 - Don't lock the vnode interlock in vm_object_set_writeable_dirty() if
we've already set the object flags.

Reviewed by:	alc
2005-03-17 12:03:42 +00:00
Jeff Roberson
761dbeb66f - In vm_page_insert() hold the backing vnode when the first page
is inserted.
 - In vm_page_remove() drop the backing vnode when the last page
   is removed.
 - Don't check the vnode to see if it must be reclaimed on every
   call to vm_page_free_toq() as we only check it now when it is
   actually required.  This saves us two lock operations per call.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 14:14:09 +00:00
Jeff Roberson
7747c03884 - Don't directly adjust v_usecount, use vref() instead.
Sponsored by:	Isilon Systems, Inc.
2005-03-14 09:03:19 +00:00
Jeff Roberson
1d39df3fe9 - Retire OLOCK and OWANT. All callers hold the vnode lock when creating
a vnode object.  There has been an assert to prove this for some time.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:29:40 +00:00
Jeff Roberson
493d78b3bd - Don't acquire the vnode lock in destroy_vobject, assert that it has
already been acquired by the caller.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 12:05:05 +00:00
Alan Cox
b70458aec3 Revert the first part of revision 1.114 and modify the second part. On
architectures implementing uma_small_alloc() pages do not necessarily
belong to the kmem object.
2005-02-24 06:13:01 +00:00
Poul-Henning Kamp
dfd4be14bd Try to unbreak the vnode locking around vop_reclaim() (based mostly on
patch from kan@).

Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on
close.  This is not yet a generally safe function, but for this very
specific use it is safe.  This solves the problem with buffers not
being flushed by unmount or after failed mount attempts.
2005-02-19 11:44:57 +00:00
Bosko Milekic
8076cb5289 Well, it seems that I pre-maturely removed the "All rights reserved"
statement from some files, so re-add it for the moment, until the
related legalese is sorted out.  This change affects:

sys/kern/kern_mbuf.c
sys/vm/memguard.c
sys/vm/memguard.h
sys/vm/uma.h
sys/vm/uma_core.c
sys/vm/uma_dbg.c
sys/vm/uma_dbg.h
sys/vm/uma_int.h
2005-02-16 21:45:59 +00:00
Bosko Milekic
500f29d06e Make UMA set the overloaded page->object back to kmem_object for
UMA_ZONE_REFCNT and UMA_ZONE_MALLOC zones, as the page(s) undoubtedly
came from kmem_map for those two.  Previously it would set it back
to NULL for UMA_ZONE_REFCNT zones and although this was probably not
fatal, it added MORE code for no reason.
2005-02-16 20:06:11 +00:00
Bosko Milekic
7fae6a1116 Rather than overloading the page->object field like UMA does, use instead
an unused pageq queue reference in the page structure to stash a pointer
to the MemGuard FIFO.  Using the page->object field caused problems
because when vm_map_protect() was called the second time to set
VM_PROT_DEFAULT back onto a set of pages in memguard_map, the protection
in the VM would be changed but the PMAP code would lazily not restore
the PG_RW bit on the underlying pages right away (see pmap_protect()).
So when a page fault finally occured and the VM noticed the faulting
address corresponds to a page that _does_ have write access now, it
would then call into PMAP to set back PG_RW (i386 case being discussed
here).  However, before it got to do that, an assertion on the object
lock not being owned would get triggered, as the object of the faulting
page would need to be locked but was overloaded by MemGuard.  This is
precisely why MemGuard cannot overload page->object.

Submitted by: Alan Cox (alc@)
2005-02-15 22:17:07 +00:00
Poul-Henning Kamp
7fbdc92113 sysctl node vm.stats can not be static (for ia64 reasons). 2005-02-11 16:34:14 +00:00
Bosko Milekic
0341256576 Implement support for buffers larger than PAGE_SIZE in MemGuard. Adds
a little bit of complexity but performance requirements lacking (this is
a debugging allocator after all), it's really not too bad (still
only 317 lines).

Also add an additional check to help catch really weird 3-threads-involved
races: make memguard_free() write to the first page handed back, always,
before it does anything else.

Note that there is still a problem in VM+PMAP (specifically with
vm_map_protect) w.r.t. MemGuard uses it, but this will be fixed shortly
and this change stands on its own.
2005-02-10 22:36:05 +00:00
Poul-Henning Kamp
39a79f0c01 Make three SYSCTL_NODEs static 2005-02-10 12:18:36 +00:00
Poul-Henning Kamp
253de0a143 Make npages static and const. 2005-02-10 12:18:17 +00:00
Suleiman Souhlal
81ae703462 Set the scheduling class of the zeroidle thread to PRI_IDLE.
Reviewed by:	jhb
Approved by:	grehan (mentor)
MFC after:	1 week
2005-02-04 06:18:31 +00:00
Alan Cox
8e99783b25 Update the text of an assertion to reflect changes made in revision 1.148.
Submitted by: tegge

Eliminate an unnecessary, temporary increment of the backing object's
reference count in vm_object_qcollapse().  Reviewed by: tegge
2005-01-30 21:29:47 +00:00
Poul-Henning Kamp
7146d6cb3e Move the contents of vop_stddestroyvobject() to the new vnode_pager
function vnode_destroy_vobject().

Make the new function zero the vp->v_object pointer so we can tell
if a call is missing.
2005-01-28 08:56:48 +00:00
Poul-Henning Kamp
8516dd18e1 Don't use VOP_GETVOBJECT, use vp->v_object directly. 2005-01-25 00:40:01 +00:00
Poul-Henning Kamp
d07a6d3f61 Move the body of vop_stdcreatevobject() over to the vnode_pager under
the name Sande^H^H^H^H^Hvnode_create_vobject().

Make the new function take a size argument which removes the need for
a VOP_STAT() or a very pessimistic guess for disks.

Call that new function from vop_stdcreatevobject().

Make vnode_pager_alloc() private now that its only user came home.
2005-01-24 21:21:59 +00:00
Poul-Henning Kamp
35764be39e Kill the VV_OBJBUF and test the v_object for NULL instead. 2005-01-24 13:13:57 +00:00
Jeff Roberson
ae51ff1127 - Remove GIANT_REQUIRED where giant is no longer required.
- Use VFS_LOCK_GIANT() rather than directly acquiring giant in places
   where giant is only held because vfs requires it.

Sponsored By:   Isilon Systems, Inc.
2005-01-24 10:48:29 +00:00
Alan Cox
75337a5677 Guard against address wrap in kernacc(). Otherwise, a program accessing a
bad address range through /dev/kmem can panic the machine.

Submitted by: Mark W. Krentel
Reported by: Kris Kennaway
MFC after: 1 week
2005-01-22 19:21:29 +00:00
Bosko Milekic
eca64e79b5 s/round_page/trunc_page/g
I meant trunc_page.  It's only a coincidence this hasn't caused
problems yet.

Pointed out by: Antoine Brodin <antoine.brodin@laposte.net>
2005-01-22 00:09:34 +00:00
Bosko Milekic
e4eb384b47 Bring in MemGuard, a very simple and small replacement allocator
designed to help detect tamper-after-free scenarios, a problem more
and more common and likely with multithreaded kernels where race
conditions are more prevalent.

Currently MemGuard can only take over malloc()/realloc()/free() for
particular (a) malloc type(s) and the code brought in with this
change manually instruments it to take over M_SUBPROC allocations
as an example.  If you are planning to use it, for now you must:

	1) Put "options DEBUG_MEMGUARD" in your kernel config.
	2) Edit src/sys/kern/kern_malloc.c manually, look for
	   "XXX CHANGEME" and replace the M_SUBPROC comparison with
	   the appropriate malloc type (this might require additional
	   but small/simple code modification if, say, the malloc type
	   is declared out of scope).
	3) Build and install your kernel.  Tune vm.memguard_divisor
	   boot-time tunable which is used to scale how much of kmem_map
	   you want to allott for MemGuard's use.  The default is 10,
	   so kmem_size/10.

ToDo:
	1) Bring in a memguard(9) man page.
	2) Better instrumentation (e.g., boot-time) of MemGuard taking
	   over malloc types.
	3) Teach UMA about MemGuard to allow MemGuard to override zone
	   allocations too.
	4) Improve MemGuard if necessary.

This work is partly based on some old patches from Ian Dowse.
2005-01-21 18:09:17 +00:00
Alan Cox
986b43f845 Add checks to vm_map_findspace() to test for address wrap. The conditions
where this could occur are very rare, but possible.

Submitted by: Mark W. Krentel
MFC after: 2 weeks
2005-01-18 19:50:09 +00:00
Alan Cox
d936694f09 Consider three objects, O, BO, and BBO, where BO is O's backing object
and BBO is BO's backing object.  Now, suppose that O and BO are being
collapsed.  Furthermore, suppose that BO has been marked dead
(OBJ_DEAD) by vm_object_backing_scan() and that either
vm_object_backing_scan() has been forced to sleep due to encountering
a busy page or vm_object_collapse() has been forced to sleep due to
memory allocation in the swap pager.  If vm_object_deallocate() is
then called on BBO and BO is BBO's only shadow object,
vm_object_deallocate() will collapse BO and BBO.  In doing so, it adds
a necessary temporary reference to BO.  If this collapse also sleeps
and the prior collapse resumes first, the temporary reference will
cause vm_object_collapse to panic with the message "backing_object %p
was somehow re-referenced during collapse!"

Resolve this race by changing vm_object_deallocate() such that it
doesn't collapse BO and BBO if BO is marked dead.  Once O and BO are
collapsed, vm_object_collapse() will attempt to collapse O and BBO.
So, vm_object_deallocate() on BBO need do nothing.

Reported by: Peter Holm on 20050107
URL: http://www.holm.cc/stress/log/cons102.html

In collaboration with: tegge@
Candidate for RELENG_4 and RELENG_5
MFC after: 2 weeks
2005-01-15 21:12:47 +00:00
Poul-Henning Kamp
7c0745eeae Eliminate unused and unnecessary "cred" argument from vinvalbuf() 2005-01-14 07:33:51 +00:00
Poul-Henning Kamp
8df6bac4c7 Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

	The credentials for syncing a file (ability to write to the
	file) should be checked at the system call level.

	Credentials for syncing one or more filesystems ("none")
	should be checked at the system call level as well.

	If the filesystem implementation needs a particular credential
	to carry out the syncing it would logically have to the
	cached mount credential, or a credential cached along with
	any delayed write data.

Discussed with:	rwatson
2005-01-11 07:36:22 +00:00
Bosko Milekic
c5c1b16ec5 While we want the recursion protection for the bucket zones so that
recursion from the VM is handled (and the calling code that allocates
buckets knows how to deal with it), we do not want to prevent allocation
from the slab header zones (slabzone and slabrefzone) if uk_recurse is
not zero for them.  The reason is that it could lead to NULL being
returned for the slab header allocations even in the M_WAITOK
case, and the caller can't handle that (this is also explained in a
comment with this commit).

The problem analysis is documented in our mailing lists:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153445+0+archive/2004/freebsd-current/20041231.freebsd-current

(see entire thread for proper context).

Crash dump data provided by: Peter Holm <peter@holm.cc>
2005-01-11 03:33:09 +00:00
Stefan Farfeleder
1e183df21e ISO C requires at least one element in an initialiser list. 2005-01-10 20:30:04 +00:00
Alan Cox
5ba514bc89 Move the acquisition and release of the page queues lock outside of a loop
in vm_object_split() to avoid repeated acquisition and release.
2005-01-08 23:41:11 +00:00
Alan Cox
46fbc58202 Transfer responsibility for freeing the page taken from the cache
queue and (possibly) unlocking the containing object from
vm_page_alloc() to vm_page_select_cache().  Recent optimizations to
vm_map_pmap_enter() (see vm_map.c revisions 1.362 and 1.363) and
pmap_enter_quick() have resulted in panic()s because vm_page_alloc()
mistakenly unlocked objects that had not been locked by
vm_page_select_cache().

Reported by: Peter Holm and Kris Kennaway
2005-01-07 05:02:19 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Alan Cox
df2e33bf42 Revise the part of vm_pageout_scan() that moves pages from the cache
queue to the free queue.  With this change, if a page from the cache
queue belongs to a locked object, it is simply skipped over rather
than moved to the inactive queue.
2005-01-06 20:22:36 +00:00
Poul-Henning Kamp
4f8205e5d1 When allocating bio's in the swap_pager use M_WAITOK since the
alternative is much worse.
2005-01-03 13:28:56 +00:00
Alan Cox
0869d38ba6 Assert that page allocations during an interrupt specify
VM_ALLOC_INTERRUPT.

Assert that pages removed from the cache queue are not busy.
2004-12-31 19:50:45 +00:00
Alan Cox
7aa2190c8e Access to the page's busy field is (now) synchronized by the containing
object's lock.  Therefore, the assertion that the page queues lock is held
can be removed from vm_page_io_start().
2004-12-29 04:18:22 +00:00
Alan Cox
91f7a86064 Note that access to the page's busy count is synchronized by the containing
object's lock.
2004-12-27 05:27:59 +00:00
Alan Cox
40198b3c04 Assert that the vm object is locked on entry to vm_page_sleep_if_busy();
remove some unneeded code.
2004-12-26 21:46:44 +00:00
Bosko Milekic
7b8712053c Add my copyright and update Jeff's copyright on UMA source files,
as per his request.

Discussed with: Jeffrey Roberson
2004-12-26 00:35:12 +00:00
Poul-Henning Kamp
475e8cc394 fix comment 2004-12-25 21:30:41 +00:00
Alan Cox
a51b084059 Continue the transition from synchronizing access to the page's PG_BUSY
flag and busy field with the global page queues lock to synchronizing their
access with the containing object's lock.  Specifically, acquire the
containing object's lock before reading the page's PG_BUSY flag and busy
field in vm_fault().

Reviewed by: tegge@
2004-12-24 19:31:54 +00:00
Alan Cox
1f70d62298 Modify pmap_enter_quick() so that it expects the page queues to be locked
on entry and it assumes the responsibility for releasing the page queues
lock if it must sleep.

Remove a bogus comment from pmap_enter_quick().

Using the first change, modify vm_map_pmap_enter() so that the page queues
lock is acquired and released once, rather than each time that a page
is mapped.
2004-12-23 20:16:11 +00:00
Alan Cox
98fe9a0ddf Eliminate another unnecessary call to vm_page_busy(). (See revision 1.333
for a detailed explanation.)
2004-12-17 18:54:51 +00:00
Alan Cox
06c98c5dcc Enable debug.mpsafevm by default on alpha. 2004-12-17 17:17:36 +00:00
Alan Cox
85f5b24573 In the common case, pmap_enter_quick() completes without sleeping.
In such cases, the busying of the page and the unlocking of the
containing object by vm_map_pmap_enter() and vm_fault_prefault() is
unnecessary overhead.  To eliminate this overhead, this change
modifies pmap_enter_quick() so that it expects the object to be locked
on entry and it assumes the responsibility for busying the page and
unlocking the object if it must sleep.  Note: alpha, amd64, i386 and
ia64 are the only implementations optimized by this change; arm,
powerpc, and sparc64 still conservatively busy the page and unlock the
object within every pmap_enter_quick() call.

Additionally, this change is the first case where we synchronize
access to the page's PG_BUSY flag and busy field using the containing
object's lock rather than the global page queues lock.  (Modifications
to the page's PG_BUSY flag and busy field have asserted both locks for
several weeks, enabling an incremental transition.)
2004-12-15 19:55:05 +00:00
Alan Cox
90688d137c With the removal of kern/uipc_jumbo.c and sys/jumbo.h,
vm_object_allocate_wait() is not used.  Remove it.
2004-12-08 05:01:47 +00:00
Alan Cox
2ad036b657 Almost nine years ago, when support for 1TB files was introduced in
revision 1.55, the address parameter to vnode_pager_addr() was changed
from an unsigned 32-bit quantity to a signed 64-bit quantity.  However,
an out-of-range check on the address was not updated.  Consequently,
memory-mapped I/O on files greater than 2GB could cause a kernel panic.
Since the address is now a signed 64-bit quantity, the problem resolution
is simply to remove a cast.

Reviewed by: bde@ and tegge@
PR: 73010
MFC after: 1 week
2004-12-07 22:05:38 +00:00
Alan Cox
d8fed1d050 Correct a sanity check in vnode_pager_generic_putpages(). The cast used
to implement the sanity check should have been changed when we converted
the implementation of vm_pindex_t from 32 to 64 bits.  (Thus, RELENG_4 is
not affected.)  The consequence of this error would be a legimate write to
an extremely large file being treated as an errant attempt to write meta-
data.

Discussed with: tegge@
2004-12-05 21:48:11 +00:00
David Schultz
6004362e66 Don't include sys/user.h merely for its side-effect of recursively
including other headers.
2004-11-27 06:51:39 +00:00
Olivier Houchard
6fc96493ac Remove useless casts. 2004-11-26 15:04:26 +00:00
Xin LI
8e33bced3c Try to close a potential, but serious race in our VM subsystem.
Historically, our contigmalloc1() and contigmalloc2() assumes
that a page in PQ_CACHE can be unconditionally reused by busying
and freeing it.  Unfortunatelly, when object happens to be not
NULL, the code will set m->object to NULL and disregard the fact
that the page is actually in the VM page bucket, resulting in
page bucket hash table corruption and finally, a filesystem
corruption, or a 'page not in hash' panic.

This commit has borrowed the idea taken from DragonFlyBSD's fix
to the VM fix by Matthew Dillon[1].  This version of patch will
do the following checks:

	- When scanning pages in PQ_CACHE, check hold_count and
	  skip over pages that are held temporarily.
	- For pages in PQ_CACHE and selected as candidate of being
	  freed, check if it is busy at that time.

Note:  It seems that this is might be unrelated to kern/72539.

Obtained from:	DragonFlyBSD, sys/vm/vm_contig.c,v 1.11 and 1.12 [1]
Reminded by:	Matt Dillon
Reworked by:	alc
MFC After:	1 week
2004-11-24 18:56:13 +00:00
David Schultz
9799b417d5 Disable U area swapping and remove the routines that create, destroy,
copy, and swap U areas.

Reviewed by:	arch@
2004-11-20 02:29:00 +00:00
Poul-Henning Kamp
9c83534dd8 Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)
2004-11-15 09:18:27 +00:00
Poul-Henning Kamp
5c6e573ffb Add pbgetbo()/pbrelbo() lighter weight versions of pbgetvp()/pbrelvp(). 2004-11-15 08:47:18 +00:00
Poul-Henning Kamp
287013d287 More kasserts. 2004-11-15 08:33:09 +00:00
Poul-Henning Kamp
d7fe1f51ad style polishing. 2004-11-15 08:22:38 +00:00
Poul-Henning Kamp
a752aa8f17 Move pbgetvp() and pbrelvp() to vm_pager.c with the rest of the pbuf stuff. 2004-11-15 08:12:50 +00:00
Poul-Henning Kamp
e8a7bef39e expect the caller to have called pbrelvp() if necessary. 2004-11-15 08:07:26 +00:00
Poul-Henning Kamp
676f3ee26c Explicitly call pbrelvp() 2004-11-15 08:06:05 +00:00
Poul-Henning Kamp
d20b2f76cc Improve readability with a bunch of typedefs for the pager ops.
These can also be used for prototypes in the pagers.
2004-11-09 13:43:20 +00:00
Dag-Erling Smørgrav
7419d1e25f #include <vm/vm_param.h> instead of <machine/vmparam.h> (the former
includes the latter, but also declares variables which are defined
in kern/subr_param.c).

Change som VM parameters from quad_t to unsigned long.  They refer to
quantities (size limits for text, heap and stack segments) which must
necessarily be smaller than the size of the address space, so long is
adequate on all platforms.

MFC after:	1 week
2004-11-08 18:20:02 +00:00
Alan Cox
dad740e967 Eliminate an unnecessary atomic operation. Articulate the rationale in
a comment.
2004-11-06 21:48:45 +00:00
Robert Watson
dc2c7965c0 Abstract the logic to look up the uma_bucket_zone given a desired
number of entries into bucket_zone_lookup(), which helps make more
clear the logic of consumers of bucket zones.

Annotate the behavior of bucket_init() with a comment indicating
how the various data structures, including the bucket lookup tables,
are initialized.
2004-11-06 11:43:30 +00:00
Poul-Henning Kamp
a7f06e2bd4 Remove dangling variable 2004-11-06 11:33:11 +00:00
Robert Watson
f9d27e7524 Annotate what bucket_size[] array does; staticize since it's used only
in uma_core.c.
2004-11-06 11:24:40 +00:00
David Schultz
8bc61209d4 Fix the last known race in swapoff(), which could lead to a spurious panic:
swapoff: failed to locate %d swap blocks

The race occurred because putpages() can block between the time it
allocates swap space and the time it updates the swap metadata to
associate that space with a vm_object, so swapoff() would complain
about the temporary inconsistency.  I hoped to fix this by making
swp_pager_getswapspace() and swp_pager_meta_build() a single atomic
operation, but that proved to be inconvenient.  With this change,
swapoff() simply doesn't attempt to be so clever about detecting when
all the pageout activity to the target device should have drained.
2004-11-06 07:17:50 +00:00
Alan Cox
19187819b7 Move a call to wakeup() from vm_object_terminate() to vnode_pager_dealloc()
because this call is only needed to wake threads that slept when they
discovered a dead object connected to a vnode.  To eliminate unnecessary
calls to wakeup() by vnode_pager_dealloc(), introduce a new flag,
OBJ_DISCONNECTWNT.

Reviewed by: tegge@
2004-11-06 05:33:02 +00:00
John Baldwin
57ea1265bd - Set the priority of the page zeroing thread using sched_prio() when the
thread is created rather than adjusting the priority in the main
  function.  (kthread_create() should probably take the initial priority
  as an argument.)
- Only yield the CPU in the !PREEMPTION case if there are any other
  runnable threads.  Yielding when there isn't anything else better to do
  just wastes time in pointless context switches (albeit while the system
  is idle.)
2004-11-05 19:14:02 +00:00
Alan Cox
34d9e6fdae During traversal of the inactive queue, try locking the page's containing
object before accessing the page's flags or the object's reference count.
2004-11-05 06:24:05 +00:00
Alan Cox
b546ac5490 Eliminate another unnecessary call to vm_page_busy() that immediately
precedes a call to vm_page_rename().  (See the previous revision for a
detailed explanation.)
2004-11-05 05:40:45 +00:00
David Schultz
b3fed13e9d Close a race in swapoff(). Here are the gory details:
In order to avoid livelock, swapoff() skips over objects with a
  nonzero pip count and makes another pass if necessary.  Since it is
  impossible to know which objects we care about, it would choose an
  arbitrary object with a nonzero pip count and wait for it before
  making another pass, the theory being that this object would finish
  paging about as quickly as the ones we care about.  Unfortunately,
  we may have slept since we acquired a reference to this object.
  Hack around this problem by tsleep()ing on the pointer anyway, but
  timeout after a fixed interval.  More elegant solutions are possible,
  but the ones I considered unnecessarily complicate this rare case.

Also, kill some nits that seem to have crept into the swapoff() code
in the last 75 revisions or so:

- Don't pass both sp and sp->sw_used to swap_pager_swapoff(), since
  the latter can be derived from the former.

- Replace swp_pager_find_dev() with something simpler.  There's no
  need to iterate over the entire list of swap devices just to determine
  if a given block is assigned to the one we're interested in.

- Expand the scope of the swhash_mtx in a couple of places so that it
  isn't released and reacquired once for every hash bucket.

- Don't drop the swhash_mtx while holding a reference to an object.
  We need to lock the object first.  Unfortunately, doing so would
  violate the established lock order, so use VM_OBJECT_TRYLOCK() and
  try again on a subsequent pass if the object is already locked.

- Refactor swp_pager_force_pagein() and swap_pager_swapoff() a bit.
2004-11-05 05:36:56 +00:00
Poul-Henning Kamp
6e67e2a710 Retire b_magic now, we have the bufobj containing the same hint. 2004-11-04 09:48:18 +00:00
Poul-Henning Kamp
c5d3d25e4f De-couple our I/O bio request from the embedded bio in buf by explicitly
copying the fields.
2004-11-04 08:38:07 +00:00
Poul-Henning Kamp
c569065139 Remove buf->b_dev field. 2004-11-04 07:59:57 +00:00
Alan Cox
d19ef81437 The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy().  Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set.  Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition.  Typically,
this is when it is undergoing I/O.
2004-11-03 20:17:31 +00:00
Alan Cox
f790de29b9 Introduce a Boolean variable wakeup_needed to avoid repeated, unnecessary
calls to wakeup() by vm_page_zero_idle_wakeup().
2004-10-31 19:32:57 +00:00
Alan Cox
b86e6ec007 During traversal of the active queue by vm_pageout_page_stats(), try
locking the page's containing object before accessing the page's flags.
2004-10-30 23:30:53 +00:00
Alan Cox
af117d1278 Eliminate an unused but initialized variable. 2004-10-30 20:11:23 +00:00
Alan Cox
4b8a5c4095 Add an assignment statement that I omitted from the previous revision. 2004-10-30 07:09:46 +00:00
Alan Cox
f4d49654ae Assert that the containing vm object is locked in vm_page_cache() and
vm_page_try_to_cache().
2004-10-28 05:26:21 +00:00
Bosko Milekic
a5a262c6db Fix a INVARIANTS-only bug introduced in Revision 1.104:
IF INVARIANTS is defined, and in the rare case that we have
allocated some objects from the slab and at least one initializer
on at least one of those objects failed, and we need to fail the
allocation and push the uninitialized items back into the slab
caches -- in that scenario, we would fail to [re]set the
bucket cache's ub_bucket item references to NULL, which would
eventually trigger a KASSERT.
2004-10-27 21:19:35 +00:00
Alan Cox
b08abf6cc0 During traversal of the active queue, try locking the page's containing
object before accessing the page's flags or the object's reference count.
If the trylock fails, handle the page as though it is busy.
2004-10-27 18:29:17 +00:00
Poul-Henning Kamp
6229cc508b Also check that the sectormask is bigger than zero.
Wrap this overly long KASSERT and remove newline.
2004-10-26 19:51:57 +00:00
Poul-Henning Kamp
5d9d81e7ea Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open().  The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.
2004-10-26 07:39:12 +00:00
Poul-Henning Kamp
ff4782b57d Don't clear flags we just checked were not set. 2004-10-26 05:57:29 +00:00
Alan Cox
63bb7041cc Assert that the containing vm object is locked in vm_page_flash(). 2004-10-25 19:52:44 +00:00
Alan Cox
75d0533847 Assert that the containing vm object is locked in vm_page_busy() and
vm_page_wakeup().
2004-10-24 23:53:47 +00:00
Poul-Henning Kamp
b792bebeea Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
2004-10-24 20:03:41 +00:00
Alan Cox
e5526e6aa5 Acquire the vm object lock before rather than after calling
vm_page_sleep_if_busy().  (The motivation being to transition
synchronization of the vm_page's PG_BUSY flag from the global page queues
lock to the per-object lock.)
2004-10-24 19:32:19 +00:00
Alan Cox
ddf4bb37c8 Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup(). 2004-10-24 18:46:32 +00:00
Alan Cox
0f9f9bcb53 Introduce VM_ALLOC_NOBUSY, an option to vm_page_alloc() and vm_page_grab()
that indicates that the caller does not want a page with its busy flag set.
In many places, the global page queues lock is acquired and released just
to clear the busy flag on a just allocated page.  Both the allocation of
the page and the clearing of the busy flag occur while the containing vm
object is locked.  So, the busy flag might as well never be set.
2004-10-24 06:15:36 +00:00
Poul-Henning Kamp
494eb176e7 Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
2004-10-22 08:47:20 +00:00