Commit Graph

149 Commits

Author SHA1 Message Date
Alan Cox
0f752392c6 Update a comment describing the page queues.
Approved by:	re (hrs)
2007-07-13 04:42:20 +00:00
Alan Cox
2446e4f02c Enable the new physical memory allocator.
This allocator uses a binary buddy system with a twist.  First and
foremost, this allocator is required to support the implementation of
superpages.  As a side effect, it enables a more robust implementation
of contigmalloc(9).  Moreover, this reimplementation of
contigmalloc(9) eliminates the acquisition of Giant by
contigmalloc(..., M_NOWAIT, ...).

The twist is that this allocator tries to reduce the number of TLB
misses incurred by accesses through a direct map to small, UMA-managed
objects and page table pages.  Roughly speaking, the physical pages
that are allocated for such purposes are clustered together in the
physical address space.  The performance benefits vary.  In the most
extreme case, a uniprocessor kernel running on an Opteron, I measured
an 18% reduction in system time during a buildworld.

This allocator does not implement page coloring.  The reason is that
superpages have much the same effect.  The contiguous physical memory
allocation necessary for a superpage is inherently colored.

Finally, the one caveat is that this allocator does not effectively
support prezeroed pages.  I hope this is temporary.  On i386, this is
a slight pessimization.  However, on amd64, the beneficial effects of
the direct-map optimization outweigh the ill effects.  I speculate
that this is true in general of machines with a direct map.

Approved by:	re
2007-06-16 04:57:06 +00:00
Alan Cox
04a18977c8 Define every architecture as either VM_PHYSSEG_DENSE or
VM_PHYSSEG_SPARSE depending on whether the physical address space is
densely or sparsely populated with memory.  The effect of this
definition is to determine which of two implementations of
vm_page_array and PHYS_TO_VM_PAGE() is used.  The legacy
implementation is obtained by defining VM_PHYSSEG_DENSE, and a new
implementation that trades off time for space is obtained by defining
VM_PHYSSEG_SPARSE.  For now, all architectures except for ia64 and
sparc64 define VM_PHYSSEG_DENSE.  Defining VM_PHYSSEG_SPARSE on ia64
allows the entirety of my Itanium 2's memory to be used.  Previously,
only the first 1 GB could be used.  Defining VM_PHYSSEG_SPARSE on
sparc64 allows USIIIi-based systems to boot without crashing.

This change is a combination of Nathan Whitehorn's patch and my own
work in perforce.

Discussed with: kmacy, marius, Nathan Whitehorn
PR:		112194
2007-05-05 19:50:28 +00:00
Alan Cox
9f5c801b94 Change the way that unmanaged pages are created. Specifically,
immediately flag any page that is allocated to a OBJT_PHYS object as
unmanaged in vm_page_alloc() rather than waiting for a later call to
vm_page_unmanage().  This allows for the elimination of some uses of
the page queues lock.

Change the type of the kernel and kmem objects from OBJT_DEFAULT to
OBJT_PHYS.  This allows us to take advantage of the above change to
simplify the allocation of unmanaged pages in kmem_alloc() and
kmem_malloc().

Remove vm_page_unmanage().  It is no longer used.
2007-02-25 06:14:58 +00:00
Alan Cox
0cd31a0d75 Change the page's CLEANCHK flag from being a page queue mutex synchronized
flag to a vm object mutex synchronized flag.
2007-02-22 06:15:52 +00:00
Alan Cox
9af80719db Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's
busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object
lock instead of the global page queues lock.
2006-10-22 04:28:14 +00:00
Alan Cox
e1cb7bc081 Make vm_page_release_contig() static. 2006-09-03 22:24:08 +00:00
Alan Cox
eb4bbba83a Refactor vm_page_sleep_if_busy() so that the test for a busy page is
inlined and a procedure call is made in the rare case, i.e., when it is
necessary to sleep.  In this case, inlining the test actually makes the
kernel smaller.
2006-08-27 19:50:13 +00:00
Alan Cox
09ef0d6e0c The return value from vm_pageq_add_new_page() is not used. Eliminate it. 2006-08-25 04:36:19 +00:00
Alan Cox
b146f9e5d2 Reimplement the page's NOSYNC flag as an object-synchronized instead of a
page queues-synchronized flag.  Reduce the scope of the page queues lock in
vm_fault() accordingly.

Move vm_fault()'s call to vm_object_set_writeable_dirty() outside of the
scope of the page queues lock.  Reviewed by: tegge
Additionally, eliminate an unnecessary dereference in computing the
argument that is passed to vm_object_set_writeable_dirty().
2006-08-13 00:11:09 +00:00
Alan Cox
5786be7cc7 Introduce a field to struct vm_page for storing flags that are
synchronized by the lock on the object containing the page.

Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags.  Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.

Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().

Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
2006-08-09 17:43:27 +00:00
Alan Cox
39fd9b639f With the recent changes to the implementation of page coloring, the
the option PQ_NOOPT is used exclusively by vm_pageq.c.  Thus, the
include of opt_vmpage.h can be removed from vm_page.h.
2006-01-24 19:24:54 +00:00
Alexander Leidinger
ef39c05baa MI changes:
- provide an interface (macros) to the page coloring part of the VM system,
   this allows to try different coloring algorithms without the need to
   touch every file [1]
 - make the page queue tuning values readable: sysctl vm.stats.pagequeue
 - autotuning of the page coloring values based upon the cache size instead
   of options in the kernel config (disabling of the page coloring as a
   kernel option is still possible)

MD changes:
 - detection of the cache size: only IA32 and AMD64 (untested) contains
   cache size detection code, every other arch just comes with a dummy
   function (this results in the use of default values like it was the
   case without the autotuning of the page coloring)
 - print some more info on Intel CPU's (like we do on AMD and Transmeta
   CPU's)

Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue"
and report if the cache* values are zero (= bug in the cache detection code)
or not.

Based upon work by:	Chad David <davidc@acns.ab.ca> [1]
Reviewed by:		alc, arch (in 2004)
Discussed with:		alc, Chad David, arch (in 2004)
2005-12-31 14:39:20 +00:00
Robert Watson
8be9063ea1 Don't perform a nested include of opt_vmpage.h if LIBMEMSTAT is defined,
as opt_vmpage.h will not be available to user space library builds.  A
similar existing check is present for KLD_MODULE for similar reasons.

MFC after:	3 days
2005-08-04 10:05:11 +00:00
Warner Losh
60727d8b86 /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
Alan Cox
91f7a86064 Note that access to the page's busy count is synchronized by the containing
object's lock.
2004-12-27 05:27:59 +00:00
Alan Cox
0f9f9bcb53 Introduce VM_ALLOC_NOBUSY, an option to vm_page_alloc() and vm_page_grab()
that indicates that the caller does not want a page with its busy flag set.
In many places, the global page queues lock is acquired and released just
to clear the busy flag on a just allocated page.  Both the allocation of
the page and the clearing of the busy flag occur while the containing vm
object is locked.  So, the busy flag might as well never be set.
2004-10-24 06:15:36 +00:00
Marcel Moolenaar
2bbb534b56 Move the cow field between wire_count and hold_count. This is the
position that is 64-bit aligned and makes sure that the valid and
dirty fields are also 64-bit aligned. This means that if PAGE_SIZE
is 32K, the size of the vm_page structure is only increased by 8
bytes instead of 16 bytes. More importantly, the vm_page structure
is either 120 or 128 bytes on ia64. These are "interesting" sizes.
2004-08-22 20:52:23 +00:00
Brian Feldman
4362fada8f Reimplement contigmalloc(9) with an algorithm which stands a greatly-
improved chance of working despite pressure from running programs.
Instead of trying to throw a bunch of pages out to swap and hope for
the best, only a range that can potentially fulfill contigmalloc(9)'s
request will have its contents paged out (potentially, not forcibly)
at a time.

The new contigmalloc operation still operates in three passes, but it
could potentially be tuned to more or less.  The first pass only looks
at pages in the cache and free pages, so they would be thrown out
without having to block.  If this is not enough, the subsequent passes
page out any unwired memory.  To combat memory pressure refragmenting
the section of memory being laundered, each page is removed from the
systems' free memory queue once it has been freed so that blocking
later doesn't cause the memory laundered so far to get reallocated.

The page-out operations are now blocking, as it would make little sense
to try to push out a page, then get its status immediately afterward
to remove it from the available free pages queue, if it's unlikely to
have been freed.  Another change is that if KVA allocation fails, the
allocated memory segment will be freed and not leaked.

There is a sysctl/tunable, defaulting to on, which causes the old
contigmalloc() algorithm to be used.  Nonetheless, I have been using
vm.old_contigmalloc=0 for over a month.  It is safe to switch at
run-time to see the difference it makes.

A new interface has been used which does not require mapping the
allocated pages into KVA: vm_page.h functions vm_page_alloc_contig()
and vm_page_release_contig().  These are what vm.old_contigmalloc=0
uses internally, so the sysctl/tunable does not affect their operation.

When using the contigmalloc(9) and contigfree(9) interfaces, memory
is now tracked with malloc(9) stats.  Several functions have been
exported from kern_malloc.c to allow other subsystems to use these
statistics, as well.  This invalidates the BUGS section of the
contigmalloc(9) manpage.
2004-07-19 06:21:27 +00:00
Alan Cox
69c1a910e5 Update stale comments regarding page coloring. 2004-06-05 21:06:42 +00:00
Alan Cox
62326de742 Move the definitions of SWAPBLK_NONE and SWAPBLK_MASK from vm_page.h to
blist.h, enabling the removal of numerous #includes from subr_blist.c.
(subr_blist.c and swap_pager.c are the only users of these definitions.)
2004-06-04 04:03:26 +00:00
Alan Cox
e363785643 Remove a stale comment: PG_DIRTY and PG_FILLED were removed in
revisions 1.17 and 1.12 respectively.
2004-05-30 20:48:15 +00:00
Warner Losh
05eb3785e7 Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
2004-04-06 20:15:37 +00:00
Alan Cox
889eb0fc62 Eliminate unused arguments from vm_page_startup(). 2004-04-04 23:33:36 +00:00
Alan Cox
45ad3d59ed Remove some long unused definitions. 2004-03-04 04:26:14 +00:00
Alan Cox
93dbd07122 - Align a comment within struct vm_page.
- Annotate the vm_page's valid field as synchronized by the containing
   vm object's lock.
2003-10-25 18:33:04 +00:00
Alan Cox
ab42316c2f - Retire vm_pageout_page_free(). Instead, use vm_page_select_cache() from
vm_pageout_scan().  Rationale: I don't like leaving a busy page in the
   cache queue with neither the vm object nor the vm page queues lock held.
 - Assert that the page is active in vm_pageout_page_stats().
2003-10-22 18:41:32 +00:00
Alan Cox
fee181a696 - Remove some long unused code. 2003-10-20 18:57:01 +00:00
Alan Cox
669890eaeb Retire vm_page_copy(). Its reason for being ended when peter@ modified
pmap_copy_page() et al. to accept a vm_page_t rather than a physical
address.  Also, this change will facilitate locking access to the vm page's
valid field.
2003-10-08 05:35:12 +00:00
Marcel Moolenaar
16bc6ff39e Assert that u_long is at least 64 bits if PAGE_SIZE is 32K.
Suggested by: phk
2003-08-25 19:58:01 +00:00
Marcel Moolenaar
21a708cfde Also define VM_PAGE_BITS_ALL for 16K and 32K pages. Make the constant
unsigned for all page sizes and unsigned long for 32K pages.
2003-08-23 06:30:47 +00:00
Marcel Moolenaar
1fa057c6f1 Add support for 16K and 32K page sizes. The valid and dirty maps
in struct vm_page are defined as u_int for 16K pages and u_long
for 32K pages, with the implied assumption that long will at least
be 64 bits wide on platforms where we support 32K pages.
2003-08-23 06:24:00 +00:00
Jake Burkholder
227f9a1c58 - Add vm_paddr_t, a physical address type. This is required for systems
where physical addresses larger than virtual addresses, such as i386s
  with PAE.
- Use this to represent physical addresses in the MI vm system and in the
  i386 pmap code.  This also changes the paddr parameter to d_mmap_t.
- Fix printf formats to handle physical addresses >4G in the i386 memory
  detection code, and due to kvtop returning vm_paddr_t instead of u_long.

Note that this is a name change only; vm_paddr_t is still the same as
vm_offset_t on all currently supported platforms.

Sponsored by:	DARPA, Network Associates Laboratories
Discussed with:	re, phk (cdevsw change)
2003-03-25 00:07:06 +00:00
Alan Cox
24c9ad6bed - Remove vm_page_sleep_busy(). The transition to vm_page_sleep_if_busy(),
which incorporates page queue and field locking, is complete.
 - Assert that the page queue lock rather than Giant is held in
   vm_page_flag_set().
2002-12-19 07:23:46 +00:00
Alan Cox
a12cc0e489 Remove vm_page_protect(). Instead, use pmap_page_protect() directly. 2002-11-18 04:05:22 +00:00
Alan Cox
ada2a050be Export the function vm_page_splay(). 2002-11-04 19:21:39 +00:00
Jeff Roberson
026aa839a4 - Add a new flag to vm_page_alloc, VM_ALLOC_NOOBJ. This tells
vm_page_alloc not to insert this page into an object.  The pindex is
   still used for colorization.
 - Rework vm_page_select_* to accept a color instead of an object and
   pindex to work with VM_PAGE_NOOBJ.
 - Document other VM_ALLOC_ flags.

Reviewed by:	peter, jake
2002-11-01 00:59:03 +00:00
Alan Cox
f3b676f0ad o Reinline vm_page_undirty(), reducing the kernel size. (This reverts
a part of vm_page.h revision 1.87 and vm_page.c revision 1.167.)
2002-10-20 19:57:55 +00:00
Matthew Dillon
b86ec922be Replace the vm_page hash table with a per-vmobject splay tree. There should
be no major change in performance from this change at this time but this
will allow other work to progress:  Giant lock removal around VM system
in favor of per-object mutexes, ranged fsyncs, more optimal COMMIT rpc's for
NFS, partial filesystem syncs by the syncer, more optimal object flushing,
etc.  Note that the buffer cache is already using a similar splay tree
mechanism.

Note that a good chunk of the old hash table code is still in the tree.
Alan or I will remove it prior to the release if the new code does not
introduce unsolvable bugs, else we can revert more easily.

Submitted by:	alc	(this is Alan's code)
Approved by:	re
2002-10-18 17:24:30 +00:00
Jeff Roberson
99571dc345 - Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH.
- Remove all instances of the mallochash.
 - Stash the slab pointer in the vm page's object pointer when allocating from
   the kmem_obj.
 - Use the overloaded object pointer to find slabs for malloced memory.
2002-09-18 08:26:30 +00:00
Alan Cox
fff6062ab6 o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since
pmap_zero_page() and pmap_zero_page_area() were modified to accept
   a struct vm_page * instead of a physical address, vm_page_zero_fill()
   and vm_page_zero_fill_area() have served no purpose.
2002-08-25 00:22:31 +00:00
Alan Cox
38f612e053 o Remove the setting and clearing of the PG_MAPPED flag from the alpha and
ia64 pmap.
 o Remove the PG_MAPPED flag's declaration.
2002-08-10 18:01:39 +00:00
Alan Cox
e5f8bd9418 o Introduce vm_page_sleep_if_busy() as an eventual replacement for
vm_page_sleep_busy().  vm_page_sleep_if_busy() uses the page
   queues lock.
2002-07-29 19:41:22 +00:00
Alan Cox
2c071f61f0 o Modify vm_page_grab() to accept VM_ALLOC_WIRED. 2002-07-28 23:46:19 +00:00
Alan Cox
6fd77192b2 o Remove dead and/or unused code. 2002-07-20 05:06:20 +00:00
Alan Cox
827b2fa091 o Introduce an argument, VM_ALLOC_WIRED, that requests vm_page_alloc()
to return a wired page.
 o Use VM_ALLOC_WIRED within Alpha's pmap_growkernel().  Also, because
   Alpha's pmap_growkernel() calls vm_page_alloc() from within a critical
   section, specify VM_ALLOC_INTERRUPT instead of VM_ALLOC_SYSTEM.  (Only
   VM_ALLOC_INTERRUPT is implemented entirely with a spin mutex.)
 o Assert that the page queues mutex is held in vm_page_wire()
   on Alpha, just like the other platforms.
2002-07-18 04:08:10 +00:00
Alan Cox
1f54526952 o Complete the locking of page queue accesses by vm_page_unwire().
o Assert that the page queues lock is held in vm_page_unwire().
 o Make vm_page_lock_queues() and vm_page_unlock_queues() visible
   to kernel loadable modules.
2002-07-13 20:55:21 +00:00
Alan Cox
70c1763634 o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free
queue lock (revision 1.33 of vm/vm_page.c removed them).
 o Make the free queue lock a spin lock because it's sometimes acquired
   inside of a critical section.
2002-07-04 22:07:37 +00:00
Kenneth D. Merry
98cb733c67 At long last, commit the zero copy sockets code.
MAKEDEV:	Add MAKEDEV glue for the ti(4) device nodes.

ti.4:		Update the ti(4) man page to include information on the
		TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
		and also include information about the new character
		device interface and the associated ioctls.

man9/Makefile:	Add jumbo.9 and zero_copy.9 man pages and associated
		links.

jumbo.9:	New man page describing the jumbo buffer allocator
		interface and operation.

zero_copy.9:	New man page describing the general characteristics of
		the zero copy send and receive code, and what an
		application author should do to take advantage of the
		zero copy functionality.

NOTES:		Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
		TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files:	Add uipc_jumbo.c and uipc_cow.c.

conf/options:	Add the 5 options mentioned above.

kern_subr.c:	Receive side zero copy implementation.  This takes
		"disposable" pages attached to an mbuf, gives them to
		a user process, and then recycles the user's page.
		This is only active when ZERO_COPY_SOCKETS is turned on
		and the kern.ipc.zero_copy.receive sysctl variable is
		set to 1.

uipc_cow.c:	Send side zero copy functions.  Takes a page written
		by the user and maps it copy on write and assigns it
		kernel virtual address space.  Removes copy on write
		mapping once the buffer has been freed by the network
		stack.

uipc_jumbo.c:	Jumbo disposable page allocator code.  This allocates
		(optionally) disposable pages for network drivers that
		want to give the user the option of doing zero copy
		receive.

uipc_socket.c:	Add kern.ipc.zero_copy.{send,receive} sysctls that are
		enabled if ZERO_COPY_SOCKETS is turned on.

		Add zero copy send support to sosend() -- pages get
		mapped into the kernel instead of getting copied if
		they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
		can be used elsewhere.  (uipc_cow.c)

if_media.c:	In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
		calling malloc() with M_WAITOK.  Return an error if
		the M_NOWAIT malloc fails.

		The ti(4) driver and the wi(4) driver, at least, call
		this with a mutex held.  This causes witness warnings
		for 'ifconfig -a' with a wi(4) or ti(4) board in the
		system.  (I've only verified for ti(4)).

ip_output.c:	Fragment large datagrams so that each segment contains
		a multiple of PAGE_SIZE amount of data plus headers.
		This allows the receiver to potentially do page
		flipping on receives.

if_ti.c:	Add zero copy receive support to the ti(4) driver.  If
		TI_PRIVATE_JUMBOS is not defined, it now uses the
		jumbo(9) buffer allocator for jumbo receive buffers.

		Add a new character device interface for the ti(4)
		driver for the new debugging interface.  This allows
		(a patched version of) gdb to talk to the Tigon board
		and debug the firmware.  There are also a few additional
		debugging ioctls available through this interface.

		Add header splitting support to the ti(4) driver.

		Tweak some of the default interrupt coalescing
		parameters to more useful defaults.

		Add hooks for supporting transmit flow control, but
		leave it turned off with a comment describing why it
		is turned off.

if_tireg.h:	Change the firmware rev to 12.4.11, since we're really
		at 12.4.11 plus fixes from 12.4.13.

		Add defines needed for debugging.

		Remove the ti_stats structure, it is now defined in
		sys/tiio.h.

ti_fw.h:	12.4.11 firmware.

ti_fw2.h:	12.4.11 firmware, plus selected fixes from 12.4.13,
		and my header splitting patches.  Revision 12.4.13
		doesn't handle 10/100 negotiation properly.  (This
		firmware is the same as what was in the tree previously,
		with the addition of header splitting support.)

sys/jumbo.h:	Jumbo buffer allocator interface.

sys/mbuf.h:	Add a new external mbuf type, EXT_DISPOSABLE, to
		indicate that the payload buffer can be thrown away /
		flipped to a userland process.

socketvar.h:	Add prototype for socow_setup.

tiio.h:		ioctl interface to the character portion of the ti(4)
		driver, plus associated structure/type definitions.

uio.h:		Change prototype for uiomoveco() so that we'll know
		whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c:	In vm_fault(), check to see whether we need to do a page
		based copy on write fault.

vm_object.c:	Add a new function, vm_object_allocate_wait().  This
		does the same thing that vm_object allocate does, except
		that it gives the caller the opportunity to specify whether
		it should wait on the uma_zalloc() of the object structre.

		This allows vm objects to be allocated while holding a
		mutex.  (Without generating WITNESS warnings.)

		vm_object_allocate() is implemented as a call to
		vm_object_allocate_wait() with the malloc flag set to
		M_WAITOK.

vm_object.h:	Add prototype for vm_object_allocate_wait().

vm_page.c:	Add page-based copy on write setup, clear and fault
		routines.

vm_page.h:	Add page based COW function prototypes and variable in
		the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.
2002-06-26 03:37:47 +00:00
Jeff Roberson
e78f35b33f Turn VM_ALLOC_ZERO into a flag.
Submitted by:	tegge
Reviewed by:	dillon
2002-06-25 22:01:12 +00:00