.\" Copyright (c) 1988 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\" This product includes software developed by the University of
.\" California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)kernmalloc.t 5.1 (Berkeley) 4/16/91
.\"
.\" reference a system routine name
.de RN
\fI\\$1\fP\^(\h'1m/24u')\\$2
..
.\" reference a header name
.de H
.NH \\$1
\\$2
..
.\" begin figure
.\" .FI "title"
.nr Fn 0 1
.de FI
.ds Lb Figure \\n+(Fn
.ds Lt \\$1
.KF
.DS B
.nf
..
.\"
.\" end figure
.de Fe
.sp .5
.\" cheat: original indent is stored in \n(OI by .DS B; restore it
.\" then center legend after .DE rereads and centers the block.
\\\\.in \\n(OI
\\\\.ce
\\\\*(Lb. \\\\*(Lt
.sp .5
.DE
.KE
.if \nd 'ls 2
..
.EQ
delim $$
.EN
.ds CH "
.pn 295
.sp
.rs
.ps -1
.sp -1
.fi
Reprinted from:
\fIProceedings of the San Francisco USENIX Conference\fP,
pp. 295-303, June 1988.
.ps
.\".sp |\n(HMu
.rm CM
.nr PO 1.25i
.TL
Design of a General Purpose Memory Allocator for the 4.3BSD UNIX\(dg Kernel
.ds LF Summer USENIX '88
.ds CF "%
.ds RF San Francisco, June 20-24
.EH 'Design of a General Purpose Memory ...''McKusick, Karels'
.OH 'McKusick, Karels''Design of a General Purpose Memory ...'
.FS
\(dgUNIX is a registered trademark of AT&T in the US and other countries.
.FE
.AU
Marshall Kirk McKusick
.AU
Michael J. Karels
.AI
Computer Systems Research Group
Computer Science Division
Department of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, California 94720
.AB
The 4.3BSD UNIX kernel uses many memory allocation mechanisms,
each designed for the particular needs of the utilizing subsystem.
This paper describes a general purpose dynamic memory allocator
that can be used by all of the kernel subsystems.
The design of this allocator takes advantage of known memory usage
patterns in the UNIX kernel and a hybrid strategy that is time-efficient
for small allocations and space-efficient for large allocations.
This allocator replaces the multiple memory allocation interfaces
with a single easy-to-program interface,
results in more efficient use of global memory by eliminating
partitioned and specialized memory pools,
and is quick enough that no performance loss is observed
relative to the current implementations.
The paper concludes with a discussion of our experience in using
the new memory allocator,
and directions for future work.
.AE
.LP
.H 1 "Kernel Memory Allocation in 4.3BSD
.PP
The 4.3BSD kernel has at least ten different memory allocators.
Some of them handle large blocks,
some of them handle small chained data structures,
and others include information to describe I/O operations.
Often the allocations are for small pieces of memory that are only
needed for the duration of a single system call.
In a user process such short-term
memory would be allocated on the run-time stack.
Because the kernel has a limited run-time stack,
it is not feasible to allocate even moderate blocks of memory on it.
Consequently, such memory must be allocated through a more dynamic mechanism.
For example,
when the system must translate a pathname,
it must allocate a one kilobyte buffer to hold the name.
Other blocks of memory must be more persistent than a single system call
and really have to be allocated from dynamic memory.
Examples include protocol control blocks that remain throughout
the duration of the network connection.
.PP
Demands for dynamic memory allocation in the kernel have increased
as more services have been added.
Each time a new type of memory allocation has been required,
a specialized memory allocation scheme has been written to handle it.
Often the new memory allocation scheme has been built on top
of an older allocator.
For example, the block device subsystem provides a crude form of
memory allocation through the allocation of empty buffers [Thompson78].
The allocation is slow because of the implied semantics of
finding the oldest buffer, pushing its contents to disk if they are dirty,
and moving physical memory into or out of the buffer to create
the requested size.
To reduce the overhead, a ``new'' memory allocator was built in 4.3BSD
for name translation that allocates a pool of empty buffers.
It keeps them on a free list so they can
be quickly allocated and freed [McKusick85].
.PP
This memory allocation method has several drawbacks.
First, the new allocator can only handle a limited range of sizes.
Second, it depletes the buffer pool, as it diverts memory intended
to buffer disk blocks for other purposes.
Finally, it creates yet another interface of
which the programmer must be aware.
.PP
A generalized memory allocator is needed to reduce the complexity
of writing code inside the kernel.
Rather than providing many semi-specialized ways of allocating memory,
the kernel should provide a single general purpose allocator.
With only a single interface,
programmers do not need to figure
out the most appropriate way to allocate memory.
If a good general purpose allocator is available,
it helps avoid the syndrome of creating yet another special
purpose allocator.
.PP
To ease the task of understanding how to use it,
the memory allocator should have an interface similar to the interface
of the well-known memory allocator provided for
applications programmers through the C library routines
.RN malloc
and
.RN free .
Like the C library interface,
the allocation routine should take a parameter specifying the
size of memory that is needed.
The range of sizes for memory requests should not be constrained.
The free routine should take a pointer to the storage being freed,
and should not require additional information such as the size
of the piece of memory being freed.
.H 1 "Criteria for a Kernel Memory Allocator
.PP
The design specification for a kernel memory allocator is similar to,
but not identical to,
the design criteria for a user level memory allocator.
The first criterion for a memory allocator is that it make good use
of the physical memory.
Good use of memory is measured by the amount of memory needed to hold
a set of allocations at any point in time.
Percentage utilization is expressed as:
.EQ
utilization~=~requested over required
.EN
Here, ``requested'' is the sum of the memory that has been requested
and not yet freed.
``Required'' is the amount of memory that has been
allocated for the pool from which the requests are filled.
An allocator requires more memory than requested because of fragmentation
and a need to have a ready supply of free memory for future requests.
A perfect memory allocator would have a utilization of 100%.
In practice,
having a 50% utilization is considered good [Korn85].
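.PP
As a rough illustration of the definition above,
the ratio can be computed from two byte counts.
The function name and the sample figures are hypothetical;
they are not taken from the measurements reported in this paper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Percentage utilization as defined in the text: memory requested
 * and not yet freed, divided by the memory held by the pool from
 * which the requests are filled.  Both arguments are byte counts.
 */
static unsigned
utilization_percent(size_t requested, size_t required)
{
	assert(required > 0 && requested <= required);
	return (unsigned)((requested * 100) / required);
}
```

For instance, 48 kilobytes outstanding in a pool of 96 kilobytes
corresponds to the 50% utilization mentioned above.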
.PP
Good memory utilization in the kernel is more important than
in user processes.
Because user processes run in virtual memory,
unused parts of their address space can be paged out.
Thus pages in the process address space
that are part of the ``required'' pool that are not
being ``requested'' need not tie up physical memory.
Because the kernel is not paged,
all pages in the ``required'' pool are held by the kernel and
cannot be used for other purposes.
To keep the kernel utilization percentage as high as possible,
it is desirable to release unused memory in the ``required'' pool
rather than to hold it as is typically done with user processes.
Because the kernel can directly manipulate its own page maps,
releasing unused memory is fast;
a user process must do a system call to release memory.
.PP
The most important criterion for a memory allocator is that it be fast.
Because memory allocation is done frequently,
a slow memory allocator will degrade the system performance.
Speed of allocation is more critical when executing in the
kernel than in user code,
because the kernel must allocate many data structures that user
processes can allocate cheaply on their run-time stack.
In addition, the kernel represents the platform on which all user
processes run,
and if it is slow, it will degrade the performance of every process
that is running.
.PP
Another problem with a slow memory allocator is that programmers
of frequently-used kernel interfaces will feel that they
cannot afford to use it as their primary memory allocator.
Instead they will build their own memory allocator on top of the
original by maintaining their own pool of memory blocks.
Multiple allocators reduce the efficiency with which memory is used.
The kernel ends up with many different free lists of memory
instead of a single free list from which all allocations can be drawn.
For example,
consider the case of two subsystems that need memory.
If they have their own free lists,
the amount of memory tied up in the two lists will be the
sum of the greatest amount of memory that each of
the two subsystems has ever used.
If they share a free list,
the amount of memory tied up in the free list may be as low as the
greatest amount of memory that either subsystem used.
As the number of subsystems grows,
the savings from having a single free list grow.
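.PP
The two-subsystem argument above can be checked with a small calculation.
The sketch below assumes that the usage of each subsystem is sampled
at the same instants; the function names and the sample values
are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Private free lists: each list grows to its owner's high-water
 * mark, so the memory tied up is the sum of the two peaks.
 */
static size_t
separate_lists(const size_t *a, const size_t *b, size_t n)
{
	size_t peak_a = 0, peak_b = 0;

	assert(a != NULL && b != NULL);
	for (size_t i = 0; i < n; i++) {
		if (a[i] > peak_a)
			peak_a = a[i];
		if (b[i] > peak_b)
			peak_b = b[i];
	}
	return peak_a + peak_b;
}

/*
 * Shared free list: the memory tied up is the high-water mark of
 * the combined usage, which is never more than the sum of the peaks.
 */
static size_t
shared_list(const size_t *a, const size_t *b, size_t n)
{
	size_t peak = 0;

	assert(a != NULL && b != NULL);
	for (size_t i = 0; i < n; i++)
		if (a[i] + b[i] > peak)
			peak = a[i] + b[i];
	return peak;
}
```

With usage samples of 10, 80, 10 for one subsystem and 70, 10, 20 for the
other, separate lists tie up 150 units while a shared list ties up 90.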
.H 1 "Existing User-level Implementations
.PP
There are many different algorithms and
implementations of user-level memory allocators.
A survey of those available on UNIX systems appeared in [Korn85].
Nearly all of the memory allocators tested made good use of memory,
though most of them were too slow for use in the kernel.
The fastest memory allocator in the survey by nearly a factor of two
was the memory allocator provided on 4.2BSD originally
written by Chris Kingsley at California Institute of Technology.
Unfortunately,
the 4.2BSD memory allocator also wasted twice as much memory
as its nearest competitor in the survey.
.PP
The 4.2BSD user-level memory allocator works by maintaining a set of lists
that are ordered by increasing powers of two.
Each list contains a set of memory blocks of its corresponding size.
To fulfill a memory request,
the size of the request is rounded up to the next power of two.
A piece of memory is then removed from the list corresponding
to the specified power of two and returned to the requester.
Thus, a request for a block of memory of size 53 returns
a block from the 64-sized list.
A typical memory allocation requires a roundup calculation
followed by a linked list removal.
Only if the list is empty is a real memory allocation done.
The free operation is also fast;
the block of memory is put back onto the list from which it came.
The correct list is identified by a size indicator stored
immediately preceding the memory block.
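.PP
The list scheme just described can be sketched in a few lines of C.
This is a minimal illustration and not the 4.2BSD source:
the names, the smallest block size, and the number of lists are assumptions,
and the step that obtains fresh memory when a list is empty is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MINSHIFT 3	/* assumed smallest block: 8 bytes */
#define NBUCKETS 20	/* assumed number of power-of-two lists */

/* Header stored immediately before each block handed out. */
union header {
	union header *next;	/* link while the block is on a free list */
	size_t index;		/* list number while the block is in use */
};

static union header *freelist[NBUCKETS];

/* Round the request plus its header up to a power of two. */
static size_t
bucket_for(size_t nbytes)
{
	size_t need = nbytes + sizeof(union header);
	size_t idx = 0;

	while (((size_t)1 << (idx + MINSHIFT)) < need)
		idx++;
	assert(idx < NBUCKETS);
	return idx;
}

/* Usual case: a roundup calculation and a linked list removal. */
static void *
list_malloc(size_t nbytes)
{
	size_t idx = bucket_for(nbytes);
	union header *h = freelist[idx];

	if (h == NULL)
		return NULL;	/* a real allocation would be done here */
	freelist[idx] = h->next;
	h->index = idx;
	return h + 1;
}

/* Put the block back on the list named by its header. */
static void
list_free(void *p)
{
	union header *h = (union header *)p - 1;
	size_t idx = h->index;

	assert(idx < NBUCKETS);
	h->next = freelist[idx];
	freelist[idx] = h;
}
```

In this sketch a request for 53 bytes is served from the 64-byte list,
while a request for exactly 64 bytes needs a 128-byte block
because the header must fit as well.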
.H 1 "Considerations Unique to a Kernel Allocator
.PP
There are several special conditions that arise when writing a
memory allocator for the kernel that do not apply to a user process
memory allocator.
First, the maximum memory allocation can be determined at
the time that the machine is booted.
This number is never more than the amount of physical memory on the machine,
and is typically much less since a machine with all its
memory dedicated to the operating system is uninteresting to use.
Thus, the kernel can statically allocate a set of data structures
to manage its dynamically allocated memory.
These data structures never need to be
expanded to accommodate memory requests;
yet, if properly designed, they need not be large.
For a user process, the maximum amount of memory that may be allocated
is a function of the maximum size of its virtual memory.
Although it could allocate static data structures to manage
its entire virtual memory,
even if they were efficiently encoded they would potentially be huge.
The other alternative is to allocate data structures as they are needed.
However, that adds extra complications such as new
failure modes if it cannot allocate space for additional
structures and additional mechanisms to link them all together.
.PP
Another special condition of the kernel memory allocator is that it
can control its own address space.
Unlike user processes that can only grow and shrink their heap at one end,
the kernel can keep an arena of kernel addresses and allocate
pieces from that arena which it then populates with physical memory.
The effect is much the same as a user process that has parts of
its address space paged out when they are not in use,
except that the kernel can explicitly control the set of pages
allocated to its address space.
The result is that the ``working set'' of pages in use by the
kernel exactly corresponds to the set of pages that it is really using.
.FI "One day memory usage on a Berkeley time-sharing machine"
.so usage.tbl
.Fe
.PP
A final special condition that applies to the kernel is that
all of the different uses of dynamic memory are known in advance.
Each one of these uses of dynamic memory can be assigned a type.
For each type of dynamic memory that is allocated,
the kernel can provide allocation limits.
One reason given for having separate allocators is that
no single allocator could starve the rest of the kernel of all
its available memory and thus a single runaway
client could not paralyze the system.
By putting limits on each type of memory,
the single general purpose memory allocator can provide the same
protection against memory starvation.\(dg
.FS
\(dgOne might seriously ask the question what good it is if ``only''
one subsystem within the kernel hangs if it is something like the
network on a diskless workstation.
.FE
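.PP
One way to picture per-type limits is a small table that is charged
on allocation and credited on free.
The type names, the limits, and the function names below are hypothetical;
the sketch only shows how a limit keeps one runaway client
from consuming memory needed by the others.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory types; a real kernel would have many more. */
enum memtype { T_PATHNAME, T_PCB, T_MOUNT, T_NTYPES };

struct typestats {
	size_t inuse;	/* bytes currently charged to this type */
	size_t limit;	/* most this type may hold at one time */
};

static struct typestats stats[T_NTYPES] = {
	{ 0, 2048 },		/* T_PATHNAME */
	{ 0, 64 * 1024 },	/* T_PCB */
	{ 0, 16 * 1024 },	/* T_MOUNT */
};

/* Charge size bytes to a type; return 0 if its limit would be exceeded. */
static int
charge(enum memtype t, size_t size)
{
	assert((int)t >= 0 && (int)t < T_NTYPES);
	if (size > stats[t].limit - stats[t].inuse)
		return 0;
	stats[t].inuse += size;
	return 1;
}

/* Credit size bytes back to a type when the memory is freed. */
static void
uncharge(enum memtype t, size_t size)
{
	assert((int)t >= 0 && (int)t < T_NTYPES);
	assert(size <= stats[t].inuse);
	stats[t].inuse -= size;
}
```

A type that has reached its limit is refused further memory,
while requests charged to other types continue to succeed.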
.PP
\*(Lb shows the memory usage of the kernel over a one day period
on a general timesharing machine at Berkeley.
The ``In Use'', ``Free'', and ``Mem Use'' fields are instantaneous values;
the ``Requests'' field is the number of allocations since system startup;
the ``High Use'' field is the maximum value of
the ``Mem Use'' field since system startup.
The figure demonstrates that most
allocations are for small objects.
Large allocations occur infrequently,
and are typically for long-lived objects
such as buffers to hold the superblock for
a mounted file system.
Thus, a memory allocator only needs to be
fast for small pieces of memory.
.H 1 "Implementation of the Kernel Memory Allocator
.PP
In reviewing the available memory allocators,
we found that none of their strategies could be used without some modification.
The kernel memory allocator that we ended up with is a hybrid
of the fast memory allocator found in the 4.2BSD C library
and a slower but more-memory-efficient first-fit allocator.
.PP
Small allocations are done using the 4.2BSD power-of-two list strategy;
the typical allocation requires only a computation of
the list to use and the removal of an element if it is available,
so it is quite fast.
Macros are provided to avoid the cost of a subroutine call.
Only if the request cannot be fulfilled from a list is a call
made to the allocator itself.
To ensure that the allocator is always called for large requests,
the lists corresponding to large allocations are always empty.
Appendix A shows the data structures and implementation of the macros.
.PP
Similarly, freeing a block of memory can be done with a macro.
The macro computes the list on which to place the request
and puts it there.
The free routine is called only if the block of memory is
considered to be a large allocation.
Including the cost of blocking out interrupts,
the allocation and freeing macros generate respectively
only nine and sixteen (simple) VAX instructions.
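.PP
The division of labor between the inline fast path and the allocator
routine might look roughly like the following.
This is not the code of Appendix A:
inline functions stand in for the macros,
the C library routines stand in for the kernel's slow path,
the blocking of interrupts is omitted,
the size is passed to the free operation for brevity,
and every identifier is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MINSHIFT 4	/* assumed smallest block: 16 bytes */
#define NBUCKETS 16
#define MAXSMALL 2048	/* larger requests always take the slow path */

struct freeblk { struct freeblk *next; };

static struct freeblk *bucket[NBUCKETS];	/* large lists stay empty */
static unsigned slow_calls;			/* counts slow-path entries */

static int
bucket_indx(size_t size)
{
	int i = 0;

	while (((size_t)1 << (i + MINSHIFT)) < size)
		i++;
	assert(i < NBUCKETS);
	return i;
}

/* Stand-in for the allocator routine proper. */
static void *
slow_alloc(size_t size)
{
	slow_calls++;
	return malloc((size_t)1 << (bucket_indx(size) + MINSHIFT));
}

/* Fast path: compute the list and unlink its first element. */
static inline void *
fast_alloc(size_t size)
{
	struct freeblk **head = &bucket[bucket_indx(size)];
	struct freeblk *p = *head;

	if (p == NULL)
		return slow_alloc(size);
	*head = p->next;
	return p;
}

/* Fast path for small blocks; large blocks go to the free routine. */
static inline void
fast_free(void *ptr, size_t size)
{
	struct freeblk *p = ptr;
	struct freeblk **head;

	if (size > MAXSMALL) {
		free(ptr);
		return;
	}
	head = &bucket[bucket_indx(size)];
	p->next = *head;
	*head = p;
}
```

Because a freed large block is never placed on a list,
every large request finds its list empty and reaches the allocator routine,
which is the property the text relies on.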
.PP
Because of the inefficiency of power-of-two allocation strategies
for large allocations,
a different strategy is used for allocations larger than two kilobytes.
The selection of two kilobytes is derived from our statistics on
the utilization of memory within the kernel,
which showed that 95 to 98% of allocations are of size one kilobyte or less.
A frequent caller of the memory allocator
(the name translation function)
always requests a one kilobyte block.
Additionally the allocation method for large blocks is based on allocating
pieces of memory in multiples of pages.
Consequently the actual allocation size for requests of size
$2~times~pagesize$ or less is identical under both strategies.\(dg
.FS
\(dgTo understand why this number is $size 8 {2~times~pagesize}$ one
observes that the power-of-two algorithm yields sizes of 1, 2, 4, 8, \&...
pages while the large block algorithm that allocates in multiples
of pages yields sizes of 1, 2, 3, 4, \&... pages.
Thus for allocations of sizes between one and two pages
both algorithms use two pages;
it is not until allocations of sizes between two and three pages
that a difference emerges where the power-of-two algorithm will use
four pages while the large block algorithm will use three pages.
.FE
In 4.3BSD on the VAX, the (software) page size is one kilobyte,
so two kilobytes is the smallest logical cutoff.
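.PP
The reasoning behind the two kilobyte cutoff can be verified by computing
the number of pages each strategy would use for a request of at least one byte.
The function names are illustrative;
the one kilobyte page size is the VAX value given above.

```c
#include <assert.h>
#include <stddef.h>

#define PAGESIZE 1024	/* software page size on the VAX, per the text */

/* Pages consumed when the request is rounded up to 1, 2, 4, 8, ... pages. */
static size_t
pow2_pages(size_t nbytes)
{
	size_t pages = 1;

	assert(nbytes > 0);
	while (pages * PAGESIZE < nbytes)
		pages <<= 1;
	return pages;
}

/* Pages consumed when the request is rounded up to a whole number of pages. */
static size_t
multiple_pages(size_t nbytes)
{
	assert(nbytes > 0);
	return (nbytes + PAGESIZE - 1) / PAGESIZE;
}
```

The two functions agree for every request of up to two pages
and first differ just above that size,
where the power-of-two rounding takes four pages and the page-multiple
rounding takes three.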
.PP
Large allocations are first rounded up to be a multiple of the page size.
The allocator then uses a first-fit algorithm to find space in the
kernel address arena set aside for dynamic allocations.
Thus a request for a five kilobyte piece of memory will use exactly
five pages (five kilobytes) of memory rather than eight kilobytes as with
the power-of-two allocation strategy.
When a large piece of memory is freed,
the memory pages are returned to the free memory pool,
and the address space is returned to the kernel address arena
where it is coalesced with adjacent free pieces.
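.PP
A first-fit search over an arena of pages can be sketched as follows.
The sketch tracks the arena with a simple per-page ``used'' array,
which is a substitution made for brevity and not a description of the
kernel's own map;
with this representation, adjacent free pieces coalesce implicitly.
The arena size and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGESIZE 1024
#define NPAGES   64	/* assumed size of the arena, in pages */

static unsigned char used[NPAGES];	/* one flag per arena page */

/* Round a byte count up to a whole number of pages. */
static int
pages_for(size_t nbytes)
{
	return (int)((nbytes + PAGESIZE - 1) / PAGESIZE);
}

/*
 * First fit: take the lowest-addressed run of free pages that is
 * long enough.  Returns the first page of the run, or -1 if none.
 */
static int
arena_alloc(int npages)
{
	int run = 0;

	assert(npages > 0);
	for (int i = 0; i < NPAGES; i++) {
		run = used[i] ? 0 : run + 1;
		if (run == npages) {
			int first = i - npages + 1;

			for (int j = first; j <= i; j++)
				used[j] = 1;
			return first;
		}
	}
	return -1;
}

/* Return a run of pages to the arena. */
static void
arena_free(int first, int npages)
{
	assert(first >= 0 && npages > 0 && first + npages <= NPAGES);
	for (int j = first; j < first + npages; j++)
		used[j] = 0;
}
```

After two neighboring pieces are freed,
a later request as large as their combined size can be satisfied
from the same addresses.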
.PP
Another technique to improve both the efficiency of memory utilization
and the speed of allocation
is to cluster same-sized small allocations on a page.
When a list for a power-of-two allocation is empty,
a new page is allocated and divided into pieces of the needed size.
This strategy speeds future allocations as several pieces of memory
become available as a result of the call into the allocator.
.PP
.FI "Calculation of allocation size"
.so alloc.fig
.Fe
Because the size is not specified when a block of memory is freed,
the allocator must keep track of the sizes of the pieces it has handed out.
The 4.2BSD user-level allocator stores the size of each block
in a header just before the allocation.
However, this strategy doubles the memory requirement for allocations that
require a power-of-two-sized block.
Therefore,
instead of storing the size of each piece of memory with the piece itself,
the size information is associated with the memory page.
\*(Lb shows how the kernel determines
the size of a piece of memory that is being freed,
by calculating the page in which it resides,
and looking up the size associated with that page.
Eliminating the cost of the overhead per piece improved utilization
far more than expected.
The reason is that many allocations in the kernel are for blocks of
memory whose size is exactly a power of two.
The memory consumed by these requests would be nearly doubled
if the user-level strategy were used.
Now they can be accommodated with no wasted memory.
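.PP
The per-page lookup can be illustrated with a static array standing in for
the kernel arena and a parallel table holding one size per page.
The names, the table layout, and the arena size are assumptions for this sketch;
it is not a transcription of the figure.

```c
#include <assert.h>
#include <stddef.h>

#define PAGESHIFT 10	/* one kilobyte pages */
#define NPAGES    64	/* assumed arena size, in pages */

static char arena[NPAGES << PAGESHIFT];	/* stand-in for the kernel arena */
static unsigned short size_of_page[NPAGES];	/* block size used on each page */

/* Page number within the arena of the page containing addr. */
static size_t
page_index(const void *addr)
{
	size_t off = (size_t)((const char *)addr - arena);

	assert(off < sizeof(arena));
	return off >> PAGESHIFT;
}

/* Size of the block at addr, found without any per-block header. */
static size_t
block_size(const void *addr)
{
	return size_of_page[page_index(addr)];
}
```

Since every block on a page has the same size,
any address within the page yields the correct answer,
and a block of exactly 64 bytes occupies exactly 64 bytes.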
.PP
The allocator can be called both from the top half of the kernel,
which is willing to wait for memory to become available,
and from the interrupt routines in the bottom half of the kernel
that cannot wait for memory to become available.
Clients indicate their willingness (and ability) to wait with a flag
to the allocation routine.
For clients that are willing to wait,
the allocator guarantees that their request will succeed.
Thus, these clients need not check the return value from the allocator.
If memory is unavailable and the client cannot wait,
the allocator returns a null pointer.
These clients must be prepared to cope with this
(hopefully infrequent) condition
(usually by giving up and hoping to do better later).
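.PP
The contract seen by the two kinds of client can be mimicked in user space.
In the sketch below the flag names, the tiny fixed pool,
and the routine that stands in for sleeping are all invented;
a real kernel would put the caller to sleep until another client
released memory.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS   2
#define SLOTSIZE 256

enum { WAIT_OK = 0, NO_WAIT = 1 };	/* hypothetical flag values */

static char pool[NSLOTS][SLOTSIZE];
static int inuse[NSLOTS];

/* Take the first unused slot, or return NULL if none is left. */
static void *
take_slot(void)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (!inuse[i]) {
			inuse[i] = 1;
			return pool[i];
		}
	}
	return NULL;
}

/* Stand-in for sleeping: pretend another client has freed slot 0. */
static void
sleep_until_memory_freed(void)
{
	inuse[0] = 0;
}

/*
 * Callers passing WAIT_OK always get memory back; callers passing
 * NO_WAIT get NULL when the pool is exhausted and must cope.
 */
static void *
alloc_sketch(size_t size, int flags)
{
	void *p;

	assert(size <= SLOTSIZE);
	while ((p = take_slot()) == NULL) {
		if (flags & NO_WAIT)
			return NULL;
		sleep_until_memory_freed();
	}
	return p;
}
```

Only the caller that passes the no-wait flag has to test the result
for a null pointer.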
.H 1 "Results of the Implementation
.PP
The new memory allocator was written about a year ago.
Conversion from the old memory allocators to the new allocator
has been going on ever since.
Many of the special purpose allocators have been eliminated.
This list includes
.RN calloc ,
.RN wmemall ,
and
.RN zmemall .
Many of the special purpose memory allocators built on
top of other allocators have also been eliminated.
For example, the allocator that was built on top of the buffer pool allocator
.RN geteblk
to allocate pathname buffers in
.RN namei
has been eliminated.
Because the typical allocation is so fast,
we have found that none of the special purpose pools are needed.
Indeed, the cost of allocation is about the same as the previous cost of
allocating buffers from the network pool (\fImbuf\fP\^s).
Consequently applications that used to allocate network
buffers for their own uses have been switched over to using
the general purpose allocator without increasing their running time.
.PP
Quantifying the performance of the allocator is difficult because
it is hard to measure the amount of time spent allocating
and freeing memory in the kernel.
The usual approach is to compile a kernel for profiling
and then compare the running time of the routines that
implemented the old abstraction versus those that implement the new one.
The old routines are difficult to quantify because
individual routines were used for more than one purpose.
For example, the
.RN geteblk
routine was used both to allocate one kilobyte memory blocks
and for its intended purpose of providing buffers to the filesystem.
Differentiating these uses is often difficult.
To get a measure of the cost of memory allocation before
putting in our new allocator,
we summed up the running time of all the routines whose
exclusive task was memory allocation.
To this total we added the fraction
of the running time of the multi-purpose routines that could
clearly be identified as memory allocation usage.
This number showed that approximately three percent of
the time spent in the kernel could be attributed to memory allocation.
.PP
The new allocator is difficult to measure
because the usual case of the memory allocator is implemented as a macro.
Thus, its running time is a small fraction of the running time of the
numerous routines in the kernel that use it.
To get a bound on the cost,
we changed the macro always to call the memory allocation routine.
Running in this mode, the memory allocator accounted for six percent
of the time spent in the kernel.
Factoring out the cost of the statistics collection and the
subroutine call overhead for the cases that could
normally be handled by the macro,
we estimate that the allocator would account for
at most four percent of time in the kernel.
These measurements show that the new allocator does not introduce
significant new run-time costs.
.PP
The other major success has been in keeping the size information
on a per-page basis.
This technique allows the most frequently requested sizes to be
allocated without waste.
It also reduces the amount of bookkeeping information associated
with the allocator to four kilobytes of information
per megabyte of memory under management (with a one kilobyte page size).
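.PP
The bookkeeping figure follows from simple arithmetic:
one megabyte contains 1024 one kilobyte pages,
so four kilobytes of bookkeeping amounts to four bytes per page.
The four byte entry size is inferred here from the stated totals
and is not given explicitly in the text;
the function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes of per-page bookkeeping needed to manage `managed` bytes of
 * memory, given the page size and the size of one per-page entry.
 */
static size_t
bookkeeping_bytes(size_t managed, size_t pagesize, size_t entry)
{
	assert(pagesize > 0 && managed % pagesize == 0);
	return (managed / pagesize) * entry;
}
```

With one megabyte under management, a one kilobyte page size,
and a four byte entry, the result is four kilobytes.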
.H 1 "Future Work
.PP
Our next project is to convert many of the static
kernel tables to be dynamically allocated.
Static tables include the process table, the file table,
and the mount table.
Making these tables dynamic will have two benefits.
First, it will reduce the amount of memory
that must be statically allocated at boot time.
Second, it will eliminate the arbitrary upper limit imposed
by the current static sizing
(although a limit will be retained to constrain runaway clients).
Other researchers have already shown the memory savings
achieved by this conversion [Rodriguez88].
.PP
Under the current implementation,
memory is never moved from one size list to another.
With the 4.2BSD memory allocator this causes problems,
particularly for large allocations where a process may use
a quarter megabyte piece of memory once,
which is then never available for any other size request.
In our hybrid scheme,
memory can be shuffled between large requests so that large blocks
of memory are never stranded as they are with the 4.2BSD allocator.
However, pages allocated to small requests are allocated once
to a particular size and never changed thereafter.
If a burst of requests came in for a particular size,
that size would acquire a large amount of memory
that would then not be available for other future requests.
.PP
In practice, we do not find that the free lists become too large.
However, we have been investigating ways to handle such problems
if they occur in the future.
Our current investigations involve a routine
that can run as part of the idle loop that would sort the elements
on each of the free lists into order of increasing address.
Since any given page has only one size of elements allocated from it,
the effect of the sorting would be to sort the list into distinct pages.
When all the pieces of a page became free,
the page itself could be released back to the free pool so that
it could be allocated to another purpose.
Although there is no guarantee that all the pieces of a page would ever
be freed,
most allocations are short-lived, lasting only for the duration of
an open file descriptor, an open network connection, or a system call.
As new allocations would be made from the page sorted to
the front of the list,
return of elements from pages at the back would eventually
allow pages later in the list to be freed.
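.PP
The proposed idle-loop pass might be prototyped along the following lines.
For brevity the free list is represented as an array of arena offsets
instead of a linked list, and the C library sort is used;
both choices, and all of the names, are assumptions made for this sketch
and do not describe an existing kernel routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGESIZE 1024

/* Order free-block offsets by increasing address. */
static int
cmp_off(const void *a, const void *b)
{
	size_t x = *(const size_t *)a, y = *(const size_t *)b;

	return (x > y) - (x < y);
}

/*
 * Sort the n free-block offsets of one size class, then report in
 * out[] each page whose blocks are all free.  Returns the number of
 * such pages; out must have room for n entries.
 */
static size_t
find_free_pages(size_t *free_off, size_t n, size_t blksize, size_t *out)
{
	size_t per_page, found = 0, i = 0;

	assert(blksize > 0 && blksize <= PAGESIZE);
	per_page = PAGESIZE / blksize;
	qsort(free_off, n, sizeof(*free_off), cmp_off);
	while (i < n) {
		size_t page = free_off[i] / PAGESIZE;
		size_t run = 0;

		/* After sorting, blocks of one page are adjacent. */
		while (i < n && free_off[i] / PAGESIZE == page) {
			run++;
			i++;
		}
		if (run == per_page)
			out[found++] = page;
	}
	return found;
}
```

A page is reported only when every block carved from it is on the free list,
at which point it could be handed back to the free pool.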
.PP
Two of the traditional UNIX
memory allocators remain in the current system.
The terminal subsystem uses \fIclist\fP\^s (character lists).
That part of the system is expected to undergo major revision within
the next year or so, and it will probably be changed to use
\fImbuf\fP\^s as it is merged into the network system.
The other major allocator that remains is
.RN getblk ,
the routine that manages the filesystem buffer pool memory
and associated control information.
Only the filesystem uses
.RN getblk
in the current system;
it manages the constant-sized buffer pool.
We plan to merge the filesystem buffer cache into the virtual memory system's
page cache in the future.
This change will allow the size of the buffer pool to be changed
according to memory load,
but will require a policy for balancing memory needs
with filesystem cache performance.
.H 1 "Acknowledgments
.PP
In the spirit of community support,
we have made various versions of our allocator available to our test sites.
They have been busily burning it in and giving
us feedback on their experiences.
We acknowledge their invaluable input.
The feedback from the Usenix program committee on the initial draft of
our paper suggested numerous important improvements.
.H 1 "References
.LP
.IP Korn85 \w'Rodriguez88\0\0'u
David Korn, Kiem-Phong Vo,
``In Search of a Better Malloc''
\fIProceedings of the Portland Usenix Conference\fP,
pp. 489-506, June 1985.
.IP McKusick85
M. McKusick, M. Karels, S. Leffler,
``Performance Improvements and Functional Enhancements in 4.3BSD''
\fIProceedings of the Portland Usenix Conference\fP,
pp. 519-531, June 1985.
.IP Rodriguez88
Robert Rodriguez, Matt Koehler, Larry Palmer, Ricky Palmer,
``A Dynamic UNIX Operating System''
\fIProceedings of the San Francisco Usenix Conference\fP,
June 1988.
.IP Thompson78
Ken Thompson,
``UNIX Implementation''
\fIBell System Technical Journal\fP, volume 57, number 6,
pp. 1931-1946, 1978.