Commit Graph

195 Commits

Author SHA1 Message Date
davide
26a7b21456 Remove a spurious keg lock acquisition. 2013-06-28 21:13:19 +00:00
jeff
4201cd7bd1 - Resolve bucket recursion issues by passing a cookie with zone flags
through bucket_alloc() to uma_zalloc_arg() and uma_zfree_arg().
 - Make some smaller buckets for large zones to further reduce memory
   waste.
 - Implement uma_zone_reserve().  This holds aside a number of items only
   for callers who specify M_USE_RESERVE.  buckets will never be filled
   from reserve allocations.

Sponsored by:	EMC / Isilon Storage Division
2013-06-26 00:57:38 +00:00
jeff
a6b6e4783c - Add a per-zone lock for zones without kegs.
- Be more explicit about zone vs keg locking.  This functionally changes
   almost nothing.
 - Add a size parameter to uma_zcache_create() so we can size the buckets.
 - Pass the zone to bucket_alloc() so it can modify allocation flags
   as appropriate.
 - Fix a bug in zone_alloc_bucket() where I missed an address of operator
   in a failure case.  (Found by pho)

Sponsored by:	EMC / Isilon Storage Division
2013-06-20 19:08:12 +00:00
jeff
b81bfe8f58 - Persist the caller's flags in the bucket allocation flags so we don't
lose a M_NOVM when we recurse into a bucket allocation.

Sponsored by:	EMC / Isilon Storage Division
2013-06-19 02:30:32 +00:00
jeff
cca9ad5b94 Refine UMA bucket allocation to reduce space consumption and improve
performance.

 - Always free to the alloc bucket if there is space.  This gives LIFO
   allocation order to improve hot-cache performance.  This also allows
   for zones with a single bucket per-cpu rather than a pair if the entire
   working set fits in one bucket.
 - Enable per-cpu caches of buckets.  To prevent recursive bucket
   allocation one bucket zone still has per-cpu caches disabled.
 - Pick the initial bucket size based on a table driven maximum size
   per-bucket rather than the number of items per-page.  This gives
   more sane initial sizes.
 - Only grow the bucket size when we face contention on the zone lock, this
   causes bucket sizes to grow more slowly.
 - Adjust the number of items per-bucket to account for the header space.
   This packs the buckets more efficiently per-page while making them
   not quite powers of two.
 - Eliminate the per-zone free bucket list.  Always return buckets back
   to the bucket zone.  This ensures that as zones grow into larger
   bucket sizes they eventually discard the smaller sizes.  It persists
   fewer buckets in the system.  The locking is slightly trickier.
 - Only switch buckets in zalloc, not zfree, this eliminates pathological
   cases where we ping-pong between two buckets.
 - Ensure that the thread that fills a new bucket gets to allocate from
   it to give a better upper bound on allocation time.

Sponsored by:	EMC / Isilon Storage Division
2013-06-18 04:50:20 +00:00
jeff
1980616f65 - Add a new UMA API: uma_zcache_create(). This makes a zone without any
backing memory that is only a container for per-cpu caches of arbitrary
   pointer items.  These zones have no kegs.
 - Convert the regular keg based allocator to use the new import/release
   functions.
 - Move some stats to be atomics since they would require excessive zone
   locking/unlocking with the new import/release paradigm.  Make
   zone_free_item simpler now that callers can manage more stats.
 - Check for these cache-only zones in the public APIs and debugging
   code by checking zone_first_keg() against NULL.

Sponsored by:	EMC / Isilong Storage Division
2013-06-17 03:43:47 +00:00
jeff
84a32e0176 - Convert the slab free item list from a linked array of indices to a
bitmap using sys/bitset.  This is much simpler, has lower space
   overhead and is cheaper in most cases.
 - Use a second bitmap for invariants asserts and improve the quality of
   the asserts as well as the number of erroneous conditions that we will
   catch.
 - Drastically simplify sizing code.  Special case refcnt zones since they
   will be going away.
 - Update stale comments.

Sponsored by:	EMC / Isilon Storage Division
2013-06-13 21:05:38 +00:00
glebius
18dd370b59 Panic if UMA_ZONE_PCPU is created at early stages of boot, when mp_ncpus
isn't yet initialized. Otherwise we will panic at first allocation later.

Sponsored by:	Nginx, Inc.
2013-04-22 09:02:23 +00:00
glebius
204e3efd77 Convert UMA code to C99 uintXX_t types. 2013-04-09 17:43:48 +00:00
glebius
3206771906 Fix KASSERTs: maximum number of items per slab is 256. 2013-04-09 12:20:44 +00:00
glebius
7f9db020a2 Merge from projects/counters: UMA_ZONE_PCPU zones.
These zones have slab size == sizeof(struct pcpu), but request from VM
enough pages to fit (uk_slabsize * mp_ncpus). An item allocated from such
zone would have a separate twin for each CPU in the system, and these twins
are at a distance of sizeof(struct pcpu) from each other. This magic value
of distance would allow us to make some optimizations later.

  To address private item from a CPU simple arithmetics should be used:

  item = (type *)((char *)base + sizeof(struct pcpu) * curcpu)

  These arithmetics are available as zpcpu_get() macro in pcpu.h.

  To introduce non-page size slabs a new field had been added to uma_keg
uk_slabsize. This shifted some frequently used fields of uma_keg to the
fourth cache line on amd64. To mitigate this pessimization, uma_keg fields
were a bit rearranged and least frequently used uk_name and uk_link moved
down to the fourth cache line. All other fields, that are dereferenced
frequently fit into first three cache lines.

Sponsored by:	Nginx, Inc.
2013-04-08 19:10:45 +00:00
attilio
74f58faa15 MFC 2013-02-26 23:52:23 +00:00
attilio
15bf891afe Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to
their "write" versions.

Sponsored by:	EMC / Isilon storage division
2013-02-20 12:03:20 +00:00
attilio
658534ed5a Switch vm_object lock to be a rwlock.
* VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations
* VM_OBJECT_SLEEP() is introduced as a general purpose primitve to
  get a sleep operation using a VM_OBJECT_LOCK() as protection
* The approach must bear with vm_pager.h namespace pollution so many
  files require including directly rwlock.h
2013-02-20 10:38:34 +00:00
glebius
e3e319a0b6 Fix typo in debug printf. 2013-01-29 19:06:16 +00:00
pjd
a585ca9ec8 Implemented uma_zone_set_warning(9) function that sets a warning, which
will be printed once the given zone becomes full and cannot allocate an
item. The warning will not be printed more often than every five minutes.

All UMA warnings can be globally turned off by setting sysctl/tunable
vm.zone_warnings to 0.

Discussed on:	arch
Obtained from:	WHEEL Systems
MFC after:	2 weeks
2012-12-07 22:27:13 +00:00
mdf
1bc1b805d7 Const-ify the zone name argument to uma_zcreate(9).
MFC after:	3 days
2012-10-26 17:51:05 +00:00
eadler
23c67e54ce Print flags as hex instead of an integer.
PR:		kern/168210
Submitted by:	linimon
Reviewed by:	alc
Approved by:	cperciva
MFC after:	3 days
2012-10-22 02:11:57 +00:00
glebius
e4b6b754eb If caller specifies UMA_ZONE_OFFPAGE explicitly, then do not waste memory
in an allocation for a slab.

Reviewed by:	jeff
2012-09-18 20:28:55 +00:00
glebius
b1ab314c3f Fix function name in keg_cachespread_init() assert. 2012-08-26 09:54:11 +00:00
eadler
16452223a2 Add missing sleep stat increase
PR:		kern/168211
Submitted by:	linimon
Reviewed by:	alc
Approved by:	cperciva
MFC after:	3 days
2012-07-07 17:46:11 +00:00
jhb
ab100847da Honor db_pager_quit in 'show uma' and 'show malloc'.
MFC after:	1 month
2012-07-02 16:14:52 +00:00
emax
0984d7ec39 Tweak condition for disabling allocation from per-CPU buckets in
low memory situation. I've observed a situation where per-CPU
allocations were disabled while there were enough free cached pages.
Basically, cnt.v_free_count was sitting stable at a value lower
than cnt.v_free_min and that caused massive performance drop.

Reviewed by:	alc
MFC after:	1 week
2012-05-23 18:56:29 +00:00
kmacy
84d434965a exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64
excluding other allocations including UMA now entails the addition of
a single flag to kmem_alloc or uma zone create

Reviewed by:	alc, avg
MFC after:	2 weeks
2012-01-27 20:18:31 +00:00
glebius
2522c42334 Make memguard(9) capable to guard uma(9) allocations. 2011-10-12 18:08:28 +00:00
alc
2c928f6173 Correct an error in r222163. Unless UMA_MD_SMALL_ALLOC is defined,
startup_alloc() must be used until uma_startup2() is called.

Reported by:	jh
2011-05-22 17:46:16 +00:00
alc
a899800e2a 1. Prior to r214782, UMA did not support multipage allocations before
uma_startup2() was called.  Thus, setting the variable "booted" to true in
uma_startup() was ok on machines with UMA_MD_SMALL_ALLOC defined, because
any allocations made after uma_startup() but before uma_startup2() could be
satisfied by uma_small_alloc().  Now, however, some multipage allocations
are necessary before uma_startup2() just to allocate zone structures on
machines with a large number of processors.  Thus, a Boolean can no longer
effectively describe the state of the UMA allocator.  Instead, make "booted"
have three values to describe how far initialization has progressed.  This
allows multipage allocations to continue using startup_alloc() until
uma_startup2(), but single-page allocations may begin using
uma_small_alloc() after uma_startup().

2. With the aforementioned change, only a modest increase in boot pages is
necessary to boot UMA on a large number of processors.

3. Retire UMA_MD_SMALL_ALLOC_NEEDS_VM.  It has only been used between
r182028 and r204128.

Reviewed by:	attilio [1], nwhitehorn [3]
Tested by:	sbruno
2011-05-21 17:43:43 +00:00
alc
4037bb644a Eliminate a redundant #include. ("vm/vm_param.h" already includes
"machine/vmparam.h".)
2011-05-20 15:26:31 +00:00
jeff
2d7d8c05e7 - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
   and other miscellaneous small features.
2011-03-21 09:40:01 +00:00
mdf
7fc649fc41 Explicitly wire the user buffer rather than doing it implicitly in
sbuf_new_for_sysctl(9).  This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.

Inspired by:	rwatson
2011-01-27 00:34:12 +00:00
jhb
5497beb840 Update startup_alloc() to support multi-page allocations and allow internal
zones whose objects are larger than a page to use startup_alloc().  This
allows allocation of zone objects during early boot on machines with a large
number of CPUs since the resulting zone objects are larger than a page.

Submitted by:	trema
Reviewed by:	attilio
MFC after:	1 week
2010-11-04 15:33:50 +00:00
mdf
3f66b92677 uma_zfree(zone, NULL) should do nothing, to match free(9).
Noticed by:	Ron Steinke <rsteinke at isilon dot com>
MFC after:	3 days
2010-10-19 16:06:00 +00:00
lstewart
4171305db0 Change uma_zone_set_max to return the effective value of "nitems" after
rounding. The same value can also be obtained with uma_zone_get_max, but this
change avoids a caller having to make two back-to-back calls.

Sponsored by:	FreeBSD Foundation
Reviewed by:	gnn, jhb
2010-10-16 04:41:45 +00:00
lstewart
2770ad22da - Simplify implementation of uma_zone_get_max.
- Add uma_zone_get_cur which returns the current approximate occupancy of
  a zone. This is useful for providing stats via sysctl amongst other things.

Sponsored by:	FreeBSD Foundation
Reviewed by:	gnn, jhb
MFC after:	2 weeks
2010-10-16 04:14:45 +00:00
mdf
5695ef4698 Re-add r212370 now that the LOR in powerpc64 has been resolved:
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing
NUL byte.  This behaviour was preserved, though it should not be
necessary.

Reviewed by:    phk (original patch)
2010-09-16 16:13:12 +00:00
mdf
3ed6eac561 Revert r212370, as it causes a LOR on powerpc. powerpc does a few
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.

Requested by: nwhitehorn
2010-09-13 18:48:23 +00:00
mdf
bc54684253 Add a drain function for struct sysctl_req, and use it for a variety of
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.

Note that some sysctl handlers were explicitly outputting a trailing NUL
byte.  This behaviour was preserved, though it should not be necessary.

Reviewed by:	phk
2010-09-09 18:33:46 +00:00
andre
c239a236c1 Add uma_zone_get_max() to obtain the effective limit after a call
to uma_zone_set_max().

The UMA zone limit is not exactly set to the value supplied but
rounded up to completely fill the backing store increment (a page
normally).  This can lead to surprising situations where the number
of elements allocated from UMA is higher than the supplied limit
value.  The new get function reads back the effective value so that
the supplied limit value can be adjusted to the real limit.

Reviewed by:	jeffr
MFC after:	1 week
2010-08-16 14:24:00 +00:00
sbruno
3571902304 Add a new column to the output of vmstat -z to indicate the number
of times the system was forced to sleep when requesting a new allocation.

Expand the debugger hook, db_show_uma, to display these results as well.

This has proven to be very useful in out of memory situations when
it is not known why systems have become sluggish or fail in odd ways.

Reviewed by:	rwatson alc
Approved by:	scottl (mentor) peter
Obtained from:	Yahoo Inc.
2010-06-15 19:28:37 +00:00
jhb
9b74a62d73 Update several places that iterate over CPUs to use CPU_FOREACH(). 2010-06-11 18:46:34 +00:00
alc
0ec9744723 It makes more sense for the object-based backend allocator to use OBJT_PHYS
objects instead of OBJT_DEFAULT objects because we never reclaim or pageout
the allocated pages.  Moreover, they are mapped with pmap_qenter(), which
creates unmanaged mappings.

Reviewed by:	kib
2010-05-03 17:35:31 +00:00
kmacy
1dc1263413 On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib
2010-04-30 00:46:43 +00:00
antoine
bfd388c026 (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
alc
d26e62824b Add support for UMA_SLAB_KERNEL to page_free(). (While I'm here remove an
unnecessary newline character from the end of two panic messages.)
2009-06-18 07:27:11 +00:00
jeff
69d1bd8670 - Make the keg abstraction more complete. Permit a zone to have multiple
backend kegs so it may source compatible memory from multiple backends.
   This is useful for cases such as NUMA or different layouts for the same
   memory type.
 - Provide a new api for adding new backend kegs to secondary zones.
 - Provide a new flag for adjusting the layout of zones to stagger
   allocations better across cache lines.

Sponsored by:	Nokia
2009-01-25 09:11:24 +00:00
antoine
d4ceeb8daa Remove unused variable nosleepwithlocks.
PR:		126609
Submitted by:	Mateusz Guzik
MFC after:	1 month
X-MFC:		to stable/7 only, this variable is still used in stable/6
2008-08-23 12:40:07 +00:00
nwhitehorn
077618820d Allow the MD UMA allocator to use VM routines like kmem_*(). Existing code requires MD allocator to be available early in the boot process, before the VM is fully available. This defines a new VM define (UMA_MD_SMALL_ALLOC_NEEDS_VM) that allows an MD UMA small allocator to become available at the same time as the default UMA allocator.
Approved by:	marcel (mentor)
2008-08-23 01:35:36 +00:00
alc
067dba5f97 Reintroduce UMA_SLAB_KMAP; however, change its spelling to
UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
(UMA_SLAB_KMAP met its original demise in revision 1.30 of
vm/uma_core.c.)  UMA_SLAB_KERNEL is now required by the jumbo frame
allocators.  Without it, UMA cannot correctly return pages from the
jumbo frame zones to the VM system because it resets the pages' object
field to NULL instead of the kernel object.  In more detail, the jumbo
frame zones are created with the option UMA_ZONE_REFCNT.  This causes
UMA to overwrite the pages' object field with the address of the slab.
However, when UMA wants to release these pages, it doesn't know how to
restore the object field, so it sets it to NULL.  This change teaches
UMA how to reset the object field to the kernel object.

Crashes reported by: kris
Fix tested by: kris
Fix discussed with: jeff
MFC after: 6 weeks
2008-04-04 18:41:12 +00:00
jhb
a448c36f7e Allow recursion on the 'zones' internal UMA zone.
Submitted by:	thompsa
MFC after:	1 week
Approved by:	re (kensmith)
Discussed with:	jeff
2007-10-11 20:11:27 +00:00
attilio
7dd8ed88a9 Revert VMCNT_* operations introduction.
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.

Requested by: alc
Approved by: jeff (mentor)
2007-05-31 22:52:15 +00:00