75 Commits

Author SHA1 Message Date
markj
d0e4a8e28e Include _bitset.h to get BITSET_DEFINE, used to define struct slabbits.
MFC after:	1 week
2017-09-15 14:59:35 +00:00
markj
7d8d6899fc Widen uk_pgoff, the slab header offset field.
16 bits is only wide enough for kegs with an item size of up to 64KB.
At that size or larger, slab headers are typically offpage because the
item size is a multiple of the page size, but there is no requirement
that this be the case.

We can widen the field without affecting the layout of struct uma_keg
since the removal of uk_slabsize in r315077 left an adjacent hole.

PR:		218911
MFC after:	2 weeks
2017-09-13 21:54:37 +00:00
avg
de71736cdc uma: eliminate uk_slabsize field
The field was not used beyond the initial keg setup stage anyway.

MFC after:	1 month (if ever)
2017-03-11 16:35:36 +00:00
cperciva
e2e7efe985 Autotune the number of pages set aside for UMA startup based on the number
of CPUs present.  On amd64 this unbreaks the boot for systems with 92 or
more CPUs; the limit will vary on other systems depending on the size of
their uma_zone and uma_cache structures.

The major consumer of pages during UMA startup is the 19 zone structures
which are set up before UMA has bootstrapped itself sufficiently to use
the rest of the available memory:  UMA Slabs, UMA Hash, 4 / 6 / 8 / 12 /
16 / 32 / 64 / 128 / 256 Bucket, vmem btag, VM OBJECT, RADIX NODE, MAP,
KMAP ENTRY, MAP ENTRY, VMSPACE, and fakepg.  If the zone structures occupy
more than one page, they will not share pages and the number of pages
currently needed for startup is 19 * pages_per_zone + N, where N is the
number of pages used for allocating other structures; on amd64 N = 3 at
present (2 pages are allocated for UMA Kegs, and one page for UMA Hash).

This patch adds a new definition UMA_BOOT_PAGES_ZONES, currently set to 32,
and if a zone structure does not fit into a single page sets boot_pages to
UMA_BOOT_PAGES_ZONES * pages_per_zone instead of UMA_BOOT_PAGES (which
remains at 64).  Consequently this patch has no effect on systems where the
zone structure fits into 2 or fewer pages (on amd64, 59 or fewer CPUs), but
increases boot_pages sufficiently on systems where the large number of CPUs
makes this structure larger.  It seems safe to assume that systems with 60+
CPUs can afford to set aside an additional 128kB of memory per 32 CPUs.

The vm.boot_pages tunable continues to override this computation, but is
unlikely to be necessary in the future.

Tested on:	EC2 x1.32xlarge
Relnotes:	FreeBSD can now boot on 92+ CPU systems without requiring
		vm.boot_pages to be manually adjusted.
Reviewed by:	jeff, alc, adrian
Approved by:	re (kib)
2016-07-07 18:37:12 +00:00
pfg
8327dbfe4d sys/vm: minor spelling fixes in comments.
No functional change.
2016-05-02 20:16:29 +00:00
glebius
b8acb1ed4a Remove UMA_ZONE_REFCNT feature, now unused.
Blessed by:	jeff
2016-03-01 00:33:32 +00:00
glebius
b3c4f0ddbf Include sys/_task.h into uma_int.h, so that taskqueue.h isn't a
requirement for uma_int.h.

Suggested by:	jhb
2016-02-09 20:22:35 +00:00
glebius
953ea03018 Redo r292484. Embed task(9) into zone, so that uz_maxaction is called
in a context that can sleep, allowing consumers of the KPI to run their
drain routines without any extra measures.

Discussed with:	jtl
2016-02-03 23:30:17 +00:00
jtl
94d8d1452b Add a safety net to reclaim mbufs when one of the mbuf zones become
exhausted.

It is possible for a bug in the code (or, theoretically, even unusual
network conditions) to exhaust all possible mbufs or mbuf clusters.
When this occurs, things can grind to a halt fairly quickly. However,
we currently do not call mb_reclaim() unless the entire system is
experiencing a low-memory condition.

While it is best to try to prevent exhaustion of one of the mbuf zones,
it would also be useful to have a mechanism to attempt to recover from
these situations by freeing "expendable" mbufs.

This patch makes two changes:

a) The patch adds a generic API to the UMA zone allocator to set a
function that should be called when an allocation fails because the
zone limit has been reached. Because of the way this function can be
called, it really should do minimal work.

b) The patch uses this API to try to free mbufs when an allocation
fails from one of the mbuf zones because the zone limit has been
reached. The function schedules a callout to run mb_reclaim().

Differential Revision:	https://reviews.freebsd.org/D3864
Reviewed by:	gnn
Comments by:	rrs, glebius
MFC after:	2 weeks
Sponsored by:	Juniper Networks
2015-12-20 02:05:33 +00:00
scottl
ede23391e8 Revert r281451. It causes a panic/hang early in boot for a number of
users, myself included.  The original code is likely papering over a
larger bug that needs to be explored, but for now get things back to
a working state.

Obtained from:	Netflix, Inc.
MFC after:	immediately
2015-04-24 17:03:53 +00:00
dchagin
02e5fc3da7 Rework r281162. Indeed, the flexible array member is preferable here.
Suggested by:   Justin T. Gibbs

MFC after:	3 days
2015-04-12 06:21:58 +00:00
rstone
57feb6fb43 Fix integer truncation bug in malloc(9)
A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int.  This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc.  zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.

Note to self: When this is MFCed, sparc64 needs the same fix.

Differential revision:	https://reviews.freebsd.org/D2106
Reviewed by:	kib
Reported by:	Michael Fuckner <michael@fuckner.net>
Tested by:	Michael Fuckner <michael@fuckner.net>
MFC after:	2 weeks
2015-04-01 12:42:26 +00:00
mav
bdb3c9c41b Implement soft pressure on UMA cache bucket sizes.
Every time system detects low memory condition decrease bucket sizes for
each zone by one item.  As result, higher memory pressure will push to
smaller bucket sizes and so smaller per-CPU caches and so more efficient
memory use.

Before this change there was no force to oppose buckets growth as result
of practically inevitable zone lock conflicts, and after some run time
per-CPU caches could consume enough RAM to kill the system.
2013-11-19 10:05:53 +00:00
kib
8ca067efb2 PG_SLAB no longer serves a useful purpose, since m->object is no
longer abused to store pointer to slab. Remove it.

Reviewed by:    alc
Sponsored by:   The FreeBSD Foundation
Approved by:	re (hrs)
2013-09-17 07:35:26 +00:00
kib
4675fcfce0 Different consumers of the struct vm_page abuse pageq member to keep
additional information, when the page is guaranteed to not belong to a
paging queue.  Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.

Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked.  See r141955 and
r253140 for examples.

Change the pageq member into a union containing explicitly-typed
members.  Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().

Requested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2013-08-10 17:36:42 +00:00
glebius
e0b9f3e4d8 Since r251709 a slab no longer use 8-bit indicies to manage items,
thus remove a stale comment.

Reviewed by:	jeff
2013-07-24 06:13:00 +00:00
jeff
4201cd7bd1 - Resolve bucket recursion issues by passing a cookie with zone flags
through bucket_alloc() to uma_zalloc_arg() and uma_zfree_arg().
 - Make some smaller buckets for large zones to further reduce memory
   waste.
 - Implement uma_zone_reserve().  This holds aside a number of items only
   for callers who specify M_USE_RESERVE.  buckets will never be filled
   from reserve allocations.

Sponsored by:	EMC / Isilon Storage Division
2013-06-26 00:57:38 +00:00
jeff
a6b6e4783c - Add a per-zone lock for zones without kegs.
- Be more explicit about zone vs keg locking.  This functionally changes
   almost nothing.
 - Add a size parameter to uma_zcache_create() so we can size the buckets.
 - Pass the zone to bucket_alloc() so it can modify allocation flags
   as appropriate.
 - Fix a bug in zone_alloc_bucket() where I missed an address of operator
   in a failure case.  (Found by pho)

Sponsored by:	EMC / Isilon Storage Division
2013-06-20 19:08:12 +00:00
jeff
cca9ad5b94 Refine UMA bucket allocation to reduce space consumption and improve
performance.

 - Always free to the alloc bucket if there is space.  This gives LIFO
   allocation order to improve hot-cache performance.  This also allows
   for zones with a single bucket per-cpu rather than a pair if the entire
   working set fits in one bucket.
 - Enable per-cpu caches of buckets.  To prevent recursive bucket
   allocation one bucket zone still has per-cpu caches disabled.
 - Pick the initial bucket size based on a table driven maximum size
   per-bucket rather than the number of items per-page.  This gives
   more sane initial sizes.
 - Only grow the bucket size when we face contention on the zone lock, this
   causes bucket sizes to grow more slowly.
 - Adjust the number of items per-bucket to account for the header space.
   This packs the buckets more efficiently per-page while making them
   not quite powers of two.
 - Eliminate the per-zone free bucket list.  Always return buckets back
   to the bucket zone.  This ensures that as zones grow into larger
   bucket sizes they eventually discard the smaller sizes.  It persists
   fewer buckets in the system.  The locking is slightly trickier.
 - Only switch buckets in zalloc, not zfree, this eliminates pathological
   cases where we ping-pong between two buckets.
 - Ensure that the thread that fills a new bucket gets to allocate from
   it to give a better upper bound on allocation time.

Sponsored by:	EMC / Isilon Storage Division
2013-06-18 04:50:20 +00:00
jeff
1980616f65 - Add a new UMA API: uma_zcache_create(). This makes a zone without any
backing memory that is only a container for per-cpu caches of arbitrary
   pointer items.  These zones have no kegs.
 - Convert the regular keg based allocator to use the new import/release
   functions.
 - Move some stats to be atomics since they would require excessive zone
   locking/unlocking with the new import/release paradigm.  Make
   zone_free_item simpler now that callers can manage more stats.
 - Check for these cache-only zones in the public APIs and debugging
   code by checking zone_first_keg() against NULL.

Sponsored by:	EMC / Isilong Storage Division
2013-06-17 03:43:47 +00:00
jeff
84a32e0176 - Convert the slab free item list from a linked array of indices to a
bitmap using sys/bitset.  This is much simpler, has lower space
   overhead and is cheaper in most cases.
 - Use a second bitmap for invariants asserts and improve the quality of
   the asserts as well as the number of erroneous conditions that we will
   catch.
 - Drastically simplify sizing code.  Special case refcnt zones since they
   will be going away.
 - Update stale comments.

Sponsored by:	EMC / Isilon Storage Division
2013-06-13 21:05:38 +00:00
glebius
204e3efd77 Convert UMA code to C99 uintXX_t types. 2013-04-09 17:43:48 +00:00
glebius
486eba7ad7 Swap us_freecount and us_flags, achieving same structure size
as before previous commit.

Submitted by:	alc
2013-04-09 17:25:15 +00:00
glebius
d0006df0df Since now we support 256 items per slab, we need more bits
for us_freecount.

This grows uma_slab_head on 32-bit arches, but growth isn't
significant. Taking kmem zones as example, only the 32 byte
zone is affected, ipers is reduced from 113 to 112.

In collaboration with:	kib
2013-04-09 15:15:52 +00:00
glebius
7f9db020a2 Merge from projects/counters: UMA_ZONE_PCPU zones.
These zones have slab size == sizeof(struct pcpu), but request from VM
enough pages to fit (uk_slabsize * mp_ncpus). An item allocated from such
zone would have a separate twin for each CPU in the system, and these twins
are at a distance of sizeof(struct pcpu) from each other. This magic value
of distance would allow us to make some optimizations later.

  To address private item from a CPU simple arithmetics should be used:

  item = (type *)((char *)base + sizeof(struct pcpu) * curcpu)

  These arithmetics are available as zpcpu_get() macro in pcpu.h.

  To introduce non-page size slabs a new field had been added to uma_keg
uk_slabsize. This shifted some frequently used fields of uma_keg to the
fourth cache line on amd64. To mitigate this pessimization, uma_keg fields
were a bit rearranged and least frequently used uk_name and uk_link moved
down to the fourth cache line. All other fields, that are dereferenced
frequently fit into first three cache lines.

Sponsored by:	Nginx, Inc.
2013-04-08 19:10:45 +00:00
attilio
cc89d0bd92 Merge from vmc-playground branch:
Replace the sub-optimal uma_zone_set_obj() primitive with more modern
uma_zone_reserve_kva().  The new primitive reserves before hand
the necessary KVA space to cater the zone allocations and allocates pages
with ALLOC_NOOBJ.  More specifically:
- uma_zone_reserve_kva() does not need an object to cater the backend
  allocator.
- uma_zone_reserve_kva() can cater M_WAITOK requests, in order to
  serve zones which need to do uma_prealloc() too.
- When possible, uma_zone_reserve_kva() uses directly the direct-mapping
  by uma_small_alloc() rather than relying on the KVA / offset
  combination.

The removal of the object attribute allows 2 further changes:
1) _vm_object_allocate() becomes static within vm_object.c
2) VM_OBJECT_LOCK_INIT() is removed.  This function is replaced by
   direct calls to mtx_init() as there is no need to export it anymore
   and the calls aren't either homogeneous anymore: there are now small
   differences between arguments passed to mtx_init().

Sponsored by:	EMC / Isilon storage division
Reviewed by:	alc (which also offered almost all the comments)
Tested by:	pho, jhb, davide
2013-02-26 23:35:27 +00:00
glebius
1292747048 Comment fix: there is no ub_ptr, instead explain meaning of uz_count
field verbally.
2012-12-21 10:09:45 +00:00
pjd
a585ca9ec8 Implemented uma_zone_set_warning(9) function that sets a warning, which
will be printed once the given zone becomes full and cannot allocate an
item. The warning will not be printed more often than every five minutes.

All UMA warnings can be globally turned off by setting sysctl/tunable
vm.zone_warnings to 0.

Discussed on:	arch
Obtained from:	WHEEL Systems
MFC after:	2 weeks
2012-12-07 22:27:13 +00:00
mdf
1bc1b805d7 Const-ify the zone name argument to uma_zcreate(9).
MFC after:	3 days
2012-10-26 17:51:05 +00:00
alc
a899800e2a 1. Prior to r214782, UMA did not support multipage allocations before
uma_startup2() was called.  Thus, setting the variable "booted" to true in
uma_startup() was ok on machines with UMA_MD_SMALL_ALLOC defined, because
any allocations made after uma_startup() but before uma_startup2() could be
satisfied by uma_small_alloc().  Now, however, some multipage allocations
are necessary before uma_startup2() just to allocate zone structures on
machines with a large number of processors.  Thus, a Boolean can no longer
effectively describe the state of the UMA allocator.  Instead, make "booted"
have three values to describe how far initialization has progressed.  This
allows multipage allocations to continue using startup_alloc() until
uma_startup2(), but single-page allocations may begin using
uma_small_alloc() after uma_startup().

2. With the aforementioned change, only a modest increase in boot pages is
necessary to boot UMA on a large number of processors.

3. Retire UMA_MD_SMALL_ALLOC_NEEDS_VM.  It has only been used between
r182028 and r204128.

Reviewed by:	attilio [1], nwhitehorn [3]
Tested by:	sbruno
2011-05-21 17:43:43 +00:00
alc
3c08fd7d05 Fix spelling errors. 2011-05-20 17:28:00 +00:00
sbruno
3571902304 Add a new column to the output of vmstat -z to indicate the number
of times the system was forced to sleep when requesting a new allocation.

Expand the debugger hook, db_show_uma, to display these results as well.

This has proven to be very useful in out of memory situations when
it is not known why systems have become sluggish or fail in odd ways.

Reviewed by:	rwatson alc
Approved by:	scottl (mentor) peter
Obtained from:	Yahoo Inc.
2010-06-15 19:28:37 +00:00
kmacy
122090fb7e - enable alignment on amd64 only
- only align pcpu caches and the volatile portion of uma_zone
2010-03-22 22:39:32 +00:00
kmacy
377858b3ae turn 205266 in to a no-op until the problem can be properly diagnosed 2010-03-18 20:30:25 +00:00
kmacy
4e6ab892f5 Cache line align various structures and move volatile counters to
not share a cache line with (mostly) immutable state

Reviewed by:	jeff@
MFC after:	7 days
2010-03-17 21:18:28 +00:00
antoine
646097b80a Remove trailing ";" in UMA_HASH_INSERT and UMA_HASH_REMOVE macros.
MFC after:	1 month
2009-12-05 17:45:56 +00:00
jeff
69d1bd8670 - Make the keg abstraction more complete. Permit a zone to have multiple
backend kegs so it may source compatible memory from multiple backends.
   This is useful for cases such as NUMA or different layouts for the same
   memory type.
 - Provide a new api for adding new backend kegs to secondary zones.
 - Provide a new flag for adjusting the layout of zones to stagger
   allocations better across cache lines.

Sponsored by:	Nokia
2009-01-25 09:11:24 +00:00
rwatson
e8ccc2cbbd Update stale comment on protecting UMA per-CPU caches: we now use
critical sections rather than mutexes.
2007-05-09 22:53:34 +00:00
rwatson
65053d65ad Wrap inlines in uma_int.h in #ifdef _KERNEL so that uma_int.h can be
used from memstat_uma.c for the purposes of kvm access without lots
of additional unsafe includes.

MFC after:	3 days
2005-08-04 10:03:53 +00:00
rwatson
4ead3022dd Improve canonicalization of copyrights. Order copyrights by order of
assertion (jeff, bmilekic, rwatson).

Suggested ages ago by:	bde
MFC after:		1 week
2005-07-16 09:51:52 +00:00
silby
24f7a1c3d6 Increase the flags field for kegs from a 16 to a 32 bit value;
we have exhausted all 16 flags.
2005-07-16 02:23:41 +00:00
rwatson
87bbce9e08 Track UMA(9) allocation failures by zone, and export via sysctl.
Requested by:	victor cruceru <victor dot cruceru at gmail dot com>
MFC after:	1 week
2005-07-15 23:34:39 +00:00
rwatson
83343a94ec Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator
statistics via a binary structure stream:

- Add structure 'uma_stream_header', which defines a stream version,
  definition of MAXCPUs used in the stream, and the number of zone
  records in the stream.

- Add structure 'uma_type_header', which defines the name, alignment,
  size, resource allocation limits, current pages allocated, preferred
  bucket size, and central zone + keg statistics.

- Add structure 'uma_percpu_stat', which, for each per-CPU cache,
  includes the number of allocations and frees, as well as the number
  of free items in the cache.

- When the sysctl is queried, return a stream header, followed by a
  series of type descriptions, each consisting of a type header
  followed by a series of MAXCPUs uma_percpu_stat structures holding
  per-CPU allocation information.  Typical values of MAXCPU will be
  1 (UP compiled kernel) and 16 (SMP compiled kernel).

This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.

While here, also export the number of UMA zones as a sysctl
vm.uma_count, in order to assist in sizing user swpace buffers to
receive the stream.

A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days.  This change directly
supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats
rather than separately maintained mbuf allocator statistics.

MFC after:	1 week
2005-07-14 16:35:13 +00:00
rwatson
c24543fa50 In addition to tracking allocs in the zone, also track frees. Add
a zone free counter, as well as a cache free counter.

MFC after:	1 week
2005-07-14 16:17:21 +00:00
alc
67602b23a9 Increase UMA_BOOT_PAGES to prevent a crash during initialization. See
http://docs.FreeBSD.org/cgi/mid.cgi?42AD8270.8060906 for a detailed
description of the crash.

Reported by: Eric Anderson
Approved by: re (scottl)
MFC after: 3 days
2005-06-16 17:06:34 +00:00
rwatson
bb1e0b257a Modify UMA to use critical sections to protect per-CPU caches, rather than
mutexes, which offers lower overhead on both UP and SMP.  When allocating
from or freeing to the per-cpu cache, without INVARIANTS enabled, we now
no longer perform any mutex operations, which offers a 1%-3% performance
improvement in a variety of micro-benchmarks.  We rely on critical
sections to prevent (a) preemption resulting in reentrant access to UMA on
a single CPU, and (b) migration of the thread during access.  In the event
we need to go back to the zone for a new bucket, we release the critical
section to acquire the global zone mutex, and must re-acquire the critical
section and re-evaluate which cache we are accessing in case migration has
occured, or circumstances have changed in the current cache.

Per-CPU cache statistics are now gathered lock-free by the sysctl, which
can result in small races in statistics reporting for caches.

Reviewed by:	bmilekic, jeff (somewhat)
Tested by:	rwatson, kris, gnn, scottl, mike at sentex dot net, others
2005-04-29 18:56:36 +00:00
bmilekic
f9dded75d0 Well, it seems that I pre-maturely removed the "All rights reserved"
statement from some files, so re-add it for the moment, until the
related legalese is sorted out.  This change affects:

sys/kern/kern_mbuf.c
sys/vm/memguard.c
sys/vm/memguard.h
sys/vm/uma.h
sys/vm/uma_core.c
sys/vm/uma_dbg.c
sys/vm/uma_dbg.h
sys/vm/uma_int.h
2005-02-16 21:45:59 +00:00
imp
f0bf889d0d /* -> /*- for license, minor formatting changes 2005-01-07 02:29:27 +00:00
bmilekic
764e80eed7 Add my copyright and update Jeff's copyright on UMA source files,
as per his request.

Discussed with: Jeffrey Roberson
2004-12-26 00:35:12 +00:00
cognet
f788045cc2 Remove useless casts. 2004-11-26 15:04:26 +00:00