uma_startup2() was called. Thus, setting the variable "booted" to true in
uma_startup() was ok on machines with UMA_MD_SMALL_ALLOC defined, because
any allocations made after uma_startup() but before uma_startup2() could be
satisfied by uma_small_alloc(). Now, however, some multipage allocations
are necessary before uma_startup2() just to allocate zone structures on
machines with a large number of processors. Thus, a Boolean can no longer
effectively describe the state of the UMA allocator. Instead, make "booted"
have three values to describe how far initialization has progressed. This
allows multipage allocations to continue using startup_alloc() until
uma_startup2(), but single-page allocations may begin using
uma_small_alloc() after uma_startup().
2. With the aforementioned change, only a modest increase in boot pages is
necessary to boot UMA on a large number of processors.
3. Retire UMA_MD_SMALL_ALLOC_NEEDS_VM. It has only been used between
r182028 and r204128.
Reviewed by: attilio [1], nwhitehorn [3]
Tested by: sbruno
sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.
Inspired by: rwatson
zones whose objects are larger than a page to use startup_alloc(). This
allows allocation of zone objects during early boot on machines with a large
number of CPUs since the resulting zone objects are larger than a page.
Submitted by: trema
Reviewed by: attilio
MFC after: 1 week
rounding. The same value can also be obtained with uma_zone_get_max, but this
change avoids a caller having to make two back-to-back calls.
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
- Add uma_zone_get_cur which returns the current approximate occupancy of
a zone. This is useful for providing stats via sysctl amongst other things.
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
MFC after: 2 weeks
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing
NUL byte. This behaviour was preserved, though it should not be
necessary.
Reviewed by: phk (original patch)
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.
Requested by: nwhitehorn
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing NUL
byte. This behaviour was preserved, though it should not be necessary.
Reviewed by: phk
to uma_zone_set_max().
The UMA zone limit is not exactly set to the value supplied but
rounded up to completely fill the backing store increment (a page
normally). This can lead to surprising situations where the number
of elements allocated from UMA is higher than the supplied limit
value. The new get function reads back the effective value so that
the supplied limit value can be adjusted to the real limit.
Reviewed by: jeffr
MFC after: 1 week
of times the system was forced to sleep when requesting a new allocation.
Expand the debugger hook, db_show_uma, to display these results as well.
This has proven to be very useful in out of memory situations when
it is not known why systems have become sluggish or fail in odd ways.
Reviewed by: rwatson alc
Approved by: scottl (mentor) peter
Obtained from: Yahoo Inc.
objects instead of OBJT_DEFAULT objects because we never reclaim or pageout
the allocated pages. Moreover, they are mapped with pmap_qenter(), which
creates unmanaged mappings.
Reviewed by: kib
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.
PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
MFC after: 1 month
backend kegs so it may source compatible memory from multiple backends.
This is useful for cases such as NUMA or different layouts for the same
memory type.
- Provide a new api for adding new backend kegs to secondary zones.
- Provide a new flag for adjusting the layout of zones to stagger
allocations better across cache lines.
Sponsored by: Nokia
UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
(UMA_SLAB_KMAP met its original demise in revision 1.30 of
vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame
allocators. Without it, UMA cannot correctly return pages from the
jumbo frame zones to the VM system because it resets the pages' object
field to NULL instead of the kernel object. In more detail, the jumbo
frame zones are created with the option UMA_ZONE_REFCNT. This causes
UMA to overwrite the pages' object field with the address of the slab.
However, when UMA wants to release these pages, it doesn't know how to
restore the object field, so it sets it to NULL. This change teaches
UMA how to reset the object field to the kernel object.
Crashes reported by: kris
Fix tested by: kris
Fix discussed with: jeff
MFC after: 6 weeks
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.
Requested by: alc
Approved by: jeff (mentor)
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
boot by MD code to indicated detected alignment preference. Rather than
cache alignment being encoded in UMA consumers by defining a global
alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now
a special value (-1) that causes UMA to look at registered alignment. If
no preferred alignment has been selected by MD code, a default alignment
of (16 - 1) will be used.
Currently, no hardware platforms specify alignment; architecture
maintainers will need to modify MD startup code to specify an alignment
if desired. This must occur before initialization of UMA so that all UMA
zones pick up the requested alignment.
Reviewed by: jeff, alc
Submitted by: attilio
zone. Cluster allocations fail when this happens. Also processes that may have
blocked on cluster allocations will never be woken up. Thanks to rwatson for
an overview of the issue and pointers to the mbuma paper and his tool to dump
out UMA zones.
Reviewed by: andre@
maxpages on a zone is woken up, with the rest never being woken up as
a result of the ZFLAG_FULL flag being cleared. Wakeup all such blocked
procsses instead. This change introduces a thundering herd, but since
this should be relatively infrequent, optimizing this (by introducing
a count of blocked processes, for example) may be premature.
Reviewd by: ups@
allocations were made using improper flags in interrupt context.
Replace with a simple WITNESS warning call. This restores the
invariant that M_WAITOK allocations will always succeed or die
horribly trying, which is relied on by many UMA consumers.
MFC after: 3 weeks
Discussed with: jhb
- Add a printf in swp_pager_meta_build() to warn if the swapzone becomes
exhausted so that there's at least a warning before a box that runs out
of swapzone space before running out of swap space deadlocks.
MFC after: 1 week
Reviwed by: alc
libmemstat(3) is used by vmstat (and friends) to produce more accurate
and more detailed statistics information in a machine-readable way,
and vmstat continues to provide the same text-based front-end.
This change should not be MFC'd.
report this as an allocation failure for the item type. The failure
will be separately recorded with the bucket type. This my eliminate
high mbuf allocation failure counts under some circumstances, which
can be alarming in appearance, but not actually a problem in
practice.
MFC after: 2 weeks
Reported by: ps, Peter J. Blok <pblok at bsd4all dot org>,
OxY <oxy at field dot hu>,
Gabor MICSKO <gmicskoa at szintezis dot hu>
The difference between WITNESS_CHECK() and WITNESS_WARN() is that
WITNESS_CHECK() should be used in the places that the return value of
witness_warn() is checked, whereas WITNESS_WARN() should be used in places
where the return value is ignored. Specifically, in a kernel without
WITNESS enabled, WITNESS_WARN() evaluates to an empty string where as
WITNESS_CHECK evaluates to 0. I also updated the one place that was
checking the return value of WITNESS_WARN() to use WITNESS_CHECK.
UMA boot pages.
Disable recursion on the general UMA lock now that startup_alloc() no
longer uses it.
Eliminate the variable uma_boot_free. It serves no purpose.
Note: This change eliminates a lock-order reversal between a system
map mutex and the UMA lock. See
http://sources.zabbadoz.net/freebsd/lor.html#109 for details.
MFC after: 3 days
monitoring API, which might or might not be the same as the internal
maximum (currently none).
Export flag information on UMA zones -- in particular, whether or
not this is a secondary zone, and so the keg free count should be
considered in that light.
MFC after: 1 day
- Add a new uma_zfree_internal() flag, ZFREE_STATFREE, which causes it to
to update the zone's uz_frees statistic. Previously, the statistic was
updated unconditionally.
- Use the flag in situations where a "real" free occurs: i.e., one where
the caller is freeing an allocated item, to be differentiated from
situations where uma_zfree_internal() is used to tear down the item
during slab teardown in order to invoke its fini() method. Also use
the flag when UMA is freeing its internal objects.
- When exchanging a bucket with the zone from the per-CPU cache when
freeing an item, flush cache statistics back to the zone (since the
zone lock and critical section are both held) to match the allocation
case.
MFC after: 3 days
per-CPU cache statistics. UMA sizes the cache array based on the
number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU
could read off the end of the array (into the next zone).
Reported by: yongari
MFC after: 1 week
it covers the following of the uc_alloc/freebucket cache pointers.
Originally, I felt that the race wasn't helped by holding the mutex,
hence a comment in the code and not holding it across the cache access.
However, it does improve consistency, as while it doesn't prevent
bucket exchange, it does prevent bucket pointer invalidation. So a
race in gathering cache free space statistics still can occur, but not
one that follows an invalid bucket pointer, if the mutex is held.
Submitted by: yongari
MFC after: 1 week