This is a last resort for very low memory condition in case other measures
to free memory were ineffective. Sequentially cycle through all CPUs and
extract per-CPU cache buckets into zone cache from where they can be freed.
Lock congestion is the same, whether it happens on alloc or free, so
handle it equally. Now that we have back pressure, there is no problem
to grow buckets a bit faster. Any way growth is much slower then in 9.x.
These new buckets make bucket size self-tuning more soft and precise.
Without them there are buckets for 1, 5, 13, 29, ... items. While at
bigger sizes difference about 2x is fine, at smallest ones it is 5x and
2.6x respectively. New buckets make that line look like 1, 3, 5, 9, 13,
29, reducing jumps between steps, making algorithm work softer, allocating
and freeing memory in better fitting chunks. Otherwise there is quite a
big gap between allocating 128K and 5x128K of RAM at once.
Every time system detects low memory condition decrease bucket sizes for
each zone by one item. As result, higher memory pressure will push to
smaller bucket sizes and so smaller per-CPU caches and so more efficient
memory use.
Before this change there was no force to oppose buckets growth as result
of practically inevitable zone lock conflicts, and after some run time
per-CPU caches could consume enough RAM to kill the system.
The consequence of the bug is that fini calls are not done
when a slab is freed by a call-back from the page daemon.
It went unnoticed for two months because fini is little used.
I spotted the bug while reading the code to learn how it works
so I could write it up for the next edition of the Design and
Implementation of FreeBSD book.
No MFC needed as this code exists only in HEAD.
Reviewed by: kib, jeff
Tested by: pho
additional information, when the page is guaranteed to not belong to a
paging queue. Usually, this results in a lot of type casts which make
reasoning about the code correctness harder.
Sometimes m->object is used instead of pageq, which could cause real
and confusing bugs if non-NULL m->object is leaked. See r141955 and
r253140 for examples.
Change the pageq member into a union containing explicitly-typed
members. Use them instead of type-punning or abusing m->object in x86
pmaps, uma and vm_page_alloc_contig().
Requested and reviewed by: alc
Sponsored by: The FreeBSD Foundation
transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division
through bucket_alloc() to uma_zalloc_arg() and uma_zfree_arg().
- Make some smaller buckets for large zones to further reduce memory
waste.
- Implement uma_zone_reserve(). This holds aside a number of items only
for callers who specify M_USE_RESERVE. buckets will never be filled
from reserve allocations.
Sponsored by: EMC / Isilon Storage Division
- Be more explicit about zone vs keg locking. This functionally changes
almost nothing.
- Add a size parameter to uma_zcache_create() so we can size the buckets.
- Pass the zone to bucket_alloc() so it can modify allocation flags
as appropriate.
- Fix a bug in zone_alloc_bucket() where I missed an address of operator
in a failure case. (Found by pho)
Sponsored by: EMC / Isilon Storage Division
performance.
- Always free to the alloc bucket if there is space. This gives LIFO
allocation order to improve hot-cache performance. This also allows
for zones with a single bucket per-cpu rather than a pair if the entire
working set fits in one bucket.
- Enable per-cpu caches of buckets. To prevent recursive bucket
allocation one bucket zone still has per-cpu caches disabled.
- Pick the initial bucket size based on a table driven maximum size
per-bucket rather than the number of items per-page. This gives
more sane initial sizes.
- Only grow the bucket size when we face contention on the zone lock, this
causes bucket sizes to grow more slowly.
- Adjust the number of items per-bucket to account for the header space.
This packs the buckets more efficiently per-page while making them
not quite powers of two.
- Eliminate the per-zone free bucket list. Always return buckets back
to the bucket zone. This ensures that as zones grow into larger
bucket sizes they eventually discard the smaller sizes. It persists
fewer buckets in the system. The locking is slightly trickier.
- Only switch buckets in zalloc, not zfree, this eliminates pathological
cases where we ping-pong between two buckets.
- Ensure that the thread that fills a new bucket gets to allocate from
it to give a better upper bound on allocation time.
Sponsored by: EMC / Isilon Storage Division
backing memory that is only a container for per-cpu caches of arbitrary
pointer items. These zones have no kegs.
- Convert the regular keg based allocator to use the new import/release
functions.
- Move some stats to be atomics since they would require excessive zone
locking/unlocking with the new import/release paradigm. Make
zone_free_item simpler now that callers can manage more stats.
- Check for these cache-only zones in the public APIs and debugging
code by checking zone_first_keg() against NULL.
Sponsored by: EMC / Isilong Storage Division
bitmap using sys/bitset. This is much simpler, has lower space
overhead and is cheaper in most cases.
- Use a second bitmap for invariants asserts and improve the quality of
the asserts as well as the number of erroneous conditions that we will
catch.
- Drastically simplify sizing code. Special case refcnt zones since they
will be going away.
- Update stale comments.
Sponsored by: EMC / Isilon Storage Division
These zones have slab size == sizeof(struct pcpu), but request from VM
enough pages to fit (uk_slabsize * mp_ncpus). An item allocated from such
zone would have a separate twin for each CPU in the system, and these twins
are at a distance of sizeof(struct pcpu) from each other. This magic value
of distance would allow us to make some optimizations later.
To address private item from a CPU simple arithmetics should be used:
item = (type *)((char *)base + sizeof(struct pcpu) * curcpu)
These arithmetics are available as zpcpu_get() macro in pcpu.h.
To introduce non-page size slabs a new field had been added to uma_keg
uk_slabsize. This shifted some frequently used fields of uma_keg to the
fourth cache line on amd64. To mitigate this pessimization, uma_keg fields
were a bit rearranged and least frequently used uk_name and uk_link moved
down to the fourth cache line. All other fields, that are dereferenced
frequently fit into first three cache lines.
Sponsored by: Nginx, Inc.
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.
The change is mostly mechanical but few notes are reported:
* The KPI changes as follow:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
At this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.
The KPI results heavilly broken by this commit. Thirdy part ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho
Replace the sub-optimal uma_zone_set_obj() primitive with more modern
uma_zone_reserve_kva(). The new primitive reserves before hand
the necessary KVA space to cater the zone allocations and allocates pages
with ALLOC_NOOBJ. More specifically:
- uma_zone_reserve_kva() does not need an object to cater the backend
allocator.
- uma_zone_reserve_kva() can cater M_WAITOK requests, in order to
serve zones which need to do uma_prealloc() too.
- When possible, uma_zone_reserve_kva() uses directly the direct-mapping
by uma_small_alloc() rather than relying on the KVA / offset
combination.
The removal of the object attribute allows 2 further changes:
1) _vm_object_allocate() becomes static within vm_object.c
2) VM_OBJECT_LOCK_INIT() is removed. This function is replaced by
direct calls to mtx_init() as there is no need to export it anymore
and the calls aren't either homogeneous anymore: there are now small
differences between arguments passed to mtx_init().
Sponsored by: EMC / Isilon storage division
Reviewed by: alc (which also offered almost all the comments)
Tested by: pho, jhb, davide
will be printed once the given zone becomes full and cannot allocate an
item. The warning will not be printed more often than every five minutes.
All UMA warnings can be globally turned off by setting sysctl/tunable
vm.zone_warnings to 0.
Discussed on: arch
Obtained from: WHEEL Systems
MFC after: 2 weeks
low memory situation. I've observed a situation where per-CPU
allocations were disabled while there were enough free cached pages.
Basically, cnt.v_free_count was sitting stable at a value lower
than cnt.v_free_min and that caused massive performance drop.
Reviewed by: alc
MFC after: 1 week
excluding other allocations including UMA now entails the addition of
a single flag to kmem_alloc or uma zone create
Reviewed by: alc, avg
MFC after: 2 weeks
uma_startup2() was called. Thus, setting the variable "booted" to true in
uma_startup() was ok on machines with UMA_MD_SMALL_ALLOC defined, because
any allocations made after uma_startup() but before uma_startup2() could be
satisfied by uma_small_alloc(). Now, however, some multipage allocations
are necessary before uma_startup2() just to allocate zone structures on
machines with a large number of processors. Thus, a Boolean can no longer
effectively describe the state of the UMA allocator. Instead, make "booted"
have three values to describe how far initialization has progressed. This
allows multipage allocations to continue using startup_alloc() until
uma_startup2(), but single-page allocations may begin using
uma_small_alloc() after uma_startup().
2. With the aforementioned change, only a modest increase in boot pages is
necessary to boot UMA on a large number of processors.
3. Retire UMA_MD_SMALL_ALLOC_NEEDS_VM. It has only been used between
r182028 and r204128.
Reviewed by: attilio [1], nwhitehorn [3]
Tested by: sbruno
sbuf_new_for_sysctl(9). This allows using an sbuf with a SYSCTL_OUT
drain for extremely large amounts of data where the caller knows that
appropriate references are held, and sleeping is not an issue.
Inspired by: rwatson
zones whose objects are larger than a page to use startup_alloc(). This
allows allocation of zone objects during early boot on machines with a large
number of CPUs since the resulting zone objects are larger than a page.
Submitted by: trema
Reviewed by: attilio
MFC after: 1 week
rounding. The same value can also be obtained with uma_zone_get_max, but this
change avoids a caller having to make two back-to-back calls.
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
- Add uma_zone_get_cur which returns the current approximate occupancy of
a zone. This is useful for providing stats via sysctl amongst other things.
Sponsored by: FreeBSD Foundation
Reviewed by: gnn, jhb
MFC after: 2 weeks
Add a drain function for struct sysctl_req, and use it for a variety
of handlers, some of which had to do awkward things to get a large
enough SBUF_FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing
NUL byte. This behaviour was preserved, though it should not be
necessary.
Reviewed by: phk (original patch)
unexpected things in copyout(9) and so wiring the user buffer is not
sufficient to perform a copyout(9) while holding a random mutex.
Requested by: nwhitehorn
handlers, some of which had to do awkward things to get a large enough
FIXEDLEN buffer.
Note that some sysctl handlers were explicitly outputting a trailing NUL
byte. This behaviour was preserved, though it should not be necessary.
Reviewed by: phk
to uma_zone_set_max().
The UMA zone limit is not exactly set to the value supplied but
rounded up to completely fill the backing store increment (a page
normally). This can lead to surprising situations where the number
of elements allocated from UMA is higher than the supplied limit
value. The new get function reads back the effective value so that
the supplied limit value can be adjusted to the real limit.
Reviewed by: jeffr
MFC after: 1 week
of times the system was forced to sleep when requesting a new allocation.
Expand the debugger hook, db_show_uma, to display these results as well.
This has proven to be very useful in out of memory situations when
it is not known why systems have become sluggish or fail in odd ways.
Reviewed by: rwatson alc
Approved by: scottl (mentor) peter
Obtained from: Yahoo Inc.
objects instead of OBJT_DEFAULT objects because we never reclaim or pageout
the allocated pages. Moreover, they are mapped with pmap_qenter(), which
creates unmanaged mappings.
Reviewed by: kib
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.
Supported by: Bitgravity Inc.
Discussed with: alc, jeffr, and kib
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.
PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
MFC after: 1 month
backend kegs so it may source compatible memory from multiple backends.
This is useful for cases such as NUMA or different layouts for the same
memory type.
- Provide a new api for adding new backend kegs to secondary zones.
- Provide a new flag for adjusting the layout of zones to stagger
allocations better across cache lines.
Sponsored by: Nokia
UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
(UMA_SLAB_KMAP met its original demise in revision 1.30 of
vm/uma_core.c.) UMA_SLAB_KERNEL is now required by the jumbo frame
allocators. Without it, UMA cannot correctly return pages from the
jumbo frame zones to the VM system because it resets the pages' object
field to NULL instead of the kernel object. In more detail, the jumbo
frame zones are created with the option UMA_ZONE_REFCNT. This causes
UMA to overwrite the pages' object field with the address of the slab.
However, when UMA wants to release these pages, it doesn't know how to
restore the object field, so it sets it to NULL. This change teaches
UMA how to reset the object field to the kernel object.
Crashes reported by: kris
Fix tested by: kris
Fix discussed with: jeff
MFC after: 6 weeks
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.
Requested by: alc
Approved by: jeff (mentor)
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
boot by MD code to indicated detected alignment preference. Rather than
cache alignment being encoded in UMA consumers by defining a global
alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now
a special value (-1) that causes UMA to look at registered alignment. If
no preferred alignment has been selected by MD code, a default alignment
of (16 - 1) will be used.
Currently, no hardware platforms specify alignment; architecture
maintainers will need to modify MD startup code to specify an alignment
if desired. This must occur before initialization of UMA so that all UMA
zones pick up the requested alignment.
Reviewed by: jeff, alc
Submitted by: attilio
zone. Cluster allocations fail when this happens. Also processes that may have
blocked on cluster allocations will never be woken up. Thanks to rwatson for
an overview of the issue and pointers to the mbuma paper and his tool to dump
out UMA zones.
Reviewed by: andre@
maxpages on a zone is woken up, with the rest never being woken up as
a result of the ZFLAG_FULL flag being cleared. Wakeup all such blocked
procsses instead. This change introduces a thundering herd, but since
this should be relatively infrequent, optimizing this (by introducing
a count of blocked processes, for example) may be premature.
Reviewd by: ups@
allocations were made using improper flags in interrupt context.
Replace with a simple WITNESS warning call. This restores the
invariant that M_WAITOK allocations will always succeed or die
horribly trying, which is relied on by many UMA consumers.
MFC after: 3 weeks
Discussed with: jhb
- Add a printf in swp_pager_meta_build() to warn if the swapzone becomes
exhausted so that there's at least a warning before a box that runs out
of swapzone space before running out of swap space deadlocks.
MFC after: 1 week
Reviwed by: alc
libmemstat(3) is used by vmstat (and friends) to produce more accurate
and more detailed statistics information in a machine-readable way,
and vmstat continues to provide the same text-based front-end.
This change should not be MFC'd.
report this as an allocation failure for the item type. The failure
will be separately recorded with the bucket type. This my eliminate
high mbuf allocation failure counts under some circumstances, which
can be alarming in appearance, but not actually a problem in
practice.
MFC after: 2 weeks
Reported by: ps, Peter J. Blok <pblok at bsd4all dot org>,
OxY <oxy at field dot hu>,
Gabor MICSKO <gmicskoa at szintezis dot hu>
The difference between WITNESS_CHECK() and WITNESS_WARN() is that
WITNESS_CHECK() should be used in the places that the return value of
witness_warn() is checked, whereas WITNESS_WARN() should be used in places
where the return value is ignored. Specifically, in a kernel without
WITNESS enabled, WITNESS_WARN() evaluates to an empty string where as
WITNESS_CHECK evaluates to 0. I also updated the one place that was
checking the return value of WITNESS_WARN() to use WITNESS_CHECK.
UMA boot pages.
Disable recursion on the general UMA lock now that startup_alloc() no
longer uses it.
Eliminate the variable uma_boot_free. It serves no purpose.
Note: This change eliminates a lock-order reversal between a system
map mutex and the UMA lock. See
http://sources.zabbadoz.net/freebsd/lor.html#109 for details.
MFC after: 3 days
monitoring API, which might or might not be the same as the internal
maximum (currently none).
Export flag information on UMA zones -- in particular, whether or
not this is a secondary zone, and so the keg free count should be
considered in that light.
MFC after: 1 day
- Add a new uma_zfree_internal() flag, ZFREE_STATFREE, which causes it to
to update the zone's uz_frees statistic. Previously, the statistic was
updated unconditionally.
- Use the flag in situations where a "real" free occurs: i.e., one where
the caller is freeing an allocated item, to be differentiated from
situations where uma_zfree_internal() is used to tear down the item
during slab teardown in order to invoke its fini() method. Also use
the flag when UMA is freeing its internal objects.
- When exchanging a bucket with the zone from the per-CPU cache when
freeing an item, flush cache statistics back to the zone (since the
zone lock and critical section are both held) to match the allocation
case.
MFC after: 3 days
per-CPU cache statistics. UMA sizes the cache array based on the
number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU
could read off the end of the array (into the next zone).
Reported by: yongari
MFC after: 1 week
it covers the following of the uc_alloc/freebucket cache pointers.
Originally, I felt that the race wasn't helped by holding the mutex,
hence a comment in the code and not holding it across the cache access.
However, it does improve consistency, as while it doesn't prevent
bucket exchange, it does prevent bucket pointer invalidation. So a
race in gathering cache free space statistics still can occur, but not
one that follows an invalid bucket pointer, if the mutex is held.
Submitted by: yongari
MFC after: 1 week
statistics via a binary structure stream:
- Add structure 'uma_stream_header', which defines a stream version,
definition of MAXCPUs used in the stream, and the number of zone
records in the stream.
- Add structure 'uma_type_header', which defines the name, alignment,
size, resource allocation limits, current pages allocated, preferred
bucket size, and central zone + keg statistics.
- Add structure 'uma_percpu_stat', which, for each per-CPU cache,
includes the number of allocations and frees, as well as the number
of free items in the cache.
- When the sysctl is queried, return a stream header, followed by a
series of type descriptions, each consisting of a type header
followed by a series of MAXCPUs uma_percpu_stat structures holding
per-CPU allocation information. Typical values of MAXCPU will be
1 (UP compiled kernel) and 16 (SMP compiled kernel).
This query mechanism allows user space monitoring tools to extract
memory allocation statistics in a machine-readable form, and to do so
at a per-CPU granularity, allowing monitoring of allocation patterns
across CPUs in order to better understand the distribution of work and
memory flow over multiple CPUs.
While here, also export the number of UMA zones as a sysctl
vm.uma_count, in order to assist in sizing user swpace buffers to
receive the stream.
A follow-up commit of libmemstat(3), a library to monitor kernel memory
allocation, will occur in the next few days. This change directly
supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats
rather than separately maintained mbuf allocator statistics.
MFC after: 1 week
zone whenever it was moving buckets between the zone and the cache,
or when coalescing statistics across the CPU. Remove flushing of
statistics to the zone when coalescing statistics as part of sysctl,
as we won't be running on the right CPU to write to the cache
statistics.
Add a missed gathering of statistics: when uma_zalloc_internal()
does a special case allocation of a single item, make sure to update
the zone statistics to represent this. Previously this case wasn't
accounted for in user-visible statistics.
MFC after: 1 week
mutexes, which offers lower overhead on both UP and SMP. When allocating
from or freeing to the per-cpu cache, without INVARIANTS enabled, we now
no longer perform any mutex operations, which offers a 1%-3% performance
improvement in a variety of micro-benchmarks. We rely on critical
sections to prevent (a) preemption resulting in reentrant access to UMA on
a single CPU, and (b) migration of the thread during access. In the event
we need to go back to the zone for a new bucket, we release the critical
section to acquire the global zone mutex, and must re-acquire the critical
section and re-evaluate which cache we are accessing in case migration has
occured, or circumstances have changed in the current cache.
Per-CPU cache statistics are now gathered lock-free by the sysctl, which
can result in small races in statistics reporting for caches.
Reviewed by: bmilekic, jeff (somewhat)
Tested by: rwatson, kris, gnn, scottl, mike at sentex dot net, others
statement from some files, so re-add it for the moment, until the
related legalese is sorted out. This change affects:
sys/kern/kern_mbuf.c
sys/vm/memguard.c
sys/vm/memguard.h
sys/vm/uma.h
sys/vm/uma_core.c
sys/vm/uma_dbg.c
sys/vm/uma_dbg.h
sys/vm/uma_int.h
UMA_ZONE_REFCNT and UMA_ZONE_MALLOC zones, as the page(s) undoubtedly
came from kmem_map for those two. Previously it would set it back
to NULL for UMA_ZONE_REFCNT zones and although this was probably not
fatal, it added MORE code for no reason.
recursion from the VM is handled (and the calling code that allocates
buckets knows how to deal with it), we do not want to prevent allocation
from the slab header zones (slabzone and slabrefzone) if uk_recurse is
not zero for them. The reason is that it could lead to NULL being
returned for the slab header allocations even in the M_WAITOK
case, and the caller can't handle that (this is also explained in a
comment with this commit).
The problem analysis is documented in our mailing lists:
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153445+0+archive/2004/freebsd-current/20041231.freebsd-current
(see entire thread for proper context).
Crash dump data provided by: Peter Holm <peter@holm.cc>
number of entries into bucket_zone_lookup(), which helps make more
clear the logic of consumers of bucket zones.
Annotate the behavior of bucket_init() with a comment indicating
how the various data structures, including the bucket lookup tables,
are initialized.
IF INVARIANTS is defined, and in the rare case that we have
allocated some objects from the slab and at least one initializer
on at least one of those objects failed, and we need to fail the
allocation and push the uninitialized items back into the slab
caches -- in that scenario, we would fail to [re]set the
bucket cache's ub_bucket item references to NULL, which would
eventually trigger a KASSERT.
state management corruption, mbuf leaks, general mbuf corruption,
and at least on i386 a first level splash damage radius that
encompasses up to about half a megabyte of the memory after
an mbuf cluster's allocation slab. In short, this has caused
instability nightmares anywhere the right kind of network traffic
is present.
When the polymorphic refcount slabs were added to UMA, the new types
were not used pervasively. In particular, the slab management
structure was turned into one for refcounts, and one for non-refcounts
(supposed to be mostly like the old slab management structure),
but the latter was almost always used through out. In general, every
access to zones with UMA_ZONE_REFCNT turned on corrupted the
"next free" slab offset offset and the refcount with each other and
with other allocations (on i386, 2 mbuf clusters per 4096 byte slab).
Fix things so that the right type is used to access refcounted zones
where it was not before. There are additional errors in gross
overestimation of padding, it seems, that would cause a large kegs
(nee zones) to be allocated when small ones would do. Unless I have
analyzed this incorrectly, it is not directly harmful.
so that they know whether the allocation is supposed to be able to sleep
or not.
* Allow uma_zone constructors and initialation functions to return either
success or error. Almost all of the ones in the tree currently return
success unconditionally, but mbuf is a notable exception: the packet
zone constructor wants to be able to fail if it cannot suballocate an
mbuf cluster, and the mbuf allocators want to be able to fail in general
in a MAC kernel if the MAC mbuf initializer fails. This fixes the
panics people are seeing when they run out of memory for mbuf clusters.
* Allow debug.nosleepwithlocks on WITNESS to be disabled, without changing
the default.
Both bmilekic and jeff have reviewed the changes made to make failable
zone allocations work.