Commit Graph

16 Commits

Author SHA1 Message Date
cem
493fc2338c subr_vmem: Fix double-free in error case of vmem_create
If vmem_init() fails, 'vm' is already destroyed and freed.  Don't free it
again.

Reported by:	Coverity
CID:		1042110
Sponsored by:	EMC / Isilon Storage Division
2016-05-11 23:16:11 +00:00
jkim
318c4f97e6 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
mav
31390ca675 Add vmem locking to r281026.
While races there are not fatal, they cause result underestimation, that
cause unneeded ARC reclaims.

MFC after:	1 month
2015-04-05 14:17:26 +00:00
mav
326429ebdd Make ZFS ARC track both KVA usage and fragmentation.
Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage
reaches certain threshold (3/4 on i386 or 16/17 otherwise).  FreeBSD has
even less KVA, but had no such limit on archs with direct map as amd64.
As result, on machines with a lot of RAM, during load with very small user-
space memory pressure, such as `zfs send`, it was possible to reach state,
when there is enough both physical RAM and KVA (I've seen up to 25-30%),
but no continuous KVA range to allocate even single 128KB I/O request.

Address this situation from two sides:
 - restore KVA usage limitations in a way the most close to Illumos;
 - introduce new requirement for KVA fragmentation, specifying that we
should have at least one sequential KVA range of zfs_max_recordsize bytes.

Experiments show that first limitation done alone is not sufficient.  On
machine with 64GB of RAM it is sometimes needed to drop up to half of ARC
size to get at leats one 1MB KVA chunk.  Statically limiting ARC to half
of KVA/RAM is too strict, so second limitation makes it to work in cycles:
accumulate trash up to certain critical mass, do massive spring-cleaning,
and then start littering again. :)

MFC after:	1 month
2015-04-03 14:45:48 +00:00
rstone
57feb6fb43 Fix integer truncation bug in malloc(9)
A couple of internal functions used by malloc(9) and uma truncated
a size_t down to an int.  This could cause any number of issues
(e.g. indefinite sleeps, memory corruption) if any kernel
subsystem tried to allocate 2GB or more through malloc.  zfs would
attempt such an allocation when run on a system with 2TB or more
of RAM.

Note to self: When this is MFCed, sparc64 needs the same fix.

Differential revision:	https://reviews.freebsd.org/D2106
Reviewed by:	kib
Reported by:	Michael Fuckner <michael@fuckner.net>
Tested by:	Michael Fuckner <michael@fuckner.net>
MFC after:	2 weeks
2015-04-01 12:42:26 +00:00
mav
eda1855b80 Periodically wake up threads waiting for vmem(9) resources, so they could
ask for resource reclamation again.

This is kind of dirty hack, but as last resort this is better then stuck
indefinitely because of KVA fragmentation, waiting until some random event
free something sufficient.  OpenSolaris also has this hack in its vmem(9).

MFC after:	2 weeks
2015-03-30 13:30:53 +00:00
mav
ae6af27af5 Add four new DDB commands to display vmem(9) statistics.
In particular, such DDB commands were added:
        show vmem <addr>
        show all vmem
        show vmemdump <addr>
        show all vmemdump

As possible usage, that allows to see KVA usage and fragmentation.
2015-03-29 10:02:29 +00:00
kib
f389aa239b Make debug.vmem_check a tunable. It is useful to set it early.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-03-28 23:30:51 +00:00
np
bf77eba909 Do not set M_BESTFIT if a strategy has already been provided. This
fixes problems when using M_FIRSTFIT.

Reviewed by:	jeff@
MFC after:	1 week
2014-04-16 21:39:43 +00:00
mav
fb7844278f Create own free list for each of the first 32 possible allocation sizes.
In case of 4K allocation quantum that means for allocations up to 128K.

With growth of memory fragmentation these lists may grow to quite a large
sizes (tenths and hundreds of thousands items).  Having in one list items
of different sizes in worst case may require full linear list traversal,
that may be very expensive.  Having lists for items of single size means
that unless user specify some alignment or border requirements (that are
very rare cases) first item found on the list should satisfy the request.

While running SPEC NFS benchmark on top of ZFS on 24-core machine with
84GB RAM this change reduces CPU time spent in vmem_xalloc() from 8%
and lock congestion spinning around it from 20% to invisible levels.
And that all is by the cost of just 26 more pointers per vmem instance.

If at some point our kernel will start to actively use KVA allocations
with odd sizes above 128K, something may need to be done to bigger lists
also.
2013-12-11 21:48:04 +00:00
pho
967f62efad Added sysctl to turn off calls to vmem_check().
Sponsored by:	EMC / Isilon storage division
Discussed with:	 jeff
2013-08-20 11:06:56 +00:00
jeff
ed90d4ba3f - Use an arbitrary but reasonably large import size for kva on architectures
that don't support superpages.  This keeps the number of spans and internal
   fragmentation lower.
 - When the user asks for alignment from vmem_xalloc adjust the imported size
   by 2*align to be certain we can satisfy the allocation.  This comes at
   the expense of potential failures when the backend can't supply enough
   memory but could supply the requested size and alignment.

Sponsored by:	EMC / Isilon Storage Division
2013-08-19 23:02:39 +00:00
jeff
d330a11545 - Add a statically allocated memguard arena since it is needed very early
on.
 - Pass the appropriate flags to vmem_xalloc() when allocating space for
   the arena from kmem_arena.

Sponsored by:	EMC / Isilon Storage Division
2013-08-13 22:40:43 +00:00
jeff
de4ecca213 Replace kernel virtual address space allocation with vmem. This provides
transparent layering and better fragmentation.

 - Normalize functions that allocate memory to use kmem_*
 - Those that allocate address space are named kva_*
 - Those that operate on maps are named kmap_*
 - Implement recursive allocation handling for kmem_arena in vmem.

Reviewed by:	alc
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-08-07 06:21:20 +00:00
glebius
a0ef2f4f21 Remove unused argument from vmem_add1().
Reviewed by:	jeff
2013-07-24 08:02:56 +00:00
jeff
e725dd5c1e - Add a general purpose resource allocator, vmem, from NetBSD. It was
originally inspired by the Solaris vmem detailed in the proceedings
   of usenix 2001.  The NetBSD version was heavily refactored for bugs
   and simplicity.
 - Use this resource allocator to allocate the buffer and transient maps.
   Buffer cache defrags are reduced by 25% when used by filesystems with
   mixed block sizes.  Ultimately this may permit dynamic buffer cache
   sizing on low KVA machines.

Discussed with:	alc, kib, attilio
Tested by:	pho
Sponsored by:	EMC / Isilon Storage Division
2013-06-28 03:51:20 +00:00