Fix hangs with processes stuck sleeping on btalloc on i386.

r358097 introduced a problem on i386, where kernel builds intermittently
hang, typically with many processes sleeping on "btalloc".
I know nothing about VM, but received assistance from rlibby@ and markj@.

rlibby@ stated the following:
   It looks like the problem is that
   for systems that do not have UMA_MD_SMALL_ALLOC, we do
           uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc);
   but we haven't set an appropriate free function.  This is probably why
   UMA_ZONE_NOFREE was originally there.  When NOFREE was removed, it was
   appropriate for systems with uma_small_alloc.

   So by default we get page_free as our free function.  That calls
   kmem_free, which calls vmem_free ... but we do our allocs with
   vmem_xalloc.  I'm not positive, but I think the problem is that in
   effect we vmem_xalloc -> vmem_free, not vmem_xfree.

   Three possible fixes:
    1: The one you tested, but this is not best for systems with
       uma_small_alloc.
    2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC.
    3: Actually provide an appropriate vmem_bt_free function.

   I think we should just do option 2 with a comment, it's simple and it's
   what we used to do.  I'm not sure how much benefit we would see from
   option 3, but it's more work.

This patch implements option 2. I have not added a comment, since I do not
understand the underlying problem well enough to describe it.

markj@ noted the following:
   I think the suggested patch is ok, but not for the reason stated.
   On platforms without a direct map the problem is: to allocate btags
   we need a slab, and to allocate a slab we need to map a page, and to
   map a page we need to allocate btags.

   We handle this recursion using a custom slab allocator which specifies
   M_USE_RESERVE, allowing it to dip into a reserve of free btags.
   Because the returned slab can be used to keep the reserve populated,
   this ensures that there are always enough free btags available to
   handle the recursion.

   UMA_ZONE_NOFREE ensures that we never reclaim free slabs from the zone.
   However, when it was removed, an apparent bug in UMA was exposed:
   keg_drain() ignores the reservation set by uma_zone_reserve() in
   vmem_startup().  So under memory pressure we reclaim the free btags
   that are needed to break the recursion.  That's why adding _NOFREE
   back fixes the problem: it disables the reclamation.

   We could perhaps fix it more cleverly, by modifying keg_drain() to always
   leave uk_reserve slabs available.
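
The difference between a drain that ignores the reserve and one that
honors it can be sketched with a userspace toy model. This is not UMA
code: struct toy_keg and both toy_keg_drain_* functions are hypothetical
names invented for illustration, and the counters are abstract units
rather than real slabs or btags. It only shows the arithmetic of the idea
markj@ describes, under the assumption that a drain could simply stop
once the free count reaches the reserve.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a keg; NOT the real struct uma_keg. */
struct toy_keg {
	size_t free_slabs;	/* units currently free and reclaimable */
	size_t reserve;		/* units that should survive a drain */
};

/*
 * Drain that ignores the reserve.  This models the behavior described
 * above as the apparent bug: everything free is reclaimed.
 */
static size_t
toy_keg_drain_all(struct toy_keg *keg)
{
	size_t reclaimed;

	assert(keg != NULL);
	reclaimed = keg->free_slabs;
	keg->free_slabs = 0;
	return (reclaimed);
}

/*
 * Drain that always leaves at least `reserve` units behind.  This models
 * the "more clever" fix suggested above, in the simplest possible form.
 */
static size_t
toy_keg_drain_keep_reserve(struct toy_keg *keg)
{
	size_t reclaimed;

	assert(keg != NULL);
	if (keg->free_slabs <= keg->reserve)
		return (0);	/* nothing may be reclaimed */
	reclaimed = keg->free_slabs - keg->reserve;
	keg->free_slabs = keg->reserve;
	return (reclaimed);
}
```

With 10 free units and a reserve of 4, the first function leaves nothing
behind, while the second reclaims 6 and keeps 4 available to break the
recursion. UMA_ZONE_NOFREE, as committed here, sidesteps the question by
never draining the zone at all.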

markj@'s initial patch failed testing, so committing this patch was agreed
upon as the interim solution.
Either rlibby@ or markj@ might choose to add a comment to it.

PR:		248008
Reviewed by:	rlibby, markj
rmacklem 2020-08-25 00:58:14 +00:00
parent 2b8ca65146
commit f36734fd6a

@@ -668,10 +668,14 @@ vmem_startup(void)
 	vmem_zone = uma_zcreate("vmem",
 	    sizeof(struct vmem), NULL, NULL, NULL, NULL,
 	    UMA_ALIGN_PTR, 0);
+#ifdef UMA_MD_SMALL_ALLOC
 	vmem_bt_zone = uma_zcreate("vmem btag",
 	    sizeof(struct vmem_btag), NULL, NULL, NULL, NULL,
 	    UMA_ALIGN_PTR, UMA_ZONE_VM);
-#ifndef UMA_MD_SMALL_ALLOC
+#else
+	vmem_bt_zone = uma_zcreate("vmem btag",
+	    sizeof(struct vmem_btag), NULL, NULL, NULL, NULL,
+	    UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE);
 	mtx_init(&vmem_bt_lock, "btag lock", NULL, MTX_DEF);
 	uma_prealloc(vmem_bt_zone, BT_MAXALLOC);
 	/*