From f36734fd6aa674c47688b4e97862f789f50ed1d5 Mon Sep 17 00:00:00 2001 From: rmacklem Date: Tue, 25 Aug 2020 00:58:14 +0000 Subject: [PATCH] Fix hangs with processes stuck sleeping on btalloc on i386. r358097 introduced a problem for i386, where kernel builds will intermittently get hung, typically with many processes sleeping on "btalloc". I know nothing about VM, but received assistance from rlibby@ and markj@. rlibby@ stated the following: It looks like the problem is that for systems that do not have UMA_MD_SMALL_ALLOC, we do uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc); but we haven't set an appropriate free function. This is probably why UMA_ZONE_NOFREE was originally there. When NOFREE was removed, it was appropriate for systems with uma_small_alloc. So by default we get page_free as our free function. That calls kmem_free, which calls vmem_free ... but we do our allocs with vmem_xalloc. I'm not positive, but I think the problem is that in effect we vmem_xalloc -> vmem_free, not vmem_xfree. Three possible fixes: 1: The one you tested, but this is not best for systems with uma_small_alloc. 2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC. 3: Actually provide an appropriate vmem_bt_free function. I think we should just do option 2 with a comment, it's simple and it's what we used to do. I'm not sure how much benefit we would see from option 3, but it's more work. This patch implements #2. I haven't done a comment, since I don't know what the problem is. markj@ noted the following: I think the suggested patch is ok, but not for the reason stated. On platforms without a direct map the problem is: to allocate btags we need a slab, and to allocate a slab we need to map a page, and to map a page we need to allocate btags. We handle this recursion using a custom slab allocator which specifies M_USE_RESERVE, allowing it to dip into a reserve of free btags. Because the returned slab can be used to keep the reserve populated, this ensures that there are always enough free btags available to handle the recursion. UMA_ZONE_NOFREE ensures that we never reclaim free slabs from the zone. However, when it was removed, an apparent bug in UMA was exposed: keg_drain() ignores the reservation set by uma_zone_reserve() in vmem_startup(). So under memory pressure we reclaim the free btags that are needed to break the recursion. That's why adding _NOFREE back fixes the problem: it disables the reclamation. We could perhaps fix it more cleverly, by modifying keg_drain() to always leave uk_reserve slabs available. markj@'s initial patch failed testing, so committing this patch was agreed upon as the interim solution. Either rlibby@ or markj@ might choose to add a comment to it. PR: 248008 Reviewed by: rlibby, markj --- sys/kern/subr_vmem.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/sys/kern/subr_vmem.c b/sys/kern/subr_vmem.c index 57fdb2101dd7..2c8da6634d7a 100644 --- a/sys/kern/subr_vmem.c +++ b/sys/kern/subr_vmem.c @@ -668,10 +668,14 @@ vmem_startup(void) vmem_zone = uma_zcreate("vmem", sizeof(struct vmem), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); +#ifdef UMA_MD_SMALL_ALLOC vmem_bt_zone = uma_zcreate("vmem btag", sizeof(struct vmem_btag), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_VM); -#ifndef UMA_MD_SMALL_ALLOC +#else + vmem_bt_zone = uma_zcreate("vmem btag", + sizeof(struct vmem_btag), NULL, NULL, NULL, NULL, + UMA_ALIGN_PTR, UMA_ZONE_VM | UMA_ZONE_NOFREE); mtx_init(&vmem_bt_lock, "btag lock", NULL, MTX_DEF); uma_prealloc(vmem_bt_zone, BT_MAXALLOC); /*