riscv: Fix pindex level confusion

The pindex values are assigned from the L3 leaves upwards, meaning there
are NUL2E L3 tables and then NUL1E L2 tables (with a further NUL0E L1
tables in future when we implement Sv48 support). Therefore anything
below NUL2E is an L3 table's page and anything above or equal to NUL2E
is an L2 table's page (with the threshold of NUL2E + NUL1E marking the
start of the L1 tables' pages in Sv48). Thus all the comparisons and
arithmetic operations must use NUL2E to handle the L3/L2 allocation (and
thus L2/L1 entry) transition point, not NUL1E as all but pmap_alloc_l2
were doing.
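
(For illustration only, not part of the change: a standalone sketch of
the pindex numbering described above. Ln_ENTRIES and the NULxE values
mirror the pmap's 4-level-based definitions; the classify() helper is
purely hypothetical.)

    #include <stdio.h>

    #define Ln_ENTRIES      512L
    #define NUL1E           (Ln_ENTRIES * Ln_ENTRIES)   /* number of L2 tables */
    #define NUL2E           (Ln_ENTRIES * NUL1E)        /* number of L3 tables */

    /* Hypothetical helper: which kind of page-table page a pindex names. */
    static const char *
    classify(long pindex)
    {
            if (pindex < NUL2E)
                    return ("L3 table page");       /* [0, NUL2E) */
            if (pindex < NUL2E + NUL1E)
                    return ("L2 table page");       /* [NUL2E, NUL2E + NUL1E) */
            return ("L1 table page");               /* NUL2E + NUL1E up, Sv48 only */
    }

    int
    main(void)
    {
            printf("%s\n", classify(NUL2E - 1));     /* last L3 table page */
            printf("%s\n", classify(NUL2E));         /* first L2 table page */
            printf("%s\n", classify(NUL2E + NUL1E)); /* first L1 table page (Sv48) */
            return (0);
    }

classify(NUL2E - 1) is the last L3 table's page and classify(NUL2E) the
first L2 table's page: exactly the transition point the diff below
corrects.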

To make matters more confusing, the NUL1E and NUL2E definitions in the RISC-V
pmap are based on a 4-level page hierarchy but we currently use the
3-level Sv39 format (as that's the only required one, and hardware
support for the 4-level Sv48 is not widespread). This means that, in
effect, the above bug cancels out with the bloated NULxE definitions
such that things "work" (but are still technically wrong, and thus would
break when adding Sv48 support), with one exception. pmap_enter_l2 is
currently the only function to use the correct constant, but since
_pmap_alloc_l3 uses the incorrect constant, it will do complete nonsense
when it needs to allocate a new L2 table (which is rather rare). In this
instance, _pmap_alloc_l3, whilst it would correctly determine the pindex
was for an L2 table, would only subtract NUL1E when computing l1index
and thus go way out of bounds (by 511*512*512 entries, or just under 1
GiB given 8-byte entries) of
its own L1 table and, thanks to pmap_distribute_l1, of every other
pmap's L1 table in the whole system. This has likely never been hit as
it would presumably instantly fault and panic.
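
(Again purely illustrative, using the same locally-defined constants as
the sketch above: the arithmetic behind the out-of-bounds figure, with
pd_entry_t being 8 bytes on RV64.)

    #include <stdio.h>

    #define Ln_ENTRIES      512L
    #define NUL1E           (Ln_ENTRIES * Ln_ENTRIES)
    #define NUL2E           (Ln_ENTRIES * NUL1E)

    int
    main(void)
    {
            long pindex = NUL2E;            /* pindex of the first L2 table's page */
            long bad = pindex - NUL1E;      /* what _pmap_alloc_l3 computed */
            long good = pindex - NUL2E;     /* what the fix computes */

            /* bad == 511*512*512 == 133955584; an L1 table has only 512 entries. */
            printf("bad l1index = %ld (~%ld MiB past pm_l1 at 8 bytes/entry)\n",
                bad, bad * 8 / (1024 * 1024));
            printf("good l1index = %ld\n", good);
            return (0);
    }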

Reviewed by:	markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D31087
Author:	Jessica Clarke
Date:	2021-07-21 02:47:01 +01:00
Parent:	a1f9cdb1ab
Commit:	ade2ea3c45

sys/riscv/riscv/pmap.c

@@ -1132,7 +1132,7 @@ _pmap_unwire_ptp(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free)
         vm_paddr_t phys;
 
         PMAP_LOCK_ASSERT(pmap, MA_OWNED);
-        if (m->pindex >= NUL1E) {
+        if (m->pindex >= NUL2E) {
                 pd_entry_t *l1;
                 l1 = pmap_l1(pmap, va);
                 pmap_clear(l1);
@@ -1143,7 +1143,7 @@ _pmap_unwire_ptp(pmap_t pmap, vm_offset_t va, vm_page_t m, struct spglist *free)
                 pmap_clear(l2);
         }
         pmap_resident_count_dec(pmap, 1);
-        if (m->pindex < NUL1E) {
+        if (m->pindex < NUL2E) {
                 pd_entry_t *l1;
                 vm_page_t pdpg;
 
@@ -1279,11 +1279,11 @@ _pmap_alloc_l3(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp)
          * it isn't already there.
          */
-        if (ptepindex >= NUL1E) {
+        if (ptepindex >= NUL2E) {
                 pd_entry_t *l1;
                 vm_pindex_t l1index;
 
-                l1index = ptepindex - NUL1E;
+                l1index = ptepindex - NUL2E;
                 l1 = &pmap->pm_l1[l1index];
                 KASSERT((pmap_load(l1) & PTE_V) == 0,
                     ("%s: L1 entry %#lx is valid", __func__, pmap_load(l1)));
 
@@ -1301,7 +1301,7 @@ _pmap_alloc_l3(pmap_t pmap, vm_pindex_t ptepindex, struct rwlock **lockp)
                 l1 = &pmap->pm_l1[l1index];
                 if (pmap_load(l1) == 0) {
                         /* recurse for allocating page dir */
-                        if (_pmap_alloc_l3(pmap, NUL1E + l1index,
+                        if (_pmap_alloc_l3(pmap, NUL2E + l1index,
                             lockp) == NULL) {
                                 vm_page_unwire_noq(m);
                                 vm_page_free_zero(m);