sure to clear the lower 12 bits. We add the translation
attributes to the physical address, so any non-zero bits in the
lower 12 bits would produce an unexpected result, including invalid
bit values. Those trigger nested general protection faults.
We do not have to clear the region bits, because they are ignored
anyway, so we can replace an existing dep instruction with the one
we need.
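
As a rough illustration only (the actual change is a single dep
instruction in the assembly handler, not C), the effect of clearing
the lower 12 bits before combining in the attributes can be sketched
as follows. The function names, the mask macro and the attribute value
0x661 are hypothetical and chosen purely for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the lower 12 bits that hold the translation attributes. */
#define LOW12_MASK 0xfffULL

/*
 * Hypothetical helper: clear the lower 12 bits of the address first,
 * then add the attributes. Upper (region) bits are left untouched,
 * since they are ignored anyway.
 */
static uint64_t
make_translation(uint64_t pa, uint64_t attrs)
{
	assert((attrs & ~LOW12_MASK) == 0);	/* attributes fit in 12 bits */
	return ((pa & ~LOW12_MASK) + attrs);
}

/*
 * Hypothetical model of the faulty behaviour: the attributes are added
 * without clearing the lower 12 bits, so leftover offset bits corrupt
 * the attribute field and can even carry into the frame number.
 */
static uint64_t
make_translation_unmasked(uint64_t pa, uint64_t attrs)
{
	return (pa + attrs);
}
```

With an address of 0x4001234 and attributes of 0x661, the masked
version yields 0x4001661, while the unmasked version yields 0x4001895,
whose lower 12 bits no longer match the intended attributes.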
This fixes GP faults for the swapper thread, as it's the only thread
that has a direct-mapped stack. Since the bug is in the nested TLB
fault handler, the GP fault is hit only rarely: on the order of
hours to days under load.