than modulo-8, because clang emits ldrd and strd instructions for addresses that are only 4-byte aligned