o The SDM states that flushing the RSE in the cycle prior to the
call to ia32 code yields the best performance. We don't really
care to much about performance here, but we do the same anyway.
I'm being paranoia and conservative here.
o Only initialize the ia32 state registers, not the registers used
as scratch by the ia32 engine. This saves a couple of loads from
the trapframe, but also helps debugging: we don't clobber useful
debugging data (engineering hints :-)
o Make sure all general registers constituting ia32 state have been
initialized. If there's no useful to be loaded from the trapframe,
clear the register. This avoids accidentally leaking NaT bits.
o Make sure we set ar.k6 prior to clobbering ar.bspstore and also
set ar.k7 prior to setting sp. This fixes a race seen for ia64
native code as well (and previously fixed too).