o Move all code into a single file for easier maintenance.
o Use a single global lock to avoid having to handle either
multiple locks or race conditions.
o Make sure to disable the high FP registers after saving
or dropping them.
o use msleep() to wait for the other CPU to save the high
FP registers.
This change fixes the high FP inconsistency panics.
A single global lock typically serializes too much, which may
be noticable when a lot of threads use the high FP registers,
but in that case it's probably better to switch the high FP
context synchronuously. Put differently: cpu_switch() should
switch the high FP registers if the incoming and outgoing
threads both use the high FP registers.