ACPI C3 ends up doing a lot more work before entering sleep, some of which
requires grabbing a global ACPI hardware serialising mutex.
Because of this, the more CPU cores you have, the more that lock contends
under load, reaching close to the #1 lock contention (after VM, which is being
worked on.)
Tested:
* Sandy bridge Xeon, 2 socket * 8 core
* Ivy bridge Xeon v2, 2 socket * 8 core
* Westmere-EX, 4 socket * 10 core
* Ivybridge desktop
* Sandybridge mobile
* Ivybridge mobile
MFC after: 2 weeks