- the mutex aac_io_lock protects the main codepaths which handle queues and
hardware registers. Only one acquire/release is done in the top-half and
the taskqueue. This mutex also applies to the userland command path and
CAM data path.
- Move the taskqueue to the new Giant-free version.
- Register the disk device with DISKFLAG_NOGIANT so the top-half processing
runs without Giant.
- Move the dynamic command allocator to the worker thread to avoid locking
issues with bus_dmamem_alloc().
This gives about 20% improvement in most of my benchmarks.