Alexander Motin 9d9fd8b79f Micro-optimize OOA queue processing.
- Move ctl_get_cmd_entry() calls from every OOA traversal to when
  the requests first inserted, storing seridx in struct ctl_scsiio.
- Move some checks out of the loop in ctl_check_ooa().
- Replace checks for errors that can not happen with asserts.
- Transpose ctl_serialize_table, so that any OOA traversal accessed
  only one row (cache line).  Compact it from enum to uint8_t.
- Optimize static branch predictions in hottest places.

Due to O(n) nature on deep LUN queues this can be the hottest code
path in CTL, and additional 20% of IOPS I see in some 4KB I/O tests
are good to have in reserve.  About 50% of CPU time here according
to the profiles is now spent in two memory accesses per traversed
request in OOA.

Sponsored by:	iXsystems, Inc.
MFC after:	2 weeks
2021-02-27 10:40:24 -05:00
..
2021-01-02 19:57:58 -07:00
2021-02-27 10:40:24 -05:00
2021-02-25 21:43:12 -04:00
2021-01-15 20:09:55 +01:00
2021-02-17 16:32:11 -08:00
2021-02-23 17:47:07 +00:00
2021-02-24 22:45:24 +02:00