1/4 of the number of queues times the number of queue entries is too limiting. It tops out at about 4k IOPS / 3.0GB/s on hardware that can do 4.4k IOPS / 3.2GB/s with nvd. 3/4 works better, though it highlights fairness issues in how nda chooses between TRIM and READ. Those will be fixed separately.
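
For illustration only, the arithmetic behind the two fractions can be sketched as below. This is not the driver's code: slot_cap() and its parameter names are hypothetical, and the queue geometry in the note that follows is an assumed example rather than a measured configuration.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the driver): compute a limit as
 * the fraction num/den of the total slots, where total slots is the
 * number of queues times the entries per queue.
 */
static uint32_t
slot_cap(uint32_t num_queues, uint32_t entries_per_queue,
    uint32_t num, uint32_t den)
{
	/* Widen before multiplying so the product cannot overflow. */
	uint64_t total = (uint64_t)num_queues * entries_per_queue;

	/* Integer division truncates toward zero. */
	return ((uint32_t)(total * num / den));
}
```

Assuming 8 queues of 256 entries each (2048 slots in total), slot_cap(8, 256, 1, 4) yields 512 and slot_cap(8, 256, 3, 4) yields 1536, so moving from 1/4 to 3/4 triples the limit.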