Instead of collecting statistics for each combination of ports and logical
units, that consumed ~45KB per LU with present number of ports, collect
separate statistics for every port and every logical unit separately, that
consume only 176 bytes per each single LU/port. This reduces struct
ctl_lun size down to just 6KB.
Also new IOCTL API/ABI does not hardcode number of LUs/ports, and should
allow handling of very large quantities.
MFC after: 2 weeks (probably keeping old API enabled for some time)