Fix spa_alloc_tree sorting by offset in r305331.

Original commit "7090 zfs should improve allocation order" declares alloc
queue sorted by time and offset.  But in practice io_offset is always zero,
so sorting happened only by time, while order of writes with equal time was
completely random.  On Illumos this did not affected much thanks to using
high resolution timestamps.  On FreeBSD due to using much faster but low
resolution timestamps it caused bad data placement on disks, affecting
further read performance.

This change switches zio_timestamp_compare() from comparing uninitialized
io_offset to really populated io_bookmark values.  I haven't decided yet
what to do with timestampts, but on simple tests this change gives the
same peformance results by just making code to work as declared.

MFC after:	1 week
This commit is contained in:
mav 2016-12-08 15:58:03 +00:00
parent b7561e3695
commit 59a1ac276a

View File

@ -563,9 +563,24 @@ zio_timestamp_compare(const void *x1, const void *x2)
if (z1->io_queued_timestamp > z2->io_queued_timestamp)
return (1);
if (z1->io_offset < z2->io_offset)
if (z1->io_bookmark.zb_objset < z2->io_bookmark.zb_objset)
return (-1);
if (z1->io_offset > z2->io_offset)
if (z1->io_bookmark.zb_objset > z2->io_bookmark.zb_objset)
return (1);
if (z1->io_bookmark.zb_object < z2->io_bookmark.zb_object)
return (-1);
if (z1->io_bookmark.zb_object > z2->io_bookmark.zb_object)
return (1);
if (z1->io_bookmark.zb_level < z2->io_bookmark.zb_level)
return (-1);
if (z1->io_bookmark.zb_level > z2->io_bookmark.zb_level)
return (1);
if (z1->io_bookmark.zb_blkid < z2->io_bookmark.zb_blkid)
return (-1);
if (z1->io_bookmark.zb_blkid > z2->io_bookmark.zb_blkid)
return (1);
if (z1 < z2)