Allow skipping dmu_buf_will_dirty() call in dsl_dir_transfer_space().
dsl_dir_transfer_space() is mostly called after dsl_dir_diduse_space(),
which already calls dmu_buf_will_dirty() for the same dbuf and tx, so
its duplicate call in those cases will change nothing, only spend time.
Skipping this call by four times reduces time spent in dbuf_write_done()
and descendants, updating dataset statistics with several congested lock
acquisitions. When rewriting 8K zvol blocks at 1GB/s rate, this reduces
CPU time spent inside dbuf_write_done(), according to profiling, from 45%
of 683K samples to 18% of 422K.