96d67b46df
boundary. This was addressed several years ago by creating a parent tag hierarchy for the root buses that set the boundary restriction for appropriate buses and allowed child deviced to inherit it. Somewhere along the way, this restriction was turned into a case for marking the tag as a candidate for needing bounce buffers, instead of just splitting the segment along the boundary line. This flag also causes all maps associated with this tag to be non-NULL, which in turn causes bus_dmamap_sync() to take the slow path of function pointer indirection to discover that there's no bouncing work to do. The end result is a lot of pages set aside in bounce pools that will never be used, and a slow path for data buffers in nearly every DMA-capable PCIe device. For example, our workload at Netflix was spending nearly 1% of all CPU time going through this slow path. Fix this problem by being more selective about when to set the COULD_BOUNCE flag. Only set it when the boundary restriction exists and the consumer cannot do more than a single DMA segment at once. This fixes the case of dynamic buffers (mbufs, bio's) but doesn't address static buffers allocated from bus_dmamem_alloc(). That case will be addressed in the future. For those interested, this was discovered thanks to Dtrace Flame Graphs. Discussed with: jhb, kib Obtained from: Netflix, Inc. MFC after: 3 days