freebsd-dev/sys/x86
Scott Long 96d67b46df Old PCIe implementations cannot allow a DMA transfer to cross a 4GB
boundary.  This was addressed several years ago by creating a parent
tag hierarchy for the root buses that set the boundary restriction
for appropriate buses and allowed child deviced to inherit it.
Somewhere along the way, this restriction was turned into a case for
marking the tag as a candidate for needing bounce buffers, instead
of just splitting the segment along the boundary line.  This flag
also causes all maps associated with this tag to be non-NULL, which
in turn causes bus_dmamap_sync() to take the slow path of function
pointer indirection to discover that there's no bouncing work to
do.  The end result is a lot of pages set aside in bounce pools
that will never be used, and a slow path for data buffers in nearly
every DMA-capable PCIe device.  For example, our workload at Netflix
was spending nearly 1% of all CPU time going through this slow path.

Fix this problem by being more selective about when to set the
COULD_BOUNCE flag.  Only set it when the boundary restriction
exists and the consumer cannot do more than a single DMA segment
at once.  This fixes the case of dynamic buffers (mbufs, bio's)
but doesn't address static buffers allocated from bus_dmamem_alloc().
That case will be addressed in the future.

For those interested, this was discovered thanks to Dtrace Flame
Graphs.

Discussed with: jhb, kib
Obtained from:	Netflix, Inc.
MFC after:	3 days
2014-05-20 22:43:17 +00:00
..
acpica Change default logic to CONFORM because this routine is shared 2014-03-28 02:38:14 +00:00
bios Add missing header needed by free(9). 2012-09-30 15:42:20 +00:00
cpufreq Do not DELAY() for P-state transition unless we want to see the result. 2013-12-10 20:25:43 +00:00
include Add definitions for more structured extended features as well as 2014-05-16 17:45:09 +00:00
iommu Re-implement the DMAR I/O MMU code in terms of PCI RIDs 2014-04-01 15:48:46 +00:00
isa Remove vestiges of knowing the ISA bus, which we gave up on around 20 2014-03-19 21:03:04 +00:00
pci Add support for managing PCI bus numbers. As with BARs and PCI-PCI bridge 2014-02-12 04:30:37 +00:00
x86 Old PCIe implementations cannot allow a DMA transfer to cross a 4GB 2014-05-20 22:43:17 +00:00
xen Make this compile with gcc. 2014-04-05 22:43:18 +00:00