scottl fc9dad2dcf The run_filter() procedure is a means of working around DMA engine bugs in
old/broken hardware.  Unfortunately, it adds cache pressure and possible
mispredicted branches to the fast path of the bus_dmamap_load collection of
functions.  Since it's meant for slow path exception processing, de-inline
it and allow its conditions to be pre-computed at tag_create time and thus
short-circuited at runtime.

While here, cut down on the size of _bus_dmamap_load_buffer() by pushing the
bounce page logic into a non-inlined function.  Again, this helps with
cache pressure and mispredicted branches.

According to the TSC, this shaves off a few cycles on average.  Unfortunately,
the data varies quite a bit due to interrupts and preemption, so it's hard to
get a good measurement.  Real world measurements of network PPS are welcomed.
A merge to amd64 and other arches is pending more testing.
2006-09-11 06:48:53 +00:00
..
2006-08-10 06:04:00 +00:00
2006-09-03 22:24:08 +00:00