freebsd-dev/sys/arm64
Andrew Turner 3b34364450 Only call cpu_icache_sync_range when inserting an executable page. If the
page is non-executable, the contents of the i-cache are unimportant, so this
call just adds unneeded overhead when inserting pages.

While doing research using gem5 with an O3 pipeline and a 1k/32k/1M iTLB/L1
iCache/L2 configuration, Bjoern Zeeb (bz@) observed a fairly high rate of
calls into arm64_icache_sync_range() from pmap_enter(), along with a high
number of instruction fetches and iTLB/iCache hits.

Limiting the calls to arm64_icache_sync_range() to only executable pages,
we observe iTLB and iCache hits going down by about 43%. These numbers
are quite misleading when looked at alone, because at the same time
instructions retired were reduced by 19.2% and instruction fetches were
reduced by 38.8%. Overall this reduced the runtime of the test program by
22.4%.

On Juno hardware, in steady-state, running the same test, using the cycle
count to determine runtime, we do see a reduction of up to 28.9% in runtime.

While these numbers certainly depend on the program executed, we expect an
overall performance improvement.
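
The idea behind the change can be sketched as a guard on the mapping's
protection bits. This is only an illustration under stated assumptions, not
the actual pmap_enter() code: every name below (SKETCH_PROT_*,
sketch_icache_sync_range, sketch_enter_page, sketch_icache_syncs) is
hypothetical, and the "sync" is a counter rather than real cache maintenance.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protection bits; stand-ins, not the FreeBSD definitions. */
#define SKETCH_PROT_READ  0x1
#define SKETCH_PROT_WRITE 0x2
#define SKETCH_PROT_EXEC  0x4

/* Counts how often the (simulated) i-cache sync is invoked. */
static unsigned long sketch_icache_syncs;

/* Stand-in for the i-cache sync routine; it only records the call. */
static void
sketch_icache_sync_range(uintptr_t va, size_t len)
{
	(void)va;
	(void)len;
	sketch_icache_syncs++;
}

/*
 * Simplified page-insertion path: the i-cache is synced only when the
 * new mapping is executable. For non-executable pages the i-cache
 * contents do not matter, so the call is skipped.
 */
static void
sketch_enter_page(uintptr_t va, size_t len, int prot)
{
	if ((prot & SKETCH_PROT_EXEC) != 0)
		sketch_icache_sync_range(va, len);
}
```

In this sketch, entering a read/write data page leaves the counter
untouched, while entering a read/execute page increments it once.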

Reported by:	bz
Obtained from:	ABT Systems Ltd
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2016-09-07 16:22:05 +00:00
..
acpica Add ARM64TODO comments to ACPI PCI stubs 2015-07-12 18:32:16 +00:00
arm64 Only call cpu_icache_sync_range when inserting an executable page. If the 2016-09-07 16:22:05 +00:00
cavium Remove the non-INTRNG support from the ThunderX PCIe drivers. 2016-07-14 17:23:49 +00:00
cloudabi64 Convert pointers obtained from the threadattr_t structure with TO_PTR(). 2016-08-24 10:13:18 +00:00
conf Introduce support for Annapurna Alpine CCU and NB devices 2016-09-07 05:34:41 +00:00
include Add a pc_clock pcpu field and use it to implement cpu_est_clockrate. This 2016-09-02 10:13:51 +00:00