4bde709fd3
page is non-executable the contents of the i-cache are unimportant so this call is just adding unneeded overhead when inserting pages. While doing research using gem5 with an O3 pipeline and 1k/32k/1M iTLB/L1 iCache/L2 Bjoern Zeeb (bz@) observed a fairly high rate of calls into arm64_icache_sync_range() from pmap_enter() along with a high number of instruction fetches and iTLB/iCache hits. Limiting the calls to arm64_icache_sync_range() to only executable pages, we observe the iTLB and iCache Hit going down by about 43%. These numbers are quite misleading when looked at alone as at the same time instructions retired were reduced by 19.2% and instruction fetches were reduced by 38.8%. Overall this reduced the runtime of the test program by 22.4%. On Juno hardware, in steady-state, running the same test, using the cycle count to determine runtime, we do see a reduction of up to 28.9% in runtime. While these numbers certainly depend on the program executed, we expect an overall performance improvement. Reported by: bz Obtained from: ABT Systems Ltd MFC after: 1 week Sponsored by: The FreeBSD Foundation |
||
---|---|---|
.. | ||
amd64 | ||
arm | ||
arm64 | ||
boot | ||
bsm | ||
cam | ||
cddl | ||
compat | ||
conf | ||
contrib | ||
crypto | ||
ddb | ||
dev | ||
fs | ||
gdb | ||
geom | ||
gnu | ||
i386 | ||
isa | ||
kern | ||
kgssapi | ||
libkern | ||
mips | ||
modules | ||
net | ||
net80211 | ||
netgraph | ||
netinet | ||
netinet6 | ||
netipsec | ||
netnatm | ||
netpfil | ||
netsmb | ||
nfs | ||
nfsclient | ||
nfsserver | ||
nlm | ||
ofed | ||
opencrypto | ||
pc98 | ||
powerpc | ||
riscv | ||
rpc | ||
security | ||
sparc64 | ||
sys | ||
teken | ||
tests | ||
tools | ||
ufs | ||
vm | ||
x86 | ||
xdr | ||
xen | ||
Makefile |