Improve the performance of the arm64 thread switching code.

The full system memory barrier around a TLB invalidation is stricter than required. The barrier only needs to wait on accesses to main memory, with just the weaker store variant needed before the invalidate. As such, use the dsb ishst, tlbi, dsb ish sequence already used in pmap.

The tlbi instruction in this sequence also unnecessarily uses a broadcast invalidate when it only needs to invalidate the local CPU's TLB. Switch to the non-broadcast variant of this instruction.

Sponsored by: DARPA, AFRL

parent 8cf606a4d6
commit cbf2160e81

@@ -91,9 +91,9 @@ ENTRY(cpu_throw)
 	isb
 
 	/* Invalidate the TLB */
-	dsb sy
-	tlbi vmalle1is
-	dsb sy
+	dsb ishst
+	tlbi vmalle1
+	dsb ish
 	isb
 
 	/* If we are single stepping, enable it */
@@ -192,9 +192,9 @@ ENTRY(cpu_switch)
 	isb
 
 	/* Invalidate the TLB */
-	dsb sy
-	tlbi vmalle1is
-	dsb sy
+	dsb ishst
+	tlbi vmalle1
+	dsb ish
 	isb
 
 	/*