freebsd-dev/sys/amd64
Bruce Evans d5c90663b2 Don't use plain "ret" instructions at targets of jump instructions,
since the branch caches on at least Athlon XP through Athlon 64 CPU's
don't understand such instructions and guarantee a cache miss taking
at least 10 cycles.  Use the documented workaround "ret $0" instead
("nop; ret" also works, but "ret $0" is probably faster on old CPUs).

Normal code (even asm code) doesn't branch to "ret", since there is
usually some cleanup to do, but the __mcount, .mcount and .mexitcount
entry points were optimized too well to have the minimum number of
instructions (3 instructions each if profiling is not enabled) and
they did this.  I didn't see a significant number of cache misses for
.mexitcount, but for the shared "ret" for __mcount and .mcount I
observed cache misses costing 26 cycles each.  For a send(2) syscall
that makes about 70 function calls, the cost of these cache misses
alone increased the syscall time from about 4000 cycles to about 7000
cycles.  4000 is for a profiling (GUPROF) kernel with profiling disabled;
after this fix, configuring profiling only costs about 600 cycles in the
4000, which is consistent with almost perfect branch prediction in the
mcounting calls.
2007-11-29 02:01:21 +00:00
..
acpica Handle CPUs with APIC IDs higher than 32 (at least one IBM server uses 2007-05-08 22:01:04 +00:00
amd64 Don't use plain "ret" instructions at targets of jump instructions, 2007-11-29 02:01:21 +00:00
compile
conf Make ADAPTIVE_GIANT as the default in the kernel and remove the option. 2007-11-28 05:50:45 +00:00
ia32 Optimize vmmeter locking. 2007-06-10 21:59:14 +00:00
include Adjust the code to probe for the PCI config mechanism to use. 2007-11-28 22:20:08 +00:00
isa Split /dev/nvram driver out of isa/clock.c for i386 and amd64. I have not 2007-10-26 03:23:54 +00:00
linux32 Fill in cr2 in the signal context from ksi->ksi_addr. 2007-09-20 13:46:26 +00:00
pci Adjust the code to probe for the PCI config mechanism to use. 2007-11-28 22:20:08 +00:00
Makefile