d5c90663b2
Don't use plain "ret" instructions at targets of branch instructions, since the branch caches on at least Athlon XP through Athlon 64 CPUs don't understand such instructions and guarantee a cache miss taking at least 10 cycles. Use the documented workaround "ret $0" instead ("nop; ret" also works, but "ret $0" is probably faster on old CPUs).

Normal code (even asm code) doesn't branch to "ret", since there is usually some cleanup to do, but the __mcount, .mcount and .mexitcount entry points were optimized too well to have the minimum number of instructions (3 instructions each if profiling is not enabled), and they did exactly this.

I didn't see a significant number of cache misses for .mexitcount, but for the shared "ret" for __mcount and .mcount I observed cache misses costing 26 cycles each. For a send(2) syscall that makes about 70 function calls, the cost of these cache misses alone increased the syscall time from about 4000 cycles to about 7000 cycles. 4000 is for a profiling (GUPROF) kernel with profiling disabled; after this fix, configuring profiling only costs about 600 cycles of the 4000, which is consistent with almost perfect branch prediction in the mcounting calls.
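
For illustration, here is a minimal sketch of the pattern being fixed. This is a hypothetical GAS-syntax stub, not the actual FreeBSD source; the names mcount_stub and profiling_enabled are made up. When profiling is off, the conditional branch jumps straight to the return instruction, which is exactly the case the workaround addresses:

	/* prof_stub.S -- hypothetical sketch (assumes x86-64, GAS syntax).
	 * The fast path is compare, branch, return: three instructions,
	 * with the branch targeting the return directly. */
		.data
	profiling_enabled:
		.long	0
		.text
		.globl	mcount_stub
		.type	mcount_stub,@function
	mcount_stub:
		cmpl	$0, profiling_enabled(%rip)	/* profiling on? */
		je	1f		/* no: branch straight to the return */
		/* ... profiling work would go here ... */
	1:
		ret	$0	/* "ret $0" (near return popping 0 extra
				 * bytes) is semantically identical to a
				 * plain "ret" but uses a different encoding,
				 * so the Athlon XP/Athlon 64 branch-target
				 * cache can handle it as a branch target */
		.size	mcount_stub, . - mcount_stub
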