Commit Graph

36 Commits

Author SHA1 Message Date
Jung-uk Kim
3453537fa5 Use atomic load & store for TSC frequency. It may be overkill for amd64 but
safer for i386 because it can be easily over 4 GHz now.  More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly).  Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
2011-04-07 23:28:28 +00:00
Jung-uk Kim
bc34c87e81 Deprecate rarely used tsc_is_broken. Instead, we zero out tsc_freq because
it is almost always used with tsc_freq any way.
2011-03-10 20:02:58 +00:00
Jung-uk Kim
29462bea1e Turn off CPU frequency change notifiers when the TSC is P-state invariant
or it is forced by setting 'kern.timecounter.invariant_tsc' tunable
to non-zero.
2008-10-21 00:38:00 +00:00
Poul-Henning Kamp
ebfbcd612a Rename timer0_max_count to i8254_max_count.
Rename timer0_real_max_count to i8254_real_max_count and make it static.
Rename timer_freq to i8254_freq and make it a loader tunable.
2008-03-26 15:03:24 +00:00
Bruce Evans
d5c90663b2 Don't use plain "ret" instructions at targets of jump instructions,
since the branch caches on at least Athlon XP through Athlon 64 CPU's
don't understand such instructions and guarantee a cache miss taking
at least 10 cycles.  Use the documented workaround "ret $0" instead
("nop; ret" also works, but "ret $0" is probably faster on old CPUs).

Normal code (even asm code) doesn't branch to "ret", since there is
usually some cleanup to do, but the __mcount, .mcount and .mexitcount
entry points were optimized too well to have the minimum number of
instructions (3 instructions each if profiling is not enabled) and
they did this.  I didn't see a significant number of cache misses for
.mexitcount, but for the shared "ret" for __mcount and .mcount I
observed cache misses costing 26 cycles each.  For a send(2) syscall
that makes about 70 function calls, the cost of these cache misses
alone increased the syscall time from about 4000 cycles to about 7000
cycles.  4000 is for a profiling (GUPROF) kernel with profiling disabled;
after this fix, configuring profiling only costs about 600 cycles in the
4000, which is consistent with almost perfect branch prediction in the
mcounting calls.
2007-11-29 02:01:21 +00:00
Bruce Evans
7e7c8806bf Remove entry points for -finstrument functions since they are currently
unused except to obfuscate disassemblies.  -mprofiler-epilogue is
currently with gcc-4 (it does too little), but -finstrument-functions
is broken in a different way (it does too much).

amd64 version: meger whitespace fixes from i386 version.
2007-11-29 01:15:03 +00:00
Nate Lawson
0d4ac62a35 Add an interface for drivers to be notified of changes to CPU frequency.
cpufreq_pre_change is called before the change, giving each driver a chance
to revoke the change.  cpufreq_post_change provides the results of the
change (success or failure).  cpufreq_levels_changed gives the unit number
of the cpufreq device whose number of available levels has changed.  Hook
in all the drivers I could find that needed it.

* TSC: update TSC frequency value.  When the available levels change, take the
highest possible level and notify the timecounter set_cputicker() of that
freq.  This gets rid of the "calcru: runtime went backwards" messages.
* identcpu: updates the sysctl hw.clockrate value
* Profiling: if profiling is active when the clock changes, let the user
know the results may be inaccurate.

Reviewed by:	bde, phk
MFC after:	1 month
2007-03-26 18:03:29 +00:00
Bruce Evans
6a70163fcc Removed some SMP ifdefs so that using the TSC as a cputime clock is
not completely decided at config time.  Just don't default to using
the TSC if there are multiple active CPUs.  Also, don't default to
using the TSC if it is broken.  SMP ifdefs are still used to disallow
using perfmon since perfmon is always broken if SMP is just configured.

This only helps much for SMP kernels running on 1 CPU.  The overheads
for using the i8254 cputime clock were a bit too high on 486/33's, and
now on multi-GHz CPUs they are usually in the 99-99.9% range.  Switching
from the old default of an i8254 clock to the TSC works poorly because
the overheads are not recalibrated.

Use the same condition for declaring perfmon stuff as for using it.
2006-10-29 09:48:44 +00:00
Bruce Evans
3a110062fd Cleaned up includes. <machine/profile.h> was unused. <machine/timerreg.h>
was only used in the GUPROF case, so the messes to get its i386 prerequisites
included shouldn't have been needed.

Fixed some style bugs. Quote #error contents, and don't repeat an #error
directive on amd64.
2006-10-28 06:38:51 +00:00
Bruce Evans
94450a83e8 Removed all traces of HIDENAME() in amd64 and i386 kernel code. Using
this used to be slightly cleaner than using ifdefs in a few places to
support both a.out and elf, but using it now just causes messes and
unportabilities.  It seems to be impossible to implement the elf
HIDENAME() portably in cpp (since token pasting of "." and <name> is
invalid).

*/prof_machdep.c:
- Removed all uses of CNAME().  CNAME() is easy enough to use in pure
  asm code, but using it in inline asm requires messy quoting.  The
  core pure asm code has been hacked on more and all uses of CNAME() in
  it have already gone away.  Just assume the elf convention here too.
- Removed now-uneeded include of <machine/asmacros.h>.
- Removed the workaround for a namespace conflict with this include.
2006-10-28 06:04:29 +00:00
Bruce Evans
447647908c Don't call mexitcount or provide a stub mexitcount to call when
profiling is configured but high resolution profiling is not configured.
Only functions in *.[Ss] called the stub, so efficiency was not
significantly affected.
2006-10-27 14:17:50 +00:00
Nate Lawson
b3a9f68f8c Fix LINT build, original breakage was rev 1.23. There are 2 definitions
of MCOUNT to have a C version and an asm version with the same name and
not have LOCORE ifdefs to distinguish them.  <machine/profile.h> provides
a C version and <machine/asmacros.h> provides an assembler version.

Discussed with:	bde
2005-05-20 17:16:24 +00:00
Nate Lawson
c363ab2430 Fix low res profiling kernel build. Move two defines to collapse the
#ifdef GUPROF case.
2005-05-19 05:22:52 +00:00
Yoshihiro Takahashi
24072ca35b - Move timerreg.h to <arch>/include and split i8253 specific defines into
i8253reg.h, and add some defines to control a speaker.
- Move PPI related defines from i386/isa/spkr.c into ppireg.h and use them.
- Move IO_{PPI,TIMER} defines into ppireg.h and timerreg.h respectively.
- Use isa/isareg.h rather than <arch>/isa/isa.h.

Tested on: i386, pc98
2005-05-14 09:10:02 +00:00
Yoshihiro Takahashi
d1725ef7ff Change a directory layout for pc98.
- Move MD files into <arch>/<arch>.
  - Move bus dependent files into <arch>/<bus>.
Rename some files to more suitable names.

Repo-copied by:	peter
Discussed with:	imp
2005-05-10 12:02:18 +00:00
Joerg Wunsch
a5f50ef9e4 netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild.  Extension to other compilers is supposed
to be possible, of course.

Submitted by:	netchild
Reviewed by:	various developers on arch@, some time ago
2005-03-02 21:33:29 +00:00
Bruce Evans
026afdcc05 Quick fix for overflow when tsc_freq >= 2^31. "int profrate" in struct
gmon and struct gmonhdr was originally just to represent the kernel
(profiling) clock frequency and it remains poorly suited to representing
the frequencies of fast counters like the TSC.  It broke a year or two
ago.  This quick fix keeps it working for another year or month or two
until TSC frequencies can exceed 2^32, by dividing the frequency by 2.
Dividing the frequency by 4 would work for a little longer but would
lose a little too much precision.
2004-05-26 09:43:38 +00:00
Tom Rhodes
a122cca953 These are changes to allow to use the Intel C/C++ compiler (lang/icc)
to build the kernel. It doesn't affect the operation if gcc.

Most of the changes are just adding __INTEL_COMPILER to #ifdef's, as
icc v8 may define __GNUC__ some parts may look strange but are
necessary.

Additional changes:
 - in_cksum.[ch]:
   * use a generic C version instead of the assembly version in the !gcc
     case (ASM code breaks with the optimizations icc does)
     -> no bad checksums with an icc compiled kernel
     Help from:		andre, grehan, das
     Stolen from: 	alpha version via ppc version
     The entire checksum code should IMHO be replaced with the DragonFly
     version (because it isn't guaranteed future revisions of gcc will
     include similar optimizations) as in:
        ---snip---
          Revision  Changes    Path
          1.12      +1 -0      src/sys/conf/files.i386
          1.4       +142 -558  src/sys/i386/i386/in_cksum.c
          1.5       +33 -69    src/sys/i386/include/in_cksum.h
          1.5       +2 -0      src/sys/netinet/igmp.c
          1.6       +0 -1      src/sys/netinet/in.h
          1.6       +2 -0      src/sys/netinet/ip_icmp.c

          1.4       +3 -4      src/contrib/ipfilter/ip_compat.h
          1.3       +1 -2      src/sbin/natd/icmp.c
          1.4       +0 -1      src/sbin/natd/natd.c
          1.48      +1 -0      src/sys/conf/files
          1.2       +0 -1      src/sys/conf/files.amd64
          1.13      +0 -1      src/sys/conf/files.i386
          1.5       +0 -1      src/sys/conf/files.pc98
          1.7       +1 -1      src/sys/contrib/ipfilter/netinet/fil.c
          1.10      +2 -3      src/sys/contrib/ipfilter/netinet/ip_compat.h
          1.10      +1 -1      src/sys/contrib/ipfilter/netinet/ip_fil.c
          1.7       +1 -1      src/sys/dev/netif/txp/if_txp.c
          1.7       +1 -1      src/sys/net/ip_mroute/ip_mroute.c
          1.7       +1 -2      src/sys/net/ipfw/ip_fw2.c
          1.6       +1 -2      src/sys/netinet/igmp.c
          1.4       +158 -116  src/sys/netinet/in_cksum.c
          1.6       +1 -1      src/sys/netinet/ip_gre.c
          1.7       +1 -2      src/sys/netinet/ip_icmp.c
          1.10      +1 -1      src/sys/netinet/ip_input.c
          1.10      +1 -2      src/sys/netinet/ip_output.c
          1.13      +1 -2      src/sys/netinet/tcp_input.c
          1.9       +1 -2      src/sys/netinet/tcp_output.c
          1.10      +1 -1      src/sys/netinet/tcp_subr.c
          1.10      +1 -1      src/sys/netinet/tcp_syncache.c
          1.9       +1 -2      src/sys/netinet/udp_usrreq.c

          1.5       +1 -2      src/sys/netinet6/ipsec.c
          1.5       +1 -2      src/sys/netproto/ipsec/ipsec.c
          1.5       +1 -1      src/sys/netproto/ipsec/ipsec_input.c
          1.4       +1 -2      src/sys/netproto/ipsec/ipsec_output.c

          and finally remove
            sys/i386/i386        in_cksum.c
            sys/i386/include     in_cksum.h
        ---snip---
 - endian.h:
   * DTRT in C++ mode
 - quad.h:
   * we don't use gcc v1 anymore, remove support for it
   Suggested by:	bde (long ago)
 - assym.h:
   * avoid zero-length arrays (remove dependency on a gcc specific
     feature)
     This change changes the contents of the object file, but as it's
     only used to generate some values for a header, and the generator
     knows how to handle this, there's no impact in the gcc case.
   Explained by:	bde
   Submitted by:	Marius Strobl <marius@alchemy.franken.de>
 - aicasm.c:
   * minor change to teach it about the way icc spells "-nostdinc"
   Not approved by:	gibbs (no reply to my mail)
 - bump __FreeBSD_version (lang/icc needs to know about the changes)

Incarnations of this patch survive gcc compiles since a loooong time,
I use it on my desktop. An icc compiled kernel works since Nov. 2003
(exceptions: snd_* if used as modules), it survives a build of the
entire ports collection with icc.

Parts of this commit contains suggestions or submissions from
Marius Strobl <marius@alchemy.franken.de>.

Reviewed by:	-arch
Submitted by:	netchild
2004-03-12 21:45:33 +00:00
David E. O'Brien
006124d811 Use __FBSDID(). 2003-06-02 16:32:55 +00:00
Bruce Evans
3c9d896571 Quick fix for high resolution kernel profiling on i386's. Use
-finstrument-functions instead of -mprofiler-epilogue.  The former
works essentially the same as the latter but has a higher overhead
(about 22 more bytes per function for passing unused args to the
profiling functions).

Removed all traces of the IDENT Makefile variable, which had been
reduced to just a place for holding profiling's contribution to CFLAGS
(the IDENT that gives the kernel identity was renamed to KERN_IDENT).
2002-07-13 22:28:34 +00:00
Poul-Henning Kamp
77978ab8bc Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by:	bde
2000-07-04 11:25:35 +00:00
Poul-Henning Kamp
82d9ae4e32 Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

        -sysctl_vm_zone SYSCTL_HANDLER_ARGS
        +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
2000-07-03 09:35:31 +00:00
Peter Wemm
c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Peter Wemm
399f34180a Use .p2align to ensure consistant a.out/elf alignment. I'd have used
SUPERALIGN_TEXT, but this is inline assembler and after cpp has run.
Inspired by bde's comments on linux_locore.s.
1999-08-25 23:50:03 +00:00
Bruce Evans
ea2b3e3d1b Fixed profiling of elf kernels. Made high resolution profiling compile
for elf kernels (it is broken for all kernels due to lack of egcs support).

Renaming of many assembler labels is avoided by declaring by declaring
the labels that need to be visible to gprof as having type "function"
and depending on the elf version of gprof being zealous about discarding
the others.  A few type declarations are still missing, mainly for SMP.

PR:		9413
Submitted by:	Assar Westerlund <assar@sics.se> (initial parts)
1999-05-06 09:44:57 +00:00
Bruce Evans
0be036c31a Ifdefed the declarations of conditionally used variables. 1998-12-14 18:21:34 +00:00
Bruce Evans
212b37ff18 Support compiling with `gcc -pedantic' (don't use hard newlines in
(asm) string constants).
1998-04-19 15:41:06 +00:00
Bruce Evans
c1087c1324 Support compiling with `gcc -ansi'. 1998-04-15 17:47:40 +00:00
Poul-Henning Kamp
71f461f86a Rename "i586_ctr" to "tsc" (both upper and lower case instances).
Fix a couple of printfs too.

Warning: This changes the names of a couple of kernel options!
1997-12-26 20:42:37 +00:00
Bruce Evans
043e37feab Added a sysctl (machdep.cputime_clock) to select the clock used by
"high resolution" profiling.  The available clocks are:
- the i8254 clock
- on non-SMP i586's and i686's: the TSC
- on systems with I586_PMC_GUPROF configured, and PERFMON configured
  and available: all the performance counters.
This is unfinshed (there are problems with locking out the PERFMON
device driver, and with losing calibration after switching the clock),
but better than static configuration or writing to kmem.

Changed ifdefs to avoid generating code for non-working option
combinations.
1997-11-24 18:16:23 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
Satoshi Asami
e30f001135 More merge and update.
(1) deleted #if 0

    pc98/pc98/mse.c

(2) hold per-unit I/O ports in ed_softc

    pc98/pc98/if_ed.c
    pc98/pc98/if_ed98.h

(3) merge more files by segregating changes into headers.

  new file (moved from pc98/pc98):

    i386/isa/aic_98.h

  deleted:

    well, it's already in the commit message so I won't repeat the
    long list here ;)

Submitted by:	The FreeBSD(98) Development Team
1996-10-30 22:41:46 +00:00
Bruce Evans
d6b9e17eb5 Improved non-statistical (GUPROF) profiling:
- use a more accurate and more efficient method of compensating for
  overheads.  The old method counted too much time against leaf
  functions.
- normally use the Pentium timestamp counter if available.
  On Pentiums, the times are now accurate to within a couple of cpu
  clock cycles per function call in the (unlikely) event that there
  are no cache misses in or caused by the profiling code.
- optionally use an arbitrary Pentium event counter if available.
- optionally regress to using the i8254 counter.
- scaled the i8254 counter by a factor of 128.  Now the i8254 counters
  overflow slightly faster than the TSC counters for a 150MHz Pentium :-)
  (after about 16 seconds).  This is to avoid fractional overheads.

files.i386:
permon.c temporarily has to be classified as a profiling-routine
because a couple of functions in it may be called from profiling code.

options.i386:
- I586_CTR_GUPROF is currently unused (oops).
- I586_PMC_GUPROF should be something like 0x70000 to enable (but not
  use unless prof_machdep.c is changed) support for Pentium event
  counters.  7 is a control mode and the counter number 0 is somewhere
  in the 0000 bits (see perfmon.h for the encoding).

profile.h:
- added declarations.
- cleaned up separation of user mode declarations.

prof_machdep.c:
Mostly clock-select changes.  The default clock can be changed by
editing kmem.  There should be a sysctl for this.

subr_prof.c:
- added copyright.
- calibrate overheads for the new method.
- documented new method.
- fixed races and and machine dependencies in start/stop code.

mcount.c:
Use the new overhead compensation method.

gmon.h:
- changed GPROF4 counter type from unsigned to int.  Oops, this should
  be machine-dependent and/or int32_t.
- reorganized overhead counters.

Submitted by:	Pentium event counter changes mostly by wollman
1996-10-17 19:32:31 +00:00
Garrett Wollman
dadd5f3a95 Added a $Id$ keyword. Bruce still needs to put a copyright notice
on this file.
1996-04-08 16:41:06 +00:00
Bruce Evans
912e603778 Implemented non-statistical kernel profiling. This is based on
looking at a high resolution clock for each of the following events:
function call, function return, interrupt entry, interrupt exit,
and interesting branches.  The differences between the times of
these events are added at appropriate places in a ordinary histogram
(as if very fast statistical profiling sampled the pc at those
places) so that ordinary gprof can be used to analyze the times.

gmon.h:
Histogram counters need to be 4 bytes for microsecond resolutions.
They will need to be larger for the 586 clock.
The comments were vax-centric and wrong even on vaxes.  Does anyone
disagree?

gprof4.c:
The standard gprof should support counters of all integral sizes
and the size of the counter should be in the gmon header.  This
hack will do until then.  (Use gprof4 -u to examine the results
of non-statistical profiling.)

config/*:
Non-statistical profiling is configured with `config -pp'.
`config -p' still gives ordinary profiling.

kgmon/*:
Non-statistical profiling is enabled with `kgmon -B'.  `kgmon -b'
still enables ordinary profiling (and distables non-statistical
profiling) if non-statistical profiling is configured.
1995-12-29 15:30:05 +00:00