Heavily re-worked.

Updated to 2.0. Included sections about how to use DDB, post-mortem analysis of a kernel crash where you didn't anticipate it and therefore didn't config -g it. Added a real-world example of a kgdb session.
svn path=/head/; revision=5343
1995-01-02 12:01:59 +00:00 · 1995-01-02 12:01:59 +00:00 · e02a6ccb4e · 2020-12-20 02:59:44 +00:00
commit e02a6ccb4e
parent 5cf1ec501a
1 changed files with 404 additions and 20 deletions
--- a/share/FAQ/kernel-debug.FAQ
+++ b/share/FAQ/kernel-debug.FAQ
@@ -1,33 +1,417 @@
-                         Kernel debugging FAQ
-                    for FreeBSD 1.1.5.1 and below
+                   Kernel debugging FAQ for FreeBSD

-Last modified: $Id: kernel-debug.FAQ,v 1.1 1994/09/11 10:56:06 jkh Exp $
+Last modified: $Id: kernel-debug.FAQ,v 1.2 1994/10/03 03:19:41 gclarkii Exp $

-Here are some instructions for getting kernel debugging working on
-a crash dump, it assumes that you have enough swap space for a crash
-dump.

-*** Start ***
+*** Debugging a kernel crash dump with kgdb ***

-Config you're kernel using config -g
+  Here are some instructions for getting kernel debugging working on a
+  crash dump.  They assume that you have enough swap space for a crash
+  dump.  If you happen to have multiple swap partitions with the first
+  one being too small to keep the dump, you can configure your kernel to
+  use an alternate dump device (in the ``kernel'' line).  Dumps to non-
+  swap devices (e.g. tapes) are currently not supported.

-Remove ${STRIP} -x $@; from the Makefile for the kernel so it doesn't
-get stripped.
+  Config your kernel using config -g

-When the kernel's been built make a copy of it, say 386BSD.debug, and
-then run strip -x on the original. Install the original as normal.
+  Remember that you need to specify ``options DODUMP'' in your config
+  file in order to get kernel core dumps.

-Now, after a crash dump, go to /sys/compile/WHATEVER and run kgdb. From kgdb
-do:
+  When the kernel's been built make a copy of it, say kernel.debug, and
+  then run strip -x on the original. Install the original as normal.
+  You may also install the unstripped kernel, but symtab lookup time
+  for some programs might drastically increase.

-symbol-file 386BSD.debug
-exec-file /var/crash/system.0
-core-file /var/crash/ram.0
+  If you are testing a new kernel (e.g. by typing the new kernel's
+  name at the boot prompt), but need to boot a different one in order
+  to get your system up and running again, boot it only into
+  single-user mode (the -s flag at the boot prompt), and then perform
+  the following steps:

-and viola, you can debug the crash dump using the kernel sources just like
-you can for any other program. 
+  fsck -p
+  mount -a -t ufs       # so your file system for /var/crash is writable
+  savecore -N /kernel.panicked /var/crash
+  exit                  # ...to multi-user
+
+  This instructs savecore to use another kernel for symbol name
+  extraction; it would default to the currently running kernel
+  otherwise.
+
+  Now, after a crash dump, go to /sys/compile/WHATEVER and run
+  kgdb. From kgdb do:
+
+  symbol-file kernel.debug
+  exec-file /var/crash/system.0
+  core-file /var/crash/ram.0
+
+  and voila, you can debug the crash dump using the kernel sources
+  just like you can for any other program.
+
+  If your kernel panicked due to a trap (perhaps the most common case
+  for getting a core dump), the following trick might help you.  Examine
+  the stack (`where') and look for the stack frame in the function
+  trap().  Go `up' to that frame, and then type:
+
+  frame frame->tf_ebp frame->tf_eip
+
+  This will tell kgdb to go to the stack frame explicitly named by a
+  frame pointer and instruction pointer, which is the location where
+  the trap occurred.  There are still some bugs in kgdb (you can go
+  `up' from there, but not `down'; the stack trace will still remain
+  as it was before going to here), but generally this method will lead
+  you much closer to the failing piece of code.
+
+  Here's a script log of a kgdb session illustrating the above.  Long
+  lines have been folded to improve readability, and the lines are
+  numbered for reference.  Despite this, it's a real-world error
+  trace taken during the development of the pcvt console driver.
+
+   1:Script started on Fri Dec 30 23:15:22 1994
+   2:uriah # cd /sys/compile/URIAH
+   3:uriah # kgdb kernel /var/crash/vmcore.1 
+   4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel...done.
+   5:IdlePTD 1f3000
+   6:panic: because you said to!
+   7:current pcb at 1e3f70
+   8:Reading in symbols for ../../i386/i386/machdep.c...done.
+   9:(kgdb) where
+  10:#0  boot (arghowto=256) (../../i386/i386/machdep.c line 767)
+  11:#1  0xf0115159 in panic ()
+  12:#2  0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698)
+  13:#3  0xf010185e in db_fncall ()
+  14:#4  0xf0101586 in db_command (-266509132, -266509516, -267381073)
+  15:#5  0xf0101711 in db_command_loop ()
+  16:#6  0xf01040a0 in db_trap ()
+  17:#7  0xf0192976 in kdb_trap (12, 0, -272630436, -266743723)
+  18:#8  0xf019d2eb in trap_fatal (...)
+  19:#9  0xf019ce60 in trap_pfault (...)
+  20:#10 0xf019cb2f in trap (...)
+  21:#11 0xf01932a1 in exception:calltrap ()
+  22:#12 0xf0191503 in cnopen (...)
+  23:#13 0xf0132c34 in spec_open ()
+  24:#14 0xf012d014 in vn_open ()
+  25:#15 0xf012a183 in open ()
+  26:#16 0xf019d4eb in syscall (...)
+  27:(kgdb) up 10
+  28:Reading in symbols for ../../i386/i386/trap.c...done.
+  29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\
+  30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\
+  31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\
+  32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\
+  33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\
+  34:ss = -266427884}) (../../i386/i386/trap.c line 283)
+  35:283                             (void) trap_pfault(&frame, FALSE);
+  36:(kgdb) frame frame->tf_ebp frame->tf_eip
+  37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done.
+  38:#0  0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\
+  39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403)
+  40:403             return ((*linesw[tp->t_line].l_open)(dev, tp));
+  41:(kgdb) list
+  42:398        
+  43:399             tp->t_state |= TS_CARR_ON;
+  44:400             tp->t_cflag |= CLOCAL;  /* cannot be a modem (:-) */
+  45:401     
+  46:402     #if PCVT_NETBSD || (PCVT_FREEBSD >= 200)
+  47:403             return ((*linesw[tp->t_line].l_open)(dev, tp));
+  48:404     #else
+  49:405             return ((*linesw[tp->t_line].l_open)(dev, tp, flag));
+  50:406     #endif /* PCVT_NETBSD || (PCVT_FREEBSD >= 200) */
+  51:407     }
+  52:(kgdb) print tp
+  53:Reading in symbols for ../../i386/i386/cons.c...done.
+  54:$1 = (struct tty *) 0x1bae
+  55:(kgdb) print tp->t_line
+  56:$2 = 1767990816
+  57:(kgdb) up
+  58:#1  0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\
+  59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126)
+  60:       return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p));
+  61:(kgdb) up
+  62:#2  0xf0132c34 in spec_open ()
+  63:(kgdb) up
+  64:#3  0xf012d014 in vn_open ()
+  65:(kgdb) up
+  66:#4  0xf012a183 in open ()
+  67:(kgdb) up
+  68:#5  0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\
+  69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\
+  70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \
+  71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \
+  72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673)
+  73:673             error = (*callp->sy_call)(p, args, rval);
+  74:(kgdb) up
+  75:Initial frame selected; you cannot go up.
+  76:(kgdb) quit
+  77:uriah # exit
+  78:exit
+  79:
+  80:Script done on Fri Dec 30 23:18:04 1994
+
+  Comments to the above script:
+  
+  line  6:  this is a dump taken from within DDB (see below), hence the
+            panic comment ``because you said to!'', and a rather long
+            stack trace; the initial reason for going into DDB has been
+            a page fault trap though
+  
+  line 20:  the location of function ``trap()'' in the stack trace
+  
+  line 36:  force usage of a new stack frame, kgdb responds and displays
+            the source line where the trap happened; from looking at the
+            code, there's a high probability that either the pointer
+            access for ``tp'' was messed up, or the array access was
+            out of bounds
+  
+  line 52:  the pointer looks suspicious, but happens to be a valid
+            address...
+  
+  line 56:  ... but obviously points to garbage, so we have found our
+            error, sigh!  [For those unfamiliar with that particular piece
+            of code: tp->t_line refers to the line discipline of the
+            console device here, which must be a rather small integer
+            number.]
+  
+
+
+*** Post-mortem analysis of a dump ***
+
+  What to do if a kernel dumped core but you didn't expect it, and it's
+  therefore not compiled using config -g?
+
+  Not everything is lost here.  Don't panic. :-)
+
+  Of course, you still need to configure all your kernels with the
+  DODUMP option being set, otherwise you won't get a core dump at all.
+  (This is for safety reasons in the default kernels, to avoid them
+  trying to dump e.g. during system installation where there's no
+  FreeBSD partition at all and valuable data on the disk could be
+  destroyed.)
+
+  Go to your kernel compile directory, and edit the line containing
+  COPTFLAGS?=-O.  Add the `-g' option there (but DON'T change the
+  optimization level).  If you already know roughly the probable
+  location of the failing piece of code (e.g., the `pcvt' driver in
+  the example above), remove all the object files for this code.
+  Rebuild the kernel.  Due to the time stamp change on the Makefile,
+  some other object files will be rebuilt as well, e.g. trap.o.  With
+  a bit of luck, the added -g option won't change anything in the
+  generated code, so you'll finally get a new kernel with code
+  similar to the faulting one, plus some debugging symbols.  You
+  should at least verify the old and new sizes with the `size'
+  command; if they mismatch, you probably need to give up here.
+
+  Go and examine the dump as described above.  The debugging symbols
+  might be incomplete for some places (as can be seen in the stack trace
+  in the example above: some functions are displayed without line
+  numbers and argument lists).  If you need more debugging symbols,
+  remove the appropriate object files and repeat the kgdb session until
+  you know enough.
+
+  All this is not guaranteed to work, but it will most likely do fine.



-  Paul Richards, FreeBSD core team member.
+*** On-line kernel debugging using DDB ***
+
+  While kgdb as an offline debugger provides a very high-level user
+  interface (e.g. it can look up source files, display C structures
+  etc.), there are some things it cannot do.  The most important ones
+  are setting breakpoints in and single-stepping kernel code.
+
+  If you need to do low-level debugging on your kernel, there's an
+  on-line debugger available called DDB.  It lets you set breakpoints,
+  single-step kernel functions, examine and change kernel variables
+  etc.  It cannot, however, access kernel source files, and it only
+  has access to the global and static symbols, not to the full debug
+  information (including type and line number information) that kgdb
+  uses.
+
+  To configure your kernel to include DDB, add the option lines
+
+        options DDB
+        options "SYMTAB_SPACE=XXXX"
+
+  to your config file, and rebuild.  XXXX is the amount of space to be
+  reserved in a global array that DDB examines to find its symbols at
+  run time.  It must be large enough to hold all symbols, but not much
+  larger, to avoid wasting space.  100000 bytes is a good first guess
+  for a ``normal'' kernel.  The link stage will tell you about the
+  usage of the symtab space; you'll see something like:
+
+  dbsym: need 98765; avail 100000
+
+  If the amount of allocated space is too small, the above
+  message is accompanied by the following error message:
+
+  not enough room in db_symtab array
+
+  and the link stage fails.  You then need to increase the number,
+  reconfig and recompile.  If your config(8) has been compiled to not
+  remove the old compile directory before continuing (this is a
+  compile-time option [CONFIG_DONT_CLOBBER]), you need to remove
+  db_aout.o prior to recompilation; this is the only file being
+  affected by the SYMTAB_SPACE option.
+
+
+  Once your DDB kernel is running, there are several ways to enter
+  DDB.  The first (and earliest) way is to set the boot flag `-d'
+  (right at the boot prompt).  The kernel will start up in debug mode
+  and enter DDB prior to any device probing.  Hence you are able to
+  even debug the device probe/attach functions.
+
+  The second scenario is a hot-key on the keyboard, usually Ctrl-Alt-
+  ESC.  (For syscons, this can be remapped, and some of the
+  distributed maps do this, so watch out.)  There are patches
+  available for a COMCONSOLE kernel, ask me (joerg@FreeBSD.org) for
+  them.
+
+  The third way is that any panic condition will branch to DDB if the
+  kernel is configured to use it.  (Thus it is not wise to configure a
+  kernel with DDB for a machine running unattended.)
+
+
+  The DDB commands roughly resemble some gdb commands.  The first you
+  probably need is to set a breakpoint:
+
+  b function-name
+  b address
+
+  Numbers are taken as hexadecimal by default, but to make them distinct
+  from symbol names, hex numbers starting with the letters `a' - `f'
+  need to be preceded with `0x' (for other numbers, this is optional).
+  Simple expressions are allowed, e.g. ``function-name + 0x103''.
+
+  To continue the operation of an interrupted kernel, simply type
+
+  c
+
+  To get a stack trace, use
+
+  trace
+
+  Note that when entering DDB via a hot-key, the kernel is currently
+  servicing an interrupt, so the stack trace might not be of much use
+  to you.
+
+  If you want to remove a breakpoint, use
+
+  del
+  del address-expression
+
+  The first form will be accepted immediately after a breakpoint hit,
+  and deletes the current breakpoint.  The second form can remove any
+  breakpoint, but you need to specify the exact address, as it can be
+  obtained from
+
+  show b
+
+  To single-step the kernel, try
+
+  s
+
+  This will step into functions, but you can make DDB trace them until
+  the matching return statement is reached by
+
+  n
+
+  NOTE: this is different from gdb's ``next'' statement; it's like
+  gdb's ``finish''.
+
+  To examine data from memory, use e.g.
+
+  x/wx 0xf0133fe0,40
+  x/hd db_symtab_space
+  x/bc termbuf,10
+  x/s stringbuf
+
+  for word/halfword/byte access, and hexadecimal/decimal/character/
+  string display.  The number after the comma is the object count.
+  To display the next 0x10 items, simply use
+
+  x ,10
+
+  Similarly, use
+
+  x/ia foofunc,10
+
+  to disassemble the first 0x10 instructions of foofunc, and display
+  them along with their offset from the beginning of foofunc.
+
+  To modify the memory, use the write command:
+
+  w/b termbuf 0xa 0xb 0
+  w/w 0xf0010030 0 0
+
+  The command modifier (b/h/w) specifies the size of the data to be
+  written, the first following expression is the address to write to,
+  the remainder is interpreted as data to write to successive memory
+  locations.
+
+  If you need to know the current registers, use
+
+  show reg
+
+  Alternatively, you can display a single register value by e.g.
+
+  print $eax
+
+  and modify it by
+
+  set $eax new-value
+
+
+  Should you need to call some kernel functions from DDB, simply
+  say
+
+  call func(arg1, arg2, ...)
+
+  The return value will be printed.
+
+  For a ps-style summary of all running processes, use
+
+  ps
+
+
+
+  Well, you've now examined why your kernel failed, and you wish to
+  reboot.  Remember that, depending on the severity of previous
+  malfunctioning, not all parts of the kernel might still be working
+  as expected.  Perform one of the following actions to shut down and
+  reboot your system:
+
+
+  call diediedie()
+
+  (must usually be followed by another ``c[ontinue]'' statement),
+  will cause your kernel to dump core and reboot, so you can later
+  analyze the core on a higher level with kgdb.
+
+
+  call boot(0)
+
+  will try to cleanly shut down the running system, sync() all disks,
+  and finally reboot.  As long as the disk and file system interfaces
+  of the kernel are not damaged, this is a good way to get an almost
+  clean shutdown.
+
+
+  call cpu_reset()
+
+  ...is the final way out of the disaster, almost similar to hitting
+  the Big Red Button.
+
+
+
+*** What to do if I want to debug a console driver? ***
+
+  Since you need a console driver to run DDB on, things are more
+  complicated if the console driver itself is flakey.  You might
+  remember the ``options COMCONSOLE'' line, and hook up a standard
+  terminal onto your first serial port.  DDB works on any configured
+  console driver, so of course it also works on a COMCONSOLE.
+
+
+
+  Paul Richards, FreeBSD core team member. (paul@FreeBSD.org)
+  J"org Wunsch (joerg@FreeBSD.org)