Bernhard Walle wrote:
Hello,

* Dave Anderson <anderson@redhat.com> [2006-12-08 19:37]:
> I wish I could help you out, but w/respect to anything associated with
> LKCD, I'm only a receptor of patches from the LKCD developers on the
> list, and I personally don't do any work with them at all.
>
> That whole ia64-specific lkcd_fix_mem.c file came from Troy Heber for
> ia64 LKCD dumpfile support (troy.heber@hp.com).  Troy's an active
> contributor on this list, and may have a quick answer -- I'm afraid I
> have no idea what it does...

Anyway, thanks for the information!

> Yes I agree (presuming that eventually the list turns into 64-bit
> symbol values).  But I don't see any attachment other than your pgp
> signature?

Sorry, I just forgot it. Here it is.

And yes, the values are larger in the end. And also our
/boot/System.map on IA64 has zero-prefixes. The map.4 file is
generated by lkcd, and they simply don't use the zero prefixes. Which
should not matter, IMO. So I vote for applying the attached patch.
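
To illustrate why the missing zero prefixes should not matter, here is a
minimal sketch. It assumes the usual three-column "address type name"
layout. parse_map_line is a hypothetical helper, not code from crash or
lkcd, and the __crc_example symbol and its value are made up. Once the
address column is parsed as a hexadecimal number, padded and unpadded
forms compare equal:

```python
def parse_map_line(line):
    """Parse one 'address type name' line of a System.map-style file.

    Hypothetical helper for illustration only.
    """
    addr_text, sym_type, name = line.split(None, 2)
    # int(..., 16) ignores leading zeros, so "00000000a1b2c3d4" and
    # "a1b2c3d4" yield the same value.
    return int(addr_text, 16), sym_type, name.strip()


# Made-up absolute symbol, written with and without the zero prefix.
padded = parse_map_line("00000000a1b2c3d4 A __crc_example")
unpadded = parse_map_line("a1b2c3d4 A __crc_example")
assert padded == unpadded
```

A reader that compares the address column as text, or expects a fixed
16-character width, would treat the two lines differently.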
 

Ah, Ok -- that makes sense now...

And your patch is sane -- and queued for the next release.

 
> I'd never seen those types of __crc_ absolutes, probably because
> they don't show up in our kernels.  The closest 2.6.5-era ia64
> Red Hat kernel (2.6.5-1.358) System.map starts out like this:
>
> a00000010010c5a0 T I_BDEV
> a0000001005bd140 r __ksymtab_I_BDEV
> a0000001005c6ca8 r __kstrtab_I_BDEV
> a00000010030ff60 T QUIRK_LIST
> a0000001005c0950 r __ksymtab_QUIRK_LIST
> a0000001005cb698 r __kstrtab_QUIRK_LIST
> a000000100714ff8 S ROOT_DEV
> a0000001005bb020 r __ksymtab_ROOT_DEV
> a0000001005c4248 r __kstrtab_ROOT_DEV
> a00000010030fd00 T SELECT_DRIVE
> a0000001005c0920 r __ksymtab_SELECT_DRIVE
> a0000001005cb660 r __kstrtab_SELECT_DRIVE
> a00000010030fde0 T SELECT_INTERRUPT
>
> Whatever... maybe a different build CONFIG or something?

Hm ..., I also don't understand why the /boot/System.map of the same
kernel isn't identical to the map.4 file generated by lkcd. In fact,
map.4 is missing symbols and gdb fails to load. But even with the
/boot/System.map of the right kernel, it doesn't work (i.e. the
backtrace is complete garbage).
 

I'm guessing that the backtraces of the active tasks are bogus,
but all of the sleeping tasks' backtraces are OK?  If the LKCD
dump operation does *not* force the panic task and the other
currently-active tasks to drop a switch_stack on their stacks,
you'll not get a backtrace.  The panic task and active tasks in
the netdump, diskdump and kdump facilities all run through
an unw_init_running() as part of their shutdown procedures,
and each cpu stores the address of its switch_stack in its
current->thread.ksp field.  Then the ia64 backtrace needs no
special handling between active and non-active tasks.

I would have thought that LKCD would do the same kind
of thing?  If the LKCD facility *does* do that, then
it's a matter of finding the location of the switch_stack
on the kernel stack.

BTW, worst case, you can get a rough idea of what's going
on by using "bt -t", which dumps all of the kernel text addresses
found from just above the task_struct to the end of the stack.
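
The idea behind that kind of scan can be sketched roughly as follows.
This is not crash's actual implementation: scan_stack_for_text is a
hypothetical function, and the text range, the symbol start addresses
and the zero filler words are assumptions made up for the example.
Only the two stack slots and the values found in them are taken from
the "bt -t" output further down.

```python
import bisect


def scan_stack_for_text(words, base, symbols, text_start, text_end):
    """Return (stack_address, symbol_name, value) for every 8-byte
    stack word whose value falls inside the kernel text range.

    symbols must be a list of (start_address, name) sorted by address.
    """
    starts = [addr for addr, _ in symbols]
    hits = []
    for i, value in enumerate(words):
        if text_start <= value < text_end:
            # Closest symbol starting at or below the value.
            idx = bisect.bisect_right(starts, value) - 1
            if idx >= 0:
                hits.append((base + 8 * i, symbols[idx][1], value))
    return hits


# Assumed symbol start addresses and text range (not from the dump).
symbols = [(0xa000000100156000, "vfs_write"),
           (0xa000000100157000, "sys_write")]
# Two text addresses as found on the stack, with made-up zero words
# between them.
words = [0xa000000100157350, 0, 0, 0, 0, 0, 0xa000000100156800]
hits = scan_stack_for_text(words, 0xe0000040484a1308, symbols,
                           0xa000000100000000, 0xa000000100800000)
for stack_addr, name, value in hits:
    print("[%016x] %s at %016x" % (stack_addr, name, value))
```

Since every word that happens to look like a text address is reported,
stale entries left over from earlier calls show up as well, which is
why the result is only a rough idea and not a real backtrace.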

For example, here's a "echo c > /proc/sysrq-trigger" kdump,
where you get a clear backtrace:

crash> bt
PID: 3235   TASK: e0000040484a0000  CPU: 0   COMMAND: "bash"
 #0 [BSP:e0000040484a13e8] machine_kexec at a000000100058a10
 #1 [BSP:e0000040484a13c8] crash_kexec at a0000001000cbea0
 #2 [BSP:e0000040484a13a0] sysrq_handle_crashdump at a0000001003a0680
 #3 [BSP:e0000040484a1350] __handle_sysrq at a00000010039fec0
 #4 [BSP:e0000040484a1320] write_sysrq_trigger at a0000001001e3390
 #5 [BSP:e0000040484a12d0] vfs_write at a000000100156800
 #6 [BSP:e0000040484a1258] sys_write at a000000100157350
 #7 [BSP:e0000040484a1258] ia64_ret_from_syscall at a00000010000c560
  EFRAME: e0000040484a7e40
      B0: 2000000000152820      CR_IIP: a000000000010620
 CR_IPSR: 00001213085a6010      CR_IFS: 0000000000000008
  AR_PFS: c000000000000008      AR_RSC: 000000000000000f
 AR_UNAT: 0000000000000000     AR_RNAT: 0000000000000000
  AR_CCV: 0000000000000000     AR_FPSR: 0009804c8a70033f
  LOADRS: 0000000001b80000 AR_BSPSTORE: 60000fff7fffc380
      B6: 2000000000218cc0          B7: a000000000010640
      PR: 0000000000590a41          R1: 2000000000290238
      R2: c000000000001fc7          R3: 60000ffffe76b6f0
      R8: 0000000000000001          R9: 0000000000000000
     R10: 0000000000000000         R11: c000000000000512
     R12: 60000ffffe76b6d0         R13: 200000000004f790
     R14: 0000000000000063         R15: 0000000000000403
     R16: 60000000000641ff         R17: 60000ffffe76b6b0
     R18: 0000000000000000         R19: 6000000000064210
     R20: 0000000000000001         R21: 6000000000030063
     R22: 2000000000636e79         R23: 6000000000064200
     R24: 0000000000000010         R25: 0000000000000000
     R26: c000000000000004         R27: 000000000000000f
     R28: a000000000010620         R29: 00001213085a6010
     R30: 0000000000000008         R31: 00000000005a0a41
      F6: 000000000000000000000     F7: 000000000000000000000
      F8: 000000000000000000000     F9: 000000000000000000000
     F10: 000000000000000000000    F11: 000000000000000000000
 #8 [BSP:e0000040484a1258] __kernel_syscall_via_break at a000000000010620
crash>

But if I do a "bt -t" on the same task, because the ia64 BSP area
is just above the task_struct, you can see the trace in kind of a
"reverse order":

crash> bt -t
PID: 3235   TASK: e0000040484a0000  CPU: 0   COMMAND: "bash"
              START: machine_kexec at a000000100058a10
  [e0000040484a12b8] ia64_ret_from_syscall at a00000010000c560
  [e0000040484a1308] sys_write at a000000100157350
  [e0000040484a1338] vfs_write at a000000100156800
  [e0000040484a1388] write_sysrq_trigger at a0000001001e3390
  [e0000040484a13b0] __handle_sysrq at a00000010039fec0
  [e0000040484a13d0] sysrq_handle_crashdump at a0000001003a0680
  [e0000040484a13e0] __handle_sysrq at a00000010039fe70
  [e0000040484a13f0] crash_kexec at a0000001000cbea0
  [e0000040484a1420] machine_kexec at a000000100058a10
  [e0000040484a1450] unw_init_running at a00000010000cdb0
  [e0000040484a1488] ia64_machine_kexec at a000000100058c60
  [e0000040484a14a8] ia64_handle_irq at a000000100011cd0
  [e0000040484a14d8] ia64_handle_irq at a000000100011c50
  [e0000040484a14f8] __do_IRQ at a0000001000e4120
  [e0000040484a1508] irq_exit at a000000100087220
  [e0000040484a1538] iosapic_end_level_irq at a00000010004f730
  [e0000040484a1550] __do_IRQ at a0000001000e4070
  [e0000040484a1580] do_softirq at a000000100087150
  [e0000040484a15c0] __do_softirq at a000000100086f90
  [e0000040484a1620] net_rx_action at a00000010051efc0
  [e0000040484a1630] e1000_check_options at a000000200965e18
  [e0000040484a16d0] e1000_clean at a00000020093e8e0
  [e0000040484a16e0] e1000_check_options at a000000200965e18
  [e0000040484a16f0] ip_rcv at a000000100568e40
  [e0000040484a1710] e1000_clean_rx_irq at a000000200944cd0
  [e0000040484a1770] __do_softirq at a000000100086f90
  [e0000040484a17c8] net_rx_action at a00000010051efc0
  [e0000040484a17d8] e1000_check_options at a000000200965e18
  [e0000040484a1880] e1000_clean at a00000020093e8e0
  [e0000040484a1890] e1000_check_options at a000000200965e18
  [e0000040484a18a0] ip_rcv at a000000100568e40
  [e0000040484a18c0] e1000_clean_rx_irq at a000000200944cd0
  [e0000040484a18f8] sync_buffer at a00000010015cb40
  [e0000040484a1910] io_schedule at a000000100620bd0
  [e0000040484a1940] __delayacct_blkio_start at a0000001000ebc70
  [e0000040484a19b8] io_schedule at a000000100620c00
  [e0000040484a78d0] machine_kexec at a000000100058a10
  [e0000040484a7c10] machine_kexec at a000000100058a10
  [e0000040484a7ca0] schedule at a00000010061f7c0
crash>

But -- since the ia64 is the only processor for which you
can get real, dependable backtraces, it would be nice
if it could work for LKCD dumpfiles.

Dave
 

 
But first I'll fix the header format which _is_ different in crash and
our SLES9 kernel (and lkcdutils), and if it then doesn't work I'll
come back to the system maps.

Thanks for your help!

Regards,
  Bernhard