Re: [Crash-utility] Re:[RFC] Crash patch for DWARF CFI based unwind support

Monday, 23 October 2006

Rachita Kothiyal wrote:

...
 On Mon, Oct 23, 2006 at 09:31:11AM -0400, Dave Anderson wrote:
 >
 > Hmmm, yeah, good catch...
 >
 > But what happens the second time around, anyway?  Are the RSP/RIP
 > starting points so different such that the low_budget tracer's output
 > is so drastically different?  Or does it go off into the weeds because
 > the other user_regs_struct register offsets (that don't get initialized)
 > cause an OFFSET() failure?
 >

 Hi Dave

 This is what I get when I try it out on one of my dumps:

 crash> bt      // ----------------------------------------------> 1
 PID: 4102   TASK: ffff81022e94d1e0  CPU: 0   COMMAND: "bash"
  #0 [ffff810223d73d78] crash_kexec at ffffffff801521d1
  #1 [ffff810223d73dc0] machine_kexec at ffffffff8011a739
  #2 [ffff810223d73e00] crash_kexec at ffffffff801521ed
  #3 [ffff810223d73e88] crash_kexec at ffffffff801521d1
  #4 [ffff810223d73eb8] __sysrq_get_key_op at ffffffff80288b29
  #5 [ffff810223d73ec0] __handle_sysrq at ffffffff80288d37
  #6 [ffff810223d73f00] write_sysrq_trigger at ffffffff801adf95
  #7 [ffff810223d73f10] vfs_write at ffffffff80179bf0
  #8 [ffff810223d73f40] sys_write at ffffffff8017a187
  #9 [ffff810223d73f80] system_call at ffffffff801096da
     RIP: 00002b3979ef3900  RSP: 00007fff3122f360  RFLAGS: 00010287
     RAX: 0000000000000001  RBX: ffffffff801096da  RCX: 00000000fbad2a84
     RDX: 0000000000000002  RSI: 00002b397a181000  RDI: 0000000000000001
     RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00002b397a072ae0
     R10: 0000000000000000  R11: 0000000000000246  R12: 00002b397a06c780
     R13: 00002b397a181000  R14: 0000000000000002  R15: 0000000000000000
     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
 crash> set unwind on // ------------------------------------------> 2
 unwind: on
 crash> bt
 PID: 4102   TASK: ffff81022e94d1e0  CPU: 0   COMMAND: "bash"
  #0 [ffff810223d73e08] crash_kexec at ffffffff801521d1
  #1 [ffff810223d73ec8] __handle_sysrq at ffffffff80288d37
  #2 [ffff810223d73f08] write_sysrq_trigger at ffffffff801adf95
  #3 [ffff810223d73f18] vfs_write at ffffffff80179bf0
  #4 [ffff810223d73f48] sys_write at ffffffff8017a187
  #5 [ffff810223d73f88] system_call at ffffffff801096da
     RIP: 00002b3979ef3900  RSP: 00007fff3122f360  RFLAGS: 00010287
     RAX: 0000000000000001  RBX: ffffffff801096da  RCX: 00000000fbad2a84
     RDX: 0000000000000002  RSI: 00002b397a181000  RDI: 0000000000000001
     RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00002b397a072ae0
     R10: 0000000000000000  R11: 0000000000000246  R12: 00002b397a06c780
     R13: 00002b397a181000  R14: 0000000000000002  R15: 0000000000000000
     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
 crash> set unwind off //-------------------------------------> 3
 unwind: off
 crash> bt
 PID: 4102   TASK: ffff81022e94d1e0  CPU: 0   COMMAND: "bash"
  #0 [ffff810223d73e88] crash_kexec at ffffffff801521d1
  #1 [ffff810223d73eb8] __sysrq_get_key_op at ffffffff80288b29
  #2 [ffff810223d73ec0] __handle_sysrq at ffffffff80288d37
  #3 [ffff810223d73f00] write_sysrq_trigger at ffffffff801adf95
  #4 [ffff810223d73f10] vfs_write at ffffffff80179bf0
  #5 [ffff810223d73f40] sys_write at ffffffff8017a187
  #6 [ffff810223d73f80] system_call at ffffffff801096da
     RIP: 00002b3979ef3900  RSP: 00007fff3122f360  RFLAGS: 00010287
     RAX: 0000000000000001  RBX: ffffffff801096da  RCX: 00000000fbad2a84
     RDX: 0000000000000002  RSI: 00002b397a181000  RDI: 0000000000000001
     RBP: 0000000000000002   R8: 00000000ffffffff   R9: 00002b397a072ae0
     R10: 0000000000000000  R11: 0000000000000246  R12: 00002b397a06c780
     R13: 00002b397a181000  R14: 0000000000000002  R15: 0000000000000000
     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

 The register values that the last bt (marked 3 above) starts working with
 are rsp=ffff810223d73e08 and rip=ffffffff801521d1 (from NT_PRSTATUS).
 So from that point in the stack, outputs 3 and 1 are the same.

 We also see that the stack addresses in 2 and 3 are off by 0x8.
  e.g. #1 ffff810223d73ec8 ------------> 2
          ffff810223d73ec0 ------------> 3

 This is because crash reports the stack address at which the return
 address was pushed, while the DWARF-based bt reports the CFA. In most
 cases, the return address is stored at (CFA - 8), hence the offset of 0x8.
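
As a quick check of the (CFA - 8) relationship, here is a minimal sketch in Python. It is not part of crash; the helper name return_address_slot is made up for illustration, and the addresses are copied from outputs 2 and 3 above for the frames that appear in both.

```python
WORD_SIZE = 8  # bytes per stack slot on x86_64

def return_address_slot(cfa, word_size=WORD_SIZE):
    """Stack address holding the return address, assuming it sits at CFA - 8."""
    return cfa - word_size

# CFA values reported with "set unwind on" (output 2).
dwarf_cfa = {
    "__handle_sysrq":      0xffff810223d73ec8,
    "write_sysrq_trigger": 0xffff810223d73f08,
    "vfs_write":           0xffff810223d73f18,
    "sys_write":           0xffff810223d73f48,
    "system_call":         0xffff810223d73f88,
}

# Stack addresses reported with "set unwind off" (output 3).
low_budget_slot = {
    "__handle_sysrq":      0xffff810223d73ec0,
    "write_sysrq_trigger": 0xffff810223d73f00,
    "vfs_write":           0xffff810223d73f10,
    "sys_write":           0xffff810223d73f40,
    "system_call":         0xffff810223d73f80,
}

# Frames where CFA - 8 does not match the address the low-budget tracer printed.
mismatches = [
    name for name, cfa in dwarf_cfa.items()
    if return_address_slot(cfa) != low_budget_slot[name]
]
```

For these five frames the list of mismatches comes out empty. The crash_kexec frame is left out on purpose, since the two tracers do not report it at corresponding positions.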

 The low-budget tracer's backtraces differ from the DWARF tracer's
 because, when the low-budget tracer unwinds the stack by looking for
 kernel text addresses, it comes across many addresses that were not
 pushed onto the stack by function calls.
 This is especially true for the panic task on kdumps: after 'crash_kexec'
 is called, the registers are dumped onto the stack (to create the
 NT_PRSTATUS section), which misleads the low-budget tracer. That's why we
 see multiple crash_kexec entries in the backtrace. Static inline functions
 can also aggravate this problem.

 In other cases, stale frames on the stack can also mislead the low-budget
 tracer.
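
To make the failure mode concrete, here is a minimal sketch in Python of a scanner that treats every stack word inside the kernel text range as a return address. It is an illustration of the general idea only, not crash's actual low-budget tracer: the text bounds, the function name scan_stack, and the zero filler words are all assumptions, and only the three non-zero values and their slot addresses are taken from output 3 above.

```python
# Hypothetical kernel text bounds, chosen only so the example addresses fall inside.
TEXT_START = 0xffffffff80100000
TEXT_END = 0xffffffff80400000

def scan_stack(words, base, word_size=8):
    """Return (slot_address, value) for every word that looks like a kernel text address."""
    hits = []
    for i, value in enumerate(words):
        if TEXT_START <= value < TEXT_END:
            hits.append((base + i * word_size, value))
    return hits

# Synthetic stack slice starting at ffff810223d73e88. The zero words are
# placeholders, not the real contents of the dump.
stack_base = 0xffff810223d73e88
stack_words = [
    0xffffffff801521d1,  # crash_kexec text address
    0, 0, 0, 0, 0,       # filler
    0xffffffff80288b29,  # __sysrq_get_key_op text address
    0xffffffff80288d37,  # __handle_sysrq text address
]

hits = scan_stack(stack_words, stack_base)
```

The scanner returns all three text addresses as candidate frames. Nothing in the scan distinguishes a return address pushed by a call from a saved register value or a stale leftover, which is why such a tracer can report entries that a CFI-based unwinder does not.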

 AFAICT, user_regs_struct register offsets are not the culprits here.

 Thanks
 Rachita

So, in other words, if we hardwire the user_regs_struct so that
it uses the NT_PRSTATUS registers all the time, then we get
the second (preferred/better) low-budget back trace when unwind
is off.

That being the case, I argue for hardwiring them all the time.

Dave
