Maneesh Soni wrote:
On Thu, Aug 24, 2006 at 09:15:34AM -0400, Dave Anderson wrote:
> Rachita Kothiyal wrote:
>
> > Hi Dave
> >
> > I was trying to implement a better backtrace mechanism for crash using
> > dwarf info, and was trying to use the embedded gdb itself, since gdb
> > already uses dwarf information for unwinding the stack. I could get the
> > "gdb bt" command working in "crash" after making one minor bug
> > fix in gdb_interface.c (patch appended). Now one can get a cleaner
> > backtrace, particularly in the x86_64 case, using the "gdb bt" command.
> >
>
> Wow -- your definition of "cleaner" apparently is different than mine... ;-)
>

Looks are sometimes deceptive ;-).. Using gdb's stack unwinding code, the
unwanted stack frames (like frames #4, #6, #7 and #8) are avoided.

Hmmm, and gdb frames #4 and #5 below are? I understand ...

> >
> > crash> bt
> > PID: 4146 TASK: ffff81022e848af0 CPU: 0 COMMAND: "insmod"
> > #0 [ffff81021efadbf8] crash_kexec at ffffffff801521d1
> > #1 [ffff81021efadc40] machine_kexec at ffffffff8011a739
> > #2 [ffff81021efadc80] crash_kexec at ffffffff801521ed
> > #3 [ffff81021efadd08] crash_kexec at ffffffff801521d1
> > #4 [ffff81021efadd30] bust_spinlocks at ffffffff8011fd6d
> > #5 [ffff81021efadd40] panic at ffffffff80131422
> > #6 [ffff81021efadda0] cond_resched at ffffffff804176c3
> > #7 [ffff81021efaddb0] wait_for_completion at ffffffff80417701
> > #8 [ffff81021efade00] __down_read at ffffffff80418d07
> > #9 [ffff81021efade30] fun2 at ffffffff80107017
> > #10 [ffff81021efade40] fun1 at ffffffff801311b6
> > #11 [ffff81021efade50] init_module at ffffffff8800200f
> > #12 [ffff81021efade60] sys_init_module at ffffffff8014c664
> > #13 [ffff81021efadf00] init_module at ffffffff88002068
> > #14 [ffff81021efadf80] system_call at ffffffff801096da
> > RIP: 00002b2153382d4a RSP: 00007fff57900a28 RFLAGS: 00010246
> > RAX: 00000000000000af RBX: ffffffff801096da RCX: 0000000000000000
> > RDX: 0000000000512010 RSI: 0000000000016d26 RDI: 00002b21531e5010
> > RBP: 00007fff57900c58 R8: 00002b21534f46d0 R9: 00002b21531fbd36
> > R10: 0000000000516040 R11: 0000000000000206 R12: 0000000000512010
> > R13: 00007fff579015c5 R14: 0000000000000000 R15: 00002b21531e5010
> > ORIG_RAX: 00000000000000af CS: 0033 SS: 002b
> >
> > crash> gdb bt 15
> > [Switching to thread 1 (process 4146)]#0 0xffffffff801521d1 in crash_kexec (regs=0x0) at kexec.h:64
> > 64 in kexec.h
> > #0 0xffffffff801521d1 in crash_kexec (regs=0x0) at kexec.h:64
> > #1 0xffffffff80131422 in panic (fmt=0xffffffff8044832c "Rachita triggering panic\n") at kernel/panic.c:87
> > #2 0xffffffff80107017 in fun2 (i=0) at init/main.c:608
> > #3 0xffffffff801311b6 in fun1 (j=Variable "j" is not available.
> > ) at kernel/panic.c:278
> > #4 0xffffffff8800200f in ?? ()
> > #5 0xffffc2000023d9d0 in ?? ()
> > #6 0xffffffff8014c664 in sys_init_module (umod=0xffff81022ef6c400, len=18446604445110683424,
> > uargs=0xffff81022ef6c6e8 "\020304366.\002\201377377x304366.\002\201377377340304366.\002\201377377H305366.\002\201377377260305366.\002\201377377\030306366.\002\201377377\200306366.\002\201377377")
> > at kernel/module.c:1911
> > #7 0xffffffff801096da in system_call () at bitops.h:230
> > #8 0x00002b2153382d4a in ?? ()
> > #9 0xffff81022e8516d0 in ?? ()
> > #10 0xffffffff8055c7c0 in migration_notifier ()
> > #11 0x0000000000000000 in ?? ()
> > #12 0x0000000000000001 in ?? ()
> > #13 0xffffffffffffffff in ?? ()
> > #14 0xffffffff8013ae2a in recalc_sigpending () at kernel/signal.c:227
> > (More stack frames follow...)
> > crash>
> >
> > ===============================================================================
> >
> > But as of now there are a few issues with "gdb bt":
> >
> > 1) Sometimes the backtrace does not end for a long time (a huge number of
> >    stack frames is displayed), and the "q" command doesn't work as desired
> >    once the screen is full. The workaround is to give a limiting count,
> >    e.g. "gdb bt 10". I tried gdb 6.1 externally (outside crash) as well
> >    and saw the same never-ending stack frames, whereas the latest gdb
> >    (6.4) works fine. So I was wondering if you are planning to upgrade
> >    the embedded gdb to 6.4?
> >
>
> Not really. That's a major undertaking with unpredictable results
> until it's attempted. Every time I do that, nightmares follow, so only
> if we get to the point where gdb-6.1 doesn't work at all, or cripples
> crash's use of it with a new vmlinux, should we even think of doing that.
>
>
> >
> > 2) Unlike crash, gdb has no concept of tasks, so we can only see the
> >    backtraces of the tasks that were active at the time of the crash.
> >
> >
> > Apart from "bt", this change also gets some other related commands working,
> > such as "gdb info registers", "gdb info frame" and "gdb info threads".
> >
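For reference, the kind of session being described would look something like
this once the patch is applied (commands only; the actual output depends
entirely on the vmlinux/vmcore pair being examined):

  crash> gdb bt 10
  crash> gdb info registers
  crash> gdb info frame
  crash> gdb info threads
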
>
> Well, right off the bat, I'm not too keen on passing the vmcore to gdb,
> because I don't know what the unseen ramifications of that would be.
> Even so, you can't just do an "argc++" in gdb_main_loop() because
> that apparently presumes that crash is receiving *only* two arguments,
> in the "vmlinux vmcore" order. That cannot be presumed obviously,
> as the possible combinations of crash command line options/ordering
> are endless.
>
In any case, currently gdb_main_loop() is not passing the right "argc"
to gdb.

No, actually it is doing the right thing currently; and it ...
The whole point is that crash's use of gdb was never
intended to pass it the vmcore file. gdb certainly won't
have a clue as to what to do with diskdump compressed-format
dumpfiles, xendump dumpfiles, xen "xm save" files, LKCD dumpfiles,
mcore dumpfiles, and will probably have issues with netdump and
diskdump vmcores for that matter -- since they don't save
the register sets of the non-panic cpus.
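
Just to make the argument-ordering point concrete, the alternative to a blind
"argc++" would be building gdb's argv explicitly from the files crash has
already resolved. A rough sketch (purely illustrative -- none of these names
come from the real gdb_interface.c, and the ELF check merely stands in for
"a format gdb can actually read"):

  #include <stdio.h>
  #include <string.h>

  /* Hypothetical helper: treat the dumpfile as something gdb can open only
   * if it is a plain ELF core (netdump/kdump-style vmcores).  Compressed
   * diskdump, xendump, LKCD and mcore files would all fail this test. */
  static int dumpfile_is_elf(const char *path)
  {
      unsigned char magic[4] = { 0 };
      FILE *fp = fopen(path, "rb");
      size_t n;

      if (!fp)
          return 0;
      n = fread(magic, 1, sizeof(magic), fp);
      fclose(fp);
      return n == sizeof(magic) && memcmp(magic, "\177ELF", 4) == 0;
  }

  /* Illustrative only -- not the real gdb_interface.c.  Build the argv that
   * would be handed to the embedded gdb from the files crash has already
   * resolved, rather than bumping argc on crash's own command line, since
   * "crash [options] vmlinux vmcore" arguments can appear in any order. */
  static int build_gdb_argv(const char *namelist, const char *dumpfile,
                            char *argv[], int max)
  {
      int argc = 0;

      argv[argc++] = "gdb";
      argv[argc++] = (char *)namelist;          /* the kernel vmlinux      */
      if (dumpfile && argc < max - 1 && dumpfile_is_elf(dumpfile))
          argv[argc++] = (char *)dumpfile;      /* only if gdb can read it */
      argv[argc] = NULL;

      return argc;
  }

Even with something like that, the dumpfile-format problem described above
remains.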
> Secondly, until I see something useful in the case where the kernel
> takes an in-kernel exception that in turn causes the crash, I'm
> unconvinced. What does the trace look like if you take an
> oops or BUG() while running in kernel mode? Does gdb step
> past that point? (i.e., to the part of the backtrace we'd actually
> want to see) Certainly we won't see a register dump at the exact
> point of the exception. Would it make the jump from the x86_64
> interrupt stack (or any of the exception stacks) back to the
> process stack?
>

Rachita, could you please test the patch with such a dump also?
> Given that it only gives backtraces of the active tasks, we're
> still left with a half-baked implementation.
>
> And now, with the introduction of the new CONFIG_UNWIND_INFO
> and CONFIG_STACK_UNWIND configurations in the x86 and x86_64
> kernels, wouldn't it make more sense to utilize the approach taken by
> the crash-utility/ia64 unwind facility? Although the x86/x86_64
> implementation still appears to be a work in progress in the kernel,
> backporting that capability from the kernel to user-space would seem
> to be more useful. That's what was done for ia64, and for that reason
> it's the only architecture where we get dependable backtraces for
> all tasks, active or not.
>

IIUC, this will involve writing dwarf support code for crash as a whole.

Correct. And once it's done, we have the best ...

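To give a sense of what that would mean, the core of a CFI-style unwinder is
a per-PC table of rules saying where the CFA and the saved registers live.
A toy, made-up illustration of a single unwind step (not the kernel's
unwind.c, and not crash code) might look like:

  #include <stdint.h>

  /* Toy model of one DWARF CFI row: for a given PC range, the CFA (canonical
   * frame address) is "register + offset", and the saved return address is
   * found at a fixed offset from the CFA.  Real .eh_frame/.debug_frame data
   * is far richer than this, but the shape of the problem is the same. */
  struct cfi_rule {
      uint64_t pc_start, pc_end;
      int      cfa_reg;          /* 0 = use RSP, 1 = use RBP       */
      int64_t  cfa_off;          /* CFA = reg + cfa_off            */
      int64_t  ra_off;           /* return address at CFA + ra_off */
  };

  struct frame_regs { uint64_t rip, rsp, rbp; };

  /* Memory accessor: in crash this would read from the dumpfile. */
  typedef int (*readmem_fn)(uint64_t addr, uint64_t *val);

  /* One unwind step: find the rule covering rip, compute the CFA, and
   * recover the caller's rip/rsp.  Returns 0 if no rule covers the PC. */
  static int unwind_step(const struct cfi_rule *tbl, int nrules,
                         struct frame_regs *regs, readmem_fn readmem)
  {
      int i;

      for (i = 0; i < nrules; i++) {
          uint64_t base, cfa, ra;

          if (regs->rip < tbl[i].pc_start || regs->rip >= tbl[i].pc_end)
              continue;

          base = tbl[i].cfa_reg ? regs->rbp : regs->rsp;
          cfa  = base + tbl[i].cfa_off;
          if (!readmem(cfa + tbl[i].ra_off, &ra))
              return 0;

          regs->rip = ra;       /* caller's program counter       */
          regs->rsp = cfa;      /* caller's stack pointer (= CFA) */
          return 1;
      }
      return 0;
  }

In crash, the readmem callback would presumably go through the existing
dumpfile accessors, and the rule table would have to be generated from the
kernel's .eh_frame/.debug_frame sections for every text address -- which is
where most of that "dwarf support code" would end up.
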
> Simple question -- and to be quite honest with you -- I don't
> understand why you wouldn't want to simply use gdb alone
> in this case?
>
>
Instead of writing stack unwinding code using dwarf info from
scratch, gdb's code was re-used.

For that matter, if you really feel the need to jam ...

crash> !gdb vmlinux vmcore
...
[do whatever you want with (the latest-and-greatest) gdb]
...
(gdb) q
crash>
Thanks,
Dave