From: Dave Anderson <anderson(a)redhat.com>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Fri, 28 Jan 2011 09:31:50 -0500 (EST)
----- Original Message -----
> Also, I have a question about the report that gcore hung while
> gathering note information.
>
> I tried to reproduce the bug on 2.6.35.10-74.fc14.x86_64 with
> crash-5.0.6-2.fc14.x86_64 and crash-5.1.1, but it has not
> reproduced so far: gcore worked well with both crash versions.
>
> I then retried with 2.6.34-2.fc14.x86_64, but it failed to boot in the
> same environment that I used for 2.6.35.10-74.fc14.x86_64.
>
> So, my questions are: In what kind of environment did you see
> the hang? I need to set up the same environment as
> yours. In Fedora 14 Alpha, the kernel version was already 2.6.35
> according to the release notes:
>
> http://fedoraproject.org/wiki/Fedora_14_Alpha_release_notes#Linux_Kernel_...
>
> Also, it would be helpful if you could show me a backtrace taken while
> gcore is hanging.
I retested it with the latest gcore.tar.bz2 using the same fc14 dumpfile
and it works OK.
That's good news. I've now confirmed that the cause is in restore_frame_pointer().
I did re-verify that it hangs with the older version:
# ls -l /root/gcore.tar.bz2 gcore.tar.bz2
-rw-r--r-- 1 root root 28666 Jan 24 11:05 /root/gcore.tar.bz2 <- hangs
-rw-r--r-- 1 root root 29266 Jan 27 10:15 gcore.tar.bz2 <- works OK
#
(gdb) bt
#0 0x0000003e838cd6a0 in __lseek_nocancel () from /lib64/libc.so.6
#1 0x0000000000534fd8 in read_netdump (fd=-1, bufptr=0x7fffeb5977e0, cnt=8,
addr=18446612134417074248, paddr=2102855752)
at netdump.c:526
#2 0x000000000053b663 in read_kdump (fd=-1, bufptr=0x7fffeb5977e0, cnt=8,
addr=18446612134417074248, paddr=2102855752)
at netdump.c:2553
#3 0x000000000046bc1b in readmem (addr=18446612134417074248, memtype=1,
buffer=0x7fffeb5977e0, size=8,
type=0x2b95faf6d370 "restore_frame_pointer: resume rbp", error_handle=5) at
memory.c:1849
#4 0x00002b95faf6980c in restore_frame_pointer () from ./extensions/gcore.so
#5 0x00002b95faf6a196 in restore_rest () from ./extensions/gcore.so
#6 0x00002b95faf69d51 in genregs_get () from ./extensions/gcore.so
#7 0x00002b95faf6585c in fill_thread_core_info () from ./extensions/gcore.so
#8 0x00002b95faf65ccc in fill_note_info () from ./extensions/gcore.so
#9 0x00002b95faf64755 in gcore_coredump () from ./extensions/gcore.so
#10 0x00002b95faf6a95e in do_gcore () from ./extensions/gcore.so
#11 0x00002b95faf6a7f9 in cmd_gcore () from ./extensions/gcore.so
#12 0x0000000000454631 in exec_command () at main.c:674
#13 0x00000000004544de in main_loop () at main.c:633
#14 0x0000000000578b39 in captured_command_loop (data=0x3) at ./main.c:226
#15 0x0000000000577cfb in catch_errors (func=0x578b30 <captured_command_loop>,
func_args=0x0, errstring=0x82092c "",
mask=<value optimized out>) at exceptions.c:520
#16 0x0000000000579286 in captured_main (data=<value optimized out>) at
./main.c:924
#17 0x0000000000577cfb in catch_errors (func=0x578b70 <captured_main>,
func_args=0x7fffeb597f70, errstring=0x82092c "",
mask=<value optimized out>) at exceptions.c:520
#18 0x00000000005788d4 in gdb_main (args=0x7d56fb40) at ./main.c:939
#19 0x0000000000578916 in gdb_main_entry (argc=<value optimized out>,
argv=0x7d56fb40) at ./main.c:959
#20 0x00000000004d2b7d in gdb_main_loop (argc=2, argv=0x7fffeb598478) at
gdb_interface.c:78
#21 0x0000000000454281 in main (argc=3, argv=0x7fffeb598478) at main.c:547
(gdb)
Thanks for the backtrace. It helps a lot.
It looks to me as if restore_frame_pointer() is looping here while doing
the simple job of following frame pointers up the stack.
My guess is that the saved frame pointer values form a cycle on the
kernel stack. Perhaps some of the frame pointers in the chain are
corrupted?
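If that guess is right, a bounded walk would turn the hang into an error.
The following is only a sketch of the idea, not the gcore code:
walk_frame_pointers() and its parameters are hypothetical, the stack is
passed in as a plain array instead of being read with readmem(), and a
saved value of 0 is assumed to mark the end of the chain. Since the
stack grows downward, each saved frame pointer should be strictly higher
than the current one, so any value that does not increase is rejected.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of crash or gcore): follow a chain of
 * saved frame pointers inside a copy of one kernel stack.
 *
 *   stack      - copy of the stack contents as 64-bit words
 *   nwords     - number of words in the copy
 *   stack_base - virtual address corresponding to stack[0]
 *   rbp        - frame pointer value to start from
 *
 * Returns the number of frames visited, or -1 if the chain is broken
 * (pointer outside the stack, misaligned, or not moving upward).
 * Assumption: a saved frame pointer of 0 terminates the chain.
 */
long walk_frame_pointers(const uint64_t *stack, size_t nwords,
                         uint64_t stack_base, uint64_t rbp)
{
    const uint64_t stack_top = stack_base + nwords * sizeof(uint64_t);
    long frames = 0;

    while (rbp != 0) {
        uint64_t next;

        /* The frame pointer must point into this stack, word-aligned. */
        if (rbp < stack_base || rbp >= stack_top ||
            (rbp % sizeof(uint64_t)) != 0)
            return -1;

        /* The word at rbp holds the caller's saved frame pointer. */
        next = stack[(rbp - stack_base) / sizeof(uint64_t)];
        frames++;

        if (next == 0)
            break;          /* assumed end-of-chain marker */

        /* Callers live at higher addresses; anything else would loop. */
        if (next <= rbp)
            return -1;

        rbp = next;
    }
    return frames;
}
```

Because every accepted step moves to a strictly higher address below the
top of the stack, the walk always terminates, even on a corrupted stack.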
If you're still interested, I can make the vmlinux/vmcore available to you.
I'm still interested. Could you provide them to me? I need
to figure out the exact state of the kernel stack that leads to this
behaviour of restore_frame_pointer().
Thanks,
HATAYAMA Daisuke