Re: [Crash-utility] [PATCH 0/4] crash utility: add ARM crashdump support
by Dave Anderson
----- "Lei Wen" <adrian.wenl(a)gmail.com> wrote:
> Hi Dave,
>
> What is the status of this patch series now? Could the crash utility
> support analyzing an ARM machine core dump on an x86 host?
That is the plan, i.e., supporting the analysis of ARM dumpfiles
on both x86 and ARM hosts (and by extension on x86_64 hosts using
the x86 binary), and presumably "live" on ARM hosts.
> This is a very useful feature, since the ARM kernel already supports
> core dumps when kdump is enabled.
I'm waiting for the results of the Nokia/Sony-Ericsson collaboration
efforts. I've added Jan and Thomas's names to the cc: list.
Dave
14 years, 3 months
Re: [Crash-utility] mount cmd crashes crash
by Dave Anderson
----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> Sorry, forgot to reply all:
> ---------------------------
>
> On Wed, 2010-08-18 at 20:57 +0000, Dave Anderson wrote:
> > ----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> >
> > > I'm working on a dump of a system that did not have a PID 1. I don't
> > > think it's relevant to the crash itself, but it does cause crash to get
> > > a seg fault.
>
> > >
> > > I don't know if it was important to have the context of pid 1 for
> > > reporting mounts, or just any context, but this hack makes the problem
> > > go away, although not a very efficient way to find the lowest existing
> > > PID above 0.
> >
> > Yeah, it's not important to use the context of pid 1, but it just needs
> > some context, and I had presumed that init would always exist. I thought
> > that the panic("Attempted to kill the idle task!") in do_exit() would
> > prevent pid 1 from ever going away -- but apparently your kernel figured
> > out how to do it elsewhere... ;-)
>
> That test is for PID 0, not PID 1 (at least on the kernel I'm
> debugging.) However, there is this also:
>
>         if (unlikely(tsk == child_reaper))
>                 panic("Attempted to kill init!");
That's the one I *meant*... ;-)
>
> And child_reaper in the dump points to a task struct for init that isn't
> in the ps listing. Hmmm. Maybe that part *is* interesting in this dump...
>
> >
> > Your patch would pick a kernel thread pid, and apparently everything still
> > works OK? That being the case, it's fine with me.
>
> With the patch, these commands all produce the same output:
> crash-5.0.6-fix> mount >mount.out
> crash-5.0.6-fix> mount -n 2 >mount2.out
> crash-5.0.6-fix> mount -n 1459 >mount1459.out
>
> I discovered the -n option as my first workaround.
Actually, it looks like pid 0 could be used as well.
Anyway, queued for the next release.
Thanks,
Dave
Re: [Crash-utility] Crash issue when loading vmcore
by Dave Anderson
----- "Paul-Kenji Cahier Furuya" <pkc(a)f1-photo.com> wrote:
> On 08/23/2010 22:51, Dave Anderson wrote:
> > So that doesn't make any sense unless the vmlinux file and the vmlinux that
> > was running on the crashed kernel are not the same kernels. Are you using
> > a different kernel as the secondary kdump kernel?
>
> Just checked kdump's config and it says:
> # If these are not set, kdump-config will try to use the current
> # and initrd if it is relocatable.
> And I did not set those variables.
>
> However I checked and found out the vmlinuz (bzImage, 7.7MB extracted)
> being run seems to be stripped, while the vmlinux from the kernel
> directory (124MB) is not.
>
> Could this affect the result? Is there any way to deal properly with
> that situation? (I am using my own kernel builds, so I do not have any
> "debug kernel" packages)
It appears that the kdump configuration should be using the
same kernel as the crashed kernel, but would relocate it
when it gets run as the kdump kernel. But that does not
explain the discrepancy between the symbol values listed
by the "VMCOREINFO" data and that of the vmlinux file that
you are using.
The vmlinuz file (with a "z" at the end) is useless for crash.
Crash needs the debuginfo-full vmlinux file that was created by
compiling the kernel with -g, and which is located at the topmost
directory in the kernel source build tree.
In any case, it would be trivial to figure this out if you could
log into the live system and try to run crash there -- or even
simpler -- run "cat /proc/kallsyms" on that live system. Other than
that, I don't know what else to suggest at this point.
Dave
Re: [Crash-utility] [PATCH 0/4] crash utility: add ARM crashdump support
by Dave Anderson
----- "Mika Westerberg" <ext-mika.1.westerberg(a)nokia.com> wrote:
>
> On Wed, Jun 30, 2010 at 03:10:58PM +0200, ext Dave Anderson wrote:
>
> > In any case, I'm more than happy to fold in ARM support, but I don't know what
> > to do in this case.
> >
> > I wonder if it would be possible for you, Jan and Thomas to somehow collaborate
> > on this effort? It seems that both sides would benefit from the work of the other
> > side. I've added them to the cc list.
>
> Sure. Can those patches be found in some public ML? I quickly searched but
> couldn't find anything.
>
> Regards,
> MW
Jan is contacting you off-list.
Thanks,
Dave
Re: [Crash-utility] Crash issue when loading vmcore
by Dave Anderson
----- "Paul-Kenji Cahier Furuya" <pkc(a)f1-photo.com> wrote:
> On 08/23/2010 22:20, Dave Anderson wrote:
> > Well, yes, you'd have to be able to log into the machine, and
> > then just run:
> >
> > # crash
> >
> > or if the /vmlinux file is not in a common location, do this:
> >
> > # crash /path/to/vmlinux
> >
> > And that presumes you've got crash installed on the system as well.
> >
> I might be able to get physical access at some point, but right now I
> have none.
>
> Anything that helps from the logs?
Well, this part is still unexplainable -- in the crashd8.txt, the symbol
addresses that were seen by the crashed kernel are as shown:
# grep SYMBOL crashd8.txt
SYMBOL(init_uts_ns)=c06f9120
SYMBOL(node_online_map)=c0730644
SYMBOL(swapper_pg_dir)=c06e4000
SYMBOL(_stext)=c0101000
SYMBOL(vmlist)=c07d3540
SYMBOL(mem_map)=c07d3500
SYMBOL(contig_page_data)=c072ce80
SYMBOL(log_buf)=c06fc83c
SYMBOL(log_end)=c07bb7ec
SYMBOL(log_buf_len)=c06fc838
SYMBOL(logged_chars)=c07c38a0
#
But the "sym.l" list starts with a unity-mapped PAGE_OFFSET
value of c1000000 (instead of the more common c0000000)
c1000000 (T) _text
c1000000 (T) startup_32
c1000054 (t) default_entry
c1001000 (T) _stext
c1001010 (T) do_one_initcall
c1001180 (t) init_post
c10012c0 (T) name_to_dev_t
c1001500 (T) thread_saved_pc
c1001510 (T) prepare_to_copy
c1001590 (T) get_wchan
c1001640 (T) __switch_to
...
So that being the case, the symbol values for "init_uts_ns", "node_online_map",
and so on, don't even match those of the crashed kernel:
c1662120 (D) init_uts_ns
c1716000 (B) swapper_pg_dir
c1001000 (T) _stext
c173ed3c (B) vmlist
c173ed00 (B) mem_map
c1696180 (D) contig_page_data
c17248a0 (b) __log_buf
c17247ec (b) log_end
c16656b8 (d) log_buf_len
c172c8a0 (b) logged_chars
So that doesn't make any sense unless the vmlinux file and the vmlinux that
was running on the crashed kernel are not the same kernels. Are you using
a different kernel as the secondary kdump kernel?
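The comparison above can also be scripted rather than done by eye. What follows is only a rough Python sketch and not part of crash: the helper name compare_symbols is made up for illustration, and the two tables simply repeat the values from the "grep SYMBOL" output and the sym.l excerpt above. It only matches identical names, so log_buf (which shows up in the excerpt as __log_buf) is reported as absent rather than guessed at.

```python
# Values copied from the "grep SYMBOL crashd8.txt" output (crashed kernel).
VMCOREINFO = {
    "init_uts_ns": 0xc06f9120,
    "node_online_map": 0xc0730644,
    "swapper_pg_dir": 0xc06e4000,
    "_stext": 0xc0101000,
    "vmlist": 0xc07d3540,
    "mem_map": 0xc07d3500,
    "contig_page_data": 0xc072ce80,
    "log_buf": 0xc06fc83c,
    "log_end": 0xc07bb7ec,
    "log_buf_len": 0xc06fc838,
    "logged_chars": 0xc07c38a0,
}

# Values copied from the sym.l excerpt (the vmlinux handed to crash).
SYM_L = {
    "init_uts_ns": 0xc1662120,
    "swapper_pg_dir": 0xc1716000,
    "_stext": 0xc1001000,
    "vmlist": 0xc173ed3c,
    "mem_map": 0xc173ed00,
    "contig_page_data": 0xc1696180,
    "__log_buf": 0xc17248a0,
    "log_end": 0xc17247ec,
    "log_buf_len": 0xc16656b8,
    "logged_chars": 0xc172c8a0,
}


def compare_symbols(dump_syms, file_syms):
    """Hypothetical helper: split dump symbols into mismatched and missing.

    Returns (mismatched, missing), where mismatched maps a name to its
    (dump address, vmlinux address) pair and missing lists names that
    have no identically named entry in file_syms.
    """
    mismatched = {}
    missing = []
    for name, dump_addr in dump_syms.items():
        file_addr = file_syms.get(name)
        if file_addr is None:
            missing.append(name)
        elif file_addr != dump_addr:
            mismatched[name] = (dump_addr, file_addr)
    return mismatched, missing


mismatched, missing = compare_symbols(VMCOREINFO, SYM_L)
for name, (dump_addr, file_addr) in mismatched.items():
    print(f"{name}: dump {dump_addr:08x} vs vmlinux {file_addr:08x}")
print("no identically named entry in the sym.l excerpt:", ", ".join(missing))
```

Every symbol present in both tables comes out mismatched, which is consistent with the two kernels being different builds rather than one symbol being off.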
node_online_map doesn't even exist in your sym.l file.
Dave
Re: [Crash-utility] Crash issue when loading vmcore
by Dave Anderson
----- "Paul Cahier" <pkc(a)f1-photo.com> wrote:
> Hello,
>
> I have finished setting up kdump and kexec today, recompiling my kernel
> to add everything needed in there.
> I have triggered a kernel panic by echo c>/proc/sysrq-trigger, and found
> that the vmcore dump was indeed there after all was done.
>
> However, I cannot get any traces out of that crash dump (short version,
> long version at the end of the email):
>
> crash /usr/src/linux-2.6.35.3/vmlinux vmcore.201008231930
> [...]
> crash: read error: kernel virtual address: c148a9a0 type: "kernel_config_data"
> WARNING: cannot read kernel_config_data
> crash: read error: kernel virtual address: c1487e28 type: "cpu_possible_mask"
The virtual addresses for "kernel_config_data" and "cpu_possible_mask" are
strange (too high?) -- I'll continue the analysis at the end of your "d7"
output below...
> If I try crash --minimal, things do load, but I'm stuck with the minimal
> command set, which is not very helpful.
> All I'm looking at is getting a full trace of the kernel panic.
>
>
> - Paul-Kenji Cahier
>
>
> PS, the full version:
> crash -d7 /usr/src/linux-2.6.35.3/vmlinux vmcore.201008231930
>
> crash 5.0.6
> Copyright (C) 2002-2010 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005, 2006 Fujitsu Limited
> Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
>
> vmcore_data:
> flags: a0 (KDUMP_LOCAL|KDUMP_ELF32)
> ndfd: 3
> ofp: b77344c0
> header_size: 1860
> num_pt_load_segments: 9
> pt_load_segment[0]:
> file_offset: 744
> phys_start: 0
> phys_end: a0000
> zero_fill: 0
> pt_load_segment[1]:
> file_offset: a0744
> phys_start: 100000
> phys_end: 1000000
> zero_fill: 0
> pt_load_segment[2]:
> file_offset: fa0744
> phys_start: 5000000
> phys_end: 38000000
> zero_fill: 0
> pt_load_segment[3]:
> file_offset: 33fa0744
> phys_start: 38000000
> phys_end: 3e5ff000
> zero_fill: 0
> pt_load_segment[4]:
> file_offset: 3a59f744
> phys_start: 3e6c6000
> phys_end: 3f594000
> zero_fill: 0
> pt_load_segment[5]:
> file_offset: 3b46d744
> phys_start: 3f59c000
> phys_end: 3f62a000
> zero_fill: 0
> pt_load_segment[6]:
> file_offset: 3b4fb744
> phys_start: 3f62e000
> phys_end: 3f6a9000
> zero_fill: 0
> pt_load_segment[7]:
> file_offset: 3b576744
> phys_start: 3f6e9000
> phys_end: 3f6ed000
> zero_fill: 0
> pt_load_segment[8]:
> file_offset: 3b57a744
> phys_start: 3f6ff000
> phys_end: 3f700000
> zero_fill: 0
> elf_header: 85368c0
> elf32: 85368c0
> notes32: 85368f4
> load32: 8536914
> elf64: 0
> notes64: 0
> load64: 0
> nt_prstatus: 8536a34
> nt_prpsinfo: 0
> nt_taskstruct: 0
> task_struct: 0
> page_size: 0
> switch_stack: 0
> xen_kdump_data: (unused)
> num_prstatus_notes: 2
> vmcoreinfo: 0
> size_vmcoreinfo: 0
> nt_prstatus_percpu:
> 08536a34 08536ad8
>
> Elf32_Ehdr:
> e_ident: \177ELF
> e_ident[EI_CLASS]: 1 (ELFCLASS32)
> e_ident[EI_DATA]: 1 (ELFDATA2LSB)
> e_ident[EI_VERSION]: 1 (EV_CURRENT)
> e_ident[EI_OSABI]: 0 (ELFOSABI_SYSV)
> e_ident[EI_ABIVERSION]: 0
> e_type: 4 (ET_CORE)
> e_machine: 3 (EM_386)
> e_version: 1 (EV_CURRENT)
> e_entry: 0
> e_phoff: 34
> e_shoff: 0
> e_flags: 0
> e_ehsize: 34
> e_phentsize: 20
> e_phnum: a
> e_shentsize: 0
> e_shnum: 0
> e_shstrndx: 0
> Elf32_Phdr:
> p_type: 4 (PT_NOTE)
> p_offset: 372 (174)
> p_vaddr: 0
> p_paddr: 0
> p_filesz: 1488 (5d0)
> p_memsz: 1488 (5d0)
> p_flags: 0 ()
> p_align: 0
> Elf32_Phdr:
> p_type: 1 (PT_LOAD)
> p_offset: 1860 (744)
> p_vaddr: c0000000
> p_paddr: 0
> p_filesz: 655360 (a0000)
> p_memsz: 655360 (a0000)
> p_flags: 7 (PF_X|PF_W|PF_R)
> p_align: 0
> Elf32_Phdr:
> p_type: 1 (PT_LOAD)
> p_offset: 657220 (a0744)
> p_vaddr: c0100000
> p_paddr: 100000
> p_filesz: 15728640 (f00000)
> p_memsz: 15728640 (f00000)
> p_flags: 7 (PF_X|PF_W|PF_R)
> p_align: 0
> Elf32_Phdr:
> p_type: 1 (PT_LOAD)
> p_offset: 16385860 (fa0744)
> p_vaddr: c5000000
> p_paddr: 5000000
> p_filesz: 855638016 (33000000)
> p_memsz: 855638016 (33000000)
> p_flags: 7 (PF_X|PF_W|PF_R)
> p_align: 0
> Elf32_Phdr:
> p_type: 1 (PT_LOAD)
> p_offset: 872023876 (33fa0744)
> p_vaddr: ffffffff
> p_paddr: 38000000
> p_filesz: 106950656 (65ff000)
> p_memsz: 106950656 (65ff000)
> p_flags: 7 (PF_X|PF_W|PF_R)
> p_align: 0
> Elf32_Phdr:
> p_type: 1 (PT_LOAD)
> p_offset: 978974532 (3a59f744)
> p_vaddr: ffffffff
> p_paddr: 3e6c6000
> p_filesz: 15523840 (ece000)
> p_memsz: 15523840 (ece000)
> p_flags: 7 (PF_X|PF_W|PF_R)
> p_align: 0
> Elf32_Phdr:
> p_type: 1 (PT_LOAD)
> p_offset: 994498372 (3b46d744)
> p_vaddr: ffffffff
> p_paddr: 3f59c000
> p_filesz: 581632 (8e000)
> p_memsz: 581632 (8e000)
> p_flags: 7 (PF_X|PF_W|PF_R)
> p_align: 0
> Elf32_Phdr:
> p_type: 1 (PT_LOAD)
> p_offset: 995080004 (3b4fb744)
> p_vaddr: ffffffff
> p_paddr: 3f62e000
> p_filesz: 503808 (7b000)
> p_memsz: 503808 (7b000)
> p_flags: 7 (PF_X|PF_W|PF_R)
> p_align: 0
> Elf32_Phdr:
> p_type: 1 (PT_LOAD)
> p_offset: 995583812 (3b576744)
> p_vaddr: ffffffff
> p_paddr: 3f6e9000
> p_filesz: 16384 (4000)
> p_memsz: 16384 (4000)
> p_flags: 7 (PF_X|PF_W|PF_R)
> p_align: 0
> Elf32_Phdr:
> p_type: 1 (PT_LOAD)
> p_offset: 995600196 (3b57a744)
> p_vaddr: ffffffff
> p_paddr: 3f6ff000
> p_filesz: 4096 (1000)
> p_memsz: 4096 (1000)
> p_flags: 7 (PF_X|PF_W|PF_R)
> p_align: 0
> Elf32_Nhdr:
> n_namesz: 5 ("CORE")
> n_descsz: 144
> n_type: 1 (NT_PRSTATUS)
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000401
> c06fef00 00000001 0000f004 0000f008
> 00000000 c06e3ecc 00000282 00000282
> 00000024 c06e3fa4 00000068 00000000
> Elf32_Nhdr:
> n_namesz: 5 ("CORE")
> n_descsz: 144
> n_type: 1 (NT_PRSTATUS)
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000dbd 00000000
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000000 00000000 c0712420 00007e7e
> 00000000 00000063 00000000 f0639f0c
> 00000063 0000007b 0000007b 000000d8
> 00000033 ffffffff c03145b2 00000060
> 00010086 f0639f0c 00000068 00000000
> Elf32_Nhdr:
> n_namesz: 11 ("VMCOREINFO")
> n_descsz: 1134
> n_type: 0 (unused)
> OSRELEASE=2.6.35.3-saber
> PAGESIZE=4096
> SYMBOL(init_uts_ns)=c06f9120
> SYMBOL(node_online_map)=c0730644
> SYMBOL(swapper_pg_dir)=c06e4000
> SYMBOL(_stext)=c0101000
> SYMBOL(vmlist)=c07d3540
> SYMBOL(mem_map)=c07d3500
> SYMBOL(contig_page_data)=c072ce80
> SIZE(page)=32
> SIZE(pglist_data)=4224
> SIZE(zone)=1024
> SIZE(free_area)=44
> SIZE(list_head)=8
> SIZE(nodemask_t)=4
> OFFSET(page.flags)=0
> OFFSET(page._count)=4
> OFFSET(page.mapping)=16
> OFFSET(page.lru)=24
> OFFSET(pglist_data.node_zones)=0
> OFFSET(pglist_data.nr_zones)=4140
> OFFSET(pglist_data.node_mem_map)=4144
> OFFSET(pglist_data.node_start_pfn)=4148
> OFFSET(pglist_data.node_spanned_pages)=4156
> OFFSET(pglist_data.node_id)=4160
> OFFSET(zone.free_area)=40
> OFFSET(zone.vm_stat)=728
> OFFSET(zone.spanned_pages)=916
> OFFSET(free_area.free_list)=0
> OFFSET(list_head.next)=0
> OFFSET(list_head.prev)=4
> OFFSET(vm_struct.addr)=4
> LENGTH(zone.free_area)=11
> SYMBOL(log_buf)=c06fc83c
> SYMBOL(log_end)=c07bb7ec
> SYMBOL(log_buf_len)=c06fc838
> SYMBOL(logged_chars)=c07c38a0
> LENGTH(free_area.free_list)=5
> NUMBER(NR_FREE_PAGES)=0
> NUMBER(PG_lru)=5
> NUMBER(PG_private)=11
> NUMBER(PG_swapcache)=16
> CONFIG_X86_PAE=y
> CRASHTIME=1282584565
> cannot determine relocation value: not a live system
> gdb /usr/src/linux-2.6.35.3/vmlinux
> GNU gdb (GDB) 7.0
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...
>
> <readmem: c148a9a0, KVADDR, "kernel_config_data", 32768, (ROE), 8bad3d8>
> crash: read error: kernel virtual address: c148a9a0 type: "kernel_config_data"
> WARNING: cannot read kernel_config_data
> <readmem: c1487e28, KVADDR, "cpu_possible_mask", 4, (FOE), bfed4bbc>
> crash: read error: kernel virtual address: c1487e28 type: "cpu_possible_mask"
The read errors for the "kernel_config_data" symbol at c148a9a0 (which returns on
error -- that's what ROE means) and for the "cpu_possible_mask" symbol at c1487e28 (which
causes the session to fault or bail out -- FOE) mean that -- after translating those virtual
addresses to physical addresses by stripping off the c0000000 unity-map identifier -- those
physical addresses (148a9a0 and 1487e28 respectively) were not found in the dumpfile.
And that's because the ELF header of the vmcore does not show a PT_LOAD segment
that contains those physical addresses.
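To make that concrete, here is a small Python sketch of the same check. It is not how crash itself is implemented: the names kvaddr_to_paddr and find_segment are invented for this example, the segment table is copied from the pt_load_segment[] values in the -d7 output above, and treating phys_end as an exclusive bound is an assumption.

```python
PAGE_OFFSET = 0xc0000000  # unity-map base used in the explanation above

# (phys_start, phys_end) pairs copied from the pt_load_segment[] listing.
PT_LOAD_SEGMENTS = [
    (0x0, 0xa0000),
    (0x100000, 0x1000000),
    (0x5000000, 0x38000000),
    (0x38000000, 0x3e5ff000),
    (0x3e6c6000, 0x3f594000),
    (0x3f59c000, 0x3f62a000),
    (0x3f62e000, 0x3f6a9000),
    (0x3f6e9000, 0x3f6ed000),
    (0x3f6ff000, 0x3f700000),
]


def kvaddr_to_paddr(kvaddr, page_offset=PAGE_OFFSET):
    """Translate a unity-mapped kernel virtual address to a physical one."""
    return kvaddr - page_offset


def find_segment(paddr, segments=PT_LOAD_SEGMENTS):
    """Return the index of the segment holding paddr, or None if none does.

    Assumption: phys_end is exclusive (phys_start <= paddr < phys_end).
    """
    for index, (start, end) in enumerate(segments):
        if start <= paddr < end:
            return index
    return None


for name, kvaddr in (
    ("kernel_config_data", 0xc148a9a0),
    ("cpu_possible_mask", 0xc1487e28),
):
    paddr = kvaddr_to_paddr(kvaddr)
    print(f"{name}: kvaddr {kvaddr:08x} -> paddr {paddr:x} -> "
          f"segment {find_segment(paddr)}")
```

Both addresses land in the hole between segment 1 (ending at 1000000) and segment 2 (starting at 5000000), so no PT_LOAD segment can supply the data.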
But as I mentioned before, the virtual addresses seem to be too high for
static kernel data symbols. If you run --minimal, does the "sym" command
show "cpu_possible_mask" at that address? I don't have anything later than
a 2.6.34 x86 dumpfile to use as a reference, but the symbol is much lower
in value in that kernel:
crash> sym cpu_possible_mask
c07ffa28 (R) cpu_possible_mask
crash>
And if I dump all of the symbols from within a --minimal session with that
dumpfile, I see this, where the "_end" of the static kernel virtual memory
is at c0c77000:
crash> sym -l
... [ cut ] ...
c0b50ffc (b) netlbl_unlhsh_lock
c0b51000 (b) klist_remove_lock
c0b51004 (B) __bss_stop
c0b52000 (b) .brk
c0b52000 (B) __brk_base
c0b62000 (b) .brk.pagetables
c0c67000 (b) .brk.dmi_alloc
c0c77000 (B) __brk_limit
c0c77000 (A) _end
crash>
And if you look at the "VMCOREINFO" data above in your dump for items that are
kernel symbol values, they make sense, i.e.,
> SYMBOL(node_online_map)=c0730644
> SYMBOL(swapper_pg_dir)=c06e4000
> SYMBOL(_stext)=c0101000
> SYMBOL(vmlist)=c07d3540
> SYMBOL(mem_map)=c07d3500
> SYMBOL(contig_page_data)=c072ce80
If you run a --minimal session, what do you see when you run the
two commands that I show above? (i.e., "sym cpu_possible_mask" & the output
of the tail end of "sym -l")
Dave
Re: [Crash-utility] Crash issue when loading vmcore
by Dave Anderson
----- "Paul-Kenji Cahier Furuya" <pkc(a)f1-photo.com> wrote:
> Here's for sym cpu_possible_mask in minimal mode:
> crash> sym cpu_possible_mask
> c14e7d88 (R) cpu_possible_mask
>
> And here's the tail of sym -l:
> c175d4e0 (b) sunrpc_table_header
> c175d4e4 (B) sctp_assocs_id_lock
> c175d4e8 (B) proc_net_sctp
> c175d4ec (B) sctp_assocs_id
> c175d500 (B) sysctl_sctp_mem
> c175d50c (B) sysctl_sctp_rmem
> c175d518 (B) sysctl_sctp_wmem
> c175d524 (b) __key.46606
> c175d524 (b) sctp_ctl_sock
> c175d528 (b) sctp_pf_inet6_specific
> c175d52c (b) sctp_pf_inet_specific
> c175d530 (b) sctp_af_v4_specific
> c175d534 (b) sctp_af_v6_specific
> c175d538 (b) __key.44408
> c175d538 (b) sctp_rand.42824
> c175d53c (B) sctp_sockets_allocated
> c175d54c (b) sctp_memory_pressure
> c175d550 (b) sctp_memory_allocated
> c175d554 (b) sctp_sysctl_header
> c175d558 (b) zero
> c175d55c (b) klist_remove_lock
> c175d560 (B) __bss_stop
> c175e000 (b) .brk
> c175e000 (B) __brk_base
> c176e000 (b) .brk.pagetables
> c17ee000 (b) .brk.dmi_alloc
> c17fe000 (B) __brk_limit
> c17fe000 (A) _end
That's interesting -- did you add some huge data structure or something
to the kernel?
OK -- three more requests -- can you bring up the --minimal session, and
then do this:
crash> sym -l > sym.l
and send the "sym.l" file? (It's long, so send it as an attachment)
Secondly, send the output of:
# crash -d8 /usr/src/linux-2.6.35.3/vmlinux vmcore.201008231930
The -d8 output will also show the physical address translation,
like this:
<readmem: c07ffa28, KVADDR, "cpu_possible_mask", 4, (FOE), bff3d43c>
addr: c07ffa28 paddr: 7ffa28 cnt: 4
And third, send the output of:
# readelf -a vmcore.201008231930
Dave
Crash issue when loading vmcore
by Paul Cahier
Hello,
I have finished setting up kdump and kexec today, recompiling my kernel
to add everything needed in there.
I have triggered a kernel panic by echo c>/proc/sysrq-trigger, and found
that the vmcore dump was indeed there after all was done.
However, I cannot get any traces out of that crash dump (short version,
long version at the end of the email):
crash /usr/src/linux-2.6.35.3/vmlinux vmcore.201008231930
[...]
crash: read error: kernel virtual address: c148a9a0 type: "kernel_config_data"
WARNING: cannot read kernel_config_data
crash: read error: kernel virtual address: c1487e28 type: "cpu_possible_mask"
If I try crash --minimal, things do load, but I'm stuck with the minimal
command set, which is not very helpful.
All I'm looking at is getting a full trace of the kernel panic.
- Paul-Kenji Cahier
PS, the full version:
crash -d7 /usr/src/linux-2.6.35.3/vmlinux vmcore.201008231930
crash 5.0.6
Copyright (C) 2002-2010 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
vmcore_data:
flags: a0 (KDUMP_LOCAL|KDUMP_ELF32)
ndfd: 3
ofp: b77344c0
header_size: 1860
num_pt_load_segments: 9
pt_load_segment[0]:
file_offset: 744
phys_start: 0
phys_end: a0000
zero_fill: 0
pt_load_segment[1]:
file_offset: a0744
phys_start: 100000
phys_end: 1000000
zero_fill: 0
pt_load_segment[2]:
file_offset: fa0744
phys_start: 5000000
phys_end: 38000000
zero_fill: 0
pt_load_segment[3]:
file_offset: 33fa0744
phys_start: 38000000
phys_end: 3e5ff000
zero_fill: 0
pt_load_segment[4]:
file_offset: 3a59f744
phys_start: 3e6c6000
phys_end: 3f594000
zero_fill: 0
pt_load_segment[5]:
file_offset: 3b46d744
phys_start: 3f59c000
phys_end: 3f62a000
zero_fill: 0
pt_load_segment[6]:
file_offset: 3b4fb744
phys_start: 3f62e000
phys_end: 3f6a9000
zero_fill: 0
pt_load_segment[7]:
file_offset: 3b576744
phys_start: 3f6e9000
phys_end: 3f6ed000
zero_fill: 0
pt_load_segment[8]:
file_offset: 3b57a744
phys_start: 3f6ff000
phys_end: 3f700000
zero_fill: 0
elf_header: 85368c0
elf32: 85368c0
notes32: 85368f4
load32: 8536914
elf64: 0
notes64: 0
load64: 0
nt_prstatus: 8536a34
nt_prpsinfo: 0
nt_taskstruct: 0
task_struct: 0
page_size: 0
switch_stack: 0
xen_kdump_data: (unused)
num_prstatus_notes: 2
vmcoreinfo: 0
size_vmcoreinfo: 0
nt_prstatus_percpu:
08536a34 08536ad8
Elf32_Ehdr:
e_ident: \177ELF
e_ident[EI_CLASS]: 1 (ELFCLASS32)
e_ident[EI_DATA]: 1 (ELFDATA2LSB)
e_ident[EI_VERSION]: 1 (EV_CURRENT)
e_ident[EI_OSABI]: 0 (ELFOSABI_SYSV)
e_ident[EI_ABIVERSION]: 0
e_type: 4 (ET_CORE)
e_machine: 3 (EM_386)
e_version: 1 (EV_CURRENT)
e_entry: 0
e_phoff: 34
e_shoff: 0
e_flags: 0
e_ehsize: 34
e_phentsize: 20
e_phnum: a
e_shentsize: 0
e_shnum: 0
e_shstrndx: 0
Elf32_Phdr:
p_type: 4 (PT_NOTE)
p_offset: 372 (174)
p_vaddr: 0
p_paddr: 0
p_filesz: 1488 (5d0)
p_memsz: 1488 (5d0)
p_flags: 0 ()
p_align: 0
Elf32_Phdr:
p_type: 1 (PT_LOAD)
p_offset: 1860 (744)
p_vaddr: c0000000
p_paddr: 0
p_filesz: 655360 (a0000)
p_memsz: 655360 (a0000)
p_flags: 7 (PF_X|PF_W|PF_R)
p_align: 0
Elf32_Phdr:
p_type: 1 (PT_LOAD)
p_offset: 657220 (a0744)
p_vaddr: c0100000
p_paddr: 100000
p_filesz: 15728640 (f00000)
p_memsz: 15728640 (f00000)
p_flags: 7 (PF_X|PF_W|PF_R)
p_align: 0
Elf32_Phdr:
p_type: 1 (PT_LOAD)
p_offset: 16385860 (fa0744)
p_vaddr: c5000000
p_paddr: 5000000
p_filesz: 855638016 (33000000)
p_memsz: 855638016 (33000000)
p_flags: 7 (PF_X|PF_W|PF_R)
p_align: 0
Elf32_Phdr:
p_type: 1 (PT_LOAD)
p_offset: 872023876 (33fa0744)
p_vaddr: ffffffff
p_paddr: 38000000
p_filesz: 106950656 (65ff000)
p_memsz: 106950656 (65ff000)
p_flags: 7 (PF_X|PF_W|PF_R)
p_align: 0
Elf32_Phdr:
p_type: 1 (PT_LOAD)
p_offset: 978974532 (3a59f744)
p_vaddr: ffffffff
p_paddr: 3e6c6000
p_filesz: 15523840 (ece000)
p_memsz: 15523840 (ece000)
p_flags: 7 (PF_X|PF_W|PF_R)
p_align: 0
Elf32_Phdr:
p_type: 1 (PT_LOAD)
p_offset: 994498372 (3b46d744)
p_vaddr: ffffffff
p_paddr: 3f59c000
p_filesz: 581632 (8e000)
p_memsz: 581632 (8e000)
p_flags: 7 (PF_X|PF_W|PF_R)
p_align: 0
Elf32_Phdr:
p_type: 1 (PT_LOAD)
p_offset: 995080004 (3b4fb744)
p_vaddr: ffffffff
p_paddr: 3f62e000
p_filesz: 503808 (7b000)
p_memsz: 503808 (7b000)
p_flags: 7 (PF_X|PF_W|PF_R)
p_align: 0
Elf32_Phdr:
p_type: 1 (PT_LOAD)
p_offset: 995583812 (3b576744)
p_vaddr: ffffffff
p_paddr: 3f6e9000
p_filesz: 16384 (4000)
p_memsz: 16384 (4000)
p_flags: 7 (PF_X|PF_W|PF_R)
p_align: 0
Elf32_Phdr:
p_type: 1 (PT_LOAD)
p_offset: 995600196 (3b57a744)
p_vaddr: ffffffff
p_paddr: 3f6ff000
p_filesz: 4096 (1000)
p_memsz: 4096 (1000)
p_flags: 7 (PF_X|PF_W|PF_R)
p_align: 0
Elf32_Nhdr:
n_namesz: 5 ("CORE")
n_descsz: 144
n_type: 1 (NT_PRSTATUS)
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000401
c06fef00 00000001 0000f004 0000f008
00000000 c06e3ecc 00000282 00000282
00000024 c06e3fa4 00000068 00000000
Elf32_Nhdr:
n_namesz: 5 ("CORE")
n_descsz: 144
n_type: 1 (NT_PRSTATUS)
00000000 00000000 00000000 00000000
00000000 00000000 00000dbd 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 c0712420 00007e7e
00000000 00000063 00000000 f0639f0c
00000063 0000007b 0000007b 000000d8
00000033 ffffffff c03145b2 00000060
00010086 f0639f0c 00000068 00000000
Elf32_Nhdr:
n_namesz: 11 ("VMCOREINFO")
n_descsz: 1134
n_type: 0 (unused)
OSRELEASE=2.6.35.3-saber
PAGESIZE=4096
SYMBOL(init_uts_ns)=c06f9120
SYMBOL(node_online_map)=c0730644
SYMBOL(swapper_pg_dir)=c06e4000
SYMBOL(_stext)=c0101000
SYMBOL(vmlist)=c07d3540
SYMBOL(mem_map)=c07d3500
SYMBOL(contig_page_data)=c072ce80
SIZE(page)=32
SIZE(pglist_data)=4224
SIZE(zone)=1024
SIZE(free_area)=44
SIZE(list_head)=8
SIZE(nodemask_t)=4
OFFSET(page.flags)=0
OFFSET(page._count)=4
OFFSET(page.mapping)=16
OFFSET(page.lru)=24
OFFSET(pglist_data.node_zones)=0
OFFSET(pglist_data.nr_zones)=4140
OFFSET(pglist_data.node_mem_map)=4144
OFFSET(pglist_data.node_start_pfn)=4148
OFFSET(pglist_data.node_spanned_pages)=4156
OFFSET(pglist_data.node_id)=4160
OFFSET(zone.free_area)=40
OFFSET(zone.vm_stat)=728
OFFSET(zone.spanned_pages)=916
OFFSET(free_area.free_list)=0
OFFSET(list_head.next)=0
OFFSET(list_head.prev)=4
OFFSET(vm_struct.addr)=4
LENGTH(zone.free_area)=11
SYMBOL(log_buf)=c06fc83c
SYMBOL(log_end)=c07bb7ec
SYMBOL(log_buf_len)=c06fc838
SYMBOL(logged_chars)=c07c38a0
LENGTH(free_area.free_list)=5
NUMBER(NR_FREE_PAGES)=0
NUMBER(PG_lru)=5
NUMBER(PG_private)=11
NUMBER(PG_swapcache)=16
CONFIG_X86_PAE=y
CRASHTIME=1282584565
cannot determine relocation value: not a live system
gdb /usr/src/linux-2.6.35.3/vmlinux
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
<readmem: c148a9a0, KVADDR, "kernel_config_data", 32768, (ROE), 8bad3d8>
crash: read error: kernel virtual address: c148a9a0 type: "kernel_config_data"
WARNING: cannot read kernel_config_data
<readmem: c1487e28, KVADDR, "cpu_possible_mask", 4, (FOE), bfed4bbc>
crash: read error: kernel virtual address: c1487e28 type: "cpu_possible_mask"
Re: [Crash-utility] mount cmd crashes crash
by Bob Montgomery
Sorry, forgot to reply all:
---------------------------
On Wed, 2010-08-18 at 20:57 +0000, Dave Anderson wrote:
> ----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
>
> > I'm working on a dump of a system that did not have a PID 1. I don't
> > think it's relevant to the crash itself, but it does cause crash to get
> > a seg fault.
> >
> > I don't know if it was important to have the context of pid 1 for
> > reporting mounts, or just any context, but this hack makes the problem
> > go away, although not a very efficient way to find the lowest existing
> > PID above 0.
>
> Yeah, it's not important to use the context of pid 1, but it just needs
> some context, and I had presumed that init would always exist. I thought
> that the panic("Attempted to kill the idle task!") in do_exit() would
> prevent pid 1 from ever going away -- but apparently your kernel figured
> out how to do it elsewhere... ;-)
That test is for PID 0, not PID 1 (at least on the kernel I'm
debugging.) However, there is this also:
        if (unlikely(tsk == child_reaper))
                panic("Attempted to kill init!");
And child_reaper in the dump points to a task struct for init that isn't
in the ps listing. Hmmm. Maybe that part *is* interesting in this
dump...
>
> Your patch would pick a kernel thread pid, and apparently everything still
> works OK? That being the case, it's fine with me.
With the patch, these commands all produce the same output:
crash-5.0.6-fix> mount >mount.out
crash-5.0.6-fix> mount -n 2 >mount2.out
crash-5.0.6-fix> mount -n 1459 >mount1459.out
I discovered the -n option as my first workaround.
Bob M.
Re: [Crash-utility] mount cmd crashes crash
by Dave Anderson
----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> I'm working on a dump of a system that did not have a PID 1. I don't
> think it's relevant to the crash itself, but it does cause crash to get
> a seg fault.
>
> crash> ps | head
>    PID   PPID  CPU        TASK        ST  %MEM  VSZ  RSS  COMM
>      0      0    0  ffffffff805144c0  RU   0.0    0    0  [swapper]
>      0     -1    1  ffff81012bc0a100  RU   0.0    0    0  [swapper]
>      2     -1    0  ffff81012bd3c040  IN   0.0    0    0  [migration/0]
>      3     -1    0  ffff81012bd3e7c0  RU   0.0    0    0  [ksoftirqd/0]
>      4     -1    0  ffff81012bd3e080  IN   0.0    0    0  [watchdog/0]
>      5     -1    1  ffff81012bd3f800  IN   0.0    0    0  [migration/1]
>      6     -1    1  ffff81012bd3f0c0  RU   0.0    0    0  [ksoftirqd/1]
>      7     -1    1  ffff81012bc0a840  IN   0.0    0    0  [watchdog/1]
>      8     -1    0  ffff81012af02880  IN   0.0    0    0  [events/0]
> crash> mount
> Segmentation fault (core dumped)
>
> In cmd_mount, this returns null and subsequent use causes the seg fault:
>
>         namespace_context = pid_to_context(1);
>
> I don't know if it was important to have the context of pid 1 for
> reporting mounts, or just any context, but this hack makes the problem
> go away, although not a very efficient way to find the lowest existing
> PID above 0.
Yeah, it's not important to use the context of pid 1, but it just needs
some context, and I had presumed that init would always exist. I thought
that the panic("Attempted to kill the idle task!") in do_exit() would
prevent pid 1 from ever going away -- but apparently your kernel figured
out how to do it elsewhere... ;-)
Your patch would pick a kernel thread pid, and apparently everything still
works OK? That being the case, it's fine with me.
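For illustration only, the "prefer pid 1, otherwise take the lowest existing PID above 0" idea can be written down as a few lines of Python. This is a sketch of the selection logic, not the patch itself and not crash code: pick_context_pid is a made-up name, and unlike the quoted loop it returns None when no candidate exists instead of counting upward indefinitely.

```python
def pick_context_pid(existing_pids, preferred=1):
    """Hypothetical helper: choose a PID whose context can stand in for init.

    Returns `preferred` if it exists, otherwise the lowest existing PID
    above 0, otherwise None.
    """
    pids = set(existing_pids)
    if preferred in pids:
        return preferred
    candidates = [pid for pid in pids if pid > 0]
    return min(candidates) if candidates else None


# PIDs visible in the "ps | head" output above (there is no PID 1 in this dump).
print(pick_context_pid([0, 0, 2, 3, 4, 5, 6, 7, 8]))
```

With the PIDs from that listing the choice falls on pid 2, a kernel thread, which matches what the patch would end up using.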
Thanks,
Dave
> --- filesys.c.orig 2010-08-18 14:03:26.000000000 -0600
> +++ filesys.c 2010-08-18 14:10:02.000000000 -0600
> @@ -1153,8 +1153,12 @@ cmd_mount(void)
>         ulong vfsmount = 0;
>         int flags = 0;
>         int save_next;
> +       ulong pid;
>
> -       namespace_context = pid_to_context(1);
> +       /* find a context */
> +       pid = 1;
> +       while ((namespace_context = pid_to_context(pid)) == NULL)
> +               pid++;
>
>         while ((c = getopt(argcnt, args, "ifn:")) != EOF) {
>                 switch(c)
>
> Bob Montgomery
> At HP
>
>
>
>
>
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility