Re: [Crash-utility] [PATCH 0/4] crash utility: add ARM crashdump support
by Dave Anderson

----- "Lei Wen" <adrian.wenl(a)gmail.com> wrote:
> Hi Dave,
> 
> What is the status of this patch series now? Could the crash utility
> support analyzing an ARM machine core dump on an x86 host?
That is the plan, i.e., supporting the analysis of ARM dumpfiles
on both x86 and ARM hosts (and by extension on x86_64 hosts using
the x86 binary), and presumably "live" on ARM hosts.
> This is a very useful feature, since the ARM kernel already supports
> core dumps with kdump enabled.
I'm waiting for the results of the Nokia/Sony-Ericsson collaboration
efforts.  I've added Jan and Thomas's names to the cc: list.
Dave

15 years, 2 months

Re: [Crash-utility] mount cmd crashes crash
by Dave Anderson

----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> Sorry, forgot to reply all:
> ---------------------------
> 
> On Wed, 2010-08-18 at 20:57 +0000, Dave Anderson wrote:
> > ----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> > 
> > > I'm working on a dump of a system that did not have a PID 1.  I don't
> > > think it's relevant to the crash itself, but it does cause crash to get
> > > a seg fault.
> 
> > > 
> > > I don't know if it was important to have the context of pid 1 for
> > > reporting mounts, or just any context, but this hack makes the problem
> > > go away, although not a very efficient way to find the lowest existing
> > > PID above 0.  
> > 
> > Yeah, it's not important to use the context of pid 1, but it just needs
> > some context, and I had presumed that init would always exist.  I thought
> > that the panic("Attempted to kill the idle task!") in do_exit() would
> > prevent pid 1 from ever going away -- but apparently your kernel figured
> > out how to do it elsewhere...  ;-)
> 
> That test is for PID 0, not PID 1 (at least on the kernel I'm
> debugging.)  However, there is this also:
> 
>         if (unlikely(tsk == child_reaper))
>                 panic("Attempted to kill init!");
That's the one I *meant*...   ;-)
> 
> And child_reaper in the dump points to a task struct for init that isn't
> in the ps listing.  Hmmm.  Maybe that part *is* interesting in this dump...
> 
> > 
> > Your patch would pick a kernel thread pid, and apparently everything still
> > works OK?  That being the case, it's fine with me.
> 
> With the patch, these commands all produce the same output:
> crash-5.0.6-fix> mount >mount.out
> crash-5.0.6-fix> mount -n 2 >mount2.out
> crash-5.0.6-fix> mount -n 1459 >mount1459.out
> 
> I discovered the -n option as my first workaround.
Actually, it looks like pid 0 could be used as well.  
Anyway, queued for the next release.
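
The fallback discussed above can be sketched roughly as follows. This is not the patch from the thread (the patch itself is not shown here); `pick_context_pid` and the plain list of PIDs are hypothetical stand-ins for whatever crash uses internally. The only point is that the command needs some existing task as context, so PID 1 is preferred but not required.

```python
from typing import Iterable, Optional


def pick_context_pid(pids: Iterable[int]) -> Optional[int]:
    """Pick a PID to serve as context for a command such as "mount".

    Hypothetical helper: prefer PID 1 when it exists, otherwise fall back
    to the lowest existing PID above 0 (typically a kernel thread).
    Returns None when no PID above 0 exists.
    """
    existing = set(pids)
    if 1 in existing:
        return 1
    candidates = [pid for pid in existing if pid > 0]
    return min(candidates) if candidates else None


# A task list resembling the dump described above: no PID 1 present.
print(pick_context_pid([0, 2, 3, 1459]))
```

A single pass with `min()` avoids probing PIDs one by one, which is the inefficiency mentioned in the quoted message.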
Thanks,
  Dave

15 years, 2 months

Re: [Crash-utility] Crash issue when loading vmcore
by Dave Anderson

----- "Paul-Kenji Cahier Furuya" <pkc(a)f1-photo.com> wrote:
> On 08/23/2010 22:51, Dave Anderson wrote:
> > So that doesn't make any sense unless the vmlinux file and the vmlinux that
> > was running on the crashed kernel are not the same kernels.  Are you using
> > a different kernel as the secondary kdump kernel?
> 
> Just checked kdump's config and it says:
> #     If these are not set, kdump-config will try to use the current
> #     and initrd if it is relocatable.
> And I did not set those variables.
> 
> However I checked and found out the vmlinuz(bzImage, 7.7MB extracted)
> being run seems to be stripped, while the vmlinux from the kernel 
> directory(124MB) is not.
> 
> Could this affect the result? Is there any way to deal properly with 
> that situation?(I am using my own kernel builds, so I do not have any
> "debug kernel" packages)
It appears that the kdump configuration should be using the
same kernel as the crashed kernel, but would relocate it
when it gets run as the kdump kernel.  But that does not
explain the discrepancy between the symbol values listed
by the "VMCOREINFO" data and that of the vmlinux file that
you are using.
The vmlinuz file (with a "z" at the end) is useless for crash.
Crash needs the debuginfo-full vmlinux file that was created by
compiling the kernel with -g, and which is located at the topmost
directory in the kernel source build tree.
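
As a rough way to tell the two kinds of file apart, the sketch below classifies a kernel image from its raw bytes. It is a crude heuristic and an assumption of this note, not something crash does: it checks for the ELF magic, looks for the `.debug_info` section name anywhere in the file instead of parsing section headers, and recognizes an x86 bzImage by the boot-protocol magic `HdrS` at offset 0x202. The function name is hypothetical.

```python
ELF_MAGIC = b"\x7fELF"


def classify_kernel_image(data: bytes) -> str:
    """Crudely classify a kernel file from its contents.

    Heuristic only: a real check would parse the ELF section headers
    instead of searching for the section name in the raw bytes.
    """
    if data[:4] == ELF_MAGIC:
        if b".debug_info" in data:
            return "ELF with debuginfo (usable vmlinux)"
        return "ELF without debuginfo (stripped vmlinux)"
    # The x86 boot protocol places the magic "HdrS" at offset 0x202.
    if data[0x202:0x206] == b"HdrS":
        return "x86 bzImage (vmlinuz)"
    return "unknown"


# Synthetic input only; on a real system, pass the bytes read from the file.
fake_bzimage = bytes(0x202) + b"HdrS" + bytes(16)
print(classify_kernel_image(fake_bzimage))
```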
In any case, it would be trivial to figure this out if you could
log into the live system and try to run crash there -- or even
simpler -- run "cat /proc/kallsyms" on that live system.  Other than
that, I don't know what else to suggest at this point.
Dave

15 years, 2 months

Re: [Crash-utility] [PATCH 0/4] crash utility: add ARM crashdump support
by Dave Anderson

----- "Mika Westerberg" <ext-mika.1.westerberg(a)nokia.com> wrote:
> 
> On Wed, Jun 30, 2010 at 03:10:58PM +0200, ext Dave Anderson wrote:
> 
> > In any case, I'm more than happy to fold in ARM support, but I don't know what
> > to do in this case.
> > 
> > I wonder if it would it be possible for you, Jan and Thomas to somehow collaborate
> > on this effort?  It seems that both sides would benefit from the work of the other
> > side.  I've added them to the cc list.
> 
> Sure. Can those patches be found in some public ML? I quickly searched but
> couldn't find anything.
> 
> Regards,
> MW
Jan is contacting you off-list.
Thanks,
  Dave

15 years, 2 months

Re: [Crash-utility] Crash issue when loading vmcore
by Dave Anderson

----- "Paul-Kenji Cahier Furuya" <pkc(a)f1-photo.com> wrote:
> On 08/23/2010 22:20, Dave Anderson wrote:
> > Well, yes, you'd have to be able to log into the machine, and
> > then just run:
> >
> >    # crash
> >
> > or if the /vmlinux file is not in a common location, do this:
> >
> >    # crash /path/to/vmlinux
> >
> > And that presumes you've got crash installed on the system as well.
> >
> I might be able to get a physical access at some point, but right now I 
> have none.
> 
> Anything that helps from the logs?
Well, this part is still unexplainable -- in the crashd8.txt, the symbol
addresses that were seen by the crashed kernel are as shown:
 
# grep SYMBOL crashd8.txt
                         SYMBOL(init_uts_ns)=c06f9120
                         SYMBOL(node_online_map)=c0730644
                         SYMBOL(swapper_pg_dir)=c06e4000
                         SYMBOL(_stext)=c0101000
                         SYMBOL(vmlist)=c07d3540
                         SYMBOL(mem_map)=c07d3500
                         SYMBOL(contig_page_data)=c072ce80
                         SYMBOL(log_buf)=c06fc83c
                         SYMBOL(log_end)=c07bb7ec
                         SYMBOL(log_buf_len)=c06fc838
                         SYMBOL(logged_chars)=c07c38a0
#
But the "sym.l" list starts with a unity-mapped PAGE_OFFSET
value of c1000000 (instead of the more common c0000000) 
  c1000000 (T) _text
  c1000000 (T) startup_32
  c1000054 (t) default_entry
  c1001000 (T) _stext
  c1001010 (T) do_one_initcall
  c1001180 (t) init_post
  c10012c0 (T) name_to_dev_t
  c1001500 (T) thread_saved_pc
  c1001510 (T) prepare_to_copy
  c1001590 (T) get_wchan
  c1001640 (T) __switch_to
  ...
So that being the case, the symbol values for "init_uts_ns", "node_online_map",
and so on, don't even match those of the crashed kernel:
  c1662120 (D) init_uts_ns
  c1716000 (B) swapper_pg_dir
  c1001000 (T) _stext
  c173ed3c (B) vmlist
  c173ed00 (B) mem_map
  c1696180 (D) contig_page_data
  c17248a0 (b) __log_buf
  c17247ec (b) log_end
  c16656b8 (d) log_buf_len
  c172c8a0 (b) logged_chars
So that doesn't make any sense unless the vmlinux file and the vmlinux that
was running on the crashed kernel are not the same kernels.  Are you using
a different kernel as the secondary kdump kernel?
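
The comparison above can be reproduced mechanically. The sketch below is an illustration only, not part of crash: the helper names are hypothetical, and the input values are copied from the VMCOREINFO and sym.l excerpts quoted in this message. It computes the difference between each vmlinux symbol address and the corresponding VMCOREINFO value. A kernel that was merely relocated would be expected to show one common offset for every symbol; differing offsets point toward two different builds.

```python
import re

# Values copied from the VMCOREINFO note and the sym.l excerpt quoted above.
VMCOREINFO = """\
SYMBOL(init_uts_ns)=c06f9120
SYMBOL(node_online_map)=c0730644
SYMBOL(swapper_pg_dir)=c06e4000
SYMBOL(_stext)=c0101000
SYMBOL(vmlist)=c07d3540
SYMBOL(mem_map)=c07d3500
SYMBOL(contig_page_data)=c072ce80
"""

SYM_LIST = """\
c1662120 (D) init_uts_ns
c1716000 (B) swapper_pg_dir
c1001000 (T) _stext
c173ed3c (B) vmlist
c173ed00 (B) mem_map
c1696180 (D) contig_page_data
"""


def parse_vmcoreinfo(text):
    """Return {symbol: address} from SYMBOL(name)=hex lines."""
    return {m.group(1): int(m.group(2), 16)
            for m in re.finditer(r"SYMBOL\((\w+)\)=([0-9a-f]+)", text)}


def parse_sym_list(text):
    """Return {symbol: address} from 'hex (T) name' lines."""
    pattern = r"^[ \t]*([0-9a-f]+) \(\w\) (\S+)$"
    return {m.group(2): int(m.group(1), 16)
            for m in re.finditer(pattern, text, re.M)}


def symbol_deltas(vmcoreinfo, symbols):
    """Map each shared symbol to (vmlinux address - VMCOREINFO address)."""
    return {name: symbols[name] - addr
            for name, addr in vmcoreinfo.items() if name in symbols}


dump = parse_vmcoreinfo(VMCOREINFO)
vmlinux = parse_sym_list(SYM_LIST)
deltas = symbol_deltas(dump, vmlinux)
missing = sorted(set(dump) - set(vmlinux))

for name, delta in deltas.items():
    print(f"{name:20s} {delta:#x}")
print("missing from sym.l:", missing)
print("single common offset:", len(set(deltas.values())) == 1)
```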
Dave
node_online_map doesn't even exist in your sym.l file.

15 years, 2 months

Re: [Crash-utility] Crash issue when loading vmcore
by Dave Anderson

----- "Paul Cahier" <pkc(a)f1-photo.com> wrote:
> Hello,
> 
> I have finished setting up kdump and kexec today, recompiling my kernel 
> to add everything needed in there.
> I have triggered a kernel panic by echo c>/proc/sysrq-trigger, and found 
> that the vmcore dump was indeed there after all was done.
> 
> However I can not get any traces out of that crash dump(short version,
> long version at the end of the email):
> 
> crash /usr/src/linux-2.6.35.3/vmlinux vmcore.201008231930
> [...]
> crash: read error: kernel virtual address: c148a9a0  type: "kernel_config_data"
> WARNING: cannot read kernel_config_data
> crash: read error: kernel virtual address: c1487e28  type: "cpu_possible_mask"
The virtual addresses for "kernel_config_data" and "cpu_possible_mask" are
strange (too high?) -- I'll continue the analysis at the end of your "d7" 
output below...
 
> If I try crash --minimal things do load but I'm stuck with the minimal
> error set that's not very helpful.
> All I'm looking at is getting a full trace of the kernel panic.
> 
> 
> - Paul-Kenji Cahier
> 
> 
> PS, the full version:
> crash -d7 /usr/src/linux-2.6.35.3/vmlinux vmcore.201008231930
> 
> crash 5.0.6
> Copyright (C) 2002-2010  Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006  IBM Corporation
> Copyright (C) 1999-2006  Hewlett-Packard Co
> Copyright (C) 2005, 2006  Fujitsu Limited
> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
> Copyright (C) 2005  NEC Corporation
> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public
> License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions.  Enter "help copying" to see the conditions.
> This program has absolutely no warranty.  Enter "help warranty" for
> details.
> 
> vmcore_data:
>                    flags: a0 (KDUMP_LOCAL|KDUMP_ELF32)
>                     ndfd: 3
>                      ofp: b77344c0
>              header_size: 1860
>     num_pt_load_segments: 9
>       pt_load_segment[0]:
>              file_offset: 744
>               phys_start: 0
>                 phys_end: a0000
>                zero_fill: 0
>       pt_load_segment[1]:
>              file_offset: a0744
>               phys_start: 100000
>                 phys_end: 1000000
>                zero_fill: 0
>       pt_load_segment[2]:
>              file_offset: fa0744
>               phys_start: 5000000
>                 phys_end: 38000000
>                zero_fill: 0
>       pt_load_segment[3]:
>              file_offset: 33fa0744
>               phys_start: 38000000
>                 phys_end: 3e5ff000
>                zero_fill: 0
>       pt_load_segment[4]:
>              file_offset: 3a59f744
>               phys_start: 3e6c6000
>                 phys_end: 3f594000
>                zero_fill: 0
>       pt_load_segment[5]:
>              file_offset: 3b46d744
>               phys_start: 3f59c000
>                 phys_end: 3f62a000
>                zero_fill: 0
>       pt_load_segment[6]:
>              file_offset: 3b4fb744
>               phys_start: 3f62e000
>                 phys_end: 3f6a9000
>                zero_fill: 0
>       pt_load_segment[7]:
>              file_offset: 3b576744
>               phys_start: 3f6e9000
>                 phys_end: 3f6ed000
>                zero_fill: 0
>       pt_load_segment[8]:
>              file_offset: 3b57a744
>               phys_start: 3f6ff000
>                 phys_end: 3f700000
>                zero_fill: 0
>               elf_header: 85368c0
>                    elf32: 85368c0
>                  notes32: 85368f4
>                   load32: 8536914
>                    elf64: 0
>                  notes64: 0
>                   load64: 0
>              nt_prstatus: 8536a34
>              nt_prpsinfo: 0
>            nt_taskstruct: 0
>              task_struct: 0
>                page_size: 0
>             switch_stack: 0
>           xen_kdump_data: (unused)
>         num_prstatus_notes: 2
>                 vmcoreinfo: 0
>            size_vmcoreinfo: 0
>         nt_prstatus_percpu:
>          08536a34 08536ad8
> 
> Elf32_Ehdr:
>                  e_ident: \177ELF
>        e_ident[EI_CLASS]: 1 (ELFCLASS32)
>         e_ident[EI_DATA]: 1 (ELFDATA2LSB)
>      e_ident[EI_VERSION]: 1 (EV_CURRENT)
>        e_ident[EI_OSABI]: 0 (ELFOSABI_SYSV)
>   e_ident[EI_ABIVERSION]: 0
>                   e_type: 4 (ET_CORE)
>                e_machine: 3 (EM_386)
>                e_version: 1 (EV_CURRENT)
>                  e_entry: 0
>                  e_phoff: 34
>                  e_shoff: 0
>                  e_flags: 0
>                 e_ehsize: 34
>              e_phentsize: 20
>                  e_phnum: a
>              e_shentsize: 0
>                  e_shnum: 0
>               e_shstrndx: 0
> Elf32_Phdr:
>                   p_type: 4 (PT_NOTE)
>                 p_offset: 372 (174)
>                  p_vaddr: 0
>                  p_paddr: 0
>                 p_filesz: 1488 (5d0)
>                  p_memsz: 1488 (5d0)
>                  p_flags: 0 ()
>                  p_align: 0
> Elf32_Phdr:
>                   p_type: 1 (PT_LOAD)
>                 p_offset: 1860 (744)
>                  p_vaddr: c0000000
>                  p_paddr: 0
>                 p_filesz: 655360 (a0000)
>                  p_memsz: 655360 (a0000)
>                  p_flags: 7 (PF_X|PF_W|PF_R)
>                  p_align: 0
> Elf32_Phdr:
>                   p_type: 1 (PT_LOAD)
>                 p_offset: 657220 (a0744)
>                  p_vaddr: c0100000
>                  p_paddr: 100000
>                 p_filesz: 15728640 (f00000)
>                  p_memsz: 15728640 (f00000)
>                  p_flags: 7 (PF_X|PF_W|PF_R)
>                  p_align: 0
> Elf32_Phdr:
>                   p_type: 1 (PT_LOAD)
>                 p_offset: 16385860 (fa0744)
>                  p_vaddr: c5000000
>                  p_paddr: 5000000
>                 p_filesz: 855638016 (33000000)
>                  p_memsz: 855638016 (33000000)
>                  p_flags: 7 (PF_X|PF_W|PF_R)
>                  p_align: 0
> Elf32_Phdr:
>                   p_type: 1 (PT_LOAD)
>                 p_offset: 872023876 (33fa0744)
>                  p_vaddr: ffffffff
>                  p_paddr: 38000000
>                 p_filesz: 106950656 (65ff000)
>                  p_memsz: 106950656 (65ff000)
>                  p_flags: 7 (PF_X|PF_W|PF_R)
>                  p_align: 0
> Elf32_Phdr:
>                   p_type: 1 (PT_LOAD)
>                 p_offset: 978974532 (3a59f744)
>                  p_vaddr: ffffffff
>                  p_paddr: 3e6c6000
>                 p_filesz: 15523840 (ece000)
>                  p_memsz: 15523840 (ece000)
>                  p_flags: 7 (PF_X|PF_W|PF_R)
>                  p_align: 0
> Elf32_Phdr:
>                   p_type: 1 (PT_LOAD)
>                 p_offset: 994498372 (3b46d744)
>                  p_vaddr: ffffffff
>                  p_paddr: 3f59c000
>                 p_filesz: 581632 (8e000)
>                  p_memsz: 581632 (8e000)
>                  p_flags: 7 (PF_X|PF_W|PF_R)
>                  p_align: 0
> Elf32_Phdr:
>                   p_type: 1 (PT_LOAD)
>                 p_offset: 995080004 (3b4fb744)
>                  p_vaddr: ffffffff
>                  p_paddr: 3f62e000
>                 p_filesz: 503808 (7b000)
>                  p_memsz: 503808 (7b000)
>                  p_flags: 7 (PF_X|PF_W|PF_R)
>                  p_align: 0
> Elf32_Phdr:
>                   p_type: 1 (PT_LOAD)
>                 p_offset: 995583812 (3b576744)
>                  p_vaddr: ffffffff
>                  p_paddr: 3f6e9000
>                 p_filesz: 16384 (4000)
>                  p_memsz: 16384 (4000)
>                  p_flags: 7 (PF_X|PF_W|PF_R)
>                  p_align: 0
> Elf32_Phdr:
>                   p_type: 1 (PT_LOAD)
>                 p_offset: 995600196 (3b57a744)
>                  p_vaddr: ffffffff
>                  p_paddr: 3f6ff000
>                 p_filesz: 4096 (1000)
>                  p_memsz: 4096 (1000)
>                  p_flags: 7 (PF_X|PF_W|PF_R)
>                  p_align: 0
> Elf32_Nhdr:
>                 n_namesz: 5 ("CORE")
>                 n_descsz: 144
>                   n_type: 1 (NT_PRSTATUS)
>                           00000000 00000000 00000000 00000000
>                           00000000 00000000 00000000 00000000
>                           00000000 00000000 00000000 00000000
>                           00000000 00000000 00000000 00000000
>                           00000000 00000000 00000000 00000000
>                           00000000 00000000 00000000 00000401
>                           c06fef00 00000001 0000f004 0000f008
>                           00000000 c06e3ecc 00000282 00000282
>                           00000024 c06e3fa4 00000068 00000000
> Elf32_Nhdr:
>                 n_namesz: 5 ("CORE")
>                 n_descsz: 144
>                   n_type: 1 (NT_PRSTATUS)
>                           00000000 00000000 00000000 00000000
>                           00000000 00000000 00000dbd 00000000
>                           00000000 00000000 00000000 00000000
>                           00000000 00000000 00000000 00000000
>                           00000000 00000000 c0712420 00007e7e
>                           00000000 00000063 00000000 f0639f0c
>                           00000063 0000007b 0000007b 000000d8
>                           00000033 ffffffff c03145b2 00000060
>                           00010086 f0639f0c 00000068 00000000
> Elf32_Nhdr:
>                 n_namesz: 11 ("VMCOREINFO")
>                 n_descsz: 1134
>                   n_type: 0 (unused)
>                           OSRELEASE=2.6.35.3-saber
>                           PAGESIZE=4096
>                           SYMBOL(init_uts_ns)=c06f9120
>                           SYMBOL(node_online_map)=c0730644
>                           SYMBOL(swapper_pg_dir)=c06e4000
>                           SYMBOL(_stext)=c0101000
>                           SYMBOL(vmlist)=c07d3540
>                           SYMBOL(mem_map)=c07d3500
>                           SYMBOL(contig_page_data)=c072ce80
>                           SIZE(page)=32
>                           SIZE(pglist_data)=4224
>                           SIZE(zone)=1024
>                           SIZE(free_area)=44
>                           SIZE(list_head)=8
>                           SIZE(nodemask_t)=4
>                           OFFSET(page.flags)=0
>                           OFFSET(page._count)=4
>                           OFFSET(page.mapping)=16
>                           OFFSET(page.lru)=24
>                           OFFSET(pglist_data.node_zones)=0
>                           OFFSET(pglist_data.nr_zones)=4140
>                           OFFSET(pglist_data.node_mem_map)=4144
>                           OFFSET(pglist_data.node_start_pfn)=4148
>                           OFFSET(pglist_data.node_spanned_pages)=4156
>                           OFFSET(pglist_data.node_id)=4160
>                           OFFSET(zone.free_area)=40
>                           OFFSET(zone.vm_stat)=728
>                           OFFSET(zone.spanned_pages)=916
>                           OFFSET(free_area.free_list)=0
>                           OFFSET(list_head.next)=0
>                           OFFSET(list_head.prev)=4
>                           OFFSET(vm_struct.addr)=4
>                           LENGTH(zone.free_area)=11
>                           SYMBOL(log_buf)=c06fc83c
>                           SYMBOL(log_end)=c07bb7ec
>                           SYMBOL(log_buf_len)=c06fc838
>                           SYMBOL(logged_chars)=c07c38a0
>                           LENGTH(free_area.free_list)=5
>                           NUMBER(NR_FREE_PAGES)=0
>                           NUMBER(PG_lru)=5
>                           NUMBER(PG_private)=11
>                           NUMBER(PG_swapcache)=16
>                           CONFIG_X86_PAE=y
>                           CRASHTIME=1282584565
> cannot determine relocation value: not a live system
> gdb /usr/src/linux-2.6.35.3/vmlinux
> GNU gdb (GDB) 7.0
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...
> 
> <readmem: c148a9a0, KVADDR, "kernel_config_data", 32768, (ROE), 8bad3d8>
> crash: read error: kernel virtual address: c148a9a0  type: "kernel_config_data"
> WARNING: cannot read kernel_config_data
> <readmem: c1487e28, KVADDR, "cpu_possible_mask", 4, (FOE), bfed4bbc>
> crash: read error: kernel virtual address: c1487e28  type: "cpu_possible_mask"
The read errors for the "kernel_config_data" symbol at c148a9a0 (which returns on
error -- that's what ROE means), and then for the "cpu_possible_mask" symbol at c1487e28
(which causes the session to fault or bail out -- FOE), mean that -- after translating
those virtual addresses to physical addresses by stripping off the c0000000 unity-map
identifier -- those physical addresses (at 148a9a0 and 1487e28 respectively) were not
found in the dumpfile.
And that's because the ELF header of the vmcore does not show a PT_LOAD segment
that contains those physical addresses.
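
The lookup described above can be checked by hand with a small sketch. It is an illustration, not crash's own code: the function names are hypothetical, the segment bounds are copied from the pt_load_segment list in the -d7 output, and the translation assumes the simple c0000000 unity mapping mentioned above.

```python
PAGE_OFFSET = 0xC0000000  # unity-map base assumed in the explanation above

# (phys_start, phys_end) pairs copied from the pt_load_segment list above.
PT_LOAD_SEGMENTS = [
    (0x0, 0xA0000),
    (0x100000, 0x1000000),
    (0x5000000, 0x38000000),
    (0x38000000, 0x3E5FF000),
    (0x3E6C6000, 0x3F594000),
    (0x3F59C000, 0x3F62A000),
    (0x3F62E000, 0x3F6A9000),
    (0x3F6E9000, 0x3F6ED000),
    (0x3F6FF000, 0x3F700000),
]


def kvaddr_to_paddr(kvaddr):
    """Translate a unity-mapped kernel virtual address to a physical one."""
    return kvaddr - PAGE_OFFSET


def find_segment(paddr, segments):
    """Return the index of the segment containing paddr, or None."""
    for index, (start, end) in enumerate(segments):
        if start <= paddr < end:
            return index
    return None


for label, kvaddr in [("kernel_config_data", 0xC148A9A0),
                      ("cpu_possible_mask", 0xC1487E28),
                      ("_stext (VMCOREINFO)", 0xC0101000)]:
    paddr = kvaddr_to_paddr(kvaddr)
    print(f"{label:22s} paddr {paddr:#x} segment {find_segment(paddr, PT_LOAD_SEGMENTS)}")
```

Both failing addresses fall in the gap between the segment ending at 1000000 and the one starting at 5000000, while the `_stext` value from VMCOREINFO lands inside a dumped segment.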
But as I mentioned before, the virtual addresses seem to be too high for
static kernel data symbols.  If you run --minimal, does the "sym" command
show "cpu_possible_mask" at that address?  I don't have anything later than
a 2.6.34 x86 dumpfile to use as a reference, but the symbol is much lower
in value in that kernel:
  crash> sym cpu_possible_mask
  c07ffa28 (R) cpu_possible_mask  
  crash>
And if I dump all of the symbols from within a --minimal session with that
dumpfile, I see this, where the "_end" of the static kernel virtual memory
is at c0c77000:
  crash> sym -l
  ... [ cut ] ...
  c0b50ffc (b) netlbl_unlhsh_lock
  c0b51000 (b) klist_remove_lock
  c0b51004 (B) __bss_stop
  c0b52000 (b) .brk
  c0b52000 (B) __brk_base
  c0b62000 (b) .brk.pagetables
  c0c67000 (b) .brk.dmi_alloc
  c0c77000 (B) __brk_limit
  c0c77000 (A) _end
  crash>
And if you look at the "VMCOREINFO" data above in your dump for items that are
kernel symbol values, they make sense, i.e., 
>                           SYMBOL(node_online_map)=c0730644
>                           SYMBOL(swapper_pg_dir)=c06e4000
>                           SYMBOL(_stext)=c0101000
>                           SYMBOL(vmlist)=c07d3540
>                           SYMBOL(mem_map)=c07d3500
>                           SYMBOL(contig_page_data)=c072ce80
If you run a --minimal session, what do you see when you run the
two commands that I show above?  (i.e., "sym cpu_possible_mask" & the output
of the tail end of "sym -l")
Dave
But for starters, if you run the --minimal session and then execute the

15 years, 2 months

Re: [Crash-utility] Crash issue when loading vmcore
by Dave Anderson

----- "Paul-Kenji Cahier Furuya" <pkc(a)f1-photo.com> wrote:
> Here's for sym cpu_possible_mask in minimal mode:
> crash> sym cpu_possible_mask
> c14e7d88 (R) cpu_possible_mask
> 
> And here's the tail of sym -l:
> c175d4e0 (b) sunrpc_table_header
> c175d4e4 (B) sctp_assocs_id_lock
> c175d4e8 (B) proc_net_sctp
> c175d4ec (B) sctp_assocs_id
> c175d500 (B) sysctl_sctp_mem
> c175d50c (B) sysctl_sctp_rmem
> c175d518 (B) sysctl_sctp_wmem
> c175d524 (b) __key.46606
> c175d524 (b) sctp_ctl_sock
> c175d528 (b) sctp_pf_inet6_specific
> c175d52c (b) sctp_pf_inet_specific
> c175d530 (b) sctp_af_v4_specific
> c175d534 (b) sctp_af_v6_specific
> c175d538 (b) __key.44408
> c175d538 (b) sctp_rand.42824
> c175d53c (B) sctp_sockets_allocated
> c175d54c (b) sctp_memory_pressure
> c175d550 (b) sctp_memory_allocated
> c175d554 (b) sctp_sysctl_header
> c175d558 (b) zero
> c175d55c (b) klist_remove_lock
> c175d560 (B) __bss_stop
> c175e000 (b) .brk
> c175e000 (B) __brk_base
> c176e000 (b) .brk.pagetables
> c17ee000 (b) .brk.dmi_alloc
> c17fe000 (B) __brk_limit
> c17fe000 (A) _end
That's interesting -- did you add some huge data structure or something
to the kernel?
OK -- three more requests -- can you bring up the --minimal session, and
then do this:
  crash> sym -l > sym.l
and send the "sym.l" file?  (It's long, so send it as an attachment)
Secondly, send the output of:
  # crash -d8 /usr/src/linux-2.6.35.3/vmlinux vmcore.201008231930
The -d8 output will also show the physical address translation,
like this:
  <readmem: c07ffa28, KVADDR, "cpu_possible_mask", 4, (FOE), bff3d43c>
      addr: c07ffa28  paddr: 7ffa28  cnt: 4
And third, send the output of:
  # readelf -a vmcore.201008231930
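
When the -d8 output is long, the readmem lines and their translations can be pulled out with a short script. The sketch below is an assumption-laden illustration: the regular expressions are inferred only from the sample lines quoted in this thread, the size and cnt fields are treated as decimal, and the ROE/FOE meanings follow the explanation given earlier in the thread (return on error, fault on error). The function names are hypothetical.

```python
import re

READMEM_RE = re.compile(
    r'<readmem: (?P<addr>[0-9a-f]+), (?P<memtype>\w+), "(?P<what>[^"]*)", '
    r'(?P<size>\d+), \((?P<on_error>\w+)\), (?P<buffer>[0-9a-f]+)>'
)
TRANSLATION_RE = re.compile(
    r'addr: (?P<addr>[0-9a-f]+)\s+paddr: (?P<paddr>[0-9a-f]+)\s+cnt: (?P<cnt>\d+)'
)

ON_ERROR = {"ROE": "return on error", "FOE": "fault on error"}


def parse_readmem(line):
    """Parse a '<readmem: ...>' debug line into a dict, or return None."""
    m = READMEM_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "memtype": m.group("memtype"),
        "what": m.group("what"),
        "size": int(m.group("size")),
        "on_error": ON_ERROR.get(m.group("on_error"), m.group("on_error")),
    }


def parse_translation(line):
    """Parse an 'addr: ... paddr: ... cnt: ...' line into (addr, paddr)."""
    m = TRANSLATION_RE.search(line)
    if m is None:
        return None
    return int(m.group("addr"), 16), int(m.group("paddr"), 16)


# Sample lines copied from the example above.
request = parse_readmem(
    '<readmem: c07ffa28, KVADDR, "cpu_possible_mask", 4, (FOE), bff3d43c>')
addr, paddr = parse_translation("addr: c07ffa28  paddr: 7ffa28  cnt: 4")
print(request["what"], request["on_error"], hex(addr - paddr))
```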
Dave
                                
                         
                        
                                
Crash issue when loading vmcore
by Paul Cahier

Hello,
I finished setting up kdump and kexec today, recompiling my kernel
to add everything needed.
I triggered a kernel panic with echo c > /proc/sysrq-trigger, and found
that the vmcore dump was indeed there afterwards.
However, I cannot get any traces out of that crash dump (short version
here, long version at the end of the email):
crash /usr/src/linux-2.6.35.3/vmlinux vmcore.201008231930
[...]
crash: read error: kernel virtual address: c148a9a0  type: "kernel_config_data"
WARNING: cannot read kernel_config_data
crash: read error: kernel virtual address: c1487e28  type: "cpu_possible_mask"
If I try crash --minimal, things do load, but I'm stuck with the minimal
command set, which is not very helpful.
All I want is a full trace of the kernel panic.
- Paul-Kenji Cahier
PS, the full version:
crash -d7 /usr/src/linux-2.6.35.3/vmlinux vmcore.201008231930
crash 5.0.6
Copyright (C) 2002-2010  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
vmcore_data:
                   flags: a0 (KDUMP_LOCAL|KDUMP_ELF32)
                    ndfd: 3
                     ofp: b77344c0
             header_size: 1860
    num_pt_load_segments: 9
      pt_load_segment[0]:
             file_offset: 744
              phys_start: 0
                phys_end: a0000
               zero_fill: 0
      pt_load_segment[1]:
             file_offset: a0744
              phys_start: 100000
                phys_end: 1000000
               zero_fill: 0
      pt_load_segment[2]:
             file_offset: fa0744
              phys_start: 5000000
                phys_end: 38000000
               zero_fill: 0
      pt_load_segment[3]:
             file_offset: 33fa0744
              phys_start: 38000000
                phys_end: 3e5ff000
               zero_fill: 0
      pt_load_segment[4]:
             file_offset: 3a59f744
              phys_start: 3e6c6000
                phys_end: 3f594000
               zero_fill: 0
      pt_load_segment[5]:
             file_offset: 3b46d744
              phys_start: 3f59c000
                phys_end: 3f62a000
               zero_fill: 0
      pt_load_segment[6]:
             file_offset: 3b4fb744
              phys_start: 3f62e000
                phys_end: 3f6a9000
               zero_fill: 0
      pt_load_segment[7]:
             file_offset: 3b576744
              phys_start: 3f6e9000
                phys_end: 3f6ed000
               zero_fill: 0
      pt_load_segment[8]:
             file_offset: 3b57a744
              phys_start: 3f6ff000
                phys_end: 3f700000
               zero_fill: 0
              elf_header: 85368c0
                   elf32: 85368c0
                 notes32: 85368f4
                  load32: 8536914
                   elf64: 0
                 notes64: 0
                  load64: 0
             nt_prstatus: 8536a34
             nt_prpsinfo: 0
           nt_taskstruct: 0
             task_struct: 0
               page_size: 0
            switch_stack: 0
          xen_kdump_data: (unused)
        num_prstatus_notes: 2
                vmcoreinfo: 0
           size_vmcoreinfo: 0
        nt_prstatus_percpu:
         08536a34 08536ad8
Elf32_Ehdr:
                 e_ident: \177ELF
       e_ident[EI_CLASS]: 1 (ELFCLASS32)
        e_ident[EI_DATA]: 1 (ELFDATA2LSB)
     e_ident[EI_VERSION]: 1 (EV_CURRENT)
       e_ident[EI_OSABI]: 0 (ELFOSABI_SYSV)
  e_ident[EI_ABIVERSION]: 0
                  e_type: 4 (ET_CORE)
               e_machine: 3 (EM_386)
               e_version: 1 (EV_CURRENT)
                 e_entry: 0
                 e_phoff: 34
                 e_shoff: 0
                 e_flags: 0
                e_ehsize: 34
             e_phentsize: 20
                 e_phnum: a
             e_shentsize: 0
                 e_shnum: 0
              e_shstrndx: 0
Elf32_Phdr:
                  p_type: 4 (PT_NOTE)
                p_offset: 372 (174)
                 p_vaddr: 0
                 p_paddr: 0
                p_filesz: 1488 (5d0)
                 p_memsz: 1488 (5d0)
                 p_flags: 0 ()
                 p_align: 0
Elf32_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 1860 (744)
                 p_vaddr: c0000000
                 p_paddr: 0
                p_filesz: 655360 (a0000)
                 p_memsz: 655360 (a0000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf32_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 657220 (a0744)
                 p_vaddr: c0100000
                 p_paddr: 100000
                p_filesz: 15728640 (f00000)
                 p_memsz: 15728640 (f00000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf32_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 16385860 (fa0744)
                 p_vaddr: c5000000
                 p_paddr: 5000000
                p_filesz: 855638016 (33000000)
                 p_memsz: 855638016 (33000000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf32_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 872023876 (33fa0744)
                 p_vaddr: ffffffff
                 p_paddr: 38000000
                p_filesz: 106950656 (65ff000)
                 p_memsz: 106950656 (65ff000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf32_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 978974532 (3a59f744)
                 p_vaddr: ffffffff
                 p_paddr: 3e6c6000
                p_filesz: 15523840 (ece000)
                 p_memsz: 15523840 (ece000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf32_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 994498372 (3b46d744)
                 p_vaddr: ffffffff
                 p_paddr: 3f59c000
                p_filesz: 581632 (8e000)
                 p_memsz: 581632 (8e000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf32_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 995080004 (3b4fb744)
                 p_vaddr: ffffffff
                 p_paddr: 3f62e000
                p_filesz: 503808 (7b000)
                 p_memsz: 503808 (7b000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf32_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 995583812 (3b576744)
                 p_vaddr: ffffffff
                 p_paddr: 3f6e9000
                p_filesz: 16384 (4000)
                 p_memsz: 16384 (4000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf32_Phdr:
                  p_type: 1 (PT_LOAD)
                p_offset: 995600196 (3b57a744)
                 p_vaddr: ffffffff
                 p_paddr: 3f6ff000
                p_filesz: 4096 (1000)
                 p_memsz: 4096 (1000)
                 p_flags: 7 (PF_X|PF_W|PF_R)
                 p_align: 0
Elf32_Nhdr:
                n_namesz: 5 ("CORE")
                n_descsz: 144
                  n_type: 1 (NT_PRSTATUS)
                          00000000 00000000 00000000 00000000
                          00000000 00000000 00000000 00000000
                          00000000 00000000 00000000 00000000
                          00000000 00000000 00000000 00000000
                          00000000 00000000 00000000 00000000
                          00000000 00000000 00000000 00000401
                          c06fef00 00000001 0000f004 0000f008
                          00000000 c06e3ecc 00000282 00000282
                          00000024 c06e3fa4 00000068 00000000
Elf32_Nhdr:
                n_namesz: 5 ("CORE")
                n_descsz: 144
                  n_type: 1 (NT_PRSTATUS)
                          00000000 00000000 00000000 00000000
                          00000000 00000000 00000dbd 00000000
                          00000000 00000000 00000000 00000000
                          00000000 00000000 00000000 00000000
                          00000000 00000000 c0712420 00007e7e
                          00000000 00000063 00000000 f0639f0c
                          00000063 0000007b 0000007b 000000d8
                          00000033 ffffffff c03145b2 00000060
                          00010086 f0639f0c 00000068 00000000
Elf32_Nhdr:
                n_namesz: 11 ("VMCOREINFO")
                n_descsz: 1134
                  n_type: 0 (unused)
                          OSRELEASE=2.6.35.3-saber
                          PAGESIZE=4096
                          SYMBOL(init_uts_ns)=c06f9120
                          SYMBOL(node_online_map)=c0730644
                          SYMBOL(swapper_pg_dir)=c06e4000
                          SYMBOL(_stext)=c0101000
                          SYMBOL(vmlist)=c07d3540
                          SYMBOL(mem_map)=c07d3500
                          SYMBOL(contig_page_data)=c072ce80
                          SIZE(page)=32
                          SIZE(pglist_data)=4224
                          SIZE(zone)=1024
                          SIZE(free_area)=44
                          SIZE(list_head)=8
                          SIZE(nodemask_t)=4
                          OFFSET(page.flags)=0
                          OFFSET(page._count)=4
                          OFFSET(page.mapping)=16
                          OFFSET(page.lru)=24
                          OFFSET(pglist_data.node_zones)=0
                          OFFSET(pglist_data.nr_zones)=4140
                          OFFSET(pglist_data.node_mem_map)=4144
                          OFFSET(pglist_data.node_start_pfn)=4148
                          OFFSET(pglist_data.node_spanned_pages)=4156
                          OFFSET(pglist_data.node_id)=4160
                          OFFSET(zone.free_area)=40
                          OFFSET(zone.vm_stat)=728
                          OFFSET(zone.spanned_pages)=916
                          OFFSET(free_area.free_list)=0
                          OFFSET(list_head.next)=0
                          OFFSET(list_head.prev)=4
                          OFFSET(vm_struct.addr)=4
                          LENGTH(zone.free_area)=11
                          SYMBOL(log_buf)=c06fc83c
                          SYMBOL(log_end)=c07bb7ec
                          SYMBOL(log_buf_len)=c06fc838
                          SYMBOL(logged_chars)=c07c38a0
                          LENGTH(free_area.free_list)=5
                          NUMBER(NR_FREE_PAGES)=0
                          NUMBER(PG_lru)=5
                          NUMBER(PG_private)=11
                          NUMBER(PG_swapcache)=16
                          CONFIG_X86_PAE=y
                          CRASHTIME=1282584565
cannot determine relocation value: not a live system
gdb /usr/src/linux-2.6.35.3/vmlinux
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
<readmem: c148a9a0, KVADDR, "kernel_config_data", 32768, (ROE), 8bad3d8>
crash: read error: kernel virtual address: c148a9a0  type: "kernel_config_data"
WARNING: cannot read kernel_config_data
<readmem: c1487e28, KVADDR, "cpu_possible_mask", 4, (FOE), bfed4bbc>
crash: read error: kernel virtual address: c1487e28  type: "cpu_possible_mask"
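
The two failing reads can be checked by hand against the pt_load_segment list in the vmcore_data output above. The following is a minimal sketch, assuming PAGE_OFFSET is 0xc0000000 (suggested by the first PT_LOAD header, where p_vaddr c0000000 maps to p_paddr 0) and treating phys_end as exclusive. The function name kvaddr_to_file_offset is hypothetical; it only illustrates the lookup and is not how crash itself is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: direct-mapped kernel addresses start at 0xc0000000. */
#define PAGE_OFFSET 0xc0000000u

/* One PT_LOAD segment, using the field names from the -d7 output. */
struct pt_load_segment {
    uint32_t file_offset;
    uint32_t phys_start;
    uint32_t phys_end;      /* exclusive */
};

/* The nine segments exactly as listed under vmcore_data. */
static const struct pt_load_segment segments[] = {
    { 0x744,      0x0,        0xa0000    },
    { 0xa0744,    0x100000,   0x1000000  },
    { 0xfa0744,   0x5000000,  0x38000000 },
    { 0x33fa0744, 0x38000000, 0x3e5ff000 },
    { 0x3a59f744, 0x3e6c6000, 0x3f594000 },
    { 0x3b46d744, 0x3f59c000, 0x3f62a000 },
    { 0x3b4fb744, 0x3f62e000, 0x3f6a9000 },
    { 0x3b576744, 0x3f6e9000, 0x3f6ed000 },
    { 0x3b57a744, 0x3f6ff000, 0x3f700000 },
};

/*
 * Hypothetical helper: returns 1 and stores the dumpfile offset if the
 * kernel virtual address is backed by one of the segments, 0 otherwise.
 */
int kvaddr_to_file_offset(uint32_t kvaddr, uint32_t *offset)
{
    uint32_t paddr;
    size_t i;

    if (kvaddr < PAGE_OFFSET)
        return 0;
    paddr = kvaddr - PAGE_OFFSET;

    for (i = 0; i < sizeof(segments) / sizeof(segments[0]); i++) {
        const struct pt_load_segment *s = &segments[i];

        if (paddr >= s->phys_start && paddr < s->phys_end) {
            *offset = s->file_offset + (paddr - s->phys_start);
            return 1;
        }
    }
    return 0;
}
```

Under these assumptions, c148a9a0 and c1487e28 translate to physical addresses 148a9a0 and 1487e28, which fall in the gap between segment 1 (ending at 1000000) and segment 2 (starting at 5000000), so the lookup fails for both. By contrast, the swapper_pg_dir address from VMCOREINFO (c06e4000) resolves inside segment 1. This only restates the numbers in the output; it does not by itself say why the vmlinux symbols point into that gap.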
                                
                         
                        
                                
Re: [Crash-utility] mount cmd crashes crash
by Bob Montgomery

Sorry, forgot to reply all:
---------------------------
On Wed, 2010-08-18 at 20:57 +0000, Dave Anderson wrote:
> ----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> 
> > I'm working on a dump of a system that did not have a PID 1.  I don't
> > think it's relevant to the crash itself, but it does cause crash to get
> > a seg fault.
> > 
> > I don't know if it was important to have the context of pid 1 for
> > reporting mounts, or just any context, but this hack makes the problem
> > go away, although it's not a very efficient way to find the lowest
> > existing PID above 0.  
> 
> Yeah, it's not important to use the context of pid 1, but it just needs
> some context, and I had presumed that init would always exist.  I thought
> that the panic("Attempted to kill the idle task!") in do_exit() would
> prevent pid 1 from ever going away -- but apparently your kernel figured
> out how to do it elsewhere...  ;-)
That test is for PID 0, not PID 1 (at least on the kernel I'm
debugging.)  However, there is this also:
        if (unlikely(tsk == child_reaper))
                panic("Attempted to kill init!");
And child_reaper in the dump points to a task struct for init that isn't
in the ps listing.  Hmmm.  Maybe that part *is* interesting in this
dump...
> 
> Your patch would pick a kernel thread pid, and apparently everything still
> works OK?  That being the case, it's fine with me.
With the patch, these commands all produce the same output:
crash-5.0.6-fix> mount >mount.out
crash-5.0.6-fix> mount -n 2 >mount2.out
crash-5.0.6-fix> mount -n 1459 >mount1459.out
I discovered the -n option as my first workaround.
Bob M.
                                
                         
                        
                                
Re: [Crash-utility] mount cmd crashes crash
by Dave Anderson

----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> I'm working on a dump of a system that did not have a PID 1.  I don't
> think it's relevant to the crash itself, but it does cause crash to get
> a seg fault.
> 
> crash> ps | head
>    PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>       0      0   0  ffffffff805144c0  RU   0.0       0      0  [swapper]
>       0     -1   1  ffff81012bc0a100  RU   0.0       0      0  [swapper]
>       2     -1   0  ffff81012bd3c040  IN   0.0       0      0  [migration/0]
>       3     -1   0  ffff81012bd3e7c0  RU   0.0       0      0  [ksoftirqd/0]
>       4     -1   0  ffff81012bd3e080  IN   0.0       0      0  [watchdog/0]
>       5     -1   1  ffff81012bd3f800  IN   0.0       0      0  [migration/1]
>       6     -1   1  ffff81012bd3f0c0  RU   0.0       0      0  [ksoftirqd/1]
>       7     -1   1  ffff81012bc0a840  IN   0.0       0      0  [watchdog/1]
>       8     -1   0  ffff81012af02880  IN   0.0       0      0  [events/0]
> crash> mount
> Segmentation fault (core dumped)
> 
> In cmd_mount, this returns null and subsequent use causes the seg fault:
> 
> 1156 
> 1157         namespace_context = pid_to_context(1);
> 
> I don't know if it was important to have the context of pid 1 for
> reporting mounts, or just any context, but this hack makes the problem
> go away, although not a very efficient way to find the lowest existing
> PID above 0.  
Yeah, it's not important to use the context of pid 1, but it just needs
some context, and I had presumed that init would always exist.  I thought
that the panic("Attempted to kill the idle task!") in do_exit() would
prevent pid 1 from ever going away -- but apparently your kernel figured
out how to do it elsewhere...  ;-)
Your patch would pick a kernel thread pid, and apparently everything still
works OK?  That being the case, it's fine with me.
Thanks,
  Dave
  
 
> --- filesys.c.orig	2010-08-18 14:03:26.000000000 -0600
> +++ filesys.c	2010-08-18 14:10:02.000000000 -0600
> @@ -1153,8 +1153,12 @@ cmd_mount(void)
>  	ulong vfsmount = 0;
>  	int flags = 0;
>  	int save_next;
> +	ulong pid;
>  
> -	namespace_context = pid_to_context(1);
> +	/* find a context */
> +	pid = 1;
> +	while ((namespace_context = pid_to_context(pid)) == NULL)
> +		pid++;
>  
>          while ((c = getopt(argcnt, args, "ifn:")) != EOF) {
>                  switch(c)
> 
> Bob Montgomery
> At HP
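
The search loop in the quoted patch can be tried in isolation. The sketch below is an illustration only, not the code that went into crash: struct task_context is reduced to a single field, pid_to_context() is a stub over a toy table that mirrors the ps listing above (no PID 1), and the PID_SEARCH_LIMIT bound is an added assumption. The loop as posted has no upper bound, so it relies on some task with a PID of 1 or higher existing in the dump.

```c
#include <stddef.h>

/* Stand-in for crash's task context; the real structure has more fields. */
struct task_context {
    unsigned long pid;
};

/* Toy task table mirroring the ps output: PIDs 0 and 2..8, but no PID 1. */
static struct task_context tasks[] = {
    { 0 }, { 2 }, { 3 }, { 4 }, { 5 }, { 6 }, { 7 }, { 8 },
};

/* Stub with the same shape as the call in the patch: NULL if not found. */
struct task_context *pid_to_context(unsigned long pid)
{
    size_t i;

    for (i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++)
        if (tasks[i].pid == pid)
            return &tasks[i];
    return NULL;
}

/* Assumption: stop at the usual default pid_max instead of looping forever. */
#define PID_SEARCH_LIMIT 32768UL

/* Return the context of the lowest existing PID above 0, or NULL if none. */
struct task_context *find_namespace_context(void)
{
    unsigned long pid;
    struct task_context *tc;

    for (pid = 1; pid < PID_SEARCH_LIMIT; pid++)
        if ((tc = pid_to_context(pid)) != NULL)
            return tc;
    return NULL;
}
```

With the toy table, find_namespace_context() returns the entry for PID 2, which matches the observation that a kernel thread's context is picked when init is missing. A caller would still need to handle the NULL return.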
                                
                         
                        
                                