Re: [Crash-utility] help debug number of CPU detect failure

Thursday, 5 March 2020

...
 > I suspect that it's a problem with either the --kaslr offset and/or
 > the phys_base value that you have used.

 > Is there a method to know or print the KASLR offset and phys_base in a running Linux system?

They are normally passed in the VMCOREINFO data that is contained in an ELF PT_NOTE
in the dumpfile header.  For example, here's a dump of the normal VMCOREINFO data,
where the phys_base and KASLR offsets are down near the bottom:

                      OSRELEASE=4.18.0-185.el8.x86_64
                      PAGESIZE=4096
                      SYMBOL(init_uts_ns)=ffffffffbd812540
                      SYMBOL(node_online_map)=ffffffffbda0f520
                      SYMBOL(swapper_pg_dir)=ffffffffbd80a000
                      SYMBOL(_stext)=ffffffffbc600000
                      SYMBOL(vmap_area_list)=ffffffffbd8d78b0
                      SYMBOL(mem_section)=ffff956a3ffd2000
                      LENGTH(mem_section)=2048
                      SIZE(mem_section)=16
                      OFFSET(mem_section.section_mem_map)=0
                      SIZE(page)=64
                      SIZE(pglist_data)=171968
                      SIZE(zone)=1472
                      SIZE(free_area)=88
                      SIZE(list_head)=16
                      SIZE(nodemask_t)=128
                      OFFSET(page.flags)=0
                      OFFSET(page._refcount)=52
                      OFFSET(page.mapping)=24
                      OFFSET(page.lru)=8
                      OFFSET(page._mapcount)=48
                      OFFSET(page.private)=40
                      OFFSET(page.compound_dtor)=16
                      OFFSET(page.compound_order)=17
                      OFFSET(page.compound_head)=8
                      OFFSET(pglist_data.node_zones)=0
                      OFFSET(pglist_data.nr_zones)=171232
                      OFFSET(pglist_data.node_start_pfn)=171240
                      OFFSET(pglist_data.node_spanned_pages)=171256
                      OFFSET(pglist_data.node_id)=171264
                      OFFSET(zone.free_area)=192
                      OFFSET(zone.vm_stat)=1296
                      OFFSET(zone.spanned_pages)=112
                      OFFSET(free_area.free_list)=0
                      OFFSET(list_head.next)=0
                      OFFSET(list_head.prev)=8
                      OFFSET(vmap_area.va_start)=0
                      OFFSET(vmap_area.list)=48
                      LENGTH(zone.free_area)=11
                      SYMBOL(log_buf)=ffffffffbd85b140
                      SYMBOL(log_buf_len)=ffffffffbd85b13c
                      SYMBOL(log_first_idx)=ffffffffbe319778
                      SYMBOL(clear_idx)=ffffffffbe319744
                      SYMBOL(log_next_idx)=ffffffffbe319768
                      SIZE(printk_log)=16
                      OFFSET(printk_log.ts_nsec)=0
                      OFFSET(printk_log.len)=8
                      OFFSET(printk_log.text_len)=10
                      OFFSET(printk_log.dict_len)=12
                      LENGTH(free_area.free_list)=5
                      NUMBER(NR_FREE_PAGES)=0
                      NUMBER(PG_lru)=5
                      NUMBER(PG_private)=12
                      NUMBER(PG_swapcache)=9
                      NUMBER(PG_swapbacked)=18
                      NUMBER(PG_slab)=8
                      NUMBER(PG_hwpoison)=22
                      NUMBER(PG_head_mask)=32768
                      NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
                      NUMBER(HUGETLB_PAGE_DTOR)=2
                      NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
   ===============>   NUMBER(phys_base)=16437477376
                      SYMBOL(init_top_pgt)=ffffffffbd80a000
                      NUMBER(pgtable_l5_enabled)=0
                      SYMBOL(node_data)=ffffffffbda0ad20
                      LENGTH(node_data)=1024
   ===============>   KERNELOFFSET=3b600000
                      NUMBER(KERNEL_IMAGE_SIZE)=1073741824
                      NUMBER(sme_mask)=0
                      CRASHTIME=1583350919
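
As a rough illustration of how those two values could be pulled out of VMCOREINFO text like the above, here is a minimal Python sketch. The helper name parse_vmcoreinfo and the shortened sample are made up for the example, and the "===>" arrows are my annotations rather than part of the data, so they are left out of the sample. It assumes NUMBER(phys_base) is printed in decimal and KERNELOFFSET in hex without a 0x prefix, as in the dump shown.

```python
def parse_vmcoreinfo(text):
    """Split VMCOREINFO text into a {key: value} dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        # Split on the first '=' only; values are kept as raw strings.
        key, _, value = line.partition("=")
        info[key] = value
    return info


# Shortened copy of the dump above (hypothetical sample for the sketch).
sample = """\
OSRELEASE=4.18.0-185.el8.x86_64
SYMBOL(_stext)=ffffffffbc600000
NUMBER(phys_base)=16437477376
KERNELOFFSET=3b600000
"""

info = parse_vmcoreinfo(sample)
phys_base = int(info["NUMBER(phys_base)"])    # decimal in this dump
kaslr_offset = int(info["KERNELOFFSET"], 16)  # hex, no 0x prefix

print(hex(phys_base), hex(kaslr_offset))
```

With the values from the dump above, the decimal phys_base converts to 3d3c00000 hex, which is the same number in the form the crash utility displays.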

But in your Azure-generated dumpfile, I note that each cpu's NT_PRSTATUS note
contains junk data, and while it does have a VMCOREINFO note, it contains only this:

Elf64_Nhdr:
               n_namesz: 11 ("VMCOREINFO")
               n_descsz: 42
                 n_type: 0 (unused)
                         FAKE1=IGNORE1
                         FAKE2=IGNORE2
                         FAKE3=IGNORE3
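
To make the note layout concrete, here is a small Python sketch that builds and then parses an ELF note with the same header values as the one above. It relies on the standard Elf64_Nhdr layout (three 32-bit words, with name and descriptor each padded to 4 bytes) and assumes little-endian data and newline-terminated FAKE lines, which is consistent with n_descsz being 42. The function names are hypothetical, and this is not how the crash utility itself reads notes.

```python
import struct


def pad4(data):
    """Pad bytes with NULs up to the next 4-byte boundary."""
    return data + b"\0" * (-len(data) % 4)


def build_note(name, desc, n_type=0):
    """Build an ELF note: header, padded name, padded descriptor."""
    name_b = name.encode() + b"\0"  # n_namesz counts the trailing NUL
    header = struct.pack("<III", len(name_b), len(desc), n_type)
    return header + pad4(name_b) + pad4(desc)


def parse_note(buf):
    """Return (name, n_type, desc_bytes) for a single ELF note."""
    n_namesz, n_descsz, n_type = struct.unpack_from("<III", buf, 0)
    name_off = 12
    desc_off = name_off + ((n_namesz + 3) & ~3)
    name = buf[name_off:name_off + n_namesz].rstrip(b"\0").decode()
    return name, n_type, buf[desc_off:desc_off + n_descsz]


# Assumed content: three newline-terminated lines, 14 bytes each.
fake = b"FAKE1=IGNORE1\nFAKE2=IGNORE2\nFAKE3=IGNORE3\n"
note = build_note("VMCOREINFO", fake)
name, n_type, desc = parse_note(note)

keys = dict(line.split("=", 1) for line in desc.decode().splitlines())
usable = "KERNELOFFSET" in keys and "NUMBER(phys_base)" in keys
print(name, len(desc), usable)
```

Neither KERNELOFFSET nor NUMBER(phys_base) is present in a note like this, so there is nothing from which the two values could be recovered automatically.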

So that's why you need to pass in the two arguments.

Now, the crash utility should be able to be brought up successfully
on a live system without passing the arguments.  And once you've done
that, you could get the values like this:  

  crash> help -m | grep phys_base
                  phys_base: 3d3c00000
  crash> help -k | grep relocate
        relocate: ffffffffc4a00000  (KASLR offset: 3b600000 / 950MB)
  crash> 
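
If you wanted to capture those two lines from a script, a minimal Python sketch could look like the following. The regular expressions are only fitted to the two output lines shown above and may need adjusting for other crash versions. The check at the end reflects what the numbers in this output show: relocate and the KASLR offset sum to zero in 64-bit arithmetic.

```python
import re

# The two output lines from the session above, copied verbatim.
help_m = "                  phys_base: 3d3c00000"
help_k = "        relocate: ffffffffc4a00000  (KASLR offset: 3b600000 / 950MB)"

# Both commands print hex without a 0x prefix.
phys_base = int(re.search(r"phys_base:\s*([0-9a-f]+)", help_m).group(1), 16)

m = re.search(r"relocate:\s*([0-9a-f]+)\s+\(KASLR offset:\s*([0-9a-f]+)", help_k)
relocate = int(m.group(1), 16)
kaslr_offset = int(m.group(2), 16)

# In this output, relocate is the 64-bit negation of the KASLR offset.
assert (relocate + kaslr_offset) % 2**64 == 0

print(phys_base)             # decimal form, as VMCOREINFO would show it
print(kaslr_offset // 2**20) # offset in MB
```

The decimal phys_base printed here is 16437477376, the same number as NUMBER(phys_base) in the VMCOREINFO example, and the offset works out to 950MB.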

But since they change with each reboot, you would have to capture them
while running on the live system, and save them somewhere for a subsequent
crash.  So that goes back to my question -- how did you get the numbers
that you used?
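
For what it's worth, saving the values and turning them back into arguments later could be sketched in Python like this. The JSON file name and both function names are hypothetical, "vmlinux" and "vmcore" are placeholder paths, and the argument form (--kaslr followed by the offset, and -m phys_base=value) is my assumption of how the two arguments discussed in this thread would be passed, so check it against the crash man page for your version.

```python
import json
import os
import shlex
import tempfile


def save_values(path, phys_base, kaslr_offset):
    """Store the two per-boot values as hex strings in a JSON file."""
    with open(path, "w") as f:
        json.dump({"phys_base": hex(phys_base), "kaslr": hex(kaslr_offset)}, f)


def crash_argv(path, vmlinux, vmcore):
    """Build a crash argument list from previously saved values."""
    with open(path) as f:
        values = json.load(f)
    return [
        "crash",
        "--kaslr", values["kaslr"],           # assumed argument form
        "-m", "phys_base=" + values["phys_base"],
        vmlinux,
        vmcore,
    ]


# Hypothetical location; on a real system this would need to be somewhere
# that survives the crash and reboot.
path = os.path.join(tempfile.mkdtemp(), "crash-values.json")
save_values(path, 0x3d3c00000, 0x3b600000)

cmd = shlex.join(crash_argv(path, "vmlinux", "vmcore"))
print(cmd)
```

The values are only valid for the boot they were captured on, so the file would have to be rewritten after every reboot.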

Dave
