-----Original Message-----
From: HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
Sent: Wednesday, April 6, 2022 09:48
To: Agrain Patrick <patrick.agrain(a)al-enterprise.com>
Cc: Discussion list for crash utility usage, maintenance and development
<crash-utility(a)redhat.com>; kexec(a)lists.infradead.org
Subject: RE: EXT: RE: crash: read error on type: "memory section root table"
Hello,
The suggested trace gives the following information after running crash -d 8:
<...>
kernel NR_CPUS: 2
<readmem: ffffffffa4925820, KVADDR, "high_memory", 8, (FOE), 56017b542648>
<read_diskdump: addr: ffffffffa4925820 paddr: 12925820 cnt: 8>
read_diskdump: paddr/pfn: 12925820/12925 -> cache physical page: 12925000
GETBUF(328 -> 0)
FREEBUF(0)
GETBUF(328 -> 0)
FREEBUF(0)
PAGESIZE=4096
mem_section_size = 16384
NR_SECTION_ROOTS = 2048
NR_MEM_SECTIONS = 524288
SECTIONS_PER_ROOT = 256
SECTION_ROOT_MASK = 0xff
PAGES_PER_SECTION = 32768
<readmem: ffffffffa4926db0, KVADDR, "mem_section", 8, (FOE), 7ffd1b6bb000>
<read_diskdump: addr: ffffffffa4926db0 paddr: 12926db0 cnt: 8>
read_diskdump: paddr/pfn: 12926db0/12926 -> cache physical page: 12926000
<readmem: ffff904c7f7fc000, KVADDR, "memory section root table", 16384, (FOE), 56017da26fd0>
<read_diskdump: addr: ffff904c7f7fc000 paddr: 3f7fc000 cnt: 4096>
read_diskdump: paddr/pfn: 3f7fc000/3f7fc -> cache physical page: 3f7fc000
crash: PAG3 - errno=2 r=0 pd.size=49
read_diskdump: READ_ERROR: cannot cache page: 3f7fc000
crash: read error: kernel virtual address: ffff904c7f7fc000 type: "memory section root table"
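
As a side note on the trace above, the sparsemem values it prints are mutually consistent, which suggests the failure is in reading the page from the dumpfile rather than in the computed table size. The sketch below re-derives them in Python. It assumes a 64-bit kernel (8-byte pointers) and assumes that the root table is one pointer per section root; the function name is hypothetical and is not part of crash.

```python
# Values copied from the crash -d 8 trace.
PAGESIZE = 4096
NR_MEM_SECTIONS = 524288
SECTIONS_PER_ROOT = 256
POINTER_SIZE = 8  # assumption: 64-bit kernel


def section_root_layout(nr_mem_sections, sections_per_root,
                        pointer_size, page_size):
    """Derive the section root table layout (hypothetical helper).

    Assumes the root table holds one pointer per section root and
    starts on a page boundary, as the address in the trace does.
    """
    nr_roots = -(-nr_mem_sections // sections_per_root)  # ceiling division
    table_bytes = nr_roots * pointer_size
    pages = -(-table_bytes // page_size)
    return {
        "nr_section_roots": nr_roots,
        "section_root_mask": sections_per_root - 1,
        "table_bytes": table_bytes,
        "pages_spanned": pages,
    }


layout = section_root_layout(NR_MEM_SECTIONS, SECTIONS_PER_ROOT,
                             POINTER_SIZE, PAGESIZE)
for key, value in layout.items():
    print(f"{key} = {value}")
```

Under these assumptions the 16384-byte read spans four 4096-byte pages, and the error is already raised on the first of them (physical page 3f7fc000).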
Hmm, r=0 means end of file. Can you check again whether pd.offset exceeds the dumpfile
size? If so, the dumpfile is somehow shorter than expected.
Indeed, the offset points beyond the end of the dumpfile. I get:
crash: PAG3 - errno=2 r=0 pd.size=52 pd.offset=168956485
with this dumpfile:
164820 -rw-r--r--. 1 root root 168775680 6 avril 17:23 crashdump--20220406-1713
And another one:
crash: PAG3 - errno=2 r=0 pd.size=49 pd.offset=215640649
with this dumpfile:
209984 -rw-r--r--. 1 root root 215023616 1 avril 10:58 crashdump-585.000-20220401-1054
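
The comparison made here can be written down as a short check. This is only a sketch in Python, with a hypothetical helper name; it takes the file size from the ls output and pd.offset / pd.size from the debug print, and reports how many bytes of the requested page data lie beyond the end of the file.

```python
def missing_bytes(dumpfile_size, pd_offset, pd_size):
    """Bytes of the requested page data that lie past the end of the file.

    Returns 0 when the whole [pd_offset, pd_offset + pd_size) range
    fits inside the dumpfile.
    """
    end_of_read = pd_offset + pd_size
    return max(0, end_of_read - dumpfile_size)


# (file size, pd.offset, pd.size) for the two dumpfiles in this thread.
cases = {
    "crashdump--20220406-1713": (168775680, 168956485, 52),
    "crashdump-585.000-20220401-1054": (215023616, 215640649, 49),
}

for name, (size, offset, length) in cases.items():
    short = missing_bytes(size, offset, length)
    state = "truncated" if short else "ok"
    print(f"{name}: {state}, read ends {short} bytes past end of file")
```

In both cases the page descriptor points well past the last byte on disk, which matches the r=0 (end of file) result from the read.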
I think RHEL-based kexec-tools runs "sync" after makedumpfile, but can you
check?
Actually, we run makedumpfile from a script that is set as the init program for the
second kernel. Therefore, we do not perform the sync that the core_collector path would.
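
For illustration, the missing flush could look like the sketch below. It is written in Python only to keep the examples in one language; the init script here is presumably a shell script, where the equivalent step would be running sync after makedumpfile returns. The helper name is hypothetical, and the code assumes Linux, where fsync() is accepted on a read-only descriptor and on a directory.

```python
import os


def flush_dumpfile(path):
    """Flush a finished dumpfile and its directory entry to stable storage.

    Hypothetical helper, Linux-only assumption. Returns the file size
    so the caller can log what was flushed.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # flush the file's data and metadata
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)

    # Also flush the parent directory so the new entry itself is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return size
```

Calling this (or os.sync() for a system-wide flush) after the dump has been written and before the second kernel reboots would be one way to avoid losing data still held in the page cache.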
Thanks,
Kazu
Best regards,
Patrick