Hello,
Coming back to this topic.
I upgraded our system with the kexec-tools package from CentOS Stream 8, based on
kexec 2.0.24 and makedumpfile 1.7.1.
We are still facing errors when using 'makedumpfile -c'.
Removing '-c' gives a better success/failure ratio, but the crash tool still sometimes
cannot read the resulting dump file.
Following Hagio's remark below about sync, I added a sync call before invoking
makedumpfile (just after mounting the required ext4 partitions) and a second sync call
after makedumpfile returns.
In that configuration, the crash tool has been able to read the dump file in every case
so far.
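For anyone wanting to reproduce the ordering, here is a minimal sketch of the sequence (sync, makedumpfile, sync). It is not our actual init script: the function name collect_dump and the command line in the comment are hypothetical, and the mount step is assumed to have been done already.

```python
import os
import subprocess


def collect_dump(cmd, sync=os.sync, run=subprocess.run):
    """Flush filesystem caches, run the dump command, then flush again.

    `sync` and `run` are injectable so the ordering can be checked
    without actually running makedumpfile.
    """
    sync()                          # flush after mounting the target partitions
    result = run(cmd, check=False)  # run the dump command and wait for it to exit
    sync()                          # push the written dump file out to disk
    return result.returncode


# Hypothetical invocation (paths and dump level are assumptions):
# collect_dump(["makedumpfile", "-d", "31", "/proc/vmcore", "/mnt/crash/vmcore"])
```

The second sync is the one that matters for the truncated files: without it the capture kernel may reboot while part of the dump is still only in the page cache.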
Thanks for your help.
Best regards,
Patrick Agrain
-----Original Message-----
From: Crash-utility <crash-utility-bounces(a)redhat.com> On Behalf Of Agrain Patrick
Sent: Wednesday, April 6, 2022 17:48
To: Discussion list for crash utility usage, maintenance and development
<crash-utility(a)redhat.com>; kexec(a)lists.infradead.org
Subject: Re: [Crash-utility] EXT: RE: crash: read error on type: "memory section root
table"
-----Original Message-----
From: HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
Sent: Wednesday, April 6, 2022 09:48
To: Agrain Patrick <patrick.agrain(a)al-enterprise.com>
Cc: Discussion list for crash utility usage, maintenance and development
<crash-utility(a)redhat.com>; kexec(a)lists.infradead.org
Subject: RE: EXT: RE: crash: read error on type: "memory section root table"
-----Original Message-----
Hello,
The trace suggested above gives the following information after a 'crash -d 8' command:
<...>
kernel NR_CPUS: 2
<readmem: ffffffffa4925820, KVADDR, "high_memory", 8, (FOE),
56017b542648>
<read_diskdump: addr: ffffffffa4925820 paddr: 12925820 cnt: 8>
read_diskdump: paddr/pfn: 12925820/12925 -> cache physical page:
12925000
GETBUF(328 -> 0)
FREEBUF(0)
GETBUF(328 -> 0)
FREEBUF(0)
PAGESIZE=4096
mem_section_size = 16384
NR_SECTION_ROOTS = 2048
NR_MEM_SECTIONS = 524288
SECTIONS_PER_ROOT = 256
SECTION_ROOT_MASK = 0xff
PAGES_PER_SECTION = 32768
<readmem: ffffffffa4926db0, KVADDR, "mem_section", 8, (FOE),
7ffd1b6bb000>
<read_diskdump: addr: ffffffffa4926db0 paddr: 12926db0 cnt: 8>
read_diskdump: paddr/pfn: 12926db0/12926 -> cache physical page:
12926000
<readmem: ffff904c7f7fc000, KVADDR, "memory section root table",
16384, (FOE), 56017da26fd0>
<read_diskdump: addr: ffff904c7f7fc000 paddr: 3f7fc000 cnt: 4096>
read_diskdump: paddr/pfn: 3f7fc000/3f7fc -> cache physical page:
3f7fc000
crash: PAG3 - errno=2 r=0 pd.size=49
read_diskdump: READ_ERROR: cannot cache page: 3f7fc000
crash: read error: kernel virtual address: ffff904c7f7fc000 type: "memory section
root table"
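As a sanity check, the sparsemem values printed in this log are consistent with each other, which suggests the header data is fine and only the page read fails. The sketch below recomputes them; the 8-byte pointer size is an assumption (64-bit kernel), chosen because it matches mem_section_size = 16384.

```python
# Values copied from the 'crash -d 8' output above.
PAGESIZE = 4096
NR_MEM_SECTIONS = 524288
SECTIONS_PER_ROOT = 256
POINTER_SIZE = 8  # assumption: 64-bit kernel, root table holds pointers

nr_section_roots = NR_MEM_SECTIONS // SECTIONS_PER_ROOT  # NR_SECTION_ROOTS
section_root_mask = SECTIONS_PER_ROOT - 1                # SECTION_ROOT_MASK
root_table_bytes = nr_section_roots * POINTER_SIZE       # mem_section_size
pages_to_read = root_table_bytes // PAGESIZE             # page-sized reads needed

# The failing read is the first page of the root table.
paddr = 0x3F7FC000
pfn = paddr // PAGESIZE

print(f"NR_SECTION_ROOTS  = {nr_section_roots}")
print(f"SECTION_ROOT_MASK = {section_root_mask:#x}")
print(f"mem_section_size  = {root_table_bytes}")
print(f"pages to read     = {pages_to_read}")
print(f"pfn               = {pfn:x}")
```

The 16384-byte root table spans four 4096-byte pages, and the read already fails on the first one (paddr 3f7fc000).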
hmm, r=0 means end of file, can you check again whether pd.offset exceeds the dumpfile
size? If so, somehow the dumpfile is shorter than expected.
Indeed, the offset points beyond the end of the dumpfile.
I get:
crash: PAG3 - errno=2 r=0 pd.size=52 pd.offset=168956485
with a dumpfile of 168775680 bytes:
164820 -rw-r--r--. 1 root root 168775680 6 avril 17:23 crashdump--20220406-1713
And another one:
crash: PAG3 - errno=2 r=0 pd.size=49 pd.offset=215640649
with a dumpfile of 215023616 bytes:
209984 -rw-r--r--. 1 root root 215023616 1 avril 10:58 crashdump-585.000-20220401-1054
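To put numbers on it, here is a small sketch that compares pd.offset + pd.size with the file size for the two cases above. The helper name missing_bytes is hypothetical; the figures are the ones reported by crash and ls.

```python
def missing_bytes(pd_offset, pd_size, file_size):
    """Bytes of the requested page data that lie past the end of the file (0 if it fits)."""
    return max(0, pd_offset + pd_size - file_size)


# (pd.offset, pd.size, dumpfile size) for the two dumps above
cases = [
    (168956485, 52, 168775680),
    (215640649, 49, 215023616),
]

for pd_offset, pd_size, file_size in cases:
    short = missing_bytes(pd_offset, pd_size, file_size)
    print(f"offset={pd_offset} size={pd_size} file={file_size} -> short by {short} bytes")
```

In both cases the page descriptor points well past the end of the file, so the dumpfile is at least that many bytes shorter than what makedumpfile intended to write.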
I think a RHEL-based kexec-tools does "sync" after makedumpfile, but can you
check?
Actually, we run makedumpfile from a script that serves as the init file for the second
(capture) kernel. Therefore, we do not perform the sync that the core_collector path would.
Thanks,
Kazu
Best regards,
Patrick