----- Original Message -----
Hi Dave and list,
I guess you still remember that a few weeks ago Wen has sent a patch which intends
to use register values in NT_PRSTATUS notes for backtracing if no panic task was
found. Thanks for your review and the suggestions are very useful.
However, Wen was busy with other work for these days, so I'll continue with this
work and now the 2rd version patch is attached.
v2: Address review feedbacks.
1) Set up a flag in pc->flags2 and it's determined during the diskdump file init
procedure.
2) Seperate code according to the that flag.
3) Also, we've done some tests on dumpfile of xen kernel and the trouble described
in the previous mail was gone.
So we're looking forward to your feedback and if you still have any problems with
it, please tell us.
Thanks,
Wang Chao
Hello Wang,
I haven't had a chance to run a complete test of this latest patch,
but I do have several suggestions.
diskdump.c:
Put the nt_prstatus_percpu array pointer and the num_prstatus_notes
into the diskdump_data structure -- like you did with the notes_buf.
Make nt_prstatus_percpu a pointer that gets allocated and initialized
only if necessary -- because if it's not used, it's a waste of space.
In read_dump_header(), when dd->notes_buf is malloc()'d, malloc() the
dd->nt_prstatus_percpu array.
And at the bottom of read_dump_header(), free both the dd->notes_buf
and the dd->nt_prstatus_percpu array if they exist, under the "err:"
label:
err:
...
if (dd->notes_buf)
free(dd->notes_buf);
if (dd->nt_prstatus_percpu)
free(dd->nt_prstatus_percpu;
In order to be able to debug/understand why things may fail, especially
due to the exclusion of prstatus notes of offline cpus, it would be
very helpful to be able to see the dd->nt_prstatus pointers for
each cpu. Similar to what is done with the vmcoreinfo data, can
you add a section to __diskdump_dump_header():
if (dh->header_version >= 3) {
fprintf(fp, " offset_vmcoreinfo: %lx\n",
(ulong)dd->sub_header_kdump->offset_vmcoreinfo);
fprintf(fp, " size_vmcoreinfo: %lu\n",
dd->sub_header_kdump->size_vmcoreinfo);
if (dd->sub_header_kdump->offset_vmcoreinfo &&
dd->sub_header_kdump->size_vmcoreinfo) {
dump_vmcoreinfo(fp);
}
}
if (dh->header_version >= 4) {
fprintf(fp, " offset_note: %lx\n",
(ulong)dd->sub_header_kdump->offset_note);
fprintf(fp, " size_note: %lu\n",
dd->sub_header_kdump->size_note);
here ==================>
}
Call a function that dumps the file offset of each cpu's nt_prstatus data.
Then, if necessary, any cpu's register set can be read with "rd -f".
netdump.c:
In get_netdump_regs_x86(), both of these settings are incorrect:
user_regs = get_regs_from_note((char *)note, &ip, &sp);
=====> bt->flags |= BT_KDUMP_ELF_REGS;
if (is_kernel_text(ip) &&
(((sp >= GET_STACKBASE(bt->task)) &&
(sp < GET_STACKTOP(bt->task))) ||
in_alternate_stack(bt->tc->processor, sp))) {
*eip = ip;
*esp = sp;
=====> bt->flags |= BT_KERNEL_SPACE;
return;
}
Neither of these flags should be set there. BT_KDUMP_ELF_REGS
is only applicable to x86_64, and BT_KERNEL_SPACE is only
appicable to kvmdump dumpfiles.
Finally, in the interest of paranoia, give the user the capability
of *not* using this facility. In main.c, create a "--no_elf_notes"
option (similar to "--zero_excluded"), and have it set a NO_ELF_NOTES
bit in the globally-accessible "*diskdump_flags". As an example, see
how ZERO_EXCLUDED is set, used, and accessed. Then in read_dump_header()
skip the new ELF setup if that flag is set. It will not be necessary
to go as creating a new environment variable like "zero_excluded", but
it would be nice to be able to restrict its use on the crash invocation
command line.
Thanks,
Dave