What's unexplainable here is the dump of the note information:
> size_note: 1780
> num_prstatus_notes: 1
> notes_buf: 2cc4000
> notes[0]: 2cc4000
The determination of the number of ELF nt_prstatus notes is based
upon the contents of the kdump_sub_header, where "size_note" describes
a single buffer in the dumpfile that contains an array of nt_prstatus
notes. Each note consists of a small Elf64_Nhdr header, a name string,
and a register dump. Here's an example of one taken from an ELF-format
kdump:
Elf64_Nhdr:
n_namesz: 5 ("CORE")
n_descsz: 336
n_type: 1 (NT_PRSTATUS)
0000000000000000 0000000000000000
0000000000000000 0000000000000000
000000000000544d 0000000000000000
0000000000000000 0000000000000000
0000000000000000 0000000000000000
0000000000000000 0000000000000000
0000000000000000 0000000000000000
0000000000000001 00007fffb894e76f
00007fffb894e170 ffff88012969ebf0
ffff880128541f88 ffff880108870a00
0000000000000000 0000000000000000
ffffffff8184f8f0 0000000000000000
ffff880108870ab0 0000000000000003
0000000000000004 ffffffff81ad7fd0
ffff8801280e44b0 ffffffffffffffff
ffffffff8108d378 0000000000000010
0000000000010202 ffff88012854de68
0000000000000018 00007fa3283af700
0000000000000000 0000000000000000
0000000000000000 0000000000000000
0000000000000000 0000000000000000
So, rounded up, each note is roughly 350 bytes. A "size_note" of 1780
bytes therefore wouldn't be large enough to contain the notes for 16 cpus,
but it would seem to contain more than 1 note. (???) Yet the note-gathering
code was only able to come up with a "num_prstatus_notes" of 1.
It would be interesting to find out what happened in the
x86_process_elf_notes() function.
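For reference, the per-note size estimate can be worked out from the header
fields shown in the example. This is only a sketch: it assumes the usual
12-byte Elf64_Nhdr (three 32-bit words) and 4-byte padding of the name and
descriptor, and the helper names are made up for illustration, not taken
from the crash sources.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of 4 (assumed ELF note padding). */
static size_t note_align4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Size of one note: 12-byte Elf64_Nhdr + padded name + padded descriptor. */
static size_t note_total_size(size_t namesz, size_t descsz)
{
    const size_t nhdr_size = 12;  /* n_namesz, n_descsz, n_type */

    return nhdr_size + note_align4(namesz) + note_align4(descsz);
}

/* Upper bound on how many equally sized notes fit in a buffer. */
static size_t notes_that_fit(size_t buf_size, size_t note_size)
{
    assert(note_size > 0);  /* guard the division below */
    return buf_size / note_size;
}
```

With n_namesz 5 and n_descsz 336, note_total_size() gives 356 bytes, so
16 cpus would need 16 * 356 = 5696 bytes, and notes_that_fit(1780, 356)
says at most five such notes could fit in a 1780-byte buffer.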
Digging into this a bit more -- the array of notes in the dumpfile
also includes the VMCOREINFO note (n_type 0), which is roughly 1400
bytes in length. So given that all of the notes consume 1780 bytes in
Joe's dump, it looks like there is one NT_PRSTATUS note and one VMCOREINFO
note.
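To illustrate how a mixed buffer like that would be counted, here is a
rough sketch of walking a notes buffer and tallying notes by n_type. It is
not the x86_process_elf_notes() code; the struct and function names are
hypothetical, and it assumes the same 12-byte header and 4-byte padding of
the name and descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_PRSTATUS 1

/* Same layout as Elf64_Nhdr: three 32-bit words. */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* Round n up to the next multiple of 4 (assumed note padding). */
static size_t pad4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Count the notes of the given type in buf[0..len). */
static size_t count_notes_of_type(const unsigned char *buf, size_t len,
                                  uint32_t type)
{
    size_t off = 0, count = 0;

    assert(buf != NULL);
    while (len - off >= sizeof(struct note_hdr)) {
        struct note_hdr h;
        size_t name, desc, remaining;

        memcpy(&h, buf + off, sizeof(h));  /* avoid unaligned access */
        name = pad4(h.n_namesz);
        desc = pad4(h.n_descsz);
        remaining = len - off - sizeof(h);

        /* Stop at a note that claims to run past the buffer. */
        if (name > remaining || desc > remaining - name)
            break;

        if (h.n_type == type)
            count++;
        off += sizeof(h) + name + desc;
    }
    return count;
}
```

Given a 1780-byte buffer holding one 356-byte NT_PRSTATUS note followed by
one 1424-byte note of type 0, count_notes_of_type(buf, 1780, NT_PRSTATUS)
returns 1, which would match the "num_prstatus_notes" value in the dump.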
Joe, do you know if the non-crashing cpus were in some kind of
bizarre state such that they would not respond to the shutdown NMI?
I suppose in that case, there would be only the one NT_PRSTATUS
note for the crashing cpu (plus the VMCOREINFO note).
In any case, so far I've got two patches queued to help address
the two segmentation violations generated by a scenario such as
this. First Joe's patch:
--- x86_64.c 28 Sep 2011 18:09:54 -0000 1.187
+++ x86_64.c 29 Sep 2011 19:17:09 -0000 1.188
@@ -4181,7 +4181,7 @@
                 goto skip_stage;
             }
         }
-    } else if (ELF_NOTES_VALID()) {
+    } else if (ELF_NOTES_VALID() && bt->machdep) {
         user_regs = bt->machdep;
         ur_rip = ULONG(user_regs +
             OFFSET(user_regs_struct_rip));
And then this preventative measure to keep a bogus ELF note
pointer from being passed back:
--- diskdump.c 20 Sep 2011 20:41:14 -0000 1.38
+++ diskdump.c 30 Sep 2011 14:55:11 -0000
@@ -1467,6 +1467,9 @@
 void *
 diskdump_get_prstatus_percpu(int cpu)
 {
+    if ((cpu < 0) || (cpu >= dd->num_prstatus_notes))
+        return NULL;
+
     return dd->nt_prstatus_percpu[cpu];
 }
Dave