> It would be interesting to find out what happened in the
> x86_process_elf_notes() function.

*** Breakpoints in x86_process_elf_notes()...

(gdb) break diskdump.c:245
Breakpoint 1 at 0x52379b: file diskdump.c, line 245.
(gdb) r

Breakpoint 1, x86_process_elf_notes (note_ptr=0xd1e000, size_note=1780)
    at diskdump.c:245
245                             note64 = note_ptr + tot;
(gdb) p *(Elf64_Nhdr *)(note_ptr + tot)
$1 = {n_namesz = 5, n_descsz = 336, n_type = 1}
(gdb) c
Continuing.

Breakpoint 1, x86_process_elf_notes (note_ptr=0xd1e000, size_note=1780)
    at diskdump.c:245
245                             note64 = note_ptr + tot;
(gdb) p *(Elf64_Nhdr *)(note_ptr + tot)
$2 = {n_namesz = 11, n_descsz = 1392, n_type = 0}
(gdb) c
Continuing.

Breakpoint 1, x86_process_elf_notes (note_ptr=0xd1e000, size_note=1780)
    at diskdump.c:245
245                             note64 = note_ptr + tot;
(gdb) p *(Elf64_Nhdr *)(note_ptr + tot)
$3 = {n_namesz = 0, n_descsz = 0, n_type = 0}
(gdb) c
Continuing.


>> crash: page excluded: kernel virtual address: ffffffff81bb3b00 type:
"cpu number (per_cpu)"
>> crash: page excluded: kernel virtual address: ffffffff81bb3b00 type:
"cpu number (per_cpu)"
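
As a rough cross-check of the three note headers printed above, here is a small sketch of the offset arithmetic. It assumes the usual ELF note layout (a 12-byte Elf64_Nhdr, then the name and descriptor each padded to 4 bytes); the helper names are made up for illustration and this is not the diskdump.c code itself.

```python
# Sketch only: recompute where each note header starts, assuming
# 4-byte padding of the name and descriptor fields.
ELF64_NHDR_SIZE = 12  # n_namesz, n_descsz, n_type: three 32-bit words


def roundup(n, align=4):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def note_len(namesz, descsz):
    """Total bytes occupied by one note: header + padded name + padded desc."""
    return ELF64_NHDR_SIZE + roundup(namesz) + roundup(descsz)


def walk(headers, size_note):
    """Return the start offset of every header visited while tot < size_note."""
    tot = 0
    offsets = []
    for namesz, descsz, _ntype in headers:
        if tot >= size_note:
            break
        offsets.append(tot)
        tot += note_len(namesz, descsz)
    return offsets, tot


# (n_namesz, n_descsz, n_type) as printed by gdb at $1, $2, $3
headers = [(5, 336, 1), (11, 1392, 0), (0, 0, 0)]
offsets, end = walk(headers, size_note=1780)
print(offsets)  # [0, 356, 1772]
print(end)      # 1784
```

Under that assumption the two real notes end at offset 1772, which leaves 8 bytes before size_note=1780. That is less than one 12-byte header, so the third "note" is read from the tail of the buffer and comes back all zeros.
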
> [snip]
> loop in both functions -- can you dump out which cpu's
> per-cpu data was inaccessible?

(gdb) break memory.c:1976
Breakpoint 1 at 0x4722ff: file memory.c, line 1976.
(gdb) set arg -d1 vmlinux vmcore
(gdb) r
Breakpoint 1, readmem (addr=18446744071591115520, memtype=1,
    buffer=0x7fffffff5b5c, size=4, type=0x7c7744 "cpu number (per_cpu)",
    error_handle=6) at memory.c:1976
1976                                    error(INFO, PAGE_EXCLUDED_ERRMSG, memtype_string(memtype, 0), addr, type);
(gdb) up
#1  0x00000000004e5871 in x86_64_get_smp_cpus () at x86_64.c:4674
4674                            if (!readmem(sp->value + kt->__per_cpu_offset[i],
(gdb) p cpunumber
$1 = 15
(gdb) p cpus
$2 = 16
(gdb) p i
$3 = 16
(gdb) p/x kt->__per_cpu_offset[0]@17
$4 = {0xffff880028200000, 0xffff880028240000, 0xffff880028280000,
  0xffff8800282c0000, 0xffff880287400000, 0xffff880287440000,
  0xffff880287480000, 0xffff8802874c0000, 0xffff880028300000,
  0xffff880028340000, 0xffff880028380000, 0xffff8800283c0000,
  0xffff880287500000, 0xffff880287540000, 0xffff880287580000,
  0xffff8802875c0000, 0xffffffff81ba6000}
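
To tie the numbers together, a short sketch using only the values from the session above. It assumes the failing address is sp->value + kt->__per_cpu_offset[i] with 64-bit wraparound, as the readmem() call at x86_64.c:4674 suggests; the variable names are just for illustration.

```python
# Sketch only: relate the failing readmem() address to __per_cpu_offset[16].
MASK64 = (1 << 64) - 1

per_cpu_offset = [
    0xffff880028200000, 0xffff880028240000, 0xffff880028280000,
    0xffff8800282c0000, 0xffff880287400000, 0xffff880287440000,
    0xffff880287480000, 0xffff8802874c0000, 0xffff880028300000,
    0xffff880028340000, 0xffff880028380000, 0xffff8800283c0000,
    0xffff880287500000, 0xffff880287540000, 0xffff880287580000,
    0xffff8802875c0000, 0xffffffff81ba6000,
]

failing_addr = 18446744071591115520  # addr argument in the readmem() frame
i = 16                               # loop index at the time of the failure

# Entries that do not share the 0xffff88... prefix of the others.
outliers = [n for n, off in enumerate(per_cpu_offset) if off >> 40 != 0xffff88]

# Implied value of the per-cpu symbol, i.e. sp->value.
symbol_value = (failing_addr - per_cpu_offset[i]) & MASK64

print(hex(failing_addr))  # 0xffffffff81bb3b00
print(outliers)           # [16]
print(hex(symbol_value))  # 0xdb00
```

So the excluded page at ffffffff81bb3b00 comes from index 16, the 17th entry. It is the only offset outside the 0xffff88... range, and it is one past the 16 cpus counted.
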


> Joe, do you know if the non-crashing cpus were in some kind of
> bizarre state such that they would not respond to the shutdown NMI?
> I suppose in that case, there would be only the one NT_PRSTATUS
> note for the crashing cpu (plus the VMCOREINFO note).

Almost all of the other CPUs are sitting idle; a few are running I/O.

> In any case, so far I've got two patches queued to help address
> the two segmentation violations generated by a scenario such as
> this.

I applied both patches and verified that the segmentation faults no
longer occur.

I have uploaded this vmcore/vmlinux to our FTP site (details to come
in private mail).


Thanks,

-- Joe Lawrence