(2012/06/20 3:25), Dave Anderson wrote:
----- Original Message -----
>
> OK, now I'm getting confused...
>
> The more I look at this patch, the more confused I get...
>
> During initialization, the ELF notes contained in the
> dumpfile file header are scanned, and if an NT_PRSTATUS
> note is seen, a pointer to its location in the dumpfile
> is saved in dd->nt_prstatus_percpu[num] and the "num"
> of valid notes is kept in dd->num_prstatus_notes.
>
> If the dd->num_prstatus_notes is equal to the online cpu
> count, then it is presumed that there is a one-to-one
> relationship, where the cpu number can be used as the
> index into the dd->nt_prstatus_percpu[num] array.
>
> If the number of notes is not equal to the number of online
> cpus, then the "mapping" function is called, where if a
> cpu is found to be offline, then its (incorrectly) associated
> entry in the dd->nt_prstatus_percpu[num] array is "pushed up"
> to the next higher entry.  But the dd->num_prstatus_notes
> does not seem to get incremented to reflect that move, so
> it seems like diskdump_get_prstatus_percpu() can
> possibly return NULL when there actually is a relevant
> NT_PRSTATUS note.
dd->num_prstatus_notes is equal to the number of NT_PRSTATUS notes in the dump file,
and unless nt_prstatus_percpu[] is filled contiguously from 0,
diskdump_get_prstatus_percpu() can indeed return NULL.
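To make the failure mode concrete, here is a minimal sketch of the lookup as I
understand it (a simplified illustration, not the actual diskdump.c code; the
struct and names are abbreviated):
---------
#include <stddef.h>

#define NR_CPUS_SKETCH 4096

/* Simplified sketch of the per-cpu NT_PRSTATUS bookkeeping being discussed.
 * The header scan fills the array contiguously from index 0, so indexing
 * by cpu number only works when notes and online cpus line up one-to-one. */
struct dd_sketch {
        void *nt_prstatus_percpu[NR_CPUS_SKETCH]; /* pointers into dumpfile header */
        int num_prstatus_notes;                    /* how many notes the scan found */
};

static void *get_prstatus_percpu_sketch(struct dd_sketch *dd, int cpu)
{
        if ((cpu < 0) || (cpu >= dd->num_prstatus_notes))
                return NULL;    /* the NULL case described above */
        return dd->nt_prstatus_percpu[cpu];
}
---------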
> That seems to be a bug (?), but it's not particularly important,
> because for x86 and x86_64, the data in the NT_PRSTATUS notes is
> only used as the starting point for backtraces if the PC/SP pair
> cannot be determined otherwise, which is the case virtually all of
> the time.  So the registers found in the NT_PRSTATUS notes are
> pretty much useless...
It seems to be a minor bug: if some of the cpus are offline at the time of the panic,
the same number of cpus cannot be backtraced, counting from the tail, because NULL is returned.
I think the current task's PC/SP cannot be obtained correctly from thread_struct,
so the NT_PRSTATUS notes need to be used for them instead.
But anyway, a panic while cpus are offline is a very unlikely condition,
much like a failed NMI.
> Now, to complicate matters, your patch does not look at the
> NT_PRSTATUS notes in the dumpfile header, but instead looks
> at the base kernel's original notes, and verifies their
> contents, and correlates what's found there against what was
> found in the dumpfile?  So I don't understand what you are
> attempting to do -- what is the difference between the notes
> that are copied into the dumpfile vs. what you are looking at
> in the base kernel?
First, my patch tries to find the tainted note areas in the base kernel.
Such notes cannot be detected from the dumpfile; the notes that were found
are simply stored contiguously into nt_prstatus_percpu[0 .. number of notes found].
I just want to repair nt_prstatus_percpu[#index] so that it corresponds to cpu#index.
[example]
  cpu#0    cpu#1      cpu#2      cpu#3
  saved    tainted    tainted    saved      [by looking at the base kernel]
=> nt_prstatus_percpu[0] = cpu#0's note, nt_prstatus_percpu[1] = cpu#3's note
I want to repair the array like this:
=> nt_prstatus_percpu[0] = cpu#0's note, nt_prstatus_percpu[3] = cpu#3's note
Since the ELF note format does not contain information about which cpu saved it,
I think there is no way to identify the mapping between a cpu and
nt_prstatus_percpu[] from the ELF notes alone when the two counts are not equal.
That's why I try to resolve it from the base kernel area.
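Roughly, the repair I have in mind would look like the following (an illustrative
sketch only, not the actual patch; the function name and the note_saved[] input
are invented for the example):
---------
#include <stddef.h>

/* Illustrative sketch only (not the actual patch): move the notes that the
 * header scan stored contiguously at [0 .. found-1] into the slots matching
 * their real cpu numbers, skipping cpus whose note area in the base kernel
 * was found to be tainted (i.e. never saved). */
static void remap_prstatus_percpu(void *percpu[], int nr_cpus,
                                  const int note_saved[])
{
        int cpu, k = 0;

        /* number of cpus that actually saved a note */
        for (cpu = 0; cpu < nr_cpus; cpu++)
                if (note_saved[cpu])
                        k++;

        /* Walk cpus from highest to lowest.  The k-th saved cpu's note
         * currently sits at percpu[k-1] and its destination index is always
         * >= k-1, so no entry is overwritten before it has been moved.
         * With the example above (cpu#0 and cpu#3 saved), the note at [1]
         * moves to [3] and [1] is cleared. */
        for (cpu = nr_cpus - 1; cpu >= 0; cpu--) {
                if (!note_saved[cpu])
                        continue;
                k--;
                percpu[cpu] = percpu[k];
                if (cpu != k)
                        percpu[k] = NULL;
        }
}
---------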
> I'm also wondering what would happen in your case if there
> were a combination of "lost" notes *and* offline cpus?  How
> would that work?
I think offline cpus never ack the IPI, so their notes are always lost (that case does not concern me).
I am only assuming the case where a cpu was online but could not ack the IPI
for some reason while it had interrupts off.
> So at this point I really don't want to add this patch
> at all because it touches common code, and I don't want to
> risk breaking the other arches.  Nobody has ever reported
> any "lost" cpus so far, probably because the kdump facility
> uses non-maskable NMIs to shut down the non-panicking cpus.
> This is such a highly-unlikely corner case that it does
> not even seem worth addressing for fear of breaking something
> else.
I can agree; the x86 architecture has the NMI, which is the most reliable way to
deliver interrupts to other cpus, while powerpc may not have an NMI.
I am going to rework the patch so that it is effective only on the ppc architecture.
> I didn't look at the reasoning behind why you ran into a
> segmentation violation, but since the PPC code path would be:
>
> ...
> back_trace()
> get_diskdump_regs()
> get_diskdump_regs_ppc()
get_diskdump_regs_ppc() touches dd->nt_prstatus_percpu[bt->tc->processor]
when "bt" is for the panic task.
---------
if (KDUMP_CMPRS_VALID() &&
    (bt->task == tt->panic_task ||
     (is_task_active(bt->task) && (dd->num_prstatus_notes > 1)))) {
        note = (Elf32_Nhdr*) dd->nt_prstatus_percpu[bt->tc->processor];
        if (!note)
                error(FATAL,
                    "cannot determine NT_PRSTATUS ELF note "
                    "for %s task: %lx\n",
                    (bt->task == tt->panic_task) ? "panic" : "active",
                    bt->task);
        len = sizeof(Elf32_Nhdr);
        len = roundup(len + note->n_namesz, 4);
        bt->machdep = (void *)((char *)note + len +
                MEMBER_OFFSET("elf_prstatus", "pr_reg"));
}

machdep->get_stack_frame(bt, eip, esp);
---------
In my dumpfile's case, bt->tc->processor is 2, but its note is contained at
dd->nt_prstatus_percpu[0], so

    note = (Elf32_Nhdr*) dd->nt_prstatus_percpu[2];

does not point at the note instance "bt" expects.
dd->nt_prstatus_percpu[] is allocated with malloc() and is not initialized to 0,
so when note->n_namesz is dereferenced, it causes the segmentation violation.
That malloc() was changed to calloc() in my first patch.
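To illustrate why the zeroed allocation matters (a trivial example, not the
diskdump.c code itself):
---------
#include <stdlib.h>

/* With malloc() the slots that never receive a note hold indeterminate
 * garbage, so the "if (!note)" check above does not fire and the later
 * note->n_namesz dereference can fault.  With calloc() the unfilled slots
 * stay NULL and the error path is taken instead. */
static void **alloc_prstatus_percpu(size_t nr_cpus)
{
        return calloc(nr_cpus, sizeof(void *));  /* zero-initialized */
}
---------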
There is another problem, which also appears in get_netdump_regs_ppc().
dd->num_prstatus_notes is 1 in my dumpfile although there were 8 online cpus.
When "bt" has to rely on the second condition, "is_task_active(bt->task)",
get_diskdump_regs_ppc() unfortunately falls through to machdep->get_stack_frame()
for active tasks, and that is sure to fail.
Reporting "cannot determine NT_PRSTATUS ELF note ..." would be the proper behavior.
> Perhaps you can rework your patch so that it is segregated
> to PPC only?
Yes, I'm going to rework it; thank you for your reviews.
I'm modifying the difficult parts this time, so they are very helpful.
I'll do more tests before sending the reworked PPC-only patch.
I also have to study kexec-tools and makedumpfile further to understand
how NT_PRSTATUS and the other notes are treated; I haven't understood that
well enough yet.
Toshi
> Dave
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility