Chandru wrote:
The machine is a ppc64 box running a RHEL 5.1-based SMP kernel. nr_cpus
is equal to 2 in get_idle_threads(), but the system actually has 14 cpus,
and 12 of them were offline when the vmcore was collected. The
kt->__per_cpu_offset[12] and kt->__per_cpu_offset[13] entries hold per-cpu
offset values, whereas kt->__per_cpu_offset[0] through [11] are 0. I changed
kt->__per_cpu_offset[i] in ppc64_paca_init() to kt->__per_cpu_offset[cpus],
and with that crash came up. But the backtrace command 'bt' exited with a
segmentation fault. Looking further, in this code in get_netdump_regs_ppc64():

    if (nd->num_prstatus_notes > 1) {
        note = (Elf64_Nhdr *)
            nd->nt_prstatus_percpu[bt->tc->processor];
    }

bt->tc->processor was 12. I changed it to 0, and that produced the
backtrace.
Regards,
Chandru
Chandru,
I can reproduce the "idle-task" initialization-time failure on a
4-cpu ppc64 by offlining cpus 0 and 1. That can be fixed fairly trivially
in ppc64_paca_init() by reading the cpu_present_map instead of the
cpu_online_map, so this hack should get you to a prompt:
# diff ppc64.c.orig ppc64.c
2400c2400
< readmem(symbol_value("cpu_online_map"), KVADDR, &cpu_online_map[0],
---
> readmem(symbol_value("cpu_present_map"), KVADDR, &cpu_online_map[0],
#
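The effect of switching maps can be sketched with a simplified, hypothetical model in which a cpu map is a single 64-bit word (bit N set means cpu N is in the map). The cpu_mask typedef and the helpers cpu_range() and count_cpus() are illustrative names only, not crash or kernel APIs, and the real cpumask_t can span several words.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a cpu map: one 64-bit word where bit N set
 * means cpu N is in the map. Assumes at most 64 cpus; the kernel's
 * real cpumask_t may span several words.
 */
typedef uint64_t cpu_mask;

/* Build a mask with cpus lo..hi (inclusive) set. */
static cpu_mask cpu_range(int lo, int hi)
{
    assert(0 <= lo && lo <= hi && hi < 64);
    cpu_mask m = 0;
    for (int cpu = lo; cpu <= hi; cpu++)
        m |= (cpu_mask)1 << cpu;
    return m;
}

/* Count how many cpus are set in the mask. */
static int count_cpus(cpu_mask mask)
{
    int n = 0;
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & ((cpu_mask)1 << cpu))
            n++;
    return n;
}
```

For Chandru's vmcore, count_cpus(cpu_range(12, 13)) is 2, matching the nr_cpus value seen in get_idle_threads(), while count_cpus(cpu_range(0, 13)) is 14, assuming all 14 cpus are marked present.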
With respect to the "bt" failure, that will take a bit of tinkering.
When kdump collects NT_PRSTATUS notes, it only does so for online
cpus. So in your case there would be 2 NT_PRSTATUS notes, one each
for cpu 12 and cpu 13. That being the case, get_netdump_regs_ppc64()
would have to be modified to pick the proper one: the processor number
will be 12 or 13, and that would have to be mapped to the associated
"online index" of the NT_PRSTATUS notes. That could get ugly. There is
an elf_prstatus.ppid field that could be matched against the incoming
"bt" task pid, but every per-cpu idle task has pid 0, so several notes
could match and that probably is not the best answer. I'm not sure
what the best way to go is here.
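The "online index" mapping described above could be computed by ranking the processor among the online cpus. This is a minimal sketch under two assumptions not confirmed in this thread: kdump emits the NT_PRSTATUS notes in ascending cpu order, and the cpu_online_map read from the dump matches the set of cpus kdump saved. The names cpu_mask and prstatus_index() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit N set means cpu N was online (at most 64 cpus). */
typedef uint64_t cpu_mask;

/*
 * Map a processor number to its position among the NT_PRSTATUS notes.
 * Assumption: one note per online cpu, written in ascending cpu order,
 * so cpu N's note index is the number of online cpus below N.
 * Returns -1 for an offline cpu, which has no note.
 */
static int prstatus_index(cpu_mask online, int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    if (!(online & ((cpu_mask)1 << cpu)))
        return -1;
    int idx = 0;
    for (int i = 0; i < cpu; i++)
        if (online & ((cpu_mask)1 << i))
            idx++;
    return idx;
}
```

With only cpus 12 and 13 online, prstatus_index() returns 0 for cpu 12, 1 for cpu 13, and -1 for any offline cpu. The result would stand in for bt->tc->processor when indexing nd->nt_prstatus_percpu[]; that is an untested idea, not a patch.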
So again, I prefer not to tinker with the ppc64-specific code base
in crash, and have always deferred such changes to its author (Haren).
If he is not available, can you find out who at IBM is the proper
person to run this by?
Thanks,
Dave