----- Original Message -----
We have a new hardware to do dump, and use makedumpfile to generate
vmcore. Our hardware can work when the OS is out of controll(for
example: dead loop). When we use crash to analyze the vmcore, bt can
not work, because there is no panic task.
We have provide the value of register in the vmcore(the format is
elf_prstatus, it is same with normal kdump's vmcore). So we can use
it when we do not find panic task.
Let me state first that I get very nervous when patches are
made to commonly used functions to handle special cases such
as this. When those kinds of changes are proposed, I much prefer
that the new code *only* apply to the special cases, and should
not affect anything else.
I do not have any x86 or x86_64 compressed kdumps that
contain ELF note data, so I cannot really test this patch.
However, when I do run this patch against a sample set
of ~150 x86_64 dumpfiles, I see errors on simple backtraces
such as the following examples.
Without the patch:
crash> bt ffff81007e54c100
PID: 0 TASK: ffff81007e54c100 CPU: 1 COMMAND: "swapper"
#0 [ffff81007e565ef0] cpu_idle at ffffffff80047282
crash>
With your patch:
crash> bt ffff81007e54c100
PID: 0 TASK: ffff81007e54c100 CPU: 1 COMMAND: "swapper"
Segmentation fault
And it doesn't work with Xen kernels very well.
Without the patch:
crash> bt ffff8800009c0040
PID: 0 TASK: ffff8800009c0040 CPU: 1 COMMAND: "swapper"
#0 [ffff88001d485ef8] safe_halt at ffffffff8011004c
#1 [ffff88001d485f28] xen_idle at ffffffff801092f4
#2 [ffff88001d485f38] cpu_idle at ffffffff801093b5
crash>
With your patch:
crash> bt ffff8800009c0040
PID: 0 TASK: ffff8800009c0040 CPU: 1 COMMAND: "swapper"
bt: cannot determine starting stack pointer
crash>
Without the patch:
crash> bt -a
... [ cut ] ...
PID: 0 TASK: ffff880000017040 CPU: 1 COMMAND: "swapper"
#0 [ffff880033983f38] xen_idle at ffffffff8026c502
#1 [ffff880033983f48] cpu_idle at ffffffff8024a982
PID: 0 TASK: ffff8800000037e0 CPU: 2 COMMAND: "swapper"
#0 [ffff880033987f38] xen_idle at ffffffff8026c502
#1 [ffff880033987f48] cpu_idle at ffffffff8024a982
PID: 0 TASK: ffff880000003080 CPU: 3 COMMAND: "swapper"
#0 [ffff880033989f38] xen_idle at ffffffff8026c502
#1 [ffff880033989f48] cpu_idle at ffffffff8024a982
With your patch:
crash> bt -a
... [ cut ] ...
PID: 0 TASK: ffff880000017040 CPU: 1 COMMAND: "swapper"
bt: cannot determine starting stack pointer
PID: 0 TASK: ffff8800000037e0 CPU: 2 COMMAND: "swapper"
bt: cannot determine starting stack pointer
PID: 0 TASK: ffff880000003080 CPU: 3 COMMAND: "swapper"
bt: cannot determine starting stack pointer
crash>
So my point is that given that *none* of these kernels even
contain makedumpfile-generated ELF data -- so your code should
*not* affect them at all.
That being the case, I cannot accept the patch as it is currently
written. Things work fine as things are now, and I'm not interested
in debugging things that break because your patch changes current
behavior.
Here are my two major suggestions:
(1) If it is determined during the initial dumpfile scan that
it contains ELF note data, then set a global flag, perhaps
in the new pc->flags2, something like MAKEDUMPFILE_ELF_NOTES.
(2) Then, segregate your changes *completely* based upon that
flag. I would rather have essentially-redundant code put in
place instead of breaking currently-existing code.
That way, for all dumpfiles where the MAKEDUMPFILE_ELF_NOTES flag
is *not* set, then your code should *not* run.
And here are a few specific comments:
(1) The machdep->process_elf_notes handler was put in place for
this kind of thing, so please continue to use it for x86 and
x86_64 in the same way that the s390x architecture does.
You don't need to pass the machine type as an extra argument,
given that "machine_type()" can be used anywhere. The two
arches can share the same handler.
(2) get_netdump_regs_x86_64: segregate the code based upon the
MAKEDUMPFILE_ELF_NOTES flag.
(3) get_netdump_regs_x86(): segregate the code based upon the
MAKEDUMPFILE_ELF_NOTES flag.
(4) Make a new map_cpus_to_prstatus()-type function for this kind
of dumpfile. In task_init(), you can check the MAKEDUMPFILE_ELF_NOTES
flag and call the new function, and so you won't need your new
KDUMP_CMPRS_DUMPFILE() function.
(5) x86_64_get_dumpfile_stack_frame(): use the MAKEDUMPFILE_ELF_NOTES
flags instead of bt->machdep.
And two final requests:
(1) Before posting the patch, please build your changes with "make warn".
Your changes generate a few warnings.
(2) Can make your patches attachments to your email instead of inline?
Thanks,
Dave