----- Original Message -----
Hello Dave,
The following series fixes minor bugs, cleans up the sadump module, and
addresses the issue with kdump's backup of the first 640kB.
The last patch is preparation for makedumpfile support of sadump-related
formats, which is still a work in progress: makedumpfile will produce
dumpfiles in kdump-compressed format from sadump-related formats.
This patch set is based on crash 5.1.9.
Hello Daisuke,
As I have stated in our previous sadump-related discussions, you have
free rein to make whatever changes you like in sadump-specific
files, or in functions that deal with sadump-specific issues. However,
if your changes modify behavior when used with non-sadump dumpfiles,
then I may have a problem with them. So when you post a patch-set
such as this last one, I would prefer that you post two separate
patch-sets: one with the sadump-specific changes, and one with any
changes that can affect non-sadump dumpfiles.
This 1/11 patchset is a good example of what I mean. I have no
problem with the sadump-specific patches. But I do have a big
problem with the last one, which is not necessarily sadump-specific:
use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch
BTW, these are the names of the patches as they were attached. Note that
the second one has no numeric prefix, and that there are no "0008-" or
"0009-" patches. Was that intentional?
0001-sadump-bug-close-receives-unintened-value.patch.patch
cleanup_is_sadump.patch.patch
0002-sadump-bug-specify-wrong-type.patch.patch
0003-sadump-bugfix-time-stamp-values-displayed-are-same.patch.patch
0004-sadump-don-t-exit-if-time-stamps-mismatch.patch.patch
0005-sadump-debug-messages-at-the-beginning-of-open_disk-.patch.patch
0006-sadump-Allow-arbitrary-number-of-disk-set-configurat.patch.patch
0007-sadump-refer-to-eip-and-esp-on-x86-kernels.patch.patch
0010-Make-data-relevant-to-physical-memory-have-64-bits-l.patch.patch
0011-Read-kexec-backup-region-if-read-to-the-first-640kB-.patch.patch
use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch
Anyway, I tested this by running "bt -a" on a large set of sample dumpfiles,
first without, and then with, your patchset. When your patches are applied, I see
numerous examples where the backtraces are missing huge pieces of information.
Here are typical examples:
Here is a crashing RHEL6 process, as shown by unpatched crash-5.1.9:
PID: 14187 TASK: ffff88012b98e040 CPU: 0 COMMAND: "runtest.sh"
#0 [ffff88012b2739e0] machine_kexec at ffffffff810310fb
#1 [ffff88012b273a40] crash_kexec at ffffffff810b6632
#2 [ffff88012b273b10] oops_end at ffffffff814df320
#3 [ffff88012b273b40] no_context at ffffffff81040cbb
#4 [ffff88012b273b90] __bad_area_nosemaphore at ffffffff81040f45
#5 [ffff88012b273be0] bad_area at ffffffff8104106e
#6 [ffff88012b273c10] __do_page_fault at ffffffff81041793
#7 [ffff88012b273d30] do_page_fault at ffffffff814e132e
#8 [ffff88012b273d60] page_fault at ffffffff814de6b5
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff8131b566 RSP: ffff88012b273e18 RFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000f95
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff88012b273e18 R8: ffffffff81b9e5c0 R9: 0000000000000000
R10: 00007fff7b178160 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81a9a1a0 R14: 0000000000000286 R15: 0000000000000007
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
#10 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
#11 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
#12 [ffff88012b273ef0] vfs_write at ffffffff811730c8
#13 [ffff88012b273f30] sys_write at ffffffff81173ad1
#14 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
With crash-5.1.9 plus your patches, nothing is shown below the page fault
exception frame:
PID: 14187 TASK: ffff88012b98e040 CPU: 0 COMMAND: "runtest.sh"
[exception RIP: sysrq_handle_crash+22]
RIP: ffffffff8131b566 RSP: ffff88012b273e18 RFLAGS: 00010096
RAX: 0000000000000010 RBX: 0000000000000063 RCX: 0000000000000f95
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000063
RBP: ffff88012b273e18 R8: ffffffff81b9e5c0 R9: 0000000000000000
R10: 00007fff7b178160 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff81a9a1a0 R14: 0000000000000286 R15: 0000000000000007
CS: 0010 SS: 0018
#0 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
#1 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
#2 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
#3 [ffff88012b273ef0] vfs_write at ffffffff811730c8
#4 [ffff88012b273f30] sys_write at ffffffff81173ad1
#5 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
RIP: 00007fad3a2f45e0 RSP: 00007fff7b1783d8 RFLAGS: 00010206
RAX: 0000000000000001 RBX: ffffffff8100b0b2 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 00007fad3abe6000 RDI: 0000000000000001
RBP: 00007fad3abe6000 R8: 000000000000000a R9: 00007fad3abe2700
R10: 00007fff7b178160 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fad3a5a6780 R14: 0000000000000002 R15: 0000000000000001
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
Again with unpatched crash-5.1.9, here are two non-crashing CPUs
that received shutdown NMIs from the crashing task:
PID: 0 TASK: ffff88012cd2f580 CPU: 1 COMMAND: "swapper"
#0 [ffff880028227e90] crash_nmi_callback at ffffffff81028a96
#1 [ffff880028227ea0] notifier_call_chain at ffffffff814e13e5
#2 [ffff880028227ee0] atomic_notifier_call_chain at ffffffff814e144a
#3 [ffff880028227ef0] notify_die at ffffffff810942fe
#4 [ffff880028227f20] do_nmi at ffffffff814df033
#5 [ffff880028227f50] nmi at ffffffff814de940
[exception RIP: intel_idle+177]
RIP: ffffffff812bc291 RSP: ffff88012cd31e68 RFLAGS: 00000046
RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff88012cd31fd8 RDI: ffffffff81a34040
RBP: ffff88012cd31ed8 R8: 0000000000000000 R9: 00000000000000c8
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000020
R13: 12257c81ed7a34e6 R14: 0000000000000003 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff88012cd31e68] intel_idle at ffffffff812bc291
#7 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
#8 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
PID: 37 TASK: ffff88012ce360c0 CPU: 2 COMMAND: "events/2"
#0 [ffff880028247e90] crash_nmi_callback at ffffffff81028a96
#1 [ffff880028247ea0] notifier_call_chain at ffffffff814e13e5
#2 [ffff880028247ee0] atomic_notifier_call_chain at ffffffff814e144a
#3 [ffff880028247ef0] notify_die at ffffffff810942fe
#4 [ffff880028247f20] do_nmi at ffffffff814df033
#5 [ffff880028247f50] nmi at ffffffff814de940
[exception RIP: io_serial_in+22]
RIP: ffffffff813324f6 RSP: ffff88012ce5fc70 RFLAGS: 00000006
RAX: ffffffffab364400 RBX: ffffffff81f2cca0 RCX: 0000000000000000
RDX: 000000000000d055 RSI: 0000000000000005 RDI: ffffffff81f2cca0
RBP: ffff88012ce5fc70 R8: ffffffff81b9e5c0 R9: 0000000000000000
R10: ffff880127498a60 R11: 0000000000000001 R12: 000000000000270c
R13: 0000000000000020 R14: 0000000000000000 R15: ffffffff81332ba0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff88012ce5fc70] io_serial_in at ffffffff813324f6
#7 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
#8 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
#9 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
#10 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
#11 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
#12 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
#13 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
#14 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
#15 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
#16 [ffff88012ce5fee8] kthread at ffffffff8108dff6
#17 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
But with crash-5.1.9 plus your patches, the transitions to the NMI exception
stack are not shown at all:
PID: 0 TASK: ffff88012cd2f580 CPU: 1 COMMAND: "swapper"
[exception RIP: intel_idle+177]
RIP: ffffffff812bc291 RSP: ffff88012cd31e68 RFLAGS: 00000046
RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff88012cd31fd8 RDI: ffffffff81a34040
RBP: ffff88012cd31ed8 R8: 0000000000000000 R9: 00000000000000c8
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000020
R13: 12257c81ed7a34e6 R14: 0000000000000003 R15: 0000000000000001
CS: 0010 SS: 0018
#0 [ffff88012cd31e70] sched_clock_cpu at ffffffff8109539d
#1 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
#2 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
PID: 37 TASK: ffff88012ce360c0 CPU: 2 COMMAND: "events/2"
[exception RIP: io_serial_in+22]
RIP: ffffffff813324f6 RSP: ffff88012ce5fc70 RFLAGS: 00000006
RAX: ffffffffab364400 RBX: ffffffff81f2cca0 RCX: 0000000000000000
RDX: 000000000000d055 RSI: 0000000000000005 RDI: ffffffff81f2cca0
RBP: ffff88012ce5fc70 R8: ffffffff81b9e5c0 R9: 0000000000000000
R10: ffff880127498a60 R11: 0000000000000001 R12: 000000000000270c
R13: 0000000000000020 R14: 0000000000000000 R15: ffffffff81332ba0
CS: 0010 SS: 0018
#0 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
#1 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
#2 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
#3 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
#4 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
#5 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
#6 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
#7 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
#8 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
#9 [ffff88012ce5fee8] kthread at ffffffff8108dff6
#10 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
If I remove the "use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch"
patch, the backtraces are correct. Now, the changes you made may well make
sense for sadump dumpfiles, where (if I understand correctly) the register set
stored in the header reflects the last location at which each CPU was running.
But those changes are totally unacceptable for compressed kdump dumpfiles.
Dave