Re: [Crash-utility] [PATCH 00/11] sadump: Incremental update patches

Monday, 13 November 2023

From: Dave Anderson <anderson(a)redhat.com>
Subject: Re: [Crash-utility] [PATCH 00/11] sadump: Incremental update patches
Date: Thu, 20 Oct 2011 17:06:54 -0400 (EDT)

...

 ----- Original Message -----
> Hello Dave,
> 
> The following series fixes minor bugs, cleans up the sadump module, and
> addresses the issue with kdump's backup of the first 640kB.
> 
> The last patch prepares for makedumpfile's support of sadump-related
> formats (still a work in progress), which will produce a dumpfile in
> kdump-compressed format from sadump-related formats.
> 
> This patch set is based on crash 5.1.9.

 Hello Daisuke,

 As I have stated in our previous sadump-related discussions, you have
 free rein to make whatever changes you like in sadump-specific
 files, or in functions that deal with sadump-specific issues.  However, 
 if your changes modify behavior when used with non-sadump dumpfiles
 then I may have a problem with them.  So when you post a patch-set 
 such as this last set, I would prefer that you post two separate 
 patch-sets.

 This 1/11 patchset is a good example of what I mean.  I have no
 problem with the sadump-specific patches.  But I do have a big
 problem with the last one, which is not necessarily sadump-specific:

   use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch

I see. I'll send them separately in the future.

...
 BTW, these are the names of the patches as they were attached, where
 the second one doesn't have "0002-" prepended to it, and there is
 no "0008-" patch?:

   0001-sadump-bug-close-receives-unintened-value.patch.patch
   cleanup_is_sadump.patch.patch
   0002-sadump-bug-specify-wrong-type.patch.patch
   0003-sadump-bugfix-time-stamp-values-displayed-are-same.patch.patch
   0004-sadump-don-t-exit-if-time-stamps-mismatch.patch.patch
   0005-sadump-debug-messages-at-the-beginning-of-open_disk-.patch.patch
   0006-sadump-Allow-arbitrary-number-of-disk-set-configurat.patch.patch
   0007-sadump-refer-to-eip-and-esp-on-x86-kernels.patch.patch
   0010-Make-data-relevant-to-physical-memory-have-64-bits-l.patch.patch
   0011-Read-kexec-backup-region-if-read-to-the-first-640kB-.patch.patch
   use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch

Sorry for the confusion. I used stgit to organize the patch set and
send it, and I didn't notice that stgit preserves the original file
names when attaching patches.
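
A numbering problem like the one above can be caught before sending. The
following is only a minimal sketch, assuming that every attachment is
expected to carry a four-digit "NNNN-" prefix; the helper name
check_patch_sequence is hypothetical and is not part of stgit or crash.

```python
import re

def check_patch_sequence(names):
    """Return (unnumbered, missing) for a list of attachment names.

    unnumbered: names that lack a leading "NNNN-" prefix.
    missing:    sequence numbers absent between 1 and the highest seen.
    """
    prefix = re.compile(r"^(\d{4})-")
    seen = set()
    unnumbered = []
    for name in names:
        match = prefix.match(name)
        if match:
            seen.add(int(match.group(1)))
        else:
            unnumbered.append(name)
    highest = max(seen, default=0)
    missing = [n for n in range(1, highest + 1) if n not in seen]
    return unnumbered, missing

# Attachment names exactly as listed in the message above.
attachments = [
    "0001-sadump-bug-close-receives-unintened-value.patch.patch",
    "cleanup_is_sadump.patch.patch",
    "0002-sadump-bug-specify-wrong-type.patch.patch",
    "0003-sadump-bugfix-time-stamp-values-displayed-are-same.patch.patch",
    "0004-sadump-don-t-exit-if-time-stamps-mismatch.patch.patch",
    "0005-sadump-debug-messages-at-the-beginning-of-open_disk-.patch.patch",
    "0006-sadump-Allow-arbitrary-number-of-disk-set-configurat.patch.patch",
    "0007-sadump-refer-to-eip-and-esp-on-x86-kernels.patch.patch",
    "0010-Make-data-relevant-to-physical-memory-have-64-bits-l.patch.patch",
    "0011-Read-kexec-backup-region-if-read-to-the-first-640kB-.patch.patch",
    "use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch",
]

unnumbered, missing = check_patch_sequence(attachments)
print("unnumbered:", unnumbered)
print("missing numbers:", missing)
```

Run against the list above, this reports the two unprefixed attachments and
a gap at both 0008 and 0009.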

...
 Anyway, I tested this by running "bt -a" on a large set of sample dumpfiles,
 first without, and then with, your patchset.  When your patches are applied,
 I see numerous examples where the backtraces are missing huge pieces of
 information.

 Here are typical examples:

 Here with un-patched crash-5.1.9, is a RHEL6 crashing process:

  PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
   #0 [ffff88012b2739e0] machine_kexec at ffffffff810310fb
   #1 [ffff88012b273a40] crash_kexec at ffffffff810b6632
   #2 [ffff88012b273b10] oops_end at ffffffff814df320
   #3 [ffff88012b273b40] no_context at ffffffff81040cbb
   #4 [ffff88012b273b90] __bad_area_nosemaphore at ffffffff81040f45
   #5 [ffff88012b273be0] bad_area at ffffffff8104106e
   #6 [ffff88012b273c10] __do_page_fault at ffffffff81041793
   #7 [ffff88012b273d30] do_page_fault at ffffffff814e132e
   #8 [ffff88012b273d60] page_fault at ffffffff814de6b5
      [exception RIP: sysrq_handle_crash+22]
      RIP: ffffffff8131b566  RSP: ffff88012b273e18  RFLAGS: 00010096
      RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000f95
      RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
      RBP: ffff88012b273e18   R8: ffffffff81b9e5c0   R9: 0000000000000000
      R10: 00007fff7b178160  R11: 0000000000000000  R12: 0000000000000000
      R13: ffffffff81a9a1a0  R14: 0000000000000286  R15: 0000000000000007
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #9 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
  #10 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
  #11 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
  #12 [ffff88012b273ef0] vfs_write at ffffffff811730c8
  #13 [ffff88012b273f30] sys_write at ffffffff81173ad1
  #14 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2

 With crash-5.1.9 plus your patch -- nothing is shown below the page fault
 exception frame:

  PID: 14187  TASK: ffff88012b98e040  CPU: 0   COMMAND: "runtest.sh"
      [exception RIP: sysrq_handle_crash+22]
      RIP: ffffffff8131b566  RSP: ffff88012b273e18  RFLAGS: 00010096
      RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000000f95
      RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
      RBP: ffff88012b273e18   R8: ffffffff81b9e5c0   R9: 0000000000000000
      R10: 00007fff7b178160  R11: 0000000000000000  R12: 0000000000000000
      R13: ffffffff81a9a1a0  R14: 0000000000000286  R15: 0000000000000007
      CS: 0010  SS: 0018
   #0 [ffff88012b273e20] __handle_sysrq at ffffffff8131b822
   #1 [ffff88012b273e70] write_sysrq_trigger at ffffffff8131b8de
   #2 [ffff88012b273ea0] proc_reg_write at ffffffff811d5bce
   #3 [ffff88012b273ef0] vfs_write at ffffffff811730c8
   #4 [ffff88012b273f30] sys_write at ffffffff81173ad1
   #5 [ffff88012b273f80] system_call_fastpath at ffffffff8100b0b2
      RIP: 00007fad3a2f45e0  RSP: 00007fff7b1783d8  RFLAGS: 00010206
      RAX: 0000000000000001  RBX: ffffffff8100b0b2  RCX: 0000000000000000
      RDX: 0000000000000002  RSI: 00007fad3abe6000  RDI: 0000000000000001
      RBP: 00007fad3abe6000   R8: 000000000000000a   R9: 00007fad3abe2700
      R10: 00007fff7b178160  R11: 0000000000000246  R12: 0000000000000002
      R13: 00007fad3a5a6780  R14: 0000000000000002  R15: 0000000000000001
      ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

 Again with un-patched crash-5.1.9, here are examples of two non-crashing cpus
 that received shutdown NMI interrupts from the crashing task:

  PID: 0      TASK: ffff88012cd2f580  CPU: 1   COMMAND: "swapper"
   #0 [ffff880028227e90] crash_nmi_callback at ffffffff81028a96
   #1 [ffff880028227ea0] notifier_call_chain at ffffffff814e13e5
   #2 [ffff880028227ee0] atomic_notifier_call_chain at ffffffff814e144a
   #3 [ffff880028227ef0] notify_die at ffffffff810942fe
   #4 [ffff880028227f20] do_nmi at ffffffff814df033
   #5 [ffff880028227f50] nmi at ffffffff814de940
      [exception RIP: intel_idle+177]
      RIP: ffffffff812bc291  RSP: ffff88012cd31e68  RFLAGS: 00000046
      RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
      RDX: 0000000000000000  RSI: ffff88012cd31fd8  RDI: ffffffff81a34040
      RBP: ffff88012cd31ed8   R8: 0000000000000000   R9: 00000000000000c8
      R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000020
      R13: 12257c81ed7a34e6  R14: 0000000000000003  R15: 0000000000000001
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
   #6 [ffff88012cd31e68] intel_idle at ffffffff812bc291
   #7 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
   #8 [ffff88012cd31f00] cpu_idle at ffffffff81009de6

  PID: 37     TASK: ffff88012ce360c0  CPU: 2   COMMAND: "events/2"
   #0 [ffff880028247e90] crash_nmi_callback at ffffffff81028a96
   #1 [ffff880028247ea0] notifier_call_chain at ffffffff814e13e5
   #2 [ffff880028247ee0] atomic_notifier_call_chain at ffffffff814e144a
   #3 [ffff880028247ef0] notify_die at ffffffff810942fe
   #4 [ffff880028247f20] do_nmi at ffffffff814df033
   #5 [ffff880028247f50] nmi at ffffffff814de940
      [exception RIP: io_serial_in+22]
      RIP: ffffffff813324f6  RSP: ffff88012ce5fc70  RFLAGS: 00000006
      RAX: ffffffffab364400  RBX: ffffffff81f2cca0  RCX: 0000000000000000
      RDX: 000000000000d055  RSI: 0000000000000005  RDI: ffffffff81f2cca0
      RBP: ffff88012ce5fc70   R8: ffffffff81b9e5c0   R9: 0000000000000000
      R10: ffff880127498a60  R11: 0000000000000001  R12: 000000000000270c
      R13: 0000000000000020  R14: 0000000000000000  R15: ffffffff81332ba0
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
   #6 [ffff88012ce5fc70] io_serial_in at ffffffff813324f6
   #7 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
   #8 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
   #9 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
  #10 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
  #11 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
  #12 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
  #13 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
  #14 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
  #15 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
  #16 [ffff88012ce5fee8] kthread at ffffffff8108dff6
  #17 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a

 But when running crash-5.1.9 plus your patch -- the transitions to the NMI exception
 stack are not even shown at all:

  PID: 0      TASK: ffff88012cd2f580  CPU: 1   COMMAND: "swapper"
      [exception RIP: intel_idle+177]
      RIP: ffffffff812bc291  RSP: ffff88012cd31e68  RFLAGS: 00000046
      RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
      RDX: 0000000000000000  RSI: ffff88012cd31fd8  RDI: ffffffff81a34040
      RBP: ffff88012cd31ed8   R8: 0000000000000000   R9: 00000000000000c8
      R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000020
      R13: 12257c81ed7a34e6  R14: 0000000000000003  R15: 0000000000000001
      CS: 0010  SS: 0018
   #0 [ffff88012cd31e70] sched_clock_cpu at ffffffff8109539d
   #1 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
   #2 [ffff88012cd31f00] cpu_idle at ffffffff81009de6

  PID: 37     TASK: ffff88012ce360c0  CPU: 2   COMMAND: "events/2"
      [exception RIP: io_serial_in+22]
      RIP: ffffffff813324f6  RSP: ffff88012ce5fc70  RFLAGS: 00000006
      RAX: ffffffffab364400  RBX: ffffffff81f2cca0  RCX: 0000000000000000
      RDX: 000000000000d055  RSI: 0000000000000005  RDI: ffffffff81f2cca0
      RBP: ffff88012ce5fc70   R8: ffffffff81b9e5c0   R9: 0000000000000000
      R10: ffff880127498a60  R11: 0000000000000001  R12: 000000000000270c
      R13: 0000000000000020  R14: 0000000000000000  R15: ffffffff81332ba0
      CS: 0010  SS: 0018
   #0 [ffff88012ce5fc78] wait_for_xmitr at ffffffff81332b03
   #1 [ffff88012ce5fca8] serial8250_console_putchar at ffffffff81332bc6
   #2 [ffff88012ce5fcc8] uart_console_write at ffffffff8132e55e
   #3 [ffff88012ce5fd08] serial8250_console_write at ffffffff81332f2d
   #4 [ffff88012ce5fd58] __call_console_drivers at ffffffff81067495
   #5 [ffff88012ce5fd88] _call_console_drivers at ffffffff810674fa
   #6 [ffff88012ce5fda8] release_console_sem at ffffffff81067ac8
   #7 [ffff88012ce5fde8] fb_flashcursor at ffffffff812abb4a
   #8 [ffff88012ce5fe38] worker_thread at ffffffff81088a40
   #9 [ffff88012ce5fee8] kthread at ffffffff8108dff6
  #10 [ffff88012ce5ff48] kernel_thread at ffffffff8100c10a
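
The before/after comparison shown in these examples could be automated
roughly as follows. This is a sketch only, assuming the "bt -a" output of
each run has already been captured as text; the helper names frame_symbols
and lost_frames are hypothetical, and the sample text is abbreviated from
the CPU 1 traces above (frame lines only).

```python
import re

# Matches frame lines such as:
#    #0 [ffff880028227e90] crash_nmi_callback at ffffffff81028a96
FRAME = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")

def frame_symbols(bt_text):
    """Extract the symbol name from each '#N [stack] symbol at address' line."""
    symbols = []
    for line in bt_text.splitlines():
        match = FRAME.match(line)
        if match:
            symbols.append(match.group(1))
    return symbols

def lost_frames(before, after):
    """Symbols present in the un-patched trace but absent from the patched one."""
    kept = set(frame_symbols(after))
    return [sym for sym in frame_symbols(before) if sym not in kept]

UNPATCHED = """\
 #0 [ffff880028227e90] crash_nmi_callback at ffffffff81028a96
 #1 [ffff880028227ea0] notifier_call_chain at ffffffff814e13e5
 #2 [ffff880028227ee0] atomic_notifier_call_chain at ffffffff814e144a
 #3 [ffff880028227ef0] notify_die at ffffffff810942fe
 #4 [ffff880028227f20] do_nmi at ffffffff814df033
 #5 [ffff880028227f50] nmi at ffffffff814de940
 #6 [ffff88012cd31e68] intel_idle at ffffffff812bc291
 #7 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
 #8 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
"""

PATCHED = """\
 #0 [ffff88012cd31e70] sched_clock_cpu at ffffffff8109539d
 #1 [ffff88012cd31ee0] cpuidle_idle_call at ffffffff813ed4b7
 #2 [ffff88012cd31f00] cpu_idle at ffffffff81009de6
"""

for symbol in lost_frames(UNPATCHED, PATCHED):
    print("missing after patch:", symbol)
```

For the CPU 1 sample this flags the whole NMI path plus intel_idle, which
matches what is visible by eye in the two traces.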

 If I remove the "use_regs_in_elf_notes_on_kdump_fmt_from_sadump.patch.patch"
 patch the backtraces are correct.  Now, it may be true that the changes you
 made make sense with respect to sadump dumpfiles, where the register set
 stored in the header is a reflection of the last location that each cpu
 ran (?).

 But those changes are totally unacceptable for compressed kdump dumpfiles.

I understand the situation.

I attach V2 patch. I confirmed this doesn't break the logic explained
above. Could you review this?

Thanks.
HATAYAMA, Daisuke
