Thank you for the guidance, Dave.
I have two questions regarding runq.
1. Could you please explain how the active task on some CPUs can have
spent more time than the system uptime?
crash> runq -m
  CPU 0: [0 00:23:29.808]  PID: 529    TASK: ffff88079d0d1e40  COMMAND: "kworker/u141:1"
  CPU 1: [1 12:10:42.840]  PID: 0      TASK: ffff88079df48000  COMMAND: "swapper/1"
  CPU 2: [1 12:10:42.841]  PID: 0      TASK: ffff88079df4bc80  COMMAND: "swapper/2"
  CPU 3: [1 12:10:42.841]  PID: 0      TASK: ffff88079df4dac0  COMMAND: "swapper/3"
  CPU 4: [1 12:10:42.841]  PID: 0      TASK: ffff88079df49e40  COMMAND: "swapper/4"
  CPU 5: [1 12:10:42.841]  PID: 0      TASK: ffff88079df58000  COMMAND: "swapper/5"
crash> sys
      KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
    DUMPFILE: gt-user2-gmt-612746ca.vmss
        CPUS: 70
        DATE: Wed Feb 21 14:53:20 2018
      UPTIME: 1 days, 11:52:25
LOAD AVERAGE: 70.70, 30.98, 12.88
       TASKS: 2312
    NODENAME: gt-user2-gmt.com
     RELEASE: 4.14.19-coreos
     VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
     MACHINE: x86_64  (2094 Mhz)
      MEMORY: 60 GB
       PANIC: ""
crash>
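
To put a number on the discrepancy in question 1, here is a minimal Python sketch that converts the swapper/1 timestamp from the runq -m output and the UPTIME string from the sys output above into seconds and subtracts them. The helper names are hypothetical, and the timestamp layouts ("D HH:MM:SS.mmm" and "N days, HH:MM:SS") are assumptions inferred only from the output pasted above.

```python
import re

def runq_timestamp_to_seconds(stamp: str) -> float:
    """Convert a 'D HH:MM:SS.mmm' timestamp (layout assumed from the runq -m output) to seconds."""
    days, clock = stamp.split()
    hours, minutes, seconds = clock.split(":")
    return int(days) * 86400 + int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def uptime_to_seconds(uptime: str) -> int:
    """Convert an 'N days, HH:MM:SS' UPTIME string (layout assumed from the sys output) to seconds."""
    match = re.fullmatch(r"(\d+) days?, (\d+):(\d+):(\d+)", uptime.strip())
    if match is None:
        raise ValueError(f"unrecognized uptime format: {uptime!r}")
    days, hours, minutes, seconds = (int(g) for g in match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds

# Values copied from the output above.
task_time = runq_timestamp_to_seconds("1 12:10:42.840")  # CPU 1, swapper/1
uptime = uptime_to_seconds("1 days, 11:52:25")
excess = task_time - uptime

print(f"swapper/1 timestamp exceeds uptime by {excess:.2f} s ({excess / 60:.1f} min)")
```

With the values above, the per-CPU timestamp is roughly 18 minutes larger than the reported uptime.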
2. Is there a way to find out why some CPUs show a time lag in their run queues?
  CPU 32: 0.00 secs
  CPU 65: 0.00 secs
  CPU 54: 0.00 secs
   CPU 0: 0.01 secs
  CPU 16: 84.22 secs
  CPU 66: 268.75 secs
  CPU 58: 268.75 secs
  CPU 57: 268.75 secs
  CPU 43: 268.75 secs
  CPU 20: 268.75 secs
   CPU 7: 268.75 secs
crash>
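
To make the pattern in the lag listing easier to see, the following Python sketch groups the CPUs by their reported lag. It is only an illustration: the function name is hypothetical, the "CPU N: X secs" line layout is assumed from the excerpt above, and the input is just the lines shown here, not the full listing for all 70 CPUs.

```python
import re
from collections import defaultdict

# Lines copied from the per-CPU lag listing above (an excerpt, not all CPUs).
LAG_OUTPUT = """\
  CPU 32: 0.00 secs
  CPU 65: 0.00 secs
  CPU 54: 0.00 secs
   CPU 0: 0.01 secs
  CPU 16: 84.22 secs
  CPU 66: 268.75 secs
  CPU 58: 268.75 secs
  CPU 57: 268.75 secs
  CPU 43: 268.75 secs
  CPU 20: 268.75 secs
   CPU 7: 268.75 secs
"""

def group_cpus_by_lag(text):
    """Map each lag value (seconds) to the list of CPUs reporting it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = re.match(r"\s*CPU\s+(\d+):\s+([\d.]+)\s+secs", line)
        if match:
            groups[float(match.group(2))].append(int(match.group(1)))
    return dict(groups)

groups = group_cpus_by_lag(LAG_OUTPUT)
for lag in sorted(groups):
    print(f"{lag:8.2f} s: {len(groups[lag])} CPU(s) {sorted(groups[lag])}")
```

In this excerpt, six CPUs report an identical lag of 268.75 secs, which suggests their timestamps stopped advancing at the same moment.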
I'm struggling to find out why my VM hung (it was unresponsive to ping/ssh,
and a couple of CPUs were at 100% utilization).
-Eshak
On Thu, Feb 22, 2018 at 6:27 AM, Dave Anderson <anderson(a)redhat.com> wrote:
 ----- Original Message -----
 > Hello Dave,
 >
 > I got a kernel freeze yesterday and am able to successfully open the memory
 > image using crash utility.
 >
 > crash> sys
 >       KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
 >     DUMPFILE: gt-Server02-gmt-612746ca.vmss
 >         CPUS: 70
 >         DATE: Wed Feb 21 14:53:20 2018
 >       UPTIME: 1 days, 11:52:25
 > LOAD AVERAGE: 70.70, 30.98, 12.88
 >        TASKS: 2312
 >     NODENAME: gt-Server02-gmt.com
 >      RELEASE: 4.14.19-coreos
 >      VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
 >      MACHINE: x86_64  (2094 Mhz)
 >       MEMORY: 60 GB
 >        PANIC: ""
 > crash>
 >
 > Could you please guide me about couple of things I should check in case of
 > a kernel freeze before diving in deep to find the root cause ?
 I'm not sure what you mean by a "kernel freeze", but typically something
 would complain about a hard or soft lockup in the system log.  So I would
 first run "log" to see if there's anything of interest.  Run "bt -a" on
 the active tasks to see if the active tasks are contesting for something,
 or work your way through "foreach bt" to see what the tasks of interest are
 doing/waiting on.  It would seem that some task has taken control of
 something, a lock, or counter, or whatever, and many other tasks have
 blocked waiting for its release.  So there's probably a common theme among
 the blocked tasks that might give you a clue.
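
 One way to look for that common theme is to save the "foreach bt" output to a file and tally the first non-scheduler function in each task's backtrace. The Python sketch below is only an illustration under stated assumptions: the function name is hypothetical, the header and frame layouts in the regular expressions are assumed rather than taken from this dump, and the set of scheduler frames to skip is a guess that may need adjusting.

```python
import re
from collections import Counter

# Assumed task header layout: 'PID: 529  TASK: ffff88079d0d1e40  CPU: 0  COMMAND: "name"'
HEADER_RE = re.compile(r"^PID:\s+\d+\s+TASK:")
# Assumed frame layout: ' #2 [ffffc90000a0bd10] some_function at ffffffff81234567'
FRAME_RE = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")
# Frames that only say "this task is sleeping"; skipped to reach their caller.
# This set is an assumption and is probably incomplete.
SCHEDULER_FRAMES = {"__schedule", "schedule", "schedule_timeout",
                    "schedule_preempt_disabled"}

def first_interesting_frames(bt_text):
    """Count, per task, the first backtrace frame that is not a scheduler frame."""
    counts = Counter()
    counted = True  # stays True until the first task header is seen
    for line in bt_text.splitlines():
        if HEADER_RE.match(line):
            counted = False  # new task: look for its first interesting frame
            continue
        match = FRAME_RE.match(line)
        if match and not counted and match.group(1) not in SCHEDULER_FRAMES:
            counts[match.group(1)] += 1
            counted = True
    return counts

# Usage (the file name is hypothetical):
#   with open("foreach_bt.txt") as f:
#       for func, n in first_interesting_frames(f.read()).most_common(10):
#           print(f"{n:5d}  {func}")
```

 If many tasks share the same top entry, that function (and the lock or resource it waits on) is a reasonable place to start looking.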
 Dave
 --
 Crash-utility mailing list
 Crash-utility(a)redhat.com
 
https://www.redhat.com/mailman/listinfo/crash-utility