Thank you for the guidance, Dave.

I have two questions regarding runq.

1. Could you please explain how the active task on some CPUs can show more time than the system uptime? The swapper tasks on CPUs 1-5 show about 1 day 12:10, but the uptime is only 1 day 11:52.

crash> runq -m
  CPU 0: [0 00:23:29.808]  PID: 529    TASK: ffff88079d0d1e40  COMMAND: "kworker/u141:1"
  CPU 1: [1 12:10:42.840]  PID: 0      TASK: ffff88079df48000  COMMAND: "swapper/1"
  CPU 2: [1 12:10:42.841]  PID: 0      TASK: ffff88079df4bc80  COMMAND: "swapper/2"
  CPU 3: [1 12:10:42.841]  PID: 0      TASK: ffff88079df4dac0  COMMAND: "swapper/3"
  CPU 4: [1 12:10:42.841]  PID: 0      TASK: ffff88079df49e40  COMMAND: "swapper/4"
  CPU 5: [1 12:10:42.841]  PID: 0      TASK: ffff88079df58000  COMMAND: "swapper/5"


crash> sys
      KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
    DUMPFILE: gt-user2-gmt-612746ca.vmss
        CPUS: 70
        DATE: Wed Feb 21 14:53:20 2018
      UPTIME: 1 days, 11:52:25
LOAD AVERAGE: 70.70, 30.98, 12.88
       TASKS: 2312
    NODENAME: gt-user2-gmt.com
     RELEASE: 4.14.19-coreos
     VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
     MACHINE: x86_64  (2094 Mhz)
      MEMORY: 60 GB
       PANIC: ""
crash>
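
To put a number on the discrepancy, here is a minimal sketch in Python that converts both values to seconds. The helper names are my own and are not part of crash, and it assumes the bracketed "runq -m" value reads as "days hours:minutes:seconds.milliseconds". It only measures the gap (a little over 18 minutes for swapper/1); it does not explain where the gap comes from.

```python
import re

def runq_timestamp_seconds(stamp: str) -> float:
    """Convert a 'runq -m' style value 'D HH:MM:SS.mmm' to seconds."""
    days, clock = stamp.split()
    hours, minutes, seconds = clock.split(":")
    return int(days) * 86400 + int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def uptime_seconds(uptime: str) -> int:
    """Convert a 'sys' style UPTIME 'D days, HH:MM:SS' to seconds."""
    match = re.fullmatch(r"(\d+) days?, (\d+):(\d+):(\d+)", uptime)
    if match is None:
        raise ValueError(f"unrecognized uptime: {uptime!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds

# Values copied from the output above (swapper/1 on CPU 1, and UPTIME).
idle = runq_timestamp_seconds("1 12:10:42.840")
up = uptime_seconds("1 days, 11:52:25")
excess = idle - up

print(f"runq -m value: {idle:.3f} s")
print(f"uptime:        {up} s")
print(f"excess:        {excess:.3f} s ({excess / 60:.1f} min)")
```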


2. Is there a way to find out why some CPUs show a time lag in their run queues?

  CPU 32: 0.00 secs
  CPU 65: 0.00 secs
  CPU 54: 0.00 secs
   CPU 0: 0.01 secs
  CPU 16: 84.22 secs
  CPU 66: 268.75 secs
  CPU 58: 268.75 secs
  CPU 57: 268.75 secs
  CPU 43: 268.75 secs
  CPU 20: 268.75 secs
   CPU 7: 268.75 secs
crash>
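
Grouping the CPUs by lag value makes the pattern easier to see. The sketch below is Python with a helper name of my own choosing, and it uses only the lines quoted in this message, not the output for all 70 CPUs. Six of the quoted CPUs share the same 268.75-second lag, which may mean their run queue timestamps stopped advancing at about the same moment, but that is a guess and not something the output proves.

```python
import re
from collections import defaultdict

# Only the lines quoted in this message; the full output covers more CPUs.
sample = """\
  CPU 32: 0.00 secs
  CPU 65: 0.00 secs
  CPU 54: 0.00 secs
   CPU 0: 0.01 secs
  CPU 16: 84.22 secs
  CPU 66: 268.75 secs
  CPU 58: 268.75 secs
  CPU 57: 268.75 secs
  CPU 43: 268.75 secs
  CPU 20: 268.75 secs
   CPU 7: 268.75 secs
"""

def group_by_lag(text: str) -> dict:
    """Map each lag value (seconds) to the sorted list of CPUs reporting it."""
    groups = defaultdict(list)
    for match in re.finditer(r"CPU\s+(\d+):\s+([\d.]+) secs", text):
        groups[float(match.group(2))].append(int(match.group(1)))
    return {lag: sorted(cpus) for lag, cpus in sorted(groups.items())}

for lag, cpus in group_by_lag(sample).items():
    print(f"{lag:8.2f} s: CPUs {cpus}")
```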


I'm struggling to find out why my VM hung (unresponsive to ping/ssh, with a couple of CPUs at 100% utilization).

-Eshak

On Thu, Feb 22, 2018 at 6:27 AM, Dave Anderson <anderson@redhat.com> wrote:
----- Original Message -----
> Hello Dave,
>
> I got a kernel freeze yesterday and was able to successfully open the memory
> image using the crash utility.
>
> crash> sys
>       KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
>     DUMPFILE: gt-Server02-gmt-612746ca.vmss
>         CPUS: 70
>         DATE: Wed Feb 21 14:53:20 2018
>       UPTIME: 1 days, 11:52:25
> LOAD AVERAGE: 70.70, 30.98, 12.88
>        TASKS: 2312
>     NODENAME: gt-Server02-gmt.com
>      RELEASE: 4.14.19-coreos
>      VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
>      MACHINE: x86_64  (2094 Mhz)
>       MEMORY: 60 GB
>        PANIC: ""
> crash>
>
> Could you please suggest a couple of things I should check in the case of
> a kernel freeze before diving in deep to find the root cause?

I'm not sure what you mean by a "kernel freeze", but typically something
would complain about a hard or soft lockup in the system log.  So I would
first run "log" to see if there's anything of interest.  Run "bt -a" to
see whether the active tasks are contending for something,
or work your way through "foreach bt" to see what the tasks of interest are
doing/waiting on.  It would seem that some task has taken control of something,
a lock, or counter, or whatever, and many other tasks have blocked waiting
for its release.  So there's probably a common theme among the blocked tasks
that might give you a clue.

Dave
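
One way to look for the "common theme" among blocked tasks is to save the "foreach bt" output to a file (crash supports redirecting command output with ">") and tally which functions appear in the most backtraces. The Python sketch below is only an illustration under assumptions: the helper name is hypothetical, the regular expression assumes the usual "#N [address] function at address" frame layout, and the sample backtraces are invented placeholders, not taken from this dump.

```python
import re
from collections import Counter

# Assumed frame layout: " #2 [ffffc90000000030] some_function at ffffffff81000003"
FRAME = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")

def count_tasks_per_function(bt_text: str) -> Counter:
    """Count, for each function name, how many tasks have it in their backtrace."""
    counts = Counter()
    seen_in_task = set()
    for line in bt_text.splitlines():
        if line.startswith("PID:"):
            seen_in_task = set()  # a new task's backtrace begins here
            continue
        match = FRAME.match(line)
        if match and match.group(1) not in seen_in_task:
            seen_in_task.add(match.group(1))
            counts[match.group(1)] += 1
    return counts

# Invented placeholder data; real input would be the saved "foreach bt" output.
sample = """\
PID: 101    TASK: ffff880000000001  CPU: 3   COMMAND: "example-a"
 #0 [ffffc90000000010] __schedule at ffffffff81000001
 #1 [ffffc90000000020] schedule at ffffffff81000002
 #2 [ffffc90000000030] hypothetical_lock_wait at ffffffff81000003

PID: 102    TASK: ffff880000000002  CPU: 5   COMMAND: "example-b"
 #0 [ffffc90000000040] __schedule at ffffffff81000001
 #1 [ffffc90000000050] schedule at ffffffff81000002
 #2 [ffffc90000000060] hypothetical_lock_wait at ffffffff81000003
 #3 [ffffc90000000070] hypothetical_caller at ffffffff81000004
"""

for function, tasks in count_tasks_per_function(sample).most_common(5):
    print(f"{tasks:5d} tasks  {function}")
```

Functions such as schedule will dominate any such list, so the frames of interest are usually the ones just above them, where the task decided to wait.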