----- Original Message -----
Thank you for the guidance Dave.
I have two questions regarding runq.
1. Could you please let me know how the active task on some CPUs can show
more run time than the system uptime?
I'm not sure, other than that the uptime is calculated based upon jiffies,
whereas the runq -m option uses each run queue's per-cpu timestamp.
crash> runq -m
CPU 0: [0 00:23:29.808] PID: 529 TASK: ffff88079d0d1e40 COMMAND: "kworker/u141:1"
CPU 1: [1 12:10:42.840] PID: 0 TASK: ffff88079df48000 COMMAND: "swapper/1"
CPU 2: [1 12:10:42.841] PID: 0 TASK: ffff88079df4bc80 COMMAND: "swapper/2"
CPU 3: [1 12:10:42.841] PID: 0 TASK: ffff88079df4dac0 COMMAND: "swapper/3"
CPU 4: [1 12:10:42.841] PID: 0 TASK: ffff88079df49e40 COMMAND: "swapper/4"
CPU 5: [1 12:10:42.841] PID: 0 TASK: ffff88079df58000 COMMAND: "swapper/5"
crash> sys
KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
DUMPFILE: gt-user2-gmt-612746ca.vmss
CPUS: 70
DATE: Wed Feb 21 14:53:20 2018
UPTIME: 1 days, 11:52:25
LOAD AVERAGE: 70.70, 30.98, 12.88
TASKS: 2312
NODENAME: gt-user2-gmt.com
RELEASE: 4.14.19-coreos
VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
MACHINE: x86_64 (2094 Mhz)
MEMORY: 60 GB
PANIC: ""
crash>
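To put a number on the discrepancy, here is a minimal Python sketch that converts the two values shown above into seconds and subtracts them. The helper names are hypothetical, and the parsing assumes the timestamp and uptime formats are exactly as printed in this session; it says nothing about why the two clocks differ.

```python
import re

def runq_timestamp_seconds(stamp: str) -> float:
    """Convert a 'D HH:MM:SS.mmm' stamp, as shown in brackets by runq -m, to seconds."""
    days, clock = stamp.split()
    hours, minutes, seconds = clock.split(":")
    return int(days) * 86400 + int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def uptime_seconds(uptime: str) -> int:
    """Convert a 'D days, HH:MM:SS' string, as shown by sys, to seconds."""
    match = re.fullmatch(r"(\d+) days?, (\d+):(\d+):(\d+)", uptime.strip())
    if match is None:
        raise ValueError(f"unrecognized uptime format: {uptime!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return days * 86400 + hours * 3600 + minutes * 60 + seconds

# Values copied from the runq -m and sys output above.
uptime = uptime_seconds("1 days, 11:52:25")
cpu2 = runq_timestamp_seconds("1 12:10:42.841")
excess = cpu2 - uptime

print(f"uptime: {uptime} s, CPU 2 runq time: {cpu2:.3f} s, excess: {excess:.3f} s")
```

For the values in this dump the excess works out to roughly 1098 seconds (a little over 18 minutes) on the idle CPUs.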
2. Is there a way to find out why some CPUs have a time lag in their run queue?
I don't know, but I would certainly look at the task backtraces of the cpus
that have the large lag values.
CPU 32: 0.00 secs
CPU 65: 0.00 secs
CPU 54: 0.00 secs
CPU 0: 0.01 secs
CPU 16: 84.22 secs
CPU 66: 268.75 secs
CPU 58: 268.75 secs
CPU 57: 268.75 secs
CPU 43: 268.75 secs
CPU 20: 268.75 secs
CPU 7: 268.75 secs
crash>
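One way to narrow down which CPUs deserve a backtrace is to filter the lag listing by a cutoff. The following is only a sketch: the function name and the 1.0 second threshold are assumptions chosen for illustration, and the input is the lag output pasted above.

```python
import re

# Matches lines such as "CPU 66: 268.75 secs".
LAG_LINE = re.compile(r"CPU\s+(\d+):\s+([\d.]+)\s+secs")

def lagging_cpus(text: str, threshold: float) -> list[tuple[int, float]]:
    """Return (cpu, lag_seconds) pairs whose lag is at or above threshold, largest lag first."""
    result = []
    for line in text.splitlines():
        match = LAG_LINE.search(line)
        if match and float(match.group(2)) >= threshold:
            result.append((int(match.group(1)), float(match.group(2))))
    return sorted(result, key=lambda item: item[1], reverse=True)

# Lag values copied from the listing above.
lag_output = """\
CPU 32: 0.00 secs
CPU 65: 0.00 secs
CPU 54: 0.00 secs
CPU 0: 0.01 secs
CPU 16: 84.22 secs
CPU 66: 268.75 secs
CPU 58: 268.75 secs
CPU 57: 268.75 secs
CPU 43: 268.75 secs
CPU 20: 268.75 secs
CPU 7: 268.75 secs
"""

# The 1.0 second cutoff is an arbitrary assumption, not a crash default.
suspects = lagging_cpus(lag_output, threshold=1.0)
for cpu, lag in suspects:
    print(f"CPU {cpu}: {lag:.2f} s behind")
```

The resulting CPU numbers are the ones whose active-task backtraces are worth reading first.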
I'm struggling to find out why my VM hung (unresponsive to ping/ssh, with a
couple of CPUs at 100% utilization).
-Eshak
On Thu, Feb 22, 2018 at 6:27 AM, Dave Anderson < anderson(a)redhat.com > wrote:
----- Original Message -----
> Hello Dave,
>
> I got a kernel freeze yesterday and am able to successfully open the memory
> image using crash utility.
>
> crash> sys
> KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
> DUMPFILE: gt-Server02-gmt-612746ca.vmss
> CPUS: 70
> DATE: Wed Feb 21 14:53:20 2018
> UPTIME: 1 days, 11:52:25
> LOAD AVERAGE: 70.70, 30.98, 12.88
> TASKS: 2312
> NODENAME: gt-Server02-gmt.com
> RELEASE: 4.14.19-coreos
> VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
> MACHINE: x86_64 (2094 Mhz)
> MEMORY: 60 GB
> PANIC: ""
> crash>
>
> Could you please guide me on a couple of things I should check in case of
> a kernel freeze before diving in deep to find the root cause?
I'm not sure what you mean by a "kernel freeze", but typically something
would complain about a hard or soft lockup in the system log. So I would
first run "log" to see if there's anything of interest. Run "bt -a" on
the active tasks to see if the active tasks are contending for something,
or work your way through "foreach bt" to see what the tasks of interest are
doing/waiting on. It would seem that some task has taken control of something,
a lock, or counter, or whatever, and many other tasks have blocked waiting
for its release. So there's probably a common theme among the blocked tasks
that might give you a clue.
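As a rough way to look for that common theme, backtrace output saved to a text file could be tallied by function name. The sketch below is an assumption-laden illustration, not a crash feature: the function name is hypothetical, the regular expressions assume frames of the form "#N [address] function at address" under a "PID:" header line, and the sample text is entirely made up (task names, addresses, and the example_* functions do not come from this dump).

```python
import re
from collections import Counter

# Assumed frame format: " #2 [ffff880000001020] some_function at ffffffff81000003"
FRAME = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")
# Assumed task header format: "PID: 101  TASK: ...  CPU: 3  COMMAND: ..."
TASK = re.compile(r"^PID:\s+\d+")

def common_frames(bt_text: str) -> Counter:
    """Count, per function name, how many task backtraces contain that function."""
    counts = Counter()
    seen = set()
    for line in bt_text.splitlines():
        if TASK.match(line):
            seen = set()  # new task: count each function at most once per task
            continue
        match = FRAME.match(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            counts[match.group(1)] += 1
    return counts

# Hypothetical sample, invented purely to exercise the parser.
sample = """\
PID: 101  TASK: ffff880000000001  CPU: 3  COMMAND: "example-a"
 #0 [ffff880000001000] __schedule at ffffffff81000001
 #1 [ffff880000001010] schedule at ffffffff81000002
 #2 [ffff880000001020] example_lock_slowpath at ffffffff81000003
PID: 102  TASK: ffff880000000002  CPU: 5  COMMAND: "example-b"
 #0 [ffff880000002000] __schedule at ffffffff81000001
 #1 [ffff880000002010] schedule at ffffffff81000002
 #2 [ffff880000002020] example_lock_slowpath at ffffffff81000003
PID: 103  TASK: ffff880000000003  CPU: 7  COMMAND: "example-c"
 #0 [ffff880000003000] __schedule at ffffffff81000001
 #1 [ffff880000003010] schedule at ffffffff81000002
 #2 [ffff880000003020] example_other_wait at ffffffff81000004
"""

counts = common_frames(sample)
for name, tasks in counts.most_common():
    print(f"{tasks:3d} tasks: {name}")
```

Functions that appear in many blocked tasks just above schedule are the likely candidates for the contended resource.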
Dave
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility