Hello Dave,
I hit a kernel freeze yesterday and was able to open the memory
image successfully with the crash utility:
crash> sys
KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
DUMPFILE: gt-Server02-gmt-612746ca.vmss
CPUS: 70
DATE: Wed Feb 21 14:53:20 2018
UPTIME: 1 days, 11:52:25
LOAD AVERAGE: 70.70, 30.98, 12.88
TASKS: 2312
NODENAME:
RELEASE: 4.14.19-coreos
VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
MACHINE: x86_64 (2094 Mhz)
MEMORY: 60 GB
PANIC: ""
crash>
Could you please suggest a couple of things I should check first after a
kernel freeze, before I dig deeper for the root cause?
Thank you,
Eshak
On Wed, Feb 7, 2018 at 7:12 PM, Eshak <tmdeshak(a)gmail.com> wrote:
Thank you for the quick info, Dave.
I'll deploy the main node with the 'nokaslr' boot option and wait for the
next VM freeze.
-Eshak
On Wed, Feb 7, 2018 at 6:45 PM, anderson <anderson(a)prospeed.net> wrote:
>
> -------- Original message --------
> From: Eshak <tmdeshak(a)gmail.com>
> Date: 2/7/18 9:34 PM (GMT-05:00)
> To: "Discussion list for crash utility usage, maintenance and
> development" <crash-utility(a)redhat.com>
> Subject: Re: [Crash-utility] linux_banner has garbage
>
> Hi Dave,
>
> On a test system I booted the kernel with the 'nokaslr' option and then
> checked phys_base and the KASLR offset:
>
> crash> help -m | grep phys_base
> phys_base: 0
> text hit rate: 66% (5171 of 7801)
> crash> help -k | grep relocate
> relocate: 0 (KASLR offset: 0 / 0MB)
> text hit rate: 66% (5171 of 7801)
> crash>
>
> I'm not sure if phys_base can be 0.
>
> Question: With the main machine booted with the 'nokaslr' option, are these
> values fine for reading memory images by specifying --machdep phys_base=0?
>
> Yes, but since phys_base defaults to 0,
> the --machdep argument wouldn't be necessary.
>
> Dave
>
>
>
> Thank you,
> Eshak
>
> On Wed, Feb 7, 2018 at 10:49 AM, Dave Anderson <anderson(a)redhat.com>
> wrote:
>
>>
>>
>> ----- Original Message -----
>> > Hi Dave,
>> >
>> > Thanks for the info.
>> > I've installed 7.2.0-1.fc28 and was able to run crash on the live system.
>> >
>> > Unfortunately, KASLR is enabled.
>>
>> Yes, I'm afraid that is unfortunate. I don't know how you can determine
>> what the KASLR offset is, and without that, the dumpfile is pretty
>> much useless.
>>
>> The best thing you can do is to prepare for the *next* crash by stashing
>> the phys_base and KASLR offset values. You can also boot the kernel with
>> "nokaslr" on the boot command line.
>>
>> Dave
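A minimal sketch of the "stash the values" suggestion above, assuming the output of "help -m" and "help -k" has been saved as text while the system is still up. The function names, the placeholder file names "vmlinux" and "dump.vmss", and the sample strings are illustrative only; the sample values are the ones quoted elsewhere in this thread. The resulting command line relies on crash's --machdep phys_base=<addr> and --kaslr <offset> options, and the stashed values are only meaningful for a dump taken during the same boot, since KASLR picks a new offset on every boot.

```python
import re

def parse_offsets(help_m_text, help_k_text):
    """Extract phys_base and the KASLR offset (both printed in hex) from saved crash output."""
    phys = re.search(r"phys_base:\s*([0-9a-fA-F]+)", help_m_text)
    kaslr = re.search(r"KASLR offset:\s*([0-9a-fA-F]+)", help_k_text)
    if phys is None or kaslr is None:
        raise ValueError("phys_base or KASLR offset not found in the supplied text")
    return int(phys.group(1), 16), int(kaslr.group(1), 16)

def crash_args(vmlinux, dumpfile, phys_base, kaslr_offset):
    """Build an argument list that passes the stashed values back to crash."""
    return [
        "crash",
        "--machdep", f"phys_base={phys_base:#x}",
        "--kaslr", f"{kaslr_offset:#x}",
        vmlinux,
        dumpfile,
    ]

# Sample lines as they appear in the live-system output quoted in this thread.
HELP_M = "phys_base: 10d000000"
HELP_K = "relocate: ffffffffe1000000 (KASLR offset: 1f000000 / 496MB)"

phys_base, kaslr_offset = parse_offsets(HELP_M, HELP_K)
# "vmlinux" and "dump.vmss" are placeholder file names.
print(" ".join(crash_args("vmlinux", "dump.vmss", phys_base, kaslr_offset)))
```

Keeping the two numbers next to the boot they belong to (for example, alongside the boot timestamp) is what makes them usable later; values from a different boot would point crash at the wrong addresses.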
>>
>>
>>
>>
>> >
>> >
>> > text hit rate: 66% (5171 of 7801)
>> > help -m | grep phys_base
>> > phys_base: 10d000000
>> > text hit rate: 66% (5171 of 7801)
>> > help -k | grep relocate
>> > relocate: ffffffffe1000000 (KASLR offset: 1f000000 / 496MB)
>> > text hit rate: 66% (5171 of 7801)
>> >
>> > Is there any other info I can get from the vmem/vmss file, such as the
>> > processes running at the time or tasks blocked on I/O?
>> >
>> > Thank you,
>> > Eshak
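The "relocate" line quoted above shows two numbers that look unrelated at first glance. Reading the output, the relocate value appears to be the 64-bit two's-complement negation of the KASLR offset; that reading is an assumption based on the numbers shown, and the helper name below is illustrative. The arithmetic can be checked like this:

```python
MASK64 = (1 << 64) - 1  # keep results within an unsigned 64-bit range

def kaslr_offset_from_relocate(relocate):
    """Negate a 64-bit two's-complement value, as the quoted output suggests."""
    return (-relocate) & MASK64

relocate = 0xffffffffe1000000  # value from the "relocate:" line in this thread
offset = kaslr_offset_from_relocate(relocate)

# Shifting right by 20 bits converts bytes to MB.
print(hex(offset), offset >> 20, "MB")  # prints: 0x1f000000 496 MB
```

The result agrees with the "[496MB]" figure in the "kernel relocated" warning that crash printed when it was run against /proc/kcore.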
>> >
>> > On Wed, Feb 7, 2018 at 6:28 AM, Dave Anderson < anderson(a)redhat.com > wrote:
>> >
>> >
>> >
>> >
>> > ----- Original Message -----
>> > > That's fixed upstream. You'll have to download the crash sources from
>> > > github and build the latest and greatest.
>> >
>> > It's possible that you might be able to run the Fedora 28 rawhide version
>> > here:
>> >
>> > Information for build crash-7.2.0-1.fc28
>> > https://koji.fedoraproject.org/koji/buildinfo?buildID=978501
>> >
>> > That version has the fix for the init_level4_pgt issue. I'm not sure
>> > whether you may run into anything else.
>> >
>> > Dave
>> >
>> >
>> > >
>> > > -------- Original message --------
>> > > From: Eshak < tmdeshak(a)gmail.com >
>> > > Date: 2/6/18 9:27 PM (GMT-05:00)
>> > > To: "Discussion list for crash utility usage, maintenance and development"
>> > > < crash-utility(a)redhat.com >
>> > > Subject: Re: [Crash-utility] linux_banner has garbage
>> > >
>> > > Hi Dave,
>> > >
>> > > I have /proc/kcore, but I'm getting a 'cannot resolve "init_level4_pgt"'
>> > > error.
>> > >
>> > >
>> > >
>> > > [root@gt-Server2-gmt proc]# crash /home/mfusion/vmem_vmss_jan26/usr/lib/debug/usr/lib/modules/4.14.11-coreos/vmlinux /proc/kcore
>> > >
>> > > crash 7.1.9-3.fc27
>> > > Copyright (C) 2002-2016 Red Hat, Inc.
>> > > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>> > > Copyright (C) 1999-2006 Hewlett-Packard Co
>> > > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>> > > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>> > > Copyright (C) 2005, 2011 NEC Corporation
>> > > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>> > > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
>> > > This program is free software, covered by the GNU General Public License,
>> > > and you are welcome to change it and/or distribute copies of it under
>> > > certain conditions. Enter "help copying" to see the conditions.
>> > > This program has absolutely no warranty. Enter "help warranty" for details.
>> > >
>> > > crash: /dev/tty: No such device or address
>> > > NOTE: stdin: not a tty
>> > >
>> > > GNU gdb (GDB) 7.6
>> > > Copyright (C) 2013 Free Software Foundation, Inc.
>> > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> > > This is free software: you are free to change and redistribute it.
>> > > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>> > > and "show warranty" for details.
>> > > This GDB was configured as "x86_64-unknown-linux-gnu"...
>> > >
>> > > WARNING: kernel relocated [496MB]: patching 69420 gdb minimal_symbol values
>> > >
>> > > crash: cannot resolve "init_level4_pgt"
>> > >
>> > > [root@gt-Server2-gmt proc]#
>> > >
>> > > I believe this is fixed in crash 7.2, and I have raised an issue against
>> > > CoreOS to make crash 7.2 available in the toolbox packages
>> > > (https://github.com/coreos/bugs/issues/2347).
>> > >
>> > > Meanwhile, is there any workaround for this?
>> > >
>> > > -Eshak
>> > >
>> > > On Tue, Feb 6, 2018 at 6:02 PM, anderson < anderson(a)prospeed.net > wrote:
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > To run live, you need either /dev/mem, /proc/kcore, or the /dev/crash
>> > > driver. You could try "crash vmlinux /proc/kcore" to see if it's
>> > > available. If not, you could try building the /dev/crash driver module,
>> > > but I don't know whether CoreOS offers a kernel-devel package that you
>> > > could build the driver against. The driver source comes with the crash
>> > > source package in the memory_driver subdirectory.
>> > >
>> > > Dave
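As a quick pre-check before starting a live session, the three memory sources named above can be probed for existence. This is only a sketch: the function name and the injectable "exists" parameter are illustrative, and a path that exists may still be unreadable (for example without root privileges, or from inside a container that does not expose it).

```python
import os

# The three live-memory sources named in the message above.
LIVE_SOURCES = ("/dev/mem", "/proc/kcore", "/dev/crash")

def available_live_sources(exists=os.path.exists):
    """Return the candidate memory sources that are present on this system.

    `exists` is injectable so the check can be exercised without touching
    the real filesystem.
    """
    return [path for path in LIVE_SOURCES if exists(path)]

found = available_live_sources()
if found:
    print("candidate live memory sources:", ", ".join(found))
else:
    print("no live memory source found; the /dev/crash driver may need to be built")
```

On the CoreOS toolbox described in this thread, such a check would be expected to report /proc/kcore but not /dev/mem, matching the errors shown in the quoted sessions.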
>> > >
>> > >
>> > > -------- Original message --------
>> > > From: Eshak < tmdeshak(a)gmail.com >
>> > > Date: 2/6/18 8:35 PM (GMT-05:00)
>> > > To: "Discussion list for crash utility usage, maintenance and development"
>> > > < crash-utility(a)redhat.com >
>> > > Cc: hfu < hfu(a)vmware.com >
>> > > Subject: Re: [Crash-utility] linux_banner has garbage
>> > >
>> > > Hi Dave,
>> > >
>> > > When I try to run crash live, I get an error saying that /dev/mem is
>> > > not available. I'm running crash from the toolbox in a CoreOS VM. Is
>> > > crash designed to run from a container?
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > [root@gt-Server2-gmt ~]# crash -d8 /home/user/vmem_vmss_jan26/usr/lib/debug/usr/lib/modules/4.14.11-coreos/vmlinux
>> > >
>> > > crash 7.1.9-3.fc27
>> > > Copyright (C) 2002-2016 Red Hat, Inc.
>> > > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>> > > Copyright (C) 1999-2006 Hewlett-Packard Co
>> > > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>> > > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>> > > Copyright (C) 2005, 2011 NEC Corporation
>> > > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>> > > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
>> > > This program is free software, covered by the GNU General Public License,
>> > > and you are welcome to change it and/or distribute copies of it under
>> > > certain conditions. Enter "help copying" to see the conditions.
>> > > This program has absolutely no warranty. Enter "help warranty" for details.
>> > >
>> > > get_live_memory_source: /dev/mem
>> > >
>> > > crash: /dev/mem: No such file or directory
>> > >
>> > > [root@gt-Server2-gmt ~]#
>> > >
>> > > Thank you,
>> > > Eshak
>> > >
>> > > On Tue, Feb 6, 2018 at 3:05 PM, Eshak < tmdeshak(a)gmail.com > wrote:
>> > >
>> > >
>> > >
>> > > Thanks for the info, Dave.
>> > > Unfortunately, I cannot run crash live on the machine because the VM is
>> > > in a hung state right now. After resetting the VM (by tomorrow), I will
>> > > check for KASLR and phys_base and try the suggested option.
>> > >
>> > > The complete output of crash is below:
>> > >
>> > >
>> > > [root@gt-Server2-gmt user]# crash -d8 /home/mfusion/vmem_vmss_jan26/usr/lib/debug/usr/lib/modules/4.14.11-coreos/vmlinux /home/mfusion/vmem_vmss_jan26/usr/lib/modules/4.14.11-coreos/build/System.map /home/mfusion/vmem_vmss_jan26/gt-Server2-gmt-612746ca.vmss
>> > >
>> > > crash 7.1.9-3.fc27
>> > > Copyright (C) 2002-2016 Red Hat, Inc.
>> > > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
>> > > Copyright (C) 1999-2006 Hewlett-Packard Co
>> > > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
>> > > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
>> > > Copyright (C) 2005, 2011 NEC Corporation
>> > > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
>> > > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
>> > > This program is free software, covered by the GNU General Public License,
>> > > and you are welcome to change it and/or distribute copies of it under
>> > > certain conditions. Enter "help copying" to see the conditions.
>> > > This program has absolutely no warranty. Enter "help warranty" for details.
>> > >
>> > > crash: diskdump / compressed kdump: dump does not have panic dump header
>> > > crash: sadump: read dump device as media format
>> > > crash: sadump: does not have partition header
>> > > vmw: Header: id=bed2bed2 version=8 numgroups=95
>> > > vmw: Checkpoint is 64-bit
>> > > vmw: Group: Checkpoint offset=0x1dbc size=0x0x3ab.
>> > > vmw: Group: GuestVars offset=0x2167 size=0x0xa3.
>> > > vmw: Group: cpuid offset=0x220a size=0x0x5e0e.
>> > > vmw: Group: cpu offset=0x8018 size=0x0x615bb.
>> > > vmw: Group: BusMemSample offset=0x695d3 size=0x0x1c.
>> > > vmw: Group: UUIDVMX offset=0x695ef size=0x0x2e.
>> > > vmw: Group: StateLogger offset=0x6961d size=0x0x2.
>> > > vmw: Group: memory offset=0x6961f size=0x0xa8.
>> > > vmw: Item align_mask[0][0] => position=0x69633 size=0x4: 0000FFFF
>> > > vmw: Item regionsCount => position=0x69645 size=0x4: 00000002
>> > > vmw: Item regionPageNum[0] => position=0x6965c size=0x4: 00000000
>> > > vmw: Item regionPPN[0] => position=0x6966f size=0x4: 00000000
>> > > vmw: Item regionSize[0] => position=0x69683 size=0x4: 000C0000
>> > > vmw: Item regionPageNum[1] => position=0x6969a size=0x4: 000C0000
>> > > vmw: Item regionPPN[1] => position=0x696ad size=0x4: 00100000
>> > > vmw: Item regionSize[1] => position=0x696c1 size=0x4: 00E40000
>> > > vmw: Group: MStats offset=0x696c7 size=0x0x1936.
>> > > vmw: Group: Snapshot offset=0x6affd size=0x0x4b9c.
>> > > vmw: Group: pic offset=0x6fb99 size=0x0x511.
>> > > vmw: Group: FTCpt offset=0x700aa size=0x0x2.
>> > > vmw: Group: ide1:0 offset=0x700ac size=0x0x16e.
>> > > vmw: Group: scsi0:0 offset=0x7021a size=0x0x46.
>> > > vmw: Group: Migrate offset=0x70260 size=0x0x2.
>> > > vmw: Group: TimeTracker offset=0x70262 size=0x0x99.
>> > > vmw: Group: Backdoor offset=0x702fb size=0x0x2e.
>> > > vmw: Group: PCI offset=0x70329 size=0x0x13.
>> > > vmw: Group: Cs440bx offset=0x7033c size=0x0x40539.
>> > > vmw: Group: ExtCfgDevice offset=0xb0875 size=0x0x30.
>> > > vmw: Group: Floppy offset=0xb08a5 size=0x0x918c.
>> > > vmw: Group: AcpiNotify offset=0xb9a31 size=0x0x1b.
>> > > vmw: Group: vcpuHotPlug offset=0xb9a4c size=0x0xf5.
>> > > vmw: Group: devHP offset=0xb9b41 size=0x0x86.
>> > > vmw: Group: ACPIWake offset=0xb9bc7 size=0x0x1b.
>> > > vmw: Group: DevicesPowerOn offset=0xb9be2 size=0x0x2.
>> > > vmw: Group: PCIBridge0 offset=0xb9be4 size=0x0x272.
>> > > vmw: Group: PCIBridge4 offset=0xb9e56 size=0x0x48e.
>> > > vmw: Group: pciBridge4:1 offset=0xba2e4 size=0x0x48e.
>> > > vmw: Group: pciBridge4:2 offset=0xba772 size=0x0x48e.
>> > > vmw: Group: pciBridge4:3 offset=0xbac00 size=0x0x48e.
>> > > vmw: Group: pciBridge4:4 offset=0xbb08e size=0x0x48e.
>> > > vmw: Group: pciBridge4:5 offset=0xbb51c size=0x0x48e.
>> > > vmw: Group: pciBridge4:6 offset=0xbb9aa size=0x0x48e.
>> > > vmw: Group: pciBridge4:7 offset=0xbbe38 size=0x0x48e.
>> > > vmw: Group: PCIBridge5 offset=0xbc2c6 size=0x0x48e.
>> > > vmw: Group: pciBridge5:1 offset=0xbc754 size=0x0x48e.
>> > > vmw: Group: pciBridge5:2 offset=0xbcbe2 size=0x0x48e.
>> > > vmw: Group: pciBridge5:3 offset=0xbd070 size=0x0x48e.
>> > > vmw: Group: pciBridge5:4 offset=0xbd4fe size=0x0x48e.
>> > > vmw: Group: pciBridge5:5 offset=0xbd98c size=0x0x48e.
>> > > vmw: Group: pciBridge5:6 offset=0xbde1a size=0x0x48e.
>> > > vmw: Group: pciBridge5:7 offset=0xbe2a8 size=0x0x48e.
>> > > vmw: Group: PCIBridge6 offset=0xbe736 size=0x0x48e.
>> > > vmw: Group: pciBridge6:1 offset=0xbebc4 size=0x0x48e.
>> > > vmw: Group: pciBridge6:2 offset=0xbf052 size=0x0x48e.
>> > > vmw: Group: pciBridge6:3 offset=0xbf4e0 size=0x0x48e.
>> > > vmw: Group: pciBridge6:4 offset=0xbf96e size=0x0x48e.
>> > > vmw: Group: pciBridge6:5 offset=0xbfdfc size=0x0x48e.
>> > > vmw: Group: pciBridge6:6 offset=0xc028a size=0x0x48e.
>> > > vmw: Group: pciBridge6:7 offset=0xc0718 size=0x0x48e.
>> > > vmw: Group: PCIBridge7 offset=0xc0ba6 size=0x0x48e.
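The "memory" group items in the debug output above (regionsCount, regionPageNum, regionPPN, regionSize) can be turned into a guest physical memory layout with a little arithmetic. The sketch below assumes 4 KiB pages and reads regionPPN as the first guest physical page number of a region and regionSize as its length in pages; both readings are inferred from the item names rather than stated in the thread, and the function name is illustrative.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on x86_64
GIB = 2 ** 30

# (regionPageNum, regionPPN, regionSize) copied from the vmw debug output above.
REGIONS = [
    (0x00000000, 0x00000000, 0x000C0000),
    (0x000C0000, 0x00100000, 0x00E40000),
]

def describe(regions, page_size=PAGE_SIZE):
    """Return (start, end, length) in bytes for each region's guest physical range."""
    rows = []
    for _page_num, ppn, size in regions:
        start = ppn * page_size
        length = size * page_size
        rows.append((start, start + length, length))
    return rows

rows = describe(REGIONS)
for start, end, length in rows:
    print(f"{start:#x} - {end:#x}  ({length // GIB} GiB)")

total = sum(length for _start, _end, length in rows)
print("total:", total // GIB, "GiB")
```

Under these assumptions the first region covers 3 GiB starting at physical address 0, the second covers 57 GiB starting at the 4 GiB mark, and the total of 60 GiB is consistent with the "MEMORY: 60 GB" line in the sys output at the top of this thread.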
>>
>
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
>