Re: [Crash-utility] Can't read stack contents from qemu dump

Wednesday, 4 April 2018

----- Original Message -----
...
 Hello,

 I tried running crash-head (HEAD: 5d172b230cf4) against today's linus'
 master on a dump obtained via dump-guest-memory in qemu. And I got the
 following when the image is loaded:

 please wait... (determining panic task)
 bt: read error: kernel virtual address: fffffe0000007000  type: "stack
 contents"

   KERNEL: vmlinux
     DUMPFILE: memory-verbatim.img
         CPUS: 1
         DATE: Wed Apr  4 16:36:47 2018
       UPTIME: 00:27:48
 LOAD AVERAGE: 31.11, 17.80, 10.43
        TASKS: 145
     NODENAME: ubuntu-virtual
      RELEASE: 4.16.0-rc7-nbor
      VERSION: #570 SMP Wed Apr 4 16:03:44 EEST 2018
      MACHINE: x86_64  (3392 Mhz)
       MEMORY: 4 GB
        PANIC: ""
          PID: 0
      COMMAND: "swapper/0"
         TASK: ffffffff82016500  [THREAD_INFO: ffffffff82016500]
          CPU: 0
        STATE: TASK_RUNNING
      WARNING: panic task not found

 crash> bt
 PID: 0      TASK: ffffffff82016500  CPU: 0   COMMAND: "swapper/0"
  #0 [ffffffff82003dc8] __schedule at ffffffff817ea059
 bt: invalid RSP: ffffffff82003dc8  bt->stackbase/stacktop:
ffffffff82000000/ffffffff82002000 cpu: 0

 So the kernel has been compiled with : gcc (Ubuntu
 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 which has retpoline enabled.

 I have KASLR disabled: # CONFIG_RANDOMIZE_BASE is not set and the kernel
 is compiled with CONFIG_FRAME_POINTER=y .

 This scenario used to work around the 4.10 timeline. Am I doing
 something wrong or crash still needs time to work on the latest upstream
 kernel code? 
Presumably the latter. 

If you do a "task -R stack ffffffff82016500", I'm presuming that it
shows the stack base address is ffffffff82000000.  And looking at
the stackbase/stacktop values, the crash utility is presuming an 8K stack:

 bt: invalid RSP: ffffffff82003dc8  bt->stackbase/stacktop:
ffffffff82000000/ffffffff82002000 cpu: 0

But the RSP is ffffffff82003dc8, which puts it beyond the 8K stack size,
so I'm presuming that the kernel is actually using 16K stacks.  The most
recent kernel I have is 4.16.0-0.rc6.git3.1.fc29.x86_64, which uses 16K stacks.

Here is how the crash utility determines the stack size.  The x86_64 stacksize
starts out with a default size of 2 pages, as set here in x86_64_init(PRE_SYMTAB):

        case PRE_SYMTAB:
                ... [ cut ] ...
                machdep->stacksize = machdep->pagesize * 2;
                ...

Then later on in task_init(), it gets resized as shown here, where 
the STACKSIZE() macro is machdep->stacksize:

        if (VALID_SIZE(task_union) && (SIZE(task_union) != STACKSIZE())) {
                error(WARNING, "\nnon-standard stack size: %ld\n",
                        len = SIZE(task_union));
                machdep->stacksize = len;
        } else if (VALID_SIZE(thread_union) &&
                ((len = SIZE(thread_union)) != STACKSIZE()))
                machdep->stacksize = len;

The "task_union" no longer exists, and so it checks whether the size
of the "thread_union" differs from the default stacksize, and resets
the size appropriately.

On my 4.16.0-0.rc6.git3.1.fc29.x86_64 kernel, here is the thread_union:

  crash> thread_union
  union thread_union {
      struct task_struct task;
      unsigned long stack[2048];
  }
  SIZE: 16384

And so it gets reset:

  crash> help -m | grep stacksize
            stacksize: 16384
  crash>

You can debug it from there.  Let me know what you find.

Thanks,
  Dave
