Re: [Crash-utility] Can't read stack contents from qemu dump

Wednesday, 4 April 2018

On  4.04.2018 17:48, Dave Anderson wrote:
...

 ----- Original Message -----
> Hello,
>
> I tried running crash-head (HEAD: 5d172b230cf4) against today's linus'
> master on a dump obtained via dump-guest-memory in qemu. And I got the
> following when the image is loaded:
>
> please wait... (determining panic task)
> bt: read error: kernel virtual address: fffffe0000007000  type: "stack contents"
>
>   KERNEL: vmlinux
>     DUMPFILE: memory-verbatim.img
>         CPUS: 1
>         DATE: Wed Apr  4 16:36:47 2018
>       UPTIME: 00:27:48
> LOAD AVERAGE: 31.11, 17.80, 10.43
>        TASKS: 145
>     NODENAME: ubuntu-virtual
>      RELEASE: 4.16.0-rc7-nbor
>      VERSION: #570 SMP Wed Apr 4 16:03:44 EEST 2018
>      MACHINE: x86_64  (3392 Mhz)
>       MEMORY: 4 GB
>        PANIC: ""
>          PID: 0
>      COMMAND: "swapper/0"
>         TASK: ffffffff82016500  [THREAD_INFO: ffffffff82016500]
>          CPU: 0
>        STATE: TASK_RUNNING
>      WARNING: panic task not found
>
> crash> bt
> PID: 0      TASK: ffffffff82016500  CPU: 0   COMMAND: "swapper/0"
>  #0 [ffffffff82003dc8] __schedule at ffffffff817ea059
> bt: invalid RSP: ffffffff82003dc8  bt->stackbase/stacktop: ffffffff82000000/ffffffff82002000 cpu: 0
>
>
> So the kernel has been compiled with : gcc (Ubuntu
> 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 which has retpoline enabled.
>
> I have KASLR disabled: # CONFIG_RANDOMIZE_BASE is not set and the kernel
> is compiled with CONFIG_FRAME_POINTER=y .
>
> This scenario used to work around the 4.10 timeline. Am I doing
> something wrong, or does crash still need time to catch up with the
> latest upstream kernel code?

 Presumably the latter. 

 If you do a "task -R stack ffffffff82016500", I'm presuming that it
 shows the stack base address is ffffffff82000000.  And then, looking at
 the stackbase/stacktop values, the crash utility is presuming an 8K stack:

  bt: invalid RSP: ffffffff82003dc8  bt->stackbase/stacktop: ffffffff82000000/ffffffff82002000 cpu: 0

 But the RSP is ffffffff82003dc8, which puts it beyond the 8K stack size,
 so I'm presuming that the kernel is actually using 16K stacks.  The most
 recent kernel I have is 4.16.0-0.rc6.git3.1.fc29.x86_64, which uses 16K stacks. 
This is correct; indeed, the kernel stack size should be 16k. However...
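
The range check implied by the "invalid RSP" message can be reproduced with a few lines of C. This is only a sketch for checking the arithmetic: rsp_in_stack(), REPORT_STACKBASE and REPORT_RSP are hypothetical names introduced here, not crash-utility symbols, and the two addresses are copied from the bt output quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the "invalid RSP" line in the report (names are hypothetical). */
#define REPORT_STACKBASE 0xffffffff82000000ULL
#define REPORT_RSP       0xffffffff82003dc8ULL

/*
 * Hypothetical helper, not part of crash: returns 1 when rsp lies inside
 * [stackbase, stackbase + stacksize), and 0 otherwise.
 * The subtraction form avoids overflow near the top of the address space.
 */
static int rsp_in_stack(uint64_t rsp, uint64_t stackbase, uint64_t stacksize)
{
    assert(stacksize > 0);
    return rsp >= stackbase && (rsp - stackbase) < stacksize;
}
```

With the values from the report, the RSP sits 0x3dc8 bytes above the stack base, so rsp_in_stack() returns 0 for an 8192-byte stack and 1 for a 16384-byte stack, which matches the 16K conclusion.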

...

 Here is how the crash utility determines the stack size.  The x86_64 stacksize
 starts out with a default size of 2 pages, as set here in x86_64_init(PRE_SYMTAB):

         case PRE_SYMTAB:
                 ... [ cut ] ...
                 machdep->stacksize = machdep->pagesize * 2;
                 ...

 Then later on in task_init(), it gets resized as shown here, where 
 the STACKSIZE() macro is machdep->stacksize:

         if (VALID_SIZE(task_union) && (SIZE(task_union) != STACKSIZE())) {
                 error(WARNING, "\nnon-standard stack size: %ld\n",
                         len = SIZE(task_union));
                 machdep->stacksize = len;
         } else if (VALID_SIZE(thread_union) &&
                 ((len = SIZE(thread_union)) != STACKSIZE()))
                 machdep->stacksize = len; 
This is not resized at all; instead, VALID_SIZE(thread_union) actually
fails. I've added the following else branch to the if statement there:

+       } else {
+               if (VALID_SIZE(thread_union)) {
+               error(WARNING, "WE ARE IN THE ELSE BRANCH: len: %llu thread_union size: %llu STACKSIZE(): %llu\n",
+                     len, SIZE(thread_union), STACKSIZE());
+               } else {
+               error(WARNING, "thread_union is invalid\n");
+               }
+       }

Also doing:

crash> struct thread_union
struct: invalid data structure reference: thread_union

So for some reason the thread_union cannot be found by gdb:

help -o | grep thread_union
                  thread_union: -1

...

 The "task_union" no longer exists, and so it checks whether the size of
 the "thread_union" differs from the default stacksize, and resets the
 size appropriately.
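
The selection logic described here can be modelled in isolation, which also shows why an unresolved thread_union leaves the 8K default in place. This is a simplified sketch, not the crash source: pick_stacksize() and SIZE_INVALID are hypothetical names, and treating -1 as "type not found" is an assumption based on the "thread_union: -1" line printed by help -o.

```c
#include <assert.h>

/* Assumption: -1 marks a type whose size could not be resolved. */
#define SIZE_INVALID (-1L)

/*
 * Hypothetical model of the resize step: prefer task_union, then
 * thread_union, and fall back to the default when neither is usable.
 */
static long pick_stacksize(long default_size, long task_union_size,
                           long thread_union_size)
{
    assert(default_size > 0);

    /* A resolved task_union that differs from the default wins. */
    if (task_union_size != SIZE_INVALID && task_union_size != default_size)
        return task_union_size;

    /* Otherwise a resolved thread_union that differs from the default wins. */
    if (thread_union_size != SIZE_INVALID && thread_union_size != default_size)
        return thread_union_size;

    /* Nothing usable: the default (2 pages on x86_64) stays in effect. */
    return default_size;
}
```

In this model, pick_stacksize(8192, SIZE_INVALID, 16384) gives 16384, while pick_stacksize(8192, SIZE_INVALID, SIZE_INVALID) stays at 8192. The second case corresponds to the failing dump, where thread_union cannot be resolved.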

 On my 4.16.0-0.rc6.git3.1.fc29.x86_64 kernel, here is the thread_union:

   crash> thread_union
   union thread_union {
       struct task_struct task;
       unsigned long stack[2048];
   }
   SIZE: 16384

 And so it gets reset:

   crash> help -m | grep stacksize
             stacksize: 16384
   crash>

 You can debug it from there.  Let me know what you find.

 Thanks,
   Dave
