[PATCH v6 0/5] Improve stack unwind on ppc64
by Aditya Gupta
The Problem:
============
Currently crash is unable to show function arguments and local variables, as
gdb can do. And functionality for moving between frames ('up'/'down') is not
working in crash.
Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
gdb not getting the register values from `crash_target::fetch_registers`,
which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
Proposed Solution:
==================
Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
This way, "gdb mode in crash" will support this feature for both ELF and
kdump-compressed vmcore formats, while "gdb" would only have supported ELF
format
This way other features of 'gdb', such as seeing
backtraces/registers/variables/arguments/local variables, moving up and
down stack frames, can be used with any ppc64 vmcore, irrespective of
being ELF format or kdump-compressed format.
Note: This doesn't support live debugging on ppc64, since registers are not
available to be read
Implications on Architectures:
====================================
No architecture other than PPC64 has been affected, other than in case of
'frame' command
As mentioned in patch #2, since frame will not be prohibited, so it will print:
crash> frame
#0 <unavailable> in ?? ()
Instead of before prohibited message:
crash> frame
crash: prohibited gdb command: frame
Major change will be in 'gdb mode' on PPC64, that it will print the frames, and
local variables, instead of failing with errors showing no frame, or showing
that couldn't get PC, it will be able to give all this information.
Testing:
========
Git tree with this patch series applied:
https://github.com/adi-g15-ibm/crash/tree/stack-unwind-v6
To test various gdb passthroughs:
(crash) set
(crash) set gdb on
gdb> thread
gdb> bt
gdb> info threads
gdb> info threads
gdb> info locals
gdb> info variables irq_rover_lock
gdb> info args
gdb> thread 2
gdb> set gdb off
(crash) set
(crash) set -c 6
(crash) gdb thread
(crash) bt
(crash) gdb bt
(crash) frame
(crash) up
(crash) down
(crash) info locals
Known Issues:
=============
1. In gdb mode, 'bt' might fail to show backtrace in few vmcores collected
from older kernels. This is a known issue due to register mismatch, and
its fix has been merged upstream:
This can also cause some 'invalid kernel virtual address' errors during gdb
unwinding the stack registers
Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef78581...
Fixing GDB passthroughs on other architectures
==============================================
Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
thread', 'gdb info locals' etc. has been done by the patches introducing
'machdep->get_cpu_reg' and this series fixing some issues in that.
Other architectures should be able to fix these gdb functionalities by
simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
The reasoning behind that has been explained with a diagram in commit
description of patch #1
I will assist with my findings/observations fixing it on ppc64 whenever needed.
Changelog:
==========
V6:
+ changes in patch #5: fix bug introduced in v5 that caused initial gdb thread
to be thread 1
V5:
+ changes in patch #1: made ppc64_get_cpu_reg static, and remove unreachable
code
+ changes in patch #3: fixed typo 'ppc64_renum' instead of 'ppc64_regnum',
remove unneeded if condition
+ changes in patch #5: implement refresh regcache on per thread, instead of all
threads at once
V4:
+ fix segmentation fault in live debugging (change in patch #1)
+ mention live debugging not supported in cover letter and patch #1
+ fixed some checkpatch warnings (change in patch #5)
V3:
+ default gdb thread will be the crashing thread, instead of being
thread '0'
+ synchronise crash cpu and gdb thread context
+ fix bug in gdb_interface, that replaced gdb's output stream, losing
output in some cases, such as info threads and extra output in info
variables
+ fix 'info threads'
RFC V2:
- removed patch implementing 'frame', 'up', 'down' in crash
- updated the cover letter by removing the mention of those commands other
than the respective gdb passthrough
Aditya Gupta (5):
ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
remove 'frame' from prohibited commands list
synchronise cpu context changes between crash/gdb
fix gdb_interface: restore gdb's output streams at end of
gdb_interface
fix 'info threads' command
crash_target.c | 44 ++++++++++++++++
defs.h | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
gdb-10.2.patch | 110 +++++++++++++++++++++++++++++++++++++++-
gdb_interface.c | 2 +-
kernel.c | 47 +++++++++++++++--
ppc64.c | 95 +++++++++++++++++++++++++++++++++--
task.c | 14 ++++++
tools.c | 2 +-
8 files changed, 434 insertions(+), 10 deletions(-)
--
2.41.0
10 months, 1 week
Re: Google Container OS and crash 8.0.4
by HAGIO KAZUHITO(萩尾 一仁)
On 2024/01/15 22:37, Matt Suiche wrote:
> Is there an update available for this?
No.
I saw a kernel patch that removes module_load_offset [1] and will affect
the crash-utility, so I was thinking that it would be better to address
the issue together when it comes..
[1]
https://lore.kernel.org/linux-mm/20230918072955.2507221-5-rppt@kernel.org/
Thanks,
Kazu
>
> Thanks,
>
> From: Matt Suiche <matt.suiche(a)magnetforensics.com>
> Date: Wednesday, November 29, 2023 at 3:26 PM
> To: HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>, devel(a)lists.crash-utility.osci.io <devel(a)lists.crash-utility.osci.io>
> Subject: Re: [Crash-utility] Google Container OS and crash 8.0.4
> Apparently, CONFIG_KALLSYMS_ALL is not set in COS kernel
>
> Sent from my mobile device.
> ________________________________
> From: Matt Suiche <matt.suiche(a)magnetforensics.com>
> Sent: Wednesday, November 29, 2023 4:40:55 PM
> To: HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>; devel(a)lists.crash-utility.osci.io <devel(a)lists.crash-utility.osci.io>
> Subject: Re: [Crash-utility] Google Container OS and crash 8.0.4
>
> Yes, it would probably make more sense. You can also probably use _stext instead of module_load_offset too to compare the values as an assertion check.
>
> Sent from my mobile device.
> ________________________________
> From: HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
> Sent: Wednesday, November 29, 2023 4:29 AM
> To: Matt Suiche <matt.suiche(a)magnetforensics.com>; devel(a)lists.crash-utility.osci.io <devel(a)lists.crash-utility.osci.io>
> Subject: Re: [Crash-utility] Google Container OS and crash 8.0.4
>
> On 2023/11/22 18:04, Matt Suiche wrote:
>> Sounds like this is the issue. Module_load_offset is not present, same
>> with init_task though.
>>
>> root@instance-2:~# grep -e _stext -e module_load_offset -e init_task
>> /proc/kallsyms
>> ffffffff89000000 T _stext
>> ffffffff8909e280 t ptrace_init_task
>> ffffffff891c6af0 T ftrace_graph_init_task
>> ffffffff89245ea0 T perf_event_init_task
>> ffffffff8aba3b46 T rcu_init_tasks_generic
>> root@instance-2:~#
>
> Yes, but I don't see the reason why it's not present in /proc/kallsyms,
> although it's present in the vmlinux..
>
> Recent kernels have vmcoreinfo in /proc/kcore, maybe we can use the
> KERNELOFFSET value instead of the module_load_offset symbol to determine
> whether KASLR is enabled. I might try it when I have time.
>
> Thanks,
> Kazu
>
>>
>> *From: *HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
>> *Date: *Wednesday, November 22, 2023 at 12:01 PM
>> *To: *Matt Suiche <matt.suiche(a)magnetforensics.com>,
>> devel(a)lists.crash-utility.osci.io <devel(a)lists.crash-utility.osci.io>
>> *Subject: *EXTERNAL SENDER Re: [Crash-utility] Google Container OS and
>> crash 8.0.4
>>
>> On 2023/11/22 15:41, Matt Suiche wrote:
>>> Good point, enough the –kaslr=auto option worked well. Same when I passed --kaslr=0x8000000
>>
>> Good news.
>>
>> apparently module_load_offset symbol is needed in /proc/kallsyms to
>> enable the KASLR detection. I see it in the vmlinux.
>>
>> $ nm vmlinux-cos-5.15.133+ | grep module_load_offset
>> ffffffff82d83350 b module_load_offset
>>
>> Is it (and _stext) found in /proc/kallsyms? like
>>
>> # grep -e _stext -e module_load_offset /proc/kallsyms
>> ffffffffa0e00000 T _stext
>> ffffffffa3aafab8 b module_load_offset
>>
>>
>> PS. I will be out for the rest of this week, back next week.
>>
>> Thanks,
>> Kazu
>>
>> This email including any attachments may contain confidential material
>> for the sole use of the intended recipient. If you are not the intended
>> recipient please immediately notify the sender by reply email,
>> permanently delete this message and do not forward it or any part of it
>> to anyone else.
>>
>
> This email including any attachments may contain confidential material for the sole use of the intended recipient. If you are not the intended recipient please immediately notify the sender by reply email, permanently delete this message and do not forward it or any part of it to anyone else.
>
>
> Is there an update available for this?
>
> Thanks,
>
> *From: *Matt Suiche <matt.suiche(a)magnetforensics.com>
> *Date: *Wednesday, November 29, 2023 at 3:26 PM
> *To: *HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>,
> devel(a)lists.crash-utility.osci.io <devel(a)lists.crash-utility.osci.io>
> *Subject: *Re: [Crash-utility] Google Container OS and crash 8.0.4
>
> Apparently, CONFIG_KALLSYMS_ALL is not set in COS kernel
>
> Sent from my mobile device.
>
> ------------------------------------------------------------------------
>
> *From:*Matt Suiche <matt.suiche(a)magnetforensics.com>
> *Sent:* Wednesday, November 29, 2023 4:40:55 PM
> *To:* HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>;
> devel(a)lists.crash-utility.osci.io <devel(a)lists.crash-utility.osci.io>
> *Subject:* Re: [Crash-utility] Google Container OS and crash 8.0.4
>
> Yes, it would probably make more sense. You can also probably use _stext
> instead of module_load_offset too to compare the values as an assertion
> check.
>
> Sent from my mobile device.
>
> ------------------------------------------------------------------------
>
> *From:*HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
> *Sent:* Wednesday, November 29, 2023 4:29 AM
> *To:* Matt Suiche <matt.suiche(a)magnetforensics.com>;
> devel(a)lists.crash-utility.osci.io <devel(a)lists.crash-utility.osci.io>
> *Subject:* Re: [Crash-utility] Google Container OS and crash 8.0.4
>
> On 2023/11/22 18:04, Matt Suiche wrote:
>> Sounds like this is the issue. Module_load_offset is not present, same
>> with init_task though.
>>
>> root@instance-2:~# grep -e _stext -e module_load_offset -e init_task
>> /proc/kallsyms
>> ffffffff89000000 T _stext
>> ffffffff8909e280 t ptrace_init_task
>> ffffffff891c6af0 T ftrace_graph_init_task
>> ffffffff89245ea0 T perf_event_init_task
>> ffffffff8aba3b46 T rcu_init_tasks_generic
>> root@instance-2:~#
>
> Yes, but I don't see the reason why it's not present in /proc/kallsyms,
> although it's present in the vmlinux..
>
> Recent kernels have vmcoreinfo in /proc/kcore, maybe we can use the
> KERNELOFFSET value instead of the module_load_offset symbol to determine
> whether KASLR is enabled. I might try it when I have time.
>
> Thanks,
> Kazu
>
>>
>> *From: *HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
>> *Date: *Wednesday, November 22, 2023 at 12:01 PM
>> *To: *Matt Suiche <matt.suiche(a)magnetforensics.com>,
>> devel(a)lists.crash-utility.osci.io <devel(a)lists.crash-utility.osci.io>
>> *Subject: *EXTERNAL SENDER Re: [Crash-utility] Google Container OS and
>> crash 8.0.4
>>
>> On 2023/11/22 15:41, Matt Suiche wrote:
>>> Good point, enough the –kaslr=auto option worked well. Same when I passed --kaslr=0x8000000
>>
>> Good news.
>>
>> apparently module_load_offset symbol is needed in /proc/kallsyms to
>> enable the KASLR detection. I see it in the vmlinux.
>>
>> $ nm vmlinux-cos-5.15.133+ | grep module_load_offset
>> ffffffff82d83350 b module_load_offset
>>
>> Is it (and _stext) found in /proc/kallsyms? like
>>
>> # grep -e _stext -e module_load_offset /proc/kallsyms
>> ffffffffa0e00000 T _stext
>> ffffffffa3aafab8 b module_load_offset
>>
>>
>> PS. I will be out for the rest of this week, back next week.
>>
>> Thanks,
>> Kazu
>>
>> This email including any attachments may contain confidential material
>> for the sole use of the intended recipient. If you are not the intended
>> recipient please immediately notify the sender by reply email,
>> permanently delete this message and do not forward it or any part of it
>> to anyone else.
>>
>
> This email including any attachments may contain confidential material
> for the sole use of the intended recipient. If you are not the intended
> recipient please immediately notify the sender by reply email,
> permanently delete this message and do not forward it or any part of it
> to anyone else.
>
>
10 months, 1 week
[PATCH v8 0/5] Improve stack unwind on ppc64
by Aditya Gupta
The Problem:
============
Currently crash is unable to show function arguments and local variables, as
gdb can do. And functionality for moving between frames ('up'/'down') is not
working in crash.
Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
gdb not getting the register values from `crash_target::fetch_registers`,
which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
Proposed Solution:
==================
Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
This way, "gdb mode in crash" will support this feature for both ELF and
kdump-compressed vmcore formats, while "gdb" would only have supported ELF
format
This way other features of 'gdb', such as seeing
backtraces/registers/variables/arguments/local variables, moving up and
down stack frames, can be used with any ppc64 vmcore, irrespective of
being ELF format or kdump-compressed format.
Note: This doesn't support live debugging on ppc64, since registers are not
available to be read
Implications on Architectures:
====================================
No architecture other than PPC64 has been affected, other than in case of
'frame' command
As mentioned in patch #2, since frame will not be prohibited, so it will print:
crash> frame
#0 <unavailable> in ?? ()
Instead of before prohibited message:
crash> frame
crash: prohibited gdb command: frame
Major change will be in 'gdb mode' on PPC64, that it will print the frames, and
local variables, instead of failing with errors showing no frame, or showing
that couldn't get PC, it will be able to give all this information.
Testing:
========
Git tree with this patch series applied:
https://github.com/adi-g15-ibm/crash/tree/stack-unwind-v8
To test various gdb passthroughs:
(crash) set
(crash) set gdb on
gdb> thread
gdb> bt
gdb> info threads
gdb> info threads
gdb> info locals
gdb> info variables irq_rover_lock
gdb> info args
gdb> thread 2
gdb> set gdb off
(crash) set
(crash) set -c 6
(crash) gdb thread
(crash) bt
(crash) gdb bt
(crash) frame
(crash) gdb up
(crash) gdb down
(crash) info locals
Known Issues:
=============
1. In gdb mode, 'bt' might fail to show backtrace in few vmcores collected
from older kernels. This is a known issue due to register mismatch, and
its fix has been merged upstream:
This can also cause some 'invalid kernel virtual address' errors during gdb
unwinding the stack registers
Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef78581...
Fixing GDB passthroughs on other architectures
==============================================
Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
thread', 'gdb info locals' etc. has been done by the patches introducing
'machdep->get_cpu_reg' and this series fixing some issues in that.
Other architectures should be able to fix these gdb functionalities by
simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
The reasoning behind that has been explained with a diagram in commit
description of patch #1
I will assist with my findings/observations fixing it on ppc64 whenever needed.
Changelog:
==========
V8:
+ use get_active_task instead of depending on CURRENT_CONTEXT in ppc64_get_cpu_reg
+ rebase to upstream/master (5977936c0a91)
V7:
+ move changes in gdb-10.2.patch to the end (minor change in patch #3,4,5)
+ fix a memory leak in ppc64_get_cpu_reg (minor change in patch #1)
+ use ascii diagram in patch #1 description
V6:
+ changes in patch #5: fix bug introduced in v5 that caused initial gdb thread
to be thread 1
V5:
+ changes in patch #1: made ppc64_get_cpu_reg static, and remove unreachable
code
+ changes in patch #3: fixed typo 'ppc64_renum' instead of 'ppc64_regnum',
remove unneeded if condition
+ changes in patch #5: implement refresh regcache on per thread, instead of all
threads at once
V4:
+ fix segmentation fault in live debugging (change in patch #1)
+ mention live debugging not supported in cover letter and patch #1
+ fixed some checkpatch warnings (change in patch #5)
V3:
+ default gdb thread will be the crashing thread, instead of being
thread '0'
+ synchronise crash cpu and gdb thread context
+ fix bug in gdb_interface, that replaced gdb's output stream, losing
output in some cases, such as info threads and extra output in info
variables
+ fix 'info threads'
RFC V2:
- removed patch implementing 'frame', 'up', 'down' in crash
- updated the cover letter by removing the mention of those commands other
than the respective gdb passthrough
Aditya Gupta (5):
ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
remove 'frame' from prohibited commands list
synchronise cpu context changes between crash/gdb
fix gdb_interface: restore gdb's output streams at end of
gdb_interface
fix 'info threads' command
crash_target.c | 44 ++++++++++++++++
defs.h | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
gdb-10.2.patch | 110 +++++++++++++++++++++++++++++++++++++++-
gdb_interface.c | 2 +-
kernel.c | 47 +++++++++++++++--
ppc64.c | 95 +++++++++++++++++++++++++++++++++--
task.c | 14 ++++++
tools.c | 2 +-
8 files changed, 434 insertions(+), 10 deletions(-)
--
2.41.0
10 months, 3 weeks
[PATCH v8 0/5] Improve stack unwind on ppc64
by Aditya Gupta
The Problem:
============
Currently crash is unable to show function arguments and local variables, as
gdb can do. And functionality for moving between frames ('up'/'down') is not
working in crash.
Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
gdb not getting the register values from `crash_target::fetch_registers`,
which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
Proposed Solution:
==================
Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
This way, "gdb mode in crash" will support this feature for both ELF and
kdump-compressed vmcore formats, while "gdb" would only have supported ELF
format
This way other features of 'gdb', such as seeing
backtraces/registers/variables/arguments/local variables, moving up and
down stack frames, can be used with any ppc64 vmcore, irrespective of
being ELF format or kdump-compressed format.
Note: This doesn't support live debugging on ppc64, since registers are not
available to be read
Implications on Architectures:
====================================
No architecture other than PPC64 has been affected, other than in case of
'frame' command
As mentioned in patch #2, since frame will not be prohibited, so it will print:
crash> frame
#0 <unavailable> in ?? ()
Instead of before prohibited message:
crash> frame
crash: prohibited gdb command: frame
Major change will be in 'gdb mode' on PPC64, that it will print the frames, and
local variables, instead of failing with errors showing no frame, or showing
that couldn't get PC, it will be able to give all this information.
Testing:
========
Git tree with this patch series applied:
https://github.com/adi-g15-ibm/crash/tree/stack-unwind-v8
To test various gdb passthroughs:
(crash) set
(crash) set gdb on
gdb> thread
gdb> bt
gdb> info threads
gdb> info threads
gdb> info locals
gdb> info variables irq_rover_lock
gdb> info args
gdb> thread 2
gdb> set gdb off
(crash) set
(crash) set -c 6
(crash) gdb thread
(crash) bt
(crash) gdb bt
(crash) frame
(crash) gdb up
(crash) gdb down
(crash) info locals
Known Issues:
=============
1. In gdb mode, 'bt' might fail to show backtrace in few vmcores collected
from older kernels. This is a known issue due to register mismatch, and
its fix has been merged upstream:
This can also cause some 'invalid kernel virtual address' errors during gdb
unwinding the stack registers
Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef78581...
Fixing GDB passthroughs on other architectures
==============================================
Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
thread', 'gdb info locals' etc. has been done by the patches introducing
'machdep->get_cpu_reg' and this series fixing some issues in that.
Other architectures should be able to fix these gdb functionalities by
simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
The reasoning behind that has been explained with a diagram in commit
description of patch #1
I will assist with my findings/observations fixing it on ppc64 whenever needed.
Changelog:
==========
V8:
+ use get_active_task instead of depending on CURRENT_CONTEXT in ppc64_get_cpu_reg
V7:
+ move changes in gdb-10.2.patch to the end (minor change in patch #3,4,5)
+ fix a memory leak in ppc64_get_cpu_reg (minor change in patch #1)
+ use ascii diagram in patch #1 description
V6:
+ changes in patch #5: fix bug introduced in v5 that caused initial gdb thread
to be thread 1
V5:
+ changes in patch #1: made ppc64_get_cpu_reg static, and remove unreachable
code
+ changes in patch #3: fixed typo 'ppc64_renum' instead of 'ppc64_regnum',
remove unneeded if condition
+ changes in patch #5: implement refresh regcache on per thread, instead of all
threads at once
V4:
+ fix segmentation fault in live debugging (change in patch #1)
+ mention live debugging not supported in cover letter and patch #1
+ fixed some checkpatch warnings (change in patch #5)
V3:
+ default gdb thread will be the crashing thread, instead of being
thread '0'
+ synchronise crash cpu and gdb thread context
+ fix bug in gdb_interface, that replaced gdb's output stream, losing
output in some cases, such as info threads and extra output in info
variables
+ fix 'info threads'
RFC V2:
- removed patch implementing 'frame', 'up', 'down' in crash
- updated the cover letter by removing the mention of those commands other
than the respective gdb passthrough
Aditya Gupta (5):
ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
remove 'frame' from prohibited commands list
synchronise cpu context changes between crash/gdb
fix gdb_interface: restore gdb's output streams at end of
gdb_interface
fix 'info threads' command
crash_target.c | 44 ++++++++++++++++
defs.h | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
gdb-10.2.patch | 110 +++++++++++++++++++++++++++++++++++++++-
gdb_interface.c | 2 +-
kernel.c | 47 +++++++++++++++--
ppc64.c | 95 +++++++++++++++++++++++++++++++++--
task.c | 14 ++++++
tools.c | 2 +-
8 files changed, 434 insertions(+), 10 deletions(-)
--
2.41.0
10 months, 3 weeks
[PATCH v7 0/5] Improve stack unwind on ppc64
by Aditya Gupta
The Problem:
============
Currently crash is unable to show function arguments and local variables, as
gdb can do. And functionality for moving between frames ('up'/'down') is not
working in crash.
Crash has 'gdb passthroughs' for things gdb can do, but the gdb passthroughs
'bt', 'frame', 'info locals', 'up', 'down' are not working either, due to
gdb not getting the register values from `crash_target::fetch_registers`,
which then uses `machdep->get_cpu_reg`, which is not implemented for PPC64
Proposed Solution:
==================
Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for PPC64.
This way, "gdb mode in crash" will support this feature for both ELF and
kdump-compressed vmcore formats, while "gdb" would only have supported ELF
format
This way other features of 'gdb', such as seeing
backtraces/registers/variables/arguments/local variables, moving up and
down stack frames, can be used with any ppc64 vmcore, irrespective of
being ELF format or kdump-compressed format.
Note: This doesn't support live debugging on ppc64, since registers are not
available to be read
Implications on Architectures:
====================================
No architecture other than PPC64 has been affected, other than in case of
'frame' command
As mentioned in patch #2, since frame will not be prohibited, so it will print:
crash> frame
#0 <unavailable> in ?? ()
Instead of before prohibited message:
crash> frame
crash: prohibited gdb command: frame
Major change will be in 'gdb mode' on PPC64, that it will print the frames, and
local variables, instead of failing with errors showing no frame, or showing
that couldn't get PC, it will be able to give all this information.
Testing:
========
Git tree with this patch series applied:
https://github.com/adi-g15-ibm/crash/tree/stack-unwind-v7
To test various gdb passthroughs:
(crash) set
(crash) set gdb on
gdb> thread
gdb> bt
gdb> info threads
gdb> info threads
gdb> info locals
gdb> info variables irq_rover_lock
gdb> info args
gdb> thread 2
gdb> set gdb off
(crash) set
(crash) set -c 6
(crash) gdb thread
(crash) bt
(crash) gdb bt
(crash) frame
(crash) up
(crash) down
(crash) info locals
Known Issues:
=============
1. In gdb mode, 'bt' might fail to show backtrace in few vmcores collected
from older kernels. This is a known issue due to register mismatch, and
its fix has been merged upstream:
This can also cause some 'invalid kernel virtual address' errors during gdb
unwinding the stack registers
Commit: https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef78581...
Fixing GDB passthroughs on other architectures
==============================================
Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
thread', 'gdb info locals' etc. has been done by the patches introducing
'machdep->get_cpu_reg' and this series fixing some issues in that.
Other architectures should be able to fix these gdb functionalities by
simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
The reasoning behind that has been explained with a diagram in commit
description of patch #1
I will assist with my findings/observations fixing it on ppc64 whenever needed.
Changelog:
==========
V7:
+ move changes in gdb-10.2.patch to the end (minor change in patch #3,4,5)
+ fix a memory leak in ppc64_get_cpu_reg (minor change in patch #1)
+ use ascii diagram in patch #1 description
V6:
+ changes in patch #5: fix bug introduced in v5 that caused initial gdb thread
to be thread 1
V5:
+ changes in patch #1: made ppc64_get_cpu_reg static, and remove unreachable
code
+ changes in patch #3: fixed typo 'ppc64_renum' instead of 'ppc64_regnum',
remove unneeded if condition
+ changes in patch #5: implement refresh regcache on per thread, instead of all
threads at once
V4:
+ fix segmentation fault in live debugging (change in patch #1)
+ mention live debugging not supported in cover letter and patch #1
+ fixed some checkpatch warnings (change in patch #5)
V3:
+ default gdb thread will be the crashing thread, instead of being
thread '0'
+ synchronise crash cpu and gdb thread context
+ fix bug in gdb_interface, that replaced gdb's output stream, losing
output in some cases, such as info threads and extra output in info
variables
+ fix 'info threads'
RFC V2:
- removed patch implementing 'frame', 'up', 'down' in crash
- updated the cover letter by removing the mention of those commands other
than the respective gdb passthrough
Aditya Gupta (5):
ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
remove 'frame' from prohibited commands list
synchronise cpu context changes between crash/gdb
fix gdb_interface: restore gdb's output streams at end of
gdb_interface
fix 'info threads' command
crash_target.c | 44 ++++++++++++++++
defs.h | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
gdb-10.2.patch | 110 +++++++++++++++++++++++++++++++++++++++-
gdb_interface.c | 2 +-
kernel.c | 47 +++++++++++++++--
ppc64.c | 95 +++++++++++++++++++++++++++++++++--
task.c | 14 ++++++
tools.c | 2 +-
8 files changed, 434 insertions(+), 10 deletions(-)
--
2.41.0
10 months, 3 weeks
Read orc_header section to detect ORC version
by nilayvaish@google.com
Folks
The Linux kernel commit b9f174c811e3ae4ae8959dc57e6adb9990e913f4 (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit...) added an ELF section for the ORC version identifier. I think the crash utility should use this section to identify the ORC version in use.
I want to implement this feature in the crash utility. As I understand it, we would need to read the kernel binary first, find out whether it has .orc_header section and then read the section to figure out the version of the ORC format in use.
I do not have familiarity with the code for the crash utility. Is it possible for someone to advise on whether the above sounds reasonable? If yes, then how to divide the functionality among the files? If no, then what would a reasonable to have this functionality?
Thanks!
10 months, 3 weeks
Re: [PATCH v6 1/5] ppc64: correct gdb passthroughs by implementing machdep->get_cpu_reg
by Lianbo Jiang
Hi, Aditya
Thank you for the great job.
The v6 patch set looks good to me, and I did not see any new issues in
my tests.
So for the v6: Ack.
Thanks
Lianbo
On 1/5/24 15:31, devel-request(a)lists.crash-utility.osci.io wrote:
> Date: Fri, 5 Jan 2024 13:00:23 +0530
> From: Aditya Gupta<adityag(a)linux.ibm.com>
> Subject: [Crash-utility] [PATCH v6 1/5] ppc64: correct gdb
> passthroughs by implementing machdep->get_cpu_reg
> To:<devel(a)lists.crash-utility.osci.io>,<lijiang(a)redhat.com>,
> Tao Liu<ltao(a)redhat.com>
> Cc: Mahesh J Salgaonkar<mahesh(a)linux.ibm.com>, "Naveen N. Rao"
> <naveen.n.rao(a)linux.vnet.ibm.com>
> Message-ID:<20240105073027.378928-2-adityag(a)linux.ibm.com>
> Content-Type: text/plain; charset=UTF-8
>
> Currently, gdb passthroughs of 'bt', 'frame', 'up', 'down', 'info
> locals' don't work. This is due to gdb not knowing the register values to
> unwind the stack frames
>
> Every gdb passthrough goes through `gdb_interface`. And then, gdb expects
> `crash_target::fetch_registers` to give it the register values, which is
> dependent on `machdep->get_cpu_reg` to read the register values for
> specific architecture.
>
> ┌────────────────────────┐
> gdb passthrough (eg. "bt") │ │
> crash ─────────────────────────▶│ │
> │ gdb_interface │
> │ │
> │ │
> │ ┌────────────────────┐ │
> fetch_registers │ │ │ │
> crash_target◀────────────────────────┼─┤ gdb │ │
> ─────────────────────────┼▶│ │ │
> Registers (SP,NIP, etc.)│ │ │ │
> │ │ │ │
> │ └────────────────────┘ │
> └────────────────────────┘
>
> Implement `machdep->get_cpu_reg` on PPC64, so that crash provides the
> register values to gdb to unwind stack frames properly
>
> With these changes, on powerpc, 'bt' command output in gdb mode, will look
> like this:
>
> gdb> bt
> #0 0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>, newregs=0xc00000000486f8d8) at ./arch/powerpc/include/asm/kexec.h:69
> #1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974
> #2 0xc000000000168918 in panic (fmt=<optimized out>) at kernel/panic.c:358
> #3 0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155
> #4 0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:602
> #5 0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163
> #6 0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>, buf=<optimized out>, file=<optimized out>, pde=0xc000000009ed3a80) at fs/proc/inode.c:340
> #7 proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352
> #8 0xc0000000005b3bbc in vfs_write (file=file@entry=0xc00000009dda7d00, buf=buf@entry=0xebcfc7c6040 <error: Cannot access memory at address 0xebcfc7c6040>, count=count@entry=2, pos=pos@entry=0xc00000000486fda0) at fs/read_write.c:582
>
> instead of earlier output without this patch:
>
> gdb> bt
> #0 <unavailable> in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> Also, 'get_dumpfile_regs' has been introduced to get registers from
> multiple supported vmcore formats. Correspondingly a flag 'BT_NO_PRINT_REGS'
> has been introduced to tell helper functions to get registers, to not
> print registers with every call to backtrace in gdb.
>
> Note: This feature to support GDB unwinding doesn't support live debugging
>
> Signed-off-by: Aditya Gupta<adityag(a)linux.ibm.com>
> ---
> defs.h | 122 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> kernel.c | 33 +++++++++++++++
> ppc64.c | 105 +++++++++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 257 insertions(+), 3 deletions(-)
>
> diff --git a/defs.h b/defs.h
> index 5218a94fe4a4..615f3a37935a 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -6023,6 +6023,7 @@ int load_module_symbols_helper(char *);
> void unlink_module(struct load_module *);
> int check_specified_module_tree(char *, char *);
> int is_system_call(char *, ulong);
> +void get_dumpfile_regs(struct bt_info*, ulong*, ulong*);
> void generic_dump_irq(int);
> void generic_get_irq_affinity(int);
> void generic_show_interrupts(int, ulong *);
> @@ -6121,6 +6122,7 @@ ulong cpu_map_addr(const char *type);
> #define BT_REGS_NOT_FOUND (0x4000000000000ULL)
> #define BT_OVERFLOW_STACK (0x8000000000000ULL)
> #define BT_SKIP_IDLE (0x10000000000000ULL)
> +#define BT_NO_PRINT_REGS (0x20000000000000ULL)
> #define BT_SYMBOL_OFFSET (BT_SYMBOLIC_ARGS)
>
> #define BT_REF_HEXVAL (0x1)
> @@ -7854,4 +7856,124 @@ enum x86_64_regnum {
> LAST_REGNUM
> };
>
> +/*
> + * Register numbers to make crash_target->fetch_registers()
> + * ---> machdep->get_cpu_reg() work properly.
> + *
> + * These register numbers and names are given according to output of
> + * `rs6000_register_name`, because that is what was being used by
> + * crash_target::fetch_registers in case of PPC64
> + */
> +enum ppc64_regnum {
> + PPC64_R0_REGNUM = 0,
> + PPC64_R1_REGNUM,
> + PPC64_R2_REGNUM,
> + PPC64_R3_REGNUM,
> + PPC64_R4_REGNUM,
> + PPC64_R5_REGNUM,
> + PPC64_R6_REGNUM,
> + PPC64_R7_REGNUM,
> + PPC64_R8_REGNUM,
> + PPC64_R9_REGNUM,
> + PPC64_R10_REGNUM,
> + PPC64_R11_REGNUM,
> + PPC64_R12_REGNUM,
> + PPC64_R13_REGNUM,
> + PPC64_R14_REGNUM,
> + PPC64_R15_REGNUM,
> + PPC64_R16_REGNUM,
> + PPC64_R17_REGNUM,
> + PPC64_R18_REGNUM,
> + PPC64_R19_REGNUM,
> + PPC64_R20_REGNUM,
> + PPC64_R21_REGNUM,
> + PPC64_R22_REGNUM,
> + PPC64_R23_REGNUM,
> + PPC64_R24_REGNUM,
> + PPC64_R25_REGNUM,
> + PPC64_R26_REGNUM,
> + PPC64_R27_REGNUM,
> + PPC64_R28_REGNUM,
> + PPC64_R29_REGNUM,
> + PPC64_R30_REGNUM,
> + PPC64_R31_REGNUM,
> +
> + PPC64_F0_REGNUM = 32,
> + PPC64_F1_REGNUM,
> + PPC64_F2_REGNUM,
> + PPC64_F3_REGNUM,
> + PPC64_F4_REGNUM,
> + PPC64_F5_REGNUM,
> + PPC64_F6_REGNUM,
> + PPC64_F7_REGNUM,
> + PPC64_F8_REGNUM,
> + PPC64_F9_REGNUM,
> + PPC64_F10_REGNUM,
> + PPC64_F11_REGNUM,
> + PPC64_F12_REGNUM,
> + PPC64_F13_REGNUM,
> + PPC64_F14_REGNUM,
> + PPC64_F15_REGNUM,
> + PPC64_F16_REGNUM,
> + PPC64_F17_REGNUM,
> + PPC64_F18_REGNUM,
> + PPC64_F19_REGNUM,
> + PPC64_F20_REGNUM,
> + PPC64_F21_REGNUM,
> + PPC64_F22_REGNUM,
> + PPC64_F23_REGNUM,
> + PPC64_F24_REGNUM,
> + PPC64_F25_REGNUM,
> + PPC64_F26_REGNUM,
> + PPC64_F27_REGNUM,
> + PPC64_F28_REGNUM,
> + PPC64_F29_REGNUM,
> + PPC64_F30_REGNUM,
> + PPC64_F31_REGNUM,
> +
> + PPC64_PC_REGNUM = 64,
> + PPC64_MSR_REGNUM = 65,
> + PPC64_CR_REGNUM = 66,
> + PPC64_LR_REGNUM = 67,
> + PPC64_CTR_REGNUM = 68,
> + PPC64_XER_REGNUM = 69,
> + PPC64_FPSCR_REGNUM = 70,
> +
> + PPC64_VR0_REGNUM = 106,
> + PPC64_VR1_REGNUM,
> + PPC64_VR2_REGNUM,
> + PPC64_VR3_REGNUM,
> + PPC64_VR4_REGNUM,
> + PPC64_VR5_REGNUM,
> + PPC64_VR6_REGNUM,
> + PPC64_VR7_REGNUM,
> + PPC64_VR8_REGNUM,
> + PPC64_VR9_REGNUM,
> + PPC64_VR10_REGNUM,
> + PPC64_VR11_REGNUM,
> + PPC64_VR12_REGNUM,
> + PPC64_VR13_REGNUM,
> + PPC64_VR14_REGNUM,
> + PPC64_VR15_REGNUM,
> + PPC64_VR16_REGNUM,
> + PPC64_VR17_REGNUM,
> + PPC64_VR18_REGNUM,
> + PPC64_VR19_REGNUM,
> + PPC64_VR20_REGNUM,
> + PPC64_VR21_REGNUM,
> + PPC64_VR22_REGNUM,
> + PPC64_VR23_REGNUM,
> + PPC64_VR24_REGNUM,
> + PPC64_VR25_REGNUM,
> + PPC64_VR26_REGNUM,
> + PPC64_VR27_REGNUM,
> + PPC64_VR28_REGNUM,
> + PPC64_VR29_REGNUM,
> + PPC64_VR30_REGNUM,
> + PPC64_VR31_REGNUM,
> +
> + PPC64_VSCR_REGNUM = 138,
> + PPC64_VRSAVE_REGNU = 139
> +};
> +
> #endif /* !GDB_COMMON */
> diff --git a/kernel.c b/kernel.c
> index 6dcf414693e6..52b7ba09f390 100644
> --- a/kernel.c
> +++ b/kernel.c
> @@ -3533,6 +3533,39 @@ get_lkcd_regs(struct bt_info *bt, ulong *eip, ulong *esp)
> machdep->get_stack_frame(bt, eip, esp);
> }
>
> +void
> +get_dumpfile_regs(struct bt_info *bt, ulong *eip, ulong *esp)
> +{
> + bt->flags |= BT_NO_PRINT_REGS;
> +
> + if (NETDUMP_DUMPFILE())
> + get_netdump_regs(bt, eip, esp);
> + else if (KDUMP_DUMPFILE())
> + get_kdump_regs(bt, eip, esp);
> + else if (DISKDUMP_DUMPFILE())
> + get_diskdump_regs(bt, eip, esp);
> + else if (KVMDUMP_DUMPFILE())
> + get_kvmdump_regs(bt, eip, esp);
> + else if (LKCD_DUMPFILE())
> + get_lkcd_regs(bt, eip, esp);
> + else if (XENDUMP_DUMPFILE())
> + get_xendump_regs(bt, eip, esp);
> + else if (SADUMP_DUMPFILE())
> + get_sadump_regs(bt, eip, esp);
> + else if (VMSS_DUMPFILE())
> + get_vmware_vmss_regs(bt, eip, esp);
> + else if (REMOTE_PAUSED()) {
> + if (!is_task_active(bt->task) || !get_remote_regs(bt, eip, esp))
> + machdep->get_stack_frame(bt, eip, esp);
> + } else
> + machdep->get_stack_frame(bt, eip, esp);
> +
> + bt->flags &= ~BT_NO_PRINT_REGS;
> +
> + bt->instptr = *eip;
> + bt->stkptr = *esp;
> +}
> +
>
> /*
> * Store the head of the kernel module list for future use.
> diff --git a/ppc64.c b/ppc64.c
> index e8930a139e0d..ea4821d86b9e 100644
> --- a/ppc64.c
> +++ b/ppc64.c
> @@ -55,6 +55,8 @@ static void ppc64_set_bt_emergency_stack(enum emergency_stack_type type,
> static char * ppc64_check_eframe(struct ppc64_pt_regs *);
> static void ppc64_print_eframe(char *, struct ppc64_pt_regs *,
> struct bt_info *);
> +static int ppc64_get_cpu_reg(int cpu, int regno, const char *name, int size,
> + void *value);
> static void parse_cmdline_args(void);
> static int ppc64_paca_percpu_offset_init(int);
> static void ppc64_init_cpu_info(void);
> @@ -704,6 +706,8 @@ ppc64_init(int when)
> error(FATAL, "cannot malloc hwirqstack buffer space.");
> }
>
> + machdep->get_cpu_reg = ppc64_get_cpu_reg;
> +
> ppc64_init_paca_info();
>
> if (!machdep->hz) {
> @@ -2501,6 +2505,99 @@ ppc64_print_eframe(char *efrm_str, struct ppc64_pt_regs *regs,
> ppc64_print_nip_lr(regs, 1);
> }
>
> +static int
> +ppc64_get_cpu_reg(int cpu, int regno, const char *name, int size,
> + void *value)
> +{
> + struct bt_info bt_info, bt_setup;
> + struct task_context *tc;
> + struct ppc64_pt_regs *pt_regs;
> + ulong ip, sp;
> +
> + if (LIVE()) {
> + /* doesn't support reading registers in live dump */
> + return FALSE;
> + }
> +
> + /* Currently only handling registers available in ppc64_pt_regs:
> + *
> + * 0-31: r0-r31
> + * 64: pc/nip
> + * 65: msr
> + *
> + * 67: lr
> + * 68: ctr
> + */
> + switch (regno) {
> + case PPC64_R0_REGNUM ... PPC64_R31_REGNUM:
> +
> + case PPC64_PC_REGNUM:
> + case PPC64_MSR_REGNUM:
> + case PPC64_LR_REGNUM:
> + case PPC64_CTR_REGNUM:
> + break;
> +
> + default:
> + // return false if we can't get that register
> + if (CRASHDEBUG(1))
> + error(WARNING, "unsupported register, regno=%d\n", regno);
> + return FALSE;
> + }
> +
> + tc = CURRENT_CONTEXT();
> + BZERO(&bt_setup, sizeof(struct bt_info));
> + clone_bt_info(&bt_setup, &bt_info, tc);
> + fill_stackbuf(&bt_info);
> +
> + // reusing the get_dumpfile_regs function to get pt regs structure
> + get_dumpfile_regs(&bt_info, &sp, &ip);
> + pt_regs = (struct ppc64_pt_regs *)bt_info.machdep;
> +
> + if (!pt_regs) {
> + error(WARNING, "pt_regs not available for cpu %d\n", cpu);
> + return FALSE;
> + }
> +
> + switch (regno) {
> + case PPC64_R0_REGNUM ... PPC64_R31_REGNUM:
> + if (size != sizeof(pt_regs->gpr[regno]))
> + return FALSE; // size mismatch
> +
> + memcpy(value, &pt_regs->gpr[regno], size);
> + break;
> +
> + case PPC64_PC_REGNUM:
> + if (size != sizeof(pt_regs->nip))
> + return FALSE; // size mismatch
> +
> + memcpy(value, &pt_regs->nip, size);
> + break;
> +
> + case PPC64_MSR_REGNUM:
> + if (size != sizeof(pt_regs->msr))
> + return FALSE; // size mismatch
> +
> + memcpy(value, &pt_regs->msr, size);
> + break;
> +
> + case PPC64_LR_REGNUM:
> + if (size != sizeof(pt_regs->link))
> + return FALSE; // size mismatch
> +
> + memcpy(value, &pt_regs->link, size);
> + break;
> +
> + case PPC64_CTR_REGNUM:
> + if (size != sizeof(pt_regs->ctr))
> + return FALSE; // size mismatch
> +
> + memcpy(value, &pt_regs->ctr, size);
> + break;
> + }
> +
> + return TRUE;
> +}
> +
> /*
> * For vmcore typically saved with KDump or FADump, get SP and IP values
> * from the saved ptregs.
> @@ -2613,9 +2710,11 @@ ppc64_get_dumpfile_stack_frame(struct bt_info *bt_in, ulong *nip, ulong *ksp)
> pt_regs = (struct ppc64_pt_regs *)bt->machdep;
> ur_nip = pt_regs->nip;
> ur_ksp = pt_regs->gpr[1];
> - /* Print the collected regs for panic task. */
> - ppc64_print_regs(pt_regs);
> - ppc64_print_nip_lr(pt_regs, 1);
> + if (!(bt->flags & BT_NO_PRINT_REGS)) {
> + /* Print the collected regs for panic task. */
> + ppc64_print_regs(pt_regs);
> + ppc64_print_nip_lr(pt_regs, 1);
> + }
> } else if ((pc->flags & KDUMP) ||
> ((pc->flags & DISKDUMP) &&
> (*diskdump_flags & KDUMP_CMPRS_LOCAL))) {
> -- 2.43.0
10 months, 3 weeks