Hi Jan,
Good catch. As for a fix, it would be more efficient if
tgid_quick_search() just returned NULL in that case. Try the
attached patch.
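Roughly speaking, the idea is the following. This is a stand-alone sketch
rather than the patch itself: the struct definitions are cut-down stand-ins,
the names ending in _sketch are made up so that it compiles outside of crash,
and the full search that follows the cache check is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in: only the field used by the cache check. */
struct tgid_context_sketch {
    unsigned long tgid;
};

/* Cut-down stand-in for the relevant parts of the task table. */
struct task_table_sketch {
    struct tgid_context_sketch *last_tgid; /* NULL until the tgid array is set up */
    unsigned long tgid_searches;
    unsigned long tgid_cache_hits;
};

/*
 * Hypothetical guarded lookup: if no cached entry exists yet, return NULL
 * instead of dereferencing it. Only the cache-hit path is modelled here.
 */
static struct tgid_context_sketch *
tgid_quick_search_sketch(struct task_table_sketch *t, unsigned long tgid)
{
    struct tgid_context_sketch *last;

    assert(t != NULL);
    t->tgid_searches++;

    last = t->last_tgid;
    if (last == NULL)
        return NULL;            /* nothing cached yet: report a miss */

    if (tgid == last->tgid) {
        t->tgid_cache_hits++;
        return last;
    }

    return NULL;                /* the real function would keep searching here */
}
```

The caller then has to treat a NULL return as "not found", which is the
assumption behind this approach.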
Thanks,
Dave
----- Original Message -----
Hi Dave
I have an ARM64 vmcore file that crashes Crash during startup. The core
file was created by a hardware watchdog (I believe), so there is no panic
message or anything similar in the log.
This is the printout from Crash running under gdb, after the copyrights and
config information:
please wait... (determining panic task)
Program received signal SIGSEGV, Segmentation fault.
0x000000000047ed40 in tgid_quick_search (tgid=5040) at memory.c:4114
4114 if (tgid == last->tgid) {
(gdb) bt
#0 0x000000000047ed40 in tgid_quick_search (tgid=5040) at memory.c:4114
#1 0x000000000047f046 in get_task_mem_usage (task=18446743799318107136, tm=0x7fffffff6f40) at memory.c:4186
#2 0x000000000047c679 in vm_area_dump (task=18446743799318107136, flag=10, vaddr=0, ref=0x0) at memory.c:3671
#3 0x000000000047ec08 in in_user_stack (task=18446743799318107136, vaddr=0) at memory.c:4063
#4 0x00000000004fd9fe in arm64_get_dumpfile_stackframe (frame=<synthetic pointer>, bt=<optimized out>) at arm64.c:1077
#5 arm64_get_stack_frame (bt=0x7fffffffc690, pcp=0x7fffffff9560, spp=0x7fffffff9568) at arm64.c:1103
#6 0x00000000004de409 in back_trace (bt=0x7fffffffc690) at kernel.c:2533
#7 0x00000000004d1563 in foreach (fd=0x7fffffffc7c0) at task.c:6161
#8 0x00000000004d2bbd in panic_search () at task.c:6425
#9 0x00000000004d4454 in get_panic_context () at task.c:5364
#10 task_init () at task.c:491
#11 0x000000000046146e in main_loop () at main.c:801
#12 0x00000000006467a3 in captured_command_loop (data=<optimized out>) at main.c:258
#13 0x000000000064535b in catch_errors (func=0x646790 <captured_command_loop>, func_args=0x0, errstring=0x873235 "", mask=6) at exceptions.c:557
#14 0x0000000000647726 in captured_main (data=<optimized out>) at main.c:1064
#15 0x000000000064535b in catch_errors (func=0x646aa0 <captured_main>, func_args=0x7fffffffe030, errstring=0x873235 "", mask=6) at exceptions.c:557
#16 0x0000000000647a84 in gdb_main (args=<optimized out>) at main.c:1079
#17 0x0000000000647abe in gdb_main_entry (argc=<optimized out>, argv=<optimized out>) at main.c:1099
#18 0x000000000045f61f in main (argc=3, argv=0x7fffffffe188) at main.c:758
(gdb) p tt->last_tgid
$1 = (struct tgid_context *) 0x0
Source code for tgid_quick_search:
static struct tgid_context *
tgid_quick_search(ulong tgid)
{
    struct tgid_context *last, *next;

    tt->tgid_searches++;
    last = tt->last_tgid;
    if (tgid == last->tgid) {
        tt->tgid_cache_hits++;
        return last;
    }
    ....
}
So 'last' is NULL, and dereferencing it in 'last->tgid' causes the crash.
After some more investigation I have seen that "tt->last_tgid" is initialized
in the function sort_tgid_array in task.c, but that function seems to be
called at a later stage.
By adding a line in tgid_quick_search:
static struct tgid_context *
tgid_quick_search(ulong tgid)
{
    struct tgid_context *last, *next;

    tt->tgid_searches++;
    if (tt->last_tgid == 0) sort_tgid_array(); // added line
    last = tt->last_tgid;
    if (tgid == last->tgid) {
        tt->tgid_cache_hits++;
        return last;
    }
    ...
I can run Crash on this core file. However, I do not know if this is the best
way to fix the problem.
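To show the pattern outside of Crash, here is a stand-alone sketch of the same
"initialize on first use" idea. Everything in it is made up for illustration:
the _sketch types and functions are not Crash code, the sort routine is only
my guess at what such an initializer might do, and a plain linear scan stands
in for the real search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct tgid_entry_sketch {
    unsigned long tgid;
};

struct tgid_cache_sketch {
    struct tgid_entry_sketch *array;
    size_t count;
    struct tgid_entry_sketch *last;  /* NULL until the array has been sorted */
    unsigned long sorts;             /* how many times the initializer ran */
};

static int
compare_tgid_sketch(const void *a, const void *b)
{
    unsigned long x = ((const struct tgid_entry_sketch *)a)->tgid;
    unsigned long y = ((const struct tgid_entry_sketch *)b)->tgid;

    return (x > y) - (x < y);
}

/* Hypothetical initializer: sort the array and point 'last' at its start. */
static void
sort_tgid_array_sketch(struct tgid_cache_sketch *c)
{
    if (c->count == 0)
        return;                      /* nothing to sort, 'last' stays NULL */
    qsort(c->array, c->count, sizeof(c->array[0]), compare_tgid_sketch);
    c->last = c->array;
    c->sorts++;
}

static struct tgid_entry_sketch *
lazy_quick_search_sketch(struct tgid_cache_sketch *c, unsigned long tgid)
{
    size_t i;

    assert(c != NULL);
    if (c->last == NULL)
        sort_tgid_array_sketch(c);   /* first use: set up the cache */
    if (c->last == NULL)
        return NULL;                 /* still empty: report a miss */

    if (tgid == c->last->tgid)
        return c->last;              /* cache hit */

    for (i = 0; i < c->count; i++) { /* linear scan stands in for the real search */
        if (c->array[i].tgid == tgid) {
            c->last = &c->array[i];
            return c->last;
        }
    }
    return NULL;
}
```

The difference from simply returning NULL is that the first lookup pays for
the sort. Whether the real sort_tgid_array() can safely run this early in
startup is exactly what I am unsure about.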
Jan
Jan Karlsson
Senior Software Engineer
System Assurance
Sony Mobile Communications
Tel: +46 703 062 174
jan.karlsson(a)sonymobile.com
sonymobile.com
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility