From: Dave Anderson <anderson@redhat.com>
To: paawan oza <paawan1982@yahoo.com>
Cc: "Discussion list for crash utility usage, maintenance and development" <crash-utility@redhat.com>
Sent: Friday, 1 March 2013 10:49 PM
Subject: Re: [Crash-utility] timer: invalid list entry: 1
----- Original Message -----
> I would give some more info.
>
> It is dual core system. (ARM)
> both core are stuck at wfi (wait for interrupt)
> and we observe that the timer counter has one much ahead than the comparators.
> so we never get a local timer interrupt, and nobody is there to wake the cpu up.
>
> so we observe the freeze.
>
> Regards,
> Oza.
I don't know much about the ARM architecture, and the only sample
SMP ARM dumpfile I have on hand shows the non-panicking cpu blocked
in default_idle(). So I
don't understand how "wfi" would come
into play.
What does "bt -a" show?
>
> some more info:
> I am debugging crash utility with gdb, and getting following stack trace.
>
> crash> timer
> TVEC_BASES[0]: c0a419c0
> JIFFIES
> 4297762
> EXPIRES TIMER_LIST FUNCTION
> 128 c1621ea8 c007260c <idle_worker_timeout>
> 30208 c0b81f04 c04e4244 <inet_frag_secret_rebuild>
> 30720 c0b7f264 c0461440 <flow_cache_new_hashrnd>
> 30840 dba2be04 c0068ebc <process_timeout>
> 38228 dbae5e04 c0068ebc <process_timeout>
> 11796480 c097cb64 c0010aa4 <sched_clock_poll>
> 4294937694 c0a6f118 c026f820 <rx_timeout_handler>
> 4294945658 c16238fc c007412c <delayed_work_timer_fn>
> 4294945667 d811be14 c0068ebc <process_timeout>
> 4294945700 c16237cc c007412c <delayed_work_timer_fn>
>
4294945700 c16236e0 c007412c <delayed_work_timer_fn>
> 4294946020 c0a1dcbc c007412c <delayed_work_timer_fn>
> 4294946029 dca8f884 c007412c <delayed_work_timer_fn>
> 4294946504 c0b871c4 c007412c <delayed_work_timer_fn>
> 4294950720 c0b81d6c c007412c <delayed_work_timer_fn>
>
> Breakpoint 2, do_list (ld=0xff961c78) at tools.c:3507
> 3507 error(INFO, "\ninvalid list entry: %lx\n", next);
> (gdb) bt
> #0 do_list (ld=0xff961c78) at tools.c:3507
> #1 0x0811de03 in do_timer_list (vec_kvaddr=3699761524, size=256,
> vec=0x85c9f40, option=0x0, highest=0x0, tv=0xff962ec4) at
> kernel.c:6983
> #2 0x0811c9d3 in dump_timer_data_tvec_bases_v2 () at kernel.c:6678
> #3 0x0811afac in dump_timer_data () at kernel.c:6370
> #4 0x0811af8a in cmd_timer () at kernel.c:6329
> #5 0x080910a1 in exec_command () at main.c:818
> #6 0x08090ec7 in
main_loop () at main.c:766
> #7 0x081bf35a in current_interp_command_loop ()
> #8 0x081bfbcf in captured_command_loop ()
> #9 0x081beddc in catch_errors ()
> #10 0x081c0a9a in captured_main ()
> #11 0x081beddc in catch_errors ()
> #12 0x081c0adc in gdb_main ()
> #13 0x081c0b29 in gdb_main_entry ()
> #14 0x08121590 in gdb_main_loop (argc=2, argv=0xff964014) at gdb_interface.c:76
> #15 0x08090c01 in main (argc=3, argv=0xff964014) at main.c:671
>
> here exactly I hit invalid entry.
Right, I understand where the error message came from.
The crash utility's do_list() function is simply reporting what
it sees in the list_head-type linked list that it was following.
I have only seen these types of timer command errors in
vmcores that were generated with the "snap.so" extension
module, or when running the command on a live system.
And both of those
scenarios make perfect sense because the
underlying kernel was running/modifying the timer-related
data structures while the memory was being copied.
Presuming that the crash was taken with kdump, you would
typically expect that the timer data structures would
be stable.
Dave