----- Original Message -----
Here is some more information.
It is a dual-core ARM system.
Both cores are stuck at wfi (wait for interrupt),
and we observe that the timer counter has gone far ahead of the comparators.
So we never get a local timer interrupt, and nothing is left to wake the CPUs up.
That is why we observe the freeze.
Regards,
Oza.
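As an illustration of the symptom described above, here is a minimal C sketch of why a comparator that has fallen behind a free-running counter may not produce an interrupt for a very long time. It assumes a hypothetical 32-bit up-counter whose comparator fires only on an exact match; the thread does not identify the actual timer hardware, so the counter width, the match rule, and the name ticks_until_match are all illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model (not taken from any specific ARM timer block):
 * a free-running 32-bit up-counter and a comparator that raises an
 * interrupt only when counter == comparator.
 *
 * Returns how many ticks must elapse before the next match.
 */
uint32_t ticks_until_match(uint32_t counter, uint32_t comparator)
{
    /* Unsigned arithmetic wraps modulo 2^32, so a comparator that is
     * "behind" the counter yields a distance close to a full wrap. */
    return (uint32_t)(comparator - counter);
}

/* Usage example covering both cases. */
void ticks_until_match_example(void)
{
    /* Comparator ahead of the counter: match after 100 ticks. */
    assert(ticks_until_match(900u, 1000u) == 100u);

    /* Comparator 100 ticks behind: no match until the counter wraps. */
    assert(ticks_until_match(1000u, 900u) == UINT32_MAX - 99u);
}
```

Under these assumptions, a CPU sitting in wfi with the comparator behind the counter would see no timer interrupt until the counter wrapped, which is consistent with the freeze reported here but is not a confirmed diagnosis.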
I don't know much about the ARM architecture, and the only sample
SMP ARM dumpfile I have on hand shows the non-panicking cpu blocked
in default_idle(). So I don't understand how "wfi" would come
into play.
What does "bt -a" show?
Some more info:
I am debugging the crash utility with gdb, and I get the following stack trace.
crash> timer
TVEC_BASES[0]: c0a419c0
  JIFFIES
  4297762
   EXPIRES  TIMER_LIST  FUNCTION
       128  c1621ea8    c007260c  <idle_worker_timeout>
     30208  c0b81f04    c04e4244  <inet_frag_secret_rebuild>
     30720  c0b7f264    c0461440  <flow_cache_new_hashrnd>
     30840  dba2be04    c0068ebc  <process_timeout>
     38228  dbae5e04    c0068ebc  <process_timeout>
  11796480  c097cb64    c0010aa4  <sched_clock_poll>
4294937694  c0a6f118    c026f820  <rx_timeout_handler>
4294945658  c16238fc    c007412c  <delayed_work_timer_fn>
4294945667  d811be14    c0068ebc  <process_timeout>
4294945700  c16237cc    c007412c  <delayed_work_timer_fn>
4294945700  c16236e0    c007412c  <delayed_work_timer_fn>
4294946020  c0a1dcbc    c007412c  <delayed_work_timer_fn>
4294946029  dca8f884    c007412c  <delayed_work_timer_fn>
4294946504  c0b871c4    c007412c  <delayed_work_timer_fn>
4294950720  c0b81d6c    c007412c  <delayed_work_timer_fn>
Breakpoint 2, do_list (ld=0xff961c78) at tools.c:3507
3507 error(INFO, "\ninvalid list entry: %lx\n", next);
(gdb) bt
#0 do_list (ld=0xff961c78) at tools.c:3507
#1 0x0811de03 in do_timer_list (vec_kvaddr=3699761524, size=256,
    vec=0x85c9f40, option=0x0, highest=0x0, tv=0xff962ec4)
    at kernel.c:6983
#2 0x0811c9d3 in dump_timer_data_tvec_bases_v2 () at kernel.c:6678
#3 0x0811afac in dump_timer_data () at kernel.c:6370
#4 0x0811af8a in cmd_timer () at kernel.c:6329
#5 0x080910a1 in exec_command () at main.c:818
#6 0x08090ec7 in main_loop () at main.c:766
#7 0x081bf35a in current_interp_command_loop ()
#8 0x081bfbcf in captured_command_loop ()
#9 0x081beddc in catch_errors ()
#10 0x081c0a9a in captured_main ()
#11 0x081beddc in catch_errors ()
#12 0x081c0adc in gdb_main ()
#13 0x081c0b29 in gdb_main_entry ()
#14 0x08121590 in gdb_main_loop (argc=2, argv=0xff964014) at gdb_interface.c:76
#15 0x08090c01 in main (argc=3, argv=0xff964014) at main.c:671
This is exactly where I hit the invalid list entry.
Right, I understand where the error message came from.
The crash utility's do_list() function is simply reporting what
it sees in the list_head-type linked list that it was following.
I have only seen these types of timer command errors in
vmcores that were generated with the "snap.so" extension
module, or when running the command on a live system.
And both of those scenarios make perfect sense because the
underlying kernel was running/modifying the timer-related
data structures while the memory was being copied.
Presuming that the crash was taken with kdump, you would
typically expect that the timer data structures would
be stable.
Dave
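To make the point about following a list_head-type linked list concrete, here is a minimal C sketch of a walker that reports an invalid entry when the links stop making sense, as they can when the list is modified while memory is being copied. It is not the crash utility's do_list() implementation; struct list_node, node_is_plausible(), walk_list(), and the status codes are hypothetical names, and the plausibility test is a simple stand-in for whatever address validation a dump analyzer would actually perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

enum walk_status {
    WALK_OK = 0,            /* returned to the head cleanly      */
    WALK_INVALID_ENTRY = 1, /* a link looked wrong; walk aborted */
    WALK_TOO_LONG = 2       /* exceeded max_entries (likely loop) */
};

/* Assumption: a real tool would check that the address is readable in
 * the dump. Here only NULL and misaligned pointers are rejected. */
static int node_is_plausible(const struct list_node *n)
{
    return n != NULL && ((uintptr_t)n % sizeof(void *)) == 0;
}

/* Follow next pointers from head until the walk returns to head.
 * *count receives the number of entries accepted before stopping. */
static enum walk_status walk_list(const struct list_node *head,
                                  size_t max_entries, size_t *count)
{
    assert(head != NULL && count != NULL);

    const struct list_node *prev = head;
    const struct list_node *cur = head->next;

    *count = 0;
    while (cur != head) {
        /* Reject entries that cannot be a list node at all. */
        if (!node_is_plausible(cur))
            return WALK_INVALID_ENTRY;
        /* In a consistent list, each node points back to the one
         * we just came from. */
        if (cur->prev != prev)
            return WALK_INVALID_ENTRY;
        /* Guard against a corrupted list that never returns to head. */
        if (*count >= max_entries)
            return WALK_TOO_LONG;
        (*count)++;
        prev = cur;
        cur = cur->next;
    }
    return WALK_OK;
}
```

A walker like this only reports what it finds in the copied memory: if a next pointer was captured mid-update, the walk stops with an invalid entry even though the running kernel never saw an inconsistent list.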