wfi (wait for interrupt) is used in the sense that we let the CPU go into an idle/dormant state when it has
nothing to do.
The timer is programmed for whichever thread is scheduled earliest, and when it fires it
wakes the CPU up.
Here we are missing the timer interrupt on both CPUs.
That means the timer counter has already gone well past the programmed compare values,
and it will never match them.
So the system freezes, as no interrupts are happening.
In that frozen state, we have a special keyboard interrupt to take a task dump and other dumps.
On the resulting ramdump, the crash utility shows:
crash> bt -a
PID: 0 TASK: c097b8b0 CPU: 0 COMMAND: "swapper/0"
bt: WARNING: cannot get stackframe for task
PID: 0 TASK: dc84ca40 CPU: 1 COMMAND: "swapper/1"
bt: WARNING: cannot get stackframe for task
and for timer:
crash> timer
TVEC_BASES[0]: c0a419c0
JIFFIES
4297762
EXPIRES TIMER_LIST FUNCTION
128 c1621ea8 c007260c <idle_worker_timeout>
30208 c0b81f04 c04e4244 <inet_frag_secret_rebuild>
30720 c0b7f264 c0461440 <flow_cache_new_hashrnd>
30840 dba2be04 c0068ebc <process_timeout>
38228 dbae5e04 c0068ebc <process_timeout>
11796480 c097cb64 c0010aa4 <sched_clock_poll>
4294937694 c0a6f118 c026f820 <rx_timeout_handler>
4294945658 c16238fc c007412c <delayed_work_timer_fn>
4294945667 d811be14 c0068ebc <process_timeout>
4294945700 c16237cc c007412c <delayed_work_timer_fn>
4294945700 c16236e0 c007412c <delayed_work_timer_fn>
4294946020 c0a1dcbc c007412c <delayed_work_timer_fn>
4294946029 dca8f884 c007412c <delayed_work_timer_fn>
4294946504 c0b871c4 c007412c <delayed_work_timer_fn>
4294950720 c0b81d6c c007412c <delayed_work_timer_fn>
timer: invalid list entry: 1
timer: ignoring faulty timer list at index 44 of timer array
timer: invalid list entry: 1
timer: ignoring faulty timer list at index 44 of timer array
TVEC_BASES[1]: dc85e000
JIFFIES
4297762
EXPIRES TIMER_LIST FUNCTION
384 c0a42ba8 c007260c <idle_worker_timeout>
4297862 dbec0dfc c007412c <delayed_work_timer_fn>
4297897 c162c6e0 c007412c <delayed_work_timer_fn>
4297962 dbec0ea0 c04a7cec <estimation_timer>
4297997 c162c7cc c007412c <delayed_work_timer_fn>
4300768 dcb36654 c007412c <delayed_work_timer_fn>
4309824 c0a20024 c0516718 <addrconf_verify>
4327762 dcaabf54 c0068ebc <process_timeout>
4327808 c162aea8 c007260c <idle_worker_timeout>
4357762 dbaa3e04 c0068ebc <process_timeout>
4357762 dbaa3e04 c0068ebc <process_timeout>
4357888 c0b83fa4 c04e4244 <inet_frag_secret_rebuild>
4357888 c0b84694 c04e4244 <inet_frag_secret_rebuild>
4357888 c0b83fa4 c04e4244 <inet_frag_secret_rebuild>
4357888 c0b84694 c04e4244 <inet_frag_secret_rebuild>
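The EXPIRES and JIFFIES columns can be compared the way the kernel compares jiffies values. On 32-bit ARM, `unsigned long` is 32 bits, so the counter wraps, and the kernel's `time_after(a, b)` macro (in `include/linux/jiffies.h`) uses a signed difference rather than a plain `>`. The sketch below re-implements that idea for 32-bit values; `time_after32` and `timer_overdue` are hypothetical names, not kernel functions. Under this arithmetic, an entry such as 4294937694 is about 4.3 million jiffies *before* 4297762, not after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is after b" for 32-bit jiffies, modeled on the kernel's
 * time_after(a, b), which tests (long)((b) - (a)) < 0. The uint32_t ->
 * int32_t conversion relies on two's-complement wrapping, as GCC and
 * Clang implement it (and as the kernel itself assumes). */
static bool time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* A timer is already past its expiry if jiffies is after the expiry value. */
static bool timer_overdue(uint32_t jiffies, uint32_t expires)
{
    return time_after32(jiffies, expires);
}
```

For example, `timer_overdue(4297762, 4294937694)` is true, while `timer_overdue(4297762, 11796480)` (the `sched_clock_poll` entry) is false because that expiry is still in the future. The sketch only reproduces the arithmetic; it does not explain why any particular timer has not run.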
Regards,
Oza.
________________________________
From: Dave Anderson <anderson(a)redhat.com>
To: paawan oza <paawan1982(a)yahoo.com>
Cc: "Discussion list for crash utility usage, maintenance and development"
<crash-utility(a)redhat.com>
Sent: Friday, 1 March 2013 10:49 PM
Subject: Re: [Crash-utility] timer: invalid list entry: 1
----- Original Message -----
I would like to give some more info.
It is a dual-core (ARM) system.
Both cores are stuck at wfi (wait for interrupt),
and we observe that the timer counter has gone well ahead of the comparators,
so we never get a local timer interrupt, and there is nothing to wake the CPU up.
So we observe the freeze.
Regards,
Oza.
I don't know much about the ARM architecture, and the only sample
SMP ARM dumpfile I have on hand shows the non-panicking cpu blocked
in default_idle(). So I don't understand how "wfi" would come
into play.
What does "bt -a" show?
Some more info:
I am debugging the crash utility with gdb and get the following stack trace.
crash> timer
TVEC_BASES[0]: c0a419c0
JIFFIES
4297762
EXPIRES TIMER_LIST FUNCTION
128 c1621ea8 c007260c <idle_worker_timeout>
30208 c0b81f04 c04e4244 <inet_frag_secret_rebuild>
30720 c0b7f264 c0461440 <flow_cache_new_hashrnd>
30840 dba2be04 c0068ebc <process_timeout>
38228 dbae5e04 c0068ebc <process_timeout>
11796480 c097cb64 c0010aa4 <sched_clock_poll>
4294937694 c0a6f118 c026f820 <rx_timeout_handler>
4294945658 c16238fc c007412c <delayed_work_timer_fn>
4294945667 d811be14 c0068ebc <process_timeout>
4294945700 c16237cc c007412c <delayed_work_timer_fn>
4294945700 c16236e0 c007412c <delayed_work_timer_fn>
4294946020 c0a1dcbc c007412c <delayed_work_timer_fn>
4294946029 dca8f884 c007412c <delayed_work_timer_fn>
4294946504 c0b871c4 c007412c <delayed_work_timer_fn>
4294950720 c0b81d6c c007412c <delayed_work_timer_fn>
Breakpoint 2, do_list (ld=0xff961c78) at tools.c:3507
3507 error(INFO, "\ninvalid list entry: %lx\n", next);
(gdb) bt
#0 do_list (ld=0xff961c78) at tools.c:3507
#1 0x0811de03 in do_timer_list (vec_kvaddr=3699761524, size=256,
vec=0x85c9f40, option=0x0, highest=0x0, tv=0xff962ec4) at kernel.c:6983
#2 0x0811c9d3 in dump_timer_data_tvec_bases_v2 () at kernel.c:6678
#3 0x0811afac in dump_timer_data () at kernel.c:6370
#4 0x0811af8a in cmd_timer () at kernel.c:6329
#5 0x080910a1 in exec_command () at main.c:818
#6 0x08090ec7 in main_loop () at main.c:766
#7 0x081bf35a in current_interp_command_loop ()
#8 0x081bfbcf in captured_command_loop ()
#9 0x081beddc in catch_errors ()
#10 0x081c0a9a in captured_main ()
#11 0x081beddc in catch_errors ()
#12 0x081c0adc in gdb_main ()
#13 0x081c0b29 in gdb_main_entry ()
#14 0x08121590 in gdb_main_loop (argc=2, argv=0xff964014) at gdb_interface.c:76
#15 0x08090c01 in main (argc=3, argv=0xff964014) at main.c:671
This is exactly where I hit the invalid entry.
Right, I understand where the error message came from.
The crash utility's do_list() function is simply reporting what
it sees in the list_head-type linked list that it was following.
I have only seen these types of timer command errors in
vmcores that were generated with the "snap.so" extension
module, or when running the command on a live system.
And both of those scenarios make perfect sense because the
underlying kernel was running/modifying the timer-related
data structures while the memory was being copied.
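The point that the list walker only reports what it reads can be illustrated with a minimal sketch of a walk over a snapshot of `list_head`-style entries. This is not crash's `do_list()`: the `dump_node` type, `lookup()`, and `walk_list()` are hypothetical, and only the message text is borrowed from the output above. A `next` value such as `1`, which does not point at any captured entry, ends the walk with an error; an inconsistent snapshot, such as one copied while the kernel was still modifying the list, can produce the same kind of failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of one list_head as read from a dump: its own
 * kernel virtual address and the value of its next pointer. */
struct dump_node {
    uint32_t addr; /* address of this list_head */
    uint32_t next; /* list_head.next as captured */
};

/* Return the captured entry at addr, or NULL if nothing was captured there. */
static const struct dump_node *lookup(const struct dump_node *nodes, size_t n,
                                      uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (nodes[i].addr == addr)
            return &nodes[i];
    return NULL;
}

/* Walk a circular list from head_addr. Returns the number of entries after
 * the head, or -1 if a next pointer is invalid or the walk fails to come back
 * to the head within n steps (a cycle that does not pass through the head). */
static int walk_list(const struct dump_node *nodes, size_t n, uint32_t head_addr)
{
    assert(nodes != NULL || n == 0);
    const struct dump_node *cur = lookup(nodes, n, head_addr);
    int count = 0;

    if (cur == NULL)
        return -1;
    for (size_t steps = 0; steps <= n; steps++) {
        uint32_t next = cur->next;
        if (next == head_addr)
            return count;              /* back at the head: list is consistent */
        cur = lookup(nodes, n, next);
        if (cur == NULL) {
            fprintf(stderr, "invalid list entry: %x\n", (unsigned)next);
            return -1;                 /* next points at nothing we captured */
        }
        count++;
    }
    return -1;                         /* too many steps: cycle not through head */
}
```

A consistent three-entry ring walks back to its head and reports two entries; corrupting one `next` to `1` makes the same walk stop with an "invalid list entry: 1" message.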
Presuming that the crash was taken with kdump, you would
typically expect that the timer data structures would
be stable.
Dave