On Wed, 2010-02-10 at 14:01 -0500, Dave Anderson wrote:
----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:
> > > It shows all swapper tasks (online and offline), but I get errors for
> > > the backtrace for the offline CPUs.
> >
> > What kind of errors?
>
> The problem is that for the offline swapper tasks
> s390x_get_stack_frame() is called. In that function I check with
> s390x_has_cpu() if the task is currently running on a CPU. Because of
> the missing CPU online check, s390x_has_cpu() returns TRUE. Therefore I
> try to read the CPU registers from the lowcore of that CPU. The lowcore
> pointer is zero, because the CPU is offline. Therefore the read stack
> pointer (register 15) is wrong and the backtrace fails.
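
A minimal sketch of the failure described above, not the actual crash code: `struct fake_lowcore` and `read_stack_pointer()` are hypothetical names, and the real s390x lowcore layout is far larger. It only illustrates that a zero lowcore pointer has to be detected before register 15 is read from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_REGS 16

/* Hypothetical, simplified lowcore: only a general register save area. */
struct fake_lowcore {
    uint64_t gprs[NR_REGS];
};

/*
 * Fetch register 15 (the stack pointer) from a CPU's lowcore.
 * Returns false when there is no lowcore to read from, which is the
 * offline-CPU case: the lowcore pointer is zero.
 */
static bool read_stack_pointer(const struct fake_lowcore *lc, uint64_t *sp)
{
    assert(sp != NULL);
    if (lc == NULL)
        return false;
    *sp = lc->gprs[15];
    return true;
}
```

Without the NULL test, the value stored into `sp` would come from whatever lies at offset zero, so any backtrace started from it would be meaningless.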
>
> > >
> > > The attached patch would solve the problem (and eliminate most of the
> > > probably redundant s390(x)_has_cpu() function).
> >
> > I don't see what's being solved by the patch (not the s390x_get_smp_cpus
> > parts) -- does the "old" s390x_has_cpu() fail?
>
> The old s390x_has_cpu() returns TRUE for the offline swapper tasks. And
> I think that this is wrong.
Hmmm... To me, it is TRUE, i.e., the existing-but-idle swapper task for
an offline cpu actually *does* own that cpu.
And that's why I was wondering about what error message gets shown.
>
> The new implementation of s390x_has_cpu() should return TRUE if the task
> is running on an online CPU and FALSE otherwise:
>
> +       if (is_task_active(bt->task) && (kt->cpu_flags[cpu] & ONLINE))
> +               return TRUE;
> +       else
> +               return FALSE;
This is probably OK, although I am slightly hesitant about throwing out all
of the old backwards-compatibility code in the s390[x]_has_cpu() functions.

Why? The "is_task_active()" function must also work on all supported
kernel levels. Otherwise crash would probably fail in other
s390-independent functions, wouldn't it? Of course, we could also keep my
old code and add the online check to the old code.
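
The second option, keeping the old check and gating it with an online test, could look roughly like the sketch below. This is an illustration under stated assumptions, not crash source: `struct cpu_view`, `CPU_ONLINE`, `legacy_has_cpu()` and `has_online_cpu()` are hypothetical stand-ins for the per-CPU flags, the ONLINE bit and the existing s390x_has_cpu() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS   4
#define CPU_ONLINE 0x1u  /* hypothetical flag bit, stands in for ONLINE */

/* Hypothetical, simplified stand-in for the per-CPU bookkeeping. */
struct cpu_view {
    unsigned int cpu_flags[MAX_CPUS];  /* CPU_ONLINE set if the CPU is online */
    long active_task[MAX_CPUS];        /* task that currently owns each CPU */
};

/* Stand-in for the old check: does this task own the given CPU? */
static bool legacy_has_cpu(const struct cpu_view *v, int cpu, long task)
{
    assert(v != NULL);
    if (cpu < 0 || cpu >= MAX_CPUS)
        return false;
    return v->active_task[cpu] == task;
}

/* Old check, unchanged, with the online test added on top. */
static bool has_online_cpu(const struct cpu_view *v, int cpu, long task)
{
    if (!legacy_has_cpu(v, cpu, task))
        return false;
    return (v->cpu_flags[cpu] & CPU_ONLINE) != 0;
}
```

With this shape the swapper task of an offline CPU still passes the old ownership test but fails the gated one, so the lowcore would not be consulted for it.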
I thought maybe it would be safer to leave well enough alone, and not
worry about any error messages from backtraces of offline cpus.
It might be even more useful that there are error messages to alert
the user that the cpu is not online?
The following shows the output of "bt -a" without the patch:

PID: 0 TASK: 18d38340 CPU: 2 COMMAND: "swapper"
bt: invalid kernel virtual address: ffffffffffffc000 type: "async_stack"

PID: 0 TASK: 18d40440 CPU: 3 COMMAND: "swapper"
bt: invalid kernel virtual address: ffffffffffffc000 type: "async_stack"
We can't leave it like that. With my patch at least we get a correct
stack backtrace:
PID: 0 TASK: 18d38340 CPU: 2 COMMAND: "swapper"
#0 [18d3feb8] ret_from_fork at 117e12
How is the output of a backtrace of offline CPUs on other architectures?
Michael