----- "Lai Jiangshan" <laijs(a)cn.fujitsu.com> wrote:
Dave Anderson wrote:
> Hello Lai,
>
> If ever there was a perfect candidate for a crash utility extension module,
> this is it. This functionality is far too subsystem-specific to be included as
> a generic command. There has not been a "new" base crash command in many
> years.
>
> Reviewing the patch, the "trace" command can easily be created as an
> extension module. The only things that need to be done are:
Your suggestion is very helpful. We accept it and are working on it now.
Thank you very much.
>
> (2) Put the "int nr_cpu_ids" variable into the ftrace.c extension
> module, where you still will have access to the global "kt"
> kernel_table pointer.
>
There is a bug on my box: crash cannot recognize the real number of cpus,
so kt->cpus is wrong. I fixed it by putting nr_cpu_ids into the kernel_table.
I'll send a separate patch for it soon.
In the current Linux kernel, nr_cpu_ids is recommended instead of the old
NR_CPUS, because with CONFIG_NR_CPUS=4096, NR_CPUS is far too big for a lot
of systems:
kmalloc(sizeof(struct foo) * NR_CPUS) ==> kmalloc(sizeof(struct foo) * nr_cpu_ids)
for (i=0; i < NR_CPUS; i++) ==> for (i=0; i < nr_cpu_ids; i++)
NR_CPUS is also 4096 in crash now, so I also suggest using nr_cpu_ids
instead of NR_CPUS in crash's code when the symbol "nr_cpu_ids"
exists.
I understand the problem with NR_CPUS usage in the kernel, but your
original patch did this:
+	if (symbol_exists("nr_cpu_ids"))
+		get_symbol_data("nr_cpu_ids", sizeof(int), &kt->nr_cpu_ids);
+	else
+		kt->nr_cpu_ids = 1;
+
+	if (kt->cpus < kt->nr_cpu_ids)
+		kt->cpus = kt->nr_cpu_ids;
+
As I understand it, the kernel's "nr_cpu_ids" is initialized to NR_CPUS,
and then later reduced to the number of "possible" cpus, neither of which
represents the number of online cpus.
The crash utility's "kt->cpus" is meant to reflect the number of actual
cpus that are online. It almost always is less than NR_CPUS and/or the
number of "possible" cpus -- only if the number of online cpus is actually
equal to the number of possible cpus would they ever be the same. So the
setting of "kt->cpus = kt->nr_cpu_ids" above cannot be the correct thing
to do.
Now, there may be another bug w/respect to your box such that the crash
utility cannot determine the number of cpus. That determination is done
differently by the supported processors -- I'd be interested in exactly
what the bug in your machine is.
Thanks,
Dave