----- "Lai Jiangshan" <laijs(a)cn.fujitsu.com> wrote:
 Dave Anderson wrote:
 > Hello Lai,
 >
 > If ever there was a perfect candidate for a crash utility extension module,
 > this is it.  This functionality is far too subsystem-specific to be included
 > as a generic command.  There has not been a "new" base crash command in many
 > years.
 >
 > Reviewing the patch, the "trace" command can easily be created as an
 > extension module.  The only things that need to be done are:
 Your suggestion is very helpful. We accept it. We're doing it now.
 Thank you very much.
 >  
 >   (2) Put the "int nr_cpu_ids" variable into the ftrace.c extension
 >       module, where you still will have access to the global "kt"
 >       kernel_table pointer.
 >
 There is a bug on my box: crash cannot determine the real number of cpus,
 so kt->cpus is wrong. So I fixed it and put nr_cpu_ids in the kernel_table.
 I'll send a separate patch for it soon.
 In the current Linux kernel, nr_cpu_ids is recommended instead of the
 old NR_CPUS, because with CONFIG_NR_CPUS=4096 it is too big for a lot of
 systems.
 kmalloc(sizeof(struct foo) * NR_CPUS) ==> kmalloc(sizeof(struct foo) * nr_cpu_ids)
 for (i=0; i < NR_CPUS; i++) ==> for (i=0; i < nr_cpu_ids; i++)
 NR_CPUS is also 4096 in crash now, so I also suggest using nr_cpu_ids
 instead of NR_CPUS in crash's code when the symbol "nr_cpu_ids"
 exists. 
I understand the problem with NR_CPUS usage in the kernel, but your
original patch did this:
+       if (symbol_exists("nr_cpu_ids"))
+               get_symbol_data("nr_cpu_ids", sizeof(int), &kt->nr_cpu_ids);
+       else
+               kt->nr_cpu_ids = 1;
+
+       if (kt->cpus < kt->nr_cpu_ids)
+               kt->cpus = kt->nr_cpu_ids;
+
As I understand it, the kernel's "nr_cpu_ids" is initialized to NR_CPUS,
and then later reduced to the number of "possible" cpus, neither of which
represents the number of online cpus.
The crash utility's "kt->cpus" is meant to reflect the number of actual
cpus that are online.  It almost always is less than NR_CPUS and/or the
number of "possible" cpus -- only if the number of online cpus is actually
equal to the number of possible cpus would they ever be the same.  So the
setting of "kt->cpus = kt->nr_cpu_ids" above cannot be the correct thing
to do.
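To illustrate the distinction, here is a minimal standalone sketch in C. It is
not crash utility code: it assumes cpu sets can be modeled as plain bitmasks,
and count_cpus() and cpus_after_patch() are hypothetical helpers invented for
this example. The second helper only mirrors the last two lines of the hunk
quoted above.

```c
#include <limits.h>

/* Hypothetical model: a cpu set is an array of unsigned long, one bit per cpu id. */
#define BITS_PER_ULONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Count how many of the first nr_ids cpu ids are set in mask. */
int count_cpus(const unsigned long *mask, int nr_ids)
{
	int i, count = 0;

	for (i = 0; i < nr_ids; i++)
		if (mask[i / BITS_PER_ULONG] & (1UL << (i % BITS_PER_ULONG)))
			count++;
	return count;
}

/* Mirrors the quoted hunk: cpus is raised to nr_cpu_ids when it is smaller. */
int cpus_after_patch(int cpus, int nr_cpu_ids)
{
	return (cpus < nr_cpu_ids) ? nr_cpu_ids : cpus;
}
```

With 8 possible cpus (mask 0xff) and 2 online cpus (mask 0x03), count_cpus()
gives 8 and 2 respectively, while cpus_after_patch(2, 8) returns 8, so the
online count is overwritten by the possible count.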
Now, there may be another bug with respect to your box such that the crash
utility cannot determine the number of cpus.  That determination is done
differently by the supported processors -- I'd be interested in exactly
what the bug in your machine is.
Thanks,
  Dave