Dave Anderson wrote:
Daniel Li wrote:
> Dave Anderson wrote:
>
>> Daniel Li wrote:
>>
>>> It seems the problem is not one with guest dump, but the version of
>>> SLES.
>>>
>>> After upgrading my NATIVE SLES 9 system to SP 3, exactly the same
>>> problem happened while trying to use 'crash' on the live system,
>>> with a debug linux kernel ('vmlinux.dbg' below) built on the same
>>> system from matching 'kernel-source' package. (During this upgrade,
>>> the linux kernel changed from 2.6.5-7.97-smp to 2.6.5-7.244-smp, the
>>> same as that on the guest.)
>>>
>>> Has anyone else seen this?
>>
>>
>>
>> Did anything change in the task_struct between 2.6.5-7.97-smp and
>> 2.6.5-7.244-smp?
>>
>> Or, more likely, anything associated with the pidhash/pid_hash-related
>> code in the kernel?
>>
>> Is the output of the crash command "help -t | grep refresh_task_table"
>> different when running against 2.6.5-7.97-smp vs. 2.6.5-7.244-smp?
>>
>> Dave
>>
> The definition of task_struct between 2.6.5-7.97-smp and
> 2.6.5-7.244-smp did change. There is one new 8-bytes field called
> 'last_ran' before the list_head for tasks. This is what I don't get:
> why should it matter as long as the dump and debug kernel are using
> the same definition?
>
It shouldn't.
Does the output of "help -o task_struct" on the .97 vs the .244 kernels
reflect the member offset differences as you would expect? I.e.,
everything (that's not -1) coming after the new last_ran member is
bumped up by 8?
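
To make the offset check concrete, here is a rough sketch in C. The two
structs are made-up, heavily trimmed stand-ins (they are not the real
SLES task_struct layouts), and the helper name is hypothetical; the only
point is that every member after an inserted 8-byte field should move by
exactly 8, while members before it stay put.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_head_sketch { void *next, *prev; };

/* Assumed "old" layout: no last_ran member. */
struct task_old_sketch {
    long state;
    int prio;
    struct list_head_sketch tasks;
    int pid;
};

/* Assumed "new" layout: one 8-byte member added ahead of 'tasks'. */
struct task_new_sketch {
    long state;
    int prio;
    unsigned long long last_ran;
    struct list_head_sketch tasks;
    int pid;
};

/* Returns 1 if members before last_ran did not move and every member
 * after it moved by exactly 'delta' bytes, 0 otherwise. */
int later_members_shifted_by(size_t delta)
{
    if (offsetof(struct task_new_sketch, state) !=
        offsetof(struct task_old_sketch, state))
        return 0;
    if (offsetof(struct task_new_sketch, prio) !=
        offsetof(struct task_old_sketch, prio))
        return 0;
    if (offsetof(struct task_new_sketch, tasks) -
        offsetof(struct task_old_sketch, tasks) != delta)
        return 0;
    if (offsetof(struct task_new_sketch, pid) -
        offsetof(struct task_old_sketch, pid) != delta)
        return 0;
    return 1;
}
```

The same comparison can be done by hand by diffing the two
"help -o task_struct" outputs member by member.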
And are you sure there's nothing different w/respect to the pid_hash
declarations/usage?
Dave
> The output of "help -t | grep refresh_task_table" didn't change.
The reason I ask about any pid_hash-related changes is because
over the years the manner of task table handling by the crash
utility has had to change to deal with the kernel changes.
The crash-internal tt->refresh_task_table function pointer
that you see in the "help -t" output gets set during task_init()
to one of these functions:

    static void refresh_fixed_task_table(void);
    static void refresh_unlimited_task_table(void);
    static void refresh_pidhash_task_table(void);
    static void refresh_pid_hash_task_table(void);
    static void refresh_hlist_task_table(void);
    static void refresh_hlist_task_table_v2(void);

with later kernels requiring the functions further down the list.

For a 2.6.5 vintage kernel, I'm guessing that when you did
the "help -t" it showed "refresh_pid_hash_task_table()"?
Anyway, in the two kernels that you are comparing, how is the
"pid_hash" variable declared in the kernel sources? With
respect to the crash-internal setting of tt->refresh_task_table,
it should line up like so:

    kernel: static struct list_head pid_hash[PIDTYPE_MAX][PIDHASH_SIZE];
    crash:  refresh_pid_hash_task_table()

    kernel: static struct hlist_head *pid_hash[PIDTYPE_MAX];
    crash:  refresh_hlist_task_table()

    kernel: static struct hlist_head *pid_hash;
    crash:  refresh_hlist_task_table_v2()

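
The pairing above can be written down as a small lookup, sketched here
in C. This is only an illustration and is not taken from the crash
sources: the enum, its constants and both helper functions are
hypothetical names. Only the three refresh_*() function names and the
three pid_hash declarations come from the list above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for the three ways pid_hash may be declared. */
enum pid_hash_shape {
    PID_HASH_LIST_2D,     /* struct list_head pid_hash[PIDTYPE_MAX][PIDHASH_SIZE] */
    PID_HASH_HLIST_ARRAY, /* struct hlist_head *pid_hash[PIDTYPE_MAX] */
    PID_HASH_HLIST_PTR    /* struct hlist_head *pid_hash */
};

/* Name of the refresh function expected for a given declaration. */
const char *expected_refresh_function(enum pid_hash_shape shape)
{
    switch (shape) {
    case PID_HASH_LIST_2D:
        return "refresh_pid_hash_task_table";
    case PID_HASH_HLIST_ARRAY:
        return "refresh_hlist_task_table";
    case PID_HASH_HLIST_PTR:
        return "refresh_hlist_task_table_v2";
    }
    return "unknown";
}

/* Returns 1 if the name reported by "help -t" matches the expectation
 * for the declaration found in the kernel sources, 0 otherwise. */
int refresh_function_matches(enum pid_hash_shape shape, const char *reported)
{
    return strcmp(expected_refresh_function(shape), reported) == 0;
}
```

A mismatch between the declaration in the kernel sources and the name
shown by "help -t" would point at the wrong task-gathering function
being selected.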
For whatever reason it almost looks like the task-gathering is
using the wrong function, or maybe given back-ports and such,
the SUSE kernel task-handling is now a "hybrid" that would need
its own task-gathering function in the crash utility.

With respect to the "last_ran" addition, you could always rebuild
a kernel with that field moved to the end of the task_struct,
run that kernel, and see what happens. If the "ps" task output
is still screwed up, then that should rule out the new field as
the problem at hand.
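
The reasoning behind that experiment can be sketched in C as follows.
Both structs are made-up, trimmed stand-ins rather than the real
task_struct, and the helper name is hypothetical. The sketch only shows
that appending the new member at the end leaves the offset of every
pre-existing member unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node_sketch { void *next, *prev; };

/* Assumed layout without last_ran. */
struct task_before_sketch {
    long state;
    int prio;
    struct list_node_sketch tasks;
    int pid;
};

/* Same layout with last_ran appended as the final member. */
struct task_moved_sketch {
    long state;
    int prio;
    struct list_node_sketch tasks;
    int pid;
    unsigned long long last_ran;
};

/* Returns 1 if every pre-existing member keeps its old offset. */
int old_members_unmoved(void)
{
    return offsetof(struct task_moved_sketch, state) ==
               offsetof(struct task_before_sketch, state) &&
           offsetof(struct task_moved_sketch, prio) ==
               offsetof(struct task_before_sketch, prio) &&
           offsetof(struct task_moved_sketch, tasks) ==
               offsetof(struct task_before_sketch, tasks) &&
           offsetof(struct task_moved_sketch, pid) ==
               offsetof(struct task_before_sketch, pid);
}
```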
Dave