Daniel Li wrote:
Hey Dave,
When you said this was something you never saw before, did you mean you
never tried to use crash on a dump of a SLES9 guest with the nonstandard
ELF format, or that this scenario was working for you, thus you didn't
see this type of error message?
Well, both actually, because I've never seen a SLES9 guest dumpfile.
But what I meant with respect to your specific problem was that
I've never seen a situation where a task_struct's contents seemingly
contain both bogus and good data.
If the answer happens to be the first one, do you have any plan to
support SLES guest dumps? (with the new ELF format you incorporated in
the first half of this year to get crash working with Red Hat guest dumps)
The support I put in for the new xen-ELF format guest dumps was
based solely upon upstream xen-ELF dumpfiles I received from the
developers at the time. So it had nothing to do with Red Hat
guest dumps, since the initial RHEL5 release still uses the old
"xendump" format for guest dumps.
In the upcoming RHEL5.1 update release, the upstream xen-ELF dumpfile
code was backported, and dumpfiles created from that environment
"just worked" without any Red Hat specific changes.
So, no, I don't have any plans (personally) to do anything for
SLES guest dumps, but as always, am ready and willing to accept
any fixes that can make it happen.
Since your guest dumpfile is of 2.6.5 vintage, and seems to
"sort of work" -- except for the task_struct issue -- I'm
wondering if this may be a fairly trivial issue to fix.
I don't know for sure, but I'm presuming that LKCD dumps of
2.6.5 SUSE kernels have been readable by the crash utility?
Somebody else will have to confirm that.
What I don't understand is that the "current" swapper task
at ffffffff803d2800 shows a correct "RU" state and points
to a backtrace-able kernel stack -- but its VSZ, RSS and COMM
fields are bogus:
crash> ps
   PID    PPID  CPU       TASK        ST  %MEM            VSZ            RSS  COMM
      0      0   0  ffffffff803d2800  RU   0.0  4399648058624  4389578663200  [<80>^L]
      0      0   0  ffffffff803d2808  ??   0.0              0              0  [swapper]
But then it shows another "swapper" task at ffffffff803d2808, i.e.,
8 bytes beyond that, which by address (and since it's a single cpu)
obviously makes no sense. However, the task_struct at ffffffff803d2808
matches up with the "comm[16]" field to show "swapper". And then all
of the subsequent processes' task_struct addresses also end with an "8",
and show the same symptom.
It almost seems that task_init() is picking the wrong function
to assign to tt->refresh_task_table(). But even if it is, I
still can't explain the task_struct-printing issue, unless the
vmlinux.dbg file doesn't match the dumped kernel or something
like that?
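
To make the mismatch idea concrete, here is a minimal Python sketch of how
a "comm" offset that is 8 bytes too small could produce this symptom. The
layout, the offsets and the filler bytes are all made up for illustration;
they are not taken from the real 2.6.5 task_struct or from this dumpfile,
and the sketch does not claim that this is what actually happened.

```python
import struct

# Hypothetical toy layout, for illustration only. The real task_struct is
# far larger, and these offsets are assumptions, not values from the dump.
COMM_LEN = 16            # comm[16], as mentioned in the message
REAL_COMM_OFFSET = 24    # where the dumped kernel keeps comm (assumed)
DEBUG_COMM_OFFSET = 16   # what mismatched debuginfo claims (assumed)

def build_task(comm: bytes) -> bytes:
    """Pack a toy task: 8-byte state, 16 bytes of other data, then comm."""
    state = struct.pack("<q", 0)                       # 0 == running
    other = b"\x00" * 8 + b"\x80\x0c" + b"\x00" * 6    # non-text field data
    return state + other + comm.ljust(COMM_LEN, b"\x00")

def read_comm(memory: bytes, base: int, comm_offset: int) -> bytes:
    """Read a NUL-terminated comm string at base + comm_offset."""
    start = base + comm_offset
    return memory[start:start + COMM_LEN].split(b"\x00", 1)[0]

memory = build_task(b"swapper")

# The two task addresses reported by ps differ by exactly 8 bytes.
delta = 0xffffffff803d2808 - 0xffffffff803d2800

# Correct base, wrong offset: comm comes out as leftover field bytes.
wrong = read_comm(memory, 0, DEBUG_COMM_OFFSET)

# Base shifted by the same 8 bytes, wrong offset: comm lines up again.
shifted = read_comm(memory, delta, DEBUG_COMM_OFFSET)

# Correct base, correct offset: what a matching vmlinux would give.
right = read_comm(memory, 0, REAL_COMM_OFFSET)

print(delta, wrong, shifted, right)
```

In the toy, reading at the correct base with the short offset returns
garbage, while reading 8 bytes later with the same short offset happens
to land on "swapper". Checking the comm offset reported by the debuginfo
against the kernel source that built the dumped kernel would be one way
to confirm or rule this out.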
Dave