Bob Montgomery wrote:
On Thu, 2007-03-29 at 08:13 -0500, Dave Anderson wrote:
> Ken'ichi Ohmichi wrote:
> > I checked whether this change is correct by the following:
> > (The following patches are attached with this mail)
> > - makedumpfile-1.1.2 with "point_same_zero_page2.patch" creates a dumpfile.
> > - crash-4.0-3.21 with "not-access-excluded-page.patch" analyzes the dumpfile.
> > - The analysis result of the dumpfile is compared with /proc/vmcore's.
> >
> > And on i386 linux-2.6.19, I found a difference between the result
> > of the dumpfile (excluding free pages) and /proc/vmcore's with the
> > subcommand "foreach bt".
> > But with crash-4.0-3.21 without "not-access-excluded-page.patch",
> > there is no difference. In short, the difference arises because the
> > excluded pages are treated as inaccessible pages.
Just to clarify for those who probably aren't as confused as I was at
first:
This isn't a test of the zero page trick, because with the changes to
makedumpfile, zero pages are no longer actually excluded. (I read
"excluding free pages" but immediately thought "excluding zero pages"
and spent more than a few minutes checking how that could possibly have
happened.)
So this is apparently a case where a page excluded because it was
supposedly free is then maybe accessed by the back tracer while it might
be trying to read kernel text, right? But kernel text should never look
free, so I'm still puzzled. Did makedumpfile mis-identify a real page
as free, or is crash asking for pages it shouldn't be looking at during
backtrace?
No -- it's kernel text that was marked as __init, so the page containing
it got freed and reallocated as a page that was purposely excluded.
The page originally contained the "start_kernel" __init function, which
only gets executed once by the first swapper thread.
The problem is that crash shouldn't be looking at that text location
when doing a backtrace on that PID 0, because it should have
stopped the trace as soon as it saw the "cpu_idle" stack reference.
I don't know why it's doing that -- I tried simulating Ken'ichi's vmcore
by forcibly returning an error if readmem() got a request for the
page originally containing "start_kernel", but the backtrace worked
OK -- even though I could see the "start_kernel" reference on
the stack when using "bt -t".
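To make that simulation concrete, here is a minimal toy model in Python. It is
not crash's actual code: read_mem(), backtrace(), ExcludedPageError, the frame
order and all addresses are hypothetical, chosen only to illustrate the idea.
A reader fails on any excluded page, and a tracer that stops at "cpu_idle"
never touches the page that once held "start_kernel".

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ExcludedPageError(Exception):
    """Raised when a read touches a page left out of the dumpfile."""


def read_mem(addr, excluded_pfns):
    """Toy stand-in for a dump reader: fail if addr lies in an excluded page."""
    pfn = addr // PAGE_SIZE
    if pfn in excluded_pfns:
        raise ExcludedPageError(f"page {pfn:#x} is excluded")
    return b"\x00" * 8  # placeholder contents; a real reader returns dump data


def backtrace(stack_symbols, symbol_addrs, excluded_pfns, stop_at="cpu_idle"):
    """Read the text behind each stack reference, stopping after stop_at."""
    frames = []
    for sym in stack_symbols:
        read_mem(symbol_addrs[sym], excluded_pfns)  # may raise
        frames.append(sym)
        if sym == stop_at:
            break  # never look past the idle loop
    return frames


# Hypothetical text references on the PID 0 stack, innermost first.
stack = ["schedule", "cpu_idle", "start_kernel"]
addrs = {
    "schedule": 0xC0100000,
    "cpu_idle": 0xC0101000,
    "start_kernel": 0xC0400000,  # __init text, page later freed and excluded
}
excluded = {addrs["start_kernel"] // PAGE_SIZE}

frames = backtrace(stack, addrs, excluded)

# Without the stop condition the tracer reads the excluded page and fails.
try:
    backtrace(stack, addrs, excluded, stop_at=None)
    hit_excluded = False
except ExcludedPageError:
    hit_excluded = True
```

In this model the first call returns only the "schedule" and "cpu_idle"
frames, while the second call raises on the "start_kernel" page. That
matches the behaviour I would expect: the excluded page only matters if the
trace runs past "cpu_idle".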
Anyway, that's why I've asked Ken'ichi if he can make his
vmlinux/vmcore pair available for me to debug.
Thanks,
Dave
Anyway, with my test dump on my x86_64 box, I don't get a case where
dumpfiles with excluded free pages produce different "foreach bt" output
than I get from the vmcore file. I tried -d16 and -d31 options. I do
get the expected excluded page message when I x/xg an excluded address,
just no problems with bt. So I can't help Dave with an example.
Bob Montgomery