Dave Anderson wrote:
Actually, upon looking at a sample ELF-format dumpfile from Ken'ichi, and hacking in a forced-readmem()-failure upon accessing the page containing the start_kernel __init function, I can reproduce the problem with the 2.6.19 kernel. But it is not a problem with makedumpfile's excluded pages, but rather with the backtrace code's framesize calculation of schedule() -- which is causing it to skip over the cpu_idle() ending point. It seems to be specific to 2.6.19 for some reason. I'll look further into that issue.
I've tracked down why the framesize calculation of schedule() is
too large in this case. The oversized value makes the backtrace
skip over the calling function's (cpu_idle) frame and go too far
back up the stack, into the "start_kernel" frame reference left
on the stack.
It's because there is a BUG() in the schedule() code, which has
the line number and the virtual address of the filename string
encoded just after the BUG()'s "ud2a" instruction.
  #define BUG()                                   \
          __asm__ __volatile__( "ud2\n"           \
                          "\t.word %c0\n"         \
                          "\t.long %c1\n"         \
                          : : "i" (__LINE__), "i" (__FILE__))
It just so happens that the virtual address of the filename,
when erroneously decoded as i386 instructions, by dumb luck
translates into a "pusha" instruction ("push all general-purpose
registers"), and therefore the framesize calculator erroneously
adds 32 bytes.
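
To make the arithmetic concrete, here is a minimal sketch of the byte
layout that the BUG() macro above produces on i386, and of where the
32 bytes come from. The helper names, the line number and the filename
address are all hypothetical values picked for illustration; they are
not taken from the 2.6.19 kernel or from the crash sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    OP_PUSHA    = 0x60, /* i386 opcode for pusha */
    PUSHA_REGS  = 8,    /* eax, ecx, edx, ebx, esp, ebp, esi, edi */
    REG_BYTES   = 4,    /* each register is 32 bits wide */
    BUG_SEQ_LEN = 8     /* 2 (ud2) + 2 (.word) + 4 (.long) */
};

/* Stack bytes consumed by one pusha on i386: 8 registers * 4 bytes. */
static unsigned pusha_stack_bytes(void)
{
    return PUSHA_REGS * REG_BYTES;
}

/*
 * Lay out what BUG() emits into the text section: the ud2 opcode
 * (0x0f 0x0b), then the line number as a little-endian 16-bit word,
 * then the filename address as a little-endian 32-bit long.
 */
static void emit_bug(uint8_t out[BUG_SEQ_LEN], uint16_t line,
                     uint32_t file_addr)
{
    assert(out != NULL);
    out[0] = 0x0f;
    out[1] = 0x0b;
    out[2] = (uint8_t)(line & 0xff);
    out[3] = (uint8_t)(line >> 8);
    out[4] = (uint8_t)(file_addr & 0xff);
    out[5] = (uint8_t)((file_addr >> 8) & 0xff);
    out[6] = (uint8_t)((file_addr >> 16) & 0xff);
    out[7] = (uint8_t)(file_addr >> 24);
}

/*
 * Report whether any of the six data bytes after ud2 has the same
 * value as the pusha opcode. This is only a necessary condition: a
 * real disassembler sees a pusha only if a bogus instruction boundary
 * also happens to fall on that byte.
 */
static int payload_has_pusha_byte(const uint8_t bug[BUG_SEQ_LEN])
{
    for (size_t i = 2; i < BUG_SEQ_LEN; i++)
        if (bug[i] == OP_PUSHA)
            return 1;
    return 0;
}
```

With a made-up filename address such as 0xc0356011, the data bytes
contain 0x60, so a decoder that treats them as code can come across a
pusha and charge 32 bytes to the frame.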
It's weird (and fortunate) that I've spent the last couple of
days coming up with a "dis" fix to deal with those extra bytes
encoded after x86 and x86_64 "ud2a" instructions, which cause the
text disassembly to go off into the weeds. Anyway, I'll leverage
that code to make the framesize calculator also skip over the
extra encoded bytes instead of translating them.
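
As a rough illustration of the skip, the sketch below steps an
instruction scanner past a ud2 and the data that follows it. It is
not the crash utility's actual code: the function name, the constants
and the buffer-based interface are assumptions made for the example,
and the 6-byte payload length matches only the i386 BUG() macro quoted
above (x86_64 would need its own length).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UD2_LEN         2   /* opcode bytes 0x0f 0x0b */
#define BUG_PAYLOAD_LEN 6   /* .word line + .long filename address */

/*
 * If the bytes at text[off] are a ud2 instruction, return the offset
 * of the first byte after the ud2 and its BUG() payload, clamped to
 * the end of the buffer. Otherwise return off unchanged, so that the
 * caller decodes the instruction at off as usual.
 */
static size_t skip_bug_payload(const uint8_t *text, size_t len,
                               size_t off)
{
    assert(text != NULL);
    if (off + UD2_LEN <= len &&
        text[off] == 0x0f && text[off + 1] == 0x0b) {
        size_t next = off + UD2_LEN + BUG_PAYLOAD_LEN;
        return next <= len ? next : len;
    }
    return off;
}
```

A framesize scanner would call this before decoding each instruction
and, when the returned offset differs from the one passed in, resume
at the new offset without counting any pushes for the skipped bytes.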
This could very well be the cause of other x86 framesize
miscalculations that have led to incorrect backtraces. This one
was caught in the act only because Bob had put in the code to
refuse excluded-page read attempts.
Thanks, Bob!
Dave