Bob Montgomery wrote:
On Tue, 2007-03-20 at 11:18 +0900, Ken'ichi Ohmichi wrote:
> Hi Bob,
>
> Thank you for the great report.
You are welcome.

Before continuing the discussion of issues with proposed changes to ELF
dumpfile generation, I'd like to recap where we are and suggest a couple
of actions.

1)  Makedumpfile patch:  Ken'ichi Ohmichi's email of Wed, 7 Mar 2007
10:43:38 +0900 contained the patch "point_same_zero_page.patch".  That
patch contains the nice solution to remove redundant zero page images
from the diskdump dump file by pointing the page descriptors of zero
pages to a common zero image.  I suggest that this patch should be
applied to makedumpfile as soon as possible, without waiting on a
possible solution to the ELF situation.  As described in my report, ELF
and diskdump dump files have not shown identical behavior in the past.
This patch makes diskdump dump files more accurate, and leaves ELF dump
files at the same level of accuracy that they have always had.

2) Crash patch:  Ken'ichi Ohmichi's crash patch in the email of Tue, 20
Mar 2007 11:18:02 +0900 should be submitted to crash as soon as
possible.  The error message is a bit wordy, and could be reduced to
prevent wrap on 80 column windows, as suggested below, but that is a
minor issue, since the output is already a bit unwieldy:

Current patch prints on 80-column screen (three levels of crash are now
reporting the problem):

When Ken'ichi agrees to and commits the makedumpfile change,
I'll do the same for crash.

BTW, you won't get all the gdb-related clutter if you use the
crash "rd" command to read memory instead of the gdb "x" command.

Anyway, as the patch is currently written, there will be two
error messages, and a "rd" command would look like this:

crash> rd 0xffff810000005ff0
rd:  diskdump: paddr(5ff0) excluded from dump
rd: seek error: kernel virtual address: ffff810000005ff0  type: "64-bit KVADDR"
crash>

I guess that's OK.   Although I actually kind of liked your
original suggestion for an additional return value to readmem().
That way we could at least consolidate the 2 error messages into
just one.

 
crash> x/xg 0xffff810000005ff0
0xffff810000005ff0:     gdb: diskdump: cannot access paddr(5ff0) due to the excl
uded page
gdb: seek error: kernel virtual address: ffff810000005ff0  type: "gdb_readmem_ca
llback"
Cannot access memory at address 0xffff810000005ff0
crash>

With a shortened message in diskdump.c:

crash> x/xg 0xffff810000005ff0
0xffff810000005ff0:     gdb: diskdump: paddr(5ff0) excluded from dump
gdb: seek error: kernel virtual address: ffff810000005ff0  type: "gdb_readmem_ca
llback"
Cannot access memory at address 0xffff810000005ff0
crash>

With these two patches, crash reports the contents of diskdump dump
files produced by makedumpfile correctly.  Zero content pages that have
not been excluded for other reasons remain accessible, and pages that
have really been excluded become inaccessible, instead of showing 0x0
contents.  Crash should continue to read old dump files as before,
because of the change in version number in the dumpfiles.
 

I am a little bit worried that this may cause an
unnecessary abort -- based upon your experience
with my suggestion of returning an error instead
of a zero-filled buffer from the current compressed
diskdump format -- so I may be paranoid.  Anyway,
I'll probably put in an "out" so that the user
has the choice of getting a zero-filled buffer
like it does now.

Dave

 
Bob Montgomery