On Wed, 2007-03-07 at 10:43 +0900, Ken'ichi Ohmichi wrote:
I want to use the feature of excluding zero-pages, because our
systems (x86_64) have many zero-pages immediately after system booting.
Bob is researching the behavior of crash on ELF format dumpfiles.
I would like to wait for his report.
Sorry you had to wait so long.
Bob Montgomery
Here's what I know about how crash deals with ELF (netdump) dump files
compared to how it deals with kdump (diskdump) dump files.
========================================
Intro to ELF dumpfiles and zero filling:
========================================
ELF format dumpfiles do not contain a page-by-page bitmap of included
and excluded pages. Instead, a program header table describes
groups of contiguous pages that are present in the dumpfile. In its
simplest form, this allows a debugger to locate groups of pages that
are present in the file, and conversely to identify pages that are
missing by failing to find a program header entry that encloses the
address of a missing page.
At some point between the 3.4 and 3.19 versions of crash, code was
added to netdump.c:netdump_read to handle a zero_fill feature in
the ELF files. In a program header entry where p_memsz (MemSiz) is
bigger than p_filesz (FileSiz), the zone between them is considered
to be filled with zeroes upon access.
For example, using info from readelf on a -E -d31 dumpfile:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
...
LOAD 0x00000000004722c8 0xffff810005000000 0x0000000005000000
0x000000000063b000 0x0000000003000000 RWE 0
...
For the group of pages described by this program header entry,
crash-3.19 sets up this internal representation:
{file_offset = 0x4722c8, phys_start = 0x5000000, phys_end = 0x563b000,
zero_fill = 0x8000000}
If the FileSiz and MemSiz are the same, no zero fill zone is needed,
phys_end is the real end of the segment, and zero_fill is set to 0x0.
If the requested address falls between phys_start and phys_end,
it is read from the computed file offset. Otherwise, if it falls
between phys_end and zero_fill, the requested buffer is memset to zero.
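That read-path decision can be sketched in C as follows. The field
names follow the internal representation shown above, but this is an
illustration, not crash's actual netdump.c code:

```c
#include <assert.h>
#include <stdint.h>

/* Per-segment record, mirroring the internal representation shown
 * above (a sketch; the real structure lives in crash's netdump code). */
struct pt_load_segment {
        uint64_t file_offset;   /* file offset of the segment's data */
        uint64_t phys_start;    /* p_paddr */
        uint64_t phys_end;      /* phys_start + p_filesz */
        uint64_t zero_fill;     /* phys_start + p_memsz, or 0 if none */
};

enum read_result { READ_FROM_FILE, READ_ZERO_FILL, READ_ERROR };

/* Classify how a physical address would be served from this segment. */
enum read_result classify_read(const struct pt_load_segment *seg,
                               uint64_t paddr)
{
        if (paddr >= seg->phys_start && paddr < seg->phys_end)
                return READ_FROM_FILE;  /* read at file_offset + (paddr - phys_start) */
        if (seg->zero_fill && paddr >= seg->phys_end && paddr < seg->zero_fill)
                return READ_ZERO_FILL;  /* memset the caller's buffer to zero */
        return READ_ERROR;              /* not covered by this segment */
}
```

With the segment shown above, physical address 0x563c000 lands in the
zero-fill zone, while 0x8000000 falls just past the end of the segment.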
Here is an address that falls within the zero fill zone shown above.
(0xffff81000563c000 maps to physical address 0x563c000, which is above
phys_end, but below zero_fill).
crash-3.19> x/xg 0xffff81000563c000
0xffff81000563c000: 0x0000000000000000
gdb (6.4.90-debian) can also read this address from the ELF dumpfile:
(gdb) x/xg 0xffff81000563c000
0xffff81000563c000: 0x0000000000000000
But crash-3.4, which did not have the zero-fill code, fails:
crash-3.4> x/xg 0xffff81000563c000
0xffff81000563c000: gdb: read error: kernel virtual address:
ffff81000563c000 type: "gdb_readmem_callback"
Cannot access memory at address 0xffff81000563c000
==========================
Philosophical Meandering
==========================
The zero-fill feature gives an ELF dump file *three* ways to represent
pages:
1) Not In The Address Space: There is no program header LOAD entry
that contains the requested address.
2) Not In The File, Zero Fill: A program header LOAD entry contains
the address, but the offset in the segment is between FileSiz
and MemSiz.
3) In The File: A program header LOAD entry contains the address,
and the offset in the segment is smaller than FileSiz.
Unfortunately, makedumpfile recognizes *four* types of pages:
A) Not In The Address Space: physical holes in memory, undumpable zones
B) Excluded Type: Excluded based on page type (user memory, cached page)
C) Zero Content: Excluded based on page content being all 0x0
D) In The File
The problem at hand is that makedumpfile's current mapping of the four
types of pages onto ELF's three types of representation puts both
"B) Excluded Type" and "C) Zero Content" into ELF's "2) Not In The
File, Zero Fill" representation. This results in crash reporting the
contents of all addresses in excluded pages as 0x0, regardless of
their original values.
We have proposed to fix this problem on diskdump-format dump files by
leaving zero pages in the bitmaps and page descriptors, but pointing
all their data pointers to a single common page of zeroes for access.
Coupled with a modification to crash's diskdump.c:read_diskdump routine
to return SEEK_ERROR on excluded pages instead of zero-filling,
we achieve the goal of reading zeroes only when zeroes were in the
original address. We get an indication of read error when attempting
to access a page that has really been excluded. But we still reduce
the size of the dumpfile by storing one copy of a zero page to serve
as the data image for all zero pages.
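A minimal sketch of that proposed read path follows. The names
(read_page, zero_page_off, the descriptor layout) are illustrative,
not crash's or diskdump's actual identifiers:

```c
#include <assert.h>
#include <string.h>

#define SEEK_ERROR (-1)
#define PAGE_SZ 4096

/* Illustrative page descriptor: under the proposal, every zero page's
 * descriptor points at one shared file offset holding a page of zeroes. */
struct page_desc {
        long offset;            /* file offset of this page's data */
};

/* in_bitmap: the page kept its dump-bitmap bit, so a descriptor exists.
 * zero_page_off: offset of the single shared zero page. */
int read_page(int in_bitmap, const struct page_desc *desc,
              long zero_page_off, unsigned char *buf)
{
        if (!in_bitmap)
                return SEEK_ERROR;       /* excluded by type: honest read error */
        if (desc->offset == zero_page_off) {
                memset(buf, 0, PAGE_SZ); /* genuine zero page, shared data image */
                return 0;
        }
        /* a real reader would pread() the data at desc->offset here */
        memset(buf, 0xab, PAGE_SZ);      /* stand-in for actual file contents */
        return 0;
}
```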
======================================================
The Compatibility Concern, and Why It Shouldn't Matter
======================================================
One concern is that if we fix diskdump-format dump files in this way,
they will behave differently than ELF-format dump files. Actually,
they will behave correctly, while ELF-format dump files will still
have the "any excluded page must have contained zeroes" behavior.
So I'd like to assert that even with the old zero-filling behavior,
diskdump files and ELF files did not give the same results. In other
words, you could not count on getting the same result for every
address request, even with crash's zero-filling of excluded pages
working for both cases.
Here is why:
The diskdump format includes a page level bitmap of every page in the
known address space. If makedumpfile wants to exclude a page, it's as
simple as changing a 1 to a 0 in the bitmap and leaving out the page.
A huge expanse of memory with alternating pages excluded wouldn't cause
anything more alarming than a bunch of 0xaaaaaaaa words in the bitmap.
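Excluding a page in the diskdump format really is just a bit flip; a
sketch (the helper names are mine, not makedumpfile's):

```c
#include <assert.h>
#include <stdint.h>

/* Clear a page's bit in the diskdump dump bitmap (one bit per page). */
static void exclude_page(unsigned char *bitmap, uint64_t pfn)
{
        bitmap[pfn >> 3] &= (unsigned char)~(1u << (pfn & 7));
}

/* Test whether a page is still included in the dump. */
static int page_is_dumped(const unsigned char *bitmap, uint64_t pfn)
{
        return (bitmap[pfn >> 3] >> (pfn & 7)) & 1;
}
```

Excluding every other page turns 0xff bitmap bytes into 0xaa, the
alternating pattern mentioned above.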
The ELF format requires a program header entry for each distinct
group of pages, with a provision for representing one contiguous
group of excluded pages at the end of the group for later zeroing.
The overhead of creating a program header for every isolated excluded
page would be prohibitive. Because of this, when makedumpfile
builds its map of excluded pages and then translates that into the
ELF format dumpfile, it only acts when it finds groups of 256 or more
contiguous excluded pages. Then it sets up separate program header
entries around the exclusion zone, and continues through the bitmap
until it finds another large contiguous section of excluded pages.
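The scan described above can be sketched like this. The 256-page
threshold is from the text; the function and parameter names are
hypothetical, not makedumpfile's:

```c
#include <assert.h>
#include <stddef.h>

#define SPLIT_THRESHOLD 256  /* minimum run of excluded pages worth a split */

/* Count runs of >= SPLIT_THRESHOLD contiguous excluded pages.
 * excluded[] is one byte per page (1 = excluded); the first qualifying
 * run's start and length are recorded for the caller. */
size_t find_excludable_runs(const unsigned char *excluded, size_t npages,
                            size_t *first_start, size_t *first_len)
{
        size_t runs = 0, run_start = 0, run_len = 0;

        for (size_t pfn = 0; pfn <= npages; pfn++) {
                int ex = pfn < npages && excluded[pfn];
                if (ex) {
                        if (run_len == 0)
                                run_start = pfn;
                        run_len++;
                        continue;
                }
                if (run_len >= SPLIT_THRESHOLD) {
                        if (runs == 0 && first_start) {
                                *first_start = run_start;
                                *first_len = run_len;
                        }
                        runs++;         /* this run gets its own segment split */
                }
                run_len = 0;            /* short runs stay in the dumpfile */
        }
        return runs;
}
```

A short run of excluded pages never qualifies, so those pages remain
readable in the file even though they were meant to be excluded.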
This means that pages that were meant to be excluded, but that were not
in big contiguous groups, get to stay in the dumpfile. And here's an
example, using crash-3.19 on two dumpfiles made from the same vmcore
with an unmodified makedumpfile using -d31, one ELF and one normal
(diskdump format).
=============================================================
Elf (netdump) dumpfile (-E -d31):
=============================================================
crash-3.19> sys
KERNEL: vmlinux-2.6.18-3-telco-amd64
DUMPFILE: dumpfile.E_d31
CPUS: 2
DATE: Tue Feb 6 14:56:05 2007
UPTIME: 00:44:09
LOAD AVERAGE: 0.05, 0.03, 0.05
TASKS: 81
NODENAME: hpde
RELEASE: 2.6.18-3-telco-amd64
VERSION: #1 SMP Mon Feb 5 13:33:27 MST 2007
MACHINE: x86_64 (1800 Mhz)
MEMORY: 3.9 GB
PANIC: "Oops: 0000 [1] SMP " (check log for details)
crash-3.19> x/xg 0xffff810000005ff0
0xffff810000005ff0: 0x0000000000205007
crash-3.19> x/xg 0xffff8100cfe44800
0xffff8100cfe44800: 0xbb67bdc97f9fdefe
=============================================================
kdump (diskdump) dumpfile (-d31):
=============================================================
crash-3.19> sys
KERNEL: ../vmlinux-2.6.18-3-telco-amd64
DUMPFILE: dumpfile.makedumpfile-d31.run1
CPUS: 2
DATE: Tue Feb 6 14:56:05 2007
UPTIME: 00:44:09
LOAD AVERAGE: 0.05, 0.03, 0.05
TASKS: 81
NODENAME: hpde
RELEASE: 2.6.18-3-telco-amd64
VERSION: #1 SMP Mon Feb 5 13:33:27 MST 2007
MACHINE: x86_64 (1800 Mhz)
MEMORY: 3.9 GB
PANIC: "Oops: 0000 [1] SMP " (check log for details)
crash-3.19> x/xg 0xffff810000005ff0
0xffff810000005ff0: 0x0000000000000000
crash-3.19> x/xg 0xffff8100cfe44800
0xffff8100cfe44800: 0x0000000000000000
The two memory accesses that return 0x0000000000000000 in the second
case were not really zero, and the ELF dumpfile doesn't say they were.
So we've never really had address-for-address compatibility between
an ELF dump and a diskdump dump of the same situation anyway.
With the proposed fixes for the diskdump format, these two accesses
would have correctly given read errors, since the information is not
in the dump file. And that's OK, because the system manager chose
to exclude certain types of pages from the dump. The ELF user could
feel lucky that they accidentally got the right answer, only because
the page in question was in a group of 10 contiguous excluded pages
instead of 256 or more, and so wasn't really excluded as it should
have been.
=====================================
Can ELF Dumpfiles Solve This Problem?
=====================================
To achieve correctness with ELF dumpfiles, one could perhaps remap the
four types of pages to the three types of ELF representations so that
"A) Not In The Address Space" and "B) Excluded Type" were both mapped
to "1) Not In The Address Space". Then "C) Zero Content" would map
to "2) Not In The File, Zero Fill". You would lose the ability to
know if a page were missing because it was never in the address space
in the first place, or because it was excluded because of its type.
But if you read a zero, you'd know it really was a zero.
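The proposed remapping amounts to a simple table; in C (the enum and
function names are mine, following the labels A-D and 1-3 above):

```c
#include <assert.h>

/* makedumpfile's four page types (A-D above). */
enum page_type { NOT_IN_ADDR_SPACE, EXCLUDED_TYPE, ZERO_CONTENT, IN_FILE };

/* ELF's three representations (1-3 above). */
enum elf_rep { ELF_ABSENT, ELF_ZERO_FILL, ELF_IN_FILE };

enum elf_rep remap(enum page_type t)
{
        switch (t) {
        case NOT_IN_ADDR_SPACE:
        case EXCLUDED_TYPE:
                return ELF_ABSENT;      /* A and B collapse together: reads fail */
        case ZERO_CONTENT:
                return ELF_ZERO_FILL;   /* a zero read really means zero */
        default:
                return ELF_IN_FILE;     /* D: the data is in the file */
        }
}
```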
This method probably means that makedumpfile would have to process a
bitmap of excluded (but not zero) pages against the original program
header table to create a new set of program headers that reflected the
excluded pages, and then go back and process that against a bitmap of
zero pages, to see if any zero-fill tails could be created. That is
left as an exercise for the loyal ELF user.
Otherwise, makedumpfile could quit removing pages because they're
zero, and try to achieve dumpfile size goals by removing pages based
only on type. Then zero pages that were supposed to stay would still
be there, and crash would never need to fill a missing page with
zeroes without knowing whether it really contained zeroes in the
first place.
==============
The Last Point
==============
The ELF format is inferior for most serious dumpfile debugging.
It does not support page compression, and its format does not allow
the exclusion of all eligible pages. Whether or not the inferior
ELF format dumpfiles can be made completely correct should not
serve as a barrier to getting it right on diskdump format dumpfiles.
As demonstrated, the two types aren't truly bug for bug compatible now,
so fixing one without fixing the other is still a net win.