Itsuro ODA wrote:
Hi Dave,

> This all sounds good, and I agree that the p2m_mfn should
> be added to the ia64 XEN_ELFNOTE_CRASH_INFO.
>
> However, there's something incorrect in your calculation of
> "xkd->p2m_frames" in your ia64_xen_kdump_p2m_create() implementation.
> It looks like it should be 32, but it's set to 524288.  As a result
> that wastes a lot of memory, and "help -n" is pretty much unusable
> since it wants to dump all ~512k entries:

This is because of IA64's pseudo-physical memory map (specific to
domains on Xen).

The phys-to-machine mapping is managed as a 3-level page table.
The pgd looks like this:
-------------------------------------------------------------
crash> doms
   DID       DOMAIN      ST T  MAXPAGE  TOTPAGE VCPU     SHARED_I          P2M_MFN
  32753 f000000007dac080 ?? O     0        0      0          0              ----
  32754 f000000007ff0080 ?? X     0        0      0          0              ----
  32767 f000000007ff4080 ?? I     0        0      1          0              ----
>*    0 f000000007da4080 ?? 0   10000    f986     1  f000000007d90000       1f62c

crash> domain f000000007da4080
struct domain {
  domain_id = 0,
  shared_info = 0xf000000007d90000,
...
  arch = {
    mm = {
      pgd = 0xf00000007d8b0000
    },
...
crash> rd 0xf00000007d8b0000 256
f00000007d8b0000:  000000007c8d8000 0000000000000000   ...|............
f00000007d8b0010:  0000000000000000 0000000000000000   ................
f00000007d8b0020:  0000000000000000 0000000000000000   ................
f00000007d8b0030:  0000000000000000 0000000000000000   ................
f00000007d8b0040:  0000000000000000 0000000000000000   ................
f00000007d8b0050:  0000000000000000 0000000000000000   ................
f00000007d8b0060:  0000000000000000 0000000000000000   ................
f00000007d8b0070:  0000000000000000 0000000000000000   ................
f00000007d8b0080:  000000007f428000 0000000000000000   ..B.............
f00000007d8b0090:  0000000000000000 0000000000000000   ................
...
f00000007d8b07c0:  0000000000000000 0000000000000000   ................
f00000007d8b07d0:  0000000000000000 0000000000000000   ................
f00000007d8b07e0:  0000000000000000 0000000000000000   ................
f00000007d8b07f0:  0000000000000000 000000007bed4000   .........@.{....
-------------------------------------------------------------------------
(256 * 2048 = 524288)

It is certain that the (pseudo-)physical memory areas "256GB-" and
"-4TB" exist. These areas are shared by domain-0 and the Xen
hypervisor, so they should be accessible in a dom0 analysis session.

(I said:)
> > But this patch is a bit tricky. And the memory usage is
> > large if the machine memory layout is sparse.

That was wrong. It should have been "the memory usage is large if
the pseudo-physical memory layout is sparse."
And in practice it is always sparse...

Thanks.


Hi Itsuro,

I now understand the difference in the 3rd-level p2m
frame contents being page table entries instead of mfn
values.

However, I still do not understand what you mean regarding
the concept of the pseudo-physical memory being "sparse".
Looking at the dumpfile again, it appears to have the same
type of flat pseudo-physical memory layout just like the
other architectures.

Dom0 has ~1GB of pseudo-physical memory:

crash> sys
      KERNEL: ../20070510-sample-dump-2/vmlinux-xen-ia64
    DUMPFILE: ../20070510-sample-dump-2/vmcore.tiger.iomem_machine
        CPUS: 1
        DATE: Mon May  7 04:07:43 2007
      UPTIME: 00:01:47
LOAD AVERAGE: 0.11, 0.04, 0.01
       TASKS: 21
    NODENAME: (none)
     RELEASE: 2.6.18-xen
     VERSION: #3 SMP Mon May 7 13:14:41 JST 2007
     MACHINE: ia64  (1296 Mhz)
      MEMORY: 1 GB
       PANIC: "SysRq : Trigger a crashdump"
crash>

And as far as dom0's VM is concerned, its memory map only knows
about the 64512 pages in DMA zone 0:

crash> kmem -n
NODE    SIZE      PGLIST_DATA       BOOTMEM_DATA       NODE_ZONES
  0    64512    a000000100482f80  a000000100608950  a000000100482f80
                                                    a000000100483500
                                                    a000000100483a80
                                                    a000000100484000
    MEM_MAP       START_PADDR  START_MAPNR
e0000000010b0000       0            0

ZONE  NAME         SIZE       MEM_MAP      START_PADDR  START_MAPNR
  0   DMA         64512  e0000000010b0000            0            0
  1   DMA32           0                 0            0            0
  2   Normal          0                 0            0            0
  3   HighMem         0                 0            0            0
crash>

So the "end of memory" would be just below 1GB:

crash> eval 64512 * 16k
hexadecimal: 3f000000  (1008MB)
    decimal: 1056964608
      octal: 7700000000
     binary: 0000000000000000000000000000000000111111000000000000000000000000
crash>

So, with respect to dom0, how would it ever go beyond 32
p2m_frames?  After putting a debug printf in xen_kdump_p2m,
I see this:

crash> rd -p 3f000000
xen_kdump_p2m: mfn_idx for 3f000000: 31
        3f000000:  0000000000000000                    ........
crash>

So that shows that only 32 p2m_frames are needed to access
all of dom0's pseudo-physical memory.

But it also shows that you are allowing access to memory
that is *beyond* the end of dom0's pseudo-physical memory,
since 3f000000 should not be readable.  There is no
page structure associated with 3f000000:

crash> kmem -p | tail
e000000001421dd0 3efd8000      -------       -----   1 0
e000000001421e08 3efdc000      -------       -----   1 0
e000000001421e40 3efe0000      -------       -----   1 60
e000000001421e78 3efe4000      -------       -----   1 60
e000000001421eb0 3efe8000      -------       -----   1 60
e000000001421ee8 3efec000      -------       -----   1 60
e000000001421f20 3eff0000      -------       -----   2 0
e000000001421f58 3eff4000      -------       -----   1 80
e000000001421f90 3eff8000      -------       -----   1 80
e000000001421fc8 3effc000      -------       -----   1 80
crash>

After doing a few other "rd -p" commands, I see that you seem
to be allowing memory accesses based upon what's in the ELF
header PT_LOAD segments, which are "machine" physical memory
descriptors:

crash> help -n | grep phys_end
               phys_end: 1000
               phys_end: 7000
               phys_end: 9000
               phys_end: 82000
               phys_end: 85000
               phys_end: a0000
               phys_end: 4000000
               phys_end: 81b3000
               phys_end: ffc0000
               phys_end: 10000000
               phys_end: 7ab06000
               phys_end: 7c8d2000
               phys_end: 7c92e000
               phys_end: 7c938000
               phys_end: 7c97e000
               phys_end: 7cdf6000
               phys_end: 7cdfc000
               phys_end: 7ce2a000
               phys_end: 7d001000
               phys_end: 7d002000
               phys_end: 7d044000
               phys_end: 7d045000
               phys_end: 7d37e000
               phys_end: 7d700000
               phys_end: 7d77e000
               phys_end: 7d8b4000
               phys_end: 7f980000
               phys_end: 7fa00000
               phys_end: 7feda000
crash>

So it appears that the physical machine running the
dom0 and hypervisor has almost 2GB of "real" physical
memory.  And if I try to read the limit address of
7feda000, it fails:

crash> rd -p 7feda000
xen_kdump_p2m: mfn_idx for 7feda000: 63
rd: read error: physical address: 7feda000  type: "64-bit PHYSADDR"
crash>

But the last page of physical memory can be read:

crash> rd -p 7fed9000
xen_kdump_p2m: mfn_idx for 7fed9000: 63
        7fed9000:  000000007f9da0a0                    ........
crash>

"rd -p" is supposed to read pseudo-physical memory in xen
kernels, but it seems to be allowing reads based upon the
PT_LOAD segment contents?  In other words, it seems to
be mixing dom0 pseudo-physical memory and the system's
machine memory, because 7fed9000 is not a legitimate dom0
pseudo-physical address.

(And even with that happening, the maximum p2m_frame index
is still only 63 -- how can it ever be 512k with respect
to dom0's pseudo-physical memory?)

So I'm sorry, but this does not make sense to me...

Dave