----- Original Message -----
Hi folks,
I've just discovered that the crash utility fails to initialize the vm
subsystem properly on our latest SLES 32-bit kernels. It turns out that our
kernels are compiled with CONFIG_DISCONTIGMEM=y, which causes pgdat structs to
be allocated by the remap allocator (cf. arch/x86/mm/numa_32.c and also the
code in setup_node_data).
In case you don't know what the remap allocator is (I didn't before I hit the
bug), it is a very special early-boot allocator that remaps physical pages
from low memory to high memory, giving them virtual addresses from the
identity mapping. It looks a bit like this:
                           physical addr
                          +------------+
                          |            |
                          +------------+
                     +--> |  KVA RAM   |
                     |    +------------+
                     |    |            |
                     |    \/\/\/\/\/\/\/
                     |    /\/\/\/\/\/\/\
                     |    |            |
     virtual addr    |    |  highmem   |
    +------------+   |    |------------|
    |            | -----> |            |
    +------------+   |    +------------+
    |  remap va  | --+    |   KVA PG   | (unused)
    +------------+        +------------+
    |            |        |            |
    |            | -----> | RAM bottom |
    +------------+        +------------+
This breaks a very basic assumption that crash makes about low-memory virtual
addresses.
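The sketch below shows which assumption breaks. It translates a kernel virtual address by checking remap windows before falling back to the identity mapping. It is a hypothetical illustration, not crash's actual code. It assumes the default 3G/1G split (PAGE_OFFSET = 0xc0000000) and 4 KiB pages. The `remap_window` struct and its fields are made-up names for the per-node data a remap allocator would have to record: the start and end of the "remap va" window, and the first page frame of the "KVA RAM" area backing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_OFFSET 0xc0000000u /* assumed default 3G/1G split */
#define PAGE_SHIFT  12          /* assumed 4 KiB pages */

/* Hypothetical record of one remapped region (e.g. one node's pgdat). */
struct remap_window {
    uint32_t start_vaddr; /* first virtual address of the window ("remap va") */
    uint32_t end_vaddr;   /* one past the last virtual address */
    uint64_t start_pfn;   /* first backing page frame ("KVA RAM"); 64-bit for PAE */
};

/*
 * Translate a kernel virtual address to a physical address.  Remap windows
 * are checked first, because their addresses also fall inside the range the
 * identity formula accepts and would otherwise be translated wrongly.
 * Returns false for addresses below PAGE_OFFSET.
 */
static bool remap_aware_kvtop(const struct remap_window *win, size_t nwin,
                              uint32_t vaddr, uint64_t *paddr)
{
    for (size_t i = 0; i < nwin; i++) {
        assert(win[i].start_vaddr <= win[i].end_vaddr);
        if (vaddr >= win[i].start_vaddr && vaddr < win[i].end_vaddr) {
            *paddr = (win[i].start_pfn << PAGE_SHIFT) +
                     (vaddr - win[i].start_vaddr);
            return true;
        }
    }
    if (vaddr < PAGE_OFFSET)
        return false;
    /* Identity mapping.  Real code would also reject addresses above
     * high_memory (vmalloc/fixmap space); that check is omitted here. */
    *paddr = (uint64_t)(vaddr - PAGE_OFFSET);
    return true;
}
```

The ordering is the whole point. Every "remap va" address also satisfies the identity formula, so a translator that only does `vaddr - PAGE_OFFSET` silently lands in the unused "KVA PG" pages instead of "KVA RAM".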
Hmmm, yeah, I was also unaware of this, and I'm not entirely clear on it based
upon your explanation. What do "KVA PG" and "KVA RAM" mean exactly? And is it
just the pgdat structures (which I know can be huge) that get moved from low to
high physical memory (per-node perhaps), and then mapped at virtual addresses
in the identity-mapped range?
Anyway, I trust you know what you're doing...
The attached patch fixes the issue for me, but may not be the cleanest method
to handle these mappings.
Anyway, what I can't wrap my head around is that the initialization sequence
is done by the first call to x86_kvtop_PAE(), which calls x86_kvtop_remap(),
which calls initialize_remap(), which calls readmem(), which calls
x86_kvtop_PAE(), starting the whole thing over again. How does that recursion
work? Would it be possible to call initialize_remap() earlier instead of doing
it upon the first kvtop() call?
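A lazy, self-recursive initialization like this can terminate if a state flag is set before the nested readmem() call. The inner translation then skips the remap lookup and uses the plain identity mapping. This only works on the assumption that whatever the initializer reads is itself identity-mapped. This message does not say whether the attached patch works this way. The stubs below (`demo_kvtop`, `demo_readmem`, `demo_initialize_remap`, `remap_init_state`) are hypothetical stand-ins that only illustrate the guard pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lazy-initialization states; not crash's actual variables. */
enum remap_state { REMAP_UNINIT, REMAP_INITIALIZING, REMAP_READY };

static enum remap_state remap_init_state = REMAP_UNINIT;
static int init_calls; /* how many times initialization actually ran */

static uint64_t demo_kvtop(uint32_t vaddr);

/* Stand-in for readmem(): it must translate the address, so it re-enters
 * demo_kvtop().  Here it simply returns the translated address. */
static uint64_t demo_readmem(uint32_t vaddr)
{
    return demo_kvtop(vaddr);
}

/* Stand-in for initialize_remap(): reads a (pretend) remap table. */
static void demo_initialize_remap(void)
{
    assert(remap_init_state == REMAP_UNINIT); /* must only ever run once */
    remap_init_state = REMAP_INITIALIZING;    /* set BEFORE the nested read */
    init_calls++;
    (void)demo_readmem(0xc0800000u);          /* arbitrary example address */
    remap_init_state = REMAP_READY;
}

/* Stand-in for the kvtop routine that triggers the lazy initialization. */
static uint64_t demo_kvtop(uint32_t vaddr)
{
    if (remap_init_state == REMAP_UNINIT)
        demo_initialize_remap();
    /* While INITIALIZING (and, in this stub, also when READY) use the plain
     * identity mapping; a real version would consult the remap table once
     * the state is REMAP_READY. */
    return (uint64_t)(vaddr - 0xc0000000u);
}
```

Calling the initializer eagerly at startup would remove the need for the intermediate state. That also presumably depends on its reads touching only identity-mapped memory.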
Dave
Ken'ichi Ohmichi, please note that makedumpfile is also affected by this
deficiency. On my test system, it fails to produce any output if I set the
dump level to anything greater than zero:
makedumpfile -c -d 31 -x vmlinux-3.0.13-0.5-pae.debug vmcore kdump.31
readmem: Can't convert a physical address(34a012b4) to offset.
readmem: type_addr: 0, addr:f4a012b4, size:4
get_mm_discontigmem: Can't get node_start_pfn.
makedumpfile Failed.
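The log exposes the arithmetic behind the failure. The virtual address 0xf4a012b4 and the physical address 0x34a012b4 differ by exactly 0xc0000000, the usual 32-bit PAGE_OFFSET. That suggests makedumpfile applied the plain identity formula to an address that, per the diagram above, lies in a remapped window. The check below reproduces that subtraction; the PAGE_OFFSET value is assumed, not read from this kernel's config.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xc0000000u /* assumed default 3G/1G split */

/* The identity ("direct map") translation for 32-bit low memory. */
static uint32_t naive_vaddr_to_paddr(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET); /* only meaningful for kernel addresses */
    return vaddr - PAGE_OFFSET;
}
```

For an address inside a remap window, this result points at the unused "KVA PG" pages rather than the "KVA RAM" pages that actually hold the data.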
However, fixing this for makedumpfile is harder, and it will most likely
require a few more lines in VMCOREINFO, because debug symbols may not be
available at dump time, and I can't see any alternative method to locate the
remapped regions.
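VMCOREINFO is plain text made of `KEY=value` lines, and kernel symbol addresses appear in it as `SYMBOL(name)=hex`. If the remap windows were exposed that way, a dump tool could locate them without debug symbols. The helper below is a minimal sketch of such a lookup. The message does not decide which entries would be added, so any remap-related symbol name used with this helper is hypothetical. `NUMBER(...)` or `OFFSET(...)` entries would need a similar lookup with a different prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up "SYMBOL(<name>)=<hex>" at the start of a line in a NUL-terminated
 * VMCOREINFO buffer.  Stores the value in *addr and returns true on success,
 * false if the entry is missing or malformed.
 */
static bool vmcoreinfo_symbol(const char *buf, const char *name,
                              unsigned long long *addr)
{
    char key[128];
    int n;

    assert(buf != NULL && name != NULL && addr != NULL);
    n = snprintf(key, sizeof(key), "SYMBOL(%s)=", name);
    if (n < 0 || (size_t)n >= sizeof(key))
        return false; /* name too long for this sketch */

    for (const char *p = buf; p != NULL && *p != '\0'; ) {
        if (strncmp(p, key, (size_t)n) == 0) {
            char *end;
            *addr = strtoull(p + n, &end, 16);
            return end != p + n; /* require at least one hex digit */
        }
        p = strchr(p, '\n'); /* skip to the next line */
        if (p != NULL)
            p++;
    }
    return false;
}
```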
Regards,
Petr Tesarik
SUSE Linux