----- Original Message -----
Hello:
I am using crash utility 6.0.8 to parse a dump file from a 3.4 kernel.
My platform generates ebi.bin after a crash; this binary file is a
dump of DDR from address 0x0 to 0x20000000, 512MB of RAM in total.
After I get this binary file, I prefix an ELF header to it. The
function that generates the ELF header is as follows:
#include <elf.h>
#include <string.h>

#define MEMSIZE 0x20000000  /* assumed: 512MB, the size of ebi.bin */

/* Build an ELF32 core header in buf: the ELF header, an empty PT_NOTE
 * program header and one PT_LOAD program header. Returns the number of
 * bytes written, which is also the file offset of the PT_LOAD data. */
static size_t mkelfheader(void *buf)
{
    Elf32_Phdr *nhdr, *phdr;
    Elf32_Ehdr *elf;
    size_t offset = 0;
    char *bufp = buf;

    elf = (Elf32_Ehdr *) bufp;
    bufp += sizeof(Elf32_Ehdr);
    offset += sizeof(Elf32_Ehdr);

    memcpy(elf->e_ident, ELFMAG, SELFMAG);
    elf->e_ident[EI_CLASS]   = ELFCLASS32;
    elf->e_ident[EI_DATA]    = ELFDATA2LSB;
    elf->e_ident[EI_VERSION] = EV_CURRENT;
    elf->e_ident[EI_OSABI]   = ELFOSABI_NONE;
    memset(elf->e_ident + EI_PAD, 0, EI_NIDENT - EI_PAD);
    elf->e_type      = ET_CORE;
    elf->e_machine   = EM_ARM;
    elf->e_version   = EV_CURRENT;
    elf->e_entry     = 0;
    elf->e_phoff     = sizeof(Elf32_Ehdr);
    elf->e_shoff     = 0;
    elf->e_flags     = 0;
    elf->e_ehsize    = sizeof(Elf32_Ehdr);
    elf->e_phentsize = sizeof(Elf32_Phdr);
    elf->e_phnum     = 2;
    elf->e_shentsize = 0;
    elf->e_shnum     = 0;
    elf->e_shstrndx  = 0;

    /* PT_NOTE program header, left empty */
    nhdr = (Elf32_Phdr *) bufp;
    bufp += sizeof(Elf32_Phdr);
    offset += sizeof(Elf32_Phdr);
    nhdr->p_type   = PT_NOTE;
    nhdr->p_offset = 0;
    nhdr->p_vaddr  = 0;
    nhdr->p_paddr  = 0;
    nhdr->p_filesz = 0;
    nhdr->p_memsz  = 0;
    nhdr->p_flags  = 0;
    nhdr->p_align  = 0;

    /* PT_LOAD program header describing ebi.bin, which is appended
     * directly after the headers */
    phdr = (Elf32_Phdr *) bufp;
    bufp += sizeof(Elf32_Phdr);
    offset += sizeof(Elf32_Phdr);
    phdr->p_type   = PT_LOAD;
    phdr->p_flags  = PF_R | PF_W | PF_X;
    phdr->p_offset = offset;
    phdr->p_vaddr  = 0xc0000000;
    phdr->p_paddr  = 0x00200000;
    phdr->p_filesz = phdr->p_memsz = MEMSIZE;
    phdr->p_align  = 0;

    return offset;
}
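The step of appending ebi.bin to the generated header can be done with cat; a minimal C sketch is shown here, assuming the header bytes are already in memory (for example, filled in by mkelfheader()) and that the caller opens both files. The function name write_cdump() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Write hdrlen header bytes to out, then copy all of ebi to out unchanged.
 * Returns 0 on success, -1 on any read or write error. */
static int write_cdump(FILE *out, const void *hdr, size_t hdrlen, FILE *ebi)
{
    char buf[1 << 16];
    size_t n;

    assert(out != NULL && hdr != NULL && ebi != NULL);
    if (fwrite(hdr, 1, hdrlen, out) != hdrlen)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), ebi)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return (ferror(ebi) || fflush(out) != 0) ? -1 : 0;
}
```

A caller would open ebi.bin with fopen(..., "rb") and cdump.elf with fopen(..., "wb"), pass the buffer filled by mkelfheader() with its return value as hdrlen, and close both files afterwards.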
In the end, cdump.elf contains the generated ELF header followed by
ebi.bin. Then I use the crash utility to load this cdump.elf together
with the vmlinux, and it fails with the following errors:
WARNING: could not find MAGIC_START!
WARNING: cpu_present_mask indicates more than 4 (NR_CPUS) cpus
crash: cannot determine base kernel version
crash: vmlinux and cdump.elf do not match!
Our platform sets CONFIG_PHYS_OFFSET=0x00200000 in the kernel .config
file, which means that virtual address 0xc0000000 maps to physical
address 0x00200000. For this reason, I set phdr->p_paddr = 0x00200000
when generating the ELF header.
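For lowmem addresses, the mapping described above is linear: subtract the kernel's virtual base and add the physical offset. A minimal sketch, assuming the standard 3G/1G split (kernel virtual base 0xc0000000, the same value as p_vaddr above); lowmem_virt_to_phys() is a hypothetical name, not a kernel or crash function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed kernel virtual base for a 3G/1G split, matching p_vaddr above. */
#define PAGE_OFFSET 0xc0000000u

/* Linear (lowmem) mapping: vaddr - PAGE_OFFSET + phys_offset.
 * Only meaningful for addresses in the kernel's directly mapped region. */
static uint32_t lowmem_virt_to_phys(uint32_t vaddr, uint32_t phys_offset)
{
    assert(vaddr >= PAGE_OFFSET);
    return vaddr - PAGE_OFFSET + phys_offset;
}
```

With phys_offset = 0x00200000, virtual 0xc0000000 maps to physical 0x00200000, and a kernel symbol at virtual 0xc033ea10 would live at physical 0x0053ea10.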
Please help me find out what is wrong. Thanks very much.
Best Regards
The fact that crash gets as far as it does at least means that the
ELF header you've created was deemed acceptable as an ARM vmcore.
However, the error messages re: "cpu_present_mask indicates..." and
"cannot determine base kernel version" indicate that the data
that was read from the vmcore was clearly not the correct data.
The "cpu_present_mask" value that it read contained too
many bits -- presuming that the 32-bit ARM processor is
still limited to only 4 cpus. (Upstream, it looks like CONFIG_NR_CPUS
is still set to 2 in the arch/arm/configs files.)
But more indicative of the wrong data being read is the second
"cannot determine base kernel version" message, which was generated
after it read the kernel's "init_uts_ns" uts_namespace structure.
After reading it, crash sees that the "release" string contains
non-ASCII data, whereas it should contain the kernel version, as it
does here on an x86_64 system:
crash> p init_uts_ns
init_uts_ns = $3 = {
kref = {
refcount = {
counter = 2
}
},
name = {
sysname =
"Linux\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
nodename =
"phenom-01.lab.bos.redhat.com\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
release =
"2.6.32-313.el6.x86_64\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
version = "#1 SMP Thu Sep 27 16:25:19 EDT 2012\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
machine =
"x86_64\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
domainname =
"(none)\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
}
}
crash>
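The two symptoms above can be illustrated with simple checks: count the bits set in the CPU mask and compare the count against NR_CPUS, and test whether the release string is printable ASCII. This is a hypothetical sketch of that kind of check, not crash's own code; the function names are made up, and the mask is assumed to fit in one 32-bit word, as it does when NR_CPUS is 4.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count the CPUs marked present in a 32-bit cpu mask word. */
static int cpus_in_mask(uint32_t mask)
{
    int n = 0;
    for (; mask; mask &= mask - 1)   /* clear the lowest set bit */
        n++;
    return n;
}

/* True if the mask names no more CPUs than nr_cpus allows. */
static bool mask_plausible(uint32_t mask, int nr_cpus)
{
    assert(nr_cpus > 0);
    return cpus_in_mask(mask) <= nr_cpus;
}

/* True if buf starts with a non-empty string of printable ASCII
 * characters, scanning up to the first NUL or len bytes. */
static bool release_plausible(const char *buf, size_t len)
{
    size_t i = 0;
    while (i < len && buf[i] != '\0') {
        unsigned char c = (unsigned char)buf[i];
        if (c > 0x7e || !isprint(c))
            return false;
        i++;
    }
    return i > 0;
}
```

Data read from the wrong file offset tends to fail both checks at once, which matches the pair of messages reported above.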
So it appears that you're reading data from the wrong
locations in the dumpfile. You should be able to verify
that by bringing up the crash session with the --minimal
flag like this:
$ crash --minimal vmlinux vmcore
That will bypass most of the initialization, including all
readmem() calls of the vmcore. Then do this:
crash> rd linux_banner 20
ffffffff818000a0: 65762078756e694c 2e33206e6f697372 Linux version 3.
ffffffff818000b0: 63662e312d312e35 365f3638782e3731 5.1-1.fc17.x86_6
ffffffff818000c0: 626b636f6d282034 69756240646c6975 4 (mockbuild@bui
ffffffff818000d0: 2e33322d6d76646c 6465662e32786870 ldvm-23.phx2.fed
ffffffff818000e0: 656a6f727061726f 202967726f2e7463 oraproject.org)
ffffffff818000f0: 7265762063636728 372e34206e6f6973 (gcc version 4.7
ffffffff81800100: 303231303220302e 6465522820373035 .0 20120507 (Red
ffffffff81800110: 372e342074614820 47282029352d302e Hat 4.7.0-5) (G
ffffffff81800120: 3123202920294343 75685420504d5320 CC) ) #1 SMP Thu
ffffffff81800130: 3120392067754120 2033343a30353a37 Aug 9 17:50:43
crash> rd -a linux_banner
ffffffff818000a0: Linux version 3.5.1-1.fc17.x86_64 (mockbuild@buildvm-23.phx2
ffffffff818000dc: .fedoraproject.org) (gcc version 4.7.0 20120507 (Red Hat 4.7
ffffffff81800118: .0-5) (GCC) ) #1 SMP Thu Aug 9 17:50:43 UTC 2012
crash>
I'm guessing that with your dumpfile you will not see a string starting
with "Linux version" as shown above.
If that's the case, then it's clear that the readmem() function is ultimately
reading from the wrong vmcore file offset.
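In the rd output above, each 64-bit word is printed as a hex number, and on a little-endian machine the string's bytes appear reversed inside each word: 65762078756e694c is "Linux ve" read byte by byte from right to left. A small sketch that decodes words this way and checks for the banner prefix; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Store the 8 bytes of word least significant byte first (the in-memory
 * order on a little-endian CPU) into out, followed by a NUL. */
static void le64_bytes(uint64_t word, char out[9])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = (char)((word >> (8 * i)) & 0xff);
    out[8] = '\0';
}

/* True if the first two words of linux_banner decode to "Linux version...". */
static bool looks_like_banner(uint64_t w0, uint64_t w1)
{
    char text[17];
    le64_bytes(w0, text);
    le64_bytes(w1, text + 8);
    return memcmp(text, "Linux version", 13) == 0;
}
```

Fed the first two words of the rd output, looks_like_banner() returns true; with words read from the wrong offset it would be expected to return false.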
Here's what you can try doing. Taking the linux_banner example above,
you can check where in the dumpfile it's reading from by setting the debug
flag, before doing a simple read -- like this example on an ARM dumpfile:
crash> set debug 8
debug: 8
crash> rd linux_banner
<addr: c033ea10 count: 1 flag: 488 (KVADDR)>
<readmem: c033ea10, KVADDR, "32-bit KVADDR", 4, (FOE), ff94f048>
<read_kdump: addr: c033ea10 paddr: 33ea10 cnt: 4>
read_netdump: addr: c033ea10 paddr: 33ea10 cnt: 4 offset: 33f088
c033ea10: 756e694c Linu
crash>
The linux_banner is at virtual address c033ea10 (addr). First it gets translated
into physical address 33ea10 (paddr). Then that paddr is translated into the
vmcore file offset of 33f088. It lseeks to vmcore file offset 33f088 and
reads 4 bytes, which contain "756e694c", or the first 4 bytes of the
"Linux version ..." string.
Note that if I subtract the physical address from the vmcore file offset,
I get this:
crash> eval 33f088 - 33ea10
hexadecimal: 678
decimal: 1656
octal: 3170
binary: 00000000000000000000011001111000
crash>
which would put physical address 0 at vmcore file offset 0x678, and
therefore implies that the ELF headers (plus the note data that follows
them) comprise the first 0x678 bytes.
And looking at the vmcore, that can be verified:
$ readelf -a vmcore
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: CORE (Core file)
Machine: ARM
Version: 0x1
Entry point address: 0x0
Start of program headers: 52 (bytes into file)
Start of section headers: 0 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 0 (bytes)
Number of section headers: 0
Section header string table index: 0
There are no sections in this file.
There are no sections to group in this file.
Program Headers:
  Type           Offset    VirtAddr   PhysAddr   FileSiz   MemSiz    Flg Align
  NOTE           0x000094  0x00000000 0x004e345c 0x005e4   0x005e4       0
  LOAD           0x000678  0xc0000000 0x00000000 0x5600000 0x5600000 RWE 0
  LOAD           0x5600678 0xc5700000 0x05700000 0x100000  0x100000  RWE 0
...
Note that the first PT_LOAD segment has a file offset ("Offset") of
0x678.
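The 0x678 value can be reproduced from the readelf output: the program headers start right after the 52-byte ELF header, three 32-byte program headers end at 0x94 (the NOTE offset), and the 0x5e4 bytes of note data end at 0x678. A short sketch of that arithmetic, assuming this vmcore's layout (headers first, then note data, then the PT_LOAD data); the helper names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* First file offset after the ELF header and phnum program headers,
 * assuming the program headers immediately follow the ELF header. */
static uint32_t headers_end(uint32_t phnum)
{
    /* e_phnum values of PN_XNUM (0xffff) and above use extended
     * numbering, which this sketch does not handle */
    assert(phnum < PN_XNUM);
    return sizeof(Elf32_Ehdr) + phnum * sizeof(Elf32_Phdr);
}

/* File offset of the first PT_LOAD data when note_size bytes of
 * PT_NOTE data immediately follow the headers, as in this vmcore. */
static uint32_t first_load_offset(uint32_t phnum, uint32_t note_size)
{
    return headers_end(phnum) + note_size;
}
```

The second PT_LOAD data follows the first directly in the file (0x678 + 0x5600000 = 0x5600678) even though its physical range starts at 0x05700000, so the 1MB physical gap between the two segments takes no file space. For a two-header layout with an empty note, such as the one mkelfheader() in the original question builds, headers_end(2) is 0x74, which is the value that function returns.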
Another thing to check is that your phys_base of 0x00200000 is
being seen properly. In the --minimal session, you can verify that
by doing this:
crash> help -m | grep phys_base
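The paddr-to-file-offset step in the debug output earlier (paddr 33ea10 read from file offset 33f088) amounts to finding the PT_LOAD segment whose physical range contains the address and adding the distance from the segment's start to its p_offset. A minimal sketch of that general ELF lookup, not crash's read_netdump() code (struct load_seg and paddr_to_file_offset() are hypothetical names); the segment values used with it come from the two PT_LOAD entries in the readelf output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The PT_LOAD fields the lookup needs. */
struct load_seg {
    uint64_t p_offset;  /* file offset of the segment's first byte */
    uint64_t p_paddr;   /* physical address of that byte */
    uint64_t p_filesz;  /* bytes of the segment present in the file */
};

/* Find the segment containing paddr and store its file offset in *off.
 * Returns 0 on success, -1 if paddr lies in no segment (e.g. a hole). */
static int paddr_to_file_offset(const struct load_seg *segs, size_t nsegs,
                                uint64_t paddr, uint64_t *off)
{
    assert(off != NULL);
    for (size_t i = 0; i < nsegs; i++) {
        if (paddr >= segs[i].p_paddr &&
            paddr - segs[i].p_paddr < segs[i].p_filesz) {
            *off = segs[i].p_offset + (paddr - segs[i].p_paddr);
            return 0;
        }
    }
    return -1;
}
```

With the segments { 0x678, 0x0, 0x5600000 } and { 0x5600678, 0x5700000, 0x100000 }, paddr 0x33ea10 resolves to 0x33f088, matching the debug output. For the header in the original question, the single segment is { 0x74, 0x00200000, 0x20000000 }, so physical 0x00200000 resolves to file offset 0x74, the first byte of ebi.bin. If ebi.bin starts at physical address 0x0, as the question describes, that byte holds physical 0x0 instead, and every lookup would land 0x200000 bytes too early in the file.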
Trying the above should yield some clues into the problem you're encountering.
Dave