On Thu, 2008-02-07 at 16:46 -0500, Dave Anderson wrote:
Andrew Hecox wrote:
> On Thu, 2008-02-07 at 16:04 -0500, Dave Anderson wrote:
>> Andrew Hecox wrote:
>>> On Thu, 2008-02-07 at 15:38 -0500, Dave Anderson wrote:
>>>> Andrew Hecox wrote:
>>>>> I get the same:
>>>>>
>>>>> (/boot/System.map-2.6.9-67.0.1.ELhugemem)
>>>>>
>>>>> 02323bd8 d log_buf_len
>>>>>
>>>>> (/usr/lib/debug/lib/modules/2.6.9-67.0.1.ELhugemem/vmlinux)
>>>>>
>>>>> $1 = (int *) 0x2323bd8
>>>>>
>>>>> -Andrew
>>>> So, as Takao suggested, can you dump the incoming vaddr and
>>>> resultant pfn values in diskdumpmsg.c:read_buffer()?
>>>>
>>> The vaddr value is: 36846552.
>>>
>>> -Andrew
>>>
>>>> Dave
>>>>
>>>>
>> OK, so the incoming vaddr is 36846552, which is 0x2323bd8.
>> To get a pfn, that hugemem kernel virtual address is passed
>> through vtop() and then divided by 4096:
>>
>> static int read_buffer(DumpFile *dump, addr_t vaddr, size_t len, void *buf)
>> {
>> addr_t paddr;
>> int block_size = get_page_size();
>> unsigned long pfn;
>> int ret;
>> size_t copy_len, offs;
>> void *page_data;
>>
>> paddr = vtop(dump, vaddr);
>> pfn = paddr / block_size;
>> offs = paddr % block_size;
>>
>> When 0x2323bd8 is run through vtop(), it simply strips off the
>> hugemem unity-map identifier:
>>
>> addr_t vtop(DumpFile *dump, addr_t vaddr)
>> {
>> if (strstr("hugemem", dump->utsname->release))
>> return vaddr - 0x02000000L;
>> else
>> return vaddr - 0xc0000000L;
>> }
>>
>> leaving 0x323bd8 -- which gets divided by the page size of 4096, leaving
>> a pfn of 0x323.
>>
>> But you see that the pfn was 271139 (0x42323). If that is expanded
>> to a physical address it would be 0x42323000. It looks like it's
>> using the non-hugemem value in vtop(), i.e., subtracting 0xc0000000 from
>> the incoming vaddr. In other words, 0x2323bd8 - 0xc0000000 (32-bit) is
>> equal to 0x42323bd8. If that is divided by 4096, you get
>> the funky pfn of 271139 (0x42323).
>>
>> Print out the dump->utsname->release string in vtop(). It must
>> not contain "hugemem".
>>
>
> Dave,
>
> I get:
>
> (gdb) print dump->utsname->release
> $19 = "2.6.9-67.0.1.ELhugemem", '\0' <repeats 42 times>
>
> but then
>
> (gdb) s
> 16 return vaddr - 0xc0000000L;
>
> Uh oh!
>
> man strstr
>
> ...
> char *strstr(const char *haystack, const char *needle);
> ...
>
> It looks like
>
> if (strstr("hugemem", dump->utsname->release))
>
> should be:
>
> if (strstr(dump->utsname->release,"hugemem"))
Bingo -- like the man page says:
char *strstr(const char *haystack, const char *needle);
>
> I patched, recompiled, tested and it works:
>
> [root@ibm-x3455-1 ~]# diskdumpmsg -f -p /var/crash/vmcore
> Jan 31 05:43:08 elabhost012 kernel: --- salvaged messages from crash
> dump start
> Jan 31 05:43:08 elabhost012 kernel: 0218b9c0 0232d363 0232d3e0
> 0215aff6 df954fac f6db4000 eaa756c0 fffffff7
> Jan 31 05:43:08 elabhost012 kernel: f6db4000 df954000 0215b0c0
> df954fac 00000000 00000000 00000000 df954fc4
> Jan 31 05:43:08 elabhost012 kernel: Call Trace:
> Jan 31 05:43:08 elabhost012 kernel: [<0220c46a>] __handle_sysrq
> +0x58/0xc6
> Jan 31 05:43:08 elabhost012 kernel: [<0218b9c0>] write_sysrq_trigger
> +0x37/0x3e
> Jan 31 05:43:08 elabhost012 kernel: [<0215aff6>] vfs_write+0xb6/0xe2
> Jan 31 05:43:08 elabhost012 kernel: [<0215b0c0>] sys_write+0x3c/0x62
> Jan 31 05:43:08 elabhost012 kernel: Code: 11 02 c7 05 10 fd 44 02 00 00
> 00 00 c7 05 38 fd 44 02 00 00 00 00 c7 05 2c fd 44 02 6e ad 87 4b 89 15
> 28 fd 44 02 e9 8b 41 f2 ff <c6> 05 00 00 00 00 00 c3 e9 0a ff f4 ff e9
> a2 48 f5 ff 85 d2 89
> Jan 31 05:43:08 elabhost012 kernel: --- salvaged messages from crash
> dump end
>
> Thanks much for all the help! Should I open a bz against the issue? It
> looks like all i386 hugemem kernels would be similarly affected.
Yep -- definitely open a BZ against component "diskdumputils".
I've opened up bz431937 for the strstr change and bz431943 for the
lack of input validation that caused the FPE. I separated them since one
actually fixes an issue for production users and the other just provides
a better error without making anything work.
-Andrew
Dave