On 8/20/23 19:37, HAGIO KAZUHITO(萩尾 一仁) wrote:
On 2023/08/18 3:08, David Mair wrote:
> Hi All,
>
> Before I consider starting work on a patch for this I'd appreciate more
> input.
>
> I am seeing random cases of crash failing to load reporting an x86_64
> coredump reporting a "bad" linux_banner. However, the value displayed as
> the banner is:
>
> 0x65762078756e694c
>
> which is plainly ASCII text as a 64-bit number and is the little-endian
> reversal of the text "Linux ver".
>
> It's randomly found with specific coredumps and reproduces all times
> with that coredump and a given version of crash, though sometimes it
> will appear when using a given coredump with one version of crash but
> not with another version of crash. What I'm trying to get working is
> crash current and the rest of this is experience using crash 8.0.3 only.
>
> I used gdb crash to debug it through verify_version(). If I breakpoint
> there with gdb crash and step through the function I find that in the
> section:
>
>
> if (!(sp = symbol_search("linux_banner")))
> error(FATAL, "linux_banner symbol does not exist?\n");
> else if ((sp->type == 'R') || (sp->type == 'r') ||
> (THIS_KERNEL_VERSION >= LINUX(2,6,11) && (sp->type ==
'D' ||
> sp->type == 'd')) ||
> (machine_type("ARM") && sp->type == 'T') ||
> (machine_type("ARM64")))
> linux_banner = symbol_value("linux_banner");
> else
> get_symbol_data("linux_banner", sizeof(ulong),
&linux_banner);
>
>
> * The if block is not executed, i.e. symbol_search("linux_banner")
> succeeded and we have a usable struct syment for "linux_banner" in sp
> * The else if block is not executed, all conditions are met or not
> relevant except for the the value of sp->type in the case of
> THIS_KERNEL_VERSION >= LINUX(2,6,11). But sp->type is 'B', bss segment
> * The final else block is executed, we copy sizeof(ulong) bytes of
> symbol data from what "linux_banner" refers to into the crash internal
> linux_banner variable
>
> Here's how sp looks at the else if statement in the above code:
>
> gdb) print *sp
> $2 = {value = 18446744071587233984, name = 0x5555566a735b "linux_banner",
> val_hash_next = 0x7fffe51e4338, name_hash_next = 0x7fffe51f8d38,
> type = 66 'B', cnt = 1 '\001', flags = 0 '\000', pad2 = 0
'\000'}
>
> ...and sp->value in hex is:
>
> (gdb) p/x sp->value
> $5 = 0xffffffff818000c0
>
> Starting crash in --minimal mode with the same core, kernel and
> debuginfo so that I can try to read 0xffffffff818000c0 I find:
>
> crash> rd 0xffffffff818000c0 18
> ffffffff818000c0: 65762078756e694c 2e34206e6f697372 Linux version 4.
> ffffffff818000d0: 34392d3038312e34 6665642d3533312e 4.180-94.135-def
> ffffffff818000e0: 65672820746c7561 6c697562406f6b65 ault (geeko@buil
> ffffffff818000f0: 28202974736f6864 7372657620636367 dhost) (gcc vers
> ffffffff81800100: 2e382e34206e6f69 2045535553282035 ion 4.8.5 (SUSE
> ffffffff81800110: 29202978756e694c 20504d5320312320 Linux) ) #1 SMP
> ffffffff81800120: 20766f4e206e6f4d 35353a3930203631 Mon Nov 16 09:55
> ffffffff81800130: 204354552037353a 3033282030323032 :57 UTC 2020 (30
> ffffffff81800140: 000a293039393633 0000000000000000 36990)..........
>
> IOW, it is the linux_banner, it's at 0xffffffff818000c0 and if the first
> sizeof(ulong) bytes are read into crash's linux_banner variable via
> get_symbol_data() it makes a 64-bit number with little-endian revesral
> from the string bytes "Linux ver" from the actual linux_banner text, the
> same value seen in the error report when crash fails.
>
> If I repeat the above debug of verify_version() in gdb and at the else
> if block in the above C in verify_version() I set var the sp->type to be
> 'D' or 'd' then crash's linux_banner is set to
0xffffffff818000c0
> (sp->value) and the else if block is executed then the remainder of the
> function successfully finds a linux_banner and gets the version
> information from it and crash loads.
>
> In symbol_search() is the value of type in the returned struct syment a
> property generated by crash (I thought not and it was based on the
> kernel compilation, possible kdummp/makedumpfile but not controlled by
> crash)? If I'm correct then is it safe to expect a struct syment with
> type 'B' (also 'b' I believe) for linux_banner? Or is there anything
> special about a bss segment type that makes it not possible to assume we
> can take sp->value as-is for the address of the linux_banner string
> details, i.e. we can't safely use symbol_value("linux_banner") to set
> the crash linux_banner variable if the struct syment type for
> linux_banner is 'B' or 'b'?
>
> Any comments? I'll write a patch if that's the right direction but I
> need a better understanding of why symbols with the type bss segment
> aren't already assumed valid sources of a linux_banner address value,
> only 'D' and 'd' types are and if there is anything special I
don't
> understand about 'B' and 'b' types.
>
It seems that sp->type is set to a value from bfd_get_symbol_info() in
store_symbols(). So it's from the vmlinux through the embedded gdb.
Anyway, I don't think that the current check gets the point, i.e. we can
check the type of the linux_banner symbol directly like this?
switch (get_symbol_type("linux_banner", NULL, NULL))
{
case TYPE_CODE_ARRAY:
linux_banner = sp->value;
break;
case TYPE_CODE_PTR:
get_symbol_data("linux_banner", sizeof(ulong), &linux_banner);
break;
default:
error(WARNING, "linux_banner is unknown type\n");
linux_banner = 0;
break;
}
I don't know why it checks the sp->type, but what it should do
originally might be the above, I think.
Lianbo, do you have an old vmcore that has "const char *linux_banner"?
According to crash commit fce91bec5bef, 2.6.10 and older kernels have
it. I'd like to test the above with it.
Kazu,
I've seen the above code solve cases of starting crash giving "bad"
linux_banner for three coredumps in a week. I don't want to be a reason
to hold things back and the variety of coredumps is (well, it is only
three) but random enough for me to expect the change covers a broad set
of dumps.
It's your change, do you want to make the commit? I can prepare one if
you prefer.
--
David