Hi All,
Before I consider starting work on a patch for this I'd appreciate more
input.
I am seeing random cases of crash failing to load reporting an x86_64
coredump reporting a "bad" linux_banner. However, the value displayed as
the banner is:
0x65762078756e694c
which is plainly ASCII text as a 64-bit number and is the little-endian
reversal of the text "Linux ver".
It's randomly found with specific coredumps and reproduces all times
with that coredump and a given version of crash, though sometimes it
will appear when using a given coredump with one version of crash but
not with another version of crash. What I'm trying to get working is
crash current and the rest of this is experience using crash 8.0.3 only.
I used gdb crash to debug it through verify_version(). If I breakpoint
there with gdb crash and step through the function I find that in the
section:
if (!(sp = symbol_search("linux_banner")))
error(FATAL, "linux_banner symbol does not exist?\n");
else if ((sp->type == 'R') || (sp->type == 'r') ||
(THIS_KERNEL_VERSION >= LINUX(2,6,11) && (sp->type == 'D' ||
sp->type
== 'd')) ||
(machine_type("ARM") && sp->type == 'T') ||
(machine_type("ARM64")))
linux_banner = symbol_value("linux_banner");
else
get_symbol_data("linux_banner", sizeof(ulong), &linux_banner);
* The if block is not executed, i.e. symbol_search("linux_banner")
succeeded and we have a usable struct syment for "linux_banner" in sp
* The else if block is not executed, all conditions are met or not
relevant except for the the value of sp->type in the case of
THIS_KERNEL_VERSION >= LINUX(2,6,11). But sp->type is 'B', bss segment
* The final else block is executed, we copy sizeof(ulong) bytes of
symbol data from what "linux_banner" refers to into the crash internal
linux_banner variable
Here's how sp looks at the else if statement in the above code:
gdb) print *sp
$2 = {value = 18446744071587233984, name = 0x5555566a735b "linux_banner",
val_hash_next = 0x7fffe51e4338, name_hash_next = 0x7fffe51f8d38,
type = 66 'B', cnt = 1 '\001', flags = 0 '\000', pad2 = 0
'\000'}
...and sp->value in hex is:
(gdb) p/x sp->value
$5 = 0xffffffff818000c0
Starting crash in --minimal mode with the same core, kernel and
debuginfo so that I can try to read 0xffffffff818000c0 I find:
crash> rd 0xffffffff818000c0 18
ffffffff818000c0: 65762078756e694c 2e34206e6f697372 Linux version 4.
ffffffff818000d0: 34392d3038312e34 6665642d3533312e 4.180-94.135-def
ffffffff818000e0: 65672820746c7561 6c697562406f6b65 ault (geeko@buil
ffffffff818000f0: 28202974736f6864 7372657620636367 dhost) (gcc vers
ffffffff81800100: 2e382e34206e6f69 2045535553282035 ion 4.8.5 (SUSE
ffffffff81800110: 29202978756e694c 20504d5320312320 Linux) ) #1 SMP
ffffffff81800120: 20766f4e206e6f4d 35353a3930203631 Mon Nov 16 09:55
ffffffff81800130: 204354552037353a 3033282030323032 :57 UTC 2020 (30
ffffffff81800140: 000a293039393633 0000000000000000 36990)..........
IOW, it is the linux_banner, it's at 0xffffffff818000c0 and if the first
sizeof(ulong) bytes are read into crash's linux_banner variable via
get_symbol_data() it makes a 64-bit number with little-endian revesral
from the string bytes "Linux ver" from the actual linux_banner text, the
same value seen in the error report when crash fails.
If I repeat the above debug of verify_version() in gdb and at the else
if block in the above C in verify_version() I set var the sp->type to be
'D' or 'd' then crash's linux_banner is set to 0xffffffff818000c0
(sp->value) and the else if block is executed then the remainder of the
function successfully finds a linux_banner and gets the version
information from it and crash loads.
In symbol_search() is the value of type in the returned struct syment a
property generated by crash (I thought not and it was based on the
kernel compilation, possible kdummp/makedumpfile but not controlled by
crash)? If I'm correct then is it safe to expect a struct syment with
type 'B' (also 'b' I believe) for linux_banner? Or is there anything
special about a bss segment type that makes it not possible to assume we
can take sp->value as-is for the address of the linux_banner string
details, i.e. we can't safely use symbol_value("linux_banner") to set
the crash linux_banner variable if the struct syment type for
linux_banner is 'B' or 'b'?
Any comments? I'll write a patch if that's the right direction but I
need a better understanding of why symbols with the type bss segment
aren't already assumed valid sources of a linux_banner address value,
only 'D' and 'd' types are and if there is anything special I don't
understand about 'B' and 'b' types.
--
Thanks,
David.