Thank you, I have some more information...
On 8/21/23 02:20, lijiang wrote:
On Mon, Aug 21, 2023 at 9:37 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com> wrote:
> On 2023/08/18 3:08, David Mair wrote:
>> Hi All,
>>
>> Before I consider starting work on a patch for this I'd appreciate more
>> input.
>>
>> I am seeing random cases of crash failing to load an x86_64 coredump
>> and reporting a "bad" linux_banner. However, the value displayed as
>> the banner is:
>>
>> 0x65762078756e694c
>>
>> which is plainly ASCII text as a 64-bit number: read in little-endian
>> byte order it is the text "Linux ve", the first eight bytes of
>> "Linux version".
>>
>> It's randomly found with specific coredumps and reproduces every time
>> with that coredump and a given version of crash, though sometimes it
>> will appear when using a given coredump with one version of crash but
>> not with another version of crash. What I'm trying to get working is
>> current crash; the rest of this is experience using crash 8.0.3 only.
>>
>> I used gdb crash to debug it through verify_version(). If I breakpoint
>> there with gdb crash and step through the function I find that in the
>> section:
>>
>>
>> if (!(sp = symbol_search("linux_banner")))
>>         error(FATAL, "linux_banner symbol does not exist?\n");
>> else if ((sp->type == 'R') || (sp->type == 'r') ||
>>          (THIS_KERNEL_VERSION >= LINUX(2,6,11) &&
>>           (sp->type == 'D' || sp->type == 'd')) ||
>>          (machine_type("ARM") && sp->type == 'T') ||
>>          (machine_type("ARM64")))
>>         linux_banner = symbol_value("linux_banner");
>> else
>>         get_symbol_data("linux_banner", sizeof(ulong), &linux_banner);
>>
>>
>> * The if block is not executed, i.e. symbol_search("linux_banner")
>> succeeded and we have a usable struct syment for "linux_banner" in sp
>> * The else if block is not executed. Every condition is either met or
>> not relevant except the value of sp->type in the
>> THIS_KERNEL_VERSION >= LINUX(2,6,11) case: sp->type is 'B' (bss
>> segment), not 'D' or 'd'
>> * The final else block is executed, we copy sizeof(ulong) bytes of
>> symbol data from what "linux_banner" refers to into the crash internal
>> linux_banner variable
>>
>> Here's how sp looks at the else if statement in the above code:
>>
>> (gdb) print *sp
>> $2 = {value = 18446744071587233984, name = 0x5555566a735b "linux_banner",
>> val_hash_next = 0x7fffe51e4338, name_hash_next = 0x7fffe51f8d38,
>> type = 66 'B', cnt = 1 '\001', flags = 0 '\000', pad2 = 0 '\000'}
>>...
I modified crash to dump every symbol added to the symbol table. The rpm
package build in question has a patch that causes the symbol table to be
built from the debuginfo file; before the patch it was built from the
kernel executable.

If I toggle between the two models, I find that in a case where crash
fails because the linux_banner pointer holds ASCII characters, all
symbol table entries generated from the debuginfo file have type 'b' or
'B'.

BUT, for the same coredump, when crash builds the symbol table from the
kernel executable the symbols have varying types and linux_banner is
'D'.

There are about 78,000 symbol table entries created in both cases.
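
To illustrate why a 'B' type ends up producing the ASCII value, here is a
small standalone C sketch. It is not crash code: fake_banner,
banner_is_direct, resolve_banner and le_bytes_to_text are names made up
for illustration, the banner text is invented, and the type test only
models the x86_64, kernel >= 2.6.11 part of the condition quoted above
(the ARM/ARM64 clauses are left out). The symbol value is the one from the
gdb output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the dump memory found at the linux_banner address. */
const char fake_banner[] = "Linux version 0.0.0-example (illustrative only)";

/* Simplified type test: 1 when the symbol value itself is taken as the
 * banner address (types R, r, D, d), 0 when the fallback read is used. */
int banner_is_direct(char type)
{
    return type == 'R' || type == 'r' || type == 'D' || type == 'd';
}

/* Value that would land in the internal linux_banner variable.
 * sym_value plays the role of sp->value. */
uint64_t resolve_banner(char type, uint64_t sym_value)
{
    uint64_t v = 0;

    if (banner_is_direct(type))
        return sym_value;

    /* Fallback: treat the symbol as a pointer variable and read 8 bytes
     * stored at its address, assembled in little-endian order. */
    assert(strlen(fake_banner) >= 8);
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)(unsigned char)fake_banner[i] << (8 * i);
    return v;
}

/* Spell a 64-bit value as the 8 bytes it occupies in little-endian memory. */
void le_bytes_to_text(uint64_t v, char out[9])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = (char)((v >> (8 * i)) & 0xff);
    out[8] = '\0';
}
```

With type 'D', resolve_banner('D', 18446744071587233984ULL) hands back the
symbol value unchanged. With type 'B' it returns 0x65762078756e694c, and
le_bytes_to_text() turns that back into "Linux ve", which matches the
"bad" banner value reported.
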
This error is not observed with every kernel debuginfo file. At this
point, if there is any reason for this crash package to use the patch
that builds the symbol table from the debuginfo file, then I have to
suspect either:

* Sometimes the kernel debuginfo is created with all symbols having type
bss segment when the kernel executable has varying types; or
* Something in the attempt to read the debuginfo fails such that bss
segment gets used for all symbols.
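
A check like the following could make the comparison between the two
symbol tables less manual. It is only a sketch: struct sym_entry is a
reduced, hypothetical record (not crash's struct syment), and
tally_types, all_bss and print_histogram are illustrative names. It
confirms the symptom for a given table; it does not by itself decide
between the two cases above. Comparing against the types another tool
reports for the same debuginfo file might help with that, but I have not
tried it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, reduced symbol entry: only the fields this check needs. */
struct sym_entry {
    const char *name;
    char type;              /* type character, e.g. 'D', 'T', 'B' */
};

/* Count how often each type character occurs; counts must have 256 slots. */
void tally_types(const struct sym_entry *syms, size_t n, size_t counts[256])
{
    for (size_t i = 0; i < 256; i++)
        counts[i] = 0;
    for (size_t i = 0; i < n; i++)
        counts[(unsigned char)syms[i].type]++;
}

/* Returns 1 when the table is non-empty and every entry is 'b' or 'B'. */
int all_bss(const struct sym_entry *syms, size_t n)
{
    size_t counts[256];

    assert(syms != NULL || n == 0);
    tally_types(syms, n, counts);
    return n > 0 && counts['b'] + counts['B'] == n;
}

/* Print one line per type character that occurs at least once. */
void print_histogram(const size_t counts[256], FILE *out)
{
    for (int c = 0; c < 256; c++)
        if (counts[c])
            fprintf(out, "'%c': %zu\n", c, counts[c]);
}
```

Run over the table built from the debuginfo file in a failing case,
all_bss() would be expected to return 1; run over the table built from the
kernel executable for the same coredump it should return 0, since
linux_banner alone is 'D' there.
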
The first is not a bug in crash: the current code would work if such
debuginfo files were never generated, and would also work when that
debuginfo file is bypassed so that the symbol table is built from the
kernel executable the debuginfo is matched to.

I'll try to discover which case occurs using gdb crash from today and
take on board Kazu's suggestion while I explore.
--
Thank you,
David.