Hi Dave!
I reproduced the problem on an FC6 system here. It looks like
the symbol causing the problem is only nominally in the BSS
segment: its address is bss_start+bss_size, i.e. the first
address past the end of the segment, even though it claims
to be in the bss segment.
I'm attaching a patch that skips over symbols which
don't fit in. I had somehow left out one line when
writing the original code, and that line would have
fixed this.
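To illustrate the boundary case, here is a minimal sketch. It is not the code from the patch: the struct layouts and the names sym_fits() and count_fitting() are made up for the example. It treats a section as the half-open range [start, start+size), so a symbol sitting exactly at start+size does not fit, and it advances to the next symbol even when one is skipped.

```c
#include <stddef.h>

/* Hypothetical types for illustration only; not crash's real structures. */
struct section {
    const char *name;
    unsigned long start;   /* load address of the section */
    unsigned long size;    /* size in bytes */
};

struct symbol {
    const char *name;
    unsigned long addr;
};

/*
 * A section covers the half-open range [start, start + size).
 * A symbol at exactly start + size is therefore outside it.
 */
static int sym_fits(const struct section *sec, const struct symbol *sym)
{
    return sym->addr >= sec->start &&
           sym->addr - sec->start < sec->size;
}

/*
 * Count the symbols that lie inside the section.  The index advances
 * on every iteration, including when a symbol is skipped, so a symbol
 * that does not fit cannot stall the loop.
 */
static size_t count_fitting(const struct section *sec,
                            const struct symbol *syms, size_t nsyms)
{
    size_t i, fitted = 0;

    for (i = 0; i < nsyms; i++) {
        if (!sym_fits(sec, &syms[i]))
            continue;      /* skip it, but still move on */
        fitted++;
    }
    return fitted;
}
```

With a made-up .bss at 0x1000 of size 0x80, a symbol at 0x1080 is skipped by this check, while symbols at 0x1000 and 0x1040 are counted.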
Also included is a small fix to my changes to the MODULES_IN_CWD
code, which were breaking the lookup of some modules on 2.4 kernels.
Thanks for your testing and help!
-castor
-----Original Message-----
From: crash-utility-bounces(a)redhat.com on behalf of Dave Anderson
Sent: Fri 12/15/2006 8:46 AM
To: Castor Fu
Cc: crash-utility(a)redhat.com
Subject: [Crash-utility] Re: crash-4.0-3.14-sym.2.patch (was: modules and data / bss initialization)
Hi Castor,
I was hoping to fold in your latest crash-4.0-3.14-sym.2.patch,
but upon testing it, there is a bug that needs attention, and I'm
not sure why it occurs.
Again, it has something to do with those "__key.####" bss symbols
in the modules.
Here's a "mod -S" on an x86 machine, where it bumps into one of those
symbols in the very first module it tries to load:
crash> set debug 1
debug: 1
crash> mod -S
load_module_symbols: scsi_transport_spi
/lib/modules/2.6.18-1.2747.el5xen/kernel/drivers/scsi/scsi_transport_spi.ko ee031000 122600
186 symbols found in obj file
/lib/modules/2.6.18-1.2747.el5xen/kernel/drivers/scsi/scsi_transport_spi.ko
scsi_transport_spi: update sec offset sym spi_device_configure @ ee031000 val 0 section .text
scsi_transport_spi: update sec offset sym spi_transport_exit @ ee033510 val 0 section .exit.text
scsi_transport_spi: update sec offset sym class_device_attr_period @ ee033570 val 30 section .rodata
scsi_transport_spi: update sec offset sym __ksymtab_spi_release_transport @ ee033ed8 val 0 section __ksymtab
scsi_transport_spi: update sec offset sym __kcrctab_spi_release_transport @ ee033f08 val 0 section __kcrctab
scsi_transport_spi: update sec offset sym __ksymtab_spi_populate_ppr_msg @ ee033f20 val 0 section __ksymtab_gpl
scsi_transport_spi: update sec offset sym __kcrctab_spi_populate_ppr_msg @ ee033f38 val 0 section __kcrctab_gpl
scsi_transport_spi: update sec offset sym __kstrtab_spi_release_transport @ ee033f44 val 0 section __ksymtab_strings
scsi_transport_spi: update sec offset sym ____versions @ ee034000 val 0 section __versions
scsi_transport_spi: update sec offset sym spi_transport_class @ ee036ae0 val 0 section .data
scsi_transport_spi: update sec offset sym __this_module @ ee036d80 val 0 section .gnu.linkonce.this_module
scsi_transport_spi: update sec offset sym __key.19739 @ ee037f80 val 0 section .bss
scsi_transport_spi: update sec offset sym __key.19739 @ ee037f80 val 0 section .bss
scsi_transport_spi: update sec offset sym __key.19739 @ ee037f80 val 0 section .bss
scsi_transport_spi: update sec offset sym __key.19739 @ ee037f80 val 0 section .bss
scsi_transport_spi: update sec offset sym __key.19739 @ ee037f80 val 0 section .bss
scsi_transport_spi: update sec offset sym __key.19739 @ ee037f80 val 0 section .bss
scsi_transport_spi: update sec offset sym __key.19739 @ ee037f80 val 0 section .bss
... [ spits out same message forever ] ...
i.e., it continues to loop in your new calculate_load_order_v2() function.
On an x86_64, I see the same thing, although it goes through several
other modules successfully before the infinite loop starts in the i2c_core
module:
crash> set debug 1
debug: 1
crash> mod -S
[ ... debug info removed ...]
load_module_symbols: i2c_core
/lib/modules/2.6.18-1.2747.el5/kernel/drivers/i2c/i2c-core.ko ffffffff8817d000 b02600
244 symbols found in obj file
/lib/modules/2.6.18-1.2747.el5/kernel/drivers/i2c/i2c-core.ko
update sec offset sym i2c_device_match @ ffffffff8817d000 val 0 section .text
update sec offset sym i2c_exit @ ffffffff8817e614 val 0 section .exit.text
update sec offset sym __ksymtab_i2c_smbus_write_i2c_block_data @ ffffffff8817e990 val 0 section __ksymtab
update sec offset sym __kcrctab_i2c_smbus_write_i2c_block_data @ ffffffff8817eb50 val 0 section __kcrctab
update sec offset sym __ksymtab_i2c_bus_type @ ffffffff8817ec30 val 0 section __ksymtab_gpl
update sec offset sym __kcrctab_i2c_bus_type @ ffffffff8817ec70 val 0 section __kcrctab_gpl
update sec offset sym __kstrtab_i2c_smbus_write_i2c_block_data @ ffffffff8817ec90 val 0 section __ksymtab_strings
update sec offset sym ____versions @ ffffffff8817efa0 val 0 section __versions
update sec offset sym i2c_bus_type @ ffffffff88182980 val 0 section .data
update sec offset sym __this_module @ ffffffff88182f80 val 0 section .gnu.linkonce.this_module
update sec offset sym __key.10825 @ ffffffff8818b180 val 0 section .bss
update sec offset sym __key.10825 @ ffffffff8818b180 val 0 section .bss
update sec offset sym __key.10825 @ ffffffff8818b180 val 0 section .bss
update sec offset sym __key.10825 @ ffffffff8818b180 val 0 section .bss
update sec offset sym __key.10825 @ ffffffff8818b180 val 0 section .bss
update sec offset sym __key.10825 @ ffffffff8818b180 val 0 section .bss
update sec offset sym __key.10825 @ ffffffff8818b180 val 0 section .bss
update sec offset sym __key.10825 @ ffffffff8818b180 val 0 section .bss
update sec offset sym __key.10825 @ ffffffff8818b180 val 0 section .bss
... [ repeat forever ] ...
At first I thought that it would start spinning upon encountering the first
module with one of those __key.#### symbols, which was true in
the case of the x86 machine.
However, on the above x86_64 machine, several other modules with those
__key.#### bss symbols did get loaded OK before it got hung up
loading the i2c_core module.
For example, if I do the loads individually on the x86_64, the jbd module
loads OK, but i2c_core hangs:
crash> sym -m jbd | grep __key
ffffffff8804b0b0 (b) __key.16794
ffffffff8804b0b0 (b) __key.16795
crash> mod -s jbd
     MODULE       NAME   SIZE  OBJECT FILE
ffffffff88042e80  jbd   98609  /lib/modules/2.6.18-1.2747.el5/kernel/fs/jbd/jbd.ko
crash> sym -m i2c_core | grep __key
ffffffff8818b180 (b) __key.10825
ffffffff8818b180 (b) __key.10826
crash> mod -s i2c_core
< hang >
Can you see if you can reproduce, and hopefully fix this?
Thanks,
Dave
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility