----- Original Message -----
On Thu, Mar 21, 2013 at 03:02:54PM -0400, Dave Anderson wrote:
> If for some reason you can't get them, I can make them available to
> you.
> And Lei Wen can also give you a sample dumpfile from his
> environment.
Got them from Luc.
> > Are you able to access module symbols on ARM dump (the one that Luc provided)?
> > Or is it failing completely?
>
> I *think* so...
>
> This module text disassembly looks right:
>
> crash> dis usbnet_suspend
> 0xbf000ae8 <usbnet_suspend>: push {r3, r4, r5, lr}
> 0xbf000aec <usbnet_suspend+4>: add r0, r0, #32
> 0xbf000af0 <usbnet_suspend+8>: mov r5, r1
> 0xbf000af4 <usbnet_suspend+12>: bl 0xc01b8264 <dev_get_drvdata>
> 0xbf000af8 <usbnet_suspend+16>: ldrb r3, [r0, #36] ; 0x24
> 0xbf000afc <usbnet_suspend+20>: mov r4, r0
> 0xbf000b00 <usbnet_suspend+24>: add r2, r3, #1
> 0xbf000b04 <usbnet_suspend+28>: cmp r3, #0
> 0xbf000b08 <usbnet_suspend+32>: strb r2, [r0, #36] ; 0x24
> 0xbf000b0c <usbnet_suspend+36>: bne 0xbf000bdc <usbnet_suspend+244>
> 0xbf000b10 <usbnet_suspend+40>: mrs r3, CPSR
> 0xbf000b14 <usbnet_suspend+44>: orr r3, r3, #128 ; 0x80
> 0xbf000b18 <usbnet_suspend+48>: msr CPSR_c, r3
> 0xbf000b1c <usbnet_suspend+52>: mov r0, #1
> 0xbf000b20 <usbnet_suspend+56>: bl 0xc0015f40 <add_preempt_count>
> 0xbf000b24 <usbnet_suspend+60>: ldr r3, [r4, #200] ; 0xc8
> 0xbf000b28 <usbnet_suspend+64>: cmp r3, #0
> 0xbf000b2c <usbnet_suspend+68>: beq 0xbf000b70 <usbnet_suspend+136>
> 0xbf000b30 <usbnet_suspend+72>: tst r5, #1024 ; 0x400
> 0xbf000b34 <usbnet_suspend+76>: beq 0xbf000b70 <usbnet_suspend+136>
> 0xbf000b38 <usbnet_suspend+80>: mrs r3, CPSR
> ...
>
> This (r) data looks OK:
>
> crash> p smsc95xx_netdev_ops
> smsc95xx_netdev_ops = $8 = {
> ndo_init = 0,
> ndo_uninit = 0,
> ndo_open = 0xbf000514 <usbnet_open>,
> ndo_stop = 0xbf000bec <usbnet_stop>,
> ndo_start_xmit = 0xbf001a60 <usbnet_start_xmit>,
> ndo_select_queue = 0,
> ndo_change_rx_flags = 0,
> ndo_set_rx_mode = 0,
> ndo_set_multicast_list = 0xbf008abc <smsc95xx_set_multicast>,
> ndo_set_mac_address = 0xc025d854 <eth_mac_addr>,
> ndo_validate_addr = 0xc025d6f8 <eth_validate_addr>,
> ndo_do_ioctl = 0xbf00926c <smsc95xx_ioctl>,
> ndo_set_config = 0,
> ndo_change_mtu = 0xbf000de0 <usbnet_change_mtu>,
> ndo_neigh_setup = 0,
> ndo_tx_timeout = 0xbf000d4c <usbnet_tx_timeout>,
> ndo_get_stats64 = 0,
> ndo_get_stats = 0,
> ndo_vlan_rx_add_vid = 0,
> ndo_vlan_rx_kill_vid = 0,
> ndo_set_vf_mac = 0,
> ndo_set_vf_vlan = 0,
> ndo_set_vf_tx_rate = 0,
> ndo_get_vf_config = 0,
> ndo_set_vf_port = 0,
> ndo_get_vf_port = 0,
> ndo_setup_tc = 0,
> ndo_add_slave = 0,
> ndo_del_slave = 0,
> ndo_fix_features = 0,
> crash>
I'm able to see the same.
Setting a suitable debug level reveals:
bf00f040 (bf00f000): scsi_wait_scan syms: 0 gplsyms: 0 ksyms: 1
bf00a1f8 (bf008000): smsc95xx syms: 0 gplsyms: 0 ksyms: 60
bf002a40 (bf000000): usbnet syms: 0 gplsyms: 24 ksyms: 65
The ksyms come from KALLSYMS, which by default only includes text and
inittext symbols. This explains why Lei is not able to see data and other
non-text symbols when he runs 'sym -m <module>'.
So I believe crash on ARM works as it should in this case.
I note that the symbols exported by ARM modules prior to mod -[sS]
contain a bunch of "$d" and "$a" symbols. The ARM arm_verify_symbol()
function rejects symbols of that type, but it is only called if the
"mod -[sS]" command is run.
In other words, this is the flow during session initialization:
  module_init()
    store_module_symbols_v2() -> symbols from KALLSYMS + in-kernel module struct
And if "mod -[sS]" is done, it goes like this:
  cmd_mod()
    do_module_cmd()
      load_module_symbols()
        store_load_module_symbols() -> symbols from module.ko file
          machdep->verify_symbol()
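For reference, the kind of check that machdep->verify_symbol() applies on
the "mod -[sS]" path can be sketched roughly as below. This is a Python
illustration only, not crash's arm_verify_symbol() code. The function names
are made up for the example, and the sample "$a"/"$d" entries are
illustrative rather than taken from the dump. The thread only mentions "$a"
and "$d"; treating "$t" and dotted suffixes the same way is an assumption
based on the ARM ELF mapping-symbol convention.

```python
import re

# ARM ELF mapping symbols: "$a" (ARM code), "$t" (Thumb code), "$d" (data),
# optionally followed by ".<suffix>". Only "$a" and "$d" come from the thread;
# "$t" and the suffix form are assumptions from the ELF convention.
_MAPPING_SYMBOL = re.compile(r"^\$[atd](\..*)?$")


def is_arm_mapping_symbol(name):
    """Return True if name looks like an ARM mapping symbol."""
    return _MAPPING_SYMBOL.match(name) is not None


def filter_module_symbols(symbols):
    """Drop mapping symbols from an iterable of (address, name) pairs."""
    return [(addr, name) for addr, name in symbols
            if not is_arm_mapping_symbol(name)]


# Real symbol addresses from the session above, mixed with
# illustrative (hypothetical) mapping-symbol entries.
sample = [
    (0xbf000514, "$a"),
    (0xbf000514, "usbnet_open"),
    (0xbf000ae8, "usbnet_suspend"),
    (0xbf000be8, "$d"),
]
kept = filter_module_symbols(sample)
```

Applying the same filter at initialization time would be the ARM-only change
discussed here; whether other architectures want an equivalent is a separate
question.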
So the "$d" and "$a" symbols are there from initialization time onward.
But since store_module_symbols_v2() has never called machdep->verify_symbol(),
I'm a bit hesitant to make it do so for all architectures without knowing the
consequences. But it certainly seems legitimate in the machine_type("ARM")
case.
> But the user-space vtop is clearly wrong:
>
> crash> vm
> PID: 1495 TASK: c1ef1380 CPU: 0 COMMAND: "bash"
> MM PGD RSS TOTAL_VM
> c30cd1e0 c1de4000 1484k 2940k
> VMA START END FLAGS FILE
> c1e9ae90 8000 c2000 8001875 /bin/bash
> c1e9aee8 c9000 ce000 8101877 /bin/bash
> c1e9af40 ce000 d3000 100077
> c2fc27b0 1247000 1268000 100077
> c2fc2650 4001c000 4001d000 100077
> c1e9af98 40038000 40055000 8000875 /lib/ld-linux.so.3
> c2fc20d0 4005c000 4005d000 8100875 /lib/ld-linux.so.3
> c2fc2758 4005d000 4005e000 8100877 /lib/ld-linux.so.3
> ...
>
>
> crash> vtop 8000
> VIRTUAL PHYSICAL
> 8000 8000
>
> PAGE DIRECTORY: c1de4000
> PGD: c1de4000 => 412
> PMD: c1de4000 => 412
> PAGE: 0 (1MB)
>
>
> VMA START END FLAGS FILE
> c1e9ae90 8000 c2000 8001875 /bin/bash
>
> crash> vtop 4005d000
> VIRTUAL PHYSICAL
> 4005d000 4005d000
>
> PAGE DIRECTORY: c1de4000
> PGD: c1de5000 => 40000412
> PMD: c1de5000 => 40000412
> PAGE: 40000000 (1MB)
>
>
> VMA START END FLAGS FILE
> c2fc2758 4005d000 4005e000 8100877 /lib/ld-linux.so.3
This is actually a known issue on ARM (just remembered that). When the crash
happens, the kernel identity-maps the whole user address space (0x0 to
TASK_SIZE) of the running process. This has been fixed by upstream commit:
commit 2c8951ab0c337cb198236df07ad55f9dd4892c26
Author: Will Deacon <will.deacon@arm.com>
Date: Wed Jun 8 15:53:34 2011 +0100
ARM: idmap: use idmap_pgd when setting up mm for reboot
For soft-rebooting a system, it is necessary to map the MMU-off code
with an identity mapping so that execution can continue safely once the
MMU has been switched off.
Currently, switch_mm_for_reboot takes out a 1:1 mapping from 0x0 to
TASK_SIZE during reboot in the hope that the reset code lives at a
physical address corresponding to a userspace virtual address.
This patch modifies the code so that we switch to the idmap_pgd tables,
which contain a 1:1 mapping of the cpu_reset code. This has the
advantage of only remapping the code that we need and also means we
don't need to worry about allocating a pgd from an atomic context in the
case that the physical address of the cpu_reset code aliases with the
virtual space used by the kernel.
It went in for 3.2 and Luc's kernel is v3.1.1, which explains this.
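The identity mapping is visible in the bash vtop output: both PGD values
(412 and 40000412) decode as 1MB section entries whose physical base equals
the virtual base. A rough Python sketch of that decoding follows. It assumes
the classic ARM short-descriptor (non-LPAE) first-level format, ignores
supersections, and uses made-up helper names. It is not how crash implements
vtop, only a way to check the arithmetic.

```python
SECTION_SHIFT = 20            # first-level entries each cover 1MB
SECTION_MASK = 0xFFF00000     # section base address bits [31:20]
DESC_TYPE_MASK = 0x3          # descriptor type lives in bits [1:0]
DESC_TYPE_SECTION = 0x2       # 0b10 = section (supersections ignored here)


def first_level_entry_addr(pgd_base, vaddr):
    """Address of the 4-byte first-level entry that covers vaddr."""
    return pgd_base + (vaddr >> SECTION_SHIFT) * 4


def decode_section(entry, vaddr):
    """Translate vaddr through a section entry, or return None
    if the entry is not a section descriptor."""
    if (entry & DESC_TYPE_MASK) != DESC_TYPE_SECTION:
        return None
    offset = vaddr & ~SECTION_MASK & 0xFFFFFFFF
    return (entry & SECTION_MASK) | offset


def is_identity_section(entry, vaddr):
    """True if the entry maps vaddr 1:1 via a section."""
    return decode_section(entry, vaddr) == vaddr


# Values taken from the vtop output in this thread.
bash_low = decode_section(0x412, 0x8000)
bash_lib = decode_section(0x40000412, 0x4005d000)
crond = decode_section(0xc2b3d831, 0x8000)   # page-table pointer, not a section
```

With the bash values this gives 0x8000 and 0x4005d000, matching the bogus
"physical" addresses that vtop printed, while the crond entry is not a
section at all. A check along the lines of is_identity_section() over
user-space addresses is one possible way to notice that the pgd has been
replaced, though that is only a heuristic.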
If you select any other task, vtop should work fine. For example, the cron daemon:
crash> vm
PID: 316 TASK: c2a7c160 CPU: 0 COMMAND: "crond"
MM PGD RSS TOTAL_VM
c30cd060 c0a70000 836k 2916k
VMA START END FLAGS FILE
c1cdd860 8000 15000 8001875 /usr/sbin/crond
c1cddcd8 1c000 1d000 8101875 /usr/sbin/crond
c1d7d758 1d000 1e000 8101877 /usr/sbin/crond
c1cddd88 1e000 9e000 100077
c1d7d5a0 9a4000 9c5000 100077
...
crash> vtop 8000
VIRTUAL PHYSICAL
8000 c1030000
PAGE DIRECTORY: c0a70000
PGD: c0a70000 => c2b3d831
PMD: c0a70000 => c2b3d831
PTE: c2b3d020 => c103018f
PAGE: c1030000
PTE PHYSICAL FLAGS
c103018f c1030000 (PRESENT|YOUNG|EXEC)
VMA START END FLAGS FILE
c1cdd860 8000 15000 8001875 /usr/sbin/crond
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
c047d600 c1030000 c09b1590 0 2 228
OK good, that explains that...
Is it something that can be worked around, or is the original pgd
lost forever? If it is not recoverable, then maybe the user-space
vtop should recognize that the bait-and-switch has occurred and fail?
Your call...
Thanks,
Dave