----- Original Message -----
Modified patch attached. It is rebased to latest crash version.
The arguments are in the form of ordered pair as you had mentioned. I
have tested it with arm and armv8 ramdumps.
Do we really need dump_ramdump_def ? As the dump is converted to
kdump and we use the kdump flag in pc->flags, help -D and help -n
works fine using kdump dump functions. Did I miss something ?
I will send you the link to arm64 ramdump in another email.
Thanks,
Vinayak
I tested your latest patch on the sample ARM and ARM64 RAM dumps
you sent me.
As far as the patch itself is concerned, I ran into a problem
where if crash is invoked in a directory where it does not have
write permission, the session hangs trying to write to a bad file
descriptor -- because of this:
fd2 = open(out_elf, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
if (!fd2) {
error(INFO, "%s open error\n", out_elf);
goto end1;
}
It should be "if (fd2 < 0)".
But more to the point, in my earlier response, I had suggested this:
With respect to the [-o output_file], and given the potential
simplicity of the argument string, I think it should be
optional. You could do something like this in the getopt()
handler, and have the ELF output_file name pre-stored in the
ramdump_def structure:
+ case 'o':
+ ramdump_elf_output_file(optarg);
+ break;
If "-o output_file" is NOT used, then ramdump_to_elf() can
pass back the name of a temporary file.
I should have been more clear with respect to "a temporary file".
What I was suggesting was that you do something like using
mkstemp(3) to create a temporary file in /var/tmp, and then
unlink() it immediately so it would only exist until the crash
session ends.
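To make that concrete, here is a rough sketch of the mkstemp(3)/unlink(2)
sequence. The helper name ramdump_tmpfile() and the file name template are
made up for illustration; they are not in the patch, and the real code
would need to fit into however ramdump_to_elf() hands the file back:

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): create an anonymous
 * temporary file for the ELF output.  Returns an open read/write
 * file descriptor, or -1 on error.
 */
int ramdump_tmpfile(void)
{
	/* mkstemp() rewrites the trailing XXXXXX in place, so the
	 * template must be a writable array, not a string literal. */
	char path[] = "/var/tmp/ramdump_elf_XXXXXX";
	int fd;

	fd = mkstemp(path);
	if (fd < 0)		/* -1 on failure; 0 is a valid descriptor */
		return -1;

	/*
	 * Remove the name right away.  The file contents stay reachable
	 * through fd until it is closed or the process exits.
	 */
	if (unlink(path) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

Since the name is gone after the unlink(), the caller would have to keep
using the returned descriptor rather than reopening the file by name.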
I tested this patch on your sample ARM and ARM64 RAM dumps.
The 32-bit ARM dumpfile can be analyzed OK, but as you noted,
the ARM64 dump requires "--cpus 4" to come up OK, which really
should not be required.
Investigating the reason for the "--cpus 4" requirement, it's
helpful to compare the two dumps. With your sample 32-bit ARM
dumpfile, although it comes up OK with 4 cpus, note that only
cpu 1 is marked online in the kernel:
crash> help -k
...
cpu_possible_map: 0 1 2 3
cpu_present_map: 0 1 2 3
cpu_online_map: 1
cpu_active_map: 0 1 2 3
...
The 32-bit arm.c arm_get_smp_cpus() function calculates the
number of cpus like this:
return MAX(get_cpus_active(), get_cpus_online());
so it returns 4 since the "active" map shows all 4 cpus.
The "ps" command shows tasks associated with all 4 cpus, and the
runqueues look like this, where cpus 0 and 2 have their idle
task running, and cpus 1 and 3 have user-mode tasks running:
crash> runq
CPU 0 RUNQUEUE: c0f286c0
CURRENT: PID: 0 TASK: c0a5d8b0 COMMAND: "swapper/0"
RT PRIO_ARRAY: c0f287a0
[no tasks queued]
CFS RB_ROOT: c0f28730
[no tasks queued]
CPU 1 RUNQUEUE: c0f316c0
CURRENT: PID: 13429 TASK: db944580 COMMAND: "AudioIn_5F8"
RT PRIO_ARRAY: c0f317a0
[no tasks queued]
CFS RB_ROOT: c0f31730
[120] PID: 474 TASK: d9a36ac0 COMMAND: "kworker/1:1"
[120] PID: 2890 TASK: c89b2580 COMMAND: "sh"
CPU 2 RUNQUEUE: c0f3a6c0
CURRENT: PID: 0 TASK: db63a040 COMMAND: "swapper/2"
RT PRIO_ARRAY: c0f3a7a0
[no tasks queued]
CFS RB_ROOT: c0f3a730
[112] PID: 1599 TASK: db87d040 COMMAND: "mm_device_threa"
CPU 3 RUNQUEUE: c0f436c0
CURRENT: PID: 1949 TASK: db951040 COMMAND: "WindowManager"
RT PRIO_ARRAY: c0f437a0
[no tasks queued]
CFS RB_ROOT: c0f43730
[no tasks queued]
crash>
So it does seem that whatever mechanism you use to take the
raw RAM dump on the 32-bit ARM offlines cpus first?
Now, on the ARM64 dumpfile, if I force it to come up with "--cpus 4"
it shows that only cpu 0 is online, present and active:
crash> help -k
...
cpu_possible_map: 0 1 2 3
cpu_present_map: 0
cpu_online_map: 0
cpu_active_map: 0
...
I can understand that perhaps cpus are offlined prior to taking
the RAM dump, but it's strange that the "present" and "active"
maps are also the same as the "online" map?
Currently the arm64.c arm64_get_smp_cpus() returns the number of
cpus like this:
return MAX(get_cpus_online(), get_highest_cpu_online()+1);
so it returns 1. Even if it did the calculation the same way as the
32-bit ARM code, it would still return 1 because of the active map.
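To illustrate the difference between the two calculations, here is a
small standalone sketch that applies both formulas to the cpu maps seen in
these two dumps. The helpers count_cpus() and highest_cpu() are simplified
stand-ins that I am assuming behave like get_cpus_online()/get_cpus_active()
and get_highest_cpu_online(); they work on a plain bitmask instead of
reading the kernel's cpu maps:

```c
#ifndef MAX
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#endif

/* Number of set bits in a cpu mask (stand-in for get_cpus_online() etc.). */
static int count_cpus(unsigned long mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		n++;
	return n;
}

/* Index of the highest set bit, or -1 for an empty mask
 * (stand-in for get_highest_cpu_online()). */
static int highest_cpu(unsigned long mask)
{
	int cpu = -1;
	int i;

	for (i = 0; mask; i++, mask >>= 1)
		if (mask & 1)
			cpu = i;
	return cpu;
}

/* Same formula as arm_get_smp_cpus(): MAX(active, online). */
int arm_style_cpus(unsigned long active, unsigned long online)
{
	return MAX(count_cpus(active), count_cpus(online));
}

/* Same formula as arm64_get_smp_cpus(): MAX(online, highest online + 1). */
int arm64_style_cpus(unsigned long online)
{
	return MAX(count_cpus(online), highest_cpu(online) + 1);
}
```

With the 32-bit ARM maps (active 0xf, online 0x2), arm_style_cpus() gives 4.
With the ARM64 maps (active 0x1, online 0x1), both functions give 1.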
So we have to force it to return 4 with "--cpus 4". But having done
that, oddly enough, the "runq" command shows this, where the "CURRENT"
task on cpu 0 is "0":
crash> runq
CPU 0 RUNQUEUE: ffffffc03ffb6e40
CURRENT: 0
RT PRIO_ARRAY: ffffffc03ffb6fb0
[no tasks queued]
CFS RB_ROOT: ffffffc03ffb6f10
[no tasks queued]
CPU 1 RUNQUEUE: ffffffc03ffc1e40
CURRENT: PID: 0 TASK: ffffffc03ecb4b00 COMMAND: "swapper/1"
RT PRIO_ARRAY: ffffffc03ffc1fb0
[no tasks queued]
CFS RB_ROOT: ffffffc03ffc1f10
[no tasks queued]
CPU 2 RUNQUEUE: ffffffc03ffcce40
CURRENT: PID: 0 TASK: ffffffc03ecb5dc0 COMMAND: "swapper/2"
RT PRIO_ARRAY: ffffffc03ffccfb0
[no tasks queued]
CFS RB_ROOT: ffffffc03ffccf10
[no tasks queued]
CPU 3 RUNQUEUE: ffffffc03ffd7e40
CURRENT: PID: 0 TASK: ffffffc03ecf0000 COMMAND: "swapper/3"
RT PRIO_ARRAY: ffffffc03ffd7fb0
[no tasks queued]
CFS RB_ROOT: ffffffc03ffd7f10
[no tasks queued]
crash>
I have never seen this before. As I understand it, if no other
task is queued and run on a cpu, then it defaults to the idle/swapper
task for that cpu, whose address is hard-wired in the per-cpu runqueue
structure. But if I look at the rq structure for cpu 0, not only is
the "curr" task pointer NULL, the "idle" task pointer is also NULL:
crash> rq.curr,idle,cpu ffffffc03ffb6e40
curr = 0x0
idle = 0x0
cpu = 0
crash>
whereas the other 3 cpus show that they are running their idle tasks:
crash> rq.curr,idle,cpu ffffffc03ffc1e40
curr = 0xffffffc03ecb4b00
idle = 0xffffffc03ecb4b00
cpu = 1
crash> rq.curr,idle,cpu ffffffc03ffcce40
curr = 0xffffffc03ecb5dc0
idle = 0xffffffc03ecb5dc0
cpu = 2
crash> rq.curr,idle,cpu ffffffc03ffd7e40
curr = 0xffffffc03ecf0000
idle = 0xffffffc03ecf0000
cpu = 3
crash>
Perhaps it has something to do with *when* you took the dump.
The "sys" command shows an UPTIME of 00:00:00:
crash> sys
KERNEL: /home/anderson/Downloads/tmp_ARM64/vmlinux
DUMPFILE: ramdump_elf
CPUS: 4
DATE: Wed Dec 31 19:00:00 1969
UPTIME: 00:00:00
LOAD AVERAGE: 0.00, 0.00, 0.00
TASKS: 34
NODENAME: (none)
RELEASE: 3.10.33+
VERSION: #22 SMP PREEMPT Tue May 6 16:23:34 IST 2014
MACHINE: aarch64 (unknown Mhz)
MEMORY: 1 GB
PANIC: ""
crash>
And the "ps" command doesn't show any user-space tasks running,
not even "init" PID 1, and the funky idle/swapper task on cpu 0
shows a PID of 1:
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 -1 1 ffffffc03ecb4b00 RU 0.0 0 0 [swapper/1]
0 -1 2 ffffffc03ecb5dc0 RU 0.0 0 0 [swapper/2]
0 -1 3 ffffffc03ecf0000 RU 0.0 0 0 [swapper/3]
1 -1 0 ffffffc03ec78000 UN 0.0 0 0 [swapper/0]
2 -1 0 ffffffc03ec792c0 IN 0.0 0 0 [kthreadd]
3 2 0 ffffffc03ec7a580 IN 0.0 0 0 [ksoftirqd/0]
4 2 0 ffffffc03ec7b840 IN 0.0 0 0 [kworker/0:0]
5 2 0 ffffffc03ec7cb00 IN 0.0 0 0 [kworker/0:0H]
6 2 0 ffffffc03ec7ddc0 IN 0.0 0 0 [kworker/u8:0]
7 2 0 ffffffc03ecb0000 IN 0.0 0 0 [migration/0]
8 2 0 ffffffc03ecb12c0 IN 0.0 0 0 [rcu_preempt]
9 2 0 ffffffc03ecb2580 IN 0.0 0 0 [rcu_bh]
10 2 0 ffffffc03ecb3840 IN 0.0 0 0 [rcu_sched]
11 2 1 ffffffc03ecf12c0 ?? 0.0 0 0 [migration/1]
12 2 1 ffffffc03ecf2580 ?? 0.0 0 0 [ksoftirqd/1]
13 2 1 ffffffc03ecf3840 IN 0.0 0 0 [kworker/1:0]
14 2 1 ffffffc03ecf4b00 IN 0.0 0 0 [kworker/1:0H]
15 2 2 ffffffc03ecf5dc0 ?? 0.0 0 0 [migration/2]
16 2 2 ffffffc03ed20000 IN 0.0 0 0 [ksoftirqd/2]
17 2 0 ffffffc03ed212c0 UN 0.0 0 0 [kworker/2:0]
18 2 2 ffffffc03ed22580 IN 0.0 0 0 [kworker/2:0H]
19 2 3 ffffffc03ed23840 IN 0.0 0 0 [migration/3]
20 2 3 ffffffc03ed24b00 IN 0.0 0 0 [ksoftirqd/3]
21 2 3 ffffffc03ed25dc0 IN 0.0 0 0 [kworker/3:0]
22 2 3 ffffffc03ed40000 IN 0.0 0 0 [kworker/3:0H]
23 2 0 ffffffc03ed412c0 IN 0.0 0 0 [khelper]
24 2 0 ffffffc03ed42580 IN 0.0 0 0 [kdevtmpfs]
25 2 0 ffffffc03ed43840 IN 0.0 0 0 [kworker/u8:1]
56 2 0 ffffffc03ededdc0 IN 0.0 0 0 [bcm_ipc_ch0]
57 2 0 ffffffc03edecb00 IN 0.0 0 0 [bcm_ipc_ch11]
180 2 0 ffffffc03ee5ddc0 IN 0.0 0 0 [writeback]
182 2 0 ffffffc03ee912c0 IN 0.0 0 0 [bioset]
184 2 0 ffffffc03ede92c0 IN 0.0 0 0 [kworker/u9:0]
185 2 0 ffffffc03ede8000 IN 0.0 0 0 [kblockd]
crash>
So I'm guessing that this dumpfile was taken before the "init" task was
even created, and the kernel data structures were not fully initialized?
Maybe you can try taking a RAM dump on an ARM64 machine after
it is up and running?
Thanks,
Dave