adding back a few CC's because this discussion is useful
On 11/12/14 19:43, Petr Tesarik wrote:
On Wed, 12 Nov 2014 15:50:32 +0100
Laszlo Ersek <lersek(a)redhat.com> wrote:
> On 11/12/14 09:04, Petr Tesarik wrote:
>> On Wed, 12 Nov 2014 12:08:38 +0900 (JST)
>> HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
>
>>> Anyway, phys_base is kernel information. To make it available for qemu
>>> side, there's need to prepare a mechanism for qemu to have any access
>>> to it.
>>
>> Yes. I wonder if you can have access without some sort of co-operation
>> from the guest kernel itself. I guess not.
>
> Propagating any kind of additional information from the guest kernel
> (which is unprivileged and potentially malicious) to the host-side qemu
> process (which is by definition more privileged, although still confined
> by various measures) is something we'd explicitly like to avoid.
>
> Think of it like this. I throw a physical box at you, running Linux,
> that has frozen in time. Can "crash" work with nothing else but the
> contents of the memory, and information about the CPUs?
If only you could save the _complete_ state of the CPU... For example
the content of CR3 would be quite useful.
(1) CR3 is already saved, in both the ELF and the kdump compressed formats.
- ELF case:
  qmp_dump_guest_memory() [dump.c]
    create_vmcore()
      dump_begin()
        write_elf64_notes()
          loop from 1 to #vcpu:
            cpu_write_elf64_note() [qom/cpu.c]
              x86_64_write_elf64_note() [target-i386/arch_dump.c]
                writes "CORE"
          loop from 1 to #vcpu:
            cpu_write_elf64_qemunote() [qom/cpu.c]
              x86_cpu_write_elf64_qemunote() [target-i386/arch_dump.c]
                cpu_write_qemu_note()
                  qemu_get_cpustate()
                    s->cr[3] = env->cr[3]; <---------- here
                  writes "QEMU"
Hence, the information is part of the QEMU note.
- kdump case:
  qmp_dump_guest_memory() [dump.c]
    create_kdump_vmcore()
      write_dump_header()
        create_header64()
          write_elf64_notes()
            [... same as above ...]
The trick here is that the note-writer functions use a callback function
for actually outputting the data. So while in the ELF case the stuff
goes directly to a file, in the kdump case the notes are first saved in
a memory buffer, and then later saved in the file at offset
KdumpSubHeader64.offset_note. (... Which is then represented in the
flattened file format of course.)
So, the information is there in both cases.
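To illustrate the callback idea: the sketch below is not the actual qemu code, and every name in it (note_sink_fn, sink_to_file, sink_to_buf, write_demo_note) is made up for the example. It only shows how a note writer that is handed an output callback can serve both the "straight to file" and the "stage in memory first" case without changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink signature: returns 0 on success, -1 on failure. */
typedef int (*note_sink_fn)(const void *buf, size_t size, void *opaque);

/* ELF-style sink: the bytes go straight to an open file. */
static int sink_to_file(const void *buf, size_t size, void *opaque)
{
    FILE *fp = opaque;
    return fwrite(buf, 1, size, fp) == size ? 0 : -1;
}

/* kdump-style sink: the bytes accumulate in memory for a later write. */
struct note_buf {
    unsigned char *data;
    size_t used;
    size_t cap;
};

static int sink_to_buf(const void *buf, size_t size, void *opaque)
{
    struct note_buf *nb = opaque;

    assert(nb->used <= nb->cap);
    if (size > nb->cap - nb->used) {
        return -1;                  /* would overflow the staging buffer */
    }
    memcpy(nb->data + nb->used, buf, size);
    nb->used += size;
    return 0;
}

/* Stand-in for a note writer: it never learns where the bytes end up. */
static int write_demo_note(note_sink_fn sink, void *opaque)
{
    static const char payload[] = "QEMU";   /* 5 bytes including the NUL */
    return sink(payload, sizeof(payload), opaque);
}
```

In the staged case, the caller would later write nb.data (nb.used bytes) at whatever file offset the header records.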
(2) Dave -- this just made me realize that the QEMU note is *already*
there in the kdump file as well; pointed-to by
KdumpSubHeader64.offset_note, for a length of KdumpSubHeader64.note_size.
From your other email
<http://thread.gmane.org/gmane.linux.kernel.kexec/12787/focus=12797>:
sub_header_kdump: 1c9cff0
phys_base: 0
dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
split: 0
start_pfn: (unused)
end_pfn: (unused)
offset_vmcoreinfo: 0 (0x0)
size_vmcoreinfo: 0 (0x0)
offset_note: 4200 (0x1068) <----------- here
size_note: 3232 (0xca0) <-----------
num_prstatus_notes: 4
notes_buf: 1c9e000
notes[0]: 1c9e000
notes[1]: 1c9e164
notes[2]: 1c9e2c8
notes[3]: 1c9e42c
NT_PRSTATUS_offset: 1068
                    11cc
                    1330
                    1494
offset_eraseinfo: 0 (0x0)
size_eraseinfo: 0 (0x0)
start_pfn_64: (unused)
end_pfn_64: (unused)
max_mapnr_64: 1245184 (0x130000)
Can you fetch that in "crash"? If you can, then there's nothing to do on
the qemu side (and I'll have to apologize for spamming a bunch of lists :/).
I think "crash" already iterates over all of the notes in the note
buffer, but skips every note whose type is not NT_PRSTATUS.
(3) Regarding the structure of the notes, we have to consider the
placement of the notes and their internal structure. The placement is
different between the ELF and the KDUMP file format. The internal
structure of the notes is identical between the two file formats.
For example, for a 4 VCPU guest, you end up with note names like
CORE
CORE
CORE
CORE
QEMU
QEMU
QEMU
QEMU
All of these are Elf64_Nhdr structures. The CORE ones have type
NT_PRSTATUS, and the QEMU ones have type 0.
(3a) The placement in the ELF file is already handled by "crash". Each
note "simply" gets its own ELF note segment/section.
(3b) In the kdump file, the Elf64_Nhdr structures (8 pieces in total, in
the above example -- 4x CORE, 4x QEMU) are concatenated in that order,
and finally stored at "offset_note".
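For what it's worth, walking such a concatenated note buffer and picking out a note by name could look roughly like the sketch below. It assumes the usual ELF note layout (name and descriptor each padded to 4 bytes) and that the host byte order matches the dump. find_note() and struct note_hdr are names invented for this example; they are not "crash" or qemu functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf64_Nhdr from <elf.h>; redefined here for portability. */
struct note_hdr {
    uint32_t n_namesz;   /* length of the name, including the trailing NUL */
    uint32_t n_descsz;   /* length of the descriptor payload */
    uint32_t n_type;
};

/* Name and descriptor are each padded to a 4-byte boundary. */
static uint64_t pad4(uint64_t n)
{
    return (n + 3) & ~(uint64_t)3;
}

/*
 * Find the idx-th note (0-based) named `name` in buf[0..len).
 * On success, returns a pointer to the descriptor and stores its size in
 * *descsz. Returns NULL if there is no such note or the buffer is truncated.
 */
static const unsigned char *find_note(const unsigned char *buf, size_t len,
                                      const char *name, unsigned idx,
                                      uint32_t *descsz)
{
    size_t off = 0;
    size_t want = strlen(name) + 1;

    while (len - off >= sizeof(struct note_hdr)) {
        struct note_hdr h;
        uint64_t name_len, desc_len, left;

        memcpy(&h, buf + off, sizeof(h));   /* avoid unaligned access */
        off += sizeof(h);

        name_len = pad4(h.n_namesz);
        desc_len = pad4(h.n_descsz);
        left = len - off;
        if (name_len > left || desc_len > left - name_len) {
            return NULL;                    /* truncated note */
        }
        if (h.n_namesz == want && memcmp(buf + off, name, want) == 0) {
            if (idx == 0) {
                *descsz = h.n_descsz;
                return buf + off + (size_t)name_len;
            }
            idx--;
        }
        off += (size_t)(name_len + desc_len);
    }
    return NULL;
}
```

With the 4 VCPU example, find_note(buf, size_note, "QEMU", 2, &sz) would be expected to return the descriptor of the third QEMU note, provided buf holds the size_note bytes read from offset_note.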
(3c) Regarding the internal structure of the notes. The CORE ones are
already known and handled. The QEMU notes have the following structure:
Elf64_Nhdr:
n_namesz: 5 ("QEMU")
n_descsz: 432
n_type: 0 (?)
000001b000000001 0000000000000000
|------||------| |--------------|
size    version  rax
0000000000000000 0000000000000000
|--------------| |--------------|
rbx              rcx
0000000000000000 0000000000000001
|--------------| |--------------|
rdx              rsi
ffffffff81dd5228 ffffffff81a01ec8
|--------------| |--------------|
rdi              rsp
ffffffff81a01ec8 0000000000000000
|--------------| |--------------|
rbp              r8
0000000000000000 00000013911d5f29
|--------------| |--------------|
r9               r10
0000000000000000 ffffffff81c00480
|--------------| |--------------|
r11              r12
0000000000000000 ffffffffffffffff
|--------------| |--------------|
r13              r14
000000000309f000 ffffffff810375ab
|--------------| |--------------|
r15              rip
0000000000000246 ffffffff00000010
|--------------| |------||------|
rflags           cs/lim  cs/sel
0000000000a09b00 0000000000000000
|------||------| |--------------|
cs/pad  cs/flags cs/base
ffffffff00000018 0000000000c09300
|------||------| |------||------|
ds/lim  ds/sel   ds/pad  ds/flags
0000000000000000 ffffffff00000018
|--------------| |------||------|
ds/base          es/lim  es/sel
0000000000c09300 0000000000000000
|------||------| |--------------|
es/pad  es/flags es/base
ffffffff00000000 0000000000000000
|------||------| |------||------|
fs/lim  fs/sel   fs/pad  fs/flags
0000000000000000 ffffffff00000000
|--------------| |------||------|
fs/base          gs/lim  gs/sel
0000000000000000 ffff880003200000
|------||------| |--------------|
gs/pad  gs/flags gs/base
ffffffff00000018 0000000000c09300
|------||------| |------||------|
ss/lim  ss/sel   ss/pad  ss/flags
0000000000000000 ffffffff00000000
|--------------| |------||------|
ss/base          ldt...
0000000000000000 0000000000000000
|------||------| |--------------|
                           ...ldt
0000208700000040 0000000000008b00
|------||------| |------||------|
tr...
ffff880003213b40 0000007f00000000
|--------------| |------||------|
           ...tr gdt...
0000000000000000 ffff880003204000
|------||------| |--------------|
                           ...gdt
00000fff00000000 0000000000000000
|------||------| |------||------|
idt...
ffffffff81dd2000 000000008005003b
|--------------| |--------------|
          ...idt cr0
0000000000000000 0000000001b2e000
|--------------| |--------------|
cr1              cr2
0000000007b18000 00000000000006f0
|--------------| |--------------|
cr3              cr4
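A note on reading the dump above: each column is a 64-bit little-endian word, so where two 32-bit fields share a word, the field that comes first in memory is the low half and is printed on the right. A tiny sketch (lo32() and hi32() are helper names made up here) that splits such a word:

```c
#include <assert.h>
#include <stdint.h>

/* Lower-addressed 32-bit field of a little-endian 64-bit dump word. */
static uint32_t lo32(uint64_t q)
{
    return (uint32_t)(q & 0xffffffffu);
}

/* Higher-addressed 32-bit field of the same word. */
static uint32_t hi32(uint64_t q)
{
    return (uint32_t)(q >> 32);
}
```

For the first word, 0x000001b000000001, lo32() gives the version (1) and hi32() gives the size (0x1b0, that is 432, matching n_descsz).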
From "target-i386/arch_dump.c":
struct QEMUCPUSegment {
    uint32_t selector;
    uint32_t limit;
    uint32_t flags;
    uint32_t pad;
    uint64_t base;
};
typedef struct QEMUCPUSegment QEMUCPUSegment;

struct QEMUCPUState {
    uint32_t version;
    uint32_t size;
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rip, rflags;
    QEMUCPUSegment cs, ds, es, fs, gs, ss;
    QEMUCPUSegment ldt, tr, gdt, idt;
    uint64_t cr[5];
};
typedef struct QEMUCPUState QEMUCPUState;
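As a sanity check of the layout, and as a rough idea of how a consumer might pull CR3 out of a QEMU note descriptor, here is a sketch. qemu_note_cr3() is a made-up helper, not something that exists in qemu or "crash". It assumes a little-endian host that matches the dump, and it trusts only the "size" field; what other "version" values may mean is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct QEMUCPUSegment {
    uint32_t selector;
    uint32_t limit;
    uint32_t flags;
    uint32_t pad;
    uint64_t base;
};

struct QEMUCPUState {
    uint32_t version;
    uint32_t size;
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rip, rflags;
    struct QEMUCPUSegment cs, ds, es, fs, gs, ss;
    struct QEMUCPUSegment ldt, tr, gdt, idt;
    uint64_t cr[5];
};

/* 2*4 + 18*8 + 10*24 + 5*8 = 432, the n_descsz seen in the dump. */
static_assert(sizeof(struct QEMUCPUState) == 432, "unexpected padding");
/* cr[] starts at 8 + 144 + 240 = 392, so cr[3] sits at byte 416. */
static_assert(offsetof(struct QEMUCPUState, cr) == 392, "unexpected layout");

/*
 * Hypothetical helper: returns 0 and stores CR3 in *cr3 if the descriptor
 * is large enough and its size field matches; returns -1 otherwise.
 */
static int qemu_note_cr3(const void *desc, size_t desc_len, uint64_t *cr3)
{
    struct QEMUCPUState st;

    if (desc_len < sizeof(st)) {
        return -1;
    }
    memcpy(&st, desc, sizeof(st));      /* the descriptor may be unaligned */
    if (st.size != sizeof(st)) {
        return -1;
    }
    *cr3 = st.cr[3];
    return 0;
}
```

Fed the descriptor shown in (3c), this would be expected to yield 0x7b18000.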
Summary: I think the info is all there.
Thanks
Laszlo