----- Original Message -----
Adding back a few CCs, because this discussion is useful.
On 11/12/14 19:43, Petr Tesarik wrote:
> On Wed, 12 Nov 2014 15:50:32 +0100
> Laszlo Ersek <lersek(a)redhat.com> wrote:
>
>> On 11/12/14 09:04, Petr Tesarik wrote:
>>> On Wed, 12 Nov 2014 12:08:38 +0900 (JST)
>>> HATAYAMA Daisuke <d.hatayama(a)jp.fujitsu.com> wrote:
>>
>>>> Anyway, phys_base is kernel information. To make it available on the
>>>> qemu side, we need to prepare a mechanism that gives qemu access to it.
>>>
>>> Yes. I wonder if you can have access without some sort of co-operation
>>> from the guest kernel itself. I guess not.
>>
>> Propagating any kind of additional information from the guest kernel
>> (which is unprivileged and potentially malicious) to the host-side qemu
>> process (which is by definition more privileged, although still confined
>> by various measures) is something we'd explicitly like to avoid.
>>
>> Think of it like this. I throw a physical box at you, running Linux,
>> that has frozen in time. Can "crash" work with nothing else but the
>> contents of the memory, and information about the CPUs?
>
> If only you could save the _complete_ state of the CPU... For example
> the content of CR3 would be quite useful.
(1) CR3 is already saved, in both the ELF and the kdump compressed formats.
- ELF case:
    qmp_dump_guest_memory()                       [dump.c]
      create_vmcore()
        dump_begin()
          write_elf64_notes()
            loop from 1 to #vcpu:
              cpu_write_elf64_note()              [qom/cpu.c]
                x86_64_write_elf64_note()         [target-i386/arch_dump.c]
                  writes "CORE"
            loop from 1 to #vcpu:
              cpu_write_elf64_qemunote()          [qom/cpu.c]
                x86_cpu_write_elf64_qemunote()    [target-i386/arch_dump.c]
                  cpu_write_qemu_note()
                    qemu_get_cpustate()
                      s->cr[3] = env->cr[3];      <---------- here
                    writes "QEMU"
Hence, the information is part of the QEMU note.
- kdump case:
    qmp_dump_guest_memory()                       [dump.c]
      create_kdump_vmcore()
        write_dump_header()
          create_header64()
            write_elf64_notes()
              [... same as above ...]
The trick here is that the note-writer functions use a callback function
for actually outputting the data. So while in the ELF case the data
goes directly to the file, in the kdump case the notes are first saved in
a memory buffer, and only later written to the file at offset
KdumpSubHeader64.offset_note. (... Which is then, of course, represented
in the flattened file format as well.)
So, the information is there in both cases.
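To illustrate the callback idea, here is a minimal C sketch. The names `note_write_fn`, `write_note`, `append_to_file`, `append_to_buffer` and `struct buf_sink` are hypothetical and only mimic the pattern described above; they are not QEMU's actual types or functions. The note writer emits a header, a padded name and a padded descriptor (assuming 4-byte padding), and only the caller decides whether the bytes go straight into a file (ELF case) or into a memory buffer first (kdump case).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output callback: returns 0 on success, -1 on failure. */
typedef int (*note_write_fn)(const void *data, size_t len, void *opaque);

/* Same layout as Elf64_Nhdr: three 32-bit words. */
struct nhdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};
static_assert(sizeof(struct nhdr) == 12, "note header must be 12 bytes");

/* ELF-style sink: stream the bytes straight into a file. */
static int append_to_file(const void *data, size_t len, void *opaque)
{
    FILE *f = opaque;
    return fwrite(data, 1, len, f) == len ? 0 : -1;
}

/* kdump-style sink: collect the bytes in memory, to be written out later. */
struct buf_sink {
    uint8_t buf[4096];
    size_t used;
};

static int append_to_buffer(const void *data, size_t len, void *opaque)
{
    struct buf_sink *s = opaque;
    if (len > sizeof(s->buf) - s->used) {
        return -1;
    }
    memcpy(s->buf + s->used, data, len);
    s->used += len;
    return 0;
}

/* Emit header, NUL-terminated name and descriptor, padding each to 4 bytes.
 * The writer never knows whether "out" targets a file or a buffer. */
static int write_note(note_write_fn out, void *opaque, const char *name,
                      uint32_t type, const void *desc, uint32_t descsz)
{
    static const uint8_t zeros[4];
    struct nhdr h = { (uint32_t)strlen(name) + 1, descsz, type };
    size_t name_pad = (4 - h.n_namesz % 4) % 4;
    size_t desc_pad = (4 - descsz % 4) % 4;

    if (out(&h, sizeof h, opaque) != 0 ||
        out(name, h.n_namesz, opaque) != 0 ||
        out(zeros, name_pad, opaque) != 0 ||
        out(desc, descsz, opaque) != 0 ||
        out(zeros, desc_pad, opaque) != 0) {
        return -1;
    }
    return 0;
}
```

For example, a "QEMU" note with a 6-byte descriptor comes out as 28 bytes through either sink: 12 bytes of header, 8 for the padded name and 8 for the padded descriptor.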
(2) Dave -- this just made me realize that the QEMU note is *already*
there in the kdump file as well; pointed-to by
KdumpSubHeader64.offset_note, for a length of KdumpSubHeader64.note_size.
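Fetching that area needs nothing more than the two sub-header fields. A rough C sketch, assuming a regular (non-flattened) kdump-compressed file and assuming offset_note and note_size have already been read from KdumpSubHeader64; `read_note_area()` is a hypothetical helper, not a crash or makedumpfile function.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read the raw note area of a regular (non-flattened)
 * kdump-compressed file. offset_note and note_size are assumed to come from
 * KdumpSubHeader64 already; header parsing is not shown.
 * Returns a malloc()ed buffer that the caller frees, or NULL on error.
 */
static uint8_t *read_note_area(FILE *f, uint64_t offset_note, uint64_t note_size)
{
    uint8_t *buf;

    assert(f != NULL);
    /* Plain fseek() takes a long, so refuse offsets it cannot express. */
    if (note_size == 0 || note_size > SIZE_MAX || offset_note > LONG_MAX) {
        return NULL;
    }
    if (fseek(f, (long)offset_note, SEEK_SET) != 0) {
        return NULL;
    }
    buf = malloc((size_t)note_size);
    if (buf == NULL) {
        return NULL;
    }
    if (fread(buf, 1, (size_t)note_size, f) != note_size) {
        free(buf);  /* truncated file: the note area is incomplete */
        return NULL;
    }
    return buf;
}
```

With the values from the header dump quoted below, that would be read_note_area(f, 0x1068, 0xca0).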
From your other email
<http://thread.gmane.org/gmane.linux.kernel.kexec/12787/focus=12797>:
> sub_header_kdump: 1c9cff0
> phys_base: 0
> dump_level: 1 (0x1) (DUMP_EXCLUDE_ZERO)
> split: 0
> start_pfn: (unused)
> end_pfn: (unused)
> offset_vmcoreinfo: 0 (0x0)
> size_vmcoreinfo: 0 (0x0)
> offset_note: 4200 (0x1068) <----------- here
> size_note: 3232 (0xca0) <-----------
> num_prstatus_notes: 4
> notes_buf: 1c9e000
> notes[0]: 1c9e000
> notes[1]: 1c9e164
> notes[2]: 1c9e2c8
> notes[3]: 1c9e42c
> NT_PRSTATUS_offset: 1068
>                     11cc
>                     1330
>                     1494
> offset_eraseinfo: 0 (0x0)
> size_eraseinfo: 0 (0x0)
> start_pfn_64: (unused)
> end_pfn_64: (unused)
> max_mapnr_64: 1245184 (0x130000)
Can you fetch that in "crash"? If you can, then there's nothing to do on
the qemu side (and I'll have to apologize for spamming a bunch of lists :/).
Sure enough...
I was just playing with process_elf64_notes() to check/read the note name strings,
and noticed that I can certainly see them. But as you noted, only the NT_PRSTATUS
notes are stored in the "notes[]" array, so I was under the impression that the
QEMU notes were completely missing.
That being the case -- we're pretty much done!
I'll put a patch in the next upstream release of crash.
Thanks,
Dave
I think "crash" already iterates over all of the notes in the note
buffer, but skips everything other than NT_PRSTATUS.
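Such a loop could keep the QEMU notes instead of skipping them. The following C sketch is not crash's actual code: `walk_notes()`, `count_notes()` and `struct note_counts` are hypothetical, the buffer is assumed to be in the host's byte order, and the name and descriptor are assumed to be padded to 4-byte boundaries, as in Linux-style core notes. The example callback merely counts the two kinds of notes; a real consumer would stash the QEMU descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOTE_NT_PRSTATUS 1u  /* value of NT_PRSTATUS in <elf.h> */

/* Same layout as Elf64_Nhdr: three 32-bit words. */
struct nhdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};
static_assert(sizeof(struct nhdr) == 12, "note header must be 12 bytes");

/* Hypothetical per-note callback. */
typedef void (*note_cb)(const char *name, uint32_t type,
                        const uint8_t *desc, uint32_t descsz, void *opaque);

static size_t align4(size_t x)
{
    return (x + 3) & ~(size_t)3;
}

/* Visit every note in buf[0..size). Returns the number of notes, or -1 if
 * the buffer is truncated or a name is not NUL-terminated. */
static int walk_notes(const uint8_t *buf, size_t size, note_cb cb, void *opaque)
{
    size_t off = 0;
    int count = 0;

    while (size - off >= sizeof(struct nhdr)) {
        struct nhdr h;
        memcpy(&h, buf + off, sizeof h);  /* avoids unaligned access */

        size_t name_off = off + sizeof h;
        size_t desc_off = name_off + align4(h.n_namesz);
        size_t next = desc_off + align4(h.n_descsz);

        if (next > size || h.n_namesz == 0 ||
            buf[name_off + h.n_namesz - 1] != '\0') {
            return -1;
        }
        cb((const char *)(buf + name_off), h.n_type,
           buf + desc_off, h.n_descsz, opaque);
        off = next;
        count++;
    }
    return off == size ? count : -1;
}

/* Example callback: count PRSTATUS and QEMU notes instead of dropping the latter. */
struct note_counts {
    int prstatus;
    int qemu;
};

static void count_notes(const char *name, uint32_t type,
                        const uint8_t *desc, uint32_t descsz, void *opaque)
{
    struct note_counts *c = opaque;

    (void)desc;
    (void)descsz;
    if (strcmp(name, "CORE") == 0 && type == NOTE_NT_PRSTATUS) {
        c->prstatus++;
    } else if (strcmp(name, "QEMU") == 0) {
        c->qemu++;
    }
}
```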
(3) Regarding the notes themselves, we have to consider two things: their
placement and their internal structure. The placement differs between the
ELF and the kdump file formats; the internal structure of the notes is
identical in both.
For example, for a 4 VCPU guest, you end up with note names like
CORE
CORE
CORE
CORE
QEMU
QEMU
QEMU
QEMU
Each of these is an ELF note: an Elf64_Nhdr header followed by the name
and the descriptor. The CORE ones have type NT_PRSTATUS, and the QEMU ones
have type 0.
(3a) The placement in the ELF file is already handled by "crash". Each
note "simply" gets its own ELF note segment/section.
(3b) In the kdump file, the Elf64_Nhdr structures (8 pieces in total, in
the above example -- 4x CORE, 4x QEMU) are concatenated in that order,
and finally stored at "offset_note".
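The numbers in Dave's header dump quoted in (2) are consistent with this layout. A small arithmetic check in C, assuming 4-byte note padding, a 336-byte NT_PRSTATUS descriptor (the size implied by the 0x164-byte spacing of the NT_PRSTATUS offsets) and the 432-byte QEMU descriptor (n_descsz in (3c)); `note_bytes()` and `note_area_size()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Size of an Elf64_Nhdr: n_namesz, n_descsz, n_type (3 x 32 bits). */
#define NHDR_SIZE 12u

static size_t align4(size_t x)
{
    return (x + 3) & ~(size_t)3;
}

/* Bytes one note occupies: header + padded name + padded descriptor. */
static size_t note_bytes(size_t namesz, size_t descsz)
{
    return NHDR_SIZE + align4(namesz) + align4(descsz);
}

/* Size of the kdump note area for ncpus VCPUs: ncpus CORE notes followed by
 * ncpus QEMU notes. Both names are 4 characters plus NUL, so namesz is 5. */
static size_t note_area_size(unsigned ncpus, size_t prstatus_descsz,
                             size_t qemu_descsz)
{
    assert(ncpus > 0);
    return ncpus * (note_bytes(5, prstatus_descsz) + note_bytes(5, qemu_descsz));
}
```

Each CORE note then takes 12 + 8 + 336 = 356 = 0x164 bytes and each QEMU note 12 + 8 + 432 = 452 bytes, so 4 * 356 + 4 * 452 = 3232 = 0xca0, which is exactly size_note. The QEMU notes start at 0x1068 + 4 * 0x164 = 0x15f8, and the last one ends at 0x1d08 = offset_note + size_note.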
(3c) As for the internal structure of the notes: the CORE ones are
already known and handled. The QEMU notes have the following structure:
> Elf64_Nhdr:
> n_namesz: 5 ("QEMU")
> n_descsz: 432
> n_type: 0 (?)
> 000001b000000001 0000000000000000
  |------||------| |--------------|
    size  version        rax
> 0000000000000000 0000000000000000
  |--------------| |--------------|
        rbx              rcx
> 0000000000000000 0000000000000001
  |--------------| |--------------|
        rdx              rsi
> ffffffff81dd5228 ffffffff81a01ec8
  |--------------| |--------------|
        rdi              rsp
> ffffffff81a01ec8 0000000000000000
  |--------------| |--------------|
        rbp               r8
> 0000000000000000 00000013911d5f29
  |--------------| |--------------|
         r9              r10
> 0000000000000000 ffffffff81c00480
  |--------------| |--------------|
        r11              r12
> 0000000000000000 ffffffffffffffff
  |--------------| |--------------|
        r13              r14
> 000000000309f000 ffffffff810375ab
  |--------------| |--------------|
        r15              rip
> 0000000000000246 ffffffff00000010
  |--------------| |------||------|
       rflags       cs/lim  cs/sel
> 0000000000a09b00 0000000000000000
  |------||------| |--------------|
   cs/pad cs/flags     cs/base
> ffffffff00000018 0000000000c09300
  |------||------| |------||------|
   ds/lim  ds/sel   ds/pad ds/flags
> 0000000000000000 ffffffff00000018
  |--------------| |------||------|
      ds/base       es/lim  es/sel
> 0000000000c09300 0000000000000000
  |------||------| |--------------|
   es/pad es/flags     es/base
> ffffffff00000000 0000000000000000
  |------||------| |------||------|
   fs/lim  fs/sel   fs/pad fs/flags
> 0000000000000000 ffffffff00000000
  |--------------| |------||------|
      fs/base       gs/lim  gs/sel
> 0000000000000000 ffff880003200000
  |------||------| |--------------|
   gs/pad gs/flags     gs/base
> ffffffff00000018 0000000000c09300
  |------||------| |------||------|
   ss/lim  ss/sel   ss/pad ss/flags
> 0000000000000000 ffffffff00000000
  |--------------| |------||------|
      ss/base          ldt...
> 0000000000000000 0000000000000000
  |------||------| |--------------|
               ...ldt
> 0000208700000040 0000000000008b00
  |------||------| |------||------|
                tr...
> ffff880003213b40 0000007f00000000
  |--------------| |------||------|
       ...tr          gdt...
> 0000000000000000 ffff880003204000
  |------||------| |--------------|
               ...gdt
> 00000fff00000000 0000000000000000
  |------||------| |------||------|
               idt...
> ffffffff81dd2000 000000008005003b
  |--------------| |--------------|
       ...idt            cr0
> 0000000000000000 0000000001b2e000
  |--------------| |--------------|
        cr1              cr2
> 0000000007b18000 00000000000006f0
  |--------------| |--------------|
        cr3              cr4
From "target-i386/arch_dump.c":
> struct QEMUCPUSegment {
>     uint32_t selector;
>     uint32_t limit;
>     uint32_t flags;
>     uint32_t pad;
>     uint64_t base;
> };
>
> typedef struct QEMUCPUSegment QEMUCPUSegment;
>
> struct QEMUCPUState {
>     uint32_t version;
>     uint32_t size;
>     uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
>     uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
>     uint64_t rip, rflags;
>     QEMUCPUSegment cs, ds, es, fs, gs, ss;
>     QEMUCPUSegment ldt, tr, gdt, idt;
>     uint64_t cr[5];
> };
>
> typedef struct QEMUCPUState QEMUCPUState;
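To pull CR3 (or any other register) out of a QEMU note, a reader can copy the descriptor bytes into the structure above. A C sketch, assuming the dump and the reader are both little-endian x86_64 (as in this example); `decode_qemu_cpustate()` is a hypothetical helper. The static_asserts confirm the layout has no padding: 432 bytes in total (matching n_descsz), with cr[0] at offset 392, so the first word of the dump decodes as version 1, size 0x1b0, and CR3 is the 53rd 64-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same definitions as quoted above from target-i386/arch_dump.c. */
typedef struct QEMUCPUSegment {
    uint32_t selector;
    uint32_t limit;
    uint32_t flags;
    uint32_t pad;
    uint64_t base;
} QEMUCPUSegment;

typedef struct QEMUCPUState {
    uint32_t version;
    uint32_t size;
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rip, rflags;
    QEMUCPUSegment cs, ds, es, fs, gs, ss;
    QEMUCPUSegment ldt, tr, gdt, idt;
    uint64_t cr[5];
} QEMUCPUState;

static_assert(sizeof(QEMUCPUState) == 432, "matches n_descsz of the QEMU note");
static_assert(offsetof(QEMUCPUState, cr) == 392, "cr[0] is the 50th 64-bit word");

/*
 * Hypothetical helper: copy a QEMU note descriptor into a QEMUCPUState.
 * Assumes dump and reader share byte order (little-endian x86_64 here).
 * Returns 0 on success, -1 if the descriptor is shorter than the structure
 * or its embedded size field is smaller than the structure.
 */
static int decode_qemu_cpustate(const uint8_t *desc, uint32_t descsz,
                                QEMUCPUState *out)
{
    if (descsz < sizeof *out) {
        return -1;
    }
    memcpy(out, desc, sizeof *out);
    return out->size >= sizeof *out ? 0 : -1;
}
```

Applied to the descriptor dumped above, st.cr[3] would be 0x7b18000.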
Summary: I think the info is all there.
Thanks
Laszlo