----- Original Message -----
On 04/26, Dave Anderson wrote:
>
> > OK. Suppose we add an ACTIVE_QEMU() helper. IMO this is a bad idea in any case,
> > the core code should not even know that this kernel runs under qemu. Nevermind,
> > suppose we have, say,
> >
> > #define ACTIVE_QEMU() ((pc->flags & LIVE_SYSTEM) && (pc->flags2 & QEMU))
> >
> > Now what? We need the same 1-7 patches, just LOCAL_ACTIVE() should be replaced
> > with "ACTIVE() && !ACTIVE_QEMU()".
>
> Correct. ACTIVE() is used ~100 times, and in the vast majority of cases, its use
> applies to a live QEMU/KVM session. In the few circumstances that it doesn't,
> then ACTIVE_QEMU() should be applied so that it's obvious to the maintainer (me),
> what the issue is.

It applies to a live QEMU/KVM session, and/or to any other live-and-remote
session; please see below.
> Who knows what "live" mechanism may come about in the future that may also
> have its own quirks? I don't want to hide it, but rather make it strikingly
> obvious.
Ah, but this is another story.
I mean... OK, as 00/10 says, my vague/distant goal is to teach /usr/bin/crash to
use the gdb-remote protocol to debug live guests. And in this case ACTIVE_QEMU()
makes a lot of sense. Say, cmd_bt() can use it to get the registers/trace even if
the process is running, pause/resume the guest, etc.

But all the LOCAL_ACTIVE changes in 1-7 do not care about the details of the
"live" mechanism at all. So I still think we need a generic helper which should
be true if local-and-active. Or, vice versa, remote-and-active; this doesn't
matter.
> > --- a/kernel.c
> > +++ b/kernel.c
> > @@ -2900,7 +2900,7 @@ back_trace(struct bt_info *bt)
> > return;
> > }
> >
> > - if (ACTIVE() && !INSTACK(esp, bt)) {
> > + if (LOCAL_ACTIVE() && !INSTACK(esp, bt)) {
> > sprintf(buf, "/proc/%ld", bt->tc->pid);
> > if (!file_exists(buf, NULL))
> > error(INFO, "task no longer exists\n");
> >
> > The usage of ACTIVE() is obviously wrong if this is the live (so that ACTIVE()
> > is true) but remote kernel. We should not even try to look at /proc files on
> > the local system in this case.
>
> Correct. So restrict it meaningfully (to me anyway).
So you suggest changing this patch to do

	if (ACTIVE() && !ACTIVE_QEMU() && !INSTACK(...))

To me this simply looks worse, but I won't insist. But note that if we ever have
another ACTIVE_SOMETHING() source, we will need to modify this code again, while
this code does not care about qemu/something at all. So I still think we need a
new helper which doesn't depend on qemu or whatever else.
Right, but this is definitely the outlier with respect to "live" systems.
> > Or perhaps you mean that ACTIVE_QEMU() should be defined as
> >
> > #define ACTIVE_QEMU() (pc->flags2 & QEMU_LIVE)
> >
> > ? iow, it should not imply ACTIVE() ? This would be even worse, in this case
> > we would need to replace almost every ACTIVE() with "ACTIVE() || ACTIVE_QEMU()".
QEMU_LIVE should be in pc->flags, and appear as part of MEMORY_SOURCES. And
LIVE_SYSTEM should also be set so that the facility falls under both ACTIVE()
and ACTIVE_QEMU(). And then in the subset of cases where ACTIVE() is too broad,
ACTIVE_QEMU() can be added as a restriction.
But the above is not relevant to some new extension of the ramdump facility.
>
> I agree that there are a handful of circumstances that you have run into where
> ACTIVE() may not apply, such as the case where /proc was accessed. But I don't
> understand why you say "almost every" instance?
Ah, sorry for the confusion. I meant: if we add ACTIVE_QEMU(), it should imply
ACTIVE(), otherwise we have even more problems.
Correct.
> Why? If the target is live, then all of the above should be called as-is. Each
> of them returns if the target is a dumpfile.
Yes, sure, see above. If the ACTIVE_QEMU() plugin sets the LIVE_SYSTEM flag too,
most users of ACTIVE() are fine.
> > OK, let's suppose we add this feature... How do you think the command line
> > should look?
> >
> > I mean, after this series we can do, say,
> >
> > ./crash vmlinux raw:DUMP_1@OFFSET_1,DUMP_2@OFFSET_2
> >
> > if we have 2 ramdumps which should be used together. How do you think the new
> > syntax should look? I am fine either way.
>
> I guess I've got some basic misunderstandings here...
>
> If it's a live system, why is necessary to specify RAM offsets?
I suspect we will need offsets in more complex situations, qemu can have multiple
memory-backend-file/numa options. And perhaps even a single file may need it,
not sure.
But with any live system, crash reads the relevant kernel data structures and
sets up its picture of the system's physical memory accordingly. There's no need
to specify where the memory lies -- it's all available in the live kernel itself.

On the other hand, typical dumpfile headers give the crash utility instructions
on how to randomly access physical memory in the dump, i.e., like the PT_LOAD
segments in an ELF vmcore. Ramdumps don't have any header information, so the
physical memory blocks have to be specified on the crash command line -- and then
crash creates a temporary in-memory ELF header for subsequent memory reads.
(Or the user can specify "-o dumpfile" to transform the ramdump into a kdump
clone.)
> And if you're just emulating the ramdump facility by first dumping the guest's
> memory into a dumpfile, why isn't it just a ramdump clone?
Sorry, I can't understand... could you spell it out?
I'm not sure, because that's what I don't understand. You seem to be describing
two completely different facilities:

(1) a live access facility like /dev/mem, et al., but to a live KVM guest
(2) some kind of ramdump facility?

And if it's a ramdump facility, couldn't you just copy it from the guest to the
host and analyze it there?
Dave