[ANNOUNCE] crash version 5.0.9 is available
by Dave Anderson
changelog:
- Make the symbol_search_next() function externally available to
extension modules, as requested for the "pykdump" extension module.
(anderson(a)redhat.com)
- Fix for the "log" command to recognize that the "log_end" symbol
was changed from an unsigned long to an unsigned int in 2.6.25 and
later kernels.
(anderson(a)redhat.com)
- Fix to determine the size and location of the x86_64 interrupt stack
on kernels that are not configured CONFIG_SMP. Without the patch,
runtime commands that use the embedded gdb module may fail with the
error message "<segmentation violation in gdb>".
(anderson(a)redhat.com)
- Suppress the "crash -d1" initialization-time message that indicates
"WARNING: Because this kernel was compiled with gcc version <x.x.x>,
certain commands or command options may fail unless crash is invoked
with the --readnow command line option" except for kernels compiled
with gcc versions between 3.4.0 and 4.0.0.
(anderson(a)redhat.com)
- Fix for the "bt" command on 2.6.33 and later x86_64 kernels, which
contain debuginfo data for "struct user_regs_struct", and where the
dumpfile is a kdump ELF vmcore. Without the patch, the backtrace
for the panic task uses the registers found in the ELF header's
NT_PRSTATUS note as starting hooks, which causes the backtrace to be
essentially truncated, leaving out the exception frame, the exception
handler's frame, and so on, down to the kdump operation. The patch
will only use the ELF header's registers if better starting hooks
cannot be determined.
(anderson(a)redhat.com)
- Fix for handling KVM dumpfiles that contain "devices" that are not
explicitly supported. The patch skips over the unsupported/unused
device segment in the dumpfile, and searches for the next "known"
device contained in the supported device table. Without the patch,
the crash session fails during initialization with the error message
"crash: <dumpfile>: initialization failed".
(anderson(a)redhat.com)
- When handcrafting the backtrace starting point for the "bt" command
with the -S option, and the starting stack address is neither in the
task's process stack nor in a legitimate non-process stack, such as a
hard or soft IRQ stack or an x86_64 exception stack, a message is
displayed indicating "non-process stack address for this task".
Without the patch, the backtrace is still attempted, which may result
in a segmentation violation; this behavior has been changed so that
the "bt" command fails immediately.
(anderson(a)redhat.com)
- Modified the help page for the "help" command to also show the
various crash-internal debug options available.
(anderson(a)redhat.com)
- Fix for the x86_64 "bt" command to more correctly find the starting
backtrace RIP and RSP hooks in KVM dumpfiles. Without the patch,
backtraces that should start in the interrupt or exception stacks
were not being detected correctly.
(anderson(a)redhat.com)
- Save the per-cpu register contents stored in the "cpu" devices of
x86_64 KVM dumpfiles, and use their contents for x86_64 backtrace RSP
and RIP hooks in the case of KVM "live dumps" where the guest system
was not in a crashed state when the "virsh dump" operation was done
on the KVM host. If an active task was running in user space when
a live dump was taken, that will be indicated by the "bt" output,
along with the user-space register contents. The x86_64 register set
saved for each cpu may be displayed with the "help -[D|n]" command.
(hutao(a)cn.fujitsu.com, anderson(a)redhat.com)
- Fix for the cpu count determination in crashed x86 KVM dumpfiles,
where the non-crashing cpus are marked offline in the kernel's
cpu_online_mask by smp_stop_cpu(). Depending upon the cpu number
of the crashing task, the cpu count may be set to a value that is
less than the number of present cpus.
(anderson(a)redhat.com)
- Fix for a premature failure of the "kmem -i" command with kernels
that are not configured with CONFIG_SWAP.
(per.xx.fransson(a)stericsson.com)
- Fix for the x86 "bt" command on 2.6.31 and later kernels when the
crash was generated by an "echo c > /proc/sysrq-trigger". Without
the patch, the backtrace does not display the exception frame from
the forced oops. This is not applicable to older kernels where
crash_kexec() is called directly from sysrq_handle_crash(), or if
an actual alt-sysrq-c keystroke sequence is entered.
(anderson(a)redhat.com)
- Fix for the x86 "bt" command to correctly find the starting backtrace
EIP and ESP hooks for the active tasks in KVM dumpfiles where the
kernel had crashed.
(anderson(a)redhat.com)
- Fix to utilize the correct "cpu" device format in x86 KVM dumpfiles.
Without the patch, the x86 registers were read in a 32-bit format,
which is only true if the host machine was running a 32-bit kernel.
With the patch, the format defaults to the 64-bit format, and is
switched to the 32-bit format if it can be determined that the host
machine was running a 32-bit kernel.
(hutao(a)cn.fujitsu.com, anderson(a)redhat.com)
- Save the per-cpu register contents stored in the "cpu" devices of
x86 KVM dumpfiles, and use their contents for x86 backtrace ESP and
EIP hooks in the case of KVM "live dumps", i.e., where the guest
system was not in a crashed state when the "virsh dump" operation
was done on the KVM host. If an active task was running in user
space when a live dump was taken, that will be indicated by the
"bt" output, along with the user-space register contents. The saved
x86 register set for each cpu may also be displayed with the
"help -[D|n]" command.
(hutao(a)cn.fujitsu.com, anderson(a)redhat.com)
- Update for the KVM-only "map" command to also store the register sets
read from the KVM dumpfile's "cpu" devices in addition to the
mapfile data when it is written to an external mapfile, or appended
to the dumpfile, so that subsequent sessions will not require the
initial scan of the KVM dumpfile.
(anderson(a)redhat.com)
- Fix the KVM-only "map" command to prevent its use when the session
is not being run against a KVM dumpfile, and to reject filename
arguments to the -a option or without the -f option.
(anderson(a)redhat.com)
Download from: http://people.redhat.com/anderson
(no subject)
by Miller, Mike (OS Dev)
Hello again,
I sent this yesterday but since I wasn't a member maybe it didn't go anywhere.
Hello,
I'm trying to debug an issue using crash. I've moved the vmcore file over to a system other than the system from which the dump was collected. I've installed a variety of debug rpms including the kernel-debuginfo, kernel-debug-debuginfo, glib-debuginfo, and kernel-debuginfo-common. I've also copied the lib/modules from the test system to mine.
When I run the command "p hba" it returns:
Crash > p hba
Hba = $2 = 778764288 (please ignore the uppercase, stupid mail client)
I expect to see something like:
Hba = $1 = {0x102170b0000, 0x0, 0x0,..., 0x0}
Why is that? Do I need to be running crash on the test system itself? I thought all relevant info would be contained in the vmcore file and I could attempt the analysis on any system?
The crash version command returns version = $3 = v0.25941.
My running kernel is 2.6.18-prep. /etc/issue reports rhel5.5.
The kernel on the test system is 2.6.18-164.el5. That is also the version of the debug packages I installed as well as the vmlinux file I'm using. Any problems mixing and matching like that? I have not rebooted my system since installing the debug packages but that doesn't seem necessary. Thanks in advance.
-- mikem
Re: [Crash-utility] can't get hba info
by Dave Anderson
----- "Mike Miller (OS Dev)" <Mike.Miller(a)hp.com> wrote:
> Hello,
> I'm trying to debug an issue using crash. I've moved the vmcore file
> over to a system other than the system from which the dump was
> collected. I've installed a variety of debug rpms including the
> kernel-debuginfo, kernel-debug-debuginfo, glib-debuginfo, and
> kernel-debuginfo-common. I've also copied the lib/modules from the
> test system to mine.
>
> When I run the command "p hba" it returns:
>
> Crash > p hba
> Hba = $2 = 778764288 (please ignore the uppercase, stupid mail client)
>
> I expect to see something like:
>
> Hba = $1 = {0x102170b0000, 0x0, 0x0,..., 0x0}
>
> Why is that?
I'm not sure without more info...
> Do I need to be running crash on the test system itself?
No.
> I thought all relevant info would be contained in the vmcore file and
> I could attempt the analysis on any system?
True.
(although if the debuginfo data you need is not contained in the vmlinux file,
you may need to load it from the relevant module's debuginfo file.)
Anyway, I wonder if your kernel has another "hba" symbol that's being
selected by the embedded gdb module?
The only sample vmlinux/vmcore I have on hand that knows about an "hba"
symbol is this one:
crash> whatis hba
ctlr_info_t *hba[32];
crash>
But I see that that kernel has two of them, so they must be statically defined:
crash> sym hba
ffffffff82700620 (b) hba
ffffffff82700aa0 (b) hba
crash>
When I run the command, the embedded gdb module picks the second one
at ffffffff82700aa0:
crash> p hba
hba = $4 =
{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0}
crash> p &hba
$11 = (ctlr_info_t *(*)[32]) 0xffffffff82700aa0
crash>
So if I want to look at the first one at ffffffff82700620, I'd do this:
crash> p *(ctlr_info_t *(*)[32]) 0xffffffff82700620
$14 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0x0}
crash>
But since your example displays hba as "778764288", perhaps the embedded
gdb module is picking up a different "hba" instance entirely? So enter
this:
crash> whatis hba
And confirm whether it's the data structure instance you're referring to.
And if it's some other hba, then do this to get the possible addresses:
crash> sym hba
And use a cast on the relevant address to create the output you want
(like I did above to access the ctlr_info_t array at the other address).
Also, if the data structure is defined in module code (i.e., not
declared in a kernel header), then that module's debuginfo data would
need to be loaded with the "mod" command.
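For example, something along these lines -- the module name and
debuginfo path here are just hypothetical placeholders, not taken from
your system:
crash> mod -s cciss /usr/lib/debug/lib/modules/2.6.18-164.el5/kernel/drivers/block/cciss.ko.debug
or load everything in one shot with "mod -S" (see below).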
> The crash version command returns version = $3 = v0.25941.
Huh? (Sorry -- I don't know what you're talking about...)
> My running kernel is 2.6.18-prep. /etc/issue reports rhel5.5.
> The kernel on the test system is 2.6.18-164.el5. That is also the
> version of the debug packages I installed as well as the vmlinux file
> I'm using. Any problems mixing and matching like that? I have not
> rebooted my system since installing the debug packages but that
> doesn't seem necessary. Thanks in advance.
If you install the complete kernel-debuginfo/kernel-debuginfo-common
package pair that goes along with the vmlinux/vmcore pair that you're
working with, then there should be no problem loading all module
debuginfo data by just entering "mod -S".
The kernel-debug-debuginfo package is for the "debug" kernel, which
is a completely separate kernel from the base 2.6.18-164.el5 kernel.
The glib-debuginfo package is also unnecessary...
Hope this helps,
Dave
can't get hba info
by Miller, Mike (OS Dev)
Hello,
I'm trying to debug an issue using crash. I've moved the vmcore file over to a system other than the system from which the dump was collected. I've installed a variety of debug rpms including the kernel-debuginfo, kernel-debug-debuginfo, glib-debuginfo, and kernel-debuginfo-common. I've also copied the lib/modules from the test system to mine.
When I run the command "p hba" it returns:
Crash > p hba
Hba = $2 = 778764288 (please ignore the uppercase, stupid mail client)
I expect to see something like:
Hba = $1 = {0x102170b0000, 0x0, 0x0,..., 0x0}
Why is that? Do I need to be running crash on the test system itself? I thought all relevant info would be contained in the vmcore file and I could attempt the analysis on any system?
The crash version command returns version = $3 = v0.25941.
My running kernel is 2.6.18-prep. /etc/issue reports rhel5.5.
The kernel on the test system is 2.6.18-164.el5. That is also the version of the debug packages I installed as well as the vmlinux file I'm using. Any problems mixing and matching like that? I have not rebooted my system since installing the debug packages but that doesn't seem necessary. Thanks in advance.
-- mikem
"The spirit of resistance to government is so valuable on certain occasions, that I wish it to be always kept alive. It will often be exercised when wrong, but better so than not to be exercised at all. I like a little rebellion now and then. It is like a storm in the atmosphere." --Thomas Jefferson, letter to Abigail Adams, 1787
Re: [Crash-utility] [PATCH] bug on get_be_long() and improvement of bt
by Dave Anderson
----- "Hu Tao" <hutao(a)cn.fujitsu.com> wrote:
> > > On Tue, Oct 19, 2010 at 09:06:33AM -0400, Dave Anderson wrote:
> > > >
> > > > ----- "Hu Tao" <hutao cn fujitsu com> wrote:
> > > >
> > > > > Hi Dave,
> > > > >
> > > > > These are updated patches tested with SMP system and panic task.
> > > > >
> > > > > When testing a x86 guest, I found another bug about reading cpu
> > > > > registers from dumpfile. Qemu simulated system is x86_64
> > > > > (qemu-system-x86_64), guest OS is x86. When crash reads cpu registers
> > > > > from dumpfile, it uses cpu_load_32(), this will read gp registers by
> > > > > get_be_long(fp, 32), that is, treat them as 32 bits. But in fact,
> > > > > qemu-system-x86_64 saves 64bits for each of them(although guest OS
> > > > > uses only lower 32 bits). As a result, crash gets wrong cpu gp
> > > > > register values.
> > > >
> > > > As I understand it, you're running a 32-bit guest on a 64-bit host.
> > >
> > > Yes.
> > >
> > > > If you were to read 64-bit register values instead of 32-bit register
> > > > values, wouldn't that cause the file offsets of the subsequent get_xxx()
> > > > calls in cpu_load() to read from the wrong file offsets? And then
> > > > that would leave the ending file offset incorrect, such that the
> > > > qemu_load() loop would fail to find the next device?
> > > >
> > > > In other words, the cpu_load() function, which is used for both
> > > > 32-bit and 64-bit guests, must be reading the correct amount of
> > > > data from the "cpu" device, or else qemu_load() would fail to
> > > > find the next device in the next location in the dumpfile.
> > >
> > > True. In fact, in my case if 32-bit registers are read, the following
> > > devices are found:
> > > block, ram, kvm-tpr-opt, kvmclock, timer, cpu_common, cpu.
> > > If 64-bit registers are read, the following devices are found:
> > > block, ram, kvm-tpr-opt, kvmclock, timer, cpu_common, cpu, apic, fw_cfg
> >
> > Right -- so it got "lost" after incorrectly gathering the data for the
> > first "cpu" device instance.
> >
> > > > > Is there any way we can know from dumpfile that these gp
> > > > > registers(and those similar registers) are 32bits or 64bits?
> > > >
> > > > I don't know. If what you say is true, when would those registers
> > > > ever be 32-bit values?
> > >
> > > I did tests on a 64-bit machine. Result is:
> > >
> > > machine    OS     guest machine            guest OS    saved gp regs
> > > ------------------------------------------------------------------------
> > > 64-bit     x86    qemu-kvm (kvm enabled)   x86         64 bits
> > > 64-bit     x86    qemu (kvm disabled)      x86         32 bits
> >
> > I don't understand what you mean when you say that the guest machine
> > is "kvm enabled" or "kvm disabled"?
>
> Sorry for being vague. "kvm enabled" means using qemu-kvm to bring up
> guest machine and this enables KVM hardware virtualization on host.
> "kvm disabled" means using qemu to bring up guest machine and this
> disables KVM hardware virtualization on host.
>
> >
> > And if your host machine is running a 32-bit x86 OS (on 64-bit hardware),
> > that's something I've never seen given that Red Hat only allows 64-bit
> > kernels as KVM hosts.
>
> I did the test on Fedora 13 i686. Just tried rhel6 i386, as you said,
> there is no kvm support.
Hello Hu,
Your supposition that the "cpu" device layout is dependent upon the
host kernel type is correct, but unfortunately there's no readily-evident
way to determine what type of kernel the host was running. This is Paolo's
response to the question:
> So the question is:
>
> Can it be determined from something in the dumpfile header that
> the *host* machine was running a 32-bit kernel?
It's not an exact science, but you can do some trial-and-error. I
suggest measuring the distance between the cpu and apic blocks
(which you can do using code from your "workaround" explained below, I
guess) and deciding based on the size of the CPU block.
A 64-bit image I have lying around takes 987 bytes, I'd guess that
anything above 850 is 64-bit. Maybe you can start searching after the
first 250 bytes, since the registers are at the beginning and if you're
going to get a false match you're going to get it there.
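In code, that heuristic might look roughly like this -- purely a sketch,
none of these names exist in qemu-load.c, and the 850-byte cutoff is just
Paolo's guess above:
	static int
	kvmhost_looks_64bit(long cpu_section_offset, long apic_section_offset)
	{
		/* size of the "cpu" device block, measured as the distance
		   between the start of the "cpu" section and the start of
		   the following "apic" section */
		long cpu_block_size = apic_section_offset - cpu_section_offset;

		/* ~987 bytes observed for a 64-bit image; assume anything
		   above 850 bytes was written by a 64-bit host */
		return (cpu_block_size > 850);
	}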
The "workaround" he's referring to is this, which will be in the next
release:
Re: [Crash-utility] [patch] crash on a KVM-generated dump
https://www.redhat.com/archives/crash-utility/2010-October/msg00034.html
But it's not a particularly graceful solution in this case, because it
would require walking through all of the "block" and "ram" devices
to find the first "cpu" device -- but at that point the 32-vs-64 bit
device has already been selected. I suppose another alternative would
be to always start reading the "cpu" data in cpu_load() as if it were
created by a 64-bit host, and making a determination somewhere along the
way that the data being read is bogus and that it should be using the
32-bit device mechanism, seeking back, and calling the other function?
I don't know -- either option would be really ugly...
Anyway, given that the use of 32-bit KVM hosts should be fairly rare,
what would you think of handling it this way:
(1) use the 64-bit functions by default
(2) add a crash command line option like "--kvmhost 32" to force the
use of the 32-bit functions
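On the command line that would look something like this (again, just
the proposal above -- no such option exists yet):
  $ crash --kvmhost 32 vmlinux vmcore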
And of course, even if the new option were *not* used on a 32-bit dumpfile,
it would still behave as it does now -- crash still comes up OK -- but it
just wouldn't be able to use the registers from the header.
What do you think?
Dave
no swapper_space
by Per Fransson
Hi Dave,
"kmem -i" aborts before it completes when swapping isn't enabled. This is a suggestion for fixing it.
Regards,
Per
diff --git a/memory.c b/memory.c
index 287285f..cb02c57 100755
--- a/memory.c
+++ b/memory.c
@@ -6717,7 +6717,7 @@ dump_kmeminfo(void)
} else if (dump_vm_stat("NR_FILE_PAGES", &nr_file_pages, 0)) {
char *swapper_space = GETBUF(SIZE(address_space));
- if (!readmem(symbol_value("swapper_space"), KVADDR, swapper_space,
+ if (!symbol_exists("swapper_space") || !readmem(symbol_value("swapper_space"), KVADDR, swapper_space,
SIZE(address_space), "swapper_space", RETURN_ON_ERROR))
swapper_space_nrpages = 0;
else
@@ -6796,7 +6796,8 @@ dump_kmeminfo(void)
* get swap data from dump_swap_info().
*/
fprintf(fp, "\n");
- if (dump_swap_info(RETURN_ON_ERROR, &totalswap_pages,
+ if (symbol_exists("swapper_space")) {
+ if (dump_swap_info(RETURN_ON_ERROR, &totalswap_pages,
&totalused_pages)) {
fprintf(fp, "%10s %7ld %11s ----\n",
"TOTAL SWAP", totalswap_pages,
@@ -6816,7 +6817,7 @@ dump_kmeminfo(void)
} else
error(INFO, "swap_info[%ld].swap_map at %lx is inaccessible\n",
totalused_pages, totalswap_pages);
-
+ }
dump_zone_page_usage();
}
Re: [Crash-utility] spanned vs. present
by Dave Anderson
----- "Per Fransson" <per.fransson.ml(a)gmail.com> wrote:
> Hi,
>
> I'm wondering why kmem -f uses the spanned pages of every zone when
> reporting its size. Wouldn't it be more informative to get a value
> which excludes any holes?
>
> This is the code from memory.c I'm thinking of:
>
> /*
> * Same as dump_free_pages_zones_v1(), but updated for numerous 2.6 zone
> * and free_area related data structure changes.
> */
> static void
> dump_free_pages_zones_v2(struct meminfo *fi)
> {
> [...]
> if (VALID_MEMBER(zone_spanned_pages))
> zone_size_offset = OFFSET(zone_spanned_pages);
> else
> error(FATAL, "zone struct has no spanned_pages field\n");
> [...]
> }
You can get the present pages with "kmem -z".
Dave
spanned vs. present
by Per Fransson
Hi,
I'm wondering why kmem -f uses the spanned pages of every zone when
reporting its size. Wouldn't it be more informative to get a value
which excludes any holes?
This is the code from memory.c I'm thinking of:
/*
* Same as dump_free_pages_zones_v1(), but updated for numerous 2.6 zone
* and free_area related data structure changes.
*/
static void
dump_free_pages_zones_v2(struct meminfo *fi)
{
[...]
if (VALID_MEMBER(zone_spanned_pages))
zone_size_offset = OFFSET(zone_spanned_pages);
else
error(FATAL, "zone struct has no spanned_pages field\n");
[...]
}
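Something like this is roughly what I had in mind -- untested, and it
assumes the offset table also carries zone_present_pages:
	if (VALID_MEMBER(zone_present_pages))
		zone_size_offset = OFFSET(zone_present_pages);
	else if (VALID_MEMBER(zone_spanned_pages))
		zone_size_offset = OFFSET(zone_spanned_pages);
	else
		error(FATAL, "zone struct has no spanned_pages field\n");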
Regards,
Per
Re: [Crash-utility] [PATCH] bug on get_be_long() and improvement of bt
by Dave Anderson
----- "Hu Tao" <hutao(a)cn.fujitsu.com> wrote:
> On Tue, Oct 19, 2010 at 09:06:33AM -0400, Dave Anderson wrote:
> >
> > ----- "Hu Tao" <hutao(a)cn.fujitsu.com> wrote:
> >
> > > Hi Dave,
> > >
> > > These are updated patches tested with SMP system and panic task.
> > >
> > > When testing a x86 guest, I found another bug about reading cpu
> > > registers from dumpfile. Qemu simulated system is x86_64
> > > (qemu-system-x86_64), guest OS is x86. When crash reads cpu registers
> > > from dumpfile, it uses cpu_load_32(), this will read gp registers by
> > > get_be_long(fp, 32), that is, treat them as 32 bits. But in fact,
> > > qemu-system-x86_64 saves 64bits for each of them(although guest OS
> > > uses only lower 32 bits). As a result, crash gets wrong cpu gp
> > > register values.
> >
> > As I understand it, you're running a 32-bit guest on a 64-bit host.
>
> Yes.
>
> > If you were to read 64-bit register values instead of 32-bit register
> > values, wouldn't that cause the file offsets of the subsequent get_xxx()
> > calls in cpu_load() to read from the wrong file offsets? And then
> > that would leave the ending file offset incorrect, such that the
> > qemu_load() loop would fail to find the next device?
> >
> > In other words, the cpu_load() function, which is used for both
> > 32-bit and 64-bit guests, must be reading the correct amount of
> > data from the "cpu" device, or else qemu_load() would fail to
> > find the next device in the next location in the dumpfile.
>
> True. In fact, in my case if 32-bit registers are read, the following
> devices are found:
> block, ram, kvm-tpr-opt, kvmclock, timer, cpu_common, cpu.
> If 64-bit registers are read, the following devices are found:
> block, ram, kvm-tpr-opt, kvmclock, timer, cpu_common, cpu, apic, fw_cfg
Right -- so it got "lost" after incorrectly gathering the data for the
first "cpu" device instance.
> > > Is there any way we can know from dumpfile that these gp
> > > registers(and those similar registers) are 32bits or 64bits?
> >
> > I don't know. If what you say is true, when would those registers
> > ever be 32-bit values?
>
> I did tests on a 64-bit machine. Result is:
>
> machine    OS     guest machine            guest OS    saved gp regs
> ------------------------------------------------------------------------
> 64-bit     x86    qemu-kvm (kvm enabled)   x86         64 bits
> 64-bit     x86    qemu (kvm disabled)      x86         32 bits
I don't understand what you mean when you say that the guest machine
is "kvm enabled" or "kvm disabled"?
And if your host machine is running a 32-bit x86 OS (on 64-bit hardware),
that's something I've never seen given that Red Hat only allows 64-bit
kernels as KVM hosts.
Dave
Re: [Crash-utility] [PATCH] bug on get_be_long() and improvement of bt
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> ----- "Hu Tao" <hutao(a)cn.fujitsu.com> wrote:
>
> > Hi Dave,
> >
> > There is a bug in get_be_long() that causes the high 32 bits to be
> > truncated. As a result, we get wrong register values from the dump
> > file. Patch 1 fixes this.
>
> Good catch!
>
> > Once we can get the right cpu register values, it's better to use the
> > sp/ip for backtracing the active task. This can show a more accurate
> > backtrace, not including those invalid frames beyond sp. Patches 2 and
> > 3 do this in the kvmdump case (virsh dump).
> >
> > To verify: run that km_probe.c test module on an x86_64 system, then
> > `echo q > /proc/sysrq-trigger' to trigger the kprobe, which loops in
> > its post_handler. Then virsh dump, then crash.
>
> However, I'm wondering whether you actually tested this with a
> *crashed* system's dumpfile, and not just with a *live* system dump with
> a contrived set of circumstances. And if so, what differences, if any,
> did you see with the backtraces of the task that either oops'd or called
> panic(), as well as those of the other active tasks that were brought down
> by IP interrupt?
>
> Anyway, I'll give this a thorough testing with a set of sample
> dumpfiles that I have on hand.
Actually, the patch fails miserably on SMP dumpfiles, with segmentation
violations during initialization.
And now looking at your patch, I'm wondering whether you even tested this
with an SMP system?
The change to qemu_load() here:
@@ -904,6 +906,9 @@ qemu_load (const struct qemu_device_loader *devices, uint32_t required_features,
if (feof (fp) || ferror (fp))
break;
+ if (STREQ(d->vtbl->name, "cpu"))
+ result->dx86 = d;
+
if (sec == QEMU_VM_SECTION_END || sec == QEMU_VM_SECTION_FULL)
result->features |= features;
}
That function cycles through the "cpu" devices for *each* cpu
in the system, so this patch will store the device of the last cpu
device it encounters. So in an SMP machine, it will store the
device for the highest cpu only, right?
And then there's this change to get_kvmdump_regs():
@@ -310,7 +311,11 @@ kvmdump_memory_dump(FILE *ofp)
void
get_kvmdump_regs(struct bt_info *bt, ulong *pc, ulong *sp)
{
- machdep->get_stack_frame(bt, pc, sp);
+ if (is_task_active(bt->task)) {
+ *sp = device_list->dx86->regs[R_ESP];
+ *pc = device_list->dx86->eip;
+ } else
+ machdep->get_stack_frame(bt, pc, sp);
}
is_task_active() returns TRUE for all active tasks in an SMP system,
not just the panic task. So it would seem you're going to pass
back the same registers for all active tasks?
And what's the point of this change to kernel.c?
diff --git a/kernel.c b/kernel.c
index e399099..2627020 100755
--- a/kernel.c
+++ b/kernel.c
@@ -16,6 +16,7 @@
*/
#include "defs.h"
+#include "qemu-load.h"
#include "xen_hyper_defs.h"
#include <elf.h>
Also, the change to main.c is unnecessary -- there are dozens of malloc'd
memory areas in the program -- so why go to the bother of free'ing
just this one prior to exiting?
Anyway, instead of saving the device list, I suggest you do something
like storing the per-cpu IP/SP values in a separate data structure that
can possibly be used as an alternative source for register values for
"live dumps" -- and possibly for crashing systems if usable starting
hooks cannot be determined in the traditional manner. I had thought
of doing something like that in the past, but when I looked at the
register values, I must have run into the get_be_long() issue?
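Something along these lines, say -- just a sketch, none of these names
exist in the current sources:
	struct kvmdump_cpu_regs {       /* hypothetical per-cpu register store */
		unsigned long ip;
		unsigned long sp;
	};

	/* filled in while qemu_load() walks each per-cpu "cpu" device
	   section, indexed by cpu number; get_kvmdump_regs() could then
	   fall back to it when usable starting hooks cannot be found
	   in the traditional manner (NR_CPUS as defined in defs.h) */
	static struct kvmdump_cpu_regs saved_cpu_regs[NR_CPUS];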
Dave