November 2013 - Crash-utility - Crash Utility List Archives

[PATCH 0/7] Upgrade the remote access to a xen domain.

by Don Slutz

From: Don Slutz <dslutz(a)verizon.com> Currently crash 4.0 and later using the code: http://lists.xen.org/archives/html/xen-devel/2013-11/msg02569.html There is some very limited documention on crashes remote ptotocol in: http://lists.xen.org/archives/html/xen-devel/2013-11/msg02352.html and some fixes in the attachment 0001-xen-crashd-Connect-crash-with-domain.patch from the 1st link above. How ever there are issues with the current code: 1) The remote protocol has a minor issue (which I have not been able to happen) based on the fact TCP/IP is stream based protocol. This means a RECV or a SEND may not do the fully requested size of data. In fact the current code assumes that the amount of data that a SEND is called with will all be read with a single RECV. 2) The most common mismatch between crash and older kernels is phys_base. In the remote case, see if the remote server supports vitrual memory access, and if so, see if phys_base can be retreived. 3) crash assumes that the remote system is active and can not return currect IP and SP to do a better back trace. 4) enable a non-active mode of remote access. This patch set attempts to fix these: The fix for #1 I have called NIL mode (patch 1). The fix for #2 uses get_remote_phys_base (patch 3). The fix for #3 uses get_remote_regs (patch 5). The fix for #4 uses special file /dev/xenmem (patch 7). It also REMOTE_NIL() to indicate "remote paused system". Don Slutz (7): Add NIL mode to remote. remote_proc_version: NULL terminate passed buffer on error. Add get_remote_phys_base. Add remote_vtop. bt: get remote live registers if possible. Add get_remote_cr3 Add support for non-live remote. defs.h | 8 +- kernel.c | 25 +++- memory.c | 25 +++- remote.c | 462 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------- x86_64.c | 24 ++++ 5 files changed, 465 insertions(+), 79 deletions(-) -- 1.8.4

12 years, 3 months

3
15
0 / 0

crash crashes reading qemu dump files

by Andi Kleen

So I wanted to use crash to look at a dump from a qemu KVM guest I write a dump file from the qemu console with dump-guest-core foo Then I do on the host crash vmlinux foo ... <segmentation violation in gdb> In gdb I see it seems to jump to 0 (gdb) p $pc $1 = (void (*)()) 0x0 (gdb) bt #0 0x0000000000000000 in ?? () #1 0x000000000073707a in ui_file_write (length_buf=1, buf=0x7fffffffbaaf "N", file=<optimized out>) at ui-file.c:224 #2 tee_file_write (file=<optimized out>, buf=0x7fffffffbaaf "N", length_buf=1) at ui-file.c:758 #3 0x00000000007337d6 in fputc_unfiltered (stream=0x126de20, c=<optimized out>) at utils.c:2209 #4 fputs_maybe_filtered (linebuffer=linebuffer@entry=0x1534a00 "No symbol \"task_struct\" in current context.\n", stream=stream@entry=0x126de20, filter=1) at utils.c:2126 #5 0x00000000007339b0 in vfprintf_maybe_filtered (stream=0x126de20, format=format@entry=0x9db4b5 "%s\n", args=args@entry=0x7fffffffbaf8, filter=1, filter=1) at utils.c:2332 #6 0x0000000000734a4c in vfprintf_filtered (args=0x7fffffffbaf8, format=0x9db4b5 "%s\n", stream=<optimized out>) at utils.c:2340 #7 fprintf_filtered (stream=<optimized out>, format=format@entry=0x9db4b5 "%s\n") at utils.c:2392 #8 0x0000000000676f67 in throw_exception (exception=...) at exceptions.c:234 #9 0x0000000000677219 in throw_it (reason=reason@entry=RETURN_ERROR, error=error@entry=GENERIC_ERROR, fmt=<optimized out>, ap=ap@entry=0x7fffffffbc48) at exceptions.c:434 #10 0x0000000000677436 in throw_verror (error=error@entry=GENERIC_ERROR, fmt=<optimized out>, ap=ap@entry=0x7fffffffbc48) at exceptions.c:440 #11 0x00000000007323d4 in error (string=<optimized out>) at utils.c:717 #12 0x00000000005ea1a3 in c_parse_internal () at c-exp.y:862 #13 0x00000000005ea4f9 in c_parse () at c-exp.y:3064 #14 0x00000000006a2fe1 in parse_exp_in_context (stringptr=stringptr@entry=0x7fffffffda18, pc=pc@entry=0, block=block@entry=0x0, comma=comma@entry=0, out_subexp=out_subexp@entry=0x0, void_context_p=0) at parse.c:1234 #15 0x00000000006a31e5 in parse_exp_1 (stringptr=stringptr@entry=0x7fffffffda68, pc=pc@entry=0, block=block@entry=0x0, comma=comma@entry=0) at parse.c:1136 #16 0x00000000006a3239 in parse_expression (string=0x8739b0 "task_struct") at parse.c:1279 #17 0x000000000064b270 in gdb_get_datatype (req=0xe61620 <shared_bufs>) at symtab.c:5330 #18 gdb_command_funnel (req=req@entry=0xe61620 <shared_bufs>) at symtab.c:5208 #19 0x00000000004ddc45 in gdb_interface (req=req@entry=0xe61620 <shared_bufs>) at gdb_interface.c:397 #20 0x00000000004de102 in gdb_session_init () at gdb_interface.c:244 #21 0x0000000000466c14 in main_loop () at main.c:637 #22 0x0000000000678e83 in captured_command_loop (data=data@entry=0x0) at main.c:258 #23 0x000000000067772a in catch_errors (func=func@entry=0x678e70 <captured_command_loop>, func_args=func_args@entry=0x0, errstring=errstring@entry=0x8b201f "", mask=mask@entry=6) at exceptions.c:557 #24 0x0000000000679e16 in captured_main (data=data@entry=0x7fffffffde00) at main.c:1064 #25 0x000000000067772a in catch_errors (func=func@entry=0x679150 <captured_main>, func_args=func_args@entry=0x7fffffffde00, errstring=errstring@entry=0x8b201f "", mask=mask@entry=6) at exceptions.c:557 #26 0x000000000067a177 in gdb_main (args=0x7fffffffde00) at main.c:1079 #27 gdb_main_entry (argc=<optimized out>, argv=argv@entry=0x7fffffffdf58) at main.c:1099 #28 0x00000000004dce84 in gdb_main_loop (argc=<optimized out>, argc@entry=3, argv=argv@entry=0x7fffffffdf58) at gdb_interface.c:76 #29 0x000000000046549f in main (argc=3, argv=0x7fffffffdf58) at main.c:613 Environment: Linux 3.11, current mainline linux (same result) FC19, qemu-system-x86-1.4.2-12.fc19.x86_64 Known problem? -Andi -- ak(a)linux.intel.com -- Speaking for myself only

12 years, 3 months

3
2
0 / 0

[PATCH v2 0/7] Upgrade the remote access to a xen domain.

by Don Slutz

From: Don Slutz <dslutz(a)verizon.com> Currently crash 4.0 and later using the code: http://lists.xen.org/archives/html/xen-devel/2013-11/msg02569.html There is some very limited documention on crashes remote ptotocol in: http://lists.xen.org/archives/html/xen-devel/2013-11/msg02352.html and some fixes in the attachment 0001-xen-crashd-Connect-crash-with-domain.patch from the 1st link above. How ever there are issues with the current code: 1) The remote protocol has a minor issue (which I have not been able to happen) based on the fact TCP/IP is stream based protocol. This means a RECV or a SEND may not do the fully requested size of data. In fact the current code assumes that the amount of data that a SEND is called with will all be read with a single RECV. 2) The most common mismatch between crash and older kernels is phys_base. In the remote case, see if the remote server supports vitrual memory access, and if so, see if phys_base can be retreived. 3) crash assumes that the remote system is active and can not return currect IP and SP to do a better back trace. 4) enable a non-active mode of remote access. This patch set attempts to fix these: The fix for #1 I have called NIL mode (patch 1). The fix for #2 uses get_remote_phys_base (patch 3). The fix for #3 uses get_remote_regs (patch 5). The fix for #4 uses special fake file /dev/xenmem (patch 7). It also REMOTE_PAUSED() to indicate "remote paused system". changes v1 to v2: Daniel Kiper: remove all camelCase. remove Emacs directives. Dave Anderson: rework flags, and time of changes. change get_remote_regs() to return non-zero when it successfully gets the registers. fix "make warn" warnings remove Emacs directives. Don Slutz (7): Add NIL mode to remote. remote_proc_version: NULL terminate passed buffer on error. Add get_remote_phys_base. Add remote_vtop. bt: get remote live registers if possible. Add get_remote_cr3 Add support for non-live remote. defs.h | 8 +- kernel.c | 13 +- memory.c | 13 +- remote.c | 464 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------- x86_64.c | 12 ++ 5 files changed, 430 insertions(+), 80 deletions(-) -- 1.8.4

12 years, 3 months

1
7
0 / 0

Using crash to debug a domU on xen.

by Don Slutz

I have some code that allows this. See the following mail thread: http://permalink.gmane.org/gmane.comp.emulators.xen.devel/174807 The questions are: 1. Does remote access have a specification? 2. Is it supported? 3. Should the code be part of xen or crash? -Don Slutz

12 years, 3 months

4
7
0 / 0

Re: [Crash-utility] patch for slight modification to runq -g command

by Dave Anderson

----- Original Message ----- > Hi Dave, > > The rb_root pointer is pretty useless in our debugging. I thought about > removing it but decided not to do it thinking other people may need it. > I can get you another patch that removes the rb_root display from the > "group" line. Please let me know what you think. OK good, that's what I figured. And since the cfs_rq address is being displayed now, if it's really necessary to see the the rb_root, it can easily be accessed from there. No need for another patch -- I've been tinkering with the results of your changes. Thanks again, Dave > > Thanks, > Anthony > > > -----Original Message----- > > From: crash-utility-bounces(a)redhat.com [mailto:crash-utility- > > bounces(a)redhat.com] On Behalf Of Dave Anderson > > Sent: Wednesday, November 13, 2013 1:51 PM > > To: Discussion list for crash utility usage, maintenance and > > development > > Subject: Re: [Crash-utility] patch for slight modification to runq -g > > command > > > > > > > > ----- Original Message ----- > > > > > > > > > ----- Original Message ----- > > > > My mistake, I didn't read your email carefully enough. Here's the > > > > new > > > > patch. > > > > > > > > Thanks, > > > > Anthony > > > > > > Thanks Anthony -- your changes are queued for crash-7.0.4. > > > > > > Dave > > > > Anthony, > > > > Given your interest in this command, I have a question re: the display. > > > > Do you ever have any interest in the rb_root addresses in each group? > > I'm thinking that with your patch adding the task_group and cfs_rq > > addresses, that the rb_root address is fairly useless. It was > > pretty much carried forward from the base "runq" command display, > > where it makes some sense. > > > > Dave > > > > > > -- > > Crash-utility mailing list > > Crash-utility(a)redhat.com > > https://www.redhat.com/mailman/listinfo/crash-utility >

12 years, 3 months

1
0
0 / 0

FW: patch for slight modification to runq -g command

by Chen, Anthony

Hi, We're debugging an in-house application that makes use of hard limit extensively. We ran into a lot of timing windows (all our own making) and we use runq -g command a lot. The current runq -g display only the RBROOT pointer. It really is a bit inconvenient to traverse the task_group hierarchy. It would be nice to have the command also display the corresponding task_group, cfs_rq pointer (at a minimum). Since the way we crash the system by messing up the nr_running and h_nr_running, so we also display those two fields at the same time. Here's an example of before and after. CPU 4 CURRENT: PID: 0 TASK: ffff8840668c0380 COMMAND: "swapper" TASK_GROUP RT_RQ: ffff8800027d3820 RT PRIO_ARRAY: ffff8800027d3820 [no tasks queued] TASK_GROUP CFS_RQ: ffff8800027d36e0 CFS RB_ROOT: ffff8800027d3710 GROUP CFS RB_ROOT: ffff883ff69bcc30 <TDAT> GROUP CFS RB_ROOT: ffff884006290e30 <User> GROUP CFS RB_ROOT: ffff88400641c430 <TDWMVP1> GROUP CFS RB_ROOT: ffff88400646b030 <ServDown1:0> GROUP CFS RB_ROOT: ffff884006492630 <ServOrder1:1> GROUP CFS RB_ROOT: ffff883ff058fe30 <TDWMWD57> (THROTTLED) GROUP CFS RB_ROOT: ffff88047889ee30 <S:WD:3d:35fa> [120] PID: 27655 TASK: ffff8805078ce2c0 COMMAND: "actmain" <<< more throttled groups removed >>> CPU 4 CURRENT: PID: 0 CFS: ffff8800027d36e0 TASK: ffff8840668c0380 COMMAND: "swapper" TASK_GROUP RT_RQ: ffff8800027d3820 RT PRIO_ARRAY: ffff8800027d3820 [no tasks queued] TASK_GROUP CFS_RQ: ffff8800027d36e0 CFS RB_ROOT: ffff8800027d3710 GROUP: ffff88405394d000 CFS: ffff883ff69bcc00 RB_ROOT: ffff883ff69bcc30 <TDAT> (0 0) GROUP: ffff88405906c400 CFS: ffff884006290e00 RB_ROOT: ffff884006290e30 <User> (0 0) GROUP: ffff884055081000 CFS: ffff88400641c400 RB_ROOT: ffff88400641c430 <TDWMVP1> (0 0) GROUP: ffff884055081c00 CFS: ffff88400646b000 RB_ROOT: ffff88400646b030 <ServDown1:0> (0 0) GROUP: ffff8840580fd400 CFS: ffff884006492600 RB_ROOT: ffff884006492630 <ServOrder1:1> (0 0) GROUP: ffff884058f58c00 CFS: ffff883ff058fe00 RB_ROOT: ffff883ff058fe30 <TDWMWD57> (7 9) (THROTTLED) GROUP: ffff8808fb976000 CFS: ffff88047889ee00 RB_ROOT: ffff88047889ee30 <S:WD:3d:35fa> (1 1) [120] PID: 27655 TASK: ffff8805078ce2c0 COMMAND: "actmain" <<< more throttled groups removed >>> I have attached the patch we use to display additional information. Could you please take a look at my proposal to see if it is possible that you include this kind of display format. Thanks, Anthony

12 years, 3 months

2
13
0 / 0

Kernel pages reported as PAGE_EXCLUDED

by Stefan Hajnoczi

Hi, I'm analyzing a dump file ("KDUMP" header) with crash 7.0.1 that seems to have some kernel pages missing: crash> struct foo ffff81101e19e000 struct foo struct: page excluded: kernel virtual address: ffff81101e19e000 type: "gdb_readmem_callback" Cannot access memory at address 0xffff81101e19e000 When I set debug to 1 it turns out that even crash's 'mount' command hits excluded pages: mount: page excluded: kernel virtual address: ffff81101d688000 type: "read_string characters" In order to understand this error better I took a look at diskdump.c:read_diskdump(): if (!page_is_dumpable(pfn)) { if ((dd->flags & (ZERO_EXCLUDED|ERROR_EXCLUDED)) == ERROR_EXCLUDED) { if (CRASHDEBUG(8)) fprintf(fp, "read_diskdump: PAGE_EXCLUDED: " "paddr/pfn: %llx/%lx\n", (ulonglong)paddr, pfn); return PAGE_EXCLUDED; } if (CRASHDEBUG(8)) fprintf(fp, "read_diskdump: zero-fill: " "paddr/pfn: %llx/%lx\n", (ulonglong)paddr, pfn); memset(bufptr, 0, cnt); return cnt; } Does this mean these kernel pages are *not* zero? Why would kernel pages containing structs (i.e. not page cache or userspace pages) be excluded from the dump file? (This may seem like a weird question but I did not generate the dump file myself so I can't easily try recreating it with different options.) Thanks, Stefan

12 years, 3 months

2
4
0 / 0

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Crash-utility November 2013