[PATCH 0/7] Upgrade the remote access to a xen domain.
by Don Slutz
From: Don Slutz <dslutz(a)verizon.com>
Currently crash 4.0 and later using the code:
http://lists.xen.org/archives/html/xen-devel/2013-11/msg02569.html
There is some very limited documention on crashes remote ptotocol in:
http://lists.xen.org/archives/html/xen-devel/2013-11/msg02352.html
and some fixes in the attachment 0001-xen-crashd-Connect-crash-with-domain.patch
from the 1st link above.
How ever there are issues with the current code:
1) The remote protocol has a minor issue (which I have not been able to
happen) based on the fact TCP/IP is stream based protocol. This means
a RECV or a SEND may not do the fully requested size of data. In fact
the current code assumes that the amount of data that a SEND is called
with will all be read with a single RECV.
2) The most common mismatch between crash and older kernels is
phys_base. In the remote case, see if the remote server supports
vitrual memory access, and if so, see if phys_base can be retreived.
3) crash assumes that the remote system is active and can not return
currect IP and SP to do a better back trace.
4) enable a non-active mode of remote access.
This patch set attempts to fix these:
The fix for #1 I have called NIL mode (patch 1).
The fix for #2 uses get_remote_phys_base (patch 3).
The fix for #3 uses get_remote_regs (patch 5).
The fix for #4 uses special file /dev/xenmem (patch 7).
It also REMOTE_NIL() to indicate "remote paused system".
Don Slutz (7):
Add NIL mode to remote.
remote_proc_version: NULL terminate passed buffer on error.
Add get_remote_phys_base.
Add remote_vtop.
bt: get remote live registers if possible.
Add get_remote_cr3
Add support for non-live remote.
defs.h | 8 +-
kernel.c | 25 +++-
memory.c | 25 +++-
remote.c | 462 +++++++++++++++++++++++++++++++++++++++++++++++++++++----------
x86_64.c | 24 ++++
5 files changed, 465 insertions(+), 79 deletions(-)
--
1.8.4
11 years
crash crashes reading qemu dump files
by Andi Kleen
So I wanted to use crash to look at a dump from a qemu KVM guest
I write a dump file from the qemu console with
dump-guest-core foo
Then I do on the host
crash vmlinux foo
...
<segmentation violation in gdb>
In gdb I see it seems to jump to 0
(gdb) p $pc
$1 = (void (*)()) 0x0
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x000000000073707a in ui_file_write (length_buf=1,
buf=0x7fffffffbaaf "N", file=<optimized out>) at ui-file.c:224
#2 tee_file_write (file=<optimized out>, buf=0x7fffffffbaaf "N",
length_buf=1) at ui-file.c:758
#3 0x00000000007337d6 in fputc_unfiltered (stream=0x126de20,
c=<optimized out>) at utils.c:2209
#4 fputs_maybe_filtered (linebuffer=linebuffer@entry=0x1534a00 "No
symbol \"task_struct\" in current context.\n",
stream=stream@entry=0x126de20, filter=1) at utils.c:2126
#5 0x00000000007339b0 in vfprintf_maybe_filtered (stream=0x126de20,
format=format@entry=0x9db4b5 "%s\n",
args=args@entry=0x7fffffffbaf8, filter=1, filter=1) at utils.c:2332
#6 0x0000000000734a4c in vfprintf_filtered (args=0x7fffffffbaf8,
format=0x9db4b5 "%s\n", stream=<optimized out>) at utils.c:2340
#7 fprintf_filtered (stream=<optimized out>,
format=format@entry=0x9db4b5 "%s\n") at utils.c:2392
#8 0x0000000000676f67 in throw_exception (exception=...) at
exceptions.c:234
#9 0x0000000000677219 in throw_it (reason=reason@entry=RETURN_ERROR,
error=error@entry=GENERIC_ERROR, fmt=<optimized out>,
ap=ap@entry=0x7fffffffbc48) at exceptions.c:434
#10 0x0000000000677436 in throw_verror (error=error@entry=GENERIC_ERROR,
fmt=<optimized out>, ap=ap@entry=0x7fffffffbc48)
at exceptions.c:440
#11 0x00000000007323d4 in error (string=<optimized out>) at utils.c:717
#12 0x00000000005ea1a3 in c_parse_internal () at c-exp.y:862
#13 0x00000000005ea4f9 in c_parse () at c-exp.y:3064
#14 0x00000000006a2fe1 in parse_exp_in_context
(stringptr=stringptr@entry=0x7fffffffda18, pc=pc@entry=0,
block=block@entry=0x0,
comma=comma@entry=0, out_subexp=out_subexp@entry=0x0,
void_context_p=0) at parse.c:1234
#15 0x00000000006a31e5 in parse_exp_1
(stringptr=stringptr@entry=0x7fffffffda68, pc=pc@entry=0,
block=block@entry=0x0,
comma=comma@entry=0) at parse.c:1136
#16 0x00000000006a3239 in parse_expression (string=0x8739b0
"task_struct") at parse.c:1279
#17 0x000000000064b270 in gdb_get_datatype (req=0xe61620 <shared_bufs>)
at symtab.c:5330
#18 gdb_command_funnel (req=req@entry=0xe61620 <shared_bufs>) at
symtab.c:5208
#19 0x00000000004ddc45 in gdb_interface (req=req@entry=0xe61620
<shared_bufs>) at gdb_interface.c:397
#20 0x00000000004de102 in gdb_session_init () at gdb_interface.c:244
#21 0x0000000000466c14 in main_loop () at main.c:637
#22 0x0000000000678e83 in captured_command_loop (data=data@entry=0x0) at
main.c:258
#23 0x000000000067772a in catch_errors (func=func@entry=0x678e70
<captured_command_loop>, func_args=func_args@entry=0x0,
errstring=errstring@entry=0x8b201f "", mask=mask@entry=6) at
exceptions.c:557
#24 0x0000000000679e16 in captured_main (data=data@entry=0x7fffffffde00)
at main.c:1064
#25 0x000000000067772a in catch_errors (func=func@entry=0x679150
<captured_main>, func_args=func_args@entry=0x7fffffffde00,
errstring=errstring@entry=0x8b201f "", mask=mask@entry=6) at
exceptions.c:557
#26 0x000000000067a177 in gdb_main (args=0x7fffffffde00) at main.c:1079
#27 gdb_main_entry (argc=<optimized out>,
argv=argv@entry=0x7fffffffdf58) at main.c:1099
#28 0x00000000004dce84 in gdb_main_loop (argc=<optimized out>,
argc@entry=3, argv=argv@entry=0x7fffffffdf58) at gdb_interface.c:76
#29 0x000000000046549f in main (argc=3, argv=0x7fffffffdf58) at
main.c:613
Environment:
Linux 3.11, current mainline linux (same result)
FC19, qemu-system-x86-1.4.2-12.fc19.x86_64
Known problem?
-Andi
--
ak(a)linux.intel.com -- Speaking for myself only
11 years
[PATCH v2 0/7] Upgrade the remote access to a xen domain.
by Don Slutz
From: Don Slutz <dslutz(a)verizon.com>
Currently crash 4.0 and later using the code:
http://lists.xen.org/archives/html/xen-devel/2013-11/msg02569.html
There is some very limited documention on crashes remote ptotocol in:
http://lists.xen.org/archives/html/xen-devel/2013-11/msg02352.html
and some fixes in the attachment 0001-xen-crashd-Connect-crash-with-domain.patch
from the 1st link above.
How ever there are issues with the current code:
1) The remote protocol has a minor issue (which I have not been able to
happen) based on the fact TCP/IP is stream based protocol. This means
a RECV or a SEND may not do the fully requested size of data. In fact
the current code assumes that the amount of data that a SEND is called
with will all be read with a single RECV.
2) The most common mismatch between crash and older kernels is
phys_base. In the remote case, see if the remote server supports
vitrual memory access, and if so, see if phys_base can be retreived.
3) crash assumes that the remote system is active and can not return
currect IP and SP to do a better back trace.
4) enable a non-active mode of remote access.
This patch set attempts to fix these:
The fix for #1 I have called NIL mode (patch 1).
The fix for #2 uses get_remote_phys_base (patch 3).
The fix for #3 uses get_remote_regs (patch 5).
The fix for #4 uses special fake file /dev/xenmem (patch 7).
It also REMOTE_PAUSED() to indicate "remote paused system".
changes v1 to v2:
Daniel Kiper:
remove all camelCase.
remove Emacs directives.
Dave Anderson:
rework flags, and time of changes.
change get_remote_regs() to return non-zero when it successfully gets the registers.
fix "make warn" warnings
remove Emacs directives.
Don Slutz (7):
Add NIL mode to remote.
remote_proc_version: NULL terminate passed buffer on error.
Add get_remote_phys_base.
Add remote_vtop.
bt: get remote live registers if possible.
Add get_remote_cr3
Add support for non-live remote.
defs.h | 8 +-
kernel.c | 13 +-
memory.c | 13 +-
remote.c | 464 +++++++++++++++++++++++++++++++++++++++++++++++++++++----------
x86_64.c | 12 ++
5 files changed, 430 insertions(+), 80 deletions(-)
--
1.8.4
11 years
Re: [Crash-utility] patch for slight modification to runq -g command
by Dave Anderson
----- Original Message -----
> Hi Dave,
>
> The rb_root pointer is pretty useless in our debugging. I thought about
> removing it but decided not to do it thinking other people may need it.
> I can get you another patch that removes the rb_root display from the
> "group" line. Please let me know what you think.
OK good, that's what I figured. And since the cfs_rq address is being
displayed now, if it's really necessary to see the the rb_root, it can
easily be accessed from there.
No need for another patch -- I've been tinkering with the results of
your changes.
Thanks again,
Dave
>
> Thanks,
> Anthony
>
> > -----Original Message-----
> > From: crash-utility-bounces(a)redhat.com [mailto:crash-utility-
> > bounces(a)redhat.com] On Behalf Of Dave Anderson
> > Sent: Wednesday, November 13, 2013 1:51 PM
> > To: Discussion list for crash utility usage, maintenance and
> > development
> > Subject: Re: [Crash-utility] patch for slight modification to runq -g
> > command
> >
> >
> >
> > ----- Original Message -----
> > >
> > >
> > > ----- Original Message -----
> > > > My mistake, I didn't read your email carefully enough. Here's the
> > > > new
> > > > patch.
> > > >
> > > > Thanks,
> > > > Anthony
> > >
> > > Thanks Anthony -- your changes are queued for crash-7.0.4.
> > >
> > > Dave
> >
> > Anthony,
> >
> > Given your interest in this command, I have a question re: the display.
> >
> > Do you ever have any interest in the rb_root addresses in each group?
> > I'm thinking that with your patch adding the task_group and cfs_rq
> > addresses, that the rb_root address is fairly useless. It was
> > pretty much carried forward from the base "runq" command display,
> > where it makes some sense.
> >
> > Dave
> >
> >
> > --
> > Crash-utility mailing list
> > Crash-utility(a)redhat.com
> > https://www.redhat.com/mailman/listinfo/crash-utility
>
11 years
FW: patch for slight modification to runq -g command
by Chen, Anthony
Hi,
We're debugging an in-house application that makes use of hard limit extensively. We ran into a lot of timing windows (all our own making) and we use runq -g command a lot. The current runq -g display only the RBROOT pointer. It really is a bit inconvenient to traverse the task_group hierarchy. It would be nice to have the command also display the corresponding task_group, cfs_rq pointer (at a minimum). Since the way we crash the system by messing up the nr_running and h_nr_running, so we also display those two fields at the same time. Here's an example of before and after.
CPU 4
CURRENT: PID: 0 TASK: ffff8840668c0380 COMMAND: "swapper"
TASK_GROUP RT_RQ: ffff8800027d3820
RT PRIO_ARRAY: ffff8800027d3820
[no tasks queued]
TASK_GROUP CFS_RQ: ffff8800027d36e0
CFS RB_ROOT: ffff8800027d3710
GROUP CFS RB_ROOT: ffff883ff69bcc30 <TDAT>
GROUP CFS RB_ROOT: ffff884006290e30 <User>
GROUP CFS RB_ROOT: ffff88400641c430 <TDWMVP1>
GROUP CFS RB_ROOT: ffff88400646b030 <ServDown1:0>
GROUP CFS RB_ROOT: ffff884006492630 <ServOrder1:1>
GROUP CFS RB_ROOT: ffff883ff058fe30 <TDWMWD57> (THROTTLED)
GROUP CFS RB_ROOT: ffff88047889ee30 <S:WD:3d:35fa>
[120] PID: 27655 TASK: ffff8805078ce2c0 COMMAND: "actmain"
<<< more throttled groups removed >>>
CPU 4
CURRENT: PID: 0 CFS: ffff8800027d36e0 TASK: ffff8840668c0380 COMMAND: "swapper"
TASK_GROUP RT_RQ: ffff8800027d3820
RT PRIO_ARRAY: ffff8800027d3820
[no tasks queued]
TASK_GROUP CFS_RQ: ffff8800027d36e0
CFS RB_ROOT: ffff8800027d3710
GROUP: ffff88405394d000 CFS: ffff883ff69bcc00 RB_ROOT: ffff883ff69bcc30 <TDAT> (0 0)
GROUP: ffff88405906c400 CFS: ffff884006290e00 RB_ROOT: ffff884006290e30 <User> (0 0)
GROUP: ffff884055081000 CFS: ffff88400641c400 RB_ROOT: ffff88400641c430 <TDWMVP1> (0 0)
GROUP: ffff884055081c00 CFS: ffff88400646b000 RB_ROOT: ffff88400646b030 <ServDown1:0> (0 0)
GROUP: ffff8840580fd400 CFS: ffff884006492600 RB_ROOT: ffff884006492630 <ServOrder1:1> (0 0)
GROUP: ffff884058f58c00 CFS: ffff883ff058fe00 RB_ROOT: ffff883ff058fe30 <TDWMWD57> (7 9) (THROTTLED)
GROUP: ffff8808fb976000 CFS: ffff88047889ee00 RB_ROOT: ffff88047889ee30 <S:WD:3d:35fa> (1 1)
[120] PID: 27655 TASK: ffff8805078ce2c0 COMMAND: "actmain"
<<< more throttled groups removed >>>
I have attached the patch we use to display additional information. Could you please take a look at my proposal to see if it is possible that you include this kind of display format.
Thanks,
Anthony
11 years
Kernel pages reported as PAGE_EXCLUDED
by Stefan Hajnoczi
Hi,
I'm analyzing a dump file ("KDUMP" header) with crash 7.0.1 that seems
to have some kernel pages missing:
crash> struct foo ffff81101e19e000
struct foo struct: page excluded: kernel virtual address: ffff81101e19e000 type: "gdb_readmem_callback"
Cannot access memory at address 0xffff81101e19e000
When I set debug to 1 it turns out that even crash's 'mount' command
hits excluded pages:
mount: page excluded: kernel virtual address: ffff81101d688000 type: "read_string characters"
In order to understand this error better I took a look at
diskdump.c:read_diskdump():
if (!page_is_dumpable(pfn)) {
if ((dd->flags & (ZERO_EXCLUDED|ERROR_EXCLUDED)) ==
ERROR_EXCLUDED) {
if (CRASHDEBUG(8))
fprintf(fp, "read_diskdump: PAGE_EXCLUDED: "
"paddr/pfn: %llx/%lx\n",
(ulonglong)paddr, pfn);
return PAGE_EXCLUDED;
}
if (CRASHDEBUG(8))
fprintf(fp, "read_diskdump: zero-fill: "
"paddr/pfn: %llx/%lx\n",
(ulonglong)paddr, pfn);
memset(bufptr, 0, cnt);
return cnt;
}
Does this mean these kernel pages are *not* zero?
Why would kernel pages containing structs (i.e. not page cache or
userspace pages) be excluded from the dump file?
(This may seem like a weird question but I did not generate the dump
file myself so I can't easily try recreating it with different options.)
Thanks,
Stefan
11 years