kmem: WARNING: cannot find mem_map page for address
by Bruce Korb
Hi Dave, et al.,
Having hacked my way around the self-sabotage on the temp file:
> #define mWARN( _m, _a...) do { \
> FILE * _sv_fp = fp; \
> fp = stdout; \
> error(WARNING, _m, ##_a); \
> fp = _sv_fp; \
> } while (0)
That also removed the mysterious problem of having a duplicate error
message show up on the console.
My next problem is that I seem to be getting inconsistencies between
the data I can print and the data I can find mapping information about:
> crash> gdb set $tp = (struct cfs_trace_page *)0xffff8807fb590740
> crash> p $tp->page
> $7 = (cfs_page_t *) 0xffffea001bb1d1e8
> crash> p *$tp->page
> $8 = {
> flags = 144115188075855872,
> _count = {
> counter = 1
> },
> [...]
> lru = {
> next = 0xdead000000100100,
> prev = 0xdead000000200200
> }
> }
> crash> kmem 0xffffea001bb1d1e8
> kmem: WARNING: cannot find mem_map page for address: ffffea001bb1d1e8
> 879b1d1e8: kernel virtual address not found in mem map
So I can print out the page_t structure (renamed as cfs_page_t in Lustre)
that $tp->page points to, at address 0xffffea001bb1d1e8, but when I try to
get kmem information about that address, it cannot find the page. What am I missing?
Thanks for hints/pointers! Regards, Bruce
11 years, 11 months
Getting access to function parameters
by Ahmed Al-Mehdi
Hello,
I have a question about trying to decipher the values of parameters passed
to a function in "crash". I understand "bt -f" and "bt -F" print the
stack data, but I am having a hard time deciphering the stack to get
access to the values of parameters passed to a function. I understand the
compiler could have optimized the parameters into registers. If so, is
there a compiler option to turn it off? If not, is my only option to
browse the object file to see what registers are used? Are there any
extensions (experimental or hack) that I can add to crash to display
function parameter values?
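For reference, x86_64 Linux follows the System V AMD64 calling convention, in which the first six integer or pointer arguments are passed in RDI, RSI, RDX, RCX, R8 and R9, and any further ones go on the stack. A minimal sketch of that mapping follows; the helper name sysv_amd64_int_arg_reg() is made up for illustration and is not part of crash:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a crash API): return the name of the register
 * that carries the index-th (0-based) integer or pointer argument under
 * the System V AMD64 calling convention, or NULL when that argument is
 * passed on the stack instead.
 */
static const char *sysv_amd64_int_arg_reg(size_t index)
{
    static const char *const regs[] = {
        "RDI", "RSI", "RDX", "RCX", "R8", "R9"
    };

    if (index < sizeof(regs) / sizeof(regs[0]))
        return regs[index];
    return NULL;
}
```

These registers are not preserved across calls, so the values shown in an exception frame taken later in the call chain are not necessarily the original arguments.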
In the following crash, I am trying to understand the value of the function
parameters - e, buf, len. Any help or pointers would be much appreciated.
C code:
int
doread(EB *e, uchar *buf, int len)
{
return queueread(e->rq, buf, len);
}
From crash:
crash> bt
PID: 2725 TASK: ffff880353c17500 CPU: 1 COMMAND: "bash"
#0 [ffff88036276d540] machine_kexec at ffffffff8103281b
#1 [ffff88036276d5a0] crash_kexec at ffffffff810ba662
#2 [ffff88036276d670] oops_end at ffffffff81501290
#3 [ffff88036276d6a0] no_context at ffffffff81043bab
#4 [ffff88036276d6f0] __bad_area_nosemaphore at ffffffff81043e35
#5 [ffff88036276d740] bad_area at ffffffff81043f5e
#6 [ffff88036276d770] __do_page_fault at ffffffff81044710
#7 [ffff88036276d890] do_page_fault at ffffffff8150326e
#8 [ffff88036276d8c0] page_fault at ffffffff81500625
[exception RIP: queueread+32]
RIP: ffffffffa03e4b70 RSP: ffff88036276d978 RFLAGS: 00010286
RAX: 00000000000005ae RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000001000 RSI: ffff8803613c0020 RDI: 0000000000000000
RBP: ffff88036276d9a8 R8: 0000000000000d44 R9: 0000000050c91762
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8803613c0020
R13: ffff880341780290 R14: 00000000000237f8 R15: ffff880341780020
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff88036276d9b0] elread at ffffffffa03ecd25 [ethdrv]
#10 [ffff88036276d9c0] elechosrv at ffffffffa03eef4d [ethdrv]
#11 [ffff88036276da00] edwritectl at ffffffffa03dff0e [ethdrv]
#12 [ffff88036276de40] writectl at ffffffffa03f028b [ethdrv]
#13 [ffff88036276de60] proc_file_write at ffffffff811e6e44
#14 [ffff88036276dea0] proc_reg_write at ffffffff811e0abe
#15 [ffff88036276def0] vfs_write at ffffffff8117b068
#16 [ffff88036276df30] sys_write at ffffffff8117ba81
#17 [ffff88036276df80] system_call_fastpath at ffffffff8100b0f2
RIP: 0000003a29ada3c0 RSP: 00007fffe92f1a60 RFLAGS: 00010202
RAX: 0000000000000001 RBX: ffffffff8100b0f2 RCX: 0000000000000065
RDX: 000000000000000a RSI: 00007fab2c281000 RDI: 0000000000000001
RBP: 00007fab2c281000 R8: 000000000000000a R9: 00007fab2c272700
R10: 00000000fffffff7 R11: 0000000000000246 R12: 000000000000000a
R13: 0000003a29d8c780 R14: 000000000000000a R15: 0000000000e75130
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash> bt -f
..............
#8 [ffff88036276d8c0] page_fault at ffffffff81500625
[exception RIP: queueread+32]
RIP: ffffffffa03e4b70 RSP: ffff88036276d978 RFLAGS: 00010286
RAX: 00000000000005ae RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000001000 RSI: ffff8803613c0020 RDI: 0000000000000000
RBP: ffff88036276d9a8 R8: 0000000000000d44 R9: 0000000050c91762
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8803613c0020
R13: ffff880341780290 R14: 00000000000237f8 R15: ffff880341780020
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
ffff88036276d8c8: ffff880341780020 00000000000237f8
ffff88036276d8d8: ffff880341780290 ffff8803613c0020
ffff88036276d8e8: ffff88036276d9a8 0000000000000000
ffff88036276d8f8: 0000000000000000 0000000000000000
ffff88036276d908: 0000000050c91762 0000000000000d44
ffff88036276d918: 00000000000005ae 0000000000000000
ffff88036276d928: 0000000000001000 ffff8803613c0020
ffff88036276d938: 0000000000000000 ffffffffffffffff
ffff88036276d948: ffffffffa03e4b70 0000000000000010
ffff88036276d958: 0000000000010286 ffff88036276d978
ffff88036276d968: 0000000000000018 ffffffffa03ed062
ffff88036276d978: 000005ae613c01ab ffff880341780290
ffff88036276d988: 00000000000005ae ffff8803613c0020
ffff88036276d998: ffff880341780290 00000000000237f8
ffff88036276d9a8: ffff88036276d9b8 ffffffffa03ecd25
#9 [ffff88036276d9b0] elread at ffffffffa03ecd25 [ethdrv]
ffff88036276d9b8: ffff88036276d9f8 ffffffffa03eef4d
...................
crash> bt -F
#8 [ffff88036276d8c0] page_fault at ffffffff81500625
[exception RIP: queueread+32]
RIP: ffffffffa03e4b70 RSP: ffff88036276d978 RFLAGS: 00010286
RAX: 00000000000005ae RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000001000 RSI: ffff8803613c0020 RDI: 0000000000000000
RBP: ffff88036276d9a8 R8: 0000000000000d44 R9: 0000000050c91762
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8803613c0020
R13: ffff880341780290 R14: 00000000000237f8 R15: ffff880341780020
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
ffff88036276d8c8: [size-131072] 00000000000237f8
ffff88036276d8d8: [size-131072] [size-8192]
ffff88036276d8e8: ffff88036276d9a8 0000000000000000
ffff88036276d8f8: 0000000000000000 0000000000000000
ffff88036276d908: 0000000050c91762 0000000000000d44
ffff88036276d918: 00000000000005ae 0000000000000000
ffff88036276d928: 0000000000001000 [size-8192]
ffff88036276d938: 0000000000000000 ffffffffffffffff
ffff88036276d948: queueread+32 0000000000000010
ffff88036276d958: 0000000000010286 ffff88036276d978
ffff88036276d968: 0000000000000018 elwrite+98
ffff88036276d978: 000005ae613c01ab [size-131072]
ffff88036276d988: 00000000000005ae [size-8192]
ffff88036276d998: [size-131072] 00000000000237f8
ffff88036276d9a8: ffff88036276d9b8 elread+21
#9 [ffff88036276d9b0] elread at ffffffffa03ecd25 [ethdrv]
ffff88036276d9b8: ffff88036276d9f8 elechosrv+173
........................
Regards,
Ahmed.
11 years, 11 months
__error() mucks with pc->tmpfile
by Bruce Korb
I spent today trying to figure out why some parsing was going awry.
The problem stems from trying to emit a warning message while
reprocessing the pc->tmpfile data. viz.:
open_tmpfile();
hq_open();
count = do_list(&ld);
hq_close();
rewind(pc->tmpfile);
while (fgets(buf, sizeof(buf), pc->tmpfile) != 0) {
if (something_wrong(buf)) {
error(WARNING, "something wrong");
continue;
}
... etc.
After the error() invocation, the data have been scribbled on because
of this code:
if ((fp != stdout) && (fp != pc->stdpipe)) {
fprintf(fp, "%s%s%s %s", new_line ? "\n" : "",
type == WARNING ? "WARNING" :
type == NOTE ? "NOTE" :
type == CONT ? spacebuf : pc->curcmd,
type == CONT ? " " : ":",
buf);
fflush(fp);
}
"fp" being a global variable that is set to pc->tmpfile.
I suppose you can say, "works as expected", but it surely isn't as
I would expect. How about a nice "standard_error" wrapper that
hides and restores that "fp" global variable thingy while invoking
__error()? I can do it myself, but I really do not think it
advisable for crash client code to fiddle with what seems to me
to be internal state.
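Something along these lines is what I have in mind. This is only a minimal sketch with stand-ins: emit_warning() and warn_to_stdout() are made-up names, and the static "fp" below merely mimics crash's global; none of it is crash's actual code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for crash's global output stream (may point at a temp file). */
static FILE *fp;

/* Hypothetical stand-in for __error(): writes the message to whatever
 * stream fp currently designates. */
static int emit_warning(const char *fmt, va_list ap)
{
    int n = fprintf(fp, "WARNING: ");

    n += vfprintf(fp, fmt, ap);
    n += fprintf(fp, "\n");
    fflush(fp);
    return n;
}

/* Wrapper: hide fp behind stdout for the duration of the call, then
 * restore it, so a temp file being re-read is never written to. */
static int warn_to_stdout(const char *fmt, ...)
{
    FILE *saved_fp = fp;
    va_list ap;
    int n;

    fp = stdout;
    va_start(ap, fmt);
    n = emit_warning(fmt, ap);
    va_end(ap);
    fp = saved_fp;
    return n;
}
```

With a wrapper like this, the fgets() loop above could call warn_to_stdout() in place of error(WARNING, ...) and carry on reading the temp file from where it left off.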
Thanks! - Bruce
11 years, 11 months
kernel 3.6.10
by Pablo Sole
Hi,
I tried the latest crash tool 6.1.1 (built from sources) on a kernel
3.6.10 dump and it failed while reading the module information.
Error says:
crash: invalid structure member offset: module_core_size
FILE: kernel.c LINE: 2974 FUNCTION: module_init()
[../../crash-6.1.1/crash] error trace: 809afc6 => 810ecab => 8152543 =>
80929f9
80929f9: OFFSET_verify.part.27+71
8152543: OFFSET_verify+51
810ecab: module_init+939
809afc6: main_loop+214
Even if I disable the module information gathering (--no_modules),
it fails later on while trying to read the idle tasks:
crash: cannot determine idle task addresses from init_tasks[] or runqueues[]
crash: cannot resolve "init_task_union"
Is this a known issue? Any plans to support kernel 3.6.10?
As an FYI, I can use --minimal to enter the system and retrieve the dmesg
buffer at the crash point.
Thanks,
Pablo Sole.
11 years, 11 months
Error in unload_extension
by Karlsson, Jan
I was running valgrind to check some of my own code and then stumbled upon the following:
In the last loop of unload_extension() (extensions.c):
for (ext = extension_table, found = FALSE; ext; ext = ext->next) {
Inside the loop, free(ext) is performed and then ext is accessed again in the loop control statement (ext = ext->next), which reads freed memory. Possible fixes:
- either test for "!found && ext" in the loop control, or
- break out of the loop once the free statement has executed.
Note that there is also a risk that the loop continues (with current code), even if found becomes true, as ext->next is not changed.
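A sketch of the second option, using a simplified, hypothetical node type (struct ext_node, keyed by an integer id) rather than the real extension table entry: unlink the node, free it, and leave the loop immediately so the freed pointer is never read again.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an extension table entry. */
struct ext_node {
    int id;
    struct ext_node *next;
};

/* Prepend a node; returns the new head (or the old one if malloc fails). */
static struct ext_node *push_front(struct ext_node *head, int id)
{
    struct ext_node *n = malloc(sizeof(*n));

    if (n == NULL)
        return head;
    n->id = id;
    n->next = head;
    return n;
}

/* Remove the first node whose id matches. Returns 1 if one was found. */
static int unload_by_id(struct ext_node **head, int id)
{
    struct ext_node **link = head;
    struct ext_node *ext;

    while ((ext = *link) != NULL) {
        if (ext->id == id) {
            *link = ext->next;  /* unlink before freeing */
            free(ext);
            return 1;           /* stop here: ext is no longer valid */
        }
        link = &ext->next;
    }
    return 0;
}
```

Returning (or breaking) right after free() also takes care of the second concern, since the loop cannot continue once the entry has been found.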
Jan
Jan Karlsson
Senior Software Engineer
MIB
Sony Mobile Communications
Tel: +46703062174
http://sonymobile.com/
11 years, 11 months
[PATCH] add a new command: cgget
by Zhang Xiaohe
Hello Dave,
I made a new command, 'cgget', to help investigate the parameters
of cgroups.
Here are two examples.
1. Print parameters of some cgroup
crash> cgget -g cpu /
/:
cpu.rt_period_us: 1000000
cpu.rt_runtime_us: 950000
cpu.stat:
nr_periods: 0
nr_throttled: 0
throttled_time: 0
cpu.cfs_period_us: 0
cpu.cfs_quota_us: 0
cpu.shares: 1024
2. Print parameters of all cgroups
crash> cgget -a /
/:
cpuset.cpu_exclusive: 1
cpuset.mem_exclusive: 1
cpuset.mem_hardwall: 0
cpuset.memory_migrate: 0
cpuset.sched_load_balance: 1
cpuset.memory_spread_page: 0
cpuset.memory_spread_slab: 0
cpuset.memory_pressure_enabled: 0
cpuset.memory_pressure: 0
cpuset.sched_relax_domain_level: -1
cpuset.mems: 0
cpuset.cpus: 0-3
...
blkio.io_merged:
8:0 Read 10925
8:0 Write 31704
8:0 Sync 25413
8:0 Async 17216
8:0 Total 42629
Total 42629
blkio.io_queued:
8:0 Read 0
8:0 Write 0
8:0 Sync 0
8:0 Async 0
8:0 Total 0
Total 0
blkio.reset_stats:
To build the module from the top-level crash-<version> directory, enter:
$ cp <path-to>/cgget.c extensions
$ make extensions
Please refer to the attachment for more information. I look forward
to your comments and advice.
--
Zhang Xiaohe
Regards
--------------------------------------------------
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, 210012, China
TEL: +86+25-86630566-8552
FAX: +86+25-83317685
MAIL: zhangxh(a)cn.fujitsu.com
--------------------------------------------------
11 years, 11 months
[PATCH] Rename eppic_typeislocal in the library
by Petr Tesarik
Hi Luc,
please consider the attached patch for inclusion in eppic.
The actual name should match the name from eppic_api.h.
The crash extension worked previously only because it relied on an
implicit declaration of the wrong name, which is ugly.
Regards,
Petr Tesarik
11 years, 11 months