December 2012 - Crash-utility - Crash Utility List Archives

kmem: WARNING: cannot find mem_map page for address

by Bruce Korb

Hi Dave, et al.,

Having hacked my way around the self-sabotage on the temp file:

> #define mWARN( _m, _a...) do {          \
>         FILE * _sv_fp = fp;             \
>         fp = stdout;                    \
>         error(WARNING, _m, ##_a);       \
>         fp = _sv_fp;                    \
>     } while (0)

That also removed the mysterious problem of having a duplicate error message show up on the console. My next problem is that I seem to be getting inconsistencies between the data I can print and the data I can find mapping information about:

> crash> gdb set $tp = (struct cfs_trace_page *)0xffff8807fb590740
> crash> p $tp->page
> $7 = (cfs_page_t *) 0xffffea001bb1d1e8
> crash> p *$tp->page
> $8 = {
>   flags = 144115188075855872,
>   _count = {
>     counter = 1
>   },
>   [...]
>   lru = {
>     next = 0xdead000000100100,
>     prev = 0xdead000000200200
>   }
> }
> crash> kmem 0xffffea001bb1d1e8
> kmem: WARNING: cannot find mem_map page for address: ffffea001bb1d1e8
> 879b1d1e8: kernel virtual address not found in mem map

So I can print out the page_t structure (renamed as cfs_page_t in Lustre) that the cfs_trace_page at address 0xffff8807fb590740 points to, namely the one at 0xffffea001bb1d1e8, but when I try to get kmem information about it, it cannot find the page. What am I missing?

Thanks for hints/pointers!

Regards, Bruce
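For readers following the addresses above, here is a minimal sketch of the arithmetic that relates a struct page address to a page frame number. It rests on assumptions the post does not state: an x86_64 kernel using sparsemem-vmemmap with the vmemmap region based at 0xffffea0000000000, 4 KiB pages, and a 56-byte struct page. The function and macro names are hypothetical and are not part of crash.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not stated in the thread): x86_64 with sparsemem-vmemmap,
 * the fixed pre-KASLR vmemmap base, and a 56-byte struct page. */
#define VMEMMAP_BASE     0xffffea0000000000ULL
#define STRUCT_PAGE_SIZE 56ULL
#define PAGE_SHIFT_4K    12

/* Convert a struct page address inside vmemmap to its page frame number. */
uint64_t page_to_pfn_sketch(uint64_t page_addr)
{
    assert(page_addr >= VMEMMAP_BASE);
    return (page_addr - VMEMMAP_BASE) / STRUCT_PAGE_SIZE;
}

/* Physical address of the first byte of the given page frame. */
uint64_t pfn_to_phys_sketch(uint64_t pfn)
{
    return pfn << PAGE_SHIFT_4K;
}
```

Under those assumptions, page_to_pfn_sketch(0xffffea001bb1d1e8) gives page frame 0x7e9a9b, which corresponds to physical address 0x7e9a9b000. Whether that frame is covered by the memory map crash built for this dump is a separate question that the arithmetic alone cannot answer.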

12 years, 7 months


Getting access to function parameters

by Ahmed Al-Mehdi

Hello,

I have a question about trying to decipher the values of parameters passed to a function in "crash". I understand "bt -f" and "bt -F" print the stack data, but I am having a hard time deciphering the stack to get access to the values of parameters passed to a function. I understand the compiler could have optimized the parameters into registers. If so, is there a compiler option to turn it off? If not, is my only option to browse the object file to see what registers are used? Are there any extensions (experimental or hack) that I can add to crash to display function parameter values?

In the following crash, I am trying to understand the value of the function parameters - e, buf, len. Any help or pointers would be much appreciated.

C code:

int doread(EB *e, uchar *buf, int len)
{
    return queueread(e->rq, buf, len);
}

From crash:

crash> bt
PID: 2725   TASK: ffff880353c17500  CPU: 1   COMMAND: "bash"
 #0 [ffff88036276d540] machine_kexec at ffffffff8103281b
 #1 [ffff88036276d5a0] crash_kexec at ffffffff810ba662
 #2 [ffff88036276d670] oops_end at ffffffff81501290
 #3 [ffff88036276d6a0] no_context at ffffffff81043bab
 #4 [ffff88036276d6f0] __bad_area_nosemaphore at ffffffff81043e35
 #5 [ffff88036276d740] bad_area at ffffffff81043f5e
 #6 [ffff88036276d770] __do_page_fault at ffffffff81044710
 #7 [ffff88036276d890] do_page_fault at ffffffff8150326e
 #8 [ffff88036276d8c0] page_fault at ffffffff81500625
    [exception RIP: queueread+32]
    RIP: ffffffffa03e4b70  RSP: ffff88036276d978  RFLAGS: 00010286
    RAX: 00000000000005ae  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000001000  RSI: ffff8803613c0020  RDI: 0000000000000000
    RBP: ffff88036276d9a8   R8: 0000000000000d44   R9: 0000000050c91762
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff8803613c0020
    R13: ffff880341780290  R14: 00000000000237f8  R15: ffff880341780020
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff88036276d9b0] elread at ffffffffa03ecd25 [ethdrv]
#10 [ffff88036276d9c0] elechosrv at ffffffffa03eef4d [ethdrv]
#11 [ffff88036276da00] edwritectl at ffffffffa03dff0e [ethdrv]
#12 [ffff88036276de40] writectl at ffffffffa03f028b [ethdrv]
#13 [ffff88036276de60] proc_file_write at ffffffff811e6e44
#14 [ffff88036276dea0] proc_reg_write at ffffffff811e0abe
#15 [ffff88036276def0] vfs_write at ffffffff8117b068
#16 [ffff88036276df30] sys_write at ffffffff8117ba81
#17 [ffff88036276df80] system_call_fastpath at ffffffff8100b0f2
    RIP: 0000003a29ada3c0  RSP: 00007fffe92f1a60  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff8100b0f2  RCX: 0000000000000065
    RDX: 000000000000000a  RSI: 00007fab2c281000  RDI: 0000000000000001
    RBP: 00007fab2c281000   R8: 000000000000000a   R9: 00007fab2c272700
    R10: 00000000fffffff7  R11: 0000000000000246  R12: 000000000000000a
    R13: 0000003a29d8c780  R14: 000000000000000a  R15: 0000000000e75130
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

crash> bt -f
..............
 #8 [ffff88036276d8c0] page_fault at ffffffff81500625
    [exception RIP: queueread+32]
    RIP: ffffffffa03e4b70  RSP: ffff88036276d978  RFLAGS: 00010286
    RAX: 00000000000005ae  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000001000  RSI: ffff8803613c0020  RDI: 0000000000000000
    RBP: ffff88036276d9a8   R8: 0000000000000d44   R9: 0000000050c91762
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff8803613c0020
    R13: ffff880341780290  R14: 00000000000237f8  R15: ffff880341780020
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    ffff88036276d8c8: ffff880341780020 00000000000237f8
    ffff88036276d8d8: ffff880341780290 ffff8803613c0020
    ffff88036276d8e8: ffff88036276d9a8 0000000000000000
    ffff88036276d8f8: 0000000000000000 0000000000000000
    ffff88036276d908: 0000000050c91762 0000000000000d44
    ffff88036276d918: 00000000000005ae 0000000000000000
    ffff88036276d928: 0000000000001000 ffff8803613c0020
    ffff88036276d938: 0000000000000000 ffffffffffffffff
    ffff88036276d948: ffffffffa03e4b70 0000000000000010
    ffff88036276d958: 0000000000010286 ffff88036276d978
    ffff88036276d968: 0000000000000018 ffffffffa03ed062
    ffff88036276d978: 000005ae613c01ab ffff880341780290
    ffff88036276d988: 00000000000005ae ffff8803613c0020
    ffff88036276d998: ffff880341780290 00000000000237f8
    ffff88036276d9a8: ffff88036276d9b8 ffffffffa03ecd25
 #9 [ffff88036276d9b0] elread at ffffffffa03ecd25 [ethdrv]
    ffff88036276d9b8: ffff88036276d9f8 ffffffffa03eef4d
...................

crash> bt -F
 #8 [ffff88036276d8c0] page_fault at ffffffff81500625
    [exception RIP: queueread+32]
    RIP: ffffffffa03e4b70  RSP: ffff88036276d978  RFLAGS: 00010286
    RAX: 00000000000005ae  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000001000  RSI: ffff8803613c0020  RDI: 0000000000000000
    RBP: ffff88036276d9a8   R8: 0000000000000d44   R9: 0000000050c91762
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff8803613c0020
    R13: ffff880341780290  R14: 00000000000237f8  R15: ffff880341780020
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    ffff88036276d8c8: [size-131072]    00000000000237f8
    ffff88036276d8d8: [size-131072]    [size-8192]
    ffff88036276d8e8: ffff88036276d9a8 0000000000000000
    ffff88036276d8f8: 0000000000000000 0000000000000000
    ffff88036276d908: 0000000050c91762 0000000000000d44
    ffff88036276d918: 00000000000005ae 0000000000000000
    ffff88036276d928: 0000000000001000 [size-8192]
    ffff88036276d938: 0000000000000000 ffffffffffffffff
    ffff88036276d948: queueread+32     0000000000000010
    ffff88036276d958: 0000000000010286 ffff88036276d978
    ffff88036276d968: 0000000000000018 elwrite+98
    ffff88036276d978: 000005ae613c01ab [size-131072]
    ffff88036276d988: 00000000000005ae [size-8192]
    ffff88036276d998: [size-131072]    00000000000237f8
    ffff88036276d9a8: ffff88036276d9b8 elread+21
 #9 [ffff88036276d9b0] elread at ffffffffa03ecd25 [ethdrv]
    ffff88036276d9b8: ffff88036276d9f8 elechosrv+173
........................

Regards,
Ahmed.
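As background to the question about registers, the sketch below encodes the order in which the System V AMD64 calling convention (the one used by x86_64 Linux kernels) assigns integer and pointer arguments to registers at function entry. The helper name sysv_arg_register is hypothetical; it is an illustration and not a crash feature.

```c
#include <stddef.h>

/* Integer/pointer argument registers in System V AMD64 ABI order. */
static const char *const sysv_arg_regs[] = {
    "RDI", "RSI", "RDX", "RCX", "R8", "R9"
};

/* Return the register that carries the n-th (0-based) integer or pointer
 * argument at function entry, or NULL when the argument is passed on the
 * stack instead. */
const char *sysv_arg_register(size_t n)
{
    size_t count = sizeof(sysv_arg_regs) / sizeof(sysv_arg_regs[0]);
    return n < count ? sysv_arg_regs[n] : NULL;
}
```

For queueread(e->rq, buf, len), the three arguments would therefore arrive in RDI, RSI and RDX. The exception frame above shows RDI = 0, RSI = ffff8803613c0020 and RDX = 1000 at queueread+32, but those values reflect the arguments only if the first 32 bytes of queueread did not overwrite the registers; disassembling the function with crash's "dis" command is one way to check.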

12 years, 7 months


__error() mucks with pc->tmpfile

by Bruce Korb

I spent today trying to figure out why some parsing was going awry. The problem stems from trying to emit a warning message while reprocessing the pc->tmpfile data, viz.:

    open_tmpfile();
    hq_open();
    count = do_list(&ld);
    hq_close();
    rewind(pc->tmpfile);
    while (fgets(buf, sizeof(buf), pc->tmpfile) != 0) {
        if (something_wrong(buf)) {
            error(WARNING, "something wrong");
            continue;
        }
        ...

etc. After the error() invocation, the data have been scribbled on because of this code:

    if ((fp != stdout) && (fp != pc->stdpipe)) {
        fprintf(fp, "%s%s%s %s", new_line ? "\n" : "",
            type == WARNING ? "WARNING" :
            type == NOTE ? "NOTE" :
            type == CONT ? spacebuf : pc->curcmd,
            type == CONT ? " " : ":",
            buf);
        fflush(fp);
    }

"fp" being a global variable that is set to pc->tmpfile. I suppose you can say, "works as expected", but it surely isn't as I would expect. How about a nice "standard_error" wrapper that hides and restores that "fp" global variable thingy while invoking __error()? I can do it myself, but I really do not think it advisable for crash client code to fiddle with what seems to me to be internal state.

Thanks! - Bruce
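The wrapper requested above could look roughly like the following sketch. It is not crash's code: the global fp here is a local stand-in for crash's output stream, warn_to_fp is a hypothetical substitute for error(WARNING, ...), and warn_on_stream is an invented name. The point is only the save, redirect, restore pattern around the call.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for crash's global output stream (hypothetical in this sketch). */
FILE *fp;

/* Hypothetical substitute for error(WARNING, ...): writes to the global fp. */
static void warn_to_fp(const char *fmt, va_list ap)
{
    fputs("WARNING: ", fp);
    vfprintf(fp, fmt, ap);
    fputc('\n', fp);
    fflush(fp);
}

/* Emit a warning on `out` and leave the caller's fp exactly as it was,
 * so a temp file being re-read through fp is not written to. */
void warn_on_stream(FILE *out, const char *fmt, ...)
{
    FILE *saved = fp;   /* remember the caller's stream */
    va_list ap;

    fp = out;           /* redirect only for the duration of the call */
    va_start(ap, fmt);
    warn_to_fp(fmt, ap);
    va_end(ap);
    fp = saved;         /* restore before returning */
}
```

A caller that is in the middle of reading pc->tmpfile would call warn_on_stream(stdout, "something wrong") instead of writing through fp directly. Keeping the save and restore inside one function means client code never has to touch the global itself.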

12 years, 7 months


kernel 3.6.10

by Pablo Sole

Hi,

I tried the latest crash tool 6.1.1 (built from sources) on a kernel 3.6.10 dump and it failed while reading the module information. The error says:

crash: invalid structure member offset: module_core_size
       FILE: kernel.c  LINE: 2974  FUNCTION: module_init()

[../../crash-6.1.1/crash] error trace: 809afc6 => 810ecab => 8152543 => 80929f9

  80929f9: OFFSET_verify.part.27+71
  8152543: OFFSET_verify+51
  810ecab: module_init+939
  809afc6: main_loop+214

Even if I disable the module information gathering (--no_modules), it fails later on while trying to read the idle tasks:

crash: cannot determine idle task addresses from init_tasks[] or runqueues[]
crash: cannot resolve "init_task_union"

Is this a known issue? Any plans to support kernel 3.6.10? As an FYI, I can use --minimal to enter the system and retrieve the dmesg buffer at the crash point.

Thanks,
Pablo Sole.

12 years, 7 months


Error in unload_extension

by Karlsson, Jan

I was running valgrind to check some of my own code and then stumbled upon the following:

In function unload_extension (extensions.c), last loop:

    for (ext = extension_table, found = FALSE; ext; ext = ext->next) {

In the loop free(ext) is performed and then ext is accessed again in the loop control statement.

Fix:
- either test for "!found && ext" in loop control or
- break the loop if the free statement has executed.

Note that there is also a risk that the loop continues (with current code), even if found becomes true, as ext->next is not changed.

Jan

Jan Karlsson
Senior Software Engineer
MIB
Sony Mobile Communications
Tel: +46703062174
sonymobile.com
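The second fix suggested above (leave the loop as soon as the node has been freed) can be sketched as follows. The node type and function name are hypothetical; crash's real extension_table entries carry more fields and unload_extension does more work, so this only illustrates the loop shape.

```c
#include <stdlib.h>

/* Hypothetical node type; the real extension_table entry has more fields. */
struct ext_node {
    int id;
    struct ext_node *next;
};

/* Unlink and free the first node whose id matches. Returns 1 if a node was
 * removed and 0 otherwise. The function returns right after free(), so the
 * freed node is never dereferenced by the loop control expression. */
int remove_ext(struct ext_node **head, int id)
{
    struct ext_node **link;

    for (link = head; *link; link = &(*link)->next) {
        struct ext_node *cur = *link;

        if (cur->id == id) {
            *link = cur->next;   /* unlink before freeing */
            free(cur);
            return 1;
        }
    }
    return 0;
}
```

Walking the list through a pointer to the previous link avoids special handling for the head node, and returning immediately after free() addresses both points raised in the report: no use after free, and no further iteration once the entry has been found.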

12 years, 7 months


[PATCH] add a new command: cgget

by Zhang Xiaohe

Hello Dave,

I made a new command, 'cgget', to help investigate the parameters of cgroups. Here are two examples.

1. Print parameters of some cgroup

crash> cgget -g cpu /
/:
cpu.rt_period_us: 1000000
cpu.rt_runtime_us: 950000
cpu.stat: nr_periods: 0
        nr_throttled: 0
        throttled_time: 0
cpu.cfs_period_us: 0
cpu.cfs_quota_us: 0
cpu.shares: 1024

2. Print parameters of all cgroups

crash> cgget -a /
/:
cpuset.cpu_exclusive: 1
cpuset.mem_exclusive: 1
cpuset.mem_hardwall: 0
cpuset.memory_migrate: 0
cpuset.sched_load_balance: 1
cpuset.memory_spread_page: 0
cpuset.memory_spread_slab: 0
cpuset.memory_pressure_enabled: 0
cpuset.memory_pressure: 0
cpuset.sched_relax_domain_level: -1
cpuset.mems: 0
cpuset.cpus: 0-3
...
blkio.io_merged: 8:0 Read 10925
        8:0 Write 31704
        8:0 Sync 25413
        8:0 Async 17216
        8:0 Total 42629
        Total 42629
blkio.io_queued: 8:0 Read 0
        8:0 Write 0
        8:0 Sync 0
        8:0 Async 0
        8:0 Total 0
        Total 0
blkio.reset_stats:

To build the module from the top-level crash-<version> directory, enter:

$ cp <path-to>/cgget.c extensions
$ make extensions

Please refer to the attachment for more information. I look forward to your advice.

--
Zhang Xiaohe
Regards
--------------------------------------------------
Development Dept.I
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
No. 6 Wenzhu Road, Nanjing, 210012, China
TEL: +86+25-86630566-8552
FAX: +86+25-83317685
MAIL: zhangxh(a)cn.fujitsu.com
--------------------------------------------------

12 years, 7 months


[PATCH] Rename eppic_typeislocal in the library

by Petr Tesarik

Hi Luc,

please consider the attached patch for inclusion in eppic. The actual name should match the name from eppic_api.h. The crash extension worked previously only because it relied on an implicit declaration of the wrong name, which is ugly.

Regards,
Petr Tesarik

12 years, 7 months

