March 2008 - Crash-utility - Crash Utility List Archives

Re: crash: cannot gather a stable task list via pid_hash (500 retries)

by Dave Anderson

Eugene,

Another debugging aid you can try already exists in the task-gathering function refresh_hlist_task_table_v3(), which does this before making the "duplicate task" check:

	if (CRASHDEBUG(1)) {
		if (chained)
			console(" %lx upid: %lx nr: %d pid: %lx\n"
			    " pnext/pprev: %.*lx/%lx task: %lx\n",
			    kpp, upid, upid_nr, pid,
			    VADDR_PRLEN, pnext, pprev, next);
		else
			console("pid_hash[%4d]: %lx upid: %lx nr: %d pid: %lx\n"
			    " pnext/pprev: %.*lx/%lx task: %lx\n",
			    i, kpp, upid, upid_nr, pid,
			    VADDR_PRLEN, pnext, pprev, next);
	}

The console() function debug output, however, is a no-op unless you first set up the "console" environment variable with a tty name. Open another window on the system you're running on, get its tty filename, and put it in a .crashrc file located either in the current directory or home directory:

	set console /dev/pts/<whatever>

You can set it to the same window as you're running the crash session if you want -- the main reason the console() function exists is to print often very verbose debug output without trashing crash command output, so it allows you to redirect it to another window.

Anyway, having done that, invoke "crash -d1", and you should see output like this on the selected console window, showing the task(s) found by walking each in-use pid_hash[x] hlist_head:

	...
	pid_hash[  11]: ffff81027c8f9048 upid: ffff81027c8f9038 nr: 666 pid: ffff81027c8f9000
	                pnext/pprev: 0000000000000000/ffff81000105b5d8 task: ffff81027c934000
	pid_hash[  19]: ffff81027f8012c8 upid: ffff81027f8012b8 nr: 1 pid: ffff81027f801280
	                pnext/pprev: 0000000000000000/ffff81000105b618 task: ffff81027ec5e000
	pid_hash[  27]: ffff81027cc2c748 upid: ffff81027cc2c738 nr: 378 pid: ffff81027cc2c700
	                pnext/pprev: 0000000000000000/ffff81000105b658 task: ffff81027cc9a000
	pid_hash[  74]: ffff81027ec589c8 upid: ffff81027ec589b8 nr: 35 pid: ffff81027ec58980
	                pnext/pprev: 0000000000000000/ffff81000105b7d0 task: ffff81027ee01160
	pid_hash[ 119]: ffff81027cc35948 upid: ffff81027cc35938 nr: 2297 pid: ffff81027cc35900
	                pnext/pprev: 0000000000000000/ffff81000105b938 task: ffff81027d825160
	...

Typically there's only one task on any pid_hash chain, but if there's more than one, it will look like this two-task example:

	pid_hash[2920]: ffff81027d1b7c48 upid: ffff81027d1b7c38 nr: 528 pid: ffff81027d1b7c00
	                pnext/pprev: ffff81027f801748/ffff8100010610c0 task: ffff81027eef48b0
	                ffff81027f801748 upid: ffff81027f801738 nr: 7 pid: ffff81027f801700
	                pnext/pprev: 0000000000000000/ffff81027d1b7c48 task: ffff81027ec72000

Presumably in your case (if you can reproduce it) there would have been a pid_hash chain that contains ffff81012f0811d0 twice. Your debug output is going to be extremely verbose, because you will see the pid_hash output repeating itself 500 times -- but it will stop at the pid_hash[index] where it found the duplicate entry.

I'm curious why you are seeing this. This pid_hash/retry scheme has been in place forever, and I've never seen a legitimate/persistent duplicate task error.

Thanks,
Dave
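The idea of a debug channel that stays silent until it is pointed at a tty can be sketched as follows. This is only an illustration of the pattern described in the message, not crash's actual console() implementation; the names debug_console_open() and debug_console_printf() are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the console stream; NULL means "not configured". */
static FILE *debug_console = NULL;

/* Point debug output at a tty (or any writable path). Returns 0 on success. */
int debug_console_open(const char *path)
{
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (debug_console != NULL)
        fclose(debug_console);
    debug_console = fp;
    /* Unbuffered, so debug lines appear immediately in the other window. */
    setvbuf(debug_console, NULL, _IONBF, 0);
    return 0;
}

/*
 * printf-style debug output. Returns the number of characters written,
 * or 0 when no console has been configured (the no-op case).
 */
int debug_console_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    if (debug_console == NULL)
        return 0;
    va_start(ap, fmt);
    n = vfprintf(debug_console, fmt, ap);
    va_end(ap);
    return n;
}
```

Keeping the debug stream separate from the command output stream is what lets very verbose traces go to a second window without disturbing the main session.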

17 years, 4 months


when tracking full slabs, check kmem_cache flag

by i-kitayama＠ap.jp.nec.com

Hi Dave,

For users who want to track full slabs (on systems using the SLUB allocator), the SLAB_STORE_USER flag also needs to be set, or add_full() is never invoked. I have attached a patch so that an empty message is not echoed when only slub debug is set. Could you take a look at it?

The patch is for crash-4.0-6.1, tested on a Fedora 8 box (2.6.24.3-12.fc8).

Thanks,
Itaru
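The flag test described above amounts to a single bit check on the cache's flags word. A minimal sketch, assuming the flag value from 2.6.24-era include/linux/slab.h (verify it against the target kernel); the helper name slub_tracks_full_slabs() is hypothetical and is not taken from the attached patch:

```c
#include <stdbool.h>

/* Assumed value from 2.6.24-era include/linux/slab.h; check the target kernel. */
#define SLAB_STORE_USER 0x00010000UL

/*
 * Hypothetical helper: returns true only when the cache's flags word
 * has SLAB_STORE_USER set, i.e. when full slabs can be expected on the
 * per-node full list at all.
 */
bool slub_tracks_full_slabs(unsigned long cache_flags)
{
    return (cache_flags & SLAB_STORE_USER) != 0;
}
```

A caller would read the flags member of the kmem_cache in question and skip the full-slab report when this returns false.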

17 years, 4 months


Re: [Crash-utility] [PATCH] crash: add dev_base_head support for net command

by Dave Anderson

Dave Anderson wrote:
>
> Eugene Teo wrote:
> >
> > Hi Dave,
> >
> > I found that the net command in crash 4.0-6.1 does not work with Fedora
> > kernel 2.6.23.14-115.fc8. There was a rework on dev_base[1] to replace
> > the use of dev_base variable, and dev->next pointer with for_each_netdev
> > loop. This patch fixes this problem.
> >
>
> Thanks Eugene -- queued for the next release.
>
> BTW, the patch looks to have a short shelf-life. In 2.6.24, the
> new dev_base_head list_head that your patch looks for no longer
> exists. So for now I've also updated the error message in
> show_net_devices() to complain if both dev_base and dev_base_head
> don't exist.
>
> I'll update the TODO list with the newer net command issue
> if anybody wants to take a shot at fixing it.
>
> Thanks,
> Dave

The attached patch addresses the same issue for 2.6.24 and later kernels, and applies on top of Eugene's patch.

Dave

17 years, 4 months


[PATCH] crash: add dev_base_head support for net command

by Eugene Teo

Hi Dave,

I found that the net command in crash 4.0-6.1 does not work with Fedora kernel 2.6.23.14-115.fc8. There was a rework on dev_base[1] to replace the use of dev_base variable, and dev->next pointer with for_each_netdev loop. This patch fixes this problem.

$ uname -rm
2.6.23.14-115.fc8 i686
$ sudo ./crash
[...]
crash> net
net: dev_base does not exist!

With the patch,

$ sudo ./crash
[...]
crash> net
NET_DEVICE  NAME      IP ADDRESS(ES)
 c0714480   lo        127.0.0.1
 f7031000   wmaster0
 f7031800   wlan0
 f775c000   eth0      a.b.c.d
 f6d3a000   redhat0   w.x.y.z

[1] linux-2.6 commit: 7562f876cd93800f2f8c89445f2a563590b24e09
    [NET]: Rework dev_base via list_head (v3)

Signed-off-by: Eugene Teo <eugeneteo(a)kernel.sg>

diff -uprN crash-4.0-6.1.default/defs.h crash-4.0-6.1/defs.h
--- crash-4.0-6.1.default/defs.h	2008-02-29 00:09:10.000000000 +0800
+++ crash-4.0-6.1/defs.h	2008-03-16 13:51:32.000000000 +0800
@@ -1234,6 +1234,7 @@ struct offset_table {
 	long net_device_type;
 	long net_device_addr_len;
 	long net_device_ip_ptr;
+	long net_device_dev_list;
 	long device_next;
 	long device_name;
 	long device_type;
diff -uprN crash-4.0-6.1.default/net.c crash-4.0-6.1/net.c
--- crash-4.0-6.1.default/net.c	2008-02-29 00:09:10.000000000 +0800
+++ crash-4.0-6.1/net.c	2008-03-16 14:01:39.000000000 +0800
@@ -65,6 +65,7 @@ struct devinfo {
 #define BYTES_IP_TUPLE (BYTES_IP_ADDR + BYTES_PORT_NUM + 1)
 
 static void show_net_devices(void);
+static void show_net_devices_v2(void);
 static void print_neighbour_q(ulong, int);
 static void get_netdev_info(ulong, struct devinfo *);
 static void get_device_name(ulong, char *);
@@ -111,6 +112,7 @@ net_init(void)
 			"net_device", "addr_len");
 		net->dev_ip_ptr = MEMBER_OFFSET_INIT(net_device_ip_ptr,
 			"net_device", "ip_ptr");
+		MEMBER_OFFSET_INIT(net_device_dev_list, "net_device", "dev_list");
 		ARRAY_LENGTH_INIT(net->net_device_name_index,
 			net_device_name, "net_device.name", NULL, sizeof(char));
 		net->flags |= (NETDEV_INIT|STRUCT_NET_DEVICE);
@@ -355,6 +357,11 @@ show_net_devices(void)
 	long flen;
 	char buf[BUFSIZE];
 
+	if (symbol_exists("dev_base_head")) {
+		show_net_devices_v2();
+		return;
+	}
+
 	if (!symbol_exists("dev_base"))
 		error(FATAL, "dev_base does not exist!\n");
 
@@ -384,6 +391,58 @@ show_net_devices(void)
 	} while (next);
 }
 
+static void
+show_net_devices_v2(void)
+{
+	struct list_data list_data, *ld;
+	char *net_device_buf;
+	char buf[BUFSIZE];
+	ulong *ndevlist;
+	int ndevcnt, i;
+	long flen;
+
+	if (!net->netdevice) /* initialized in net_init() */
+		return;
+
+	flen = MAX(VADDR_PRLEN, strlen(net->netdevice));
+
+	fprintf(fp, "%s NAME IP ADDRESS(ES)\n",
+		mkstring(upper_case(net->netdevice, buf),
+			flen, CENTER|LJUST, NULL));
+
+	net_device_buf = GETBUF(SIZE(net_device));
+
+	ld = &list_data;
+	BZERO(ld, sizeof(struct list_data));
+	get_symbol_data("dev_base_head", sizeof(void *), &ld->start);
+	ld->end = symbol_value("dev_base_head");
+	ld->list_head_offset = OFFSET(net_device_dev_list);
+
+	hq_open();
+	ndevcnt = do_list(ld);
+	ndevlist = (ulong *)GETBUF(ndevcnt * sizeof(ulong));
+	ndevcnt = retrieve_list(ndevlist, ndevcnt);
+	hq_close();
+
+	for (i = 0; i < ndevcnt; ++i) {
+		readmem(ndevlist[i], KVADDR, net_device_buf,
+			SIZE(net_device), "net_device buffer",
+			FAULT_ON_ERROR);
+
+		fprintf(fp, "%s ",
+			mkstring(buf, flen, CENTER|RJUST|LONG_HEX,
+			MKSTR(ndevlist[i])));
+
+		get_device_name(ndevlist[i], buf);
+		fprintf(fp, "%-6s ", buf);
+
+		get_device_address(ndevlist[i], buf);
+		fprintf(fp, "%s\n", buf);
+	}
+
+	FREEBUF(ndevlist);
+	FREEBUF(net_device_buf);
+}
 /*
  * Perform the actual work of dumping the ARP table...
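The patch relies on walking a circular list_head chain and subtracting the offset of the embedded dev_list member to recover each net_device address. The following is a minimal user-space sketch of that technique, not crash's do_list() code; struct fake_net_device and collect_devices() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Same shape as the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical, much-reduced stand-in for the kernel's struct net_device. */
struct fake_net_device {
    char name[16];
    struct list_head dev_list;
};

/*
 * Walk the circular list anchored at head. For every node, subtract the
 * offset of the embedded list_head to get the address of the containing
 * structure, and store it in out[]. At most max entries are stored.
 * Returns the number of entries stored.
 */
size_t collect_devices(struct list_head *head, size_t member_offset,
                       void **out, size_t max)
{
    size_t n = 0;
    struct list_head *pos;

    /* The walk ends when it arrives back at the list head itself. */
    for (pos = head->next; pos != head && n < max; pos = pos->next)
        out[n++] = (char *)pos - member_offset;

    return n;
}
```

The list head is only an anchor and is not embedded in any device, which is why the walk starts at head->next and stops when it returns to head.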

17 years, 4 months


A patch for match_file_string()

by Alex Sidorenko

Hi Dave,

on some distributions (e.g. Ubuntu) crash cannot find the live kernel image because of a slight mismatch between kt->proc_version and 'strings' output from namelist file (e.g. /boot/vmlinux-debug-2.6.22-14-generic).

Namely, kt->proc_version is LF-terminated, but there is no LF in 'strings vmlinux' output. If I strip LF from the end of kt->proc_version, everything works fine.

--- filesys.c.orig	2008-02-28 11:09:10.000000000 -0500
+++ filesys.c	2008-03-13 16:33:02.000000000 -0400
@@ -3689,7 +3689,10 @@
 	int found;
 	char command[BUFSIZE];
 	FILE *pipe;
+	int slen = strlen(string);
 
+	if (slen && string[slen-1] == '\n')
+		string[slen-1] = '\0';
 	sprintf(command, "/usr/bin/strings %s", filename);
 	if ((pipe = popen(command, "r")) == NULL) {

=====================================================================

Regards,
Alex

--
------------------------------------------------------------------
Alexandre Sidorenko             email: alexs(a)hplinux.canada.hp.com
Global Solutions Engineering: Unix Networking
Hewlett-Packard (Canada)
------------------------------------------------------------------
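The fix above boils down to removing a single trailing LF before comparing strings. A standalone sketch of that step, where the helper name strip_trailing_newline() is hypothetical and not part of the patch:

```c
#include <string.h>

/*
 * Remove one trailing '\n' from s in place, if present.
 * Returns the length of the resulting string.
 */
size_t strip_trailing_newline(char *s)
{
    size_t len = strlen(s);

    /* The length check guards against indexing s[-1] on an empty string. */
    if (len > 0 && s[len - 1] == '\n')
        s[--len] = '\0';

    return len;
}
```

Like the patch, this modifies the caller's buffer, so it is only appropriate when the caller does not need the LF-terminated form afterwards.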

17 years, 4 months


backtrace - how do I get local and formal parameters - kernel is compiled -O0.

by Pete/Piet Delaney

Got my 1st panic crash, crash seems to be working but I don't feel as comfortable as with gdb. I can't see the parameters on the stack as variables as with kgdb, nor can I see the local variables on the stack. Seems I should be able to get crash working with gdb so I can see the details that gdb can see.

I compiled the kernel -O0 and without the framepointer as it seems crash prefers (rather odd). But I want to see the locals and formals on the stack and walk up and down the stack and have the context known to crash just as with gdb.

I'll re-read your crash paper.

Any suggestions?

-piet

-------------------------------------------------
crash> bt -l
PID: 1478 TASK: c7ed4570 CPU: 0 COMMAND: "savproxy"
 #0 [d76bbbb4] crash_kexec at c01ecb41
    include/asm/system.h: 259
 #1 [d76bbc24] panic at c018bc33
    /nethome/piet/src/India-2.6.16.57/linux/kernel/panic.c: 89
 #2 [d76bbc44] skb_tcp_zero_copy_iovec at c06d67a0
    /nethome/piet/src/India-2.6.16.57/linux/net/core/skbuff.c: 517
 #3 [d76bbcb4] tcp_zerocopy_recvmsg at c07e6c14
    /nethome/piet/src/India-2.6.16.57/linux/net/ipv4/tcp.c: 1792
 #4 [d76bbd8c] receive_tcp_zero_copy_buffers at c06beb89
    /nethome/piet/src/India-2.6.16.57/linux/net/socket.c: 3573
 #5 [d76bbf2c] sys_zcopy_sockop at c06bc077
    /nethome/piet/src/India-2.6.16.57/linux/net/socket.c: 2536
 #6 [d76bbfb8] sysenter_entry at c0981544
    include/asm/system.h: 279
    EAX: 000000bd EBX: 00000005 ECX: 00000006 EDX: 08228cd8
    DS: 007b ESI: 00000100 ES: 007b EDI: 00000001
    SS: 007b ESP: b59767d0 EBP: b5976828
    CS: 0073 EIP: ffffe410 ERR: 000000bd EFLAGS: 00000246
crash> bt
PID: 1478 TASK: c7ed4570 CPU: 0 COMMAND: "savproxy"
 #0 [d76bbbb4] crash_kexec at c01ecb41
 #1 [d76bbc24] panic at c018bc33
 #2 [d76bbc44] skb_tcp_zero_copy_iovec at c06d67a0
 #3 [d76bbcb4] tcp_zerocopy_recvmsg at c07e6c14
 #4 [d76bbd8c] receive_tcp_zero_copy_buffers at c06beb89
 #5 [d76bbf2c] sys_zcopy_sockop at c06bc077
 #6 [d76bbfb8] sysenter_entry at c0981544
    EAX: 000000bd EBX: 00000005 ECX: 00000006 EDX: 08228cd8
    DS: 007b ESI: 00000100 ES: 007b EDI: 00000001
    SS: 007b ESP: b59767d0 EBP: b5976828
    CS: 0073 EIP: ffffe410 ERR: 000000bd EFLAGS: 00000246
crash>

17 years, 4 months


Re: backtrace - how do I get local and formal parameters - kernel is compiled -O0.

by Piet Delaney

Dave Anderson wrote:
> Pete/Piet Delaney wrote:
>> Dave Anderson wrote:
>> | Pete/Piet Delaney wrote:
>> |> Got my 1st panic crash, crash seems to be working but
>> |> I don't feel comfortable as with gdb. I can't see
>> |> the parameters on the stack as variables as with
>> |> kgdb nor can I see the local variables on the stack.
>> |> Seems I should be able to get crash working with
>> |> gdb so I can see the details that gdb can see.
>> |>
>> |> I compiled the kernel -O0 and without the framepointer
>> |> as it seems crash prefers (rather odd). But want to
>> |> see the locals and formals on the stack and walk up and
>> |> down the stack and have the context known to crash just
>> |> as with gdb.
>> |>
>> |> I'll re-read your crash paper.
>> |>
>> |> Any suggestions?
>> |
>> | Nope, other than using "bt -f" to dump each frame's data,
>> | and figuring it out from there...
>>
>> When I was using gdb on SunOS 4.1.4 I was able to set $sp and
>> $fp to switch processes. When I took the SP and FP and used
>> a gdb set $sp and $fp it didn't seem to allow me to do this:
>> -----------------------------------------------------------
>> crash> gdb set $sp = 0xd76bbbb4
>> No registers.
>>
>> crash> gdb set $fp = 0xd76bbc24
>> No registers.
>> -----------------------------------------------------------
>> Why is it saying 'No registers'?
>
> I'm not positive, but the gdb sources seem to display that message
> whenever (!target_has_registers), and that's #define'd like so:
>
> /* Does the target have registers? (Exec files don't.) */
>
> #define target_has_registers \
>         (current_target.to_has_registers)
>
> And since it's being invoked as "gdb vmlinux", it only knows
> about the exec file, and has no clue about registers.

Running ddd+gdb on the crash dump seems to be working unreasonably well. The back traces for the tasks running on CPUs are available, as well as the registers.

I thought it would be convenient to get the $sp and $fp from 'crash' for a task that I'm interested in and change the values of $sp and $fp for one of the CPUs' contexts. I did a 'set write on' to allow me to change the contents of the crash dump, but it continues to say that $sp isn't an lvalue.

(gdb) set write on
(gdb) set $sp += 4
Left operand of assignment is not an lvalue.
(gdb)
(gdb) help set write
Set writing into executable and core files.
(gdb)

kgdb used to allow this on live sessions on SunOS 4.1.4. Seems to be a common practice still:

http://www.mcs.vuw.ac.nz/cgi-bin/info2www?(gdb)Registers

I'm fine on the current bug that I'm looking into, but would like to be able to use gdb on all of the tasks, not just the ones currently running on CPUs. The kdump gdb macros are a bit of help but extremely slow. I once looked into kgdb running slowly on gdb macro interpretation, and at that time it was a silly lack of caching that was causing the gdb 'ps' macro to run so slowly.

-piet

17 years, 4 months


EFBIG with gdb - did you hack gdb to pass O_LARGEFILE in the file's open() flags?

by Pete/Piet Delaney

Hi Dave:

I was wondering why gdb isn't having a problem with the crash file being too big when started with crash, yet when handing the vmcore directly to gdb it gets an EFBIG error while opening the file. Did you hack gdb to get around this?

The error seems to be raised for 32-bit code when the file is too large. It looks like the system call will skip this check if O_LARGEFILE is set:

-------------------------------------------------------------------------
/*
 * Called when an inode is about to be open.
 * We use this to disallow opening large files on 32bit systems if
 * the caller didn't specify O_LARGEFILE. On 64bit systems we force
 * on this flag in sys_open.
 */
int generic_file_open(struct inode * inode, struct file * filp)
{
	if (!(filp->f_flags & O_LARGEFILE) && i_size_read(inode) > MAX_NON_LFS)
		return -EFBIG;
	return 0;
}
---------------------------------------------------------------------------

The system making the core file is likely running in 32-bit mode, but it has a lot of memory, likely 4GB.

	MemTotal: 3,557,132 kB

This old 2.6.16 kernel doesn't know that the processor can run in 64-bit mode. Both gdb and crash were built in the same 'build' env and appear to be the same version.

-piet
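For reference, one common way for a 32-bit user-space program to avoid EFBIG on a large dump is to pass O_LARGEFILE explicitly, or to build with -D_FILE_OFFSET_BITS=64 so that open() does it implicitly. The sketch below only illustrates the flag; it makes no claim about what crash or gdb actually do, and open_dump_readonly() is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes O_LARGEFILE on glibc; must precede the includes */
#endif
#include <fcntl.h>
#include <unistd.h>

#ifndef O_LARGEFILE
#define O_LARGEFILE 0         /* fallback where the flag is implicit or not defined */
#endif

/*
 * Open a dump file read-only with O_LARGEFILE set, so a 32-bit process
 * is not refused with EFBIG when the file is larger than 2GB.
 * Returns a file descriptor, or -1 with errno set.
 */
int open_dump_readonly(const char *path)
{
    return open(path, O_RDONLY | O_LARGEFILE);
}
```

Opening the file is only the first step: reading beyond the 2GB mark from a 32-bit process also needs 64-bit offsets, which is what the _FILE_OFFSET_BITS=64 build option provides.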

17 years, 4 months


RE: [Crash-utility] Unable to change the content of memory using crash on a live system

by Wright, David

On the other hand, there's nothing to prevent the ambitious developer from writing their own /dev/crash driver that *does* have a write operation in it, is there?

-- David Wright, Egenera, Inc.

> -----Original Message-----
> From: crash-utility-bounces(a)redhat.com
> [mailto:crash-utility-bounces@redhat.com] On Behalf Of Dave Anderson
> Sent: Thursday, March 06, 2008 9:37 AM
> To: Discussion list for crash utility usage, maintenance and development
> Subject: Re: [Crash-utility] Unable to change the content of memory using crash on a live system
>
> Dheeraj Sangamkar wrote:
> > I use crash 4.0-3.9 on a live 2.6.9-55 kernel on i386/i686 as root.
> >
> > crash> ls -l /dev/crash
> > crw------- 1 root root 10, 61 Mar 5 21:57 /dev/crash
> > crash> ls -l /dev/mem
> > crw-r----- 1 root kmem 1, 1 Mar 5 16:49 /dev/mem
> > crash> q
> > [root@linux17081 ~]# ls -l /dev/crash /dev/mem
> > ls: /dev/crash: No such file or directory
> > crw-r----- 1 root kmem 1, 1 Mar 5 16:49 /dev/mem
> > [root@linux17081 ~]# id
> > uid=0(root) gid=0(root)
> > groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
> >
> > So, the /dev/crash file has write permission for me. The
> >
> > I am attempting to change the content of some memory.
> >
> > crash> struct request_queue 0xf7b933f8
> > struct request_queue {
> >   queue_head = {
> > <SNIP>
> > ...
> > }
> >
> > crash> struct -o request_queue | grep in_flight
> >   [476] unsigned int in_flight;
> > crash> eval 0xf7b933f8 + 476
> > hexadecimal: f7b935d4
> >     decimal: 4156110292 (-138857004)
> >       octal: 36756232724
> >      binary: 11110111101110010011010111010100
> > crash> rd f7b935d4
> > f7b935d4: fffffff1 ....
> > crash> wr f7b935d4 0
> > wr: cannot write to /dev/crash!
> >
> > I get the error above even if I change the ownership of /dev/kmem to
> > root:root
> > crash> ls -l /dev/mem
> > crw-r----- 1 root root 1, 1 Mar 5 16:49 /dev/mem
> >
> > Am I doing something wrong? How do I change the content of memory on a
> > live system using crash?
>
> With Red Hat x86 and x86_64 kernels, you can't.
>
> I feel your pain...
>
> The crash utility traditionally has had the capability of writing
> to /dev/mem, which can be a very useful, powerful (and dangerous)
> tool for kernel debugging.
>
> But Red Hat deemed the /dev/mem interface as a security hole,
> and restricted the x86 and x86_64 /dev/mem drivers to just
> the first 256 pages (1MB) of physical memory, making it useless
> for the crash utility. They allowed me to create the /dev/crash
> driver to replace it -- but it is effectively read-only because
> the driver has no write file operations handler:
>
>     static struct file_operations crash_fops = {
>             owner: THIS_MODULE,
>             llseek: crash_llseek,
>             read: crash_read,
>     };
>
> and so the kernel's vfs_write() returns EINVAL.
>
> Changing the permission of /dev/mem won't help because it
> isn't used by the crash utility when /dev/crash exists.
>
> Sorry about that,
> Dave
>
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
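The behaviour described in the quoted reply, where a write fails with EINVAL because the operations table has no write handler, can be modelled in user space as follows. This is a simplified illustration, not kernel code; struct model_file_ops, model_crash_fops and model_vfs_write() are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical user-space model of a driver's operations table. */
struct model_file_ops {
    ssize_t (*read)(char *buf, size_t len);
    ssize_t (*write)(const char *buf, size_t len);
};

/* Stand-in read handler: pretends every requested byte was read. */
static ssize_t model_crash_read(char *buf, size_t len)
{
    (void)buf;
    return (ssize_t)len;
}

/* Like the table in the message: a read handler but no write handler. */
static const struct model_file_ops model_crash_fops = {
    .read = model_crash_read,
    /* .write is left NULL by the designated initializer */
};

/*
 * Dispatch a write through the table. With no handler installed the
 * request is rejected with -EINVAL, whatever the file permissions are.
 */
ssize_t model_vfs_write(const struct model_file_ops *ops,
                        const char *buf, size_t len)
{
    if (ops == NULL || ops->write == NULL)
        return -EINVAL;
    return ops->write(buf, len);
}
```

This is why changing ownership or mode bits on the device node makes no difference: the rejection happens at the dispatch step, after the permission checks have already passed.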

17 years, 4 months


How to print structs in user space?

by Dheeraj Sangamkar

Hi, When I debug ioctls, I get parameters which are pointers to structures in user space. I am unable to use the struct command to print these structures. Currently I am using "rd -u" to read the content of user memory and decode it based on the structure information I have. Am I missing something? Is there an easier way to do this? Dheeraj

17 years, 4 months

