November 2010 - Crash-utility - Crash Utility List Archives

Re: [Crash-utility] [PATCH] Show missing tasks in ps

by Dave Anderson

----- "Michael Holzheu" <holzheu(a)linux.vnet.ibm.com> wrote:

> Hi Dave,
>
> I got an s390x dump of a Linux 2.6.36 system, where a task (kmcheck, pid=44) is
> missing in the ps output. I debugged the problem and I think that I found the
> reason:
>
> It looks like that crash does not walk the linked list of the pid hash table
> to the end, if it finds a NULL pointer in the pid.tasks[PIDTYPE_PID=0]
> array. Unfortunately, for the struct pid that is before our lost task in the
> linked list this condition is true. Therefore crash does not find our task.

That sounds similar to the fix Bob Montgomery made in 5.0.7:

 - Fix for the potential to miss one or more tasks in 2.6.23 and earlier
   kernels, presumably due to catching an entry the kernel's pid_hash[]
   chain in transition. Without the patch, the task will simply not be
   seen in the gathered task list.
   (bob.montgomery(a)hp.com)

where this was his patch posting -- which fixed refresh_hlist_task_table_v2():

  [Crash-utility] Missing PID 1 is crash problem with losing tasks
  https://www.redhat.com/archives/crash-utility/2010-August/msg00049.html

and where your patch fixes refresh_hlist_task_table_v3().

I'll give it a test run...

Thanks,
  Dave

> The attached patch seems to fix this problem.
>
> Here my crash debug log with the 2.6.36 dump:
> ---------------------------------------------
> Task "kmcheck" is in hash slot 2941 in the linked list at position 2:
>
> crash> print pid_hash[2941]
> $4 = {
>   first = 0x3f5fb7f8
> }
>
> crash> upid
> struct upid {
>     int nr;
>     struct pid_namespace *ns;
>     struct hlist_node pid_chain;
> }
> SIZE: 32
>
> crash> upid.pid_chain
> struct upid {
>   [16] struct hlist_node pid_chain;
> }
>
> crash> eval 0x3f5fb7f8 - 16
> hexadecimal: 3f5fb7e8
>
> crash> upid 3f5fb7e8 <<<<---- the first upid in the list
> struct upid {
>   nr = 565,
>   ns = 0x81d8f8,
>   pid_chain = {
>     next = 0x3edea2b0,
>     pprev = 0x96554e8
>   }
> }
>
> crash> pid
> struct pid {
>     atomic_t count;
>     unsigned int level;
>     struct hlist_head tasks[3];
>     struct rcu_head rcu;
>     struct upid numbers[1];
> }
> SIZE: 80
>
> crash> pid.numbers
> struct pid {
>   [48] struct upid numbers[1];
> }
>
> crash> eval 3f5fb7e8 - 48
> hexadecimal: 3f5fb7b8
>
> crash> pid 3f5fb7b8
> struct pid {
>   count = {
>     counter = 1
>   },
>   level = 0,
>   tasks = {{
>       first = 0x0 <<<----------- tasks[0] is NULL
>     }, {
>       first = 0x3d488620
>     }, {
>       first = 0x0
>     }},
>   rcu = {
>     next = 0x5a5a5a5a5a5a5a5a,
>     func = 0x5a5a5a5a5a5a5a5a
>   },
>   numbers = {{
>       nr = 565,
>       ns = 0x81d8f8,
>       pid_chain = {
>         next = 0x3edea2b0, <<<--------- Pointer to second element in
> list
>         pprev = 0x96554e8
>       }
>     }}
> }
>
> crash> eval 0x3edea2b0 - 16
> hexadecimal: 3edea2a0 <<<-- The second upid in the list
>
> crash> upid 0x3edea2a0
> struct upid {
>   nr = 44, <<<--- Our missing pid=44 (kmcheck)
>   ns = 0x81d8f8,
>   pid_chain = {
>     next = 0x0,
>     pprev = 0x3f5fb7f8
>   }
> }
>
> crash> eval 0x3edea2a0 - 48
> hexadecimal: 3edea270
>
> crash> pid 3edea270
> struct pid {
>   count = {
>     counter = 5
>   },
>   level = 0,
>   tasks = {{
>       first = 0x3e799908 <<<--- Pointer to our task_struct.pids
>     }, {
>       first = 0x0
>     }, {
>       first = 0x0
>     }},
>   rcu = {
>     next = 0x5a5a5a5a5a5a5a5a,
>     func = 0x5a5a5a5a5a5a5a5a
>   },
>   numbers = {{
>       nr = 44,
>       ns = 0x81d8f8,
>       pid_chain = {
>         next = 0x0,
>         pprev = 0x3f5fb7f8
>       }
>     }}
> }
>
> crash> task_struct.pids
> struct task_struct {
>   [712] struct pid_link pids[3];
> }
>
> crash> eval 0x3e799908 - 712
> hexadecimal: 3e799640
>
> crash> task_struct 3e799640 | grep comm
>   comm = "kmcheck\000\000\000\000\000\000\000\000", <<<--- here it is
> ---
>  task.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/task.c
> +++ b/task.c
> @@ -2006,7 +2006,7 @@ do_chained:
>                  }
>
>                  if (pid_tasks_0 == 0)
> -                        continue;
> +                        goto chain_next;
>
>                  next = pid_tasks_0 - OFFSET(task_struct_pids);
>
> @@ -2042,7 +2042,7 @@ do_chained:
>                  }
>
>                  cnt++;
> -
> +chain_next:
>                  if (pnext) {
>                          kpp = pnext;
>                          upid = pnext - OFFSET(upid_pid_chain);
>
>
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
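The address arithmetic in the quoted log (subtracting a member's offset from the address of an embedded list node to reach the containing structure) can be sketched in C as follows. This is only an illustration: the offsets 16, 48 and 712 are the ones crash reported for this particular s390x 2.6.36 dump and will differ on other kernels, and the helper names are hypothetical, not crash functions.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets as reported by crash for this particular dump (assumption:
 * they are specific to this kernel build and architecture). */
enum {
    UPID_PID_CHAIN_OFFSET   = 16,   /* upid.pid_chain   */
    PID_NUMBERS_OFFSET      = 48,   /* pid.numbers      */
    TASK_STRUCT_PIDS_OFFSET = 712   /* task_struct.pids */
};

/* Hypothetical helper: address of the structure that embeds a member
 * located at member_addr, given the member's byte offset. */
static uint64_t container_addr(uint64_t member_addr, uint64_t member_offset)
{
    assert(member_addr >= member_offset);
    return member_addr - member_offset;
}

/* hlist_node address from a pid_hash[] chain -> struct pid address.
 * Valid only for numbers[0] (level 0), as in the log. */
static uint64_t pid_from_chain_node(uint64_t node)
{
    uint64_t upid = container_addr(node, UPID_PID_CHAIN_OFFSET);

    return container_addr(upid, PID_NUMBERS_OFFSET);
}

/* pid.tasks[0].first -> task_struct address, as done in the log. */
static uint64_t task_from_pid_link(uint64_t link)
{
    return container_addr(link, TASK_STRUCT_PIDS_OFFSET);
}
```

Feeding in the values from the log, pid_from_chain_node(0x3f5fb7f8) gives 0x3f5fb7b8 and task_from_pid_link(0x3e799908) gives 0x3e799640, matching the "eval" results above.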

14 years, 8 months


[ANNOUNCE] crash version 5.1.0 is available

by Dave Anderson

Changelog:

 - Fix for the x86 "bt" command for the active, non-crashing, tasks
   in 2.6.31 and later KVM dumpfile kernels that are not configured
   with CONFIG_4KSTACKS. Without the patch, the exception frame
   generated by the reboot_interrupt() entry point is not displayed,
   and if the task was running in user space, the command would
   generate a "bt: cannot resolve stack trace" error message.
   (anderson(a)redhat.com)

 - Ksplice Inc. proposed a patch which added the module name to x86
   and x86_64 "bt" frame displays so that it would be readily evident
   that a kernel function had been replaced by a ksplice module
   function. Given that the functionality is useful for the frame
   display of any kernel module function, it has been extended to be
   used in the "bt" output of the other architectures, as well as in
   the output of the "bt -[tT]" options.
   (nelhage(a)ksplice.com, anderson(a)redhat.com)

 - Enhance the "sym" command to display the containing module name
   in brackets (if applicable) when entering a virtual address,
   symbol name, or symbol query argument.
   (anderson(a)redhat.com)

 - Implemented support for the recognition and display of module
   per-cpu symbols after they have been loaded by the "mod -[sS]"
   command. Without the patch, any per-cpu symbols declared by a
   module were not recognized or displayed at all. With the patch,
   they are displayed in the module's symbol list, and as is the case
   with base kernel per-cpu symbols, they can be the target of the
   "p" command in order to show their type and per-cpu virtual
   addresses.
   (nakayama.ts(a)ncos.nec.co.jp, anderson(a)redhat.com)

 - Fix for the x86 "bt" command to properly handle a NMI-interrupted
   idle task running in cpu_idle(). Without the patch, the backtrace
   indicated "bt: cannot resolve stack trace" even though it had
   resolved the trace correctly.
   (anderson(a)redhat.com)

 - Implemented support for s390x compressed kdump dumpfiles created
   by the makedumpfile facility.
   (mahesh(a)linux.vnet.ibm.com)

 - Fix for the "bt" command on x86 Xen hypervisor dumpfiles where a
   vcpu received a shutdown NMI while running in the
   event_check_interrupt() interrupt handler. Without the patch, the
   backtrace would indicate "bt: cannot resolve stack trace", and
   dump the text symbols on the stack. The patch recognizes all
   hypervisor entry points at the top of the vcpu stack.
   (anderson(a)redhat.com)

 - Fix for the "bt" command on x86 Xen hypervisor dumpfiles where a
   vcpu received a shutdown NMI while running in the hypercall entry
   point, but its return address on the stack gets perceived as an
   assembly label symbol within the hypercall code. Without the
   patch, the backtrace would indicate "bt: cannot resolve stack
   trace", and dump the text symbols on the stack. The patch replaces
   the assembly label symbol name with "hypercall".
   (anderson(a)redhat.com)

 - Fix for the "help -n" output for s390x ELF vmcore dumpfiles to
   recognize the EM_S390 e_machine value, the NT_FPREGSET n_type, and
   the new NT_S390_TIMER, NT_S390_TODCMP, NT_S390_TODPREG,
   NT_S390_CTRS and NT_S390_PREFIX n_types. Without the patch the
   e_machine field showed "(unsupported)", and the n_types showed
   "(?)".
   (anderson(a)redhat.com)

 - Fix for the "help -n" output for s390x ELF vmcore dumpfiles to
   properly dump the contents of the descriptor data of each
   Elf64_Nhdr note. Without the patch, the pointer to the descriptor
   data was incorrectly calculated and the resultant data output was
   "shifted".
   (anderson(a)redhat.com)

 - Fix for the "help -n" output for diskdump and compressed kdump
   files to show the file name as stored in the per-file
   diskdump_data structure. Without the patch, only "split" dumpfiles
   displayed their individual dumpfile names, whereas single
   dumpfiles showed "(null)".
   (anderson(a)redhat.com)

 - Resurrection of the "irq -b" command option for 2.6 kernels.
   (anderson(a)redhat.com)

 - Fix for the displaying of data generated from shell-escaped
   commands when the data contains a "%" character followed by a
   conversion character. Without the patch, a segmentation violation
   may occur when a conversion gets attempted by fprintf().
   (anderson(a)redhat.com)

 - Reworked the do_radix_tree() utility function to work without
   depending upon a hardwired copy of the kernel's radix_tree_node
   structure, and changed the RADIX_TREE_MAP_SHIFT,
   RADIX_TREE_MAP_SIZE and RADIX_TREE_MAP_MASK #define's into
   dynamically calculated values.
   (anderson(a)redhat.com)

 - Call FREEBUF() on a GETBUF()-generated buffer in the
   do_radix_tree() utility function.
   (wang.chao(a)cn.fujitsu.com)

 - Store the .debug_frame section offset and size from the vmlinux
   file, and use its data as an alternative to the .eh_frame section
   data in the x86_64 unwind code.
   (wang.chao(a)cn.fujitsu.com)

 - Fix for the "irq" command when run on 2.6.29 kernels, which
   declared the irq_desc_ptrs as a static array indexed by NR_IRQS.
   Without the patch, the command would show nonsensical IRQ data or
   fail with the error message "irq: invalid kernel virtual address:
   <address> type: hw_interrupt_type typename".
   (anderson(a)redhat.com)

 - Fix for the "irq" command to run with 2.6.34 or later kernels that
   replaced the array of irq_desc structures or irq_desc pointers
   with a radix tree. Without the patch, the command would fail with
   the error message "irq: x86_64_dump_irq: irq_desc[] does not
   exist?".
   (anderson(a)redhat.com)

 - As of 2.6.37, the output of the "irq" command will change from the
   current manner of displaying a few cherry-picked structure members
   that are of questionable usefulness and a nightmare to maintain.
   The new scheme displays the address of the irq_desc/irq_data
   structure, and a list of one or more associated irqaction
   structures and their name string. With that information, it is a
   simple matter to ascertain any other desired data concerning the
   IRQ.
   (anderson(a)redhat.com)

Download from: http://people.redhat.com/anderson
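The changelog entry about shell-escaped output containing a "%" character describes a classic format-string hazard. The snippet below is a generic sketch of that hazard and the usual remedy, not the actual crash patch; print_verbatim() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hazardous pattern (shown only as a comment, never do this):
 *
 *     fprintf(fp, data);
 *
 * If data contains "%s", "%d", etc., fprintf() tries to fetch arguments
 * that were never passed, which is undefined behaviour and can crash.
 */

/* Safe pattern: the external data is passed as an argument to a fixed
 * "%s" format, so any '%' characters in it are printed literally.
 * Returns the number of characters written, as fprintf() does. */
static int print_verbatim(FILE *fp, const char *data)
{
    assert(fp != NULL && data != NULL);
    return fprintf(fp, "%s", data);
}
```

Calling print_verbatim(stdout, "50%d\n") simply prints the five characters of the string, including the percent sign. fputs(data, fp) would be an equally valid choice when the character count is not needed.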

14 years, 8 months


[PATCH] Show missing tasks in ps

by Michael Holzheu

Hi Dave,

I got an s390x dump of a Linux 2.6.36 system, where a task (kmcheck, pid=44) is
missing in the ps output. I debugged the problem and I think that I found the
reason:

It looks like that crash does not walk the linked list of the pid hash table
to the end, if it finds a NULL pointer in the pid.tasks[PIDTYPE_PID=0]
array. Unfortunately, for the struct pid that is before our lost task in the
linked list this condition is true. Therefore crash does not find our task.

The attached patch seems to fix this problem.

Here my crash debug log with the 2.6.36 dump:
---------------------------------------------
Task "kmcheck" is in hash slot 2941 in the linked list at position 2:

crash> print pid_hash[2941]
$4 = {
  first = 0x3f5fb7f8
}

crash> upid
struct upid {
    int nr;
    struct pid_namespace *ns;
    struct hlist_node pid_chain;
}
SIZE: 32

crash> upid.pid_chain
struct upid {
  [16] struct hlist_node pid_chain;
}

crash> eval 0x3f5fb7f8 - 16
hexadecimal: 3f5fb7e8

crash> upid 3f5fb7e8 <<<<---- the first upid in the list
struct upid {
  nr = 565,
  ns = 0x81d8f8,
  pid_chain = {
    next = 0x3edea2b0,
    pprev = 0x96554e8
  }
}

crash> pid
struct pid {
    atomic_t count;
    unsigned int level;
    struct hlist_head tasks[3];
    struct rcu_head rcu;
    struct upid numbers[1];
}
SIZE: 80

crash> pid.numbers
struct pid {
  [48] struct upid numbers[1];
}

crash> eval 3f5fb7e8 - 48
hexadecimal: 3f5fb7b8

crash> pid 3f5fb7b8
struct pid {
  count = {
    counter = 1
  },
  level = 0,
  tasks = {{
      first = 0x0 <<<----------- tasks[0] is NULL
    }, {
      first = 0x3d488620
    }, {
      first = 0x0
    }},
  rcu = {
    next = 0x5a5a5a5a5a5a5a5a,
    func = 0x5a5a5a5a5a5a5a5a
  },
  numbers = {{
      nr = 565,
      ns = 0x81d8f8,
      pid_chain = {
        next = 0x3edea2b0, <<<--------- Pointer to second element in list
        pprev = 0x96554e8
      }
    }}
}

crash> eval 0x3edea2b0 - 16
hexadecimal: 3edea2a0 <<<-- The second upid in the list

crash> upid 0x3edea2a0
struct upid {
  nr = 44, <<<--- Our missing pid=44 (kmcheck)
  ns = 0x81d8f8,
  pid_chain = {
    next = 0x0,
    pprev = 0x3f5fb7f8
  }
}

crash> eval 0x3edea2a0 - 48
hexadecimal: 3edea270

crash> pid 3edea270
struct pid {
  count = {
    counter = 5
  },
  level = 0,
  tasks = {{
      first = 0x3e799908 <<<--- Pointer to our task_struct.pids
    }, {
      first = 0x0
    }, {
      first = 0x0
    }},
  rcu = {
    next = 0x5a5a5a5a5a5a5a5a,
    func = 0x5a5a5a5a5a5a5a5a
  },
  numbers = {{
      nr = 44,
      ns = 0x81d8f8,
      pid_chain = {
        next = 0x0,
        pprev = 0x3f5fb7f8
      }
    }}
}

crash> task_struct.pids
struct task_struct {
  [712] struct pid_link pids[3];
}

crash> eval 0x3e799908 - 712
hexadecimal: 3e799640

crash> task_struct 3e799640 | grep comm
  comm = "kmcheck\000\000\000\000\000\000\000\000", <<<--- here it is
---
 task.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/task.c
+++ b/task.c
@@ -2006,7 +2006,7 @@ do_chained:
                 }
 
                 if (pid_tasks_0 == 0)
-                        continue;
+                        goto chain_next;
 
                 next = pid_tasks_0 - OFFSET(task_struct_pids);
 
@@ -2042,7 +2042,7 @@ do_chained:
                 }
 
                 cnt++;
-
+chain_next:
                 if (pnext) {
                         kpp = pnext;
                         upid = pnext - OFFSET(upid_pid_chain);
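The effect described in the message (a chain walk that gives up at the first struct pid whose tasks[0] is NULL, versus one that skips that entry and keeps following pid_chain.next) can be modelled in a few lines of C. This is a simplified, self-contained sketch, not the code from task.c: struct model_pid and count_chain_tasks() are hypothetical names, and the real function reads these values out of the dump rather than from host pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for one entry on a pid_hash[] chain. */
struct model_pid {
    int nr;                         /* upid.nr                        */
    const void *tasks0;             /* pid.tasks[PIDTYPE_PID].first   */
    const struct model_pid *next;   /* upid.pid_chain.next            */
};

/*
 * Count the tasks found on one hash chain.
 * abandon_on_null != 0 models the behaviour reported in the message:
 *   the rest of the chain is dropped at the first NULL tasks[0].
 * abandon_on_null == 0 models the patched behaviour:
 *   the entry is skipped, but the walk continues with the next one.
 */
static int count_chain_tasks(const struct model_pid *head, int abandon_on_null)
{
    const struct model_pid *p = head;
    int cnt = 0;

    while (p != NULL) {
        if (p->tasks0 != NULL)
            cnt++;
        else if (abandon_on_null)
            break;
        p = p->next;
    }
    return cnt;
}

/* The chain from the log: pid 565 (tasks[0] NULL) followed by pid 44. */
static void demo_chain_from_log(void)
{
    static const int kmcheck_task = 0;  /* placeholder for the task_struct */
    const struct model_pid second = { 44, &kmcheck_task, NULL };
    const struct model_pid first = { 565, NULL, &second };

    assert(count_chain_tasks(&first, 1) == 0);  /* pid 44 is missed */
    assert(count_chain_tasks(&first, 0) == 1);  /* pid 44 is found  */
}
```

In the patch itself the same effect is achieved by replacing "continue" with a jump to the new chain_next label, so that the code which advances to pnext is still executed.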

14 years, 8 months


Re: [Crash-utility] crash "bt" and "dmesg" show different messages

by Dave Anderson

----- "Yuming Cheng" <chengyuming_ah(a)yahoo.com.cn> wrote:

> Hi all,
>
> When using kdump, I find crash "bt" and "dmesg" give different info.
> which one is more reliable ?
>
> Thanks,
> ---cym

In this case, the dmesg output is more helpful because it contains
the exception frame. It's pretty clear that the neigh_cleanup_and_release()
has called a destructor function, but the address stored in
neigh->parms->neigh_destructor (as stored in RAX) contains a bogus
address of 0000000000000001:

  static void neigh_cleanup_and_release(struct neighbour *neigh)
  {
          if (neigh->parms->neigh_destructor)
                  neigh->parms->neigh_destructor(neigh);

          __neigh_notify(neigh, RTM_DELNEIGH, 0);
          neigh_release(neigh);
  }

  crash> dis -r neigh_cleanup_and_release+0x13
  0xffffffff8022108d <neigh_cleanup_and_release>:       push   %rbx
  0xffffffff8022108e <neigh_cleanup_and_release+0x1>:   mov    0x10(%rdi),%rax
  0xffffffff80221092 <neigh_cleanup_and_release+0x5>:   mov    %rdi,%rbx
  0xffffffff80221095 <neigh_cleanup_and_release+0x8>:   mov    0x18(%rax),%rax
  0xffffffff80221099 <neigh_cleanup_and_release+0xc>:   test   %rax,%rax
  0xffffffff8022109c <neigh_cleanup_and_release+0xf>:   je     0xffffffff802210a0 <neigh_cleanup_and_release+0x13>
  0xffffffff8022109e <neigh_cleanup_and_release+0x11>:  callq  *%rax
  0xffffffff802210a0 <neigh_cleanup_and_release+0x13>:  lock decl 0x70(%rbx)

If you do a "bt -e" I would guess that the exception frame would be
found and displayed, but it *should* have been displayed in-line by
the "bt" command. I can't tell you why it was not displayed by "bt"
unless I have the dumpfile. You also didn't mention what version of
crash you were running -- there have been a few fixes for "missing"
exception frames.

If you want to make the dumpfile available to me, I can take a look at it.

Dave

>
> dmesg
> /****************************************/
> Unable to handle kernel NULL pointer dereference at 0000000000000001
> RIP: [<0000000000000001>]
> PGD 323c6f067 PUD 323f13067 PMD 0
> Oops: 0010 [1] SMP
> last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
> CPU 6
> Modules linked in: igb(U) bonding ipv6 xfrm_nalgo crypto_api autofs4
> hidp rfcomm l2cap bluetooth lockd sunrpc dm_mirror dm_multipath
> scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi
> acpi_memhotplug ac parport_pc lp parport sg ixgbe pcspkr i2c_i801
> serio_raw i2c_core 8021q dca dm_raid45 dm_message dm_region_hash
> dm_log dm_mod dm_mem_cache ahci libata shpchp mptsas mptscsih mptbase
> scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd
> ehci_hcd
> Pid: 8894, comm: ifconfig Tainted: G 2.6.18-164.el5debug #1
> RIP: 0010:[<0000000000000001>] [<0000000000000001>]
> RSP: 0018:ffff810323dd9cf0 EFLAGS: 00010202
> RAX: 0000000000000001 RBX: ffff81032a8e5b68 RCX: 0000000000000000
> RDX: 0000000000000006 RSI: 0000000000000001 RDI: ffff81032a8e5b68
> RBP: ffff81033aabc850 R08: 0000000000000002 R09: 0000000000000001
> R10: ffff81032a8e5c30 R11: ffffffff80049ee3 R12: ffff81032a8e5ba8
> R13: 0000000000000006 R14: ffff8103238be000 R15: ffffffff8846ad00
> FS: 00002ba7032083f0(0000) GS:ffff810113a9e4c8(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000001 CR3: 00000003242b4000 CR4: 00000000000006e0
> Process ifconfig (pid: 8894, threadinfo ffff810323dd8000, task
> ffff810323f461c0)
> Stack: ffffffff8023de3b ffff81032a8e5b68 ffffffff8023e0d6
> ffffffff8023e122
> ffffffff88468eb0 ffff8103238be000 ffffffff8846ad00 ffff8103238be000
> 0000000000000000 ffffffff8846ae98 ffffffff8023e12d 0000000000000002
> Call Trace:
> [<ffffffff8023de3b>] neigh_cleanup_and_release+0x13/0x2c
> [<ffffffff8023e0d6>] neigh_flush_dev+0x9d/0xc3
> [<ffffffff88439acb>] :ipv6:ndisc_netdev_event+0x30/0x3d
> [<ffffffff8006ae76>] notifier_call_chain+0x20/0x32
> [<ffffffff80238c52>] dev_close+0x6e/0x72
> [<ffffffff80237d24>] dev_change_flags+0x5a/0x119
> [<ffffffff8026cb77>] devinet_ioctl+0x235/0x59c
> [<ffffffff8022f0e3>] sock_ioctl+0x1c7/0x1eb
> [<ffffffff8004465d>] do_ioctl+0x21/0x6b
> [<ffffffff80031f07>] vfs_ioctl+0x45d/0x4bf
> [<ffffffff800c0b9d>] audit_syscall_entry+0x180/0x1b3
> [<ffffffff8004ef9e>] sys_ioctl+0x59/0x78
> [<ffffffff800602a6>] tracesys+0xd5/0xdf
>
> /****************************************/
> crash bt
> crash> bt
> PID: 8894 TASK: ffff810323f461c0 CPU: 6 COMMAND: "ifconfig"
> #0 [ffff810323dd9a50] crash_kexec at ffffffff800b6eae
> #1 [ffff810323dd9b10] __die at ffffffff80069087
> #2 [ffff810323dd9b50] do_page_fault at ffffffff8006ad73
> #3 [ffff810323dd9c40] error_exit at ffffffff80060e9d
> #4 [ffff810323dd9c78] skb_dequeue at ffffffff80049ee3
> #5 [ffff810323dd9cf0] neigh_cleanup_and_release at ffffffff8023de3b
> #6 [ffff810323dd9d00] neigh_flush_dev at ffffffff8023e0d6
> #7 [ffff810323dd9d40] neigh_ifdown at ffffffff8023e12d
> #8 [ffff810323dd9d80] ndisc_netdev_event at ffffffff88439acb
> #9 [ffff810323dd9d90] notifier_call_chain at ffffffff8006ae76
> #10 [ffff810323dd9db0] dev_close at ffffffff80238c52
> #11 [ffff810323dd9dc0] dev_change_flags at ffffffff80237d24
> #12 [ffff810323dd9df0] devinet_ioctl at ffffffff8026cb77
> #13 [ffff810323dd9e90] sock_ioctl at ffffffff8022f0e3
> #14 [ffff810323dd9eb0] do_ioctl at ffffffff8004465d
> #15 [ffff810323dd9ed0] vfs_ioctl at ffffffff80031f07
> #16 [ffff810323dd9f40] sys_ioctl at ffffffff8004ef9e
> #17 [ffff810323dd9f80] tracesys at ffffffff800602a6 (via system_call)
> RIP: 0000003749ccc557 RSP: 00007fff470ea238 RFLAGS: 00000206
> RAX: ffffffffffffffda RBX: ffffffff800602a6 RCX:
> ffffffffffffffff
> RDX: 00007fff470ea240 RSI: 0000000000008914 RDI:
> 0000000000000004
> RBP: 0000000000000000 R8: 00007fff470ea244 R9:
> 0000000000000002
> R10: 0000000000000001 R11: 0000000000000206 R12:
> 00007fff470ea360
> R13: 00000000fffffffe R14: 00007fff470ea530 R15:
> 0000000000000004
> ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
>
>
> --
> Crash-utility mailing list
> Crash-utility(a)redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
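To illustrate why the NULL test in neigh_cleanup_and_release() did not prevent the oops discussed above: a non-NULL check only distinguishes zero from non-zero, so a corrupted pointer value such as 1 passes it and is then called. The sketch below uses simplified, hypothetical types (it is not kernel code) and deliberately never calls the bogus pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*destructor_fn)(void *);

/* Simplified stand-in for the kernel's neigh_parms (assumption: only
 * the one member relevant to this discussion is modelled). */
struct model_parms {
    destructor_fn neigh_destructor;
};

/* Mirrors the "if (neigh->parms->neigh_destructor)" test: it tells
 * NULL from non-NULL and says nothing about whether the address is
 * a valid function. */
static int destructor_would_be_called(const struct model_parms *parms)
{
    assert(parms != NULL);
    return parms->neigh_destructor != NULL;
}

/* Build a parms whose destructor holds a raw address, for example the
 * bogus value 1 seen in RAX. The integer-to-pointer conversion is
 * implementation-defined; the result is never called here. */
static struct model_parms parms_with_raw_address(uintptr_t addr)
{
    struct model_parms p = { (destructor_fn)addr };

    return p;
}
```

With the value from the oops, destructor_would_be_called() reports that the call would be made, which is consistent with the "test %rax,%rax" / "callq *%rax" sequence in the disassembly and with RIP ending up at 0000000000000001.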

14 years, 8 months


crash "bt" and "dmesg" show different messages

by Yuming Cheng

Hi all,

When using kdump, I find crash "bt" and "dmesg" give different info.
which one is more reliable ?

Thanks,
---cym

dmesg
/****************************************/
Unable to handle kernel NULL pointer dereference at 0000000000000001
RIP: [<0000000000000001>]
PGD 323c6f067 PUD 323f13067 PMD 0
Oops: 0010 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 6
Modules linked in: igb(U) bonding ipv6 xfrm_nalgo crypto_api autofs4
hidp rfcomm l2cap bluetooth lockd sunrpc dm_mirror dm_multipath
scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi
acpi_memhotplug ac parport_pc lp parport sg ixgbe pcspkr i2c_i801
serio_raw i2c_core 8021q dca dm_raid45 dm_message dm_region_hash
dm_log dm_mod dm_mem_cache ahci libata shpchp mptsas mptscsih mptbase
scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 8894, comm: ifconfig Tainted: G 2.6.18-164.el5debug #1
RIP: 0010:[<0000000000000001>] [<0000000000000001>]
RSP: 0018:ffff810323dd9cf0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff81032a8e5b68 RCX: 0000000000000000
RDX: 0000000000000006 RSI: 0000000000000001 RDI: ffff81032a8e5b68
RBP: ffff81033aabc850 R08: 0000000000000002 R09: 0000000000000001
R10: ffff81032a8e5c30 R11: ffffffff80049ee3 R12: ffff81032a8e5ba8
R13: 0000000000000006 R14: ffff8103238be000 R15: ffffffff8846ad00
FS: 00002ba7032083f0(0000) GS:ffff810113a9e4c8(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000001 CR3: 00000003242b4000 CR4: 00000000000006e0
Process ifconfig (pid: 8894, threadinfo ffff810323dd8000, task ffff810323f461c0)
Stack: ffffffff8023de3b ffff81032a8e5b68 ffffffff8023e0d6 ffffffff8023e122
 ffffffff88468eb0 ffff8103238be000 ffffffff8846ad00 ffff8103238be000
 0000000000000000 ffffffff8846ae98 ffffffff8023e12d 0000000000000002
Call Trace:
 [<ffffffff8023de3b>] neigh_cleanup_and_release+0x13/0x2c
 [<ffffffff8023e0d6>] neigh_flush_dev+0x9d/0xc3
 [<ffffffff88439acb>] :ipv6:ndisc_netdev_event+0x30/0x3d
 [<ffffffff8006ae76>] notifier_call_chain+0x20/0x32
 [<ffffffff80238c52>] dev_close+0x6e/0x72
 [<ffffffff80237d24>] dev_change_flags+0x5a/0x119
 [<ffffffff8026cb77>] devinet_ioctl+0x235/0x59c
 [<ffffffff8022f0e3>] sock_ioctl+0x1c7/0x1eb
 [<ffffffff8004465d>] do_ioctl+0x21/0x6b
 [<ffffffff80031f07>] vfs_ioctl+0x45d/0x4bf
 [<ffffffff800c0b9d>] audit_syscall_entry+0x180/0x1b3
 [<ffffffff8004ef9e>] sys_ioctl+0x59/0x78
 [<ffffffff800602a6>] tracesys+0xd5/0xdf

/****************************************/
crash bt
crash> bt
PID: 8894 TASK: ffff810323f461c0 CPU: 6 COMMAND: "ifconfig"
 #0 [ffff810323dd9a50] crash_kexec at ffffffff800b6eae
 #1 [ffff810323dd9b10] __die at ffffffff80069087
 #2 [ffff810323dd9b50] do_page_fault at ffffffff8006ad73
 #3 [ffff810323dd9c40] error_exit at ffffffff80060e9d
 #4 [ffff810323dd9c78] skb_dequeue at ffffffff80049ee3
 #5 [ffff810323dd9cf0] neigh_cleanup_and_release at ffffffff8023de3b
 #6 [ffff810323dd9d00] neigh_flush_dev at ffffffff8023e0d6
 #7 [ffff810323dd9d40] neigh_ifdown at ffffffff8023e12d
 #8 [ffff810323dd9d80] ndisc_netdev_event at ffffffff88439acb
 #9 [ffff810323dd9d90] notifier_call_chain at ffffffff8006ae76
#10 [ffff810323dd9db0] dev_close at ffffffff80238c52
#11 [ffff810323dd9dc0] dev_change_flags at ffffffff80237d24
#12 [ffff810323dd9df0] devinet_ioctl at ffffffff8026cb77
#13 [ffff810323dd9e90] sock_ioctl at ffffffff8022f0e3
#14 [ffff810323dd9eb0] do_ioctl at ffffffff8004465d
#15 [ffff810323dd9ed0] vfs_ioctl at ffffffff80031f07
#16 [ffff810323dd9f40] sys_ioctl at ffffffff8004ef9e
#17 [ffff810323dd9f80] tracesys at ffffffff800602a6 (via system_call)
    RIP: 0000003749ccc557 RSP: 00007fff470ea238 RFLAGS: 00000206
    RAX: ffffffffffffffda RBX: ffffffff800602a6 RCX: ffffffffffffffff
    RDX: 00007fff470ea240 RSI: 0000000000008914 RDI: 0000000000000004
    RBP: 0000000000000000 R8: 00007fff470ea244 R9: 0000000000000002
    R10: 0000000000000001 R11: 0000000000000206 R12: 00007fff470ea360
    R13: 00000000fffffffe R14: 00007fff470ea530 R15: 0000000000000004
    ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b

14 years, 8 months


Re: [Crash-utility] [PATCH] Read in .debug_frame section of vmlinux

by Dave Anderson

----- "Wang Chao" <wang.chao(a)cn.fujitsu.com> wrote:

> Hi Dave and all,
>
> The attached patch read the .debug_frame section of vmlinux into
> crash. By doing this, we could backtrace kernel stack frame using
> dwarf unwind information contained in that section.
>
> ChangeLog:
> - Modified section_header_info() function to initialize the offset
> and size of .debug_frame section.
> - Modified init_unwind_table() function to read in .debug_frame.
> - Modified the prototype of unwind(), adding a int argument to
> indicate whether .eh_frame or .debug_frame is used. Also there're
> a few changes due to the difference between these two sections.
>
> Well, it's the first time I came to crash utility, and I hope that
> I didn't do anything wrong.
>
> Thanks,
> Wang Chao

Hello Wang,

The dwarf-based backtrace code is suffering from bit-rot, and in
testing with your patch, I'm seeing a few problems with it on certain
kernels, where several backtraces of active tasks either fail, display
strange frames, show the same frame twice, or just miss frames
entirely. (which is why I kept the two possible x86_64 backtrace
facilities separate...)

But it seems to work for the most part, whereas before it could not
even attempt the backtrace. So anyway, given that the patch does not
affect the default backtrace path, it seems safe enough to add your
patch for those users who wish to run with "unwind" turned on.

Thanks,
  Dave

14 years, 8 months


Re: [Crash-utility] [PATCH] Fix memory leak when accessing radix tree

by Dave Anderson

----- "Wang Chao" <wang.chao(a)cn.fujitsu.com> wrote:

> Hi Dave,
>
> It seems that function do_radix_tree doesn't free dynamic
> allocated memory and in turn cause memory leak.
> Hope that the attached patch would help.
>
> Thanks,
> Wang Chao

The do_radix_tree() function is not used by any other entity in the
crash utility, but could be used by an extension module. It was added
back in 2004 in crash version 3.8-1 -- before I kept the changelog
file -- and I'm afraid I don't even have a record of who wrote the
original version to give them credit. It was updated slightly by
atyson(a)hp.com in 2008 (version 4.0-5.1), but as I recently
discovered, it doesn't work for any kernel after linux-2.6.19, because
it was written with a hardwired dependency on the radix_tree_node
structure never changing, and it had hardwired copies of the kernel's
RADIX_TREE_MAP_SHIFT, RADIX_TREE_MAP_SIZE and RADIX_TREE_MAP_MASK
#define's.

In any case, for the next crash release, I have resurrected its
functionality by re-writing it to not depend upon the radix_tree_node
structure remaining unchanged, and made the 3 #define's dynamically
calculated values. I did so in order to use do_radix_tree() with an
update to the "irq" command, so that it will work with 2.6.34 and
later kernels that have replaced the irq_desc[] array with the
irq_desc_tree radix tree.

Anyway, getting back to your patch, while it's certainly good practice
to call FREEBUF() for any prior GETBUF() call, in reality it's not a
memory leak. All buffers allocated with GETBUF() are *only* in effect
for the lifetime of a particular crash command. If any buffers
allocated by GETBUF() are not explicitly freed by FREEBUF(), they will
be freed automatically by the free_all_bufs() function before the next
"crash> " prompt. So it's impossible for them to cause a memory leak.

But I'll certainly add the FREEBUF() call in your patch.

Thanks,
  Dave
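The per-command buffer lifetime described above can be pictured with a small allocator that remembers every buffer it hands out and releases whatever is left when the command finishes. The following is a minimal sketch of that idea only, under the assumption of a simple linked list; it is not how crash implements GETBUF(), FREEBUF() or free_all_bufs(), and the names cmd_getbuf() and cmd_free_all_bufs() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every buffer. The union forces a
 * conservative alignment so the payload that follows is usable for
 * any ordinary object type. */
union cmd_buf_hdr {
    union cmd_buf_hdr *next;    /* previously allocated buffer */
    long double align;
};

/* Buffers owned by the command that is currently running. */
static union cmd_buf_hdr *cmd_bufs;

/* Hypothetical analogue of GETBUF(): allocate and remember. */
static void *cmd_getbuf(size_t size)
{
    union cmd_buf_hdr *hdr;

    assert(size > 0);
    hdr = malloc(sizeof(*hdr) + size);
    if (hdr == NULL)
        return NULL;
    hdr->next = cmd_bufs;
    cmd_bufs = hdr;
    return hdr + 1;             /* payload starts after the header */
}

/* Hypothetical analogue of free_all_bufs(): called once the command
 * has finished, it releases every buffer that is still registered and
 * returns how many were released. */
static int cmd_free_all_bufs(void)
{
    int freed = 0;

    while (cmd_bufs != NULL) {
        union cmd_buf_hdr *next = cmd_bufs->next;

        free(cmd_bufs);
        cmd_bufs = next;
        freed++;
    }
    return freed;
}
```

In such a scheme a forgotten explicit free only keeps the memory until the end of the current command, which is the point made in the reply; an explicit free remains useful for commands that allocate many buffers in a loop.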

14 years, 8 months


[PATCH] s390dbf: Add -s option for saving s390 debug feature

by Michael Holzheu

Hello Dave,

Could you please include the following patch:

This patch adds a new option "-s" to the s390dbf command. With this option it
is possible to save the content of the s390 debug feature (a driver tracing
infrastructure) to the specified directory. As output exactly the same
directory tree is created as it can be seen on a live system under
"/sys/kernel/debug/s390dbf".

Michael

Signed-off-by: Michael Holzheu <holzheu(a)linux.vnet.ibm.com>
---
 s390dbf.c |  126 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 113 insertions(+), 13 deletions(-)

--- a/s390dbf.c
+++ b/s390dbf.c
@@ -204,6 +204,7 @@ static inline kaddr_t kl_funcaddr(kaddr_
 
 #define LOAD_FLAG (1 << C_LFLG_SHFT)
 #define VIEWS_FLAG (2 << C_LFLG_SHFT)
+#define SAVE_DBF_FLAG (4 << C_LFLG_SHFT)
 
 #ifndef MIN
 #define MIN(a,b) (((a)<(b))?(a):(b))
@@ -215,7 +216,7 @@ static inline kaddr_t kl_funcaddr(kaddr_
 #define DBF_VERSION_V2 2
 #define PAGE_SIZE 4096
 #define DEBUG_MAX_VIEWS 10 /* max number of views in proc fs */
-#define DEBUG_MAX_PROCF_LEN 16 /* max length for a proc file name */
+#define DEBUG_MAX_PROCF_LEN 64 /* max length for a proc file name */
 #define DEBUG_SPRINTF_MAX_ARGS 10
 
 /* define debug-structures for lcrash */
@@ -1039,6 +1040,18 @@ free_debug_info_v2(debug_info_t * db_inf
         free(db_info);
 }
 
+static void
+debug_write_output(debug_info_t *db_info, debug_view_t *db_view, FILE * fp)
+{
+        if (dbf_version == DBF_VERSION_V1) {
+                debug_format_output_v1(db_info, db_view, fp);
+                free_debug_info_v1(db_info);
+        } else {
+                debug_format_output_v2(db_info, db_view, fp);
+                free_debug_info_v2(db_info);
+        }
+}
+
 static int
 get_debug_areas(void)
 {
@@ -1140,13 +1153,7 @@ list_one_view(char *area_name, char *vie
                 fprintf(cmd->efp, "View '%s' not registered!\n", view_name);
                 return -1;
         }
-        if(dbf_version == DBF_VERSION_V1){
-                debug_format_output_v1(db_info, db_view, cmd->ofp);
-                free_debug_info_v1(db_info);
-        } else {
-                debug_format_output_v2(db_info, db_view, cmd->ofp);
-                free_debug_info_v2(db_info);
-        }
+        debug_write_output(db_info, db_view, cmd->ofp);
         return 0;
 }
 
@@ -1222,6 +1229,86 @@ load_debug_view(const char *path, comman
 }
 #endif
 
+static int
+save_one_view(const char *dbf_dir_name, const char *area_name,
+              const char *view_name, command_t *cmd)
+{
+        char path_view[PATH_MAX];
+        debug_info_t *db_info;
+        debug_view_t *db_view;
+        FILE *view_fh;
+
+        db_info = find_debug_area(area_name);
+        if (db_info == NULL) {
+                fprintf(cmd->efp, "Debug log '%s' not found!\n", area_name);
+                return -1;
+        }
+        db_info = get_debug_info(db_info->addr, 1);
+
+        db_view = find_lcrash_debug_view(view_name);
+        if (db_view == NULL) {
+                fprintf(cmd->efp, "View '%s' not registered!\n", view_name);
+                return -1;
+        }
+        sprintf(path_view, "%s/%s/%s", dbf_dir_name, area_name, view_name);
+        view_fh = fopen(path_view, "w");
+        if (view_fh == NULL) {
+                fprintf(cmd->efp, "Could not create file: %s (%s)\n",
+                        path_view, strerror(errno));
+                return -1;
+        }
+        debug_write_output(db_info, db_view, view_fh);
+        fclose(view_fh);
+        return 0;
+}
+
+static int
+save_one_area(const char *dbf_dir_name, const char *area_name, command_t *cmd)
+{
+        char dir_name_area[PATH_MAX];
+        debug_info_t *db_info;
+        int i;
+
+        db_info = find_debug_area(area_name);
+        if (db_info == NULL) {
+                fprintf(cmd->efp, "Debug log '%s' not found!\n", area_name);
+                return -1;
+        }
+        sprintf(dir_name_area, "%s/%s", dbf_dir_name, area_name);
+        if (mkdir(dir_name_area, S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH) != 0) {
+                fprintf(cmd->efp, "Could not create directory: %s (%s)\n",
+                        dir_name_area, strerror(errno));
+                return -1;
+        }
+        for (i = 0; i < DEBUG_MAX_VIEWS; i++) {
+                if (db_info->views[i] == NULL)
+                        continue;
+                if (!find_lcrash_debug_view(db_info->views[i]->name))
+                        continue;
+                save_one_view(dbf_dir_name, area_name, db_info->views[i]->name,
+                              cmd);
+        }
+        return 0;
+}
+
+static void
+save_dbf(const char *dbf_dir_name, command_t *cmd)
+{
+        debug_info_t *act_debug_info = debug_area_first;
+        FILE *ofp = cmd->ofp;
+
+        fprintf(ofp, "Saving s390dbf to directory \"%s\"\n", dbf_dir_name);
+        if (mkdir(dbf_dir_name, S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH) != 0) {
+                fprintf(cmd->efp, "Could not create directory: %s (%s)\n",
+                        dbf_dir_name, strerror(errno));
+                return;
+        }
+        while (act_debug_info != NULL) {
+                save_one_area(dbf_dir_name, act_debug_info->name, cmd);
+                act_debug_info = act_debug_info->next;
+        }
+}
+
 /*
  * s390dbf_cmd() -- Run the 's390dbf' command.
  */
@@ -1272,6 +1359,14 @@ s390dbf_cmd(command_t * cmd)
         if(get_debug_areas() == -1)
                 return -1;
 
+        if (cmd->flags & SAVE_DBF_FLAG) {
+                if (cmd->nargs != 2) {
+                        fprintf(cmd->efp, "Specify directory name for -s\n");
+                        return 1;
+                }
+                save_dbf(cmd->args[1], cmd);
+                return 0;
+        }
         switch (cmd->nargs) {
         case 0:
                 rc = list_areas(cmd->ofp);
@@ -1289,7 +1384,7 @@ s390dbf_cmd(command_t * cmd)
         return rc;
 }
 
-#define _S390DBF_USAGE " [-v] [debug log] [debug view]"
+#define _S390DBF_USAGE " [-v] [-s dirname] [debug log] [debug view]"
 
 /*
  * s390dbf_usage() -- Print the usage string for the 's390dbf' command.
@@ -1307,17 +1402,19 @@ s390dbf_usage(command_t * cmd)
 char *help_s390dbf[] = {
         "s390dbf",
         "s390dbf prints out debug feature logs",
-        "[-v] [debug_log] [debug_log view]",
+        "[-v] [-s dirname] [debug log] [debug view]"
         "",
         "Display Debug logs:",
         " + If called without parameters, all active debug logs are listed.",
-        " + If called with '-v', all debug views which are available to",
-        " 'crash' are listed",
         " + If called with the name of a debug log, all debug-views for which",
         " the debug-log has registered are listed. It is possible thatsome",
         " of the debug views are not available to 'crash'.",
         " + If called with the name of a debug-log and an available viewname,",
         " the specified view is printed.",
+        " + If called with '-s dirname', the s390dbf is saved to the specified",
+        " directory",
+        " + If called with '-v', all debug views which are available to",
+        " 'crash' are listed",
         NULL
 };
 
@@ -1336,11 +1433,14 @@ void cmd_s390dbf()
         for (i=1; i < argcnt; i++)
                 cmd.args[i-1] = args[i];
 
-        while ((c = getopt(argcnt, args, "v")) != EOF) {
+        while ((c = getopt(argcnt, args, "vs")) != EOF) {
                 switch(c) {
                 case 'v':
                         cmd.flags |= VIEWS_FLAG;
                         break;
+                case 's':
+                        cmd.flags |= SAVE_DBF_FLAG;
+                        break;
                 default:
                         s390dbf_usage(&cmd);
                         return;
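The patch builds the output paths ("<dirname>/<debug log>/<view>") with sprintf() into PATH_MAX-sized buffers. As a side note, one possible bounded variant of that path construction is sketched below. It is an illustration only and not part of the posted patch; build_view_path() is a hypothetical helper, and the directory, area and view names used with it are made-up examples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<dir>/<area>/<view>" into out (capacity outsz bytes).
 * Returns 0 on success, or -1 if the result would not fit or
 * formatting failed. On -1 the contents of out must not be used
 * as a path.
 */
static int build_view_path(char *out, size_t outsz, const char *dir,
                           const char *area, const char *view)
{
    int n;

    assert(out != NULL && outsz > 0);
    n = snprintf(out, outsz, "%s/%s/%s", dir, area, view);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return 0;
}
```

A caller would check the return value before passing the buffer to fopen(), for example build_view_path(path, sizeof(path), "/tmp/dbf", "area0", "view0"), and report an error instead of creating a file when the name is too long.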

14 years, 8 months


[PATCH] Fix memory leak when accessing radix tree

by Wang Chao

Hi Dave,

It seems that function do_radix_tree doesn't free dynamic
allocated memory and in turn cause memory leak.
Hope that the attached patch would help.

Thanks,
Wang Chao

14 years, 8 months


[PATCH] Read in .debug_frame section of vmlinux

by Wang Chao

Hi Dave and all,

The attached patch read the .debug_frame section of vmlinux into
crash. By doing this, we could backtrace kernel stack frame using
dwarf unwind information contained in that section.

ChangeLog:
 - Modified section_header_info() function to initialize the offset
   and size of .debug_frame section.
 - Modified init_unwind_table() function to read in .debug_frame.
 - Modified the prototype of unwind(), adding a int argument to
   indicate whether .eh_frame or .debug_frame is used. Also there're
   a few changes due to the difference between these two sections.

Well, it's the first time I came to crash utility, and I hope that
I didn't do anything wrong.

Thanks,
Wang Chao

14 years, 8 months

