crash version 4.0-8.11 is available
by Dave Anderson
- Kdump ELF vmcores contain NT_PRSTATUS notes for online cpus only, so
if cpus have been offlined prior to a crash, there will be fewer
notes than the number of cpus in the system, and therefore there will
not be a one-to-one correlation between each cpu and its associated
NT_PRSTATUS note. That causes backtrace failures for architectures
like ppc64 that depend upon the contents of the NT_PRSTATUS notes for
gathering the starting stack location.
(chandru(a)in.ibm.com, anderson(a)redhat.com)
- Fix and enhancement for the "dev" command. When the command was run
against 2.6.26 or later kernels, it would fail with the error message
"dev: invalid structure member offset: char_device_struct_fops".
Additionally, even when the command did work, more often than not it
would fail to determine the file_operations structure associated with
the block or character device, and erroneously display "(none)" or
"(unused)". This patch makes a more comprehensive search for the
file_operations structure, and instead of just displaying its address
and symbolic translation, it will display the address of the data
structure that contains the pointer to the file_operations structure,
along with the symbolic translation of the file_operations structure.
For character devices, the containing structure is a "cdev", and for
block devices the containing structure is a "gendisk". The command
output adds new CDEV and GENDISK columns, and under the OPERATIONS
column is the symbolic translation of its file_operations structure.
(anderson(a)redhat.com, bob.montgomery(a)hp.com)
- Fix for a potential segmentation violation when running "foreach bt"
on a very active live system with many processes starting and ending.
Without the patch, a segmentation violation could occur when a "bt"
was attempted on a task that had become non-existent. This would
happen on x86_64 or ppc64 machines, and was due to the usage of a
kernel stack pointer taken from a stale/invalid task_struct. The
command will now recognize the bad stack pointer and display the
error message "bt: task no longer exists" or "bt: invalid/stale
stack pointer for this task: <address>".
(anderson(a)redhat.com)
- Fix to correctly read LKCD Version 8 and later x86 dumpfile headers.
(talk90091e(a)gmail.com)
- If a kdump NMI issued to a non-crashing x86_64 cpu was received while
running in schedule(), after having set the next task as "current" in
the cpu's runqueue, but prior to changing the kernel stack to that of
the next task, then a backtrace would fail to make the transition
from the NMI exception stack back to the process stack, with the
error message "bt: cannot transition from exception stack to current
process stack". This patch will report inconsistencies found between
a task marked as the current task in a cpu's runqueue, and the task
found in the per-cpu x8664_pda "pcurrent" field (2.6.29 and earlier)
or the per-cpu "current_task" variable (2.6.30 and later). If it can
be safely determined that the runqueue setting (used by default) is
premature, then the crash utility's internal per-cpu active task will
be changed to be the task indicated by the appropriate architecture
specific value. Also, a new "set -a <task>" option has been added
to manually set a task to be the "active" task on its cpu.
(anderson(a)redhat.com)
- Fix for x86_64 "bt" command when transitioning from the IRQ stack
back to the process stack on 2.6.29 and later kernels. Without the
patch, the interrupt exception frame address on the process stack
would be incorrectly determined, and its display would typically be
preceded by "[exception RIP: unknown or invalid address]", and the
backtrace would fail from that point on.
(anderson(a)redhat.com)
- Enhancement to the "runq" command to show the current task in each
cpu's runqueue, plus a few formatting changes to make the output
easier to understand.
(anderson(a)redhat.com)
- Fix for a memory leak when running on live systems, due to the
repetitive reallocation of the internal array of active tasks.
(anderson(a)redhat.com)
- Fix for usage with vmlinux debuginfo files using Dwarf 3 format,
for example, the Fedora 2.6.31-0.24.rc0.git18.fc12 kernel. Without
the patch, the crash session fails during initialization with the
error message: "Dwarf Error: wrong version in compilation unit header
(is 3, should be 2) [in module <path-to>/vmlinux]", followed by
the erroneous message "crash: <path-to>/vmlinux: no debugging
data available". The patch simply accepts the Dwarf 3 header, and
the embedded gdb-6.1 version still appears to work with the updated
vmlinux debuginfo file format.
(anderson(a)redhat.com)
- Fix for faulty invocation failure when a System.map file is used as
an argument with a compressed diskdump or compressed kdump dumpfile.
If the System.map argument appears after the vmcore file on the
command line, as in: "crash vmcore System.map vmlinux", the crash
session fails immediately with the error message: "crash: vmcore:
initialization failed". With the patch, the arguments may be entered
in any order.
(anderson(a)redhat.com)
- Fix for a potential segmentation violation during invocation if a
vmcore file, a System.map file, and a non-matching vmlinux file are
used as command line arguments. The problem is that whenever a
System.map file is used, it is presumed that the user knows what he
is doing, and that the vmlinux file is not the same as the kernel
that generated the vmcore; therefore the vmlinux/vmcore matching and
verification routines are not performed. However, if the kernel data
structures in the non-matching vmlinux vary widely enough from the
kernel that generated the vmcore, all manner of bogus data may be
read and consumed. The reported segmentation violation occurred when
using a vmcore created from a "stock" Red Hat kernel with a vmlinux
file from a Red Hat "debug" kernel, where the kernel data structures
are significantly different. The patch adds several new defensive
mechanisms, and displays additional warning messages, when invalid or
questionable data is read, and as a result the crash session will fail
in a more reasonable manner.
(anderson(a)redhat.com)
- Adjusted several virtual and physical memory address definitions for
2.6.31 x86_64 kernels: MAX_PHYSMEM_BITS, VMALLOC_START, VMALLOC_END,
VMEMMAP_VADDR, VMEMMAP_END, MODULES_VADDR and MODULES_END. Without
the patch, when run against CONFIG_SPARSEMEM_VMEMMAP 2.6.31 kernels,
the "kmem -i" option would hang, and when run against CONFIG_SLUB and
CONFIG_SPARSEMEM_VMEMMAP 2.6.31 kernels, the "kmem -s" option would
report numerous errors indicating "kmem: read error: kernel virtual
address: <address> type: page inuse", where the <address> was
a legitimate virtual-memmap page structure address.
(anderson(a)redhat.com)
- Improvement for CONFIG_SLUB "kmem -s" or "kmem -S" options when an
invalid slab page link address is encountered. Without the patch,
the commands fail with a generic "invalid kernel virtual address"
read error message, and "kmem -s" would not display any previously
collected statistics. With the patch, the error message displays
the slab cache name, the list type, and the invalid pointer found,
for example, "kmem: dentry: partial list: page.lru.next: 100100".
(anderson(a)redhat.com)
Download from: http://people.redhat.com/anderson
15 years, 5 months
Re: nr_cpus is not calculated properly
by Dave Anderson
----- "Wei Jiang" <talk90091e(a)gmail.com> wrote:
> >
> In my test, I did not see any exceptions else due to my 32bits dump
> file is corrupted. As you know, an incorrect nr_cpus will
> cause some of the following fields (dha_smp_current_task, dha_stack) to
> point to the wrong location, which is a potential defect that
> may show up in the future.
Actually I don't know -- so that's why I asked. I almost never see
an LKCD dumpfile.
Anyway, the fix is queued for the next release.
Thanks,
Dave
15 years, 5 months
Re: [Crash-utility] [RFC][PATCH]: crash aborts with cannot determine idle task
by Dave Anderson
----- "Chandru" <chandru(a)in.ibm.com> wrote:
> > Yes, I tested these changes and they work fine.
> >
> > Thanks,
> > Chandru
>
> Hello Dave,
>
> Could you please let me know if these changes will make it into the next
> version of the crash utility?
Yes they will -- I just wanted your sign-off before I checked them in.
Thanks again,
Dave
15 years, 5 months
Re: [Crash-utility] [RFC][PATCH]: crash aborts with cannot determine idle task
by Dave Anderson
----- "Chandru" <chandru(a)in.ibm.com> wrote:
> Hi Dave,
>
> Thanks a lot for catching the segfault issue and finding the root cause for it. Here
> follows the updated patch taking in the suggestions from the review comments.
>
> kdump installs NT_PRSTATUS notes into the vmcore file only for the cpus that were
> online at the time of the crash. In such cases, while reading in the notes from the
> dump file, we are unsure of the cpu-to-NT_PRSTATUS mapping. The cpu
> possible, present and online maps are not available until cpu_maps_init() initializes
> them. Hence we remap the prstatus pointer array to the online cpus just after
> the call to that function.
>
> Signed-off-by: Chandru Siddalingappa <chandru(a)linux.vnet.ibm.com>
> Reviewed-by: Dave Anderson <anderson(a)redhat.com>
> Cc: Haren Myneni <haren(a)us.ibm.com>
> ---
This one looks good. The only change that I will make is
in the map_cpu_prstatus() function -- which should just return
immediately if get_cpus_online() is equal to nd->num_prstatus_notes.
Thanks,
Dave
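For anyone following the thread without the crash sources at hand, the
remapping that Chandru describes can be sketched in isolation. This is only
an illustration under stated assumptions, not the code that goes into crash:
demo_remap_notes(), DEMO_MAX_CPUS and the plain online[] array are made-up
stand-ins for nd->nt_prstatus_percpu, NR_CPUS and in_cpu_map(ONLINE, i), and
the early return mentioned above is left out. Unlike the patch, the sketch
clears every per-cpu slot and stops handing out notes once they run out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_CPUS 16	/* hypothetical cap, stands in for NR_CPUS */

/*
 * notes[]  : on entry, the first num_notes slots hold the notes in the
 *            order they were read from the dumpfile; the array must
 *            have room for nr_cpus entries.
 * online[] : nonzero for each cpu that was online at crash time.
 * Returns the number of notes that were assigned to a cpu slot.
 */
static int
demo_remap_notes(void *notes[], int num_notes, const int online[], int nr_cpus)
{
	void *saved[DEMO_MAX_CPUS];
	int i, j;

	assert(num_notes >= 0 && num_notes <= nr_cpus);
	assert(nr_cpus <= DEMO_MAX_CPUS);

	/* Save the packed notes, then clear every per-cpu slot. */
	memcpy(saved, notes, num_notes * sizeof(notes[0]));
	memset(notes, 0, nr_cpus * sizeof(notes[0]));

	/* Hand out the saved notes, in order, to the online cpus only. */
	for (i = 0, j = 0; i < nr_cpus && j < num_notes; i++)
		if (online[i])
			notes[i] = saved[j++];

	return j;
}
```

With cpus 0 and 2 online and two notes in the dumpfile, the second note ends
up in slot 2 and slots 1 and 3 stay NULL, which is what the modified check in
get_netdump_regs_ppc64() relies on.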
>
> --- crash-4.0-8.10/ppc64.c.orig 2009-06-08 16:08:09.000000000 +0530
> +++ crash-4.0-8.10/ppc64.c 2009-06-09 15:45:39.000000000 +0530
> @@ -2407,13 +2407,16 @@ ppc64_paca_init(void)
> if (!symbol_exists("paca"))
> error(FATAL, "PPC64: Could not find 'paca' symbol\n");
>
> - if (cpu_map_addr("present"))
> + if (cpu_map_addr("possible"))
> + map = POSSIBLE;
> + else if (cpu_map_addr("present"))
> map = PRESENT;
> else if (cpu_map_addr("online"))
> map = ONLINE;
> else
> - error(FATAL,
> - "PPC64: cannot find 'cpu_present_map' or 'cpu_online_map' symbols\n");
> + error(FATAL,
> + "PPC64: cannot find 'cpu_possible_map' or\
> + 'cpu_present_map' or 'cpu_online_map' symbols\n");
>
> if (!MEMBER_EXISTS("paca_struct", "data_offset"))
> return;
> @@ -2423,8 +2426,8 @@ ppc64_paca_init(void)
>
> cpu_paca_buf = GETBUF(SIZE(ppc64_paca));
>
> - if (!(nr_paca = get_array_length("paca", NULL, 0)))
> - nr_paca = NR_CPUS;
> + if (!(nr_paca = get_array_length("paca", NULL, 0)))
> + nr_paca = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);
>
> if (nr_paca > NR_CPUS) {
> error(WARNING,
> @@ -2435,7 +2438,7 @@ ppc64_paca_init(void)
>
> for (i = cpus = 0; i < nr_paca; i++) {
> /*
> - * CPU present (or online)?
> + * CPU present or online or can exist in the system(possible)?
> */
> if (!in_cpu_map(map, i))
> continue;
> --- crash-4.0-8.10/kernel.c.orig 2009-06-08 16:07:53.000000000 +0530
> +++ crash-4.0-8.10/kernel.c 2009-06-09 15:01:51.000000000 +0530
> @@ -74,6 +74,9 @@ kernel_init()
>
> cpu_maps_init();
>
> + if (KDUMP_DUMPFILE())
> + map_cpu_prstatus();
> +
> kt->stext = symbol_value("_stext");
> kt->etext = symbol_value("_etext");
> get_text_init_space();
> --- crash-4.0-8.10/netdump.c.orig 2009-06-08 16:07:58.000000000 +0530
> +++ crash-4.0-8.10/netdump.c 2009-06-09 16:24:52.000000000 +0530
> @@ -45,6 +45,38 @@ static void check_dumpfile_size(char *);
> (machine_type("IA64") || machine_type("PPC64"))
>
> /*
> + * kdump installs NT_PRSTATUS elf notes only to the cpus
> + * that were online during dumping. Hence we call into
> + * this function after reading the cpu map from the kernel,
> + * to remap the NT_PRSTATUS notes only to the online cpus
> + */
> +void map_cpu_prstatus(void)
> +{
> + void *nt_ptr;
> + int i, j, nrcpus;
> +
> + /* temporary buffer to hold the prstatus_percpu array */
> + if ((nt_ptr = (void *)calloc(nd->num_prstatus_notes,
> + sizeof(void *))) == NULL)
> + error(FATAL,
> + "cannot allocate a buffer to hold prstatus_percpu array\n");
> +
> + memcpy((void *)nt_ptr, nd->nt_prstatus_percpu,
> + (nd->num_prstatus_notes * sizeof(void *)));
> + memset(nd->nt_prstatus_percpu, 0,
> + (nd->num_prstatus_notes * sizeof(void *)));
> +
> + nrcpus = (kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS);
> +
> + /* re-populate the array with the notes mapping to online cpus */
> + for (i = 0, j = 0; i < nrcpus; i++)
> + if (in_cpu_map(ONLINE, i))
> + ((unsigned long *)nd->nt_prstatus_percpu)[i] =
> + ((unsigned long *)nt_ptr)[j++];
> + free(nt_ptr);
> +}
> +
> +/*
> * Determine whether a file is a netdump/diskdump/kdump creation,
> * and if TRUE, initialize the vmcore_data structure.
> */
> @@ -618,7 +650,7 @@ get_netdump_panic_task(void)
> crashing_cpu = -1;
> if (kernel_symbol_exists("crashing_cpu")) {
> get_symbol_data("crashing_cpu", sizeof(int), &i);
> - if ((i >= 0) && (i < nd->num_prstatus_notes)) {
> + if ((i >= 0) && in_cpu_map(ONLINE, i)) {
> crashing_cpu = i;
> if (CRASHDEBUG(1))
> error(INFO,
> @@ -2236,7 +2268,7 @@ get_netdump_regs_ppc64(struct bt_info *b
> * CPUs if they responded to an IPI.
> */
> if (nd->num_prstatus_notes > 1) {
> - if (bt->tc->processor >= nd->num_prstatus_notes)
> + if (!nd->nt_prstatus_percpu[bt->tc->processor])
> error(FATAL,
> "cannot determine NT_PRSTATUS ELF note "
> "for %s task: %lx\n",
15 years, 5 months
Re: nr_cpus is not calculated properly
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> ----- "Wei Jiang" <talk90091e(a)gmail.com> wrote:
...
> > So this line
> > 140 nr_cpus = (hdr_size - offset) / sizeof(dump_CPU_info_t);
> >
> > would not get a correct nr_cpus due to the sizeof().
> >
> > A patch to fix this problem is below.
>
> BTW, what exactly are the ramifications without the patch -- does the
> crash session die during initialization? How come nobody ran into
> this issue given that the code has been in place for almost 2 years?
Again -- what actually happens as a result of the incorrect nr_cpus calculation?
I need something to put in the crash.changelog.
Dave
15 years, 5 months
Re: [Crash-utility] Re: nr_cpus is not calculated properly
by Dave Anderson
----- "Bernhard Walle" <bernhard.walle(a)gmx.de> wrote:
> Dave Anderson wrote:
> >>
> >> As we know, on 32-bit x86, uint32_t is 4 bytes and uint64_t is 8
> >> bytes.
> >>
> >> So this line
> >> 140 nr_cpus = (hdr_size - offset) / sizeof(dump_CPU_info_t);
> >>
> >> would not get a correct nr_cpus due to the sizeof().
> >>
> >> A patch to fix this problem is below.
> >
> > BTW, what exactly are the ramifications without the patch -- does the
> > crash session die during initialization? How come nobody ran into
> > this issue given that the code has been in place for almost 2 years?
>
> >
> > 4.0-4.8 - ...
> >
> > - Change for support of LKCD dumpfile version 8 and later to determine
> > the backtrace starting registers from the dumpfile header. Increase
> > (maximum) NR_CPUS for ia64 to 4096.
> > (bwalle(a)suse.de)
> >
> > ...
> >
> > (10/30/07)
> >
> > Anyway, the patch looks reasonable to me, but I don't touch the LKCD
> > code without a sign-off from the LKCD maintainers on this mailing list.
> >
> > LKCD maintainers -- do you have any objection to this patch?
>
> Sorry for that mistake, it was me. :-(
>
> It's a copy & paste error (the members were just copied from the
> dump_header_asm_t definition above). And I acknowledge the patch (from
> reading it; I have no test material here any more). Troy may give the
> ultimate acknowledgment. ;-)
>
> Regards,
> Bernhard
Good, thanks Bernhard -- it looked pretty obvious, and I'll put it in.
I still wish the guy had indicated exactly what the failure mode was.
It looks like, at a minimum, there could be one or two LKCD-specific
warning messages during initialization, but the crash session should
still come up, right? Given that "nr_cpus" is a local variable and
has nothing to do with crash utility's determination of how many cpus
there are, I wonder what other problems might arise?
Dave
15 years, 5 months
Re: nr_cpus is not calculated properly
by Dave Anderson
----- "Wei Jiang" <talk90091e(a)gmail.com> wrote:
> Hi,
>
> I found that nr_cpus is not calculated properly on 32-bit x86 in
> crash-4.0-8.9.
>
> Around line 140 in file lkcd_v8.c:
> 137 * to find out how many CPUs are configured.
> 138 */
> 139 offset = offsetof(dump_header_asm_t, dha_smp_regs[0]);
> 140 nr_cpus = (hdr_size - offset) / sizeof(dump_CPU_info_t);
> 141
> 142 fprintf(stderr, "CPU number NR_CPUS %d \n", NR_CPUS);
> 143 fprintf(stderr, "header_asm_t size %d \n",
> sizeof(dump_header_asm_t));
>
> And in the corresponding header file:
> # cat -n lkcd_dump_v8.h|grep -A 20 434
> 434 /* smp specific */
> 435 uint32_t dha_smp_num_cpus;
> 436 uint32_t dha_dumping_cpu;
> 437 struct pt_regs dha_smp_regs[NR_CPUS];
> 438 uint32_t dha_smp_current_task[NR_CPUS];
> 439 uint32_t dha_stack[NR_CPUS];
> 440 uint32_t dha_stack_ptr[NR_CPUS];
> 441 } __attribute__((packed)) dump_header_asm_t;
> 442
> 443 /*
> 444 * CPU specific part of dump_header_asm_t
> 445 */
> 446 typedef struct dump_CPU_info_s {
> 447 struct pt_regs dha_smp_regs;
> 448 uint64_t dha_smp_current_task;
> 449 uint64_t dha_stack;
> 450 uint64_t dha_stack_ptr;
> 451 } __attribute__ ((packed)) dump_CPU_info_t;
> 452
> 453
> 454 /*
>
> As we know, on 32-bit x86, uint32_t is 4 bytes and uint64_t is 8
> bytes.
>
> So this line
> 140 nr_cpus = (hdr_size - offset) / sizeof(dump_CPU_info_t);
>
> would not get a correct nr_cpus due to the sizeof().
>
> A patch to fix this problem is below.
BTW, what exactly are the ramifications without the patch -- does the
crash session die during initialization? How come nobody ran into
this issue given that the code has been in place for almost 2 years?
4.0-4.8 - ...
- Change for support of LKCD dumpfile version 8 and later to determine
the backtrace starting registers from the dumpfile header. Increase
(maximum) NR_CPUS for ia64 to 4096.
(bwalle(a)suse.de)
...
(10/30/07)
Anyway, the patch looks reasonable to me, but I don't touch the LKCD
code without a sign-off from the LKCD maintainers on this mailing
list.
LKCD maintainers -- do you have any objection to this patch?
Thanks,
Dave
>
> Thanks.
> -Wj
>
> --- lkcd_dump_v8.h.orig 2009-04-16 13:14:22.000000000 -0400
> +++ lkcd_dump_v8.h 2009-06-10 03:31:37.815122032 -0400
> @@ -445,9 +445,9 @@ typedef struct _dump_header_asm_s {
> */
> typedef struct dump_CPU_info_s {
> struct pt_regs dha_smp_regs;
> - uint64_t dha_smp_current_task;
> - uint64_t dha_stack;
> - uint64_t dha_stack_ptr;
> + uint32_t dha_smp_current_task;
> + uint32_t dha_stack;
> + uint32_t dha_stack_ptr;
> } __attribute__ ((packed)) dump_CPU_info_t;
15 years, 5 months
nr_cpus is not calculated properly
by Wei Jiang
Hi,
I found that nr_cpus is not calculated properly on 32-bit x86 in
crash-4.0-8.9.
Around line 140 in file lkcd_v8.c:
137 * to find out how many CPUs are configured.
138 */
139 offset = offsetof(dump_header_asm_t, dha_smp_regs[0]);
140 nr_cpus = (hdr_size - offset) / sizeof(dump_CPU_info_t);
141
142 fprintf(stderr, "CPU number NR_CPUS %d \n", NR_CPUS);
143 fprintf(stderr, "header_asm_t size %d \n",
sizeof(dump_header_asm_t));
And in the corresponding header file:
# cat -n lkcd_dump_v8.h|grep -A 20 434
434 /* smp specific */
435 uint32_t dha_smp_num_cpus;
436 uint32_t dha_dumping_cpu;
437 struct pt_regs dha_smp_regs[NR_CPUS];
438 uint32_t dha_smp_current_task[NR_CPUS];
439 uint32_t dha_stack[NR_CPUS];
440 uint32_t dha_stack_ptr[NR_CPUS];
441 } __attribute__((packed)) dump_header_asm_t;
442
443 /*
444 * CPU specific part of dump_header_asm_t
445 */
446 typedef struct dump_CPU_info_s {
447 struct pt_regs dha_smp_regs;
448 uint64_t dha_smp_current_task;
449 uint64_t dha_stack;
450 uint64_t dha_stack_ptr;
451 } __attribute__ ((packed)) dump_CPU_info_t;
452
453
454 /*
As we know, on 32-bit x86, uint32_t is 4 bytes and uint64_t is 8
bytes.
So this line
140 nr_cpus = (hdr_size - offset) / sizeof(dump_CPU_info_t);
would not get a correct nr_cpus due to the sizeof().
A patch to fix this problem is below.
Thanks.
-Wj
--- lkcd_dump_v8.h.orig 2009-04-16 13:14:22.000000000 -0400
+++ lkcd_dump_v8.h 2009-06-10 03:31:37.815122032 -0400
@@ -445,9 +445,9 @@ typedef struct _dump_header_asm_s {
*/
typedef struct dump_CPU_info_s {
struct pt_regs dha_smp_regs;
- uint64_t dha_smp_current_task;
- uint64_t dha_stack;
- uint64_t dha_stack_ptr;
+ uint32_t dha_smp_current_task;
+ uint32_t dha_stack;
+ uint32_t dha_stack_ptr;
} __attribute__ ((packed)) dump_CPU_info_t;
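To make the arithmetic behind the report concrete, here is a small sketch of
the same calculation with the per-cpu record size as a parameter. It is an
illustration only: struct demo_pt_regs is a made-up stand-in (the real 32-bit
pt_regs layout is not shown in this thread), DEMO_NR_CPUS is an arbitrary
value, and all demo_* names are hypothetical. The packed attribute is the
GCC/Clang spelling used in the header quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS 8		/* hypothetical; the real NR_CPUS differs */

/* Stand-in for the 32-bit x86 pt_regs, whose layout is not shown above. */
struct demo_pt_regs {
	uint32_t regs[17];
};

/* Same shape as the tail of dump_header_asm_t: 32-bit per-cpu fields. */
typedef struct {
	uint32_t dha_smp_num_cpus;
	uint32_t dha_dumping_cpu;
	struct demo_pt_regs dha_smp_regs[DEMO_NR_CPUS];
	uint32_t dha_smp_current_task[DEMO_NR_CPUS];
	uint32_t dha_stack[DEMO_NR_CPUS];
	uint32_t dha_stack_ptr[DEMO_NR_CPUS];
} __attribute__((packed)) demo_header_asm_t;

/* Per-cpu record as it was before the patch (64-bit fields). */
typedef struct {
	struct demo_pt_regs dha_smp_regs;
	uint64_t dha_smp_current_task;
	uint64_t dha_stack;
	uint64_t dha_stack_ptr;
} __attribute__((packed)) demo_cpu_info_old_t;

/* Per-cpu record as proposed by the patch (32-bit fields). */
typedef struct {
	struct demo_pt_regs dha_smp_regs;
	uint32_t dha_smp_current_task;
	uint32_t dha_stack;
	uint32_t dha_stack_ptr;
} __attribute__((packed)) demo_cpu_info_new_t;

/* The calculation from line 140, with the per-cpu size passed in. */
static size_t
demo_nr_cpus(size_t hdr_size, size_t offset, size_t per_cpu_size)
{
	assert(per_cpu_size > 0 && hdr_size >= offset);
	return (hdr_size - offset) / per_cpu_size;
}
```

With these stand-in sizes the per-cpu data in the header is 80 bytes, the old
record is 92 bytes and the new one is 80 bytes, so dividing by the old size
undercounts the cpus while dividing by the new size gives DEMO_NR_CPUS.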
15 years, 5 months
Re: [Crash-utility] [RFC][PATCH]: crash aborts with cannot determine idle task
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> And lastly, when I run crash with this patch against a set of x86_64-only
> dumpfiles, I get a segmentation violation like this on certain kdump
> kernels:
>
> ...
> please wait... (determining panic task)
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000051c79c in get_netdump_panic_task () at netdump.c:719
> 719 len = roundup(len + note64->n_namesz, 4);
> (gdb) bt
> #0 0x000000000051c79c in get_netdump_panic_task () at netdump.c:719
> #1 0x0000000000521ae5 in get_kdump_panic_task () at netdump.c:2316
> #2 0x00000000004a5550 in get_dumpfile_panic_task () at task.c:5493
> #3 0x00000000004a51b1 in panic_search () at task.c:5386
> #4 0x00000000004a2ef6 in get_panic_context () at task.c:4574
> #5 0x00000000004974ee in task_init () at task.c:456
> #6 0x0000000000449e3a in main_loop () at main.c:536
> ...
>
> And if I remove the call to map_prstatus_array(), it works OK again.
>
> I haven't dug into what changed to cause the problem though...
The problem is this memset() statement, which makes no sense:
+void map_prstatus_array(void)
+{
+ void *nt_ptr;
+ int i, j;
+
+ /* temporary buffer to hold the prstatus_percpu array */
+ if ((nt_ptr = (void *)calloc(nd->num_prstatus_notes,
+ sizeof(void *))) == NULL)
+ error(FATAL,
+ "cannot allocate a buffer to hold prstatus_percpu array\n");
+
+ memcpy((void *)nt_ptr, nd->nt_prstatus_percpu,
+ nd->num_prstatus_notes * sizeof(void *));
+ memset(nd->nt_prstatus_percpu, 0, nd->num_prstatus_notes);
...because it zeroes out only the first few bytes (as many bytes as there are
NT_PRSTATUS sections) of the first entry in the array. So for example, here's
a before-and-after of the contents of a kdump's nd->nt_prstatus_percpu[] array
which has just 2 NT_PRSTATUS sections:
before memset():
1d9f5dc8 1d9f5f2c 0 0 0 0 0 0 0 0 0 0 0
after memset():
1d9f0000 1d9f5f2c 0 0 0 0 0 0 0 0 0 0 0
And then depending upon whether the resultant virtual address actually exists
in the crash utility's virtual address space, it craps out in get_netdump_panic_task()
when it tries to access the faulty address.
Dave
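The difference comes down to the third argument of memset(), which is a byte
count and not an entry count. A minimal side-by-side sketch, with made-up
function names (demo_clear_buggy and demo_clear_fixed are not part of crash):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* As in the draft patch: "entries" is passed where memset() wants bytes. */
static void
demo_clear_buggy(void **array, size_t entries)
{
	assert(array != NULL);
	memset(array, 0, entries);
}

/* Byte count scaled by the entry size, matching the memcpy() before it. */
static void
demo_clear_fixed(void **array, size_t entries)
{
	assert(array != NULL);
	memset(array, 0, entries * sizeof(array[0]));
}
```

With two entries, the buggy form touches only two bytes of the first pointer
and leaves the second pointer as it was, while the fixed form clears both.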
15 years, 5 months
Re: [Crash-utility] dev command deteriorates with new kernels
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> ----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
>
> > On Fri, 2009-06-05 at 14:53 +0000, Dave Anderson wrote:
> >
> > >
> > > I've attached what I'm going with. I've added the capability of getting
> > > the file_operations from the cdev_map when necessary. The block device
> > > code was also suffering from bit-rot as well, and so I put in a new
> > > collector function that uses the bdev_map as well.
> >
> > Dave, this looks good. Two issues:
> >
> > 1) Add "-f" to dev help? (What does it mean to still be a "(none)" device?)
>
> It means that a pointer to a file_operations either doesn't exist
> or that I have no clue how to find it... For the hell of it I
> added that -f flag to show those devices in case somebody's
> interested.
>
> >
> > 2) The old code found the block extended device number (a feature added
> > to the kernel by a 25 Aug 2008 patch from Tejun Heo):
> >
> > 259 blkext (unknown)
> >
> > Also shown in /proc/devices:
> > ...
> > Block devices:
> > 1 ramdisk
> > 259 blkext
> > 7 loop
> > 11 sr
> > 104 cciss0
> >
> > Deliberate omission?
>
> I did see that, and I forget now how the old code found it (although the
> function still exists), but the structures being used now are bdev_map.probes[]
> and major_names[]:
>
> crash> whatis struct kobj_map
> struct kobj_map {
> struct probe *probes[255];
> struct mutex *lock;
> }
> SIZE: 2048
> crash> whatis major_names
> struct blk_major_name *major_names[255];
> crash>
>
> where the kernel's kobj_map.probes[] array size is just hardwired to 255,
> and the major_names[] array size is BLKDEV_MAJOR_HASH_SIZE which is 255.
> So obviously 259 won't be found.
Correction -- it does appear in the major_names[] array, in a 2.6.30
kernel for example, like this:
crash> p * major_names[4]
$51 = {
next = 0x0,
major = 259,
name = "blkext\000\000\000\000\000\000\000\000\000"
}
where it appears to be the only major_names[] entry whose "major" value
doesn't equal the index into the array (i.e., 259 != 4). But the
bdev_map.probes[4] entry is unused.
Dave
>
> If you want to figure out how to show it, send me a patch.
>
> At this point I'm about ready to deprecate the whole command... ;-)
>
> Dave
>
>
>
> >
> > Thanks for cleaning this up,
> > Bob Montgomery
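On why major 259 turns up at major_names[4] in the correction above: as far
as I can tell, the kernel treats major_names[] as a hash table indexed by
major % BLKDEV_MAJOR_HASH_SIZE, with colliding entries chained through
"next", and 259 % 255 is 4. A lookup along those lines might look like the
sketch below. The demo_* names and the struct are stand-ins modeled on the
"p * major_names[4]" output, not crash or kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HASH_SIZE 255	/* stands in for BLKDEV_MAJOR_HASH_SIZE */

/* Modeled on the blk_major_name fields displayed above. */
struct demo_major_name {
	struct demo_major_name *next;
	int major;
	char name[16];
};

/* Hash a major number into a table slot. */
static unsigned int
demo_major_to_index(unsigned int major)
{
	return major % DEMO_HASH_SIZE;
}

/* Walk the chain in the hashed slot looking for an exact major match. */
static const struct demo_major_name *
demo_find_major(struct demo_major_name *const table[], unsigned int major)
{
	const struct demo_major_name *p;

	assert(table != NULL);
	for (p = table[demo_major_to_index(major)]; p; p = p->next)
		if ((unsigned int)p->major == major)
			return p;
	return NULL;
}
```

A collector that walks each slot's chain and reports the stored "major" field,
rather than assuming the major equals the array index, would pick up blkext
as 259.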
15 years, 5 months