Re: [Crash-utility] dev command deteriorates with new kernels
by Dave Anderson
----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> On Mon, 2009-06-08 at 19:50 +0000, Dave Anderson wrote:
>
> Regarding blkext 259
> >
> > Correction -- it does appear in the major_names[] array, in a 2.6.30
> > kernel for example, like this:
> >
> > crash> p * major_names[4]
> > $51 = {
> > next = 0x0,
> > major = 259,
> > name = "blkext\000\000\000\000\000\000\000\000\000"
> > }
> >
> > where it appears to be the only major_names[] entry whose "major" value
> > doesn't equal the index into the array (i.e., 259 != 4). But the
> > bdev_map.probes[4] entry is unused.
>
> The blkext is supposed to get used as soon as someone wants to put more
> than 15 partitions on an sd disk. (Explanation at
> http://thread.gmane.org/gmane.linux.kernel/701825)
>
> Then the upper partitions will appear as minors of 259. When that
> starts happening more, having a way to see it might (*might*) be
> interesting.
>
> > > At this point I'm about ready to deprecate the whole command... ;-)
>
> And at this point I couldn't really argue much :-) I obviously hadn't
> used the command for a while. I was casting around, hoping it would
> maybe help me find the diskstats stuff for the block devices. I was
> trying to answer a question like "How much disk activity was going on
> before this crashed?"
>
> When I tried the dev command and it bombed at the chrdev stuff before
> getting to the blkdev stuff, I was motivated to get it going enough to
> see if it would help answer my question, but it doesn't print anything
> (like hd_struct pointers) that could help me with my problem. But I
> figured someone had once wanted that device info, so it might as well
> work...
OK, I understand...
But anyway, given the command's current capability, do you think that the
alternative -f option should just be thrown out, and that all devices
should be dumped regardless of whether they have file_operations associated
with them?
Dave
15 years, 6 months
Re: [Crash-utility] dev command deteriorates with new kernels
by Dave Anderson
Bob,
Removing these lines from my proposed dump_blkdevs_v3() will at least show
the blkext with "dev -f":
if (major != i)
continue;
Like this:
crash> dev
CHRDEV NAME OPERATIONS
1 mem ffffffff804f6100 <memory_fops>
4 /dev/vc/0 ffffffff804f6e00 <console_fops>
4 tty ffffffff804f6d20 <tty_fops>
4 ttyS ffffffff804f6d20 <tty_fops>
5 /dev/tty ffffffff804f6d20 <tty_fops>
5 /dev/console ffffffff804f6e00 <console_fops>
5 /dev/ptmx ffffffff80ce48a0 <ptmx_fops>
7 vcs ffffffff804f75c0 <vcs_fops>
10 misc ffffffff804f74e0 <misc_fops>
13 input ffffffff805017a0 <input_fops>
14 sound ffffffffa0146700 <soundcore_fops>
21 sg ffffffffa02bce60 <sg_fops>
29 fb ffffffff804ee860 <fb_fops>
116 alsa ffffffffa0154440 <snd_fops>
128 ptm ffffffff804f6d20 <tty_fops>
136 pts ffffffff804f6d20 <tty_fops>
162 raw ffffffff804f8560 <raw_fops>
180 usb ffffffff805009e0 <usb_fops>
189 usb_device ffffffff80500b00 <usbdev_file_operations>
202 cpu/msr ffffffff804d00c0 <msr_fops>
203 cpu/cpuid ffffffff804d01a0 <cpuid_fops>
251 rtc ffffffffa020c800 <rtc_dev_fops>
253 usbmon ffffffff80501460 <mon_fops_binary>
254 pcmcia ffffffff80500680 <ds_fops>
BLKDEV NAME OPERATIONS
1 ramdisk ffffffff80670740 <brd_fops>
2 fd ffffffffa01e13a0 <floppy_fops>
3 ide0 ffffffffa0278de0 <idecd_ops>
8 sd ffffffffa0091ea0 <sd_fops>
crash>
crash> dev -f
CHRDEV NAME OPERATIONS
1 mem ffffffff804f6100 <memory_fops>
4 /dev/vc/0 ffffffff804f6e00 <console_fops>
4 tty ffffffff804f6d20 <tty_fops>
4 ttyS ffffffff804f6d20 <tty_fops>
5 /dev/tty ffffffff804f6d20 <tty_fops>
5 /dev/console ffffffff804f6e00 <console_fops>
5 /dev/ptmx ffffffff80ce48a0 <ptmx_fops>
7 vcs ffffffff804f75c0 <vcs_fops>
10 misc ffffffff804f74e0 <misc_fops>
13 input ffffffff805017a0 <input_fops>
14 sound ffffffffa0146700 <soundcore_fops>
21 sg ffffffffa02bce60 <sg_fops>
29 fb ffffffff804ee860 <fb_fops>
116 alsa ffffffffa0154440 <snd_fops>
128 ptm ffffffff804f6d20 <tty_fops>
136 pts ffffffff804f6d20 <tty_fops>
162 raw ffffffff804f8560 <raw_fops>
180 usb ffffffff805009e0 <usb_fops>
189 usb_device ffffffff80500b00 <usbdev_file_operations>
202 cpu/msr ffffffff804d00c0 <msr_fops>
203 cpu/cpuid ffffffff804d01a0 <cpuid_fops>
251 rtc ffffffffa020c800 <rtc_dev_fops>
252 usb_endpoint (none)
253 usbmon ffffffff80501460 <mon_fops_binary>
254 pcmcia ffffffff80500680 <ds_fops>
BLKDEV NAME OPERATIONS
1 ramdisk ffffffff80670740 <brd_fops>
2 fd ffffffffa01e13a0 <floppy_fops>
3 ide0 ffffffffa0278de0 <idecd_ops>
259 blkext (none)
8 sd ffffffffa0091ea0 <sd_fops>
9 md (none)
65 sd (none)
66 sd (none)
67 sd (none)
68 sd (none)
69 sd (none)
70 sd (none)
71 sd (none)
128 sd (none)
129 sd (none)
130 sd (none)
131 sd (none)
132 sd (none)
133 sd (none)
134 sd (none)
135 sd (none)
253 device-mapper (none)
254 mdp (none)
crash>
Is that preferable?
Dave
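For readers following along, here is a minimal sketch of why dropping the "major != i" check makes blkext visible. It is not the dump_blkdevs_v3() code from the attachment. It assumes a simplified in-memory copy of major_names[] (crash itself reads these structures from the dump), and the function name collect_majors() and the skip_mismatch flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLKDEV_MAJOR_HASH_SIZE 255

/* Simplified stand-in for the kernel's struct blk_major_name. */
struct blk_major_name {
    struct blk_major_name *next;
    int major;
    char name[16];
};

/*
 * Walk every bucket of the table and every entry chained off it,
 * storing each registered major in out[] (at most max entries).
 * When skip_mismatch is non-zero, entries whose major differs from
 * the bucket index are skipped, which hides any major >= 255.
 * Returns the number of majors stored.
 */
size_t collect_majors(struct blk_major_name *table[BLKDEV_MAJOR_HASH_SIZE],
                      int skip_mismatch, int *out, size_t max)
{
    size_t n = 0;

    assert(table != NULL && out != NULL);
    for (int i = 0; i < BLKDEV_MAJOR_HASH_SIZE; i++) {
        for (struct blk_major_name *p = table[i]; p; p = p->next) {
            if (skip_mismatch && p->major != i)
                continue;               /* the check being removed */
            if (n < max)
                out[n++] = p->major;
        }
    }
    return n;
}
```

Because the walk is in bucket order, an entry stored in bucket 4 is reported before major 8, which matches where blkext lands in the "dev -f" listing above.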
15 years, 6 months
Re: [Crash-utility] dev command deteriorates with new kernels
by Dave Anderson
----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> On Fri, 2009-06-05 at 14:53 +0000, Dave Anderson wrote:
>
> >
> > I've attached what I'm going with. I've added the capability of getting
> > the file_operations from the cdev_map when necessary. The block device
> > code was also suffering from bit-rot as well, and so I put in a new
> > collector function that uses the bdev_map as well.
>
> Dave, this looks good. Two issues:
>
> 1) Add "-f" to dev help? (What does it mean to still be a "(none)"
> device?)
It means that a pointer to a file_operations structure either doesn't exist,
or that I have no clue how to find it... For the hell of it I
added that -f flag to show those devices in case somebody's interested.
>
> 2) The old code found the block extended device number (a feature added
> to the kernel by a 25 Aug 2008 patch from Tejun Heo):
>
> 259 blkext (unknown)
>
> Also shown in /proc/devices:
> ...
> Block devices:
> 1 ramdisk
> 259 blkext
> 7 loop
> 11 sr
> 104 cciss0
>
> Deliberate omission?
I did see that, and I forget now how the old code found it (although the
function still exists), but the structures being used now are bdev_map.probes[]
and major_names[]:
crash> whatis struct kobj_map
struct kobj_map {
struct probe *probes[255];
struct mutex *lock;
}
SIZE: 2048
crash> whatis major_names
struct blk_major_name *major_names[255];
crash>
where the kernel's kobj_map.probes[] array size is just hardwired to 255,
and the major_names[] array size is BLKDEV_MAJOR_HASH_SIZE which is 255.
So obviously 259 won't be found.
If you want to figure out how to show it, send me a patch.
At this point I'm about ready to deprecate the whole command... ;-)
Dave
>
> Thanks for cleaning this up,
> Bob Montgomery
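As the follow-up correction in this thread notes, blkext does show up in major_names[], at index 4. A minimal sketch of the arithmetic, assuming the table is indexed by the major number modulo BLKDEV_MAJOR_HASH_SIZE, as the kernel's major_to_index() helper in block/genhd.c does. The find_major() function and the simplified structure are hypothetical illustrations operating on a local copy of the table, not crash code.

```c
#include <assert.h>
#include <stddef.h>

#define BLKDEV_MAJOR_HASH_SIZE 255

/* Simplified stand-in for the kernel's struct blk_major_name. */
struct blk_major_name {
    struct blk_major_name *next;
    int major;
    char name[16];
};

/* Bucket index for a major number: 259 % 255 == 4. */
unsigned int major_to_index(unsigned int major)
{
    return major % BLKDEV_MAJOR_HASH_SIZE;
}

/* Look up one major by hashing to its bucket and walking the chain. */
const struct blk_major_name *
find_major(struct blk_major_name *const *table, unsigned int major)
{
    assert(table != NULL);
    for (const struct blk_major_name *p = table[major_to_index(major)];
         p; p = p->next)
        if ((unsigned int)p->major == major)
            return p;
    return NULL;
}
```

With this indexing, a major below 255 sits in the bucket equal to its own number, and only larger majors such as 259 end up in a bucket whose index differs from the major.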
15 years, 6 months
dev command deteriorates with new kernels
by Bob Montgomery
With respect to character devices:
Sometime way back in the 2.6ish kernel, the fops field of struct
char_device_struct ceased to be used for anything, and on those kernels
(like 2.6.18), dev always reports the OPERATIONS field as "(none)":
crash-4.0-8.10> dev
CHRDEV NAME OPERATIONS
1 mem (none)
2 pty (none)
3 ttyp (none)
4 /dev/vc/0 (none)
4 tty (none)
4 ttyS (none)
5 /dev/tty (none)
5 /dev/console (none)
5 /dev/ptmx (none)
7 vcs (none)
10 misc (none)
13 input (none)
21 sg (none)
29 fb (none)
...
Then recently (by 2.6.28, anyway), someone noticed that the fops field
was unused and removed it from struct char_device_struct altogether, so
now we get:
crash-4.0-8.10> dev
dev: invalid structure member offset: char_device_struct_fops
FILE: dev.c LINE: 221 FUNCTION: dump_chrdevs()
[/home/bobm/bin/crash-4.0-8.10] error trace: 452e75 => 4cf528 => 4cf957 => 50d8d3
CHRDEV NAME OPERATIONS
50d8d3: OFFSET_verify+168
4cf957: dump_chrdevs+1043
4cf528: cmd_dev+198
452e75: exec_command+306
dev: invalid structure member offset: char_device_struct_fops
FILE: dev.c LINE: 221 FUNCTION: dump_chrdevs()
crash-4.0-8.10>
The attached patch changes the behavior in both cases to something like
this:
crash-4.0-8.10fix> dev
CHRDEV NAME OPERATIONS
1 mem ffffffff8043ee00 <memory_fops>
2 pty (none)
3 ttyp (none)
4 /dev/vc/0 (none)
4 tty (none)
4 ttyS (none)
5 /dev/tty (none)
5 /dev/console (none)
5 /dev/ptmx (none)
7 vcs ffffffff8043ff40 <vcs_fops>
10 misc ffffffff8043fe40 <misc_fops>
13 input ffffffff805413c0 <input_fops>
21 sg (none)
29 fb ffffffff80531c60 <fb_fops>
Which is definitely an improvement.
But wondering about those remaining (none) entries, I found that if I
pursue info through the cdev_map with a series of crash commands like
this:
crash-4.0-8.10fix> p (*cdev_map.probes[2]).data
$7 = (void *) 0xffff81013a66c408
crash-4.0-8.10fix> p (*(struct cdev *)0xffff81013a66c408).ops
$8 = (const struct file_operations *) 0xffffffff8043f860
crash-4.0-8.10fix> sym 0xffffffff8043f860
ffffffff8043f860 (r) tty_fops
crash-4.0-8.10fix> p (*cdev_map.probes[21]).data
$9 = (void *) 0xffff81013acf8d80
crash-4.0-8.10fix> p (*(struct cdev *)0xffff81013acf8d80).ops
$10 = (const struct file_operations *) 0xffffffff8820ee00
crash-4.0-8.10fix> sym 0xffffffff8820ee00
ffffffff8820ee00 (d) sg_fops
I can come up with believable file_ops values for (most? all?) of the
others. And this leads me to wonder if crash shouldn't be collecting
this info in a similar manner to fill in the other OPERATIONS fields.
But now I'm quite a bit past what I know about how the character device
stuff works.
Thanks,
Bob Montgomery
Working at HP
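The manual steps above (probes[major], then data, then the cdev's ops) can be sketched as a small lookup. This is only an illustration under stated assumptions: the structures are cut down to the members the walk needs, the probe carries a plain "major" field instead of the kernel's dev/range pair, and chrdev_fops() is a hypothetical name. Real crash code would read each structure from the dump rather than dereference pointers directly.

```c
#include <assert.h>
#include <stddef.h>

#define PROBE_HASH_SIZE 255

/* Placeholder body; only the address of a file_operations matters here. */
struct file_operations { int placeholder; };

/* Simplified stand-ins holding only the members the walk needs. */
struct cdev     { const struct file_operations *ops; };
struct probe    { struct probe *next; unsigned int major; void *data; };
struct kobj_map { struct probe *probes[PROBE_HASH_SIZE]; };

/*
 * Return the file_operations pointer registered for a character major,
 * or NULL if no probe for that major carries a cdev.
 */
const struct file_operations *
chrdev_fops(const struct kobj_map *map, unsigned int major)
{
    assert(map != NULL);
    for (const struct probe *p = map->probes[major % PROBE_HASH_SIZE];
         p; p = p->next) {
        if (p->major != major || p->data == NULL)
            continue;
        /* For character devices, data is taken to point at a struct cdev. */
        return ((const struct cdev *)p->data)->ops;
    }
    return NULL;
}
```

A NULL result corresponds to the "(none)" case in the dev output: either nothing is registered in the map for that major, or the probe has no cdev attached.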
15 years, 6 months
Re: [Crash-utility] [RFC][PATCH]: crash aborts with cannot determine idle task
by Dave Anderson
----- "Chandru" <chandru(a)in.ibm.com> wrote:
> This thread relates to an old issue discussed earlier here ...
> https://www.redhat.com/archives/crash-utility/2008-April/msg00007.html.
> The following patch currently fixes the issue. The kernel's possible, present,
> and online CPU maps are not available until cpu_maps_init() initializes them. Hence
> we remap the nd->nt_prstatus_percpu array to the online CPUs right after the call to
> this function.
A couple of points:
> Signed-off-by: Chandru Siddalingappa <chandru(a)linux.vnet.ibm.com>
> Cc: Haren Myneni <haren(a)us.ibm.com>
> ---
>
> --- crash-4.0-8.10/ppc64.c.orig 2009-06-08 16:08:09.000000000 +0530
> +++ crash-4.0-8.10/ppc64.c 2009-06-08 18:47:04.000000000 +0530
> @@ -2407,13 +2407,11 @@ ppc64_paca_init(void)
> if (!symbol_exists("paca"))
> error(FATAL, "PPC64: Could not find 'paca' symbol\n");
>
> - if (cpu_map_addr("present"))
> - map = PRESENT;
> - else if (cpu_map_addr("online"))
> - map = ONLINE;
> + if (cpu_map_addr("possible"))
> + map = POSSIBLE;
> else
> error(FATAL,
> - "PPC64: cannot find 'cpu_present_map' or 'cpu_online_map' symbols\n");
> + "PPC64: cannot find 'cpu_possible_map' symbol\n");
>
> if (!MEMBER_EXISTS("paca_struct", "data_offset"))
> return;
Depending upon "cpu_possible_map" breaks backwards-compatibility for old
kernels that don't even have a "cpu_possible_map". The function will still
need to fall back to *something* that exists instead of killing the whole
crash session.
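One way to picture the fallback being asked for is a simple preference list. This is a sketch only: the order possible, present, online is an assumption, the enum values and choose_cpu_map() are hypothetical names, and the callback stands in for a symbol check such as cpu_map_addr().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers; crash has its own constants for these maps. */
enum cpu_map { MAP_NONE = 0, MAP_POSSIBLE, MAP_PRESENT, MAP_ONLINE };

/* Returns non-zero if the named cpu map exists in the kernel being examined. */
typedef int (*map_exists_fn)(const char *name);

/* Pick the first map that exists, in order of preference. */
enum cpu_map choose_cpu_map(map_exists_fn exists)
{
    static const struct { const char *name; enum cpu_map map; } order[] = {
        { "possible", MAP_POSSIBLE },
        { "present",  MAP_PRESENT  },
        { "online",   MAP_ONLINE   },
    };

    assert(exists != NULL);
    for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++)
        if (exists(order[i].name))
            return order[i].map;
    return MAP_NONE;    /* the caller decides whether this is fatal */
}

/* Stand-in callbacks modelling kernels with different sets of maps. */
int only_online(const char *name) { return strcmp(name, "online") == 0; }
int all_maps(const char *name)    { (void)name; return 1; }
int no_maps(const char *name)     { (void)name; return 0; }
```

With this shape, a kernel lacking "cpu_possible_map" still yields a usable map, and the fatal error is reserved for the case where none of the three exists.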
> @@ -2424,7 +2422,7 @@ ppc64_paca_init(void)
> cpu_paca_buf = GETBUF(SIZE(ppc64_paca));
>
> if (!(nr_paca = get_array_length("paca", NULL, 0)))
> - nr_paca = NR_CPUS;
> + nr_paca = kt->kernel_NR_CPUS;
>
> if (nr_paca > NR_CPUS) {
> error(WARNING,
It is possible that kt->kernel_NR_CPUS may not even be initialized
at this point in time -- and for that matter, it is possible that
kt->kernel_NR_CPUS may *never* be initialized. For that reason,
wherever it is used, the code first checks for a non-zero value,
and if it is zero, defaults to the compiled-in, equal-to-or-higher
value of NR_CPUS.
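The check described above amounts to something like the following sketch. The structure is cut down to the single member involved, the NR_CPUS value is a placeholder rather than the real compiled-in limit, and effective_nr_cpus() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 1024    /* placeholder for the compiled-in maximum */

/* Only the member used here; the real kernel_table has many more. */
struct kernel_table { int kernel_NR_CPUS; };

/*
 * Use the kernel's own NR_CPUS when it has been determined (non-zero),
 * otherwise fall back to the compiled-in value.
 */
int effective_nr_cpus(const struct kernel_table *kt)
{
    assert(kt != NULL);
    return kt->kernel_NR_CPUS ? kt->kernel_NR_CPUS : NR_CPUS;
}
```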
> @@ -2435,7 +2433,7 @@ ppc64_paca_init(void)
>
> for (i = cpus = 0; i < nr_paca; i++) {
> /*
> - * CPU present (or online)?
> + * CPU in possible map ?
> */
> if (!in_cpu_map(map, i))
> continue;
> --- crash-4.0-8.10/kernel.c.orig 2009-06-08 16:07:53.000000000 +0530
> +++ crash-4.0-8.10/kernel.c 2009-06-08 16:48:53.000000000 +0530
> @@ -74,6 +74,9 @@ kernel_init()
>
> cpu_maps_init();
>
> + if (KDUMP_DUMPFILE())
> + map_prstatus_array();
> +
> kt->stext = symbol_value("_stext");
> kt->etext = symbol_value("_etext");
> get_text_init_space();
> --- crash-4.0-8.10/netdump.c.orig 2009-06-08 16:07:58.000000000 +0530
> +++ crash-4.0-8.10/netdump.c 2009-06-08 17:40:36.000000000 +0530
> @@ -45,6 +45,35 @@ static void check_dumpfile_size(char *);
> (machine_type("IA64") || machine_type("PPC64"))
>
> /*
> + * kdump installs NT_PRSTATUS elf sections only to the cpus
> + * that were online during dumping. Hence we call into
> + * this function after reading the cpu map from the kernel,
> + * to remap the NT_PRSTATUS sections only to the online cpus
> + */
> +void map_prstatus_array(void)
> +{
> + void *nt_ptr;
> + int i, j;
> +
> + /* temporary buffer to hold the prstatus_percpu array */
> + if ((nt_ptr = (void *)calloc(nd->num_prstatus_notes,
> + sizeof(void *))) == NULL)
> + error(FATAL,
> + "cannot allocate a buffer to hold prstatus_percpu array\n");
> +
> + memcpy((void *)nt_ptr, nd->nt_prstatus_percpu,
> + nd->num_prstatus_notes * sizeof(void *));
> + memset(nd->nt_prstatus_percpu, 0, nd->num_prstatus_notes);
> +
> + /* re-populate the array with the sections mapping to online cpus */
> + for (i = 0, j = 0; i < kt->kernel_NR_CPUS; i++)
> + if (in_cpu_map(ONLINE, i))
> + ((unsigned long *)nd->nt_prstatus_percpu)[i] =
> + ((unsigned long *)nt_ptr)[j++];
> + free(nt_ptr);
> +}
Same thing with kt->kernel_NR_CPUS usage above...
> +
> +/*
> * Determine whether a file is a netdump/diskdump/kdump creation,
> * and if TRUE, initialize the vmcore_data structure.
> */
> @@ -618,7 +647,7 @@ get_netdump_panic_task(void)
> crashing_cpu = -1;
> if (kernel_symbol_exists("crashing_cpu")) {
> get_symbol_data("crashing_cpu", sizeof(int), &i);
> - if ((i >= 0) && (i < nd->num_prstatus_notes)) {
> + if ((i >= 0) && in_cpu_map(ONLINE, i)) {
> crashing_cpu = i;
> if (CRASHDEBUG(1))
> error(INFO,
> @@ -2236,7 +2265,7 @@ get_netdump_regs_ppc64(struct bt_info *b
> * CPUs if they responded to an IPI.
> */
> if (nd->num_prstatus_notes > 1) {
> - if (bt->tc->processor >= nd->num_prstatus_notes)
> + if (!nd->nt_prstatus_percpu[bt->tc->processor])
> error(FATAL,
> "cannot determine NT_PRSTATUS ELF note "
> "for %s task: %lx\n",
>
And lastly, when I run a crash binary built with this patch against a set of
x86_64-only dumpfiles, I get a segmentation violation like this on certain
kdump kernels:
...
please wait... (determining panic task)
Program received signal SIGSEGV, Segmentation fault.
0x000000000051c79c in get_netdump_panic_task () at netdump.c:719
719 len = roundup(len + note64->n_namesz, 4);
(gdb) bt
#0 0x000000000051c79c in get_netdump_panic_task () at netdump.c:719
#1 0x0000000000521ae5 in get_kdump_panic_task () at netdump.c:2316
#2 0x00000000004a5550 in get_dumpfile_panic_task () at task.c:5493
#3 0x00000000004a51b1 in panic_search () at task.c:5386
#4 0x00000000004a2ef6 in get_panic_context () at task.c:4574
#5 0x00000000004974ee in task_init () at task.c:456
#6 0x0000000000449e3a in main_loop () at main.c:536
...
And if I remove the call to map_prstatus_array(), it works OK again.
I haven't dug into what changed to cause the problem though...
Dave
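This is not a diagnosis of the SIGSEGV above. It is only a defensive sketch of the remapping idea, assuming the notes are stored densely (one per online cpu) at the front of a pointer array whose capacity is passed in explicitly. One observable detail in the quoted patch is that its memset() length is nd->num_prstatus_notes, an element count rather than a byte count; the sketch clears the whole array instead. The name remap_notes() and the online[] byte array are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Spread the first num_notes entries of notes[] across notes[0..nr_cpus-1]
 * so that notes[cpu] is the note for that cpu, or NULL if the cpu was
 * offline. online[cpu] is non-zero for online cpus.
 * Returns 0 on success, or -1 (leaving notes[] untouched) if the counts
 * are inconsistent or the temporary buffer cannot be allocated.
 */
int remap_notes(void **notes, size_t nr_cpus, size_t num_notes,
                const unsigned char *online)
{
    size_t n_online = 0, j = 0;
    void **saved;

    assert(notes != NULL && online != NULL);
    if (num_notes > nr_cpus)
        return -1;
    for (size_t cpu = 0; cpu < nr_cpus; cpu++)
        if (online[cpu])
            n_online++;
    if (n_online > num_notes)
        return -1;              /* would read past the saved notes */

    saved = calloc(num_notes ? num_notes : 1, sizeof(*saved));
    if (saved == NULL)
        return -1;
    memcpy(saved, notes, num_notes * sizeof(*saved));

    /* Clear every slot, not just the first num_notes bytes. */
    for (size_t cpu = 0; cpu < nr_cpus; cpu++)
        notes[cpu] = NULL;

    for (size_t cpu = 0; cpu < nr_cpus; cpu++)
        if (online[cpu])
            notes[cpu] = saved[j++];

    free(saved);
    return 0;
}
```

Validating the counts before touching the array means a dumpfile whose note count does not match the online map is left as it was, rather than being partially rewritten.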
15 years, 6 months
[RFC][PATCH]: crash aborts with cannot determine idle task
by Chandru
This thread relates to an old issue discussed earlier here ...
https://www.redhat.com/archives/crash-utility/2008-April/msg00007.html.
The following patch currently fixes the issue. The kernel's possible, present,
and online CPU maps are not available until cpu_maps_init() initializes them. Hence
we remap the nd->nt_prstatus_percpu array to the online CPUs right after the call to
this function.
Signed-off-by: Chandru Siddalingappa <chandru(a)linux.vnet.ibm.com>
Cc: Haren Myneni <haren(a)us.ibm.com>
---
--- crash-4.0-8.10/ppc64.c.orig 2009-06-08 16:08:09.000000000 +0530
+++ crash-4.0-8.10/ppc64.c 2009-06-08 18:47:04.000000000 +0530
@@ -2407,13 +2407,11 @@ ppc64_paca_init(void)
if (!symbol_exists("paca"))
error(FATAL, "PPC64: Could not find 'paca' symbol\n");
- if (cpu_map_addr("present"))
- map = PRESENT;
- else if (cpu_map_addr("online"))
- map = ONLINE;
+ if (cpu_map_addr("possible"))
+ map = POSSIBLE;
else
error(FATAL,
- "PPC64: cannot find 'cpu_present_map' or 'cpu_online_map' symbols\n");
+ "PPC64: cannot find 'cpu_possible_map' symbol\n");
if (!MEMBER_EXISTS("paca_struct", "data_offset"))
return;
@@ -2424,7 +2422,7 @@ ppc64_paca_init(void)
cpu_paca_buf = GETBUF(SIZE(ppc64_paca));
if (!(nr_paca = get_array_length("paca", NULL, 0)))
- nr_paca = NR_CPUS;
+ nr_paca = kt->kernel_NR_CPUS;
if (nr_paca > NR_CPUS) {
error(WARNING,
@@ -2435,7 +2433,7 @@ ppc64_paca_init(void)
for (i = cpus = 0; i < nr_paca; i++) {
/*
- * CPU present (or online)?
+ * CPU in possible map ?
*/
if (!in_cpu_map(map, i))
continue;
--- crash-4.0-8.10/kernel.c.orig 2009-06-08 16:07:53.000000000 +0530
+++ crash-4.0-8.10/kernel.c 2009-06-08 16:48:53.000000000 +0530
@@ -74,6 +74,9 @@ kernel_init()
cpu_maps_init();
+ if (KDUMP_DUMPFILE())
+ map_prstatus_array();
+
kt->stext = symbol_value("_stext");
kt->etext = symbol_value("_etext");
get_text_init_space();
--- crash-4.0-8.10/netdump.c.orig 2009-06-08 16:07:58.000000000 +0530
+++ crash-4.0-8.10/netdump.c 2009-06-08 17:40:36.000000000 +0530
@@ -45,6 +45,35 @@ static void check_dumpfile_size(char *);
(machine_type("IA64") || machine_type("PPC64"))
/*
+ * kdump installs NT_PRSTATUS elf sections only to the cpus
+ * that were online during dumping. Hence we call into
+ * this function after reading the cpu map from the kernel,
+ * to remap the NT_PRSTATUS sections only to the online cpus
+ */
+void map_prstatus_array(void)
+{
+ void *nt_ptr;
+ int i, j;
+
+ /* temporary buffer to hold the prstatus_percpu array */
+ if ((nt_ptr = (void *)calloc(nd->num_prstatus_notes,
+ sizeof(void *))) == NULL)
+ error(FATAL,
+ "cannot allocate a buffer to hold prstatus_percpu array\n");
+
+ memcpy((void *)nt_ptr, nd->nt_prstatus_percpu,
+ nd->num_prstatus_notes * sizeof(void *));
+ memset(nd->nt_prstatus_percpu, 0, nd->num_prstatus_notes);
+
+ /* re-populate the array with the sections mapping to online cpus */
+ for (i = 0, j = 0; i < kt->kernel_NR_CPUS; i++)
+ if (in_cpu_map(ONLINE, i))
+ ((unsigned long *)nd->nt_prstatus_percpu)[i] =
+ ((unsigned long *)nt_ptr)[j++];
+ free(nt_ptr);
+}
+
+/*
* Determine whether a file is a netdump/diskdump/kdump creation,
* and if TRUE, initialize the vmcore_data structure.
*/
@@ -618,7 +647,7 @@ get_netdump_panic_task(void)
crashing_cpu = -1;
if (kernel_symbol_exists("crashing_cpu")) {
get_symbol_data("crashing_cpu", sizeof(int), &i);
- if ((i >= 0) && (i < nd->num_prstatus_notes)) {
+ if ((i >= 0) && in_cpu_map(ONLINE, i)) {
crashing_cpu = i;
if (CRASHDEBUG(1))
error(INFO,
@@ -2236,7 +2265,7 @@ get_netdump_regs_ppc64(struct bt_info *b
* CPUs if they responded to an IPI.
*/
if (nd->num_prstatus_notes > 1) {
- if (bt->tc->processor >= nd->num_prstatus_notes)
+ if (!nd->nt_prstatus_percpu[bt->tc->processor])
error(FATAL,
"cannot determine NT_PRSTATUS ELF note "
"for %s task: %lx\n",
15 years, 6 months
Re: [Crash-utility] dev command deteriorates with new kernels
by Dave Anderson
----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> With respect to character devices:
>
> Sometime way back in the 2.6ish kernel, the fops field of struct
> char_device_struct ceased to be used for anything, and on those kernels
> (like 2.6.18), dev always reports the OPERATIONS field as "(none)":
>
> crash-4.0-8.10> dev
> CHRDEV NAME OPERATIONS
> 1 mem (none)
> 2 pty (none)
> 3 ttyp (none)
> 4 /dev/vc/0 (none)
> 4 tty (none)
> 4 ttyS (none)
> 5 /dev/tty (none)
> 5 /dev/console (none)
> 5 /dev/ptmx (none)
> 7 vcs (none)
> 10 misc (none)
> 13 input (none)
> 21 sg (none)
> 29 fb (none)
> ...
>
> Then recently (by 2.6.28, anyway), someone noticed that the fops field
> was unused and removed it from struct char_device_struct altogether, so
> now we get:
>
> crash-4.0-8.10> dev
>
> dev: invalid structure member offset: char_device_struct_fops
> FILE: dev.c LINE: 221 FUNCTION: dump_chrdevs()
>
> [/home/bobm/bin/crash-4.0-8.10] error trace: 452e75 => 4cf528 => 4cf957 => 50d8d3
> CHRDEV NAME OPERATIONS
>
> 50d8d3: OFFSET_verify+168
> 4cf957: dump_chrdevs+1043
> 4cf528: cmd_dev+198
> 452e75: exec_command+306
>
> dev: invalid structure member offset: char_device_struct_fops
> FILE: dev.c LINE: 221 FUNCTION: dump_chrdevs()
>
> crash-4.0-8.10>
>
> The attached patch changes the behavior in both cases to something like
> this:
>
> crash-4.0-8.10fix> dev
> CHRDEV NAME OPERATIONS
> 1 mem ffffffff8043ee00 <memory_fops>
> 2 pty (none)
> 3 ttyp (none)
> 4 /dev/vc/0 (none)
> 4 tty (none)
> 4 ttyS (none)
> 5 /dev/tty (none)
> 5 /dev/console (none)
> 5 /dev/ptmx (none)
> 7 vcs ffffffff8043ff40 <vcs_fops>
> 10 misc ffffffff8043fe40 <misc_fops>
> 13 input ffffffff805413c0 <input_fops>
> 21 sg (none)
> 29 fb ffffffff80531c60 <fb_fops>
>
> Which is definitely an improvement.
>
> But wondering about those remaining (none) entries, I found that if I
> pursue info through the cdev_map with a series of crash commands like
> this:
>
> crash-4.0-8.10fix> p (*cdev_map.probes[2]).data
> $7 = (void *) 0xffff81013a66c408
> crash-4.0-8.10fix> p (*(struct cdev *)0xffff81013a66c408).ops
> $8 = (const struct file_operations *) 0xffffffff8043f860
> crash-4.0-8.10fix> sym 0xffffffff8043f860
> ffffffff8043f860 (r) tty_fops
>
>
> crash-4.0-8.10fix> p (*cdev_map.probes[21]).data
> $9 = (void *) 0xffff81013acf8d80
> crash-4.0-8.10fix> p (*(struct cdev *)0xffff81013acf8d80).ops
> $10 = (const struct file_operations *) 0xffffffff8820ee00
> crash-4.0-8.10fix> sym 0xffffffff8820ee00
> ffffffff8820ee00 (d) sg_fops
>
> I can come up with believable file_ops values for (most? all?) of the
> others. And this leads me to wonder if crash shouldn't be collecting
> this info in a similar manner to fill in the other OPERATIONS fields.
> But now I'm quite a bit past what I know about how the character device
> stuff works.
>
>
> Thanks,
> Bob Montgomery
> Working at HP
Thanks Bob, I appreciate your investigation and patch.
I'll take a look (it's been years...), and build upon your patch to have the
command dig further when necessary.
Dave
15 years, 6 months