After analysis, we determined that the crash occurs in the function
n_tty_read() in kernel-source/drivers/char/n_tty.c. The oops occurred on
Linux kernel 2.6.32. Below is the code fragment where the page fault
occurred. The page fault occurs when executing the statement
c = tty->read_buf[tty->read_tail].
    /* N.B. avoid overrun if nr == 0 */
    while (nr && tty->read_cnt) {
        int eol;

        eol = test_and_clear_bit(tty->read_tail, tty->read_flags);
        c = tty->read_buf[tty->read_tail]; // page fault statement after analyzing oops
BTW, are you sure about that?
Presuming that the "tty" pointer is ffff8802cbd54800 as you've shown below,
and therefore tty->read_buf is 0xffff8802cbfe6000 and tty->read_tail is 0,
then the statement above would simply be reading tty->read_buf[0], or
virtual address 0xffff8802cbfe6000. But the oops shows it faulting on a
virtual address of "5":
BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
Dave
Below are the contents of the tty_struct structure (at the time of the
oops). This was passed as an argument to the function n_tty_read().
tty_struct ffff8802cbd54800
struct tty_struct { ...
magic = 21505,
driver = 0xffff88031b54ea00,
ops = 0xffffffff8130f650,
name = "pts9\000\...",
driver_data = 0xffff88029c8a9668,
icanon = 1 '\001',
read_buf = 0xffff8802cbfe6000 "",
read_head = 0,
read_tail = 0,
read_cnt = 0,
read_flags = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
canon_data = 0,
......................................
According to the crash utility, the read_cnt field was 0 when the kernel
oopsed. In that case, the condition of while (nr && tty->read_cnt) in the
above code fragment should have been false, and the loop body should not
have run. This leads me to think that some other thread or task in the
kernel must have updated the read_cnt field in parallel. However, the
crash utility reports the runqueues of all CPUs as idle at the time of
the crash, except for CPU 1, which was executing the user program telnet
in kernel context (in a system call). Below is the runqueue output.
CPU 0 RUNQUEUE: ffff880033012d80
CURRENT: PID: 0 TASK: ffffffff814204b0 COMMAND: "swapper"
RT PRIO_ARRAY: ffff880033012e98
[no tasks queued]
CFS RB_ROOT: ffff880033012e10
[no tasks queued]
CPU 1 RUNQUEUE: ffff880033032d80
CURRENT: PID: 13366 TASK: ffff88031b60d580 COMMAND: "telnet"
RT PRIO_ARRAY: ffff880033032e98
[no tasks queued]
CFS RB_ROOT: ffff880033032e10
[no tasks queued]
CPU 2 RUNQUEUE: ffff880033052d80
CURRENT: PID: 0 TASK: ffff88031e0e3540 COMMAND: "swapper"
RT PRIO_ARRAY: ffff880033052e98
[no tasks queued]
CFS RB_ROOT: ffff880033052e10
[no tasks queued]
CPU 3 RUNQUEUE: ffff880033072d80
CURRENT: PID: 0 TASK: ffff88031e113580 COMMAND: "swapper"
RT PRIO_ARRAY: ffff880033072e98
[no tasks queued]
CFS RB_ROOT: ffff880033072e10
[no tasks queued]
How is this logically possible? Crash reports that no other tasks were
running at the time. Or could some process or thread have run and updated
the data structure between the oops being triggered and kdump capturing
the memory image? I would like to know whether this scenario is possible.
I kindly request your suggestions and guidance. Please let me know if you
need any other details.
Regards
Shashidhara
-----Original Message-----
From: crash-utility-bounces(a)redhat.com
[mailto:crash-utility-bounces@redhat.com] On Behalf Of Dave Anderson
Sent: Tuesday, June 21, 2011 7:24 PM
To: Discussion list for crash utility usage, maintenance and development
Subject: Re: [Crash-utility] Unable to switch stack frames while using crash
----- Original Message -----
> Hi Dave,
>
> I updated the makedumpfile utility from 1.3.5 to 1.3.7. When I run the
> command below, I get the following output:
>
> makedumpfile -c -d 31 -x vmlinux_temp vmcore vmcore-new
> The kernel version is not supported.
> The created dumpfile may be incomplete.
> check_release: Can't get the kernel version.
> makedumpfile Failed.
I see that makedumpfile-1.3.8 was recently released, but it still
has a LATEST_VERSION of 2.6.36:
#define OLDEST_VERSION KERNEL_VERSION(2, 6, 15) /* linux-2.6.15 */
#define LATEST_VERSION KERNEL_VERSION(2, 6, 36) /* linux-2.6.36 */
You haven't stated what your kernel version is, but it seems makedumpfile
cannot get past this point. On the other hand, the compressed kdump was
created, so I'm not entirely clear.
> Is there any other way to extract the ELF style vmcore file from the
> kdump compressed format? Please guide me.
I don't believe so...
But I'm not the makedumpfile maintainer, so I'd prefer not to give any
definitive answers to your questions. I've cc'd the upstream maintainer
of makedumpfile.
Thanks,
Dave
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility