From: Dave Anderson <anderson(a)redhat.com>
Subject: Re: [Crash-utility] [PATCH] do not check sp if ip points to user space
Date: Mon, 26 Sep 2011 09:20:26 -0400 (EDT)
----- Original Message -----
> Hello all,
>
> From: Wen Congyang <wency(a)cn.fujitsu.com>
> Subject: Re: [Crash-utility] [PATCH] do not check sp if ip points to
> user space
> Date: Mon, 26 Sep 2011 13:47:39 +0800
>
> > At 09/23/2011 09:41 PM, Dave Anderson Write:
> >>
> >>
> >> ----- Original Message -----
> >>> If the task is a user program, the sp can be points to anywhere,
> >>> because we can modify sp in assembly.
> >>> For example:
> >>>
> >>> .globl main
> >>> .type main, @function
> >>> main:
> >>>
> >>> finit
> >>> subq $16, (%rsp)
> >>> movq $0, (%rsp)
> >>> .loop:
> >>> jmp .loop
> >>>
> >>>
> >>
> >> Why would any user task do that?
> >>
> >> And what happens when a backtrace is attempted on such a task?
> >>
> >> Since the current code would not set BT_USER_SPACE, I'm guessing that
it
> >> would run into this (at least on x86_64):
> >>
> >> if (!(bt->flags & BT_USER_SPACE) && (!rsp ||
> >> !accessible(rsp))) {
> >> error(INFO, "cannot determine starting stack
> >> pointer\n");
> >> return;
> >> }
> >
> > Yes, crash will run into this on x86_64.
> >
> >>
> >> I do believe that I put the additional in_user_stack() checks in those
> >> locations for a reason. Consider a task running in kernel mode that
> >> corrupts its return address stack location with a non-kernel address,
> >> or called a function indirectly that had a NULL pointer in it. That
> >> would cause a kernel crash with a non-kernel RIP in its exception frame,
> >> and your patch would mistake it for user-space.
> >
> > I know the reason why you check if sp is in user stack.
> > What about this:
> > if !is_kernel_text(ip) && (sp is in kernel stack(include irq))
> > try to backtrace according to sp
> > else
> > display registers
> >
> > If both ip and sp is corrupted, and we can not determine sp according to
> > the content in the stack, I think we should display registers.
> >
> >>
> >> In any case, you're going to have to come up with a more compelling
> >> reason to change all of these locations. (And for that matter, I wonder
> >> why you didn't patch Fujitsu's get_sadump_regs() the same way?)
> >
> > No only for Fujitsu's sadump, I think kvmdump has the same problem.
> >
> > By the way, crash try to use the register when the dump format is diskdump.
> > I think we should use the register value when the dump format is Fujitsu's
> > sadump.
> >
>
> To be honest, I douted the logic just as you when I read
> get_kvmdump_regs() for the first time, but at that time I've
> understood why it is so, just as Dave has already explained, that bt
> command in crash utility mainly targets kernel stacks generated by C
> programs, and it seems reasonable enough as the feature users want.
>
> BTW, it is the fact that it's not sufficient to see only RIP and RSP
> to decide if the execution mode for a given task. More condition is
> neccesary.
>
> On X86 archtectures, for example, CS gives information about such
> information, and can cover the assembly example you showed in the
> first post, but of course this is the specific logic on X86 and
> X86_64, and I have no idea on other architectures, and might be
> inapplicable to kvmdump that targets archs other than X86.
Right, but the fact of the matter is that a backtrace cannot be
performed for a task with a nonsensical SP value, so it doesn't
make a difference whether it was in user-space or kernel-space.
So the "cannot determine starting stack pointer" error message
should still be displayed as it currently does -- and with my patch
suggestion, the registers can be dumped (if available) before
returning.
I understand. The condition to be used here is whether the backtrace
can be performed or not, implied by SP values pointing at outside a
variety of kernel stacks; I guess this is the definition of
nonsensical SP you've mean. I think the new behaviour reasonable.
By the way, I have a question that what intension do you have behind
!is_kernel_text(ip)? Just to exclude the case of user text? I guess
you're intending here also to exclude other possibilities.
Because sadump runs beyond the logic of kernel, register values
contained in vmcore sometimes the ones in real-address mode, appearing
having run in some firmware, which often happens at crash during boot
time.
I'm also wondering if there's other dump mechanism that can lead to
this kind of situation. For example, although I don't understand
detailed behaviour of NMI, if assuming NMI immediately triggered even
when running in firmware without rolling back register values saved
when entering the firmware context from kernel, register values in NMI
frames would be the ones in firmware and it would be concluded that
the situation can happen on kdump (and others that uses NMI); but I
have never seen such register values on kdump...
Thanks.
HATAYAMA, Daisuke