Date: Wed, 11 Aug 2021 14:24:30 +0200
From: Philipp Rudo <prudo(a)redhat.com>
To: lijiang <lijiang(a)redhat.com>
Cc: "Discussion list for crash utility usage, maintenance and
development" <crash-utility(a)redhat.com>
Subject: Re: [Crash-utility] [PATCH] x86_64: Fix check for
__per_cpu_offset initialisation
Message-ID: <20210811142430.5e3e1a86@rhtmp>
Content-Type: text/plain; charset=US-ASCII
Hi Lianbo,
On Wed, 11 Aug 2021 17:05:26 +0800
lijiang <lijiang(a)redhat.com> wrote:
> >
> > Date: Thu, 5 Aug 2021 15:19:37 +0200
> > From: Philipp Rudo <prudo(a)redhat.com>
> > To: crash-utility(a)redhat.com
> > Subject: [Crash-utility] [PATCH] x86_64: Fix check for
> > __per_cpu_offset initialisation
> > Message-ID: <20210805131937.5051-1-prudo(a)redhat.com>
> >
> > Since at least kernel v2.6.30 the __per_cpu_offset gets initialized to
> > __per_cpu_load. So first check if the __per_cpu_offset was set to a
> > proper value before reading any per cpu variable to prevent potential
> > bugs.
> >
> >
> Hi, Philipp
>
> Thank you for the patch. Could you describe the potential risks in more
> detail, and what conditions might trigger the potential bugs?
The bug is always triggered during initialization of the per-cpu data
on x86_64, at least for kernels not using struct x8664_pda, which
AFAIK was also removed with kernel v2.6.30.
The risk for crash is low. Right after the superfluous read there is a
check whether the cpunumber that was read matches the expected one.
        if (cpunumber != cpus)
                break;
So the worst-case scenario I see is that crash initializes one
additional cpu with nonsense data. But given that the bug has existed for
~12 years and nobody has reported such a bug, I assume that the check has
worked well so far.
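To illustrate the ordering described above, here is a minimal standalone
sketch, not the actual crash code. It assumes that an uninitialised
__per_cpu_offset entry still equals __per_cpu_load, and it models the
per-cpu read with plain arrays. The names per_cpu_load, offsets,
cpu_numbers and count_initialised_cpus() are hypothetical and exist only
for this example.

```c
/* Hypothetical stand-in for the address of the kernel's __per_cpu_load. */
static const unsigned long per_cpu_load = 0x1000;

/*
 * Count the CPUs whose per-cpu area looks initialised.
 * offsets[i] models __per_cpu_offset[i]; cpu_numbers[i] models the
 * cpu number that would be read from that CPU's per-cpu area.
 */
static int count_initialised_cpus(const unsigned long *offsets,
                                  const int *cpu_numbers, int max_cpus)
{
    int cpus;

    for (cpus = 0; cpus < max_cpus; cpus++) {
        /* Offset still holds its initial value: stop before reading. */
        if (offsets[cpus] == per_cpu_load)
            break;

        /* Existing sanity check: the value read must match the index. */
        if (cpu_numbers[cpus] != cpus)
            break;
    }
    return cpus;
}
```

With the offset check placed first, the loop stops at the first
uninitialised entry without reading any data from it. The cpunumber
comparison then only acts as a second line of defence.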
Thank you for the detailed explanation, Philipp.
> Did you mean that it's related to the crash live analysis issue
> (1978032)? I tried to reproduce it, but so far I haven't reproduced it
> with the upstream kernel.
Yes, this bug is related to bz1978032. For whatever reason the
superfluous read triggered the panic.
I could reproduce the bug upstream with CONFIG_IO_URING _disabled_.
Unfortunately there is a RHEL-only patch [1] that tampers with the
Kconfig for IO_URING. So when you copy a kernel-ark config to the
upstream repo and run 'make oldconfig' the IO_URING will silently be
_enabled_.
You are right.
BTW, I tried to reproduce the panic yesterday on kernel-5.14.0-0.rc4
but failed. Not sure if the bug was fixed in the meantime or I was
simply "lucky"...
This issue may have been fixed in kernel-5.14.0-0.rc4. However, this
patch is still meaningful and can prevent potential risks.
Acked-by: Lianbo Jiang <lijiang(a)redhat.com>