On Wed, Jul 17, 2024 at 12:01 PM Tao Liu <ltao(a)redhat.com> wrote:
Hi Lianbo,
On Wed, Jul 17, 2024 at 2:24 PM lijiang <lijiang(a)redhat.com> wrote:
>
> On Wed, Jul 17, 2024 at 6:01 AM Tao Liu <ltao(a)redhat.com> wrote:
>>
>> Hi Lianbo,
>>
>> On Tue, Jul 16, 2024 at 11:19 AM Tao Liu <ltao(a)redhat.com> wrote:
>> >
>> > Hi Lianbo,
>> >
>> > On Fri, Jul 12, 2024 at 6:32 PM Lianbo Jiang <lijiang(a)redhat.com>
wrote:
>> > >
>> > > Hi, Tao
>> > >
>> > > On 7/5/24 9:26 AM, devel-request(a)lists.crash-utility.osci.io wrote:
>> > > > Date: Thu, 4 Jul 2024 17:00:56 +1200
>> > > > From: Tao Liu<ltao(a)redhat.com>
>> > > > Subject: [Crash-utility] [PATCH] Fix "irq -a" exceeding
the memory
>> > > > range issue
>> > > > To:devel@lists.crash-utility.osci.io
>> > > > Cc: Tao Liu<ltao(a)redhat.com>
>> > > > Message-ID:<20240704050056.17375-1-ltao@redhat.com>
>> > > > Content-Type: text/plain; charset="US-ASCII";
x-default=true
>> > > >
>> > > > Previously without the patch, there was an error observed as
follows:
>> > > >
>> > > > crash> irq -a
>> > > > IRQ NAME AFFINITY
>> > > > 0 timer 0-191
>> > > > 4 ttyS0 0-23,96-119
>> > > > ...
>> > > > 84 smartpqi 72-73,168
>> > > > irq: page excluded: kernel virtual address: ffff97d03ffff000
type: "irq_desc affinity"
>> > > >
>> > > > The reason is the reading of irq affinity exceeded the memory
range, see
>> > > > the following debug info:
>> > > >
>> > > > Thread 1 "crash" hit Breakpoint 1,
generic_get_irq_affinity
(irq=85) at kernel.c:7373
>> > > > 7375 irq_desc_addr = get_irq_desc_addr(irq);
>> > > > (gdb) p/x irq_desc_addr
>> > > > $1 = 0xffff97d03f21e800
>> > > >
>> > > > crash> struct irq_desc 0xffff97d03f21e800
>> > > > struct irq_desc {
>> > > > irq_common_data = {
>> > > > state_use_accessors = 425755136,
>> > > > node = 3,
>> > > > handler_data = 0x0,
>> > > > msi_desc = 0xffff97ca51b83480,
>> > > > affinity = 0xffff97d03fffee60,
>> > > > effective_affinity = 0xffff97d03fffe6c0
>> > > > },
>> > > >
>> > > > crash> whatis cpumask_t
>> > > > typedef struct cpumask {
>> > > > unsigned long bits[128];
>> > > > } cpumask_t;
>> > > > SIZE: 1024
>> > > >
>> > > > In order to get the affinity, crash will read the memory range
0xffff97d03fffee60
>> > > > ~ 0xffff97d03fffee60 + 1024(0x400) by line:
>> > > >
>> > > > readmem(affinity_ptr, KVADDR, affinity, len,
>> > > > "irq_desc affinity", FAULT_ON_ERROR);
>> > > >
>> > > > However the reading will exceed the effective memory range:
>> > > >
>> > > > crash> kmem 0xffff97d03fffee60
>> > > > CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE
NAME
>> > > > ffff97c900044400 32 123297 162944 1273 4k
kmalloc-32
>> > > > SLAB MEMORY NODE TOTAL ALLOCATED
FREE
>> > > > fffffca460ffff80 ffff97d03fffe000 3 128 81
47
>> > > > FREE / [ALLOCATED]
>> > > > [ffff97d03fffee60]
>> > > >
>> > > > PAGE PHYSICAL MAPPING INDEX CNT FLAGS
>> > > > fffffca460ffff80 83fffe000 dead000000000001 ffff97d03fffe340 1
d7ffffe0000800 slab
>> > > >
>> > > > crash> kmem ffff97d03ffff000
>> > > > PAGE PHYSICAL MAPPING INDEX CNT FLAGS
>> > > > fffffca460ffffc0 83ffff000 0 0 1
d7ffffe0004000 reserved
>> > > >
>> > > > crash> dmesg
>> > > > ...
>> > > > [ 0.000000] BIOS-e820: [mem
0x00000000fe000000-0x00000000fe00ffff] reserved
>> > > > [ 0.000000] BIOS-e820: [mem
0x0000000100000000-0x000000083fffefff] usable
>> > > > [ 0.000000] BIOS-e820: [mem
0x000000083ffff000-0x000000083fffffff] reserved
>> > > > ...
>> > > >
>> > > > The beginning physical address, aka 0x83fffe000, is located in
the usable
>> > > > area and is readable, however the later physical address,
starting from
>> > > > 0x83ffff000, is located in reserved region and not readable. In
fact,
>> > > > the affinity member is allocated by alloc_cpumask_var_node(),
for
the 192 CPUs
>> > > > system, the allocated size is only 24, and we can see it is
within
>> > > > the kmalloc-32 slab. So it is incorrect to read 1024
length(given
by
>> > > > STRUCT_SIZE("cpumask_t")), only 24 is enough.
>> > > >
>> > > > Since there are plenty of places in crash which takes the value
of
>> > > > STRUCT_SIZE("cpumask_t"), and works fine for the past,
this patch
will
>> > > > not modify them all, but only this place which encountered the
issue.
>> > > >
>> > > > Signed-off-by: Tao Liu<ltao(a)redhat.com>
>> > > > ---
>> > > > kernel.c | 9 ++++++---
>> > > > 1 file changed, 6 insertions(+), 3 deletions(-)
>> > > >
>> > > > diff --git a/kernel.c b/kernel.c
>> > > > index 8a9d498..464e877 100644
>> > > > --- a/kernel.c
>> > > > +++ b/kernel.c
>> > > > @@ -7362,7 +7362,7 @@ void
>> > > > generic_get_irq_affinity(int irq)
>> > > > {
>> > > > ulong irq_desc_addr;
>> > > > - long len;
>> > > > + long len, len_cpumask;
>> > > > ulong affinity_ptr;
>> > > > ulong *affinity;
>> > > > ulong tmp_addr;
>> > > > @@ -7382,8 +7382,11 @@ generic_get_irq_affinity(int irq)
>> > > > if (!action)
>> > > > return;
>> > > >
>> > > > - if ((len = STRUCT_SIZE("cpumask_t")) < 0)
>> > > > - len = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) *
sizeof(ulong);
>> > > > + len = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) *
sizeof(ulong);
>> > > > + len_cpumask = STRUCT_SIZE("cpumask_t");
>> > > > + if (len_cpumask > 0) {
>> > > > + len = len_cpumask > len ? len : len_cpumask;
>> > > > + }
>> > > >
>> > >
>> > > This change looks good, but I still have two comments below:
>> > >
>> > > [1] Can we drop the evaluation of
"STRUCT_SIZE("cpumask_t")" and
just
>> > > use the size of "DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) *
sizeof(ulong)"
>> > > ? Are there any regression issues?
>>
>> I made a regression test, if all STRUCT_SIZE("cpumask_t") are
replaced
>> by DIV_ROUND_UP(...), there are regression issues found, but I didn't
>> dive into the root cause of the failing reason.
>>
>
> If so, let's keep it.
>
> Anyway, I'm still curious about the regression issue.
>
I re-read your comments, and I think I have misunderstood your request.
The regressions are found when all places of STRUCT_SIZE("cpumask_t")
in crash were replaced by DIV_ROUND_UP(...). You see there are plenty
of places in crash which used STRUCT_SIZE("cpumask_t"), including the
Not needed.
Maybe they can be replaced with the SIZE(cpumask_t) in your unwind stack
patchset. Anyway, that is another issue.
2(the one which this patch is dealing with, and the one in tools.c)
However if only the 2 places (the one which this patch is dealing
with, and the one in tools.c) are replaced, there is no regression
being noticed.
Totally correct. That is my concern.
So I'm going to deal with the only 2 places in v2.
Sounds good. Thank you, Tao.
Lianbo
Thanks,
Tao Liu
>> >
>> > I'm not sure about the change, I will run a regression against it.
>> > >
>> > > [2] There are the similar case in the get_cpumask_buf(), see
tools.c,
>> > > can you make the same change?
>>
>> Yes, unlike [1], with only the similar case modified, no regressions
>> found. I will post v2 to include it.
>
>
> Ok, sounds good. Thank you for the regression test.
>
> Lianbo
>
>>
>>
>> Thanks,
>> Tao Liu
>>
>> >
>> > Yes, I will give it a try to see if regressions are found.
>> >
>> > Thanks,
>> > Tao Liu
>> > >
>> > > ulong *
>> > > get_cpumask_buf(void)
>> > > {
>> > > int cpulen;
>> > > if ((cpulen = STRUCT_SIZE("cpumask_t")) < 0)
>> > > cpulen = DIV_ROUND_UP(kt->cpus, BITS_PER_LONG) *
>> > > sizeof(ulong);
>> > > return (ulong *)GETBUF(cpulen);
>> > > }
>> > >
>> > > Any thoughts?
>> > >
>> > >
>> > > Thanks
>> > >
>> > > Lianbo
>> > >
>> > >
>> > > > affinity = (ulong *)GETBUF(len);
>> > > > if (VALID_MEMBER(irq_common_data_affinity))
>> > > > -- 2.40.1
>> > >
>>