[Crash-utility] Re: [PATCH] Fix for "runq -g" option failure

Wednesday, 19 November 2025

On Wed, Nov 19, 2025 at 12:50 PM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com> wrote:

...
 On 2025/11/18 17:55, Lianbo Jiang wrote:
 > The "runq -g" option may fail on some vmcores from customers, and report
 > the following error:
 >
 >    crash> runq -g
 >    ...
 >        malloc_bp[1998]: 11592c20
 >        malloc_bp[1999]: 11662490
 >    ...
 >        average size: 11922
 >      runq: cannot allocate any more memory!
 >
 > This is because the maximum number of malloc() calls made through
 > GETBUF() was reached; it is currently limited to MAX_MALLOC_BUFS (2000).
 > Furthermore, the error message is not very clear.
 >
 > Given that, let's raise the MAX_MALLOC_BUFS limit and make the
 > error message clear and concise.

 Hi Lianbo,

 out of curiosity, does this mean that the cause is clear and there
 is no other way to fix the issue?  IOW, is there no buffer leak,
 wasteful GETBUF() calls, etc.?
 I'm sorry if you have already investigated them.

Good questions, Kazu.

So far I haven't found a better way to fix it. The malloc_bp array is
exhausted when running "runq -g", and I did not see a buffer leak (malloc_bp)
on that code path (if anybody finds one, please let me know).

...
 Generally, relaxing a limitation is the last resort, I think,
 because limitations are a kind of safety mechanism.  Also, relaxing
 the limitation may be only a stopgap solution for this vmcore.  If you

I agree with you.

...
 get another vmcore hitting this again, do you relax it again?

That needs to be considered case by case. In the current case, if the limit
is not raised, we will probably have to tell customers that "runq -g" cannot
work because of the MAX_MALLOC_BUFS (2000) limit.

BTW: on some large-scale servers with many cores (even hundreds of CPUs)
running thousands of tasks and using task groups, the maximum of 2000 is
really too small, so it would be good to increase it appropriately.

Thanks
Lianbo

...

 Thanks,
 Kazu

 >
 > With the patch:
 >    crash> runq -g
 >    ...
 >    CPU 95
 >      CURRENT: PID: 64281  TASK: ffff9f541b064000  COMMAND: "xxx_64281_sv"
 >      ROOT_TASK_GROUP: ffffffffa64ff940  RT_RQ: ffff9f86bfdf3a80
 >         [no tasks queued]
 >      ROOT_TASK_GROUP: ffffffffa64ff940  CFS_RQ: ffff9f86bfdf38c0
 >         [120] PID: 64281  TASK: ffff9f541b064000  COMMAND: "xxx_64281_sv" [CURRENT]
 >         TASK_GROUP: ffff9f47cb3b9180  CFS_RQ: ffff9f67c0417a00  <user.slice>
 >            [120] PID: 65275  TASK: ffff9f6820208000  COMMAND: "server"
 >         TASK_GROUP: ffff9f67f9ac2300  CFS_RQ: ffff9f6803662000  <oratfagroup>
 >            [120] PID: 1209636  TASK: ffff9f582f25c000  COMMAND: "crsctl"
 >
 > Reported-by: Buland Kumar Singh <bsingh(a)redhat.com>
 > Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
 > ---
 >   tools.c | 4 ++--
 >   1 file changed, 2 insertions(+), 2 deletions(-)
 >
 > diff --git a/tools.c b/tools.c
 > index a9ad18d520d9..6676881c182a 100644
 > --- a/tools.c
 > +++ b/tools.c
 > @@ -5698,7 +5698,7 @@ ll_power(long long base, long long exp)
 >   #define B32K (4)
 >
 >   #define SHARED_BUF_SIZES  (B32K+1)
 > -#define MAX_MALLOC_BUFS   (2000)
 > +#define MAX_MALLOC_BUFS   (3072)
 >   #define MAX_CACHE_SIZE    (KILOBYTES(32))
 >
 >   struct shared_bufs {
 > @@ -6130,7 +6130,7 @@ getbuf(long reqsize)
 >       dump_shared_bufs();
 >
 >       return ((char *)(long)
 > -             error(FATAL, "cannot allocate any more memory!\n"));
 > +             error(FATAL, "cannot allocate any more memory, reached to max numbers of malloc() via GETBUF()!\n"));
 >   }
 >
 >   /* 
