On 2025/11/18 17:55, Lianbo Jiang wrote:
The "runq -g" option may fail on some vmcores from customers, and report
the following error:
crash> runq -g
...
malloc_bp[1998]: 11592c20
malloc_bp[1999]: 11662490
...
average size: 11922
runq: cannot allocate any more memory!
This is because the maximum number of malloc() calls made through
GETBUF() was reached; it is currently limited to MAX_MALLOC_BUFS (2000).
Furthermore, the error message is not very clear.
Given that, let's raise the MAX_MALLOC_BUFS limit and make the
error message clear and concise.
Hi Lianbo,
out of curiosity, does this mean that the cause is clear and there
is no other way to fix the issue? IOW, is there no buffer leak,
wasteful use of GETBUF, etc.?
I'm sorry if you have already investigated them.
Generally, relaxing a limitation is the last resort, I think,
because limitations are a kind of safety mechanism. Also, relaxing
the limitation may only be a stopgap solution for this vmcore. If you
get another vmcore that hits this again, would you relax it again?
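
To illustrate the distinction I have in mind, here is a minimal standalone
sketch. It is not crash's actual code: pool_get(), pool_free(), pool_reset(),
walk() and MAX_SLOTS are hypothetical names that only mimic a GETBUF-style
pool with a fixed number of slots, and I am assuming nothing about how
runq -g really uses its buffers.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SLOTS 2000   /* hypothetical cap, mirroring MAX_MALLOC_BUFS (2000) */

void *slots[MAX_SLOTS];  /* one entry per outstanding buffer */
int in_use;              /* number of occupied slots */

/* Hypothetical stand-in for GETBUF(): returns NULL once every slot is taken. */
void *pool_get(size_t size)
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (slots[i] == NULL) {
            slots[i] = malloc(size);
            if (slots[i] != NULL)
                in_use++;
            return slots[i];
        }
    }
    return NULL;         /* no free slot left: the limit was reached */
}

/* Hypothetical stand-in for FREEBUF(): gives the slot back to the pool. */
void pool_free(void *p)
{
    if (p == NULL)
        return;
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (slots[i] == p) {
            assert(in_use > 0);
            free(p);
            slots[i] = NULL;
            in_use--;
            return;
        }
    }
}

/* Release every outstanding buffer. */
void pool_reset(void)
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (slots[i] != NULL)
            pool_free(slots[i]);
    }
}

/*
 * Request one buffer per item for n items. With release == 0 the buffers
 * are never returned (a leak); with release != 0 each one is freed before
 * the next request. Returns the number of successful requests.
 */
int walk(int n, int release)
{
    int ok = 0;

    for (int i = 0; i < n; i++) {
        void *buf = pool_get(64);
        if (buf == NULL)
            break;
        ok++;
        if (release)
            pool_free(buf);
    }
    return ok;
}
```

With these definitions, walk(5000, 0) stops after 2000 successful requests,
while walk(5000, 1) completes all 5000 and leaves no slot in use. If runq -g
looks like the first case, a larger limit would only move the failure point.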
Thanks,
Kazu
>
> With the patch:
> crash> runq -g
> ...
> CPU 95
> CURRENT: PID: 64281 TASK: ffff9f541b064000 COMMAND: "xxx_64281_sv"
> ROOT_TASK_GROUP: ffffffffa64ff940 RT_RQ: ffff9f86bfdf3a80
> [no tasks queued]
> ROOT_TASK_GROUP: ffffffffa64ff940 CFS_RQ: ffff9f86bfdf38c0
> [120] PID: 64281 TASK: ffff9f541b064000 COMMAND: "xxx_64281_sv" [CURRENT]
> TASK_GROUP: ffff9f47cb3b9180 CFS_RQ: ffff9f67c0417a00 <user.slice>
> [120] PID: 65275 TASK: ffff9f6820208000 COMMAND: "server"
> TASK_GROUP: ffff9f67f9ac2300 CFS_RQ: ffff9f6803662000 <oratfagroup>
> [120] PID: 1209636 TASK: ffff9f582f25c000 COMMAND: "crsctl"
>
> Reported-by: Buland Kumar Singh <bsingh(a)redhat.com>
> Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
> ---
> tools.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools.c b/tools.c
> index a9ad18d520d9..6676881c182a 100644
> --- a/tools.c
> +++ b/tools.c
> @@ -5698,7 +5698,7 @@ ll_power(long long base, long long exp)
> #define B32K (4)
>
> #define SHARED_BUF_SIZES (B32K+1)
> -#define MAX_MALLOC_BUFS (2000)
> +#define MAX_MALLOC_BUFS (3072)
> #define MAX_CACHE_SIZE (KILOBYTES(32))
>
> struct shared_bufs {
> @@ -6130,7 +6130,7 @@ getbuf(long reqsize)
> dump_shared_bufs();
>
> return ((char *)(long)
> - error(FATAL, "cannot allocate any more memory!\n"));
> + error(FATAL, "cannot allocate any more memory, reached to max numbers of malloc() via GETBUF()!\n"));
> }
>
> /*