On Thu, Aug 24, 2023 at 10:01 AM HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com> wrote:
On 2023/08/23 14:44, Lianbo Jiang wrote:
> When a task is exiting, the kernel sets PF_EXITING in its flags, but
> the mm_struct may not have been freed yet and can still be valid. For
> such tasks, the "ps" and "vm" commands do not display the memory
> usage. For example:
>
>    crash> ps 47070
>          PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
>        47070       1   0  ffff9ba7c4910000  UN   0.0        0        0  ra_ris.parse
>    crash> vm 47070
>    PID: 47070    TASK: ffff9ba7c4910000  CPU: 0    COMMAND: "ra_ris.parse"
>           MM               PGD          RSS    TOTAL_VM
>           0                 0            0k       0k
>
> To be honest, this is a corner case, but it has already occurred in
> actual production environments. Given that, let's allow the "ps/vm"
> commands to try to display the memory usage in this case. This is not
> guaranteed to work every time; it still depends on how far the
> mm_struct deconstruction has proceeded.

I agree with displaying it. It looks like the deconstruction happens only
after task->mm is set to NULL, so this looks fine to me.


Thank you for the comments, Kazu.
 
void __noreturn do_exit(long code)
{
...
         exit_signals(tsk);  /* sets PF_EXITING */
...
         exit_mm();

static void exit_mm(void)
{
         struct mm_struct *mm = current->mm;
...
         current->mm = NULL;  /* task->mm is set to NULL here */
...
         mmput(mm);           /* the resources are actually released here */


On the other hand, mm->mm_count is decremented on the mmput() path (in
mmdrop()), so is there a need to check it?


Good question. I had the same thought, but in the end I chose to double-check with mm_count. I am not sure how we can be certain that memory has been synchronized when the kernel is panicking.

If the mm pointer is valid and the mm_struct members are always legitimate, we won't need the double check.

But anyway, these are just my thoughts, and they may not be completely correct. If you do not want to keep the check, I can post a v2 that simply removes IS_EXITING(task) from get_task_mem_usage().
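
To make the two options easier to compare, here is a minimal standalone sketch of the decision the posted patch makes. It is not crash code: struct task_snapshot and should_report_mem_usage() are hypothetical names used only for illustration, and the fields stand in for values crash would already have read from the dump (IS_ZOMBIE(), IS_EXITING(), task->mm and mm->mm_count).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what has been read from the dump for one task. */
struct task_snapshot {
	bool zombie;            /* stands in for IS_ZOMBIE(task) */
	bool exiting;           /* stands in for IS_EXITING(task), i.e. PF_EXITING */
	unsigned long mm_addr;  /* task->mm, 0 once exit_mm() has cleared it */
	int mm_count;           /* mm->mm_count, assumed to be read successfully */
};

/* Returns true if memory usage should be displayed for this task. */
static bool should_report_mem_usage(const struct task_snapshot *t)
{
	assert(t != NULL);

	if (t->zombie)
		return false;   /* unchanged: zombies are always skipped */

	if (t->mm_addr == 0)
		return false;   /* no mm to look at */

	/* The extra check under discussion: only applied to exiting tasks. */
	if (t->exiting && t->mm_count <= 0)
		return false;

	return true;
}
```

Dropping the mm_count check, as suggested for a possible v2, would amount to deleting the third test, so that an exiting task is reported whenever task->mm is still non-NULL.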

Thanks.
Lianbo
 
void mmput(struct mm_struct *mm)
{
         might_sleep();

         if (atomic_dec_and_test(&mm->mm_users))
                 __mmput(mm);
}

static inline void __mmput(struct mm_struct *mm)
{
...
         exit_mmap(mm);
...
         mmdrop(mm);
}

static inline void mmdrop(struct mm_struct *mm)
{
...
         if (unlikely(atomic_dec_and_test(&mm->mm_count)))
                 __mmdrop(mm);
}
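
To illustrate the two counters in the excerpts above, here is a toy model of the mmput()/mmdrop() pairing. It is only a sketch: struct toy_mm, toy_mmput() and toy_mmdrop() are made-up names, plain ints replace the kernel's atomics, and two flags mark the points where the kernel would call exit_mmap() and __mmdrop(). It assumes, as I understand the kernel's accounting, that all mm_users together hold a single mm_count reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct mm_struct: only the two counters matter here. */
struct toy_mm {
	int mm_users;             /* users of the address space */
	int mm_count;             /* references to the struct itself */
	bool mappings_torn_down;  /* set where the kernel would call exit_mmap() */
	bool struct_freed;        /* set where the kernel would call __mmdrop() */
};

/* Drop one reference to the struct; the last one frees it. */
static void toy_mmdrop(struct toy_mm *mm)
{
	assert(mm->mm_count > 0);
	if (--mm->mm_count == 0)
		mm->struct_freed = true;
}

/* Drop one user; the last user tears down the mappings and then
 * gives up the single mm_count reference shared by all users. */
static void toy_mmput(struct toy_mm *mm)
{
	assert(mm->mm_users > 0);
	if (--mm->mm_users == 0) {
		mm->mappings_torn_down = true;
		toy_mmdrop(mm);
	}
}
```

In this model, if something else still pins the struct (mm_count starts at 2 with one user), the last toy_mmput() tears down the mappings while mm_count stays at 1. So a positive mm_count alone would not say whether the mappings are still intact; what protects the proposed check is that task->mm is cleared before mmput() is called.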

Thanks,
Kazu

>
> With the patch:
>    crash> ps 47070
>          PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
>        47070       1   0  ffff9ba7c4910000  UN  90.8 38461228 31426444  ra_ris.parse
>    crash> vm 47070
>    PID: 47070    TASK: ffff9ba7c4910000  CPU: 0    COMMAND: "ra_ris.parse"
>           MM               PGD          RSS    TOTAL_VM
>    ffff9bad6e873840  ffff9baee0544000  31426444k  38461228k
>          VMA           START       END     FLAGS FILE
>    ffff9bafdbe1d6c8     400000     8c5000 8000875 /data1/rishome/ra_cu_cn_412/sbin/ra_ris.parse
>    ...
>
> Reported-by: Buland Kumar Singh <bsingh@redhat.com>
> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
> ---
>   memory.c | 13 +++++++++++--
>   1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/memory.c b/memory.c
> index 5d76c5d7fe6f..7d59c0555a0e 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -4792,10 +4792,12 @@ get_task_mem_usage(ulong task, struct task_mem_usage *tm)
>   {
>       struct task_context *tc;
>       long rss = 0, rss_cache = 0;
> +     int mm_count = 0;
> +     ulong addr;
>   
>       BZERO(tm, sizeof(struct task_mem_usage));
>   
> -     if (IS_ZOMBIE(task) || IS_EXITING(task))
> +     if (IS_ZOMBIE(task))
>               return;
>   
>       tc = task_to_context(task);
> @@ -4805,7 +4807,14 @@ get_task_mem_usage(ulong task, struct task_mem_usage *tm)
>   
>       tm->mm_struct_addr = tc->mm_struct;
>   
> -     if (!task_mm(task, TRUE))
> +     if (!(addr = task_mm(task, TRUE)))
> +             return;
> +
> +     if (!readmem(addr + OFFSET(mm_struct_mm_count), KVADDR, &mm_count,
> +             sizeof(int), "mm_struct mm_count", RETURN_ON_ERROR))
> +             return;
> +
> +     if (IS_EXITING(task) && mm_count <= 0)
>               return;
>   
>       if (VALID_MEMBER(mm_struct_rss))