On 2023/08/24 17:29, lijiang wrote:
 On Thu, Aug 24, 2023 at 10:01 AM HAGIO KAZUHITO(萩尾 一仁)
<k-hagio-ab(a)nec.com>
 wrote:
 
> On 2023/08/23 14:44, Lianbo Jiang wrote:
>> When a task is exiting, usually kernel marks its flags as 'PF_EXITING',
>> but even so, sometimes the mm_struct has not been freed, it might still
>> be valid. For such tasks, the "ps/vm" commands won't display the
memory
>> usage. For example:
>>
>>     crash> ps 47070
>>           PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS
> COMM
>>         47070       1   0  ffff9ba7c4910000  UN   0.0        0        0
> ra_ris.parse
>>     crash> vm 47070
>>     PID: 47070    TASK: ffff9ba7c4910000  CPU: 0    COMMAND:
> "ra_ris.parse"
>>            MM               PGD          RSS    TOTAL_VM
>>            0                 0            0k       0k
>>
>> To be honest, this is a corner case, but it has already occurred in
>> actual production environments. Given that, let's allow the
"ps/vm"
>> commands to try to display the memory usage for this case, but it does
>> not guarantee that it can work well at any time, which still depends on
>> how far the mm_struct deconstruction has proceeded.
>
> Agree to display it, and looks like the deconstruction is done after
> task->mm is set to NULL, so it looks fine to me.
>
>
 Thank you for the comments, Kazu.
 
 
> void __noreturn do_exit(long code)
> {
> ...
>           exit_signals(tsk);  /* sets PF_EXITING */
> ...
>           exit_mm();
>
> static void exit_mm(void)
> {
>           struct mm_struct *mm = current->mm;
> ...
>           current->mm = NULL;  ## task->mm is set to NULL here
> ...
>           mmput(mm);           ## release the resources actually
>
>
> On the other hand, the mm->mm_count is decremented in mmput(), is there
> need to check it?
>
>
 Good question. I had the same thoughts, but finally I chose to double check
 with the mm_count. Not sure how to ensure the memory synchronization can be
 done, when the kernel is panicking.
 
Ok, it's fine to double check just in case, then ...
 
 If the address of mm pointer is valid and the mm_struct members are always
 legitimate, we won't need to double check.
 
 But anyway, this is just my thoughts, maybe it's not correct completely. If
 you do not want to have it, I can post v2 and simply remove
 the IS_EXITING(task) from get_task_mem_usage().
 
 Thanks.
 Lianbo
 
 
> void mmput(struct mm_struct *mm)
> {
>           might_sleep();
>
>           if (atomic_dec_and_test(&mm->mm_users))
>                   __mmput(mm);
> }
>
> static inline void __mmput(struct mm_struct *mm)
> {
> ...
>           exit_mmap(mm);
> ...
>           mmdrop(mm);
> }
>
> static inline void mmdrop(struct mm_struct *mm)
> {
> ...
>           if (unlikely(atomic_dec_and_test(&mm->mm_count)))
>                   __mmdrop(mm);
> }
>
> Thanks,
> Kazu
>
>>
>> With the patch:
>>     crash> ps 47070
>>           PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS
> COMM
>>         47070       1   0  ffff9ba7c4910000  UN  90.8 38461228 31426444
> ra_ris.parse
>>     crash> vm 47070
>>     PID: 47070    TASK: ffff9ba7c4910000  CPU: 0    COMMAND:
> "ra_ris.parse"
>>            MM               PGD          RSS    TOTAL_VM
>>     ffff9bad6e873840  ffff9baee0544000  31426444k  38461228k
>>           VMA           START       END     FLAGS FILE
>>     ffff9bafdbe1d6c8     400000     8c5000 8000875
> /data1/rishome/ra_cu_cn_412/sbin/ra_ris.parse
>>     ...
>>
>> Reported-by: Buland Kumar Singh <bsingh(a)redhat.com>
>> Signed-off-by: Lianbo Jiang <lijiang(a)redhat.com>
>> ---
>>    memory.c | 13 +++++++++++--
>>    1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/memory.c b/memory.c
>> index 5d76c5d7fe6f..7d59c0555a0e 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -4792,10 +4792,12 @@ get_task_mem_usage(ulong task, struct
> task_mem_usage *tm)
>>    {
>>        struct task_context *tc;
>>        long rss = 0, rss_cache = 0;
>> +     int mm_count = 0;
>> +     ulong addr;
>>
>>        BZERO(tm, sizeof(struct task_mem_usage));
>>
>> -     if (IS_ZOMBIE(task) || IS_EXITING(task))
>> +     if (IS_ZOMBIE(task))
>>                return;
>>
>>        tc = task_to_context(task);
>> @@ -4805,7 +4807,14 @@ get_task_mem_usage(ulong task, struct
> task_mem_usage *tm)
>>
>>        tm->mm_struct_addr = tc->mm_struct;
>>
>> -     if (!task_mm(task, TRUE))
>> +     if (!(addr = task_mm(task, TRUE)))
>> +             return;
>> +
>> +     if (!readmem(addr + OFFSET(mm_struct_mm_count), KVADDR, &mm_count,
>> +             sizeof(int), "mm_struct mm_count", RETURN_ON_ERROR))
>> +             return;
 
tt->mm_struct is filled in task_mm(), I think we can use this:
   mm_count = INT(tt->mm_struct + OFFSET(mm_struct_mm_count));
The following code in get_task_mem_usage() also use this.
Thanks,
Kazu