Hello Dave,
The patch has been modified, using pc->cmd_cleanup to clean tc->mm_struct.
Please check the attached patch.
On 12/09/2014 10:30 PM, Dave Anderson wrote:
----- Original Message -----
> On 12/08/2014 10:28 PM, Dave Anderson wrote:
>>
>>
>> ----- Original Message -----
>>> On 12/06/2014 04:11 AM, Dave Anderson wrote:
>>>> Interestingly enough, today I was asked to look at a vmcore in which an
>>>> oops
>>>> occurred during task exit after tsk->mm had been NULL'd out in
exit_mm():
>>>
>>> It almost matches what I am facing, when tsk->mm is set to NULL and
memory
>>> mapping is supposed to be displayed. This is a more simple implementation.
>>> I have tried to command like vm [taskp | pid | [-M mm_struct]]. But it have
>>> to modify a lot of thing.
>>>
>>> By the way, I feel the code is becoming more and more complicated, maybe a
>>> reconstruction is needed.
>>
>> Well, the vm_area_dump() function is relatively stable, so let's not go
crazy
>> here for what's essentially an "experimental" option.
>
> I see.
>
>>
>>>
>>>>
>>>> Of course it has its limitations. Since the page tables are being broken
down in this case,
>>>> "vm -p" fails:
>>>>
>>>> crash> vm -M ffff880495120dc0 -p
>>>> PID: 4563 TASK: ffff88049863f500 CPU: 8 COMMAND:
"postgres"
>>>> MM PGD RSS TOTAL_VM
>>>> 0 0 0k 0k
>>>> VMA START END FLAGS FILE
>>>> ffff8804a085ce90 400000 f56000 8001875
/usr/local/greenplum-db-4.3.3.1/bin/postgres
>>>> VIRTUAL PHYSICAL
>>>> vm: invalid kernel virtual address: 50 type: "mm_struct
pgd"
>>>> crash>
>>>
>>> After a glance, the pgd comes from the mm of task_struct. We need a lot of
work to make it
>>> replaced by argument of -M, I don't think it worse it right now.
>>
>> Actually it doesn't take much work at all. If both tc->mm_struct and
tm->mm_struct_addr
>> are replaced with the supplied address:
>>
>> tc->mm_struct = tm->mm_struct_addr = pc->curcmd_private;
>>
>> then "vm -M ffff880495120dc0 -p" also works OK with my sample vmcore.
>
> Yes, vm -p will tc->mm_struct to get pgd. But I was afraid permanently changing
tc->mm_struct
> is not good.
>
> Take my core into consideration, the case is:
>
> <cut>
> crash> help -t | grep mm_struct
> .mm_struct: 0
> mm_struct: 354cc80
> crash> vm -M 0xffff88003ae98ac0
> PID: 4860 TASK: ffff88003ae7eaf0 CPU: 0 COMMAND: "bash"
> MM PGD RSS TOTAL_VM
> 0 0 0k 0k
> VMA START END FLAGS FILE
> ffff88003acc3ad8 400000 4d4000 8001875 /bin/bash
> ...
> crash> help -t | grep mm_struct
> .mm_struct: ffff88003ae98ac0
> mm_struct: 354cc80
> crash>
> <cut>
>
> Is it OK to have ".mm_struct" changed here?
Probably not...
I did take a quick look at the other usages of tc->mm_struct, and I think
the only major difference is that the "search -u" command would search
the user-space memory of the changed task. But there's also an oddball
check for an i386 hypervisor callback from user-space, and it looks like
get_task_mem_usage() would use it from that point on, so I agree with
you that it should be restored to NULL.
Since there's a strong possibility of an error(FATAL, ...) call while
executing the command, it's not safe to simply restore it to NULL at the
end of the command, but rather the pc->cmd_cleanup() facility should
be used with the task address as the pc->cmd_cleanup_arg.
>>
>>>>
>>>> But it does seems like a worthwhile addition.
>>>>
>>>> The patch doesn't check whether mm->owner or mm->mm_count are
legitimate, but I'm not
>>>> sure whether it's even worth it? If it fails, it fails, and the help
page should just
>>>> indicate that the command option is not guaranteed to work. Does the
attached patch work
>>>> for you?
>>>
>>> Similar to the core I got. And I modified the patch to add some check. At
least I think
>>> we need to make sure the address still belongs to a mm_struct object.
>>
>> I suppose you could, although in all probability it's going to be stay in the
mm_struct
>> slab cache, and worst case, have been re-used by another task.
>
> So you like the modification.
Sure -- with the pc->cmd_cleanup in place, I don't see how it can hurt.
Dave
.
--
Regards
Qiao Nuohan