On Mon, May 26, 2025 at 5:04 PM <devel-request@lists.crash-utility.osci.io> wrote:
Date: Fri, 23 May 2025 05:23:40 +0000
From: HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab@nec.com>
Subject: [Crash-utility] Re: [PATCH] Fix the "ps -m" command shows
        wrong duration of RU task
To: Tao Liu <ltao@redhat.com>, Ke Yin <kyin@redhat.com>
Cc: "devel@lists.crash-utility.osci.io"
        <devel@lists.crash-utility.osci.io>
Message-ID: <23c74e07-d439-422a-bbea-8b2bf49b38f1@nec.com>
Content-Type: text/plain; charset="utf-8"

Hi,

On 2025/05/15 8:43, Tao Liu wrote:
> Hi Ke Yin,
>
> On Wed, May 14, 2025 at 8:58 PM Ke Yin <kyin@redhat.com> wrote:
>>
>> Hi Tao Liu & Kazu,
>>
>> Thanks for replying and sharing your thoughts.
>>
>> After a quick review of crash tool code, I found:
>>
>> runq -m will call dump_on_rq_milliseconds() to print the amount
>> of time that the active task on each cpu has been running,
>> but only for the current running task.
>>
>> runq -d will call dump_on_rq_tasks() to print all tasks in the run queue
>> and the task running on cpu without calling translate_nanoseconds().
>>
>> My preliminary idea is to combine these two functions and add a new
>> parameter, for example -q, to print the tasks on each cpu that has
>> been waiting in the run queue only. And as well as update doc of runq.
>>
>> In short:
>> runq -q will call new_function which is the modified function based on dump_on_rq_tasks() (skip current + translate_nanoseconds).
>>
>> What do you think?

I didn't know the "runq -d" option because it's a kind of debugging
option and has no description in the help page.  Also it searches all
tasks for ones that have on_rq = 1 and doesn't look very efficient
(nr_tasks * nr_cpus).  so ideally, maybe a new function should be based
on dump_runq() than based on dump_on_rq_tasks(), if possible..


Looks like getting a solution: adding a new option(E.g: runq -q) to achieve this purpose? Or need more discussion? Any update?

Thanks
Lianbo
 
Thanks,
Kazu


>
> I'm OK with your idea in general. Please check if I understood
> correctly, your implementation is like:
> cmd_runq() {
> ...
>    if (-d option) {
>      dump_on_rq_tasks(old path);
>    } else if (-q option) {
>      dump_on_rq_tasks(new path);
>    }
> }
>
> dump_on_rq_tasks(option)
> {
>    ...
>    for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
>      if (old path) // Old path stay unchanged
>        dump_task_runq_entry(tc, 0);
>      else // New path will output your time duration
>        your_new_function_with_translate_nanoseconds();
>    }
> }
>
> Thanks,
> Tao Liu
>
>>
>> Thanks
>> Kenneth Yin
>>
>>
>>
>>
>> On Mon, May 12, 2025 at 1:36 PM Tao Liu <ltao@redhat.com> wrote:
>>>
>>> Hi Kazu & Kenneth,
>>>
>>> Sorry for the late reply, and thanks for your fix and comments!
>>>
>>> On Thu, May 8, 2025 at 12:20 PM HAGIO KAZUHITO(萩尾 一仁)
>>> <k-hagio-ab@nec.com> wrote:
>>>>
>>>> On 2025/05/07 16:16, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>>>> Hi,
>>>>>
>>>>> On 2025/04/28 19:38, Kenneth Yin wrote:
>>>>>> The RU/TASK_RUNNING stat means the task is runnable.
>>>>>> It is either currently running or on a run queue waiting to run.
>>>>>>
>>>>>> Currently, the crash tool uses the "rq_clock - sched_info->last_arrival" formula to
>>>>>> calculate the duration of task in RU state. This is for the scenario of a task running on a CPU.
>>>>>
>>>>> The "ps -l" and "ps -m" options display what their help text describes,
>>>>> not the duration of task in RU state.  Please see "help ps".
>>>>>
>>>>> Also, tasks are sorted by the value, using different values for it could
>>>>> make another confusion.
>>>>>
>>>>> The options have been used for a long time with the current code, if we
>>>>> change the semantics of the options, it would be better to be careful.
>>>>> The change might lose a kind of information instead of getting another
>>>>> kind of information.
>>>>>
>>>>> On the other hand, I think that the duration of waiting in queue might
>>>>> also be useful information.  I'm not sure how we should display them,
>>>>> but for example, how about adding a new option or adding a column for
>>>>> last_queued?
>>>>
>>>> I thought of that the "runq" command might be suitable to display the
>>>> waiting duration, because only tasks in the run queues have it.  For
>>>> example, extending the "runq -m" option or adding a new option.  just my
>>>> thought.
>>>>
>>>> Thanks,
>>>> Kazu
>>>>
>>>>>
>>>>> What do you think, folks?
>>>>>
>>>>> Thanks,
>>>>> Kazu
>>>>>
>>>>>>
>>>>>> But for the scenario of a task waiting in the  CPU run queue (due to some reason
>>>>>> for example cfs/rt queue throttled), this formula could cause misunderstanding.
>>>>>>
>>>>>> For example:
>>>>>> [ 220 10:36:38.026] [RU]  PID: 12345   TASK: ffff8d674ab6b180  CPU: 1        COMMAND: "task"
>>>>>>
>>>>>> Looking closer:
>>>>>>
>>>>>> crash> rq.clock ffff8de438a5acc0
>>>>>>      clock = 87029229985307234,
>>>>>>
>>>>>> crash> task -R sched_info,se.exec_start
>>>>>> PID: 12345   TASK: ffff8d674ab6b180  CPU: 1  COMMAND: "task"
>>>>>>      sched_info = {
>>>>>>       pcount = 33,
>>>>>>       run_delay = 0,
>>>>>>       last_arrival = 67983031958439673,
>>>>>>       last_queued = 87029224561119369
>>>>>>      },
>>>>>>      se.exec_start = 67983031958476937,
>>>>>>
>>>>>> 67983031         67983031                 87029224              87029229
>>>>>> |<-   running on CPU  ->| <-      IN    ->|<-    waiting in queue    ->|
>>>>>>
>>>>>> For this scenario, the "task" was waiting in the run queue of the CPU only for 5 seconds,
>>>>>> we should use the "rq_clock - sched_info->last_queued" formula.
>>>
>>> Please check if my understanding is correct:
>>>
>>> The result you saw is "rq_clock - sched_info->last_arrival == 87029229
>>> - 67983031 == 19046198"
>>> The expected result you want is: "rq_clock - sched_info->last_queued
>>> == 87029229 - 87029224 == 5"
>>>
>>> You think the 19046198 value is misleading and should be 5 which only
>>> contains the waiting in queue duration, am I correct?
>>>
>>> I agree with Kazu's idea, that we shouldn't change the existing ps
>>> cmd's behaviour, and runq is a better alternative for the
>>> waiting-in-queue duration display.
>>>
>>> What do you think? Could you please improve your code as well as an
>>> updated "help runq" doc for runq?
>>>
>>> Thanks,
>>> Tao Liu
>>>
>>>>>>
>>>>>> We can trust sched_info->last_queued as it is only set when the task enters the CPU run queue.
>>>>>> Furthermore, when the task hits/runs on a CPU or dequeues the CPU run queue, it will be reset to 0.
>>>>>>
>>>>>> Therefore, my idea is simple:
>>>>>>
>>>>>> If a task in RU stat and sched_info->last_queued has value (!= 0),
>>>>>> it means this task is waiting in the run queue, use "rq_clock - sched_info->last_queued".
>>>>>>
>>>>>> Otherwise, if a task in RU stat and sched_info->last_queued = 0
>>>>>> and sched_info->last_arrival has value (it must be), it means this task is running on the CPU,
>>>>>> use "rq_clock - sched_info->last_arrival".
>>>>>>
>>>>>> Signed-off-by: Kenneth Yin <kyin@redhat.com>
>>>>>> ---
>>>>>>     defs.h    |  1 +
>>>>>>     symbols.c |  2 ++
>>>>>>     task.c    | 21 +++++++++++++++------
>>>>>>     3 files changed, 18 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/defs.h b/defs.h
>>>>>> index 4cf169c..66f5ce4 100644
>>>>>> --- a/defs.h
>>>>>> +++ b/defs.h
>>>>>> @@ -1787,6 +1787,7 @@ struct offset_table {                    /* stash of commonly-used offsets */
>>>>>>       long vcpu_struct_rq;
>>>>>>       long task_struct_sched_info;
>>>>>>       long sched_info_last_arrival;
>>>>>> +    long sched_info_last_queued;
>>>>>>       long page_objects;
>>>>>>       long kmem_cache_oo;
>>>>>>       long char_device_struct_cdev;
>>>>>> diff --git a/symbols.c b/symbols.c
>>>>>> index e30fafe..fb5035f 100644
>>>>>> --- a/symbols.c
>>>>>> +++ b/symbols.c
>>>>>> @@ -9930,6 +9930,8 @@ dump_offset_table(char *spec, ulong makestruct)
>>>>>>                     OFFSET(sched_rt_entity_run_list));
>>>>>>       fprintf(fp, "       sched_info_last_arrival: %ld\n",
>>>>>>                     OFFSET(sched_info_last_arrival));
>>>>>> +    fprintf(fp, "       sched_info_last_queued: %ld\n",
>>>>>> +            OFFSET(sched_info_last_queued));
>>>>>>             fprintf(fp, "       task_struct_thread_info: %ld\n",
>>>>>>                     OFFSET(task_struct_thread_info));
>>>>>>             fprintf(fp, "             task_struct_stack: %ld\n",
>>>>>> diff --git a/task.c b/task.c
>>>>>> index 3bafe79..f5386ac 100644
>>>>>> --- a/task.c
>>>>>> +++ b/task.c
>>>>>> @@ -332,9 +332,12 @@ task_init(void)
>>>>>>             MEMBER_OFFSET_INIT(task_struct_last_run, "task_struct", "last_run");
>>>>>>             MEMBER_OFFSET_INIT(task_struct_timestamp, "task_struct", "timestamp");
>>>>>>             MEMBER_OFFSET_INIT(task_struct_sched_info, "task_struct", "sched_info");
>>>>>> -    if (VALID_MEMBER(task_struct_sched_info))
>>>>>> +    if (VALID_MEMBER(task_struct_sched_info)) {
>>>>>>               MEMBER_OFFSET_INIT(sched_info_last_arrival,
>>>>>>                       "sched_info", "last_arrival");
>>>>>> +            MEMBER_OFFSET_INIT(sched_info_last_queued,
>>>>>> +                    "sched_info", "last_queued");
>>>>>> +    }
>>>>>>       if (VALID_MEMBER(task_struct_last_run) ||
>>>>>>           VALID_MEMBER(task_struct_timestamp) ||
>>>>>>           VALID_MEMBER(sched_info_last_arrival)) {
>>>>>> @@ -6035,7 +6038,7 @@ ulonglong
>>>>>>     task_last_run(ulong task)
>>>>>>     {
>>>>>>             ulong last_run;
>>>>>> -    ulonglong timestamp;
>>>>>> +    ulonglong timestamp,last_queued;
>>>>>>
>>>>>>       timestamp = 0;
>>>>>>             fill_task_struct(task);
>>>>>> @@ -6047,10 +6050,16 @@ task_last_run(ulong task)
>>>>>>       } else if (VALID_MEMBER(task_struct_timestamp))
>>>>>>               timestamp = tt->last_task_read ?  ULONGLONG(tt->task_struct +
>>>>>>                       OFFSET(task_struct_timestamp)) : 0;
>>>>>> -    else if (VALID_MEMBER(sched_info_last_arrival))
>>>>>> -            timestamp = tt->last_task_read ?  ULONGLONG(tt->task_struct +
>>>>>> -                    OFFSET(task_struct_sched_info) +
>>>>>> -                    OFFSET(sched_info_last_arrival)) : 0;
>>>>>> +    else if (VALID_MEMBER(sched_info_last_queued))
>>>>>> +            last_queued = ULONGLONG(tt->task_struct +
>>>>>> +                    OFFSET(task_struct_sched_info) +
>>>>>> +                    OFFSET(sched_info_last_queued));
>>>>>> +            if (last_queued != 0) {
>>>>>> +                    timestamp = tt->last_task_read ? last_queued : 0;
>>>>>> +            } else if (VALID_MEMBER(sched_info_last_arrival))
>>>>>> +                            timestamp = tt->last_task_read ?  ULONGLONG(tt->task_struct +
>>>>>> +                            OFFSET(task_struct_sched_info) +
>>>>>> +                            OFFSET(sched_info_last_arrival)) : 0;
>>>>>>
>>>>>>             return timestamp;
>>>>>>     }
>>>>> --
>>>>> Crash-utility mailing list -- devel@lists.crash-utility.osci.io
>>>>> To unsubscribe send an email to devel-leave@lists.crash-utility.osci.io
>>>>> https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
>>>>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki
>>>> --
>>>> Crash-utility mailing list -- devel@lists.crash-utility.osci.io
>>>> To unsubscribe send an email to devel-leave@lists.crash-utility.osci.io
>>>> https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
>>>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki
>>>
>>
>>
>> --
>> Kenneth Yin
>> Senior Software Maintenance Engineer
>> Customer Experience and Engagement
>> Phone: +86-10-6533-9459
>> Red Hat China