On Mon, May 26, 2025 at 5:04 PM <devel-request(a)lists.crash-utility.osci.io>
wrote:
 Date: Fri, 23 May 2025 05:23:40 +0000
 From: HAGIO KAZUHITO(萩尾 一仁) <k-hagio-ab(a)nec.com>
 Subject: [Crash-utility] Re: [PATCH] Fix the "ps -m" command shows
         wrong duration of RU task
 To: Tao Liu <ltao(a)redhat.com>, Ke Yin <kyin(a)redhat.com>
 Cc: "devel(a)lists.crash-utility.osci.io"
         <devel(a)lists.crash-utility.osci.io>
 Message-ID: <23c74e07-d439-422a-bbea-8b2bf49b38f1(a)nec.com>
 Content-Type: text/plain; charset="utf-8"
 Hi,
 On 2025/05/15 8:43, Tao Liu wrote:
 > Hi Ke Yin,
 >
 > On Wed, May 14, 2025 at 8:58 PM Ke Yin <kyin(a)redhat.com> wrote:
 >>
 >> Hi Tao Liu & Kazu,
 >>
 >> Thanks for replying and sharing your thoughts.
 >>
 >> After a quick review of crash tool code, I found:
 >>
 >> runq -m will call dump_on_rq_milliseconds() to print the amount
 >> of time that the active task on each cpu has been running,
 >> but only for the current running task.
 >>
 >> runq -d will call dump_on_rq_tasks() to print all tasks in the run queue
 >> and the task running on cpu without calling translate_nanoseconds().
 >>
 >> My preliminary idea is to combine these two functions and add a new
 >> parameter, for example -q, to print the tasks on each cpu that has
 >> been waiting in the run queue only. And as well as update doc of runq.
 >>
 >> In short:
 >> runq -q will call new_function which is the modified function based on
 dump_on_rq_tasks() (skip current + translate_nanoseconds).
 >>
 >> What do you think?
 I didn't know the "runq -d" option because it's a kind of debugging
 option and has no description in the help page.  Also it searches all
 tasks for ones that have on_rq = 1 and doesn't look very efficient
 (nr_tasks * nr_cpus).  so ideally, maybe a new function should be based
 on dump_runq() than based on dump_on_rq_tasks(), if possible..
 
Looks like getting a solution: adding a new option(E.g: runq -q) to achieve
this purpose? Or need more discussion? Any update?
Thanks
Lianbo
 Thanks,
 Kazu
 >
 > I'm OK with your idea in general. Please check if I understood
 > correctly, your implementation is like:
 > cmd_runq() {
 > ...
 >    if (-d option) {
 >      dump_on_rq_tasks(old path);
 >    } else if (-q option) {
 >      dump_on_rq_tasks(new path);
 >    }
 > }
 >
 > dump_on_rq_tasks(option)
 > {
 >    ...
 >    for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
 >      if (old path) // Old path stay unchanged
 >        dump_task_runq_entry(tc, 0);
 >      else // New path will output your time duration
 >        your_new_function_with_translate_nanoseconds();
 >    }
 > }
 >
 > Thanks,
 > Tao Liu
 >
 >>
 >> Thanks
 >> Kenneth Yin
 >>
 >>
 >>
 >>
 >> On Mon, May 12, 2025 at 1:36 PM Tao Liu <ltao(a)redhat.com> wrote:
 >>>
 >>> Hi Kazu & Kenneth,
 >>>
 >>> Sorry for the late reply, and thanks for your fix and comments!
 >>>
 >>> On Thu, May 8, 2025 at 12:20 PM HAGIO KAZUHITO(萩尾 一仁)
 >>> <k-hagio-ab(a)nec.com> wrote:
 >>>>
 >>>> On 2025/05/07 16:16, HAGIO KAZUHITO(萩尾 一仁) wrote:
 >>>>> Hi,
 >>>>>
 >>>>> On 2025/04/28 19:38, Kenneth Yin wrote:
 >>>>>> The RU/TASK_RUNNING stat means the task is runnable.
 >>>>>> It is either currently running or on a run queue waiting to
run.
 >>>>>>
 >>>>>> Currently, the crash tool uses the "rq_clock -
 sched_info->last_arrival" formula to
 >>>>>> calculate the duration of task in RU state. This is for the
 scenario of a task running on a CPU.
 >>>>>
 >>>>> The "ps -l" and "ps -m" options display what
their help text
 describes,
 >>>>> not the duration of task in RU state.  Please see "help
ps".
 >>>>>
 >>>>> Also, tasks are sorted by the value, using different values for it
 could
 >>>>> make another confusion.
 >>>>>
 >>>>> The options have been used for a long time with the current code,
if
 we
 >>>>> change the semantics of the options, it would be better to be
 careful.
 >>>>> The change might lose a kind of information instead of getting
 another
 >>>>> kind of information.
 >>>>>
 >>>>> On the other hand, I think that the duration of waiting in queue
 might
 >>>>> also be useful information.  I'm not sure how we should display
them,
 >>>>> but for example, how about adding a new option or adding a column
for
 >>>>> last_queued?
 >>>>
 >>>> I thought of that the "runq" command might be suitable to
display the
 >>>> waiting duration, because only tasks in the run queues have it.  For
 >>>> example, extending the "runq -m" option or adding a new
option.  just
 my
 >>>> thought.
 >>>>
 >>>> Thanks,
 >>>> Kazu
 >>>>
 >>>>>
 >>>>> What do you think, folks?
 >>>>>
 >>>>> Thanks,
 >>>>> Kazu
 >>>>>
 >>>>>>
 >>>>>> But for the scenario of a task waiting in the  CPU run queue
(due
 to some reason
 >>>>>> for example cfs/rt queue throttled), this formula could cause
 misunderstanding.
 >>>>>>
 >>>>>> For example:
 >>>>>> [ 220 10:36:38.026] [RU]  PID: 12345   TASK: ffff8d674ab6b180 
CPU:
 1        COMMAND: "task"
 >>>>>>
 >>>>>> Looking closer:
 >>>>>>
 >>>>>> crash> rq.clock ffff8de438a5acc0
 >>>>>>      clock = 87029229985307234,
 >>>>>>
 >>>>>> crash> task -R sched_info,se.exec_start
 >>>>>> PID: 12345   TASK: ffff8d674ab6b180  CPU: 1  COMMAND:
"task"
 >>>>>>      sched_info = {
 >>>>>>       pcount = 33,
 >>>>>>       run_delay = 0,
 >>>>>>       last_arrival = 67983031958439673,
 >>>>>>       last_queued = 87029224561119369
 >>>>>>      },
 >>>>>>      se.exec_start = 67983031958476937,
 >>>>>>
 >>>>>> 67983031         67983031                 87029224
 87029229
 >>>>>> |<-   running on CPU  ->| <-      IN    ->|<-   
waiting in queue
   ->|
 >>>>>>
 >>>>>> For this scenario, the "task" was waiting in the run
queue of the
 CPU only for 5 seconds,
 >>>>>> we should use the "rq_clock -
sched_info->last_queued" formula.
 >>>
 >>> Please check if my understanding is correct:
 >>>
 >>> The result you saw is "rq_clock - sched_info->last_arrival ==
87029229
 >>> - 67983031 == 19046198"
 >>> The expected result you want is: "rq_clock -
sched_info->last_queued
 >>> == 87029229 - 87029224 == 5"
 >>>
 >>> You think the 19046198 value is misleading and should be 5 which only
 >>> contains the waiting in queue duration, am I correct?
 >>>
 >>> I agree with Kazu's idea, that we shouldn't change the existing ps
 >>> cmd's behaviour, and runq is a better alternative for the
 >>> waiting-in-queue duration display.
 >>>
 >>> What do you think? Could you please improve your code as well as an
 >>> updated "help runq" doc for runq?
 >>>
 >>> Thanks,
 >>> Tao Liu
 >>>
 >>>>>>
 >>>>>> We can trust sched_info->last_queued as it is only set when
the
 task enters the CPU run queue.
 >>>>>> Furthermore, when the task hits/runs on a CPU or dequeues the
CPU
 run queue, it will be reset to 0.
 >>>>>>
 >>>>>> Therefore, my idea is simple:
 >>>>>>
 >>>>>> If a task in RU stat and sched_info->last_queued has value
(!= 0),
 >>>>>> it means this task is waiting in the run queue, use
"rq_clock -
 sched_info->last_queued".
 >>>>>>
 >>>>>> Otherwise, if a task in RU stat and sched_info->last_queued =
0
 >>>>>> and sched_info->last_arrival has value (it must be), it means
this
 task is running on the CPU,
 >>>>>> use "rq_clock - sched_info->last_arrival".
 >>>>>>
 >>>>>> Signed-off-by: Kenneth Yin <kyin(a)redhat.com>
 >>>>>> ---
 >>>>>>     defs.h    |  1 +
 >>>>>>     symbols.c |  2 ++
 >>>>>>     task.c    | 21 +++++++++++++++------
 >>>>>>     3 files changed, 18 insertions(+), 6 deletions(-)
 >>>>>>
 >>>>>> diff --git a/defs.h b/defs.h
 >>>>>> index 4cf169c..66f5ce4 100644
 >>>>>> --- a/defs.h
 >>>>>> +++ b/defs.h
 >>>>>> @@ -1787,6 +1787,7 @@ struct offset_table {                   
/*
 stash of commonly-used offsets */
 >>>>>>       long vcpu_struct_rq;
 >>>>>>       long task_struct_sched_info;
 >>>>>>       long sched_info_last_arrival;
 >>>>>> +    long sched_info_last_queued;
 >>>>>>       long page_objects;
 >>>>>>       long kmem_cache_oo;
 >>>>>>       long char_device_struct_cdev;
 >>>>>> diff --git a/symbols.c b/symbols.c
 >>>>>> index e30fafe..fb5035f 100644
 >>>>>> --- a/symbols.c
 >>>>>> +++ b/symbols.c
 >>>>>> @@ -9930,6 +9930,8 @@ dump_offset_table(char *spec, ulong
 makestruct)
 >>>>>>                     OFFSET(sched_rt_entity_run_list));
 >>>>>>       fprintf(fp, "       sched_info_last_arrival:
%ld\n",
 >>>>>>                     OFFSET(sched_info_last_arrival));
 >>>>>> +    fprintf(fp, "       sched_info_last_queued:
%ld\n",
 >>>>>> +            OFFSET(sched_info_last_queued));
 >>>>>>             fprintf(fp, "       task_struct_thread_info:
%ld\n",
 >>>>>>                     OFFSET(task_struct_thread_info));
 >>>>>>             fprintf(fp, "             task_struct_stack:
%ld\n",
 >>>>>> diff --git a/task.c b/task.c
 >>>>>> index 3bafe79..f5386ac 100644
 >>>>>> --- a/task.c
 >>>>>> +++ b/task.c
 >>>>>> @@ -332,9 +332,12 @@ task_init(void)
 >>>>>>             MEMBER_OFFSET_INIT(task_struct_last_run,
"task_struct",
 "last_run");
 >>>>>>             MEMBER_OFFSET_INIT(task_struct_timestamp,
 "task_struct", "timestamp");
 >>>>>>             MEMBER_OFFSET_INIT(task_struct_sched_info,
 "task_struct", "sched_info");
 >>>>>> -    if (VALID_MEMBER(task_struct_sched_info))
 >>>>>> +    if (VALID_MEMBER(task_struct_sched_info)) {
 >>>>>>               MEMBER_OFFSET_INIT(sched_info_last_arrival,
 >>>>>>                       "sched_info",
"last_arrival");
 >>>>>> +            MEMBER_OFFSET_INIT(sched_info_last_queued,
 >>>>>> +                    "sched_info",
"last_queued");
 >>>>>> +    }
 >>>>>>       if (VALID_MEMBER(task_struct_last_run) ||
 >>>>>>           VALID_MEMBER(task_struct_timestamp) ||
 >>>>>>           VALID_MEMBER(sched_info_last_arrival)) {
 >>>>>> @@ -6035,7 +6038,7 @@ ulonglong
 >>>>>>     task_last_run(ulong task)
 >>>>>>     {
 >>>>>>             ulong last_run;
 >>>>>> -    ulonglong timestamp;
 >>>>>> +    ulonglong timestamp,last_queued;
 >>>>>>
 >>>>>>       timestamp = 0;
 >>>>>>             fill_task_struct(task);
 >>>>>> @@ -6047,10 +6050,16 @@ task_last_run(ulong task)
 >>>>>>       } else if (VALID_MEMBER(task_struct_timestamp))
 >>>>>>               timestamp = tt->last_task_read ?
 ULONGLONG(tt->task_struct +
 >>>>>>                       OFFSET(task_struct_timestamp)) : 0;
 >>>>>> -    else if (VALID_MEMBER(sched_info_last_arrival))
 >>>>>> -            timestamp = tt->last_task_read ?
 ULONGLONG(tt->task_struct +
 >>>>>> -                    OFFSET(task_struct_sched_info) +
 >>>>>> -                    OFFSET(sched_info_last_arrival)) : 0;
 >>>>>> +    else if (VALID_MEMBER(sched_info_last_queued))
 >>>>>> +            last_queued = ULONGLONG(tt->task_struct +
 >>>>>> +                    OFFSET(task_struct_sched_info) +
 >>>>>> +                    OFFSET(sched_info_last_queued));
 >>>>>> +            if (last_queued != 0) {
 >>>>>> +                    timestamp = tt->last_task_read ?
last_queued :
 0;
 >>>>>> +            } else if (VALID_MEMBER(sched_info_last_arrival))
 >>>>>> +                            timestamp = tt->last_task_read
?
 ULONGLONG(tt->task_struct +
 >>>>>> +                            OFFSET(task_struct_sched_info) +
 >>>>>> +                            OFFSET(sched_info_last_arrival)) :
0;
 >>>>>>
 >>>>>>             return timestamp;
 >>>>>>     }
 >>>>> --
 >>>>> Crash-utility mailing list -- devel(a)lists.crash-utility.osci.io
 >>>>> To unsubscribe send an email to
 devel-leave(a)lists.crash-utility.osci.io
 >>>>> https://${domain_name}/admin/lists/
 devel.lists.crash-utility.osci.io/
 >>>>> Contribution Guidelines:
https://github.com/crash-utility/crash/wiki
 >>>> --
 >>>> Crash-utility mailing list -- devel(a)lists.crash-utility.osci.io
 >>>> To unsubscribe send an email to
 devel-leave(a)lists.crash-utility.osci.io
 >>>> https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
 >>>> Contribution Guidelines: 
https://github.com/crash-utility/crash/wiki
 >>>
 >>
 >>
 >> --
 >> Kenneth Yin
 >> Senior Software Maintenance Engineer
 >> Customer Experience and Engagement
 >> Phone: +86-10-6533-9459
 >> Red Hat China