Hi Ke Yin,
On Wed, May 14, 2025 at 8:58 PM Ke Yin <kyin(a)redhat.com> wrote:
Hi Tao Liu & Kazu,
Thanks for replying and sharing your thoughts.
After a quick review of crash tool code, I found:
runq -m will call dump_on_rq_milliseconds() to print the amount
of time that the active task on each cpu has been running,
but only for the current running task.
runq -d will call dump_on_rq_tasks() to print all tasks in the run queue
and the task running on cpu without calling translate_nanoseconds().
My preliminary idea is to combine these two functions and add a new
parameter, for example -q, to print the tasks on each cpu that has
been waiting in the run queue only. And as well as update doc of runq.
In short:
runq -q will call new_function which is the modified function based on dump_on_rq_tasks()
(skip current + translate_nanoseconds).
What do you think?
I'm OK with your idea in general. Please check if I understood
correctly, your implementation is like:
cmd_runq() {
...
if (-d option) {
dump_on_rq_tasks(old path);
} else if (-q option) {
dump_on_rq_tasks(new path);
}
}
dump_on_rq_tasks(option)
{
...
for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
if (old path) // Old path stay unchanged
dump_task_runq_entry(tc, 0);
else // New path will output your time duration
your_new_function_with_translate_nanoseconds();
}
}
Thanks,
Tao Liu
Thanks
Kenneth Yin
On Mon, May 12, 2025 at 1:36 PM Tao Liu <ltao(a)redhat.com> wrote:
>
> Hi Kazu & Kenneth,
>
> Sorry for the late reply, and thanks for your fix and comments!
>
> On Thu, May 8, 2025 at 12:20 PM HAGIO KAZUHITO(萩尾 一仁)
> <k-hagio-ab(a)nec.com> wrote:
> >
> > On 2025/05/07 16:16, HAGIO KAZUHITO(萩尾 一仁) wrote:
> > > Hi,
> > >
> > > On 2025/04/28 19:38, Kenneth Yin wrote:
> > >> The RU/TASK_RUNNING stat means the task is runnable.
> > >> It is either currently running or on a run queue waiting to run.
> > >>
> > >> Currently, the crash tool uses the "rq_clock -
sched_info->last_arrival" formula to
> > >> calculate the duration of task in RU state. This is for the scenario of
a task running on a CPU.
> > >
> > > The "ps -l" and "ps -m" options display what their help
text describes,
> > > not the duration of task in RU state. Please see "help ps".
> > >
> > > Also, tasks are sorted by the value, using different values for it could
> > > make another confusion.
> > >
> > > The options have been used for a long time with the current code, if we
> > > change the semantics of the options, it would be better to be careful.
> > > The change might lose a kind of information instead of getting another
> > > kind of information.
> > >
> > > On the other hand, I think that the duration of waiting in queue might
> > > also be useful information. I'm not sure how we should display them,
> > > but for example, how about adding a new option or adding a column for
> > > last_queued?
> >
> > I thought of that the "runq" command might be suitable to display the
> > waiting duration, because only tasks in the run queues have it. For
> > example, extending the "runq -m" option or adding a new option. just
my
> > thought.
> >
> > Thanks,
> > Kazu
> >
> > >
> > > What do you think, folks?
> > >
> > > Thanks,
> > > Kazu
> > >
> > >>
> > >> But for the scenario of a task waiting in the CPU run queue (due to
some reason
> > >> for example cfs/rt queue throttled), this formula could cause
misunderstanding.
> > >>
> > >> For example:
> > >> [ 220 10:36:38.026] [RU] PID: 12345 TASK: ffff8d674ab6b180 CPU: 1
COMMAND: "task"
> > >>
> > >> Looking closer:
> > >>
> > >> crash> rq.clock ffff8de438a5acc0
> > >> clock = 87029229985307234,
> > >>
> > >> crash> task -R sched_info,se.exec_start
> > >> PID: 12345 TASK: ffff8d674ab6b180 CPU: 1 COMMAND: "task"
> > >> sched_info = {
> > >> pcount = 33,
> > >> run_delay = 0,
> > >> last_arrival = 67983031958439673,
> > >> last_queued = 87029224561119369
> > >> },
> > >> se.exec_start = 67983031958476937,
> > >>
> > >> 67983031 67983031 87029224
87029229
> > >> |<- running on CPU ->| <- IN ->|<- waiting
in queue ->|
> > >>
> > >> For this scenario, the "task" was waiting in the run queue of
the CPU only for 5 seconds,
> > >> we should use the "rq_clock - sched_info->last_queued"
formula.
>
> Please check if my understanding is correct:
>
> The result you saw is "rq_clock - sched_info->last_arrival == 87029229
> - 67983031 == 19046198"
> The expected result you want is: "rq_clock - sched_info->last_queued
> == 87029229 - 87029224 == 5"
>
> You think the 19046198 value is misleading and should be 5 which only
> contains the waiting in queue duration, am I correct?
>
> I agree with Kazu's idea, that we shouldn't change the existing ps
> cmd's behaviour, and runq is a better alternative for the
> waiting-in-queue duration display.
>
> What do you think? Could you please improve your code as well as an
> updated "help runq" doc for runq?
>
> Thanks,
> Tao Liu
>
> > >>
> > >> We can trust sched_info->last_queued as it is only set when the task
enters the CPU run queue.
> > >> Furthermore, when the task hits/runs on a CPU or dequeues the CPU run
queue, it will be reset to 0.
> > >>
> > >> Therefore, my idea is simple:
> > >>
> > >> If a task in RU stat and sched_info->last_queued has value (!= 0),
> > >> it means this task is waiting in the run queue, use "rq_clock -
sched_info->last_queued".
> > >>
> > >> Otherwise, if a task in RU stat and sched_info->last_queued = 0
> > >> and sched_info->last_arrival has value (it must be), it means this
task is running on the CPU,
> > >> use "rq_clock - sched_info->last_arrival".
> > >>
> > >> Signed-off-by: Kenneth Yin <kyin(a)redhat.com>
> > >> ---
> > >> defs.h | 1 +
> > >> symbols.c | 2 ++
> > >> task.c | 21 +++++++++++++++------
> > >> 3 files changed, 18 insertions(+), 6 deletions(-)
> > >>
> > >> diff --git a/defs.h b/defs.h
> > >> index 4cf169c..66f5ce4 100644
> > >> --- a/defs.h
> > >> +++ b/defs.h
> > >> @@ -1787,6 +1787,7 @@ struct offset_table { /* stash
of commonly-used offsets */
> > >> long vcpu_struct_rq;
> > >> long task_struct_sched_info;
> > >> long sched_info_last_arrival;
> > >> + long sched_info_last_queued;
> > >> long page_objects;
> > >> long kmem_cache_oo;
> > >> long char_device_struct_cdev;
> > >> diff --git a/symbols.c b/symbols.c
> > >> index e30fafe..fb5035f 100644
> > >> --- a/symbols.c
> > >> +++ b/symbols.c
> > >> @@ -9930,6 +9930,8 @@ dump_offset_table(char *spec, ulong makestruct)
> > >> OFFSET(sched_rt_entity_run_list));
> > >> fprintf(fp, " sched_info_last_arrival: %ld\n",
> > >> OFFSET(sched_info_last_arrival));
> > >> + fprintf(fp, " sched_info_last_queued: %ld\n",
> > >> + OFFSET(sched_info_last_queued));
> > >> fprintf(fp, " task_struct_thread_info:
%ld\n",
> > >> OFFSET(task_struct_thread_info));
> > >> fprintf(fp, " task_struct_stack:
%ld\n",
> > >> diff --git a/task.c b/task.c
> > >> index 3bafe79..f5386ac 100644
> > >> --- a/task.c
> > >> +++ b/task.c
> > >> @@ -332,9 +332,12 @@ task_init(void)
> > >> MEMBER_OFFSET_INIT(task_struct_last_run,
"task_struct", "last_run");
> > >> MEMBER_OFFSET_INIT(task_struct_timestamp,
"task_struct", "timestamp");
> > >> MEMBER_OFFSET_INIT(task_struct_sched_info,
"task_struct", "sched_info");
> > >> - if (VALID_MEMBER(task_struct_sched_info))
> > >> + if (VALID_MEMBER(task_struct_sched_info)) {
> > >> MEMBER_OFFSET_INIT(sched_info_last_arrival,
> > >> "sched_info",
"last_arrival");
> > >> + MEMBER_OFFSET_INIT(sched_info_last_queued,
> > >> + "sched_info", "last_queued");
> > >> + }
> > >> if (VALID_MEMBER(task_struct_last_run) ||
> > >> VALID_MEMBER(task_struct_timestamp) ||
> > >> VALID_MEMBER(sched_info_last_arrival)) {
> > >> @@ -6035,7 +6038,7 @@ ulonglong
> > >> task_last_run(ulong task)
> > >> {
> > >> ulong last_run;
> > >> - ulonglong timestamp;
> > >> + ulonglong timestamp,last_queued;
> > >>
> > >> timestamp = 0;
> > >> fill_task_struct(task);
> > >> @@ -6047,10 +6050,16 @@ task_last_run(ulong task)
> > >> } else if (VALID_MEMBER(task_struct_timestamp))
> > >> timestamp = tt->last_task_read ?
ULONGLONG(tt->task_struct +
> > >> OFFSET(task_struct_timestamp)) : 0;
> > >> - else if (VALID_MEMBER(sched_info_last_arrival))
> > >> - timestamp = tt->last_task_read ?
ULONGLONG(tt->task_struct +
> > >> - OFFSET(task_struct_sched_info) +
> > >> - OFFSET(sched_info_last_arrival)) : 0;
> > >> + else if (VALID_MEMBER(sched_info_last_queued))
> > >> + last_queued = ULONGLONG(tt->task_struct +
> > >> + OFFSET(task_struct_sched_info) +
> > >> + OFFSET(sched_info_last_queued));
> > >> + if (last_queued != 0) {
> > >> + timestamp = tt->last_task_read ? last_queued :
0;
> > >> + } else if (VALID_MEMBER(sched_info_last_arrival))
> > >> + timestamp = tt->last_task_read ?
ULONGLONG(tt->task_struct +
> > >> + OFFSET(task_struct_sched_info) +
> > >> + OFFSET(sched_info_last_arrival)) : 0;
> > >>
> > >> return timestamp;
> > >> }
> > > --
> > > Crash-utility mailing list -- devel(a)lists.crash-utility.osci.io
> > > To unsubscribe send an email to devel-leave(a)lists.crash-utility.osci.io
> > > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
> > > Contribution Guidelines:
https://github.com/crash-utility/crash/wiki
> > --
> > Crash-utility mailing list -- devel(a)lists.crash-utility.osci.io
> > To unsubscribe send an email to devel-leave(a)lists.crash-utility.osci.io
> > https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
> > Contribution Guidelines:
https://github.com/crash-utility/crash/wiki
>
--
Kenneth Yin
Senior Software Maintenance Engineer
Customer Experience and Engagement
Phone: +86-10-6533-9459
Red Hat China