Hi Aditya,
On Sun, Mar 17, 2024 at 03:07:44PM +0530, Aditya Gupta wrote:
Hi Tao,
On Fri, Mar 15, 2024 at 08:33:31PM +0800, Tao Liu wrote:
> Hi Aditya,
>
> > > As we can see, other cpus[1-6, 8-23] just take the reg cache of
> > > cpu[7], which is incorrect. And if users go further like, "thread
20"
> > > and "gdb bt", it will also give incorrect stack traces.
> > >
> > > The cpu cache will only get refreshed once user type "set
<pid>", so
> > > the cpu cache will be refreshed by the <pid> task's context.
> > >
> > > I doubt a user will understand all the details and constraints, so
I'm
> > > afraid the user will be confused by the faulty output. But I also have
> > > no objection if the performance is the priority. Basically it is a
> > > balance of pays and gains. In addition, since cmd "info" and
"thread"
> > > is a command provided by gdb, currently I don't know how to hack
> > > those, so cpu cache can be refreshed when "info threads" or
"thread
> > > <num>" have been invoked.
> > >
> > > Do you have any thoughts?
> >
> > I also had faced that issue initially, ie. the other CPUs using up same
> > regcache, if all are not refreshed.
> > While iterating through all threads, gdb switches it's context
> > temporarily, while crash's context remained same, thus causing gdb to
> > get same registers for all threads other than 0.
> >
> > This was solved in patch #3 (synchronise cpu context changes between
crash/gdb)
> > in the ppc's 'Improve stack unwind on ppc64' series, by syncing
gdb's
> > context with crash.
> >
> > Can this change in thread.c in gdb-10.2.patch in patch #2 be reverted ?
> > That will fix it.
>
> Could you share your patch, based on your v10 and my v1 patch series,
> so I can get a clue how to do this?
Sure tao, i will attach it to the end of this mail.
Basically what I did is to revert changes to gdb-10.2.patch in this
patch. I pushed it along with testing with only regcache_refresh for CPU
0 instead of all CPUs, to:
https://github.com/adi-g15-ibm/crash/tree/tmp-test-branch-10928
I have tried with your repo, and I noticed the following behaviour, not
sure if it is expected:
cmd "info threads" will always reflush the cpus regcache to be the active
tasks' right? E.g:
crash> thread 15
[Switching to thread 15 (CPU 14)]
#0 <unavailable> in ?? ()
crash> bt
PID: 29867 TASK: ffff88025b04af70 CPU: 14 COMMAND: "elasticsearch[l"
...
crash> gdb bt
#0 <unavailable> in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further <== pid
29867's regcache in CPU14
crash> set ffff88003592cf10
crash> bt
PID: 835 TASK: ffff88003592cf10 CPU: 14 COMMAND: "kdmflush"
#0 [ffff880fd6fc7da8] __schedule at ffffffff816a8f65
...
crash> gdb bt
#0 0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff88003592cf10) at
kernel/sched/core.c:2527
#1 __schedule () at kernel/sched/core.c:3540 <== pid
835's regcache in CPU14
...
crash> info threads
...
crash> bt
PID: 29867 TASK: ffff88025b04af70 CPU: 14 COMMAND: "elasticsearch[l"
...
crash> gdb bt
#0 <unavailable> in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further <== pid
29867's regcache in CPU14
Frankly I would expect the task context remains to be pid 835's after
"info threads", because I previously typed the command "set XX" to
switch to
it, so I would assume the context stay unchange until I retype cmd "set YY".
What do you think?
Thanks,
Tao Liu
>
> I tried but was unsuccessful. Since I have changed your #3 patch a bit
> in my v1 patch series, such as gdb_change_cpu_context() ->
> gdb_change_thread_context(), I doubt that's the reason for failing.
>
> What I did is keeping "set_cpu" in thread.c:thread_command() as the
> gdb-10.2.patch describes in your #3 patch. But only one thread gets
> refreshed when I invoke "thread X", and no regcache refreshed when
> invoke "info threads".
If i understand clearly, "thread X" causing refresh for one thread/CPU
is expected, as we want only registers for "X" to be refreshed.
But 'info threads' not refreshing any regcache should be solved by the
restoring changes to gdb-10.2.patch to do the 'set_cpu' in the
thread_command.
Thanks Tao,
- Aditya Gupta
commit d1ad22747de0b6c9846ecc8ea746ee9a38c7dade
Author: Tao Liu <ltao(a)redhat.com>
Date: Wed Feb 14 10:44:54 2024 +0800
change thread context
Previously we can only view the stack unwinding for the tasks which are
running on each CPUs. This patch will enable the ability to view
arbitrary tasks stack unwinding.
After crash get initialized, "info threads" will output like the
following:
crash> info threads
Id Target Id Frame
1 CPU 0 native_safe_halt () at arch/x86/include/asm/irqflags.h:54
...
* 8 CPU 7 blk_mq_rq_timed_out (req=0xffff880fdb246000,
reserved=reserved@entry=false) at block/blk-mq.c:640
...
13 CPU 12 <unavailable> in ?? ()
14 CPU 13 native_safe_halt () at arch/x86/include/asm/irqflags.h:54
...
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 0 0 0 ffffffff819f9480 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffff880169411fa0 RU 0.0 0 0 [swapper/1]
...
0 0 23 ffff8801694e0000 RU 0.0 0 0 [swapper/23]
1 0 13 ffff880169b30000 IN 0.0 193052 4180 systemd
"info threads" show the tasks which are currently running on each CPU. If
we'd
like to view systemd, which are not running, task's stack unwinding, we
do the following:
crash> set 1
or
crash> set ffff880169b30000
Then the register cache of systemd will be swapped into CPU 13:
crash> info threads
crash> info threads
Id Target Id Frame
1 CPU 0 native_safe_halt () at arch/x86/include/asm/irqflags.h:54
...
8 CPU 7 blk_mq_rq_timed_out (req=0xffff880fdb246000,
reserved=reserved@entry=false) at block/blk-mq.c:640
...
13 CPU 12 <unavailable> in ?? ()
* 14 CPU 13 0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0,
prev=0xffff880169b30000) at kernel/sched/core.c:2527
...
And we can view the stack unwinding of systemd:
crash> bt
PID: 1 TASK: ffff880169b30000 CPU: 13 COMMAND: "systemd"
#0 [ffff880169b3bd58] __schedule at ffffffff816a8f65
#1 [ffff880169b3bdc0] schedule at ffffffff816a94e9
#2 [ffff880169b3bdd0] schedule_hrtimeout_range_clock at ffffffff816a86fd
#3 [ffff880169b3be68] schedule_hrtimeout_range at ffffffff816a8733
#4 [ffff880169b3be78] ep_poll at ffffffff8124bb7e
#5 [ffff880169b3bf30] sys_epoll_wait at ffffffff8124d00d
#6 [ffff880169b3bf80] system_call_fastpath at ffffffff816b5009
RIP: 00007f0449407923 RSP: 00007ffc35a3c378 RFLAGS: 00010246
RAX: 00000000000000e8 RBX: ffffffff816b5009 RCX: 0000000000000071
RDX: 000000000000001d RSI: 00007ffc35a3d5a0 RDI: 0000000000000004
RBP: 00007ffc35a3d810 R8: 0000000000000000 R9: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000293 R12: 0000563ca2ebe980
R13: 0000000000000003 R14: ffffffffffffffff R15: 0000000000000001
ORIG_RAX: 00000000000000e8 CS: 0033 SS: 002b
crash> gdb bt
#0 0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff880169b30000)
at kernel/sched/core.c:2527
#1 __schedule () at kernel/sched/core.c:3540
#2 0xffffffff816a94e9 in schedule () at kernel/sched/core.c:3577
#3 0xffffffff816a86fd in schedule_hrtimeout_range_clock (expires=expires@entry=0x0,
delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS, clock=clock@entry=1) at
kernel/hrtimer.c:1724
#4 0xffffffff816a8733 in schedule_hrtimeout_range (expires=expires@entry=0x0,
delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS) at kernel/hrtimer.c:1778
#5 0xffffffff8124bb7e in ep_poll (ep=0xffff880fd861f8c0,
events=events@entry=0x7ffc35a3d5a0, maxevents=maxevents@entry=29,
timeout=timeout@entry=-1) at fs/eventpoll.c:1669
#6 0xffffffff8124d00d in SYSC_epoll_wait (timeout=<optimized out>,
maxevents=29, events=<optimized out>, epfd=<optimized out>) at
fs/eventpoll.c:2043
#7 SyS_epoll_wait (epfd=<optimized out>, events=140721208415648,
maxevents=29, timeout=4294967295) at fs/eventpoll.c:2008
#8 <signal handler called>
#9 0x00007f0449407923 in ?? ()
Signed-off-by: Tao Liu <ltao(a)redhat.com>
Signed-off-by: Aditya Gupta <adityag(a)linux.ibm.com>
diff --git a/crash_target.c b/crash_target.c
index d06383f594aa..1df1e9d34a45 100644
--- a/crash_target.c
+++ b/crash_target.c
@@ -29,10 +29,10 @@ extern "C" int gdb_readmem_callback(unsigned long, void *,
int, int);
extern "C" int crash_get_nr_cpus(void);
extern "C" int crash_get_cpu_reg (int cpu, int regno, const char *regname,
int regsize, void *val);
-extern "C" int gdb_change_cpu_context (unsigned int cpu);
extern "C" void gdb_refresh_regcache(unsigned int cpu);
extern "C" int set_cpu(int cpu, int print_context);
-
+extern "C" int crash_set_thread(ulong);
+extern "C" int gdb_change_thread_context (ulong task);
/* The crash target. */
@@ -110,11 +110,13 @@ crash_target::xfer_partial (enum target_object object, const char
*annex,
#define CRASH_INFERIOR_PID 1
+crash_target *target = NULL;
+
void
crash_target_init (void)
{
int nr_cpus = crash_get_nr_cpus();
- crash_target *target = new crash_target ();
+ target = new crash_target ();
/* Own the target until it is successfully pushed. */
target_ops_up target_holder (target);
@@ -137,27 +139,33 @@ crash_target_init (void)
reinit_frame_cache ();
}
-/*
- * Change gdb's thread context to the thread on given CPU
- **/
extern "C" int
-gdb_change_cpu_context(unsigned int cpu)
+gdb_change_thread_context (ulong task)
{
+ int tried = 0;
+ inferior* inf = current_inferior ();
+ int cpu = crash_set_thread(task);
+ if (cpu < 0)
+ return FALSE;
+
ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu);
- inferior *inf = current_inferior ();
+
+retry:
thread_info *tp = find_thread_ptid (inf, ptid);
+ if (tp == nullptr && !tried) {
+ thread_info *thread = add_thread_silent(target, ptid_t(CRASH_INFERIOR_PID, 0,
cpu));
+ tried++;
+ if (thread) {
+ goto retry;
+ }
+ }
- if (tp == nullptr)
+ if (tp == nullptr && tried)
return FALSE;
- /* Making sure that crash's context is same */
- set_cpu(cpu, FALSE);
-
- /* Switch to the thread */
+ target_fetch_registers(get_thread_regcache(tp), -1);
switch_to_thread(tp);
-
- /* Fetch/Refresh thread's registers */
- gdb_refresh_regcache(cpu);
+ reinit_frame_cache ();
return TRUE;
}
diff --git a/defs.h b/defs.h
index 49b606979d9e..d5cef621b465 100644
--- a/defs.h
+++ b/defs.h
@@ -8192,7 +8192,6 @@ enum ppc64_regnum {
};
/* crash_target.c */
-extern int gdb_change_cpu_context (unsigned int cpu);
extern void gdb_refresh_regcache (unsigned int cpu);
#endif /* !GDB_COMMON */
diff --git a/kernel.c b/kernel.c
index ea5b5cb32914..50832ed906e5 100644
--- a/kernel.c
+++ b/kernel.c
@@ -6544,6 +6544,29 @@ set_cpu(int cpu, int print_context)
show_context(CURRENT_CONTEXT());
}
+int
+crash_set_thread(ulong task)
+{
+ bool found = FALSE;
+ struct task_context *tc = FIRST_CONTEXT();
+
+ for (int i = 0; i < RUNNING_TASKS(); i++, tc++) {
+ if (tc->task == task) {
+ found = TRUE;
+ break;
+ }
+ }
+
+ if (!found)
+ return -1;
+
+ if (CURRENT_TASK() == tc->task)
+ return tc->processor;
+
+ set_context(tc->task, NO_PID);
+ return tc->processor;
+}
+
/*
* Collect the irq_desc[] entry along with its associated handler and
diff --git a/task.c b/task.c
index a405b05a47d1..ef79f533f11a 100644
--- a/task.c
+++ b/task.c
@@ -715,7 +715,8 @@ task_init(void)
* crash_target::fetch_registers, so CPU 0's registers are shown as
* <unavailable> in gdb mode
* */
- gdb_refresh_regcache(0);
+ for (int i = 0; i < get_cpus_online(); i++)
+ gdb_refresh_regcache(i);
tt->flags |= TASK_INIT_DONE;
}
@@ -5315,7 +5316,7 @@ set_context(ulong task, ulong pid, uint update_gdb_thread)
/* change the selected thread in gdb, according to current context */
if (update_gdb_thread)
- return gdb_change_cpu_context(tc->processor);
+ return gdb_change_thread_context(tc->task);
else
return TRUE;
} else {