[Crash-utility] Re: [PATCH 2/5] Enable crash to change gdb thread context

Sunday, 17 March 2024

Hi Tao,

On Fri, Mar 15, 2024 at 08:33:31PM +0800, Tao Liu wrote:
...
 Hi Aditya,
 
 > > As we can see, other cpus[1-6, 8-23] just take the reg cache of
 > > cpu[7], which is incorrect. And if users go further like, "thread
20"
 > > and "gdb bt", it will also give incorrect stack traces.
 > >
 > > The cpu cache will only get refreshed once user type "set
<pid>", so
 > > the cpu cache will be refreshed by the <pid> task's context.
 > >
 > > I doubt a user will understand all the details and constraints, so I'm
 > > afraid the user will be confused by the faulty output. But I also have
 > > no objection if the performance is the priority. Basically it is a
 > > balance of pays and gains. In addition, since cmd "info" and
"thread"
 > > is a command provided by gdb, currently I don't know how to hack
 > > those, so cpu cache can be refreshed when "info threads" or
"thread
 > > <num>" have been invoked.
 > >
 > > Do you have any thoughts?
 >
 > I also had faced that issue initially, ie. the other CPUs using up same
 > regcache, if all are not refreshed.
 > While iterating through all threads, gdb switches it's context
 > temporarily, while crash's context remained same, thus causing gdb to
 > get same registers for all threads other than 0.
 >
 > This was solved in patch #3 (synchronise cpu context changes between crash/gdb)
 > in the ppc's 'Improve stack unwind on ppc64' series, by syncing
gdb's
 > context with crash.
 >
 > Can this change in thread.c in gdb-10.2.patch in patch #2 be reverted ?
 > That will fix it.
 
 Could you share your patch, based on your v10 and my v1 patch series,
 so I can get a clue how to do this? 
Sure tao, i will attach it to the end of this mail.
Basically what I did is to revert changes to gdb-10.2.patch in this
patch. I pushed it along with testing with only regcache_refresh for CPU
0 instead of all CPUs, to:

https://github.com/adi-g15-ibm/crash/tree/tmp-test-branch-10928

...
 
 I tried but was unsuccessful. Since I have changed your #3 patch a bit
 in my v1 patch series, such as gdb_change_cpu_context() ->
 gdb_change_thread_context(), I doubt that's the reason for failing.
 
 What I did is keeping "set_cpu" in thread.c:thread_command() as the
 gdb-10.2.patch describes in your #3 patch. But only one thread gets
 refreshed when I invoke "thread X", and no regcache refreshed when
 invoke "info threads". 
If i understand clearly, "thread X" causing refresh for one thread/CPU
is expected, as we want only registers for "X" to be refreshed.
But 'info threads' not refreshing any regcache should be solved by the
restoring changes to gdb-10.2.patch to do the 'set_cpu' in the
thread_command.

Thanks Tao,

- Aditya Gupta

commit d1ad22747de0b6c9846ecc8ea746ee9a38c7dade
Author: Tao Liu <ltao(a)redhat.com&gt;
Date:   Wed Feb 14 10:44:54 2024 +0800

    change thread context
    
    Previously we can only view the stack unwinding for the tasks which are
    running on each CPUs. This patch will enable the ability to view
    arbitrary tasks stack unwinding.
    
    After crash get initialized, "info threads" will output like the
    following:
    
    crash> info threads
      Id   Target Id         Frame
      1    CPU 0             native_safe_halt () at arch/x86/include/asm/irqflags.h:54
    ...
    * 8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000,
reserved=reserved@entry=false) at block/blk-mq.c:640
    ...
      13   CPU 12            <unavailable> in ?? ()
      14   CPU 13            native_safe_halt () at arch/x86/include/asm/irqflags.h:54
    ...
    
    crash> ps
          PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
...
       0       0   0  ffffffff819f9480  RU   0.0        0        0 
[swapper/0]
       0       0   1  ffff880169411fa0  RU   0.0        0        0  [swapper/1]    
...
            0       0  23  ffff8801694e0000  RU   0.0        0        0  [swapper/23]
            1       0  13  ffff880169b30000  IN   0.0   193052     4180  systemd
    
    "info threads" show the tasks which are currently running on each CPU. If
we'd
    like to view systemd, which are not running, task's stack unwinding, we
    do the following:
    
    crash> set 1
    or
    crash> set ffff880169b30000
    
    Then the register cache of systemd will be swapped into CPU 13:
    
    crash> info threads
    crash> info threads
      Id   Target Id         Frame
      1    CPU 0             native_safe_halt () at arch/x86/include/asm/irqflags.h:54
    ...
      8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000,
reserved=reserved@entry=false) at block/blk-mq.c:640
    ...
      13   CPU 12            <unavailable> in ?? ()
    * 14   CPU 13            0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0,
prev=0xffff880169b30000) at kernel/sched/core.c:2527
    ...
    
    And we can view the stack unwinding of systemd:
    
    crash> bt
    PID: 1        TASK: ffff880169b30000  CPU: 13   COMMAND: "systemd"
     #0 [ffff880169b3bd58] __schedule at ffffffff816a8f65
     #1 [ffff880169b3bdc0] schedule at ffffffff816a94e9
     #2 [ffff880169b3bdd0] schedule_hrtimeout_range_clock at ffffffff816a86fd
     #3 [ffff880169b3be68] schedule_hrtimeout_range at ffffffff816a8733
     #4 [ffff880169b3be78] ep_poll at ffffffff8124bb7e
     #5 [ffff880169b3bf30] sys_epoll_wait at ffffffff8124d00d
     #6 [ffff880169b3bf80] system_call_fastpath at ffffffff816b5009
        RIP: 00007f0449407923  RSP: 00007ffc35a3c378  RFLAGS: 00010246
        RAX: 00000000000000e8  RBX: ffffffff816b5009  RCX: 0000000000000071
        RDX: 000000000000001d  RSI: 00007ffc35a3d5a0  RDI: 0000000000000004
        RBP: 00007ffc35a3d810   R8: 0000000000000000   R9: 0000000000000000
        R10: 00000000ffffffff  R11: 0000000000000293  R12: 0000563ca2ebe980
        R13: 0000000000000003  R14: ffffffffffffffff  R15: 0000000000000001
        ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
    crash> gdb bt
     #0  0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff880169b30000)
at kernel/sched/core.c:2527
     #1  __schedule () at kernel/sched/core.c:3540
     #2  0xffffffff816a94e9 in schedule () at kernel/sched/core.c:3577
     #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock (expires=expires@entry=0x0,
delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS, clock=clock@entry=1) at
kernel/hrtimer.c:1724
     #4  0xffffffff816a8733 in schedule_hrtimeout_range (expires=expires@entry=0x0,
delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS) at kernel/hrtimer.c:1778
     #5  0xffffffff8124bb7e in ep_poll (ep=0xffff880fd861f8c0,
events=events@entry=0x7ffc35a3d5a0, maxevents=maxevents@entry=29,
timeout=timeout@entry=-1) at fs/eventpoll.c:1669
     #6  0xffffffff8124d00d in SYSC_epoll_wait (timeout=<optimized out>,
maxevents=29, events=<optimized out>, epfd=<optimized out>) at
fs/eventpoll.c:2043
     #7  SyS_epoll_wait (epfd=<optimized out>, events=140721208415648, maxevents=29,
timeout=4294967295) at fs/eventpoll.c:2008
     #8  <signal handler called>
     #9  0x00007f0449407923 in ?? ()
    
    Signed-off-by: Tao Liu <ltao(a)redhat.com&gt;
    Signed-off-by: Aditya Gupta <adityag(a)linux.ibm.com&gt;

diff --git a/crash_target.c b/crash_target.c
index d06383f594aa..1df1e9d34a45 100644
--- a/crash_target.c
+++ b/crash_target.c
@@ -29,10 +29,10 @@ extern "C" int gdb_readmem_callback(unsigned long, void *,
int, int);
 extern "C" int crash_get_nr_cpus(void);
 extern "C" int crash_get_cpu_reg (int cpu, int regno, const char *regname,
                                   int regsize, void *val);
-extern "C" int gdb_change_cpu_context (unsigned int cpu);
 extern "C" void gdb_refresh_regcache(unsigned int cpu);
 extern "C" int set_cpu(int cpu, int print_context);
-
+extern "C" int crash_set_thread(ulong);
+extern "C" int gdb_change_thread_context (ulong task);
 
 /* The crash target.  */
 
@@ -110,11 +110,13 @@ crash_target::xfer_partial (enum target_object object, const char
*annex,
 
 #define CRASH_INFERIOR_PID 1
 
+crash_target *target = NULL;
+
 void
 crash_target_init (void)
 {
   int nr_cpus = crash_get_nr_cpus();
-  crash_target *target = new crash_target ();
+  target = new crash_target ();
 
   /* Own the target until it is successfully pushed.  */
   target_ops_up target_holder (target);
@@ -137,27 +139,33 @@ crash_target_init (void)
   reinit_frame_cache ();
 }
 
-/*
- * Change gdb's thread context to the thread on given CPU
- **/
 extern "C" int
-gdb_change_cpu_context(unsigned int cpu)
+gdb_change_thread_context (ulong task)
 {
+  int tried = 0;
+  inferior* inf = current_inferior ();
+  int cpu = crash_set_thread(task);
+  if (cpu < 0)
+    return FALSE;
+
   ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu);
-  inferior *inf = current_inferior ();
+
+retry:
   thread_info *tp = find_thread_ptid (inf, ptid);
+  if (tp == nullptr && !tried) {
+    thread_info *thread = add_thread_silent(target, ptid_t(CRASH_INFERIOR_PID, 0, cpu));
+    tried++;
+    if (thread) {
+      goto retry;
+    }
+  }
 
-  if (tp == nullptr)
+  if (tp == nullptr && tried)
     return FALSE;
 
-  /* Making sure that crash's context is same */
-  set_cpu(cpu, FALSE);
-
-  /* Switch to the thread */
+  target_fetch_registers(get_thread_regcache(tp), -1);
   switch_to_thread(tp);
-
-  /* Fetch/Refresh thread's registers */
-  gdb_refresh_regcache(cpu);
+  reinit_frame_cache ();
 
   return TRUE;
 }
diff --git a/defs.h b/defs.h
index 49b606979d9e..d5cef621b465 100644
--- a/defs.h
+++ b/defs.h
@@ -8192,7 +8192,6 @@ enum ppc64_regnum {
 };
 
 /* crash_target.c */
-extern int gdb_change_cpu_context (unsigned int cpu);
 extern void gdb_refresh_regcache (unsigned int cpu);
 
 #endif /* !GDB_COMMON */
diff --git a/kernel.c b/kernel.c
index ea5b5cb32914..50832ed906e5 100644
--- a/kernel.c
+++ b/kernel.c
@@ -6544,6 +6544,29 @@ set_cpu(int cpu, int print_context)
 		show_context(CURRENT_CONTEXT());
 }
 
+int
+crash_set_thread(ulong task)
+{
+	bool found = FALSE;
+	struct task_context *tc = FIRST_CONTEXT();
+
+	for (int i = 0; i < RUNNING_TASKS(); i++, tc++) {
+		if (tc->task == task) {
+			found = TRUE;
+			break;
+		}
+	}
+
+	if (!found)
+		return -1;
+
+	if (CURRENT_TASK() == tc->task)
+		return tc->processor;
+
+	set_context(tc->task, NO_PID);
+	return tc->processor;
+}
+
 
 /*
  *  Collect the irq_desc[] entry along with its associated handler and
diff --git a/task.c b/task.c
index a405b05a47d1..ef79f533f11a 100644
--- a/task.c
+++ b/task.c
@@ -715,7 +715,8 @@ task_init(void)
 	 * crash_target::fetch_registers, so CPU 0's registers are shown as
 	 * <unavailable> in gdb mode
 	 * */
-	gdb_refresh_regcache(0);
+	for (int i = 0; i < get_cpus_online(); i++)
+		gdb_refresh_regcache(i);
 
 	tt->flags |= TASK_INIT_DONE;
 }
@@ -5315,7 +5316,7 @@ set_context(ulong task, ulong pid, uint update_gdb_thread)
 
 		/* change the selected thread in gdb, according to current context */
 		if (update_gdb_thread)
-			return gdb_change_cpu_context(tc->processor);
+			return gdb_change_thread_context(tc->task);
 		else
 			return TRUE;
 	} else {

    

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

[Crash-utility] Re: [PATCH 2/5] Enable crash to change gdb thread context