[Crash-utility] Re: [PATCH 2/5] Enable crash to change gdb thread context

Monday, 18 March 2024

Hi Aditya,

On Sun, Mar 17, 2024 at 03:07:44PM +0530, Aditya Gupta wrote:
...
 Hi Tao,

 On Fri, Mar 15, 2024 at 08:33:31PM +0800, Tao Liu wrote:
 > Hi Aditya,
 > 
 > > > As we can see, other cpus[1-6, 8-23] just take the reg cache of
 > > > cpu[7], which is incorrect. And if users go further like, "thread 20"
 > > > and "gdb bt", it will also give incorrect stack traces.
 > > >
 > > > The cpu cache will only get refreshed once user type "set <pid>", so
 > > > the cpu cache will be refreshed by the <pid> task's context.
 > > >
 > > > I doubt a user will understand all the details and constraints, so I'm
 > > > afraid the user will be confused by the faulty output. But I also have
 > > > no objection if the performance is the priority. Basically it is a
 > > > balance of pays and gains. In addition, since cmd "info" and "thread"
 > > > is a command provided by gdb, currently I don't know how to hack
 > > > those, so cpu cache can be refreshed when "info threads" or "thread
 > > > <num>" have been invoked.
 > > >
 > > > Do you have any thoughts?
 > >
 > > I also had faced that issue initially, ie. the other CPUs using up same
 > > regcache, if all are not refreshed.
 > > While iterating through all threads, gdb switches its context
 > > temporarily, while crash's context remained the same, thus causing gdb to
 > > get the same registers for all threads other than 0.
 > >
 > > This was solved in patch #3 (synchronise cpu context changes between crash/gdb)
 > > in the ppc's 'Improve stack unwind on ppc64' series, by syncing gdb's
 > > context with crash.
 > >
 > > Can this change in thread.c in gdb-10.2.patch in patch #2 be reverted ?
 > > That will fix it.
 > 
 > Could you share your patch, based on your v10 and my v1 patch series,
 > so I can get a clue how to do this?

 Sure Tao, I will attach it to the end of this mail.
 Basically what I did is to revert the changes to gdb-10.2.patch in this
 patch. I pushed it, along with a test that does regcache_refresh only for
 CPU 0 instead of all CPUs, to:

 https://github.com/adi-g15-ibm/crash/tree/tmp-test-branch-10928

I have tried with your repo, and I noticed the following behaviour; I am not
sure if it is expected:

cmd "info threads" will always refresh the cpus' regcache to be the active
tasks', right? E.g.:

crash> thread 15
[Switching to thread 15 (CPU 14)]
#0  <unavailable> in ?? ()
crash> bt
PID: 29867    TASK: ffff88025b04af70  CPU: 14   COMMAND: "elasticsearch[l"
...
crash> gdb bt
#0  <unavailable> in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
    <== pid 29867's regcache in CPU14
crash> set ffff88003592cf10
crash> bt
PID: 835      TASK: ffff88003592cf10  CPU: 14   COMMAND: "kdmflush"
 #0 [ffff880fd6fc7da8] __schedule at ffffffff816a8f65
...
crash> gdb bt
#0  0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff88003592cf10) at kernel/sched/core.c:2527
#1  __schedule () at kernel/sched/core.c:3540
    <== pid 835's regcache in CPU14
...
crash> info threads
...
crash> bt
PID: 29867    TASK: ffff88025b04af70  CPU: 14   COMMAND: "elasticsearch[l"
...
crash> gdb bt
#0  <unavailable> in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
    <== pid 29867's regcache in CPU14

Frankly, I would expect the task context to remain pid 835's after
"info threads", because I previously typed the command "set XX" to switch to
it, so I would assume the context stays unchanged until I type cmd "set YY".

What do you think?
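
To make the expectation concrete, here is a toy model in C. It is not crash or gdb code: struct cpu_slot, set_task(), refresh_all() and refresh_keep_selected() are hypothetical names, and the model only assumes that each CPU slot caches the registers of exactly one task. refresh_all() mimics what I observed after "info threads"; refresh_keep_selected() mimics what I would expect.

```c
#include <assert.h>

#define NR_CPUS 16

/* Hypothetical model of one gdb thread ("CPU N") and its register cache. */
struct cpu_slot {
    unsigned long long active_task; /* task running on this CPU at crash time */
    unsigned long long cached_task; /* task whose registers sit in the regcache */
};

static struct cpu_slot slots[NR_CPUS];
static int selected_cpu = -1;       /* slot last chosen with "set", or -1 */

/* Model of "set <task>": swap the task's registers into its CPU slot. */
static void set_task(int cpu, unsigned long long task)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    slots[cpu].cached_task = task;
    selected_cpu = cpu;
}

/* Observed behaviour: every slot goes back to its active task. */
static void refresh_all(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        slots[cpu].cached_task = slots[cpu].active_task;
    selected_cpu = -1;
}

/* Expected behaviour: the slot chosen with "set" is left untouched. */
static void refresh_keep_selected(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu == selected_cpu)
            continue;
        slots[cpu].cached_task = slots[cpu].active_task;
    }
}
```

With CPU 14's active task set to ffff88025b04af70, calling set_task(14, 0xffff88003592cf10) followed by refresh_all() leaves the slot caching the active task again, while refresh_keep_selected() keeps the task chosen with "set".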

Thanks,
Tao Liu

...
 > 
 > I tried but was unsuccessful. Since I have changed your #3 patch a bit
 > in my v1 patch series, such as gdb_change_cpu_context() ->
 > gdb_change_thread_context(), I doubt that's the reason for failing.
 > 
 > What I did is keeping "set_cpu" in thread.c:thread_command() as the
 > gdb-10.2.patch describes in your #3 patch. But only one thread gets
 > refreshed when I invoke "thread X", and no regcache refreshed when
 > invoke "info threads".

 If I understand correctly, "thread X" causing a refresh for one thread/CPU
 is expected, as we want only the registers for "X" to be refreshed.
 But 'info threads' not refreshing any regcache should be solved by
 restoring the changes to gdb-10.2.patch that do the 'set_cpu' in
 thread_command.

 Thanks Tao,

 - Aditya Gupta

 commit d1ad22747de0b6c9846ecc8ea746ee9a38c7dade
 Author: Tao Liu <ltao(a)redhat.com>
 Date:   Wed Feb 14 10:44:54 2024 +0800

     change thread context

     Previously we could only view the stack unwinding of the tasks which are
     running on each CPU. This patch adds the ability to view the stack
     unwinding of arbitrary tasks.

     After crash is initialized, "info threads" will output like the
     following:

     crash> info threads
       Id   Target Id         Frame
       1    CPU 0             native_safe_halt () at arch/x86/include/asm/irqflags.h:54
     ...
     * 8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000, reserved=reserved@entry=false) at block/blk-mq.c:640
     ...
       13   CPU 12            <unavailable> in ?? ()
       14   CPU 13            native_safe_halt () at arch/x86/include/asm/irqflags.h:54
     ...

     crash> ps
           PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
     >       0       0   0  ffffffff819f9480  RU   0.0        0        0  [swapper/0]
     >       0       0   1  ffff880169411fa0  RU   0.0        0        0  [swapper/1]
     ...
             0       0  23  ffff8801694e0000  RU   0.0        0        0  [swapper/23]
             1       0  13  ffff880169b30000  IN   0.0   193052     4180  systemd

     "info threads" shows the tasks which are currently running on each CPU. If
     we'd like to view the stack unwinding of systemd, which is not running, we
     do the following:

     crash> set 1
     or
     crash> set ffff880169b30000

     Then the register cache of systemd will be swapped into CPU 13:

     crash> info threads
       Id   Target Id         Frame
       1    CPU 0             native_safe_halt () at arch/x86/include/asm/irqflags.h:54
     ...
       8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000, reserved=reserved@entry=false) at block/blk-mq.c:640
     ...
       13   CPU 12            <unavailable> in ?? ()
     * 14   CPU 13            0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff880169b30000) at kernel/sched/core.c:2527
     ...

     And we can view the stack unwinding of systemd:

     crash> bt
     PID: 1        TASK: ffff880169b30000  CPU: 13   COMMAND: "systemd"
      #0 [ffff880169b3bd58] __schedule at ffffffff816a8f65
      #1 [ffff880169b3bdc0] schedule at ffffffff816a94e9
      #2 [ffff880169b3bdd0] schedule_hrtimeout_range_clock at ffffffff816a86fd
      #3 [ffff880169b3be68] schedule_hrtimeout_range at ffffffff816a8733
      #4 [ffff880169b3be78] ep_poll at ffffffff8124bb7e
      #5 [ffff880169b3bf30] sys_epoll_wait at ffffffff8124d00d
      #6 [ffff880169b3bf80] system_call_fastpath at ffffffff816b5009
         RIP: 00007f0449407923  RSP: 00007ffc35a3c378  RFLAGS: 00010246
         RAX: 00000000000000e8  RBX: ffffffff816b5009  RCX: 0000000000000071
         RDX: 000000000000001d  RSI: 00007ffc35a3d5a0  RDI: 0000000000000004
         RBP: 00007ffc35a3d810   R8: 0000000000000000   R9: 0000000000000000
         R10: 00000000ffffffff  R11: 0000000000000293  R12: 0000563ca2ebe980
         R13: 0000000000000003  R14: ffffffffffffffff  R15: 0000000000000001
         ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
     crash> gdb bt
      #0  0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff880169b30000) at kernel/sched/core.c:2527
      #1  __schedule () at kernel/sched/core.c:3540
      #2  0xffffffff816a94e9 in schedule () at kernel/sched/core.c:3577
      #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock (expires=expires@entry=0x0, delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS, clock=clock@entry=1) at kernel/hrtimer.c:1724
      #4  0xffffffff816a8733 in schedule_hrtimeout_range (expires=expires@entry=0x0, delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS) at kernel/hrtimer.c:1778
      #5  0xffffffff8124bb7e in ep_poll (ep=0xffff880fd861f8c0, events=events@entry=0x7ffc35a3d5a0, maxevents=maxevents@entry=29, timeout=timeout@entry=-1) at fs/eventpoll.c:1669
      #6  0xffffffff8124d00d in SYSC_epoll_wait (timeout=<optimized out>, maxevents=29, events=<optimized out>, epfd=<optimized out>) at fs/eventpoll.c:2043
      #7  SyS_epoll_wait (epfd=<optimized out>, events=140721208415648, maxevents=29, timeout=4294967295) at fs/eventpoll.c:2008
      #8  <signal handler called>
      #9  0x00007f0449407923 in ?? ()

     Signed-off-by: Tao Liu <ltao(a)redhat.com>
     Signed-off-by: Aditya Gupta <adityag(a)linux.ibm.com>

 diff --git a/crash_target.c b/crash_target.c
 index d06383f594aa..1df1e9d34a45 100644
 --- a/crash_target.c
 +++ b/crash_target.c
 @@ -29,10 +29,10 @@ extern "C" int gdb_readmem_callback(unsigned long, void *, int, int);
  extern "C" int crash_get_nr_cpus(void);
  extern "C" int crash_get_cpu_reg (int cpu, int regno, const char *regname,
                                    int regsize, void *val);
 -extern "C" int gdb_change_cpu_context (unsigned int cpu);
  extern "C" void gdb_refresh_regcache(unsigned int cpu);
  extern "C" int set_cpu(int cpu, int print_context);
 -
 +extern "C" int crash_set_thread(ulong);
 +extern "C" int gdb_change_thread_context (ulong task);

  /* The crash target.  */

 @@ -110,11 +110,13 @@ crash_target::xfer_partial (enum target_object object, const char *annex,

  #define CRASH_INFERIOR_PID 1

 +crash_target *target = NULL;
 +
  void
  crash_target_init (void)
  {
    int nr_cpus = crash_get_nr_cpus();
 -  crash_target *target = new crash_target ();
 +  target = new crash_target ();

    /* Own the target until it is successfully pushed.  */
    target_ops_up target_holder (target);
 @@ -137,27 +139,33 @@ crash_target_init (void)
    reinit_frame_cache ();
  }

 -/*
 - * Change gdb's thread context to the thread on given CPU
 - **/
  extern "C" int
 -gdb_change_cpu_context(unsigned int cpu)
 +gdb_change_thread_context (ulong task)
  {
 +  int tried = 0;
 +  inferior* inf = current_inferior ();
 +  int cpu = crash_set_thread(task);
 +  if (cpu < 0)
 +    return FALSE;
 +
    ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu);
 -  inferior *inf = current_inferior ();
 +
 +retry:
    thread_info *tp = find_thread_ptid (inf, ptid);
 +  if (tp == nullptr && !tried) {
 +    thread_info *thread = add_thread_silent(target, ptid_t(CRASH_INFERIOR_PID, 0, cpu));
 +    tried++;
 +    if (thread) {
 +      goto retry;
 +    }
 +  }

 -  if (tp == nullptr)
 +  if (tp == nullptr && tried)
      return FALSE;

 -  /* Making sure that crash's context is same */
 -  set_cpu(cpu, FALSE);
 -
 -  /* Switch to the thread */
 +  target_fetch_registers(get_thread_regcache(tp), -1);
    switch_to_thread(tp);
 -
 -  /* Fetch/Refresh thread's registers */
 -  gdb_refresh_regcache(cpu);
 +  reinit_frame_cache ();

    return TRUE;
  }
 diff --git a/defs.h b/defs.h
 index 49b606979d9e..d5cef621b465 100644
 --- a/defs.h
 +++ b/defs.h
 @@ -8192,7 +8192,6 @@ enum ppc64_regnum {
  };

  /* crash_target.c */
 -extern int gdb_change_cpu_context (unsigned int cpu);
  extern void gdb_refresh_regcache (unsigned int cpu);

  #endif /* !GDB_COMMON */
 diff --git a/kernel.c b/kernel.c
 index ea5b5cb32914..50832ed906e5 100644
 --- a/kernel.c
 +++ b/kernel.c
 @@ -6544,6 +6544,29 @@ set_cpu(int cpu, int print_context)
  		show_context(CURRENT_CONTEXT());
  }

 +int
 +crash_set_thread(ulong task)
 +{
 +	bool found = FALSE;
 +	struct task_context *tc = FIRST_CONTEXT();
 +
 +	for (int i = 0; i < RUNNING_TASKS(); i++, tc++) {
 +		if (tc->task == task) {
 +			found = TRUE;
 +			break;
 +		}
 +	}
 +
 +	if (!found)
 +		return -1;
 +
 +	if (CURRENT_TASK() == tc->task)
 +		return tc->processor;
 +
 +	set_context(tc->task, NO_PID);
 +	return tc->processor;
 +}
 +

  /*
   *  Collect the irq_desc[] entry along with its associated handler and
 diff --git a/task.c b/task.c
 index a405b05a47d1..ef79f533f11a 100644
 --- a/task.c
 +++ b/task.c
 @@ -715,7 +715,8 @@ task_init(void)
  	 * crash_target::fetch_registers, so CPU 0's registers are shown as
  	 * <unavailable> in gdb mode
  	 * */
 -	gdb_refresh_regcache(0);
 +	for (int i = 0; i < get_cpus_online(); i++)
 +		gdb_refresh_regcache(i);

  	tt->flags |= TASK_INIT_DONE;
  }
 @@ -5315,7 +5316,7 @@ set_context(ulong task, ulong pid, uint update_gdb_thread)

  		/* change the selected thread in gdb, according to current context */
  		if (update_gdb_thread)
 -			return gdb_change_cpu_context(tc->processor);
 +			return gdb_change_thread_context(tc->task);
  		else
  			return TRUE;
  	} else {
