June 2025 - Crash-utility - Crash Utility List Archives

Re: [PATCH] Fix "kmem -p" option on Linux 6.16-rc1 and later kernels

by lijiang

Thank you for the fix, Kazu.

On Wed, Jun 18, 2025 at 2:05 PM <devel-request(a)lists.crash-utility.osci.io> wrote:

> Date: Tue, 17 Jun 2025 06:08:52 +0000
> From: HAGIO KAZUHITO(萩尾一仁) <k-hagio-ab(a)nec.com>
> Subject: [Crash-utility] [PATCH] Fix "kmem -p" option on Linux
>         6.16-rc1 and later kernels
> To: "devel(a)lists.crash-utility.osci.io"
>         <devel(a)lists.crash-utility.osci.io>
> Message-ID: <1750140529-10427-1-git-send-email-k-hagio-ab(a)nec.com>
> Content-Type: text/plain; charset="iso-2022-jp"
>
> Kernel commit acc53a0b4c156 ("mm: rename page->index to
> page->__folio_index"), which is contained in Linux 6.16-rc1 and later
> kernels, renamed the member.  Without the patch, the "kmem -p" option
> fails with the following error:
>
>   kmem: invalid structure member offset: page_index
>         FILE: memory.c  LINE: 6016  FUNCTION: dump_mem_map_SPARSEMEM()
>
> Signed-off-by: Kazuhito Hagio <k-hagio-ab(a)nec.com>
> ---
>  memory.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/memory.c b/memory.c
> index 0d8d89862383..5cb8b58e2181 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -531,6 +531,8 @@ vm_init(void)
>  		ASSIGN_OFFSET(page_mapping) = MEMBER_OFFSET("page", "_mapcount") +
>  			STRUCT_SIZE("atomic_t") + sizeof(ulong);
>  	MEMBER_OFFSET_INIT(page_index, "page", "index");
> +	if (INVALID_MEMBER(page_index))  /* 6.16 and later */
> +		MEMBER_OFFSET_INIT(page_index, "page", "__folio_index");
>

This looks good to me, so: Ack.

Thanks
Lianbo

>  	if (INVALID_MEMBER(page_index))
>  		ANON_MEMBER_OFFSET_INIT(page_index, "page", "index");
>  	MEMBER_OFFSET_INIT(page_buffers, "page", "buffers");
> --
> 2.31.1
>
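
The fix tries the old member name first and falls back to the renamed one only when the first lookup fails. A minimal sketch of that "first name that resolves" pattern is shown here, assuming a toy lookup table: `struct member`, `member_offset()`, `first_valid_offset()` and all offsets are hypothetical stand-ins, not crash's MEMBER_OFFSET_INIT machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the debuginfo of one kernel's struct page. */
struct member {
	const char *name;
	long offset;
};

#define INVALID_OFFSET (-1L)

/* Return the offset of `name`, or INVALID_OFFSET if the member is absent. */
static long member_offset(const struct member *tbl, size_t n, const char *name)
{
	assert(tbl != NULL && name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return tbl[i].offset;
	return INVALID_OFFSET;
}

/* Try each candidate name in order and keep the first one that resolves. */
static long first_valid_offset(const struct member *tbl, size_t n,
			       const char *const *names, size_t nnames)
{
	for (size_t i = 0; i < nnames; i++) {
		long off = member_offset(tbl, n, names[i]);

		if (off != INVALID_OFFSET)
			return off;
	}
	return INVALID_OFFSET;
}

/* Illustrative layouts only; the offsets are made up. */
static const struct member page_pre_6_16[] = { { "mapping", 24 }, { "index", 32 } };
static const struct member page_6_16[] = { { "mapping", 24 }, { "__folio_index", 32 } };
static const char *const page_index_names[] = { "index", "__folio_index" };
```

Calling `first_valid_offset()` with `page_index_names` resolves on both toy layouts, which is the property the two-line patch restores for "kmem -p".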

1 week


[PATCH] vmcoreinfo: read vmcoreinfo using 'vmcoreinfo_data' when unavailable in elf note

by Aditya Gupta

Some vmcores, such as those created using virsh-dump, do not have a
vmcoreinfo ELF note.

On architectures such as PowerPC64, vmcoreinfo is mandatory to fetch
first_vmalloc_address for vmcores of upstream Linux, since crash-utility
commit:

    commit 5b24e363a898 ("get vmalloc start address from vmcoreinfo")

If the vmcoreinfo that crash reads for diskdump/netdump is empty or
missing, try reading it from the 'vmcoreinfo_data' symbol instead.

Reading 'vmcoreinfo_data' was already used for a live kernel and can be
reused when the vmcoreinfo note is missing, as the 'vmcoreinfo_data'
symbol is available with a vmcore too.

Note that 'vmcoreinfo_data' is not read until the GDB interface is
initialised, so until then the behaviour is the same as before with no
vmcoreinfo.

Hence rename 'vmcoreinfo_read_string' in kernel.c to
'vmcoreinfo_read_from_memory', and use it in netdump.c and diskdump.c too.

Reported-by: Anushree Mathur <anushree.mathur(a)linux.ibm.com>
Reported-by: Kowshik Jois <kowsjois(a)linux.ibm.com>
Tested-by: Anushree Mathur <anushree.mathur(a)linux.ibm.com>
Tested-by: Kowshik Jois <kowsjois(a)linux.ibm.com>
Signed-off-by: Aditya Gupta <adityag(a)linux.ibm.com>
---
 defs.h     |  1 +
 diskdump.c | 18 ++++++++++++++++++
 kernel.c   | 17 ++++++++++++-----
 netdump.c  | 19 +++++++++++++++++++
 4 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/defs.h b/defs.h
index 2fdb4db56a05..fbd09e19103f 100644
--- a/defs.h
+++ b/defs.h
@@ -6213,6 +6213,7 @@ void dump_kernel_table(int);
 void dump_bt_info(struct bt_info *, char *where);
 void dump_log(int);
 void parse_kernel_version(char *);
+char *vmcoreinfo_read_from_memory(const char *);
 
 #define LOG_LEVEL(v) ((v) & 0x07)
 #define SHOW_LOG_LEVEL (0x1)
diff --git a/diskdump.c b/diskdump.c
index ce3cbb7b12dd..3be56248c7a9 100644
--- a/diskdump.c
+++ b/diskdump.c
@@ -1041,6 +1041,13 @@ pfn_to_pos(ulong pfn)
 	return desc_pos;
 }
 
+/**
+ * Check if vmcoreinfo in vmcore is missing/empty
+ */
+static bool is_vmcoreinfo_empty(void)
+{
+	return (dd->sub_header_kdump->size_vmcoreinfo == 0);
+}
 
 /*
  * Determine whether a file is a diskdump creation, and if TRUE,
@@ -1088,6 +1095,17 @@ is_diskdump(char *file)
 
 	pc->read_vmcoreinfo = vmcoreinfo_read_string;
 
+	/*
+	 * vmcoreinfo can be empty in case of dump collected via virsh-dump
+	 *
+	 * check if vmcoreinfo is not available in vmcore, and try to read
+	 * the vmcoreinfo from memory, using "vmcoreinfo_data" symbol
+	 */
+	if (is_vmcoreinfo_empty()) {
+		error(WARNING, "vmcoreinfo is empty, will read from symbols\n");
+		pc->read_vmcoreinfo = vmcoreinfo_read_from_memory;
+	}
+
 	if ((pc->flags2 & GET_LOG) && KDUMP_CMPRS_VALID()) {
 		pc->dfd = dd->dfd;
 		pc->readmem = read_diskdump;
diff --git a/kernel.c b/kernel.c
index b8d3b7999974..b296487ea036 100644
--- a/kernel.c
+++ b/kernel.c
@@ -99,7 +99,6 @@ static ulong dump_audit_skb_queue(ulong);
 static ulong __dump_audit(char *);
 static void dump_audit(void);
 static void dump_printk_safe_seq_buf(int);
-static char *vmcoreinfo_read_string(const char *);
 static void check_vmcoreinfo(void);
 static int is_pvops_xen(void);
 static int get_linux_banner_from_vmlinux(char *, size_t);
@@ -11892,8 +11891,8 @@ dump_printk_safe_seq_buf(int msg_flags)
  * Returns a string (that has to be freed by the caller) that contains the
  * value for key or NULL if the key has not been found.
  */
-static char *
-vmcoreinfo_read_string(const char *key)
+char *
+vmcoreinfo_read_from_memory(const char *key)
 {
 	char *buf, *value_string, *p1, *p2;
 	size_t value_length;
@@ -11903,6 +11902,14 @@ vmcoreinfo_read_string(const char *key)
 
 	buf = value_string = NULL;
 
+	if (!(pc->flags & GDB_INIT)) {
+		/*
+		 * GDB interface hasn't been initialised yet, so can't
+		 * access vmcoreinfo_data
+		 */
+		return NULL;
+	}
+
 	switch (get_symbol_type("vmcoreinfo_data", NULL, NULL))
 	{
 	case TYPE_CODE_PTR:
@@ -11958,10 +11965,10 @@ check_vmcoreinfo(void)
 		switch (get_symbol_type("vmcoreinfo_data", NULL, NULL))
 		{
 		case TYPE_CODE_PTR:
-			pc->read_vmcoreinfo = vmcoreinfo_read_string;
+			pc->read_vmcoreinfo = vmcoreinfo_read_from_memory;
 			break;
 		case TYPE_CODE_ARRAY:
-			pc->read_vmcoreinfo = vmcoreinfo_read_string;
+			pc->read_vmcoreinfo = vmcoreinfo_read_from_memory;
 			break;
 		}
 	}
diff --git a/netdump.c b/netdump.c
index c7ff009e7f90..c9f0e4eaa580 100644
--- a/netdump.c
+++ b/netdump.c
@@ -111,6 +111,14 @@ map_cpus_to_prstatus(void)
 	FREEBUF(nt_ptr);
 }
 
+/**
+ * Check if vmcoreinfo in vmcore is missing/empty
+ */
+static bool is_vmcoreinfo_empty(void)
+{
+	return (nd->size_vmcoreinfo == 0);
+}
+
 /*
  * Determine whether a file is a netdump/diskdump/kdump creation,
  * and if TRUE, initialize the vmcore_data structure.
@@ -464,6 +472,17 @@ is_netdump(char *file, ulong source_query)
 
 	pc->read_vmcoreinfo = vmcoreinfo_read_string;
 
+	/*
+	 * vmcoreinfo can be empty in case of dump collected via virsh-dump
+	 *
+	 * check if vmcoreinfo is not available in vmcore, and try to read
+	 * the vmcoreinfo from memory, using "vmcoreinfo_data" symbol
+	 */
+	if (is_vmcoreinfo_empty()) {
+		error(WARNING, "vmcoreinfo is empty, will read from symbols\n");
+		pc->read_vmcoreinfo = vmcoreinfo_read_from_memory;
+	}
+
 	if ((source_query == KDUMP_LOCAL) &&
 	    (pc->flags2 & GET_OSRELEASE))
 		kdump_get_osrelease();
-- 
2.49.0
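
The reader functions in this patch take a key and return a caller-freed value string, or NULL when the key is not found. As a rough illustration of that contract and of the "note first, memory as fallback" choice, here is a self-contained sketch that assumes vmcoreinfo is a buffer of newline-separated KEY=VALUE lines. The names `vmcoreinfo_lookup()` and `vmcoreinfo_read()` are hypothetical; the real code selects a reader through `pc->read_vmcoreinfo` and reads kernel memory, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up `key` in a buffer of "KEY=VALUE\n" lines.
 * Returns a malloc'ed copy of the value (caller frees), or NULL if the
 * buffer is missing/empty or the key is not present.
 */
static char *vmcoreinfo_lookup(const char *buf, size_t len, const char *key)
{
	assert(key != NULL);
	if (buf == NULL || len == 0)
		return NULL;

	size_t klen = strlen(key);
	const char *p = buf, *end = buf + len;

	while (p < end) {
		const char *eol = memchr(p, '\n', (size_t)(end - p));

		if (eol == NULL)
			eol = end;
		/* A match needs "key=" at the very start of the line. */
		if ((size_t)(eol - p) > klen && p[klen] == '=' &&
		    memcmp(p, key, klen) == 0) {
			size_t vlen = (size_t)(eol - p) - klen - 1;
			char *val = malloc(vlen + 1);

			if (val == NULL)
				return NULL;
			memcpy(val, p + klen + 1, vlen);
			val[vlen] = '\0';
			return val;
		}
		if (eol == end)
			break;
		p = eol + 1;
	}
	return NULL;
}

/*
 * Prefer the ELF-note copy; fall back to the in-memory copy when the note
 * is empty, loosely mirroring the is_vmcoreinfo_empty() check above.
 */
static char *vmcoreinfo_read(const char *note, size_t note_len,
			     const char *mem, size_t mem_len, const char *key)
{
	if (note_len != 0)
		return vmcoreinfo_lookup(note, note_len, key);
	return vmcoreinfo_lookup(mem, mem_len, key);
}
```

The sketch decides per call, whereas the patch decides once at dump-detection time by swapping the function pointer; the per-call form is used here only to keep the example free of global state.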

1 week, 1 day


[PATCH RFC][makedumpfile 00/10] btf/kallsyms based eppic extension for mm page filtering

by Tao Liu

A) This patchset introduces the following features to makedumpfile:

  1) Enable eppic scripts for memory page filtering.
  2) Enable btf and kallsyms for symbol type and address resolving.
  3) Port maple tree data structures and functions, primarily used for
     vma iteration.

B) The purposes of the features are:

  1) Currently makedumpfile filters mm pages based on page flags, because
     flags help to determine a page's usage. But this page-flag-checking
     method lacks flexibility in certain cases, e.g. if we want to filter
     the mm pages occupied by a GPU during vmcore dumping because:

     a) The GPU may be taking a large amount of memory that contains
        sensitive data;
     b) GPU mm pages have no relation to the kernel crash and are useless
        for vmcore analysis.

     But there are no GPU-specific mm page flags, and apparently we don't
     need to create one just for kdump use. A programmable filtering tool
     is more suitable for such cases. In addition, different GPU vendors
     may allocate mm pages in different ways, so programmable filtering is
     better than hard-coding GPU-specific logic into makedumpfile.

  2) makedumpfile already contains a programmable filtering tool, the
     eppic script, which allows users to write customized code for data
     erasing. However it has the following drawbacks:

     a) It cannot do mm page filtering.
     b) It needs access to the debuginfo of both kernel and modules, which
        is not applicable in the 2nd kernel.
     c) It performs poorly, making vmcore dumping time unacceptable (see
        the performance testing below).

     makedumpfile needs to resolve the dwarf data from debuginfo to get
     symbol types and addresses. Recent kernels have dwarf alternatives
     such as btf/kallsyms which can be used for this purpose. And the
     btf/kallsyms info is already packed within the vmcore, so we can use
     it directly.

  3) Maple tree data structures are used in recent kernels, e.g. for vma
     iteration. So maple tree porting is needed.

With these, this patchset introduces an upgraded eppic, which is based on
btf/kallsyms symbol resolving and is programmable for mm page filtering.
The following shows its usage and performance. Please note the tests are
performed in the 1st kernel:

$ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore /tmp/dwarf.out -x /lib/debug/lib/modules/6.11.8-300.fc41.x86_64/vmlinux --eppic eppic_scripts/filter_amdgpu_mm_pages.c
real    14m6.894s
user    4m16.900s
sys     9m44.695s

$ time ./makedumpfile -d 31 -l /var/crash/127.0.0.1-2025-06-10-18\:03\:12/vmcore /tmp/btf.out --eppic eppic_scripts/filter_amdgpu_mm_pages.c
real    0m10.672s
user    0m9.270s
sys     0m1.130s

-rw------- 1 root root 367475074 Jun 10 18:06 btf.out
-rw------- 1 root root 367475074 Jun 10 21:05 dwarf.out
-rw-rw-rw- 1 root root 387181418 Jun 10 18:03 /var/crash/127.0.0.1-2025-06-10-18:03:12/vmcore

C) Discussion:

  1) GPU types: Currently only amdgpu's mm page filtering is tested;
     others are not tested.
  2) Code structure: Some similar code is shared by makedumpfile and
     crash, such as the maple tree data structures. I also plan to port
     the btf/kallsyms code to crash, so there is code duplication between
     crash and makedumpfile. Since I haven't worked on the crash porting
     yet, code changes in btf/kallsyms are expected. How should we share
     the code: create a common library, or keep the duplication as it is?
  3) OS: The code works on rhel-10+/rhel9.5+ on x86_64/arm64/s390/ppc64.
     Others are not tested.

D) Testing:

  1) If you don't want to create your own vmcore, you can use a vmcore
     which I created with amdgpu mm pages unfiltered [1]; the amdgpu mm
     pages are allocated by program [2]. You can filter the amdgpu mm
     pages of that vmcore in the 1st kernel with the command line from
     the performance testing above.

     To verify the pages are filtered in crash:

     Unfiltered:
     crash> search -c "!QAZXSW@#EDC"
     ffff96b7fa800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     ffff96b87c800000: !QAZXSW@#EDCXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
     crash> rd ffff96b7fa800000
     ffff96b7fa800000:  405753585a415121                    !QAZXSW@
     crash> rd ffff96b87c800000
     ffff96b87c800000:  405753585a415121                    !QAZXSW@

     Filtered:
     crash> search -c "!QAZXSW@#EDC"
     crash> rd ffff96b7fa800000
     rd: page excluded: kernel virtual address: ffff96b7fa800000  type: "64-bit KVADDR"
     crash> rd ffff96b87c800000
     rd: page excluded: kernel virtual address: ffff96b87c800000  type: "64-bit KVADDR"

  2) If no amdgpu vmcore or machine is available, you can run
     eppic_scripts/print_all_vma.c against an ordinary vmcore to test only
     the btf/kallsyms functions by printing all VMAs.

[1]: https://people.redhat.com/~ltao/core/
[2]: https://gist.github.com/liutgnu/a8cbce1c666452f1530e1410d1f352df

Tao Liu (10):
  dwarf_info: Support kernel address randomization
  dwarf_info: Fix a infinite recursion bug for search_domain
  Add page filtering function
  Add btf/kallsyms support for symbol type/address resolving
  Export necessary btf/kallsyms functions to eppic extension
  Port the maple tree data structures and functions
  Supporting main() as the entry of eppic script
  Enable page filtering for dwarf eppic
  Enable page filtering for btf/kallsyms eppic
  Introducing 2 eppic scripts to test the dwarf/btf eppic extension

 Makefile                               |   6 +-
 btf.c                                  | 919 +++++++++++++++++++++++++
 btf.h                                  | 176 +++++
 dwarf_info.c                           |  15 +-
 eppic_maple.c                          | 431 ++++++++++++
 eppic_maple.h                          |   8 +
 eppic_scripts/filter_amdgpu_mm_pages.c |  36 +
 eppic_scripts/print_all_vma.c          |  29 +
 erase_info.c                           | 123 +++-
 erase_info.h                           |  22 +
 extension_btf.c                        | 218 ++++++
 extension_eppic.c                      |  41 +-
 extension_eppic.h                      |   6 +-
 kallsyms.c                             | 371 ++++++++++
 kallsyms.h                             |  42 ++
 makedumpfile.c                         |  21 +-
 makedumpfile.h                         |  11 +
 17 files changed, 2448 insertions(+), 27 deletions(-)
 create mode 100644 btf.c
 create mode 100644 btf.h
 create mode 100644 eppic_maple.c
 create mode 100644 eppic_maple.h
 create mode 100644 eppic_scripts/filter_amdgpu_mm_pages.c
 create mode 100644 eppic_scripts/print_all_vma.c
 create mode 100644 extension_btf.c
 create mode 100644 kallsyms.c
 create mode 100644 kallsyms.h

-- 
2.47.0

2 weeks, 2 days


Re: [PATCH] Fix the "ps -m" command shows wrong duration of RU task

by lijiang

On Mon, May 26, 2025 at 5:04 PM <devel-request(a)lists.crash-utility.osci.io> wrote:

> Date: Fri, 23 May 2025 05:23:40 +0000
> From: HAGIO KAZUHITO(萩尾一仁) <k-hagio-ab(a)nec.com>
> Subject: [Crash-utility] Re: [PATCH] Fix the "ps -m" command shows
>         wrong duration of RU task
> To: Tao Liu <ltao(a)redhat.com>, Ke Yin <kyin(a)redhat.com>
> Cc: "devel(a)lists.crash-utility.osci.io"
>         <devel(a)lists.crash-utility.osci.io>
> Message-ID: <23c74e07-d439-422a-bbea-8b2bf49b38f1(a)nec.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> On 2025/05/15 8:43, Tao Liu wrote:
> > Hi Ke Yin,
> >
> > On Wed, May 14, 2025 at 8:58 PM Ke Yin <kyin(a)redhat.com> wrote:
> >>
> >> Hi Tao Liu & Kazu,
> >>
> >> Thanks for replying and sharing your thoughts.
> >>
> >> After a quick review of crash tool code, I found:
> >>
> >> runq -m will call dump_on_rq_milliseconds() to print the amount
> >> of time that the active task on each cpu has been running,
> >> but only for the current running task.
> >>
> >> runq -d will call dump_on_rq_tasks() to print all tasks in the run queue
> >> and the task running on cpu without calling translate_nanoseconds().
> >>
> >> My preliminary idea is to combine these two functions and add a new
> >> parameter, for example -q, to print the tasks on each cpu that has
> >> been waiting in the run queue only. And as well as update doc of runq.
> >>
> >> In short:
> >> runq -q will call new_function which is the modified function based on
> >> dump_on_rq_tasks() (skip current + translate_nanoseconds).
> >>
> >> What do you think?
>
> I didn't know the "runq -d" option because it's a kind of debugging
> option and has no description in the help page.  Also it searches all
> tasks for ones that have on_rq = 1 and doesn't look very efficient
> (nr_tasks * nr_cpus).  so ideally, maybe a new function should be based
> on dump_runq() than based on dump_on_rq_tasks(), if possible..
>
>

It looks like we are converging on a solution: adding a new option (e.g.
runq -q) for this purpose? Or is more discussion needed?

Any update?

Thanks
Lianbo

> Thanks,
> Kazu
>
> >
> > I'm OK with your idea in general. Please check if I understood
> > correctly, your implementation is like:
> > cmd_runq() {
> >     ...
> >     if (-d option) {
> >         dump_on_rq_tasks(old path);
> >     } else if (-q option) {
> >         dump_on_rq_tasks(new path);
> >     }
> > }
> >
> > dump_on_rq_tasks(option)
> > {
> >     ...
> >     for (i = 0; i < RUNNING_TASKS(); i++, tc++) {
> >         if (old path) // Old path stay unchanged
> >             dump_task_runq_entry(tc, 0);
> >         else // New path will output your time duration
> >             your_new_function_with_translate_nanoseconds();
> >     }
> > }
> >
> > Thanks,
> > Tao Liu
> >
> >>
> >> Thanks
> >> Kenneth Yin
> >>
> >>
> >>
> >>
> >> On Mon, May 12, 2025 at 1:36 PM Tao Liu <ltao(a)redhat.com> wrote:
> >>>
> >>> Hi Kazu & Kenneth,
> >>>
> >>> Sorry for the late reply, and thanks for your fix and comments!
> >>>
> >>> On Thu, May 8, 2025 at 12:20 PM HAGIO KAZUHITO(萩尾一仁)
> >>> <k-hagio-ab(a)nec.com> wrote:
> >>>>
> >>>> On 2025/05/07 16:16, HAGIO KAZUHITO(萩尾一仁) wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On 2025/04/28 19:38, Kenneth Yin wrote:
> >>>>>> The RU/TASK_RUNNING stat means the task is runnable.
> >>>>>> It is either currently running or on a run queue waiting to run.
> >>>>>>
> >>>>>> Currently, the crash tool uses the "rq_clock - sched_info->last_arrival" formula to
> >>>>>> calculate the duration of task in RU state. This is for the scenario of a task running on a CPU.
> >>>>>
> >>>>> The "ps -l" and "ps -m" options display what their help text describes,
> >>>>> not the duration of task in RU state.  Please see "help ps".
> >>>>>
> >>>>> Also, tasks are sorted by the value, using different values for it could
> >>>>> make another confusion.
> >>>>>
> >>>>> The options have been used for a long time with the current code, if we
> >>>>> change the semantics of the options, it would be better to be careful.
> >>>>> The change might lose a kind of information instead of getting another
> >>>>> kind of information.
> >>>>>
> >>>>> On the other hand, I think that the duration of waiting in queue might
> >>>>> also be useful information.  I'm not sure how we should display them,
> >>>>> but for example, how about adding a new option or adding a column for
> >>>>> last_queued?
> >>>>
> >>>> I thought of that the "runq" command might be suitable to display the
> >>>> waiting duration, because only tasks in the run queues have it.  For
> >>>> example, extending the "runq -m" option or adding a new option.  just my
> >>>> thought.
> >>>>
> >>>> Thanks,
> >>>> Kazu
> >>>>
> >>>>>
> >>>>> What do you think, folks?
> >>>>>
> >>>>> Thanks,
> >>>>> Kazu
> >>>>>
> >>>>>>
> >>>>>> But for the scenario of a task waiting in the CPU run queue (due to some reason
> >>>>>> for example cfs/rt queue throttled), this formula could cause misunderstanding.
> >>>>>>
> >>>>>> For example:
> >>>>>> [ 220 10:36:38.026] [RU]  PID: 12345  TASK: ffff8d674ab6b180  CPU: 1  COMMAND: "task"
> >>>>>>
> >>>>>> Looking closer:
> >>>>>>
> >>>>>> crash> rq.clock ffff8de438a5acc0
> >>>>>>   clock = 87029229985307234,
> >>>>>>
> >>>>>> crash> task -R sched_info,se.exec_start
> >>>>>> PID: 12345  TASK: ffff8d674ab6b180  CPU: 1  COMMAND: "task"
> >>>>>>   sched_info = {
> >>>>>>     pcount = 33,
> >>>>>>     run_delay = 0,
> >>>>>>     last_arrival = 67983031958439673,
> >>>>>>     last_queued = 87029224561119369
> >>>>>>   },
> >>>>>>   se.exec_start = 67983031958476937,
> >>>>>>
> >>>>>>  67983031               67983031  87029224                 87029229
> >>>>>>  |<- running on CPU ->| <- IN ->|<- waiting in queue     ->|
> >>>>>>
> >>>>>> For this scenario, the "task" was waiting in the run queue of the CPU only for 5 seconds,
> >>>>>> we should use the "rq_clock - sched_info->last_queued" formula.
> >>>
> >>> Please check if my understanding is correct:
> >>>
> >>> The result you saw is "rq_clock - sched_info->last_arrival == 87029229
> >>> - 67983031 == 19046198"
> >>> The expected result you want is: "rq_clock - sched_info->last_queued
> >>> == 87029229 - 87029224 == 5"
> >>>
> >>> You think the 19046198 value is misleading and should be 5 which only
> >>> contains the waiting in queue duration, am I correct?
> >>>
> >>> I agree with Kazu's idea, that we shouldn't change the existing ps
> >>> cmd's behaviour, and runq is a better alternative for the
> >>> waiting-in-queue duration display.
> >>>
> >>> What do you think? Could you please improve your code as well as an
> >>> updated "help runq" doc for runq?
> >>>
> >>> Thanks,
> >>> Tao Liu
> >>>
> >>>>>>
> >>>>>> We can trust sched_info->last_queued as it is only set when the task enters the CPU run queue.
> >>>>>> Furthermore, when the task hits/runs on a CPU or dequeues the CPU run queue, it will be reset to 0.
> >>>>>>
> >>>>>> Therefore, my idea is simple:
> >>>>>>
> >>>>>> If a task in RU stat and sched_info->last_queued has value (!= 0),
> >>>>>> it means this task is waiting in the run queue, use "rq_clock - sched_info->last_queued".
> >>>>>>
> >>>>>> Otherwise, if a task in RU stat and sched_info->last_queued = 0
> >>>>>> and sched_info->last_arrival has value (it must be), it means this task is running on the CPU,
> >>>>>> use "rq_clock - sched_info->last_arrival".
> >>>>>>
> >>>>>> Signed-off-by: Kenneth Yin <kyin(a)redhat.com>
> >>>>>> ---
> >>>>>>  defs.h    |  1 +
> >>>>>>  symbols.c |  2 ++
> >>>>>>  task.c    | 21 +++++++++++++++------
> >>>>>>  3 files changed, 18 insertions(+), 6 deletions(-)
> >>>>>>
> >>>>>> diff --git a/defs.h b/defs.h
> >>>>>> index 4cf169c..66f5ce4 100644
> >>>>>> --- a/defs.h
> >>>>>> +++ b/defs.h
> >>>>>> @@ -1787,6 +1787,7 @@ struct offset_table {                    /* stash of commonly-used offsets */
> >>>>>>  	long vcpu_struct_rq;
> >>>>>>  	long task_struct_sched_info;
> >>>>>>  	long sched_info_last_arrival;
> >>>>>> +	long sched_info_last_queued;
> >>>>>>  	long page_objects;
> >>>>>>  	long kmem_cache_oo;
> >>>>>>  	long char_device_struct_cdev;
> >>>>>> diff --git a/symbols.c b/symbols.c
> >>>>>> index e30fafe..fb5035f 100644
> >>>>>> --- a/symbols.c
> >>>>>> +++ b/symbols.c
> >>>>>> @@ -9930,6 +9930,8 @@ dump_offset_table(char *spec, ulong makestruct)
> >>>>>>  		OFFSET(sched_rt_entity_run_list));
> >>>>>>  	fprintf(fp, "       sched_info_last_arrival: %ld\n",
> >>>>>>  		OFFSET(sched_info_last_arrival));
> >>>>>> +	fprintf(fp, "       sched_info_last_queued: %ld\n",
> >>>>>> +		OFFSET(sched_info_last_queued));
> >>>>>>  	fprintf(fp, "       task_struct_thread_info: %ld\n",
> >>>>>>  		OFFSET(task_struct_thread_info));
> >>>>>>  	fprintf(fp, "             task_struct_stack: %ld\n",
> >>>>>> diff --git a/task.c b/task.c
> >>>>>> index 3bafe79..f5386ac 100644
> >>>>>> --- a/task.c
> >>>>>> +++ b/task.c
> >>>>>> @@ -332,9 +332,12 @@ task_init(void)
> >>>>>>  	MEMBER_OFFSET_INIT(task_struct_last_run, "task_struct", "last_run");
> >>>>>>  	MEMBER_OFFSET_INIT(task_struct_timestamp, "task_struct", "timestamp");
> >>>>>>  	MEMBER_OFFSET_INIT(task_struct_sched_info, "task_struct", "sched_info");
> >>>>>> -	if (VALID_MEMBER(task_struct_sched_info))
> >>>>>> +	if (VALID_MEMBER(task_struct_sched_info)) {
> >>>>>>  		MEMBER_OFFSET_INIT(sched_info_last_arrival,
> >>>>>>  			"sched_info", "last_arrival");
> >>>>>> +		MEMBER_OFFSET_INIT(sched_info_last_queued,
> >>>>>> +			"sched_info", "last_queued");
> >>>>>> +	}
> >>>>>>  	if (VALID_MEMBER(task_struct_last_run) ||
> >>>>>>  	    VALID_MEMBER(task_struct_timestamp) ||
> >>>>>>  	    VALID_MEMBER(sched_info_last_arrival)) {
> >>>>>> @@ -6035,7 +6038,7 @@ ulonglong
> >>>>>>  task_last_run(ulong task)
> >>>>>>  {
> >>>>>>  	ulong last_run;
> >>>>>> -	ulonglong timestamp;
> >>>>>> +	ulonglong timestamp,last_queued;
> >>>>>>
> >>>>>>  	timestamp = 0;
> >>>>>>  	fill_task_struct(task);
> >>>>>> @@ -6047,10 +6050,16 @@ task_last_run(ulong task)
> >>>>>>  	} else if (VALID_MEMBER(task_struct_timestamp))
> >>>>>>  		timestamp = tt->last_task_read ? ULONGLONG(tt->task_struct +
> >>>>>>  			OFFSET(task_struct_timestamp)) : 0;
> >>>>>> -	else if (VALID_MEMBER(sched_info_last_arrival))
> >>>>>> -		timestamp = tt->last_task_read ? ULONGLONG(tt->task_struct +
> >>>>>> -			OFFSET(task_struct_sched_info) +
> >>>>>> -			OFFSET(sched_info_last_arrival)) : 0;
> >>>>>> +	else if (VALID_MEMBER(sched_info_last_queued))
> >>>>>> +		last_queued = ULONGLONG(tt->task_struct +
> >>>>>> +			OFFSET(task_struct_sched_info) +
> >>>>>> +			OFFSET(sched_info_last_queued));
> >>>>>> +	if (last_queued != 0) {
> >>>>>> +		timestamp = tt->last_task_read ? last_queued : 0;
> >>>>>> +	} else if (VALID_MEMBER(sched_info_last_arrival))
> >>>>>> +		timestamp = tt->last_task_read ? ULONGLONG(tt->task_struct +
> >>>>>> +			OFFSET(task_struct_sched_info) +
> >>>>>> +			OFFSET(sched_info_last_arrival)) : 0;
> >>>>>>
> >>>>>>  	return timestamp;
> >>>>>>  }
> >>>>> --
> >>>>> Crash-utility mailing list -- devel(a)lists.crash-utility.osci.io
> >>>>> To unsubscribe send an email to devel-leave(a)lists.crash-utility.osci.io
> >>>>> https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
> >>>>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki
> >>>
> >>
> >>
> >> --
> >> Kenneth Yin
> >> Senior Software Maintenance Engineer
> >> Customer Experience and Engagement
> >> Phone: +86-10-6533-9459
> >> Red Hat China
>
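
The rule Kenneth proposes (use last_queued when it is non-zero, otherwise last_arrival) can be checked against the numbers quoted in the thread. The sketch below is only an illustration of that arithmetic, assuming nanosecond timestamps as in the example output; `ru_reference_ns()` and `ru_duration_sec()` are hypothetical helper names, not crash functions.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Pick the timestamp the proposal subtracts from rq->clock for a task in
 * the RU state: last_queued while the task waits on the run queue
 * (non-zero), otherwise last_arrival (the task is on the CPU).
 */
static uint64_t ru_reference_ns(uint64_t last_arrival, uint64_t last_queued)
{
	return last_queued != 0 ? last_queued : last_arrival;
}

/* Whole seconds elapsed between the reference timestamp and rq->clock. */
static uint64_t ru_duration_sec(uint64_t rq_clock, uint64_t last_arrival,
				uint64_t last_queued)
{
	uint64_t ref = ru_reference_ns(last_arrival, last_queued);

	assert(rq_clock >= ref);
	return (rq_clock - ref) / NSEC_PER_SEC;
}
```

With rq.clock = 87029229985307234, last_arrival = 67983031958439673 and last_queued = 87029224561119369 from the example, this yields 5 seconds of queue wait, versus 19046198 seconds when only last_arrival is used, matching the figures Tao Liu restates.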

2 weeks, 3 days


Re: [PATCH v2 0/3] Update to gdb-16.2

by lijiang

On Thu, Feb 6, 2025 at 3:05 PM <devel-request(a)lists.crash-utility.osci.io> wrote:

> Date: Thu,  6 Feb 2025 20:04:11 +1300
> From: Tao Liu <ltao(a)redhat.com>
> Subject: [Crash-utility] [PATCH v2 0/3] Update to gdb-16.2
> To: devel(a)lists.crash-utility.osci.io
> Cc: Tao Liu <ltao(a)redhat.com>
> Message-ID: <20250206070418.1038668-1-ltao(a)redhat.com>
> Content-Type: text/plain; charset="US-ASCII"; x-default=true
>
> v2 -> v1: Split the "Fix several build failures" patchset, which is
>           already merged btw, from gdb v16.2 upgrading.
>

Thank you for the update, Tao. I will have a look next week.

Thanks
Lianbo

>
> Tao Liu (3):
>   LoongArch64: Revert all previous LoongArch64 related commits
>   Revert: Fix C99 compatibility issues in embedded copy of GDB
>   Update to gdb-16.2
>
>  Makefile            |    10 +-
>  README              |     6 +-
>  configure.c         |    64 +-
>  crash.8             |     2 +-
>  crash_target.c      |     6 +-
>  defs.h              |   174 +-
>  diskdump.c          |    24 +-
>  gdb-10.2.patch      | 16323 ------------------------------------------
>  gdb-16.2.patch      |  2254 ++++++
>  gdb_interface.c     |    14 +-
>  help.c              |    15 +-
>  kernel.c            |     6 +-
>  lkcd_vmdump_v1.h    |     2 +-
>  lkcd_vmdump_v2_v3.h |     5 +-
>  loongarch64.c       |  1368 ----
>  main.c              |     3 +-
>  netdump.c           |    27 +-
>  ramdump.c           |     2 -
>  symbols.c           |    37 +-
>  19 files changed, 2322 insertions(+), 18020 deletions(-)
>  delete mode 100644 gdb-10.2.patch
>  create mode 100644 gdb-16.2.patch
>  delete mode 100644 loongarch64.c
>
> --
> 2.47.0
>

2 weeks, 3 days


[Crash-utility][PATCH] Fix crash initialization failure on LoongArch with recent GDB versions

by Ming Wang

The crash tool failed to initialize on LoongArch64 when using
GDB 16.2 (and likely other recent GDB versions that have enhanced
LoongArch support) due to the error:
"fatal error: buffer size is not enough to fit register value".

This occurs in supply_registers() because GDB now correctly
reports the size of LoongArch LASX (256-bit) vector registers
(xr0-xr31) as 32 bytes. The `regval` buffer in `crash_target.c`
was previously fixed at 16 bytes.

This patch increases the `regval` buffer size to 32 bytes to
accommodate the largest LoongArch registers reported by GDB.
This allows crash to initialize successfully.

Signed-off-by: Ming Wang <wangming01(a)loongson.cn>
---
 crash_target.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crash_target.c b/crash_target.c
index 5966b7b..d93d58c 100644
--- a/crash_target.c
+++ b/crash_target.c
@@ -71,7 +71,7 @@ public:
 
 static void supply_registers(struct regcache *regcache, int regno)
 {
-  gdb_byte regval[16];
+  gdb_byte regval[32];
   struct gdbarch *arch = regcache->arch ();
   const char *regname = gdbarch_register_name(arch, regno);
   int regsize = register_size(arch, regno);
-- 
2.41.3
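
The failure mode is a plain size check: the register GDB reports is wider than the fixed buffer it has to be copied into. A minimal sketch of such a guard, written in C rather than the C++ of crash_target.c, is shown here; `copy_regval()` and `REGVAL_MAX` are hypothetical names chosen for illustration, and the actual check inside supply_registers() may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REGVAL_MAX 32	/* large enough for a 256-bit LASX register */

/*
 * Copy a register value of `regsize` bytes into `dst`, refusing sizes that
 * do not fit. Returns 0 on success, -1 if the buffer is too small (the
 * situation behind the "buffer size is not enough" error above).
 */
static int copy_regval(unsigned char *dst, size_t dstsize,
		       const unsigned char *src, size_t regsize)
{
	assert(dst != NULL && src != NULL);
	if (regsize > dstsize)
		return -1;
	memcpy(dst, src, regsize);
	return 0;
}
```

A 16-byte destination rejects a 32-byte xr register, while a REGVAL_MAX-sized one accepts it, which is the whole effect of the one-line change.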

2 weeks, 3 days


Re: [PATCH v2] Use CC env var to get compiler version

by lijiang

Hi, Kéléfa Sané

Thank you for the patch.

On Mon, May 26, 2025 at 5:04 PM <devel-request(a)lists.crash-utility.osci.io> wrote:

> Date: Mon, 26 May 2025 11:03:24 +0200
> From: kelefa.sane(a)smile.fr
> Subject: [Crash-utility] [meta-oe][PATCH v2] Use CC env var to get
>         compiler version
> To: devel(a)lists.crash-utility.osci.io
> Cc: Kéléfa Sané <kelefa.sane(a)smile.fr>
> Message-ID: <20250526090324.3113589-1-kelefa.sane(a)smile.fr>
> Content-Type: text/plain; charset=UTF-8
>
> From: Kéléfa Sané <kelefa.sane(a)smile.fr>
>
> The source file build_data.c generated at compilation time defines a
> variable compiler_version which is obtained by calling "gcc --version"
> cmd. This call retrieves the native gcc compiler installed on the host
> build machine but not necessarily the compiler used to build the project
> (ex: cross compilation).
>

Good finding.

>
> The CC env variable commonly used in Makefile projects defines the
> compiler to use at build, so this is the appropriate way to retrieve the
> compiler version, when the CC env var is defined.
>

If the CC env variable is not set, this is still a problem. We should not
expect the CC env variable to always be defined (or set). I would suggest
parsing GDB_CONF_FLAGS, which is visible in configure.c, to get the
target gcc version.

What do you think?

Thanks
Lianbo

> Signed-off-by: Kéléfa Sané <kelefa.sane(a)smile.fr>
> ---
>  configure.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/configure.c b/configure.c
> index 4668c9a..4b65bd7 100644
> --- a/configure.c
> +++ b/configure.c
> @@ -1362,7 +1362,17 @@ make_build_data(char *target)
>
>          fp1 = popen("date", "r");
>          fp2 = popen("id", "r");
> -       fp3 = popen("gcc --version", "r");
> +
> +       const char *cc_env = getenv("CC");
> +       if(NULL == cc_env) {
> +               fp3 = popen("gcc --version", "r");
> +       }
> +       else {
> +               char compiler_version_cmd[512];
> +
> +               snprintf(compiler_version_cmd, sizeof(compiler_version_cmd), "%s --version", cc_env);
> +               fp3 = popen(compiler_version_cmd, "r");
> +       }
>
>          if ((fp4 = fopen("build_data.c", "w")) == NULL) {
>                  perror("build_data.c");
>
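
The command construction in the patch can be isolated into a small helper, which makes the fallback to "gcc" explicit. The following is a sketch under stated assumptions rather than the submitted code: `build_version_cmd()` and `compiler_version_cmd()` are hypothetical names, and treating an empty CC as unset and reporting truncation are additions the patch itself does not make.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<compiler> --version" into `buf`, using `cc` when it is set and
 * non-empty and falling back to "gcc" otherwise.
 * Returns 0 on success, -1 if the command does not fit in `buf`.
 */
static int build_version_cmd(char *buf, size_t size, const char *cc)
{
	int n;

	assert(buf != NULL && size > 0);
	if (cc == NULL || strlen(cc) == 0)
		cc = "gcc";
	n = snprintf(buf, size, "%s --version", cc);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Convenience wrapper that takes the compiler from the CC environment. */
static int compiler_version_cmd(char *buf, size_t size)
{
	return build_version_cmd(buf, size, getenv("CC"));
}
```

The resulting string is what would be handed to popen(); Lianbo's alternative of deriving the compiler from GDB_CONF_FLAGS would replace only the source of `cc`, not this formatting step.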

3 weeks


Re: [PATCH] Fix incorrect task state during exit

by lijiang

On Tue, May 13, 2025 at 1:26 AM <devel-request(a)lists.crash-utility.osci.io> wrote:

> Date: Mon, 12 May 2025 10:22:43 +1200
> From: Tao Liu <ltao(a)redhat.com>
> Subject: [Crash-utility] Re: [PATCH] Fix incorrect task state during
>         exit
> To: Stephen Brennan <stephen.s.brennan(a)oracle.com>
> Cc: devel(a)lists.crash-utility.osci.io
> Message-ID:
>         <CAO7dBbWHa5x+BdiVF3y_QL-JgQhDLgEqKzaQS+YfBrL=jY6N_w(a)mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi Stephen,
>
> Thanks for your fix and detailed explanation!
>
> On Sat, May 3, 2025 at 8:19 AM Stephen Brennan
> <stephen.s.brennan(a)oracle.com> wrote:
> >
> > task_state() assumes that exit_state is an unsigned long, when in
> > reality, it has been declared as an int since 97dc32cdb1b53 ("reduce
> > size of task_struct on 64-bit machines"), in Linux 2.6.22. So on 64-bit
> > machines, task_state() reads 8 bytes rather than 4, and gets the wrong
> > exit_state value by including the next field.
> >
> > This has gone unnoticed because directly after exit_state comes
> > exit_code, which is generally zero while the task is alive. When the
> > exit_code is set, exit_state is usually set not long after. Since
> > task_state_string() only checks whether exit_state bits are set, it
> > never notices the presence of the exit code inside of the state.
> >
> > But this leaves open a window during the process exit, when the
> > exit_code has been set (in do_exit()), but the exit_state has not (in
> > exit_notify()). In this case, crash reports a state of "??", but in
> > reality, the task is still running -- it's just running the exit()
> > system call. This race window can be long enough to be observed in core
> > dumps, for example if the mmput() takes a long time.
> >
> > This should be considered a bug. A task state of "??" or "(unknown)" is
> > frequently of concern when debugging, as it could indicate that the
> > state fields had some sort of corruption, and draw the attention of the
> > debugger. To handle it properly, record the size of exit_state, and read
> > it conditionally as a UINT or ULONG, just like the state. This ensures
> > we retain compatibility with kernels before v2.6.22. Whether that is
> > actually desirable is anybody's guess.
> >
> > Reported-by: Jeffery Yoder <jeffery.yoder(a)oracle.com>
> > Signed-off-by: Stephen Brennan <stephen.s.brennan(a)oracle.com>
> > ---
> >  defs.h |  1 +
> >  task.c | 11 +++++++++--
> >  2 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/defs.h b/defs.h
> > index 4cf169c..58362d0 100644
> > --- a/defs.h
> > +++ b/defs.h
> > @@ -2435,6 +2435,7 @@ struct size_table {         /* stash of commonly-used sizes */
> >         long prb_desc;
> >         long wait_queue_entry;
> >         long task_struct_state;
> > +       long task_struct_exit_state;
> >         long printk_safe_seq_buf_buffer;
> >         long sbitmap_word;
> >         long sbitmap;
>
> The patch looks good to me, except for the above code. Any newly added
>

In addition, it also needs to be dumped; please see "help -o".

Otherwise, for the patch: Ack.

Thanks
Lianbo

> members for size/offset_table should go to the end of the struct,
> rather than the middle of it. See the 2nd item of
> https://github.com/crash-utility/crash/wiki#writing-patches. But this
> is a slight error, I can get it corrected when merging. Other than
> that, Ack for the patch.
>
> Thanks,
> Tao Liu
>
> > diff --git a/task.c b/task.c
> > index 3bafe79..e07b479 100644
> > --- a/task.c
> > +++ b/task.c
> > @@ -306,6 +306,7 @@ task_init(void)
> >                 MEMBER_SIZE_INIT(task_struct_state, "task_struct", "__state");
> >         }
> >         MEMBER_OFFSET_INIT(task_struct_exit_state, "task_struct", "exit_state");
> > +       MEMBER_SIZE_INIT(task_struct_exit_state, "task_struct", "exit_state");
> >         MEMBER_OFFSET_INIT(task_struct_pid, "task_struct", "pid");
> >         MEMBER_OFFSET_INIT(task_struct_comm, "task_struct", "comm");
> >         MEMBER_OFFSET_INIT(task_struct_next_task, "task_struct", "next_task");
> > @@ -5965,8 +5966,14 @@ task_state(ulong task)
> >                 state = ULONG(tt->task_struct + OFFSET(task_struct_state));
> >         else
> >                 state = UINT(tt->task_struct + OFFSET(task_struct_state));
> > -       exit_state = VALID_MEMBER(task_struct_exit_state) ?
> > -               ULONG(tt->task_struct + OFFSET(task_struct_exit_state)) : 0;
> > +
> > +       if (VALID_MEMBER(task_struct_exit_state)
> > +           && SIZE(task_struct_exit_state) == sizeof(ulong))
> > +               exit_state = ULONG(tt->task_struct + OFFSET(task_struct_exit_state));
> > +       else if (VALID_MEMBER(task_struct_exit_state))
> > +               exit_state = UINT(tt->task_struct + OFFSET(task_struct_exit_state));
> > +       else
> > +               exit_state = 0;
> >
> >         return (state | exit_state);
> >  }
> > --
>

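The misread described in the commit message above can be reproduced outside of crash. The following is only a minimal sketch, not code from the patch: struct fake_task and the three helper functions are hypothetical names, and the struct models nothing but the layout the message describes, a 4-byte exit_state directly followed by a 4-byte exit_code. It assumes a host where int is 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the two adjacent task_struct members.
 * Only the layout matters: exit_state directly followed by exit_code.
 */
struct fake_task {
	int exit_state;
	int exit_code;
};

/* Pre-patch behaviour on a 64-bit host: read 8 bytes at exit_state. */
static uint64_t read_exit_state_as_ulong(const unsigned char *buf)
{
	uint64_t v;

	memcpy(&v, buf + offsetof(struct fake_task, exit_state), sizeof(v));
	return v;
}

/* Patched behaviour when the member is 4 bytes wide: read 4 bytes. */
static uint32_t read_exit_state_as_uint(const unsigned char *buf)
{
	uint32_t v;

	memcpy(&v, buf + offsetof(struct fake_task, exit_state), sizeof(v));
	return v;
}

/*
 * Models the window in which exit_code is already set but exit_state
 * is still zero. Returns 1 if the wide read picks up stray bits while
 * the narrow read correctly sees zero.
 */
static int exit_window_misread(int exit_code)
{
	struct fake_task t = { .exit_state = 0, .exit_code = exit_code };
	unsigned char buf[sizeof(t)];

	/* The sketch relies on two adjacent 4-byte members, no padding. */
	assert(sizeof(t) == sizeof(uint64_t));
	memcpy(buf, &t, sizeof(t));

	return read_exit_state_as_ulong(buf) != 0 &&
	       read_exit_state_as_uint(buf) == 0;
}
```

Calling exit_window_misread() with any non-zero exit code returns 1 regardless of byte order, since the 8-byte read always covers exit_code; with an exit code of 0 both reads agree and it returns 0, which matches the explanation of why the bug went unnoticed for live tasks.
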
[PATCH] vmware_guestdump: Version 7 support

by Alexey Makhalov

ESXi 9.0 updated debug.guest format. CPU architecture type was introduced
and several fields of the header not used by the crash were moved around.
It is version 7 now. Make corresponding changes in debug.guest parser and
keep it backward compatible with older versions. Fix comment and log
messages typos as well.

Signed-off-by: Alexey Makhalov <alexey.makhalov(a)broadcom.com>
---
 vmware_guestdump.c | 48 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/vmware_guestdump.c b/vmware_guestdump.c
index 78f37fb..1a6ef9b 100644
--- a/vmware_guestdump.c
+++ b/vmware_guestdump.c
@@ -30,6 +30,7 @@
  * 2. Number of Virtual CPUs (4 bytes) } - struct guestdumpheader
  * 3. Reserved gap
  * 4. Main Memory information - struct mainmeminfo{,_old}
+ * 5. Reserved gap #2. Only in v7+
  * (use get_vcpus_offset() to get total size of guestdumpheader)
  * vcpus_offset: ---------\
  * 1. struct vcpu_state1 \
@@ -111,6 +112,22 @@ struct vcpu_state2 {
 	uint8_t reserved3[65];
 } __attribute__((packed));
 
+typedef enum {
+	CPU_ARCH_AARCH64,
+	CPU_ARCH_X86,
+} cpu_arch;
+
+/*
+ * Returns the size of reserved gap #2 in the header right after the Main Mem.
+ */
+static inline long
+get_gap2_size(uint32_t version)
+{
+	if (version == 7)
+		return 11;
+	return 0;
+}
+
 /*
  * Returns the size of the guest dump header.
  */
@@ -128,6 +145,9 @@ get_vcpus_offset(uint32_t version, int mem_holes)
 		return sizeof(struct guestdumpheader) + 14 + sizeof(struct mainmeminfo);
 	case 6: /* ESXi 8.0u2 */
 		return sizeof(struct guestdumpheader) + 15 + sizeof(struct mainmeminfo);
+	case 7: /* ESXi 9.0 */
+		return sizeof(struct guestdumpheader) + 8 + sizeof(struct mainmeminfo) +
+			get_gap2_size(version);
 	}
 
 	return 0;
@@ -155,10 +175,10 @@ get_vcpu_gapsize(uint32_t version)
  *
  * guestdump (debug.guest) is a simplified version of the *.vmss which does
  * not contain a full VM state, but minimal guest state, such as a memory
- * layout and CPUs state, needed for debugger. is_vmware_guestdump()
+ * layout and CPUs state, needed for the debugger. is_vmware_guestdump()
  * and vmware_guestdump_init() functions parse guestdump header and
  * populate vmss data structure (from vmware_vmss.c). In result, all
- * handlers (except mempry_dump) from vmware_vmss.c can be reused.
+ * handlers (except memory_dump) from vmware_vmss.c can be reused.
  *
  * debug.guest does not have a dedicated header magic or file format signature
  * To probe debug.guest we need to perform series of validations. In addition,
@@ -225,7 +245,8 @@ is_vmware_guestdump(char *filename)
 		/* vcpu_offset adjustment for mem_holes is required only for version 1. */
 		vcpus_offset = get_vcpus_offset(hdr.version, mmi.mem_holes);
 	} else {
-		if (fseek(fp, vcpus_offset - sizeof(struct mainmeminfo), SEEK_SET) == -1) {
+		if (fseek(fp, vcpus_offset - sizeof(struct mainmeminfo) - get_gap2_size(hdr.version),
+			  SEEK_SET) == -1) {
 			if (CRASHDEBUG(1))
 				error(INFO, LOGPRX"Failed to fseek '%s': [Error %d] %s\n",
 					filename, errno, strerror(errno));
@@ -240,6 +261,25 @@ is_vmware_guestdump(char *filename)
 			fclose(fp);
 			return FALSE;
 		}
+
+		/* Check CPU architecture field. Next 4 bytes after the Main Mem */
+		if (hdr.version >= 7) {
+			cpu_arch arch;
+			if (fread(&arch, sizeof(cpu_arch), 1, fp) != 1) {
+				if (CRASHDEBUG(1))
+					error(INFO, LOGPRX"Failed to read '%s' from file '%s': [Error %d] %s\n",
+						"CPU arch", filename, errno, strerror(errno));
+				fclose(fp);
+				return FALSE;
+			}
+			if (arch != CPU_ARCH_X86) {
+				if (CRASHDEBUG(1))
+					error(INFO,
+						LOGPRX"Invalid or unsupported CPU architecture: %d\n", arch);
+				fclose(fp);
+				return FALSE;
+			}
+		}
 	}
 	if (fseek(fp, 0L, SEEK_END) == -1) {
 		if (CRASHDEBUG(1))
@@ -300,7 +340,7 @@ vmware_guestdump_init(char *filename, FILE *ofp)
 
 	if (!machine_type("X86") && !machine_type("X86_64")) {
 		error(INFO,
-			LOGPRX"Invalid or unsupported host architecture for .vmss file: %s\n",
+			LOGPRX"Invalid or unsupported host architecture for .guest file: %s\n",
 			MACHINE_TYPE);
 		result = FALSE;
 		goto exit;
-- 
2.43.5
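
The offset arithmetic that the version 7 change adds can be followed in isolation. The following is a rough sketch under stated assumptions, not the patch's code: gap2_size() mirrors get_gap2_size() from the diff, while mainmem_offset(), cpu_arch_offset() and the mainmem_size parameter (a stand-in for sizeof(struct mainmeminfo)) are hypothetical names introduced only for illustration. The sketch models version 7 only for the CPU architecture field.

```c
#include <assert.h>
#include <stdint.h>

/* Size of reserved gap #2, which per the patch is 11 bytes in version 7. */
static long gap2_size(uint32_t version)
{
	return version == 7 ? 11 : 0;
}

/*
 * File offset at which the Main Memory information starts, given the
 * offset of the vCPU area. mainmem_size is a caller-supplied stand-in
 * for sizeof(struct mainmeminfo).
 */
static long mainmem_offset(uint32_t version, long vcpus_offset, long mainmem_size)
{
	long off = vcpus_offset - mainmem_size - gap2_size(version);

	/* A negative offset would mean the inputs are inconsistent. */
	assert(off >= 0);
	return off;
}

/*
 * File offset of the 4-byte CPU architecture field, which sits right
 * after the Main Memory information, or -1 when the version modelled
 * here has no such field.
 */
static long cpu_arch_offset(uint32_t version, long vcpus_offset)
{
	if (version != 7)
		return -1;
	return vcpus_offset - gap2_size(version);
}
```

With arbitrary illustrative numbers (a vCPU area at offset 100 and a 20-byte main memory block), mainmem_offset() yields 69 for version 7 and 80 for version 6, and cpu_arch_offset() yields 89 for version 7. Real values depend on the actual struct sizes in vmware_guestdump.c.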

[PATCH v3 0/5] gdb multi-stack unwinding support

by Tao Liu

This patchset is based on Alexey's work [1], and is the follow-up of the
previous "gdb stack unwinding support for crash utility" patchset.

Currently the gdb target analyzes only one task at a time, and it backtraces
only a straight stack until the end of that stack. If stacks were
concatenated during exceptions or interrupts, gdb bt will show only the
topmost one.

This patchset introduces multiple-stack support for gdb stack unwinding;
the stacks can be observed as different threads from gdb's perspective. A
short usage is as follows:

'set <PID>' - to switch to a specific task.
'gdb info threads' - to see the list of in-kernel stacks of this task.
'gdb thread <ID>' - to switch to the stack.
'gdb bt' - to unwind it.

E.g., with the patchset:

crash> bt
PID: 17636 TASK: ffff88032e0742c0 CPU: 11 COMMAND: "kworker/11:4"
 #0 [ffff88037fca6b58] machine_kexec at ffffffff8103cef2
 #1 [ffff88037fca6ba8] crash_kexec at ffffffff810c9aa3
 #2 [ffff88037fca6c70] panic at ffffffff815f0444
 ...
 #9 [ffff88037fca6ec8] do_nmi at ffffffff815fd980
#10 [ffff88037fca6ef0] end_repeat_nmi at ffffffff815fcec1
    [exception RIP: memcpy+13]
    RIP: ffffffff812f5b1d RSP: ffff88034f2a9728 RFLAGS: 00010046
    RAX: ffffc900139fe000 RBX: ffff880374b7a1b0 RCX: 0000000000000030
    RBP: ffff88034f2a9778 R8: 000000007fffffff R9: 00000000ffffffff
    ...
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#11 [ffff88034f2a9728] memcpy at ffffffff812f5b1d
#12 [ffff88034f2a9728] mga_dirty_update at ffffffffa024ad2b [mgag200]
#13 [ffff88034f2a9780] mga_imageblit at ffffffffa024ae3f [mgag200]
#14 [ffff88034f2a97a0] bit_putcs at ffffffff813424ef
 ...

crash> info threads
  Id Target Id Frame
* 1 17636 kworker/11:4 (stack 0) crash_setup_regs (oldregs=0x0, newregs=0xffff88037fca6bb0)
  2 17636 kworker/11:4 (stack 1) 0xffffffff812f5b1d in memcpy ()
crash> thread 2
crash> gdb bt
#0 0xffffffff812f5b1d in memcpy () at arch/x86/lib/memcpy_64.S:69
...

There are 2 stacks of the current task, and we can list, switch to, and
unwind each stack.

[1]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg01204.html

v1 -> v2:
1) Rebase this patchset onto gdb-16.2 [2].
2) Improved the silent_call_bt() to catch the error FATAL.

[2]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg01354.html

v2 -> v3:
1) Rebase this patchset to crash v9.0.0.
2) Fix v2's segfault in cmd "bt -E".
3) Eliminate repeated stacks by adding constraints before gdb_add_substack().

Tao Liu (5):
  Add multi-threads support in crash target
  Call cmd_bt silently after "set pid"
  x86_64: Add gdb multi-stack unwind support
  arm64: Add gdb multi-stack unwind support
  ppc64: Add gdb multi-stack unwind support

 arm64.c         | 102 +++++++++++++++++++++++++++++++++--
 crash_target.c  |  49 +++++++++++++++--
 defs.h          |   3 +-
 gdb_interface.c |   6 +--
 kernel.c        |  43 +++++++++++++++
 ppc64.c         |  78 ++++++++++++++++++++++++---
 task.c          |   4 +-
 x86_64.c        | 138 +++++++++++++++++++++++++++++++++++++++++++++---
 8 files changed, 393 insertions(+), 30 deletions(-)

-- 
2.47.0
