Hi Dave,
Thanks for your review. I appreciate your valuable suggestions on my patch.
I have a different view on the definition of the maximum log buffer length.
In the upstream kernel, the maximum log buffer length is still determined by the
arch-specific CONFIG_LOG_BUF_SHIFT definition.
[kernel/printk/printk.c]
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
LOG_BUF_LEN_MAX comes from commit e6fe3e5b7d16e8f146a4ae7fe481bc6e97acde1e, which gives
an error on an attempt to set the log buffer length to over 2G.
As for the question of where I got the MAX_BUFSIZE of 2MB:
I'm working on QCOM's ARM64 arch. In QCOM's kernel-4.14 code, the maximum log buffer
length is set to 2MB.
[arch/arm64/configs/xxx_config]
CONFIG_LOG_BUF_SHIFT=21
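For reference, the arithmetic behind that value can be sketched as below. This only mirrors the (1 << CONFIG_LOG_BUF_SHIFT) definition quoted above; log_buf_len_from_shift() is a hypothetical helper used for illustration and is not a kernel or crash function.

```c
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): size in bytes of the static
 * log buffer for a given CONFIG_LOG_BUF_SHIFT value, i.e. 1 << shift.
 */
static size_t log_buf_len_from_shift(unsigned int shift)
{
	return (size_t)1 << shift;
}
```

With a shift of 21 this gives 2097152 bytes, which is the 2MB figure used for MAX_BUFSIZE.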
Following your suggestions, I have corrected the logic error and made some significant
changes to my patch.
The new patch file is attached.
Thanks for your review. I’m looking forward to your favourable reply!
Best regards,
Qiwu
-----Original Message-----
From: Dave Anderson <anderson(a)redhat.com>
Sent: Thursday, October 17, 2019 4:33 AM
To: 陈启武 <chenqiwu(a)xiaomi.com>
Subject: [External Mail]Re: [PATCH] optimize the way to find the panic task.
Hi Qiwu,
I tested your patch against several ARM64 dumpfiles that I have on hand, and a couple of
them that were created with "virsh dump" generated segmentation violations
like this:
...
please wait... (determining panic task)
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6bba0b0 in __strstr_sse42 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.6.x86_64
libgcc-4.8.5-36.el7_6.2.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64 lzo-2.06-8.el7.x86_64
ncurses-libs-5.9-14.20130511.el7_4.x86_64 snappy-1.1.0-3.el7.x86_64
zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007ffff6bba0b0 in __strstr_sse42 () from /lib64/libc.so.6
#1 0x00000000004b6389 in get_log_panic_task () at task.c:7485
#2 0x00000000004ccefe in panic_search () at task.c:7361
#3 get_panic_context () at task.c:6205
#4 task_init () at task.c:642
#5 0x0000000000460be5 in main_loop () at main.c:774
#6 0x0000000000659383 in captured_command_loop (data=data@entry=0x0) at main.c:258
#7 0x00000000006580aa in catch_errors (func=func@entry=0x659370
<captured_command_loop>, func_args=func_args@entry=0x0,
errstring=errstring@entry=0x890f87 "", mask=mask@entry=6) at
exceptions.c:557
#8 0x000000000065a316 in captured_main (data=data@entry=0x7fffffffdb30) at main.c:1064
#9 0x00000000006580aa in catch_errors (func=func@entry=0x659650 <captured_main>,
func_args=func_args@entry=0x7fffffffdb30,
errstring=errstring@entry=0x890f87 "", mask=mask@entry=6) at
exceptions.c:557
#10 0x000000000065a677 in gdb_main (args=0x7fffffffdb30) at main.c:1079
#11 gdb_main_entry (argc=<optimized out>, argv=argv@entry=0x7fffffffdc98) at
main.c:1099
#12 0x00000000004eeab4 in gdb_main_loop (argc=<optimized out>, argc@entry=3,
argv=argv@entry=0x7fffffffdc98) at gdb_interface.c:76
#13 0x000000000045f03a in main (argc=3, argv=0x7fffffffdc98) at main.c:707
(gdb)
The SIGSEGV is the strstr() call in get_log_panic_task(), where the "buf"
pointer must be OK, but for some reason "i" is not available in the gdb session.
So I added the following debug line:
7479 BZERO(buf, MAX_BUFSIZE);
7480 open_tmpfile();
7481 dump_log(SHOW_LOG_TEXT);
7482 rewind(pc->tmpfile);
7483 if (fread(buf, 1, MAX_BUFSIZE, pc->tmpfile)) {
7484 while (panic_keywords[i++]) {
7485 fprintf(stderr, "[%d][%s]\n", i, panic_keywords[i]);
7486 if ((p1 = strstr(buf, panic_keywords[i]))) {
7487 if ((p1 = strstr(p1, "CPU: "))) {
7488 p1 += strlen("CPU: ");
7489 p2 = p1;
7490
and as expected, it runs off the end of the panic_keywords[] array:
...
please wait... (determining panic task)[1][BUG: unable to handle kernel]
[2][Kernel BUG at]
[3][kernel BUG at]
[4][Bad mode in]
[5][Oops]
[6][Kernel panic]
[7][(null)]
Segmentation fault (core dumped)
$
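The overrun comes from incrementing "i" inside the while() condition, so the loop body always sees the entry after the one that was just tested, and the last pass hands the terminating NULL to strstr(). A minimal sketch of a loop that tests, uses, and then advances the same index is shown here. It assumes the array is NULL-terminated, and first_panic_keyword() is a hypothetical name, not a function in crash.

```c
#include <stddef.h>
#include <string.h>

/* NULL-terminated so the loop below has a well-defined stopping point. */
static const char *panic_keywords[] = {
	"Unable to handle kernel",
	"BUG: unable to handle kernel",
	"Kernel BUG at",
	"kernel BUG at",
	"Bad mode in",
	"Oops",
	"Kernel panic",
	NULL
};

/*
 * Hypothetical helper (illustration only): return the index of the first
 * keyword, in array order, that occurs in text, or -1 if none occurs.
 */
static int first_panic_keyword(const char *text)
{
	int i;

	/* Entry i is tested and used before i is advanced. */
	for (i = 0; panic_keywords[i]; i++) {
		if (strstr(text, panic_keywords[i]))
			return i;
	}
	return -1;
}
```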
But anyway, aside from the logic error above, a couple other comments:
(1) I do not want to change the order in which the panic task
search is made -- it should still try "foreach bt" first,
and only if that fails, search the log.
(2) The upstream kernel has a LOG_BUF_LEN_MAX that is 2GB,
so I'm not sure where you got the MAX_BUFSIZE of 2MB?
(3) But regardless of the log buffer size, I don't like the idea
of reading the whole log into a buffer. It's already captured
into a temporary file that can be searched, so why bother copying
it into another buffer?
I would suggest using "while (fgets(buf, BUFSIZE, pc->tmpfile))"
instead. BUFSIZE should be large enough to contain any line in the log buffer, or
certainly any line that contains one of the panic_keywords[] strings.
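A rough sketch of that line-by-line approach, written against a plain stdio FILE pointer rather than pc->tmpfile so that it stands alone. find_keyword_line() and LINE_BUFSIZE are hypothetical names for illustration, and the 1500-byte line bound is an assumption rather than a value taken from the patch.

```c
#include <stdio.h>
#include <string.h>

#define LINE_BUFSIZE 1500	/* assumed per-line bound, for illustration */

/*
 * Hypothetical helper (illustration only): read fp one line at a time.
 * Return 1 and leave the matching line in line[] when a line containing
 * any of the keywords is found; return 0 if no line matches.
 * keywords[] must be NULL-terminated.
 */
static int find_keyword_line(FILE *fp, const char *const *keywords,
			     char *line, int len)
{
	int i;

	rewind(fp);
	while (fgets(line, len, fp)) {
		for (i = 0; keywords[i]; i++) {
			if (strstr(line, keywords[i]))
				return 1;
		}
	}
	return 0;
}
```

Note that this returns the first matching line in log order, not the match for the first keyword in array order, so only one line of the log is held in memory at a time.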
Also, can you please post any patches to the crash-utility mailing list instead of
emailing me directly?
Thanks,
Dave
----- Original Message -----
Hi Dave,
I'm working on arm64 kdump with crash-7.2.7, and a "panic task not found"
warning message is generated, as shown below:
please wait... (determining panic task)
KERNEL: vmlinux
DUMPFILES: /var/tmp/ramdump_elf_8mA3xU [temporary ELF header]
DDRCS0_0.BIN
DDRCS1_0.BIN
DDRCS1_1.BIN
CPUS: 8
DATE: Sat Feb 6 10:11:39 1971
UPTIME: 00:00:07
LOAD AVERAGE: 0.64, 0.13, 0.04
TASKS: 624
NODENAME: localhost
RELEASE: 4.4.184-perf-gdaa9cd595d7e-dirty
VERSION: #1 SMP PREEMPT Thu Aug 22 14:41:16 CST 2019
MACHINE: aarch64 (unknown Mhz)
MEMORY: 5.7 GB
PANIC: "Unable to handle kernel paging request at virtual address
ffffffd532a1b2f8"
PID: 0
COMMAND: "swapper/0"
TASK: ffffff803ec15390 (1 of 8) [THREAD_INFO: ffffff803ec15390]
CPU: 0
STATE: TASK_RUNNING
WARNING: panic task not found
The panic task cannot be found from the following backtrace, which results in
the wrong running task information being shown in the overview above:
[ 7.630611] Process swapper/4 (pid: 0, stack limit = 0xffffffd536704000)
[ 7.630614] Call trace:
[ 7.630661] [<ffffffd532a1b2f8>] 0xffffffd532a1b2f8
[ 7.630666] [<ffffff803c71d92c>] run_timer_softirq+0x508/0x554
[ 7.630671] [<ffffff803c6835e4>] __do_softirq+0x1fc/0x3e4
[ 7.630676] [<ffffff803c6aa0f0>] irq_exit+0x88/0xd0
[ 7.630681] [<ffffff803c70b330>] __handle_domain_irq+0x8c/0xac
[ 7.630685] [<ffffff803c681154>] gic_handle_irq+0xc8/0x190
So I am introducing this patch to improve the way the panic task is found.
We can find the panic task by searching the kernel log for arch-specific
panic keywords.
I define some arch-specific panic keywords in a const array, in the order
in which a panic prints them:
const char* panic_keywords[] = {
"Unable to handle kernel",
"BUG: unable to handle kernel",
"Kernel BUG at",
"kernel BUG at",
"Bad mode in",
"Oops",
"Kernel panic"
};
We can search the kernel log for these panic keywords in order.
Generally, these panic keywords are followed by the stack trace
info of the panic. Arch-specific dump_stack() implementations can use
the dump_stack_print_info() function to print out the same generic debug
info. So we can determine the panic task by finding the first
"CPU: " string after the panic keyword that was found.
The patch file is attached.
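The lookup described above can be sketched roughly as follows. This is a simplified illustration rather than the code in the attached patch: cpu_after_keyword() is a hypothetical name, and it assumes the CPU number is printed as a decimal value directly after "CPU: ".

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): return the CPU number printed
 * after the first "CPU: " that follows keyword in text, or -1 if the
 * keyword, the "CPU: " marker, or the number is missing.
 */
static long cpu_after_keyword(const char *text, const char *keyword)
{
	const char *p;
	char *end;
	long cpu;

	if (!(p = strstr(text, keyword)))
		return -1;
	if (!(p = strstr(p, "CPU: ")))
		return -1;
	p += strlen("CPU: ");
	cpu = strtol(p, &end, 10);
	/* end == p means no digits were consumed. */
	return (end == p) ? -1 : cpu;
}
```

The returned CPU number can then be matched against the active task on that CPU to pick the panic task.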
Thanks for your review. I’m looking forward to your favourable reply!
Best regards,
Qiwu
#/******This e-mail and its attachments contain confidential information from
XIAOMI, which is intended only for the person or entity whose address
is listed above. Any use of the information contained herein in any
way (including, but not limited to, total or partial disclosure,
reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error,
please notify the sender by phone or email immediately and delete
it!******/#