Badari Pulavarty wrote:
On Thu, 2005-10-27 at 14:36 -0400, Dave Anderson wrote:
>
>
> #ifdef X86_64
> #define _64BIT_
> #define MACHINE_TYPE "X86_64"
>
> #define USERSPACE_TOP 0x0000008000000000
> #define __START_KERNEL_map 0xffffffff80000000
> #define PAGE_OFFSET 0x0000010000000000
>
> #define VMALLOC_START 0xffffff0000000000
> #define VMALLOC_END 0xffffff7fffffffff
> #define MODULES_VADDR 0xffffffffa0000000
> #define MODULES_END 0xffffffffafffffff
> #define MODULES_LEN (MODULES_END - MODULES_VADDR)
>
> So I believe the place to start would be to make these
> values into x86_64-specific variables that get initialized
> early on based upon the symbol values gathered during
> symtab_init(), which is called by main(). After it
> completes, machdep_init(PRE_GDB) is called, i.e. x86_64_init():
>
> /*
> * Initialize various subsystems.
> */
> fd_init();
> buf_init();
> cmdline_init();
> mem_init();
> machdep_init(PRE_SYMTAB);
> symtab_init();
> machdep_init(PRE_GDB);
> kernel_init(PRE_GDB);
> verify_version();
> datatype_init();
>
> In x86_64_init(PRE_GDB), the former hardwired #defines would need
> to be variables, initialized properly based upon clues in the symbol
> list.
>
> Interested in taking a look into this?
>
> Dave
Well, I took a stab at it. Here are the changes I made to "defs.h"
after looking at Documentation/x86_64/mm.txt. We need to somehow put
this under "#if THIS_KERNEL_VERSION > 2.6.10".
First off -- thanks very much for all you've done so far. I
really appreciate the effort.
Anyway, what I meant was that -- for x86_64 specifically -- things
like USERSPACE_TOP, PAGE_OFFSET, VMALLOC_START, etc. should no longer
be hardwired #defines, but instead, they should be references to
x86_64 data variables defined in x86_64.c. So, for example,
USERSPACE_TOP would be defined something like:
#define USERSPACE_TOP (x86_64_userspace_top)
and there would be one x86_64_xxx variable per virtual address
item. And each of the variables would be initialized in
machdep_init(PRE_GDB), which is called just after symtab_init().
The fact that symtab_init() has been done is important, because
symbol values are available at that point even though the variables
behind "THIS_KERNEL_VERSION" haven't been initialized yet. So instead
of checking the kernel version, I would look at the symbol_value()
of "_stext", or some known kernel text symbol, and based upon its
value, it would be obvious whether to use the "old" or "new"
virtual address values to then set up each of the x86_64_xxxx
virtual address values.
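As a rough illustration of the idea above, here is a minimal, standalone sketch. It is not crash code: the x86_64_layout struct, the x86_64_vaddr_init() name and the use_new flag are all hypothetical, and the step that derives use_new from symbol_value("_stext") is deliberately left out. The "old" values are the ones currently hardwired in defs.h, and the "new" values are the ones from the test patch in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* One variable per virtual address item (variable names are hypothetical). */
static uint64_t x86_64_userspace_top;
static uint64_t x86_64_page_offset;
static uint64_t x86_64_vmalloc_start;
static uint64_t x86_64_vmalloc_end;
static uint64_t x86_64_modules_vaddr;
static uint64_t x86_64_modules_end;

/* The former hardwired #defines become references to the variables. */
#define USERSPACE_TOP   (x86_64_userspace_top)
#define PAGE_OFFSET     (x86_64_page_offset)
#define VMALLOC_START   (x86_64_vmalloc_start)
#define VMALLOC_END     (x86_64_vmalloc_end)
#define MODULES_VADDR   (x86_64_modules_vaddr)
#define MODULES_END     (x86_64_modules_end)
#define MODULES_LEN     (MODULES_END - MODULES_VADDR)

/* Hypothetical container for one complete set of addresses. */
struct x86_64_layout {
    uint64_t userspace_top, page_offset;
    uint64_t vmalloc_start, vmalloc_end;
    uint64_t modules_vaddr, modules_end;
};

/* Values currently hardwired in defs.h. */
static const struct x86_64_layout old_layout = {
    0x0000008000000000ULL, 0x0000010000000000ULL,
    0xffffff0000000000ULL, 0xffffff7fffffffffULL,
    0xffffffffa0000000ULL, 0xffffffffafffffffULL,
};

/* Values from the test patch in this thread. */
static const struct x86_64_layout new_layout = {
    0x0000800000000000ULL, 0xffff810000000000ULL,
    0xffffc20000000000ULL, 0xffffe1ffffffffffULL,
    0xffffffff88000000ULL, 0xfffffffffff00000ULL,
};

/*
 * Would be called from x86_64_init(PRE_GDB), after symtab_init().
 * How use_new is decided from a kernel text symbol is an open
 * question and is not shown here.
 */
void x86_64_vaddr_init(int use_new)
{
    const struct x86_64_layout *l = use_new ? &new_layout : &old_layout;

    /* Sanity check so that MODULES_LEN cannot wrap around. */
    assert(l->modules_end > l->modules_vaddr);

    x86_64_userspace_top = l->userspace_top;
    x86_64_page_offset   = l->page_offset;
    x86_64_vmalloc_start = l->vmalloc_start;
    x86_64_vmalloc_end   = l->vmalloc_end;
    x86_64_modules_vaddr = l->modules_vaddr;
    x86_64_modules_end   = l->modules_end;
}
```

With something like this in place, existing users of PAGE_OFFSET, VMALLOC_START and so on would not need to change, since the macro names stay the same and only their expansion differs.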
But for testing the new addresses, what you've done below should
suffice.
--- defs.h.org 2005-10-28 13:43:11.000000000 -0700
+++ defs.h 2005-10-28 13:53:58.000000000 -0700
@@ -1740,14 +1740,14 @@ struct load_module {
#define _64BIT_
#define MACHINE_TYPE "X86_64"
-#define USERSPACE_TOP 0x0000008000000000
+#define USERSPACE_TOP 0x0000800000000000
#define __START_KERNEL_map 0xffffffff80000000
-#define PAGE_OFFSET 0x0000010000000000
+#define PAGE_OFFSET 0xffff810000000000
-#define VMALLOC_START 0xffffff0000000000
-#define VMALLOC_END 0xffffff7fffffffff
-#define MODULES_VADDR 0xffffffffa0000000
-#define MODULES_END 0xffffffffafffffff
+#define VMALLOC_START 0xffffc20000000000
+#define VMALLOC_END 0xffffe1ffffffffff
+#define MODULES_VADDR 0xffffffff88000000
+#define MODULES_END 0xfffffffffff00000
#define MODULES_LEN (MODULES_END - MODULES_VADDR)
#define PTOV(X) ((unsigned long)(X)+(machdep->kvbase))
Even with these changes, I am not sure crash is running
correctly. It doesn't seem to show any useful stacks, and there is a
warning on start (about exception stacks).
I'm wondering whether the per-cpu calculations are being
done correctly? The exception stack addresses come from the
same per-cpu tss_struct code that started this whole mess,
and if the per-cpu address calculations needed to find those data
structures were incorrect, it would lead to the exception stack
error message that you're seeing. This is the old code, but if
the readmem() of the 7 ebase addresses below came from the
wrong place, the error message you're seeing would result:
        } else if (symbol_exists("per_cpu__init_tss")) {
                for (c = 0; c < NR_CPUS; c++) {
                        if ((kt->flags & SMP) && (kt->flags & PER_CPU_OFF)) {
                                if (kt->__per_cpu_offset[c] == 0)
                                        break;
                                vaddr = symbol_value("per_cpu__init_tss") +
                                        kt->__per_cpu_offset[c];
                        } else
                                vaddr = symbol_value("per_cpu__init_tss");
                        vaddr += OFFSET(tss_struct_ist);
                        readmem(vaddr, KVADDR, &ms->stkinfo.ebase[c][0],
                                sizeof(ulong) * 7, "tss_struct ist array",
                                FAULT_ON_ERROR);
                        if (ms->stkinfo.ebase[c][0] == 0)
                                break;
                }
        }
The warning only checks the contents of cpu 0's array
of exception stack addresses, the first of which should be a
pointer to the "boot_exception_stacks" array in the kernel.
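To make that check concrete, here is a small standalone sketch of the comparison as described above. It is an illustration only, not the actual crash source: the function name check_cpu0_first_ist() and the NR_IST constant are made up, and treating the check as a plain equality test is an assumption based on the description, with the message text modeled on the warning quoted in this thread.

```c
#include <stdint.h>
#include <stdio.h>

/* Number of exception stack addresses read per cpu in the code above. */
#define NR_IST 7

/*
 * Hypothetical helper: compare cpu 0's first exception stack address
 * with the value of the "boot_exception_stacks" symbol. Returns 1 if
 * they match; otherwise prints a warning and returns 0.
 */
int check_cpu0_first_ist(const uint64_t ebase0[NR_IST],
                         uint64_t boot_exception_stacks)
{
    if (ebase0[0] == boot_exception_stacks)
        return 1;

    fprintf(stderr,
            "WARNING: cpu 0 first exception stack: %llx\n"
            "         boot_exception_stacks: %llx\n",
            (unsigned long long)ebase0[0],
            (unsigned long long)boot_exception_stacks);
    return 0;
}
```

A first entry that does not match the symbol value suggests that the readmem() was pointed at the wrong per-cpu address.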
[root@localhost crash-4.0-2.8]# ./crash
crash 4.0-2.8
Copyright (C) 2002, 2003, 2004, 2005 Red Hat, Inc.
Copyright (C) 2004, 2005 IBM Corporation
Copyright (C) 1999-2005 Hewlett-Packard Co
Copyright (C) 1999, 2002 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: cpu 0 first exception stack: cccccccccccccccc
boot_exception_stacks: ffffffff8052ce80
KERNEL: /usr/src/linux-2.6.14-rc5-madv/vmlinux
DUMPFILE: /dev/mem
CPUS: 2
DATE: Fri Oct 28 13:58:50 2005
UPTIME: 06:32:12
LOAD AVERAGE: 0.11, 0.10, 0.06
TASKS: 66
NODENAME: localhost.localdomain
RELEASE: 2.6.14-rc5
VERSION: #10 SMP Wed Oct 26 15:58:51 PDT 2005
MACHINE: x86_64 (3000 Mhz)
MEMORY: 4.6 GB
PID: 1460
COMMAND: "crash"
TASK: ffff810122c9f0c0 [THREAD_INFO: ffff810113442000]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)
crash>
crash> bt 13939
PID: 13939 TASK: ffff810119123740 CPU: 0 COMMAND: "vi"
#0 [ffff810114535c78] schedule at ffffffff803b12b3
RIP: 000000377c7beb95 RSP: 00007ffffff402d8 RFLAGS: 00010246
RAX: 0000000000000017 RBX: ffffffff8010dc26 RCX: 00007ffffff40388
RDX: 0000000000000000 RSI: 00007ffffff400a0 RDI: 0000000000000001
RBP: 0000000000000000 R8: 0000000000000000 R9: 00007ffffff40020
R10: 00007ffffff40020 R11: 0000000000000246 R12: 000000000058b0e0
R13: 000000000058b0e0 R14: 0000000000000058 R15: 0000000000000001
ORIG_RAX: 0000000000000017 CS: 0033 SS: 002b
It shows only "schedule" for every process and doesn't seem to show
any further stack frames.
I don't really have any suggestions here, other than to determine
why the x86_64_low_budget_back_trace_cmd() section that walks the
process stack is only finding/printing the schedule() line.
Does "bt -t" work?
I note that this one doesn't show the "cannot access vmalloc space"
message. Can you read vmalloc and user space addresses? Does "mod"
work? How about "runq", which is one of the places that depends
upon being able to read per-cpu data?
Thanks (and off for the weekend...),
Dave
Thanks,
Badari