Re: [Crash-utility] [PATCH/RFC] Fix relocation address
by Dave Anderson
----- "Simon Kagstrom" <simon.kagstrom(a)netinsight.net> wrote:
> On Thu, 17 Dec 2009 11:17:56 -0500 (EST)
> Dave Anderson <anderson(a)redhat.com> wrote:
>
> > > > So I started looking into the code and found something which looks like
> > > > a typo in relocate() (patch below). Changing this makes crash work for me.
> > >
> > > Actually it's not a typo -- your patch would presumably break with all kernels
> > > that have a CONFIG_PHYSICAL_START greater than CONFIG_PHYSICAL_ALIGN, which
> > > is what the patch was written to handle.
> > >
> > > What are your kernel's CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN
> > > values? Does crash work with your kernel on the live system?
>
> You are right. I had problems with getting things working, so I've
> played around with various settings. I had CONFIG_PHYSICAL_START set to
> 0 and CONFIG_PHYSICAL_ALIGN set to 0x100000. Setting these to e.g.,
> 0x100000 and 0x100000 unbreaks things again.
>
> I don't need to supply --reloc either then, not sure what I did wrong
> before. I'm sticking with sane settings from now on.
>
> > > Anyway, I believe that the fix would require support for supplying a
> > > negative --reloc value.
> >
> > On the other hand, if the config values were the other way around, the
> > problem didn't use to show up -- at least according to list item "1)"
> > below in the changelog:
> >
> > 1) Configure the kernel with CONFIG_PHYSICAL_START less than
> > or equal to CONFIG_PHYSICAL_ALIGN. Having done that, there
> > is no problem; the resultant vmlinux file will be loaded at
> > the address for which it was compiled, which has always
> > been the case.
>
> > I wonder if you can use the unpatched crash, but supply a --reloc value that
> > will cause a wrap-around to the correct value?
>
> Well, I suppose that would work if it was possible to supply a negative
> --reloc value, but I'm not sure it's really worth it. What would be
> nice would be to get a more descriptive error message.
Yeah, the problem is that the "do not match" errors can result from
a multitude of error scenarios. Usually, entering "-d <number>" on
the command line (the higher the debug number, the more verbose the
output) makes the issue generating the failure evident.
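For example:

    ./crash -d 8 vmlinux vmcore

and then look at the last few lines of debug output emitted before
the failure message.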
>
> Thanks for the help, please ignore the patch.
OK for now -- and thanks for posting. It's only a matter of time before
somebody else runs into the same thing.
Thanks,
Dave
>
> // Simon
Re: [Crash-utility] [PATCH/RFC] Fix relocation address
by Dave Anderson
----- "Simon Kagstrom" <simon.kagstrom(a)netinsight.net> wrote:
> Hi!
>
> I'm having problems getting kdumps from my relocatable kernel (2.6.31-8)
> working with crash on an IA-32 board. I use makedumpfile to generate a
> compressed dump, and when I try to load it with crash I get
>
> ./crash vmlinux vmcore
> crash: invalid kernel virtual address: 98 type: "present"
> WARNING: cannot read cpu_present_map
> crash: invalid kernel virtual address: 908bd975 type: "online"
> WARNING: cannot read cpu_online_map
> crash: cannot determine base kernel version
> crash: vmlinux and vmcore do not match!
>
> specifying --reloc also fails:
>
> ./crash vmlinux vmcore --reloc=0x100000
> crash: seek error: kernel virtual address: c01a2108 type:
> "cpu_possible_mask"
>
>
> So I started looking into the code and found something which looks like
> a typo in relocate() (patch below). Changing this makes crash work for me.
Actually it's not a typo -- your patch would presumably break with all kernels
that have a CONFIG_PHYSICAL_START greater than CONFIG_PHYSICAL_ALIGN, which
is what the patch was written to handle.
What are your kernel's CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN
values? Does crash work with your kernel on the live system?
Anyway, I believe that the fix would require support for supplying a
negative --reloc value.
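To illustrate with made-up numbers (just the arithmetic, not the
actual code beyond the one-liner quoted below):

    /*
     * Hypothetical values: a kernel compiled to run at physical
     * 0x400000 (CONFIG_PHYSICAL_START) but loaded at 0x200000
     * (rounded down to CONFIG_PHYSICAL_ALIGN):
     *
     *     kt->relocate = 0x400000 - 0x200000 = 0x200000
     *     runtime address = symval - kt->relocate
     *
     * A kernel loaded *above* its compiled-in address would need
     * kt->relocate to go negative, which the current --reloc
     * argument cannot express.
     */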
>
> Great tool by the way, leaves you longing for the next kernel panic
> ;-)
>
> // Simon
>
> --- orig-crash-4.1.2/symbols.c 2009-12-09 21:37:40.000000000 +0100
> +++ crash-4.1.2/symbols.c 2009-12-17 16:03:24.000000000 +0100
> @@ -671,7 +671,7 @@ relocate(ulong symval, char *symname, in
> break;
> }
>
> - return (symval - kt->relocate);
> + return (symval + kt->relocate);
> }
>
> /*
[PATCH/RFC] Fix relocation address
by Simon Kagstrom
Hi!
I'm having problems getting kdumps from my relocatable kernel (2.6.31-8)
working with crash on an IA-32 board. I use makedumpfile to generate a
compressed dump, and when I try to load it with crash I get
./crash vmlinux vmcore
crash: invalid kernel virtual address: 98 type: "present"
WARNING: cannot read cpu_present_map
crash: invalid kernel virtual address: 908bd975 type: "online"
WARNING: cannot read cpu_online_map
crash: cannot determine base kernel version
crash: vmlinux and vmcore do not match!
specifying --reloc also fails:
./crash vmlinux vmcore --reloc=0x100000
crash: seek error: kernel virtual address: c01a2108 type: "cpu_possible_mask"
So I started looking into the code and found something which looks like
a typo in relocate() (patch below). Changing this makes crash work for
me.
Great tool by the way, leaves you longing for the next kernel panic ;-)
// Simon
--- orig-crash-4.1.2/symbols.c 2009-12-09 21:37:40.000000000 +0100
+++ crash-4.1.2/symbols.c 2009-12-17 16:03:24.000000000 +0100
@@ -671,7 +671,7 @@ relocate(ulong symval, char *symname, in
break;
}
- return (symval - kt->relocate);
+ return (symval + kt->relocate);
}
/*
Re: Request for ppc64 help from IBM
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> Somewhere between the RHEL5 (2.6.18-based) and RHEL6 timeframe,
> the ppc64 architecture has started using a virtual memmap scheme
> for the arrays of page structures used to describe/handle
> each physical page of memory.
... [ snip ] ...
> So my speculation (guess?) is that the ppc64.c ppc64_vtop()
> function needs updating to properly translate these addresses.
>
> Since the ppc64 stuff in the crash utility was written by, and
> has been maintained by IBM (and since I am ppc64-challenged),
> can you guys take a look at what needs to be done?
[ sound of crickets... ]
Well, that request apparently fell on deaf ears...
Here's my understanding of the situation.
In 2.6.26 the ppc64 architecture started using a new kernel virtual
memory region to map the kernel's page structure array(s), so that
now there are three kernel virtual memory regions:
KERNEL 0xc000000000000000
VMALLOC 0xd000000000000000
VMEMMAP 0xf000000000000000
The KERNEL region is the unity-mapped region, where the underlying
physical address can be determined by manipulating the virtual address
itself.
The VMALLOC region requires a page-table walk-through to find
the underlying physical address in a PTE.
The new VMEMMAP region is mapped in ppc64 firmware, where a
physical address of a given size is mapped to a VMEMMAP virtual
address. So, for example, the page structure for physical page 0
is at VMEMMAP address 0xf000000000000000, the page structure for
physical page 1 is at 0xf000000000000068, and so on. Once mapped
in the firmware TLB (?), the virtual-to-physical translation is done
automatically while running in kernel mode.
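For illustration, here are the two translations that *are*
straightforward (sizeof(struct page) is 0x68 bytes on this
particular kernel):

    /* KERNEL (unity-mapped) region: pure arithmetic */
    phys = vaddr & ~0xc000000000000000UL;

    /* VMEMMAP region: the page structure for a given pfn lives
     * at a fixed, computable virtual address... */
    page = 0xf000000000000000UL + (pfn * 0x68);

    /* ...but the physical address backing that virtual address
     * is known only to the firmware hash table, which is the
     * problem described next. */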
The problem is that the physical-to-vmemmap address/size mapping
information is not stored in the kernel proper, so there is
no way for the crash utility to make the translation. That
being the case, any crash command that needs to read the contents
of any page structure will fail.
The kernel mapping is performed here in 2.6.26 through 2.6.31:
int __meminit vmemmap_populate(struct page *start_page,
                               unsigned long nr_pages, int node)
{
        unsigned long start = (unsigned long)start_page;
        unsigned long end = (unsigned long)(start_page + nr_pages);
        unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;

        /* Align to the page size of the linear mapping. */
        start = _ALIGN_DOWN(start, page_size);

        for (; start < end; start += page_size) {
                int mapped;
                void *p;

                if (vmemmap_populated(start, page_size))
                        continue;

                p = vmemmap_alloc_block(page_size, node);
                if (!p)
                        return -ENOMEM;

                pr_debug("vmemmap %08lx allocated at %p, physical %08lx.\n",
                         start, p, __pa(p));

                mapped = htab_bolt_mapping(start, start + page_size, __pa(p),
                                           pgprot_val(PAGE_KERNEL),
                                           mmu_vmemmap_psize, mmu_kernel_ssize);
                BUG_ON(mapped < 0);
        }

        return 0;
}
So if the pr_debug() statement is turned on, it shows on my test system:
vmemmap f000000000000000 allocated at c000000003000000, physical 03000000
This would make for an extremely simple virtual-to-physical translation
for the crash utility, but note that neither the unity-mapped virtual address
of 0xc000000003000000 nor its associated physical address of 0x3000000 is
stored anywhere, since "p" is a stack variable. The htab_bolt_mapping()
function does not store the mapping information in the kernel either, it
just uses temporary stack variables before calling the ppc_md.hpte_insert()
function which eventually leads to a machine-dependent (directly to firmware)
function.
So unless I'm missing something, nowhere along the vmemmap call-chain
are the VTOP address/size particulars stored -- say, for example, in a
/proc/iomem-like "resource" data structure.
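Something as simple as the following -- purely hypothetical, no such
structure exists in these kernels -- recorded on each iteration of
vmemmap_populate() would be enough for the crash utility to do the
translation:

    struct vmemmap_map {                    /* hypothetical */
            struct vmemmap_map *next;
            unsigned long virt;             /* vmemmap virtual start */
            unsigned long phys;             /* backing physical address */
            unsigned long size;             /* size of this mapping */
    };

    /*
     * crash-side lookup, sketched as if the list were local;
     * in practice each node would be read with readmem():
     */
    ulong
    vmemmap_vtop(struct vmemmap_map *list, ulong vaddr)
    {
            struct vmemmap_map *m;

            for (m = list; m; m = m->next)
                    if ((vaddr >= m->virt) &&
                        (vaddr < (m->virt + m->size)))
                            return m->phys + (vaddr - m->virt);

            return BADADDR;                 /* not mapped */
    }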
(FWIW, I note that in 2.6.32, CONFIG_PPC_BOOK3E arches still use the normal page
tables to map the memmap array(s). I don't know whether BOOK3E arch is the
most common or not...)
In any case, not being able to read the page structure contents has a
significant effect on the crash utility. About the only thing that can
be done for these kernels is to print a warning during initialization;
any command that attempts to read a page structure will subsequently
fail:
# crash vmlinux vmcore
crash 4.1.2p1
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...
WARNING: cannot translate vmemmap kernel virtual addresses:
commands requiring page structure contents will fail
      KERNEL: vmlinux
    DUMPFILE: vmcore
        CPUS: 2
        DATE: Thu Dec 10 05:40:35 2009
      UPTIME: 21:44:59
LOAD AVERAGE: 0.11, 0.03, 0.01
       TASKS: 196
    NODENAME: ibm-js20-04.lab.bos.redhat.com
     RELEASE: 2.6.31-38.el6.ppc64
     VERSION: #1 SMP Sun Nov 22 08:15:30 EST 2009
     MACHINE: ppc64 (unknown Mhz)
      MEMORY: 2 GB
       PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
         PID: 10656
     COMMAND: "runtest.sh"
        TASK: c000000072156420 [THREAD_INFO: c000000072058000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)
crash> kmem -i
kmem: cannot translate vmemmap address: f000000000000000
crash> kmem -p
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
kmem: cannot translate vmemmap address: f000000000000000
crash> kmem -s
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
kmem: cannot translate vmemmap address: f00000000030db44
crash>
Can any of the IBM engineers on this list (or any ppc64 user)
confirm my findings? Maybe I'm missing something, but I don't
see it.
And if you agree, perhaps you can work on an upstream solution to
store the vmemmap-to-physical mapping information?
Dave
calling crash from another program (or vice versa)
by James Washer
Often, I'd like to be able to run one crash command, massage the data
produced, and run follow-up commands using the massaged data.
A (possibly crazy) example: run the mount command, collect the
super_block addresses, for each super_block get the s_inodes list head,
traverse each list head to the inode, for each inode find its i_data
(address_space) and get the number of pages. Now sum these up and
print a table of filesystem mount points and the number of cached pages
for each... Perhaps I'd even traverse the struct pages to provide a
count of clean and dirty pages for each file system.
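(For the record, the kind of sequence I mean -- the address is made up,
and the member names are from a 2.6.31-era kernel, so adjust as needed:

    crash> mount
    ... collect the super_block addresses ...
    crash> list -H <super_block.s_inodes address> -o inode.i_sb_list -s inode.i_data
    ... massage out the page counts (the nrpages member) ...

with each step's output feeding the next.)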
I do do this by hand (i.e. mount > mount.file; perlscript mount.file >
crash-script-step-1; then, back in crash, I do ". crash-script-step-1 >
data-file-2" and repeat with more massaging). This is gross, prone to
error, and not terribly fast.
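(Concretely -- assuming the -s invocation option behaves as documented,
i.e. suppress the banner and go straight to work -- the round trip can
at least be driven non-interactively:

    echo "mount" | crash -s vmlinux vmcore > mount.file
    perlscript mount.file > crash-script-step-1
    crash -s vmlinux vmcore < crash-script-step-1 > data-file-2

and so on for each massaging pass, with no expect needed for the
non-interactive parts.)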
I'd love to start crash as a child of perl and either use expect (which
is a bit of a hack) or better yet, have some machine interface to crash
(ala gdbmi)...
I know.. it's open source, I should write it myself. I just don't want
to reinvent the wheel, if someone else already has done something like
this.
Perhaps I need to learn sial. But what little sial I've looked at seems
a bit low level for my needs.
Has anyone had much luck using expect with crash?
thanks
- jim
Request for ppc64 help from IBM
by Dave Anderson
Somewhere between the RHEL5 (2.6.18-based) and RHEL6 timeframe,
the ppc64 architecture has started using a virtual memmap scheme
for the arrays of page structures used to describe/handle
each physical page of memory.
In RHEL5, the page structures in the memmap array were unity-mapped
(i.e., the physical address is or'd with c000000000000000), as
"kmem -n" shows below in the sparsemem data breakdown under MEM_MAP:
crash> kmem -n
... [ snip ] ...
 NR      SECTION       CODED_MEM_MAP        MEM_MAP         PFN
  0  c000000000750000  c000000000760000  c000000000760000      0
  1  c000000000750008  c000000000760000  c000000000763800    256
  2  c000000000750010  c000000000760000  c000000000767000    512
  3  c000000000750018  c000000000760000  c00000000076a800    768
  4  c000000000750020  c000000000760000  c00000000076e000   1024
  5  c000000000750028  c000000000760000  c000000000771800   1280
  6  c000000000750030  c000000000760000  c000000000775000   1536
  7  c000000000750038  c000000000760000  c000000000778800   1792
  8  c000000000750040  c000000000760000  c00000000077c000   2048
  9  c000000000750048  c000000000760000  c00000000077f800   2304
 10  c000000000750050  c000000000760000  c000000000783000   2560
 11  c000000000750058  c000000000760000  c000000000786800   2816
 12  c000000000750060  c000000000760000  c00000000078a000   3072
...
also shown via the memmap page structure listing displayed by
"kmem -p":
crash> kmem -p
      PAGE        PHYSICAL  MAPPING  INDEX  CNT  FLAGS
c000000000760000      0        0       0     1    400
c000000000760038  10000        0       0     1    400
c000000000760070  20000        0       0     1    400
c0000000007600a8  30000        0       0     1    400
c0000000007600e0  40000        0       0     1    400
c000000000760118  50000        0       0     1    400
c000000000760150  60000        0       0     1    400
c000000000760188  70000        0       0     1    400
c0000000007601c0  80000        0       0     1    400
c0000000007601f8  90000        0       0     1    400
...
In RHEL6 (2.6.31-38.el6) the memmap page array is apparently
virtually memmap'd -- using a heretofore-unseen virtual address
range starting at f000000000000000:
crash> kmem -n
... [ snip ] ...
 NR      SECTION       CODED_MEM_MAP        MEM_MAP         PFN
  0  c000000002160000  f000000000000000  f000000000000000      0
  1  c000000002160020  f000000000000000  f000000000006800    256
  2  c000000002160040  f000000000000000  f00000000000d000    512
  3  c000000002160060  f000000000000000  f000000000013800    768
  4  c000000002160080  f000000000000000  f00000000001a000   1024
  5  c0000000021600a0  f000000000000000  f000000000020800   1280
  6  c0000000021600c0  f000000000000000  f000000000027000   1536
  7  c0000000021600e0  f000000000000000  f00000000002d800   1792
  8  c000000002160100  f000000000000000  f000000000034000   2048
  9  c000000002160120  f000000000000000  f00000000003a800   2304
 10  c000000002160140  f000000000000000  f000000000041000   2560
... [ snip ] ...
crash> kmem -p
      PAGE        PHYSICAL  MAPPING                 INDEX  CNT             FLAGS
f000000000000000      0        0                        0    0                 0
f000000000000068  10000        0                        0    0                 0
f0000000000000d0  20000        0                        0    0                 0
f000000000000138  30000        0                        0    0                 0
f0000000000001a0  40000        0                        0    0                 0
f000000000000208  50000        0     -4611686016392006416    0                 0
f000000000000270  60000        0                        0    0                 0
f0000000000002d8  70000        0                        0    0                 0
f000000000000340  80000        0                        0    0                 0
f0000000000003a8  90000        0     -4611686016730798344    0                 0
f000000000000410  a0000        0                        0    0                 0
f000000000000478  b0000        0                        0    0                 0
f0000000000004e0  c0000        0                        0    0  c0000000651534e0
f000000000000548  d0000        0                        0    0                 0
...
But as can be seen in the "kmem -p" output, and when using other
commands that actually read the data in the page structures, the
data read is either bogus, or the readmem() of the address simply fails
the virtual address translation and indicates that the page is not mapped.
Because the page structures' virtual address is not unity-mapped,
the page address gets translated via page table walk-through in the
same manner as vmalloc()'d addresses. In the ppc64 architecture,
the vmalloc range starts at d000000000000000:
crash> mach
...
KERNEL VIRTUAL BASE: c000000000000000
KERNEL VMALLOC BASE: d000000000000000
...
Since the ppc64 virtual-to-physical address translation of
these f000000000000000-based addresses returns either a
bogus physical address or fails entirely, this in turn causes
bizarre errors in crash commands that actually read the contents
of page structures -- such as "kmem -s", where slub data is
stored in the page structure.
So my speculation (guess?) is that the ppc64.c ppc64_vtop()
function needs updating to properly translate these addresses.
Since the ppc64 stuff in the crash utility was written by, and
has been maintained by IBM (and since I am ppc64-challenged),
can you guys take a look at what needs to be done?
Thanks,
Dave
[ANNOUNCE] crash version 4.1.2 is available
by Dave Anderson
- Fix for 2.6.31 or later x86_64 CONFIG_NEED_MULTIPLE_NODES kernels
running on systems that have multiple NUMA nodes. By default, those
kernels use the "page" (or "lpage") percpu memory allocators, which
utilize vmalloc space for percpu memory. Without the patch, the
crash session would fail during initialization with the error message
"crash: cannot determine idle task addresses from init_tasks[] or
runqueues[]", followed by "crash: cannot resolve init_task_union".
(anderson(a)redhat.com)
- Fix for the snap.c extension module to properly handle NUMA systems
with multiple nodes, or single node systems whose first unity-mapped
PT_LOAD segment starts on a non-zero physical address. Without the
patch, a crash session on the resultant vmcore would fail with the
error message: "crash: vmlinux and <filename> do not match!"
(anderson(a)redhat.com)
- Added a defensive mechanism to handle corrupt Elf32_Phdr/Elf64_Phdr
structures in an ELF vmcore. Without the patch, a hand-carved bogus
p_offset field in an Elf32_Phdr/Elf64_Phdr structure could possibly
cause a segmentation violation during initialization. With the fix,
if an invalid Elf32_Phdr or Elf64_Phdr p_offset field is encountered,
a warning message will be displayed, and the crash session will bail
out gracefully, or continue on if possible.
(anderson(a)redhat.com)
- Added a defensive mechanism to handle corrupt Elf32_Ehdr/Elf64_Ehdr
structures in an ELF vmcore. Without the patch, a hand-carved bogus
e_phnum field in an Elf32_Ehdr/Elf64_Ehdr structure could possibly
cause a segmentation violation during initialization. With the fix,
if an invalid Elf32_Ehdr or Elf64_Ehdr e_phnum field is encountered,
a warning message will be displayed and the crash session will bail
out gracefully.
(anderson(a)redhat.com)
- More non-functional changes for future integration of gdb-7.0 and
for addressing Fedora packaging guidelines.
(anderson(a)redhat.com)
- Fix for the x86 "bt [-t|-T]" commands when the backtrace passes
through three stacks, which can happen when an interrupt is taken
while operating on a per-cpu soft IRQ stack, and the crash occurs
while operating on the per-cpu hard IRQ stack. Without the patch,
the "bt" command terminates after displaying backtrace on the hard
IRQ stack; "bt -t" displays the stack contents of the hard IRQ stack
but stops with the error message "bt: non-process stack address for
this task: <task-address>"; "bt -T" displays the the same error
message as "bt -t", but displays the stack contents of the process
stack. With the fix, all three "bt" invocations will display the
backtraces or kernel text addresses on all three stacks, correctly
transitioning from the hard IRQ stack to the soft IRQ stack to the
process stack.
(anderson(a)redhat.com)
- When handcrafting the backtrace starting point for the "bt" command
by using the -S option, if the starting stack address is not in
the task's process stack, a message gets displayed that indicates
"non-process stack address for this task". However, if the starting
stack address is a legitimate non-process stack address, such as a
hard or soft IRQ stack address, or an x86_64 exception stack address,
the message is confusing, and has been removed.
(anderson(a)redhat.com)
Download from: http://people.redhat.com/anderson
Re: [Crash-utility] fuzzing crash(8)
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
> ----- "Adrien Kunysz" <adk(a)redhat.com> wrote:
>
> > Adrien Kunysz wrote:
> > > Actually that patch fixes all the crashes I found with my previous round
> > > of black box fuzzing on x86_64 (using zzuf if anyone is interested). I
> > > am currently playing with bunny
> > > (http://code.google.com/p/bunny-the-fuzzer/) but I am a bit doubtful it
> > > will find anything useful in any decent amount of time without some
> > > manual work, oh well CPU time is cheap :)
> >
> > I wasn't expecting Bunny to find anything for a few days but it only took
> > about three hours :)
> >
> > If we take the same x86_64 vmcore again:
> >
> > 00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
> > 00000010 04 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 |..>.............|
> > 00000020 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |@...............|
> > 00000030 00 00 00 00 40 00 38 00 03 80 00 00 00 00 00 00 |....@.8.........|
> >
> > and mess a bit with byte 0x39:
> >
> > 00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
> > 00000010 04 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 |..>.............|
> > 00000020 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |@...............|
> > 00000030 00 00 00 00 40 00 38 00 03 00 00 00 00 00 00 00 |....@.8.........|
You've got the two dumps above backwards, but as it turns out, a manual corruption
of the ELF header's e_phnum field should be pretty easy to handle -- try the attached
patch.
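(Byte 0x39 is the high byte of the little-endian e_phnum field at
offset 0x38 of the Elf64_Ehdr, so the corruption turns 3 program
headers into 0x8003.) The patch is attached rather than inlined;
the gist of the sanity check is along these lines -- a sketch, not
the patch itself:

    #include <elf.h>

    /*
     * Reject an Elf64_Ehdr whose e_phnum claims more program
     * headers than the file could possibly hold.
     */
    static int
    phnum_is_sane(Elf64_Ehdr *ehdr, unsigned long filesize)
    {
            if ((ehdr->e_phoff > filesize) || (ehdr->e_phentsize == 0))
                    return 0;

            return (ehdr->e_phoff +
                    (unsigned long)ehdr->e_phnum * ehdr->e_phentsize)
                    <= filesize;
    }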
Thanks,
Dave
Re: [Crash-utility] fuzzing crash(8)
by Dave Anderson
----- "Dave Anderson" <anderson(a)redhat.com> wrote:
I did the same thing to a vmcore (i.e. handcrafting the PT_NOTE
segment's p_offset field like you did), and was able to get the
crash session up with the attached patch.
Does it work for you?
Dave