fuzzing crash(8)
by Adrien Kunysz
                                
Earlier today I was pointed to a truncated vmcore that made crash(8) crash, and this prompted me to do some fuzzing.
Before going further I would like to know whether there is interest in fixing this kind of bug and whether I should report
them to Bugzilla. After all, most of these crashes are unlikely to happen in real life as long as the vmcores have not been
purposefully tampered with.
The most common crash by far in my tests is this one:
Consider an x86_64 vmcore file taken with the snap plugin:
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  04 00 3e 00 01 00 00 00  00 00 00 00 00 00 00 00  |..>.............|
00000020  40 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |@...............|
00000030  00 00 00 00 40 00 38 00  03 00 00 00 00 00 00 00  |....@.8.........|
00000040  04 00 00 00 00 00 00 00  e8 00 00 00 00 00 00 00  |................|
If we change byte 0x4e from 0x00 to 0x80:
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  04 00 3e 00 01 00 00 00  00 00 00 00 00 00 00 00  |..>.............|
00000020  40 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |@...............|
00000030  00 00 00 00 40 00 38 00  03 00 00 00 00 00 00 00  |....@.8.........|
00000040  04 00 00 00 00 00 00 00  e8 00 00 00 00 00 80 00  |................|
This makes crash(8) segfault:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004f1bf4 in dump_Elf64_Nhdr (offset=36028797018964200, store=1) at netdump.c:1807
1807            notesize = (uint64_t)note->n_namesz + (uint64_t)note->n_descsz;
(gdb) bt full
#0  0x00000000004f1bf4 in dump_Elf64_Nhdr (offset=36028797018964200, store=1) at netdump.c:1807
         i = 0
         lf = 0
         words = 0
         note = (Elf64_Nhdr *) 0x800000159520c8
         len = 140737175810672
         buf = '\0' <repeats 1499 times>
         ptr = 0x800000159520d4 <Address 0x800000159520d4 out of bounds>
         uptr = (ulonglong *) 0x100000000
         iptr = (int *) 0x0
         up = (ulong *) 0x6f0617
         xen_core = 0
         vmcoreinfo = 0
         remaining = 0
         notesize = 362094736
#1  0x00000000004ed99a in is_netdump (file=0x7fffed5f1bee "vmcore-sample-small.x86_64",
     source_query=128) at netdump.c:335
         i = 2
         fd = 6
         swap = 0
         elf32 = (Elf32_Ehdr *) 0x7fffed5ef8b0
         load32 = (Elf32_Phdr *) 0x0
         elf64 = (Elf64_Ehdr *) 0x7fffed5ef8b0
         load64 = (Elf64_Phdr *) 0x7fffed5ef928
         eheader = [...]
         buf = [...]
         size = 760
         len = 0
         tot = 0
         offset32 = 32767
         offset64 = 36028797018964200
         tmp_flags = 64
         tmp_elf_header = 0x15951fe0 "\177ELF\002\001\001"
#2  0x00000000004f3e3b in is_kdump (file=0x7fffed5f1bee "vmcore-sample-small.x86_64", source_query=128)
     at netdump.c:2383
No locals.
#3  0x000000000044c892 in main (argc=2, argv=0x7fffed5f0cb8) at main.c:401
         i = <value optimized out>
         c = <value optimized out>
         option_index = 0
It looks like it should do more sanity checking on p_offset, but I am unsure how to fix this properly.
This is crash-4.1.1-0. The sample vmcore is too large to send by mail or to attach to Bugzilla, and I am not sure the
crash core itself would be of much use.
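
As a rough illustration of the kind of check meant here (a sketch only, not a patch against netdump.c): assuming the standard Elf64_Phdr layout, bytes 0x48-0x4f of the dump hold the little-endian p_offset of the first program header, so setting byte 0x4e to 0x80 turns 0xe8 into the 36028797018964200 seen in the backtrace. The helper names le64_decode and range_in_file are made up for this example and do not exist in crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 64-bit field (such as Elf64_Phdr.p_offset)
 * from eight raw file bytes. */
static uint64_t le64_decode(const unsigned char *b)
{
    uint64_t v = 0;

    assert(b != NULL);
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Hypothetical helper, not part of crash: return 1 if the byte range
 * [offset, offset + size) lies entirely inside a file of file_size
 * bytes.  Written so that offset + size is never computed and therefore
 * cannot wrap around. */
static int range_in_file(uint64_t offset, uint64_t size, uint64_t file_size)
{
    return offset <= file_size && size <= file_size - offset;
}
```

With the eight bytes e8 00 00 00 00 00 80 00 from the modified dump, le64_decode() yields 0x00800000000000e8, and range_in_file() rejects that offset for any realistically sized file, whereas the original offset 0xe8 passes as long as the segment size also fits.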
                                
15 years, 11 months

Heads up: possible 2.6.31 kdump and crash utility failures
by Dave Anderson
                                        
You may have seen this discussion re: 2.6.31 kdump failures on the 
kexec(a)lists.infradead.org mailing list:
  Kdump issue with percpu_alloc=lpage (Was:Re: crash_notes posted to kexec-tools)
  http://lists.infradead.org/pipermail/kexec/2009-October/003587.html
or saw Vivek's subsequent post to LKML to address it:
  [PATCH] Fix kdump failure if booted with percpu_alloc=page
  http://lkml.org/lkml/2009/11/19/214
Basically if a 2.6.31 or later kernel is:
 (1) configured with CONFIG_NEED_MULTIPLE_NODES, and
 (2) the system actually has multiple NUMA nodes,
then it will use vmalloc space for its percpu data.  In that case, the 2.6.31 
kernel uses the "lpage" percpu memory allocator (subsequently renamed the 
"page" allocator) instead of the traditional "embed" percpu memory allocator.
At least on x86_64, this will cause the crash utility to fail during
initialization, because it tries to read vmalloc memory prior to having
set itself up to be able to walk page tables.
Prior to 4.1.1, it would fail with this error message:
  crash: read error: kernel virtual address: ffffc9000000e2f8  type: cpu number (per_cpu)
With 4.1.1 -- which quietly accepts the readmem failure above -- it fails later on
with these two error messages:
  crash: cannot determine idle task addresses from init_tasks[] or runqueues[]
  crash: cannot resolve "init_task_union"
I believe that this only affects x86_64.  I am testing a fix for it, which
I will put in a new crash release in short order.
Dave
                                
15 years, 11 months

[ANNOUNCE] crash version 4.1.1 is available
by Dave Anderson
                                        
 - Fix for a potential session initialization failure when running 
   against 2.6.30 or later x86_64 kernel dumpfiles whose pages have been
   filtered by the makedumpfile facility.  Without the patch, the
   session may fail with the error message "crash: page excluded: kernel
   virtual address: <address>  type: cpu number (per_cpu)", but will
   initialize OK if the "--zero_excluded" command line option is used.
   (anderson(a)redhat.com)
 - Added "lsmod" as a built-in alias for the "mod" command.
   (anderson(a)redhat.com)
 - Added a defensive mechanism to handle corrupt Elf32_Nhdr/Elf64_Nhdr
   structures in an ELF vmcore.  The fix no longer presumes that all
   Elf32_Nhdr/Elf64_Nhdr structure contents are legitimate, and if an
   invalid Elf32_Nhdr or Elf64_Nhdr structure is encountered, it will 
   be ignored and a warning message will be displayed showing the 
   structure contents, and the crash session will continue on.  Without
   the patch, it was possible that an invalid n_namesz or n_descsz
   value could cause a segmentation violation when attempting to read 
   the bogus note contents.
   (anderson(a)redhat.com)
 - Fix for "mach -c" command option on 2.6.30 and later x86_64 kernels
   in which the per-cpu array x8664_pda data structures were replaced 
   with per-cpu variables.  Without the patch, the command displays 
   just the boot cpu's cpuinfo data structure and then fails with the
   error message: "mach: invalid structure name: x8664_pda".
   (anderson(a)redhat.com) 
 - Fix to properly set the DEBUG exception stack size and stack base
   address on 2.6.18 and later x86_64 kernels.  Without the patch, the
   DEBUG exception stack was presumed to be the same size as all of the
   other exception stacks, so in the extremely rare occurrence that a
   kernel crash started while running on a per-cpu DEBUG stack, the 
   backtrace code would not recognize it as such, and would either start
   the trace using stale starting stack hooks, typically from "schedule"
   while running on the process stack, or the backtrace attempt would 
   fail with the error message "bt: cannot transition from exception 
   stack to current process stack".
   (anderson(a)redhat.com)
 - Related to the above, when the x86_64 "bt" is displaying a trace
   segment from one of the five exception stacks, change the output from 
   showing just "--- <exception stack> ..." to showing which exception 
   stack it's working from, for example, "--- <NMI exception stack> ---"
   or "--- <DEBUG exception stack> ---", etc.
   (anderson(a)redhat.com)
 - Fix for a session initialization failure when running against 2.6.30
   or later x86_64 kernels if the number of possible cpus equals the
   kernel's configured NR_CPUS.  Without the patch, the session fails 
   with the error message "crash: invalid kernel virtual address: cc08
   type: cpu number (per_cpu)".
   (bob.montgomery(a)hp.com)
 - Preparations in the top-level source code for the integration of 
   gdb-7.0.  The current embedded version remains gdb-6.1.
   (anderson(a)redhat.com)
 Download from: http://people.redhat.com/anderson
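
For readers who want a picture of the Elf32_Nhdr/Elf64_Nhdr item above, a bounds check of roughly this shape is the general idea. This is a minimal sketch under stated assumptions, not the code that went into 4.1.1: the struct is redeclared locally with the same three fields as Elf64_Nhdr, the name and descriptor are assumed to be padded to 4 bytes each, and note_fits() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same three 32-bit fields as Elf64_Nhdr in <elf.h>, redeclared here so
 * the sketch builds without the system ELF headers. */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* Round up to the next multiple of 4 (assumed note padding). */
static uint64_t roundup4(uint64_t x)
{
    return (x + 3) & ~(uint64_t)3;
}

/* Hypothetical check: return 1 if the header plus its padded name and
 * descriptor fit in the 'remaining' bytes of the note segment.  The sum
 * is done in 64 bits, so two maximal 32-bit sizes cannot wrap. */
static int note_fits(const struct note_hdr *note, uint64_t remaining)
{
    uint64_t need;

    assert(note != NULL);
    need = sizeof(*note) + roundup4(note->n_namesz) + roundup4(note->n_descsz);
    return need <= remaining;
}
```

A caller walking a note segment would skip (and warn about) any note for which note_fits() returns 0 instead of dereferencing its contents.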
                                
15 years, 11 months

Re: [Crash-utility] kmap
by Dave Anderson
                                        
----- "Darrin Thompson" <darrinth(a)gmail.com> wrote:
> On Wed, Nov 18, 2009 at 3:21 PM, Dave Anderson <anderson(a)redhat.com> wrote:
> > Or for what it's worth, you can just read the data using the physical
> > address:
> >
> >  crash> rd -p 2b0000 10
> >            2b0000:  0100c70000080805 fff0db31fb000000  ............1...
> >            2b0010:  8b485500313e6b05 c931c03145302454  .k>1.UH.T$0E1.1.
> 
> That's exactly what I'm looking for.
> 
> When I'm looking at:
> 
> kmem -p ffff810104ffd258
>       PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
> ffff810104ffd258 16da9d000                0        0  1 168100000000061
> 
> How do I get the translation of the flags? I've seen something useful
> in vtop but I can never tell if it's giving me flags for the page
> struct at the pointer I give or the page struct that would have
> pointed the address I gave.
The flags you see in the "vtop" output are PTE flags and not page flags.
For the page flags, you'll have to look at the kernel source code
in "include/linux/page-flags.h".  The usage of that bit-field changes
way too much for it to be hardwired into the crash utility. 
Dave
                                
15 years, 11 months

Re: [Crash-utility] kmap
by Dave Anderson
                                        
----- "Darrin Thompson" <darrinth(a)gmail.com> wrote:
> I'm finding a problem struct page in a kdump. I want to trace down
> what that page is referring to. For instance, if I could execute
> kmap(page), and run rd the pointer returned, what would I find there?
> I realize that this may not always be possible. What is the right way
> to attempt it? This is x86_64 if it matters.
If it's an x86_64, then calling kmap(page) ends up doing this
on the page struct address:
   __va(page_to_pfn(page) << PAGE_SHIFT);
So, I'm presuming that you know the page structure address, but you
want to know how to access the page data via its kmap'd virtual
address.
So for example, suppose I know that the page structure address
is ffff8100006ef680, then "kmem -p <page-address>" shows the
physical address of the referenced page:
  crash> kmem -p ffff8100006ef680
        PAGE       PHYSICAL      MAPPING       INDEX CNT FLAGS
  ffff8100006ef680   2b0000                0        0  1 400
  crash>
For x86_64, it's then simply a matter of changing the physical
address into its unity-mapped kernel virtual address (i.e. as
returned by the __va() macro):
  crash> ptov 2b0000
  VIRTUAL           PHYSICAL        
  ffff8100002b0000  2b0000
  crash>
So kmap(0xffff8100006ef680) would return ffff8100002b0000, which
you can "rd":
      
  crash> rd ffff8100002b0000 10
  ffff8100002b0000:  0100c70000080805 fff0db31fb000000   ............1...
  ffff8100002b0010:  8b485500313e6b05 c931c03145302454   .k>1.UH.T$0E1.1.
  ffff8100002b0020:  03f8ba046a0c7a8b 8d4c2c24748b0000   .z.j.......t$,L.
  ffff8100002b0030:  f5e800000090248c fffffb37e9ffffe4   .$..........7...
  ffff8100002b0040:  03398330244c8b48 798300000156860f   H.L$0.9...V....y
  crash>
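
The ptov step above is plain address arithmetic. A minimal sketch, assuming the unity-mapped region starts at 0xffff810000000000 (inferred from this particular ptov output; other kernel versions and configurations use a different base) and 4 KiB pages. The macro and function names are hypothetical, not crash or kernel identifiers:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: base of the unity-mapped region, inferred from the ptov
 * output above (2b0000 -> ffff8100002b0000).  Not a universal constant. */
#define ASSUMED_PAGE_OFFSET 0xffff810000000000ULL
#define ASSUMED_PAGE_SHIFT  12  /* assumption: 4 KiB pages */

/* Page frame number -> physical address (pfn << PAGE_SHIFT). */
static uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << ASSUMED_PAGE_SHIFT;
}

/* Physical address -> unity-mapped kernel virtual address, i.e. what
 * __va() is described as doing in this thread. */
static uint64_t phys_to_virt(uint64_t paddr)
{
    /* The sum must not wrap past the top of the address space. */
    assert(paddr <= UINT64_MAX - ASSUMED_PAGE_OFFSET);
    return ASSUMED_PAGE_OFFSET + paddr;
}
```

Under these assumptions, phys_to_virt(pfn_to_phys(0x2b0)) gives 0xffff8100002b0000, the address passed to rd above.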
Or for what it's worth, you can just read the data using the physical
address:
  crash> rd -p 2b0000 10
            2b0000:  0100c70000080805 fff0db31fb000000   ............1...
            2b0010:  8b485500313e6b05 c931c03145302454   .k>1.UH.T$0E1.1.
            2b0020:  03f8ba046a0c7a8b 8d4c2c24748b0000   .z.j.......t$,L.
            2b0030:  f5e800000090248c fffffb37e9ffffe4   .$..........7...
            2b0040:  03398330244c8b48 798300000156860f   H.L$0.9...V....y
  crash> 
I *think* that's what you're asking...
Dave
                                
15 years, 11 months

Re: invalid kernel virtual address: cc08 type: "cpu number (per_cpu)"
by Dave Anderson
                                        
----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> On Wed, 2009-11-11 at 18:54 +0000, Dave Anderson wrote:
> 
> > > > But another question is in the (extremely) rare circumstance of a
> > > > non-CONFIG_SMP kernel.  In that case, the kt->__per_cpu_offset[] array
> > > > would be all NULL, and the symbol_value("per_cpu__cpu_number")
> > > > call would return the qualified unity-mapped address.  So the 
> > > > virtual address calculation should work in x86_64_per_cpu_init(),
> > > > and the loop wouldn't even be entered in x86_64_get_smp_cpus()
> > > > 
> > > > That being said, I don't think I've seen a recent x86_64 kernel
> > > > that was not compiled CONFIG_SMP, so I can't confirm that it's
> > > > ever been tested.  
> > > > 
> > > > So for sanity's sake, maybe your patch should also be applied,
> > > > but should also check if the "i" index is non-zero?
> 
> Now I'm thinking that test won't be needed for the non-CONFIG_SMP
> kernel.  If the array is full of 0x0s, the loop will compute the first
> address as (0x0 + symbol_value("per_cpu__cpu_number")) and read a
> cpunumber of 0.  Then on the next iteration, it will calculate the very
> same address again, and read the same cpunumber of 0.  But now the test
> is against cpus==1, so that test will fail and we'll drop out of the
> loop, right?  
Right!
> In the real smp case, we'll still try to read the small offset (cc08)
> like an address, but be spared any embarrassment by the
> QUIET|RETURN_ON_ERROR fix.
Just to be clear, I think that we agree that:
 (1) the QUIET|RETURN_ON_ERROR change should be applied in both functions,
 (2) the kt->__per_cpu_offset[] NULL-check should be completely dropped
     in x86_64_per_cpu_init(), and
 (3) the kt->__per_cpu_offset[] NULL-check should still be applied in
     x86_64_get_smp_cpus() since that loop presumes an SMP kernel.
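
The three points can be walked through with a toy model. This is a sketch under loud assumptions, not the crash source: sim_readmem() stands in for readmem() with QUIET|RETURN_ON_ERROR, addresses below an arbitrary boundary count as invalid, and the cpu number "stored" at a valid address is simply derived from which toy per-cpu area the address falls in. All names and constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_KVADDR_BASE 0x100000UL  /* toy boundary: lower addresses are unreadable */
#define SIM_AREA_SIZE   0x1000UL    /* toy size of one per-cpu area */

/* Toy stand-in for readmem(..., QUIET|RETURN_ON_ERROR): returns 0
 * instead of aborting when the address is invalid. */
static int sim_readmem(unsigned long addr, int *cpunumber)
{
    assert(cpunumber != NULL);
    if (addr < SIM_KVADDR_BASE)
        return 0;
    *cpunumber = (int)((addr - SIM_KVADDR_BASE) / SIM_AREA_SIZE);
    return 1;
}

/* Count cpus by reading the cpu number at symval + offsets[i].
 * check_null = 0 models points (1) and (2), the per_cpu_init variant;
 * check_null = 1 models points (1) and (3), the get_smp_cpus variant. */
static int sim_count_cpus(unsigned long symval, const unsigned long *offsets,
                          size_t n, int check_null)
{
    int cpus = 0;

    for (size_t i = 0; i < n; i++) {
        int cpunumber;

        if (check_null && offsets[i] == 0)
            break;      /* past what was loaded from the dump */
        if (!sim_readmem(symval + offsets[i], &cpunumber))
            break;      /* unreadable address: stop quietly */
        if (cpunumber != cpus)
            break;      /* not the cpu number expected next */
        cpus++;
    }
    return cpus;
}
```

In this model an SMP table of four valid offsets followed by zeros counts 4 cpus either way. An all-zero table with a directly readable symbol address (the non-CONFIG_SMP case) counts 1 without the NULL-check, matching the walk-through quoted above, and 0 with it, which is only acceptable if that loop is reached on SMP kernels alone.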
Dave
 
15 years, 11 months

kmap
by Darrin Thompson
                                
I'm finding a problem struct page in a kdump. I want to trace down
what that page is referring to. For instance, if I could execute
kmap(page) and run rd on the pointer returned, what would I find there?
I realize that this may not always be possible. What is the right way
to attempt it? This is x86_64 if it matters.
--
Darrin
                                
15 years, 11 months

Re: invalid kernel virtual address: cc08 type: "cpu number (per_cpu)"
by Dave Anderson
                                        
----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> On Wed, 2009-11-11 at 14:52 +0000, Dave Anderson wrote:
> > ----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> > 
> > > I have a dump from a 2.6.31-based x86_64 system where the number of
> > > "possible" cpus equals the system's NR_CPUS (32).  
> > > On that system, the __per_cpu_offset table in the kernel consists of 32
> > > valid offset pointers.
> 
> > I have a similar-but-different fix queued for this, but instead of
> > checking for a NULL kt->__per_cpu_offset[i] entry, it changes the
> > readmem() call to RETURN_ON_ERROR|QUIET instead of FAULT_ON_ERROR
> > like this:
> > 
> >                 if (!readmem(symbol_value("per_cpu__cpu_number") +
> >                     kt->__per_cpu_offset[i],
> >                     KVADDR, &cpunumber, sizeof(int),
> >                     "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
> >                         break;
> 
> > That should prevent the failure you're seeing.
> 
> I did that first, and thought it was sort of cheating :-)
Sort of.  But at that point in time we're still kind of blindly
wading around in the murk trying to figure out what we're 
running on...
 
> 
> > But another question is in the (extremely) rare circumstance of a
> > non-CONFIG_SMP kernel.  In that case, the kt->__per_cpu_offset[] array
> > would be all NULL, and the symbol_value("per_cpu__cpu_number")
> > call would return the qualified unity-mapped address.  So the
> > virtual address calculation should work in x86_64_per_cpu_init(),
> > and the loop wouldn't even be entered in x86_64_get_smp_cpus()
> > 
> > That being said, I don't think I've seen a recent x86_64 kernel
> > that was not compiled CONFIG_SMP, so I can't confirm that it's
> > ever been tested.  
> > 
> > So for sanity's sake, maybe your patch should also be applied,
> > but should also check if the "i" index is non-zero?
> 
> So like this?
> +               if (i && (kt->__per_cpu_offset[i] == NULL))
> +                       break;
Yes.
> 
> So it's always ok to try the readmem on the first element of
> the array.  And the RETURN_ON_ERROR would deal with something going
> wrong with that, although that case would presumably be a real
> problem with the dump, right?  (cpus == 0)
Most likely yes.  The motivation for my fix was a failure when
attempting to readmem() a legitimate virtual address that was in
an excluded page from a makedumpfile-generated dump. If I recall
correctly, it was an in-house kexec-tools bugzilla, but I can't
find it.
Dave
 
15 years, 11 months

Re: invalid kernel virtual address: cc08 type: "cpu number (per_cpu)"
by Dave Anderson
                                        
----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
> I have a dump from a 2.6.31-based x86_64 system where the number of
> "possible" cpus equals the system's NR_CPUS (32).  
> On that system, the __per_cpu_offset table in the kernel consists of 32
> valid offset pointers.
> 
> When crash loads this table into its __per_cpu_offset[NR_CPUS=4096]
> array in struct kernel_table, it knows the length of the kernel's array
> (32*sizeof(long)), and copies the 32 pointers, leaving the rest of its
> (much longer) array full of 0x0s.
> 
> (This happens in kernel.c)
> 
>         if (symbol_exists("__per_cpu_offset")) {
>                 if (LKCD_KERNTYPES())
>                         i = get_cpus_possible();
>                 else
>                         i = get_array_length("__per_cpu_offset", NULL, 0);
>                 get_symbol_data("__per_cpu_offset",
>                         sizeof(long)*((i && (i <= NR_CPUS)) ? i : NR_CPUS),
>                         &kt->__per_cpu_offset[0]);
>                 kt->flags |= PER_CPU_OFF;
>         }
> 
> Later, in a couple of places, crash checks for the maximum valid
> __per_cpu_offset by reading the cpu_number value out of each per_cpu
> area and comparing it to the expected number until the comparison fails.
> (Remember NR_CPUS in crash is much larger than the kernel's NR_CPUS, and
> that's OK).
> 
> From x86_64.c:
> 
>         for (i = cpus = 0; i < NR_CPUS; i++) {
>                 readmem(symbol_value("per_cpu__cpu_number") +
>                         kt->__per_cpu_offset[i], KVADDR,
>                         &cpunumber, sizeof(int),
>                         "cpu number (per_cpu)", FAULT_ON_ERROR);
>                 if (cpunumber != cpus)
>                         break;
>                 cpus++;
>         }
> 
> This works well when the kernel's array has fewer real per_cpu_offsets
> than its own NR_CPUS, since the kernel preloads its array with a pointer
> (BOOT_PERCPU_OFFSET) and when this loop runs past the real
> per_cpu_offset pointers and tries to use the BOOT_PERCPU_OFFSET, it
> reads a bogus value for cpunumber and terminates.
> 
> But when the kernel's table is full of valid per_cpu_offset pointers,
> this loop continues off the end of that into the part of crash's
> __per_cpu_offset array that has the 0x0 initial values, and dies with:
> 
> crash: invalid kernel virtual address: cc08  type: "cpu number (per_cpu)"
> 
> The cc08 comes from the symbol_value of per_cpu__cpu_number:
> 000000000000cc08 D per_cpu__cpu_number
> 
> Bottom line:  Crash is assuming an insufficient array termination for
> the kernel's __per_cpu_offset array (a pointer that points to an invalid
> cpu_number).
> 
> The included patch adds an additional loop termination so that crash
> doesn't run off the end of what it loaded from the dump.  It just checks
> for a NULL 0x0 value in kt->__per_cpu_offset[i].
> 
> Bob Montgomery,
> Working at HP
I have a similar-but-different fix queued for this, but instead of
checking for a NULL kt->__per_cpu_offset[i] entry, it changes the
readmem() call to RETURN_ON_ERROR|QUIET instead of FAULT_ON_ERROR
like this:
                if (!readmem(symbol_value("per_cpu__cpu_number") +
                    kt->__per_cpu_offset[i],
                    KVADDR, &cpunumber, sizeof(int),
                    "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
                        break;
That should prevent the failure you're seeing.
But another question is in the (extremely) rare circumstance of a
non-CONFIG_SMP kernel.  In that case, the kt->__per_cpu_offset[] array
would be all NULL, and the symbol_value("per_cpu__cpu_number")
call would return the qualified unity-mapped address.  So the
virtual address calculation should work in x86_64_per_cpu_init(),
and the loop wouldn't even be entered in x86_64_get_smp_cpus().
That being said, I don't think I've seen a recent x86_64 kernel
that was not compiled CONFIG_SMP, so I can't confirm that it's
ever been tested.  
So for sanity's sake, maybe your patch should also be applied,
but should also check if the "i" index is non-zero?
Thanks,
  Dave
                                
15 years, 11 months

invalid kernel virtual address: cc08 type: "cpu number (per_cpu)"
by Bob Montgomery
                                
I have a dump from a 2.6.31-based x86_64 system where the number of
"possible" cpus equals the system's NR_CPUS (32).  
On that system, the __per_cpu_offset table in the kernel consists of 32
valid offset pointers.
When crash loads this table into its __per_cpu_offset[NR_CPUS=4096]
array in struct kernel_table, it knows the length of the kernel's array
(32*sizeof(long)), and copies the 32 pointers, leaving the rest of its
(much longer) array full of 0x0s.
(This happens in kernel.c)
        if (symbol_exists("__per_cpu_offset")) {
                if (LKCD_KERNTYPES())
                        i = get_cpus_possible();
                else
                        i = get_array_length("__per_cpu_offset", NULL, 0);
                get_symbol_data("__per_cpu_offset",
                        sizeof(long)*((i && (i <= NR_CPUS)) ? i : NR_CPUS),
                        &kt->__per_cpu_offset[0]);
                kt->flags |= PER_CPU_OFF;
        }
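
The size expression in that get_symbol_data() call is a clamp: copy as many entries as the kernel's array has, unless that length is zero or larger than crash's own array. A small stand-alone sketch of just that arithmetic, where per_cpu_offset_copy_bytes() is a hypothetical helper name and not a crash function:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the expression
 *   sizeof(long) * ((i && (i <= NR_CPUS)) ? i : NR_CPUS)
 * kernel_len is the length of the kernel's __per_cpu_offset array,
 * capacity is the length of crash's own array (its NR_CPUS). */
static size_t per_cpu_offset_copy_bytes(size_t kernel_len, size_t capacity)
{
    size_t entries;

    assert(capacity > 0);
    entries = (kernel_len && kernel_len <= capacity) ? kernel_len : capacity;
    return entries * sizeof(long);
}
```

For the dump described here, per_cpu_offset_copy_bytes(32, 4096) is 32*sizeof(long), so only the first 32 slots of crash's array are filled and the remaining slots keep their initial 0x0 values.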
Later, in a couple of places, crash checks for the maximum valid
__per_cpu_offset by reading the cpu_number value out of each per_cpu
area and comparing it to the expected number until the comparison fails.
(Remember NR_CPUS in crash is much larger than the kernel's NR_CPUS, and
that's OK).
From x86_64.c:

        for (i = cpus = 0; i < NR_CPUS; i++) {
                readmem(symbol_value("per_cpu__cpu_number") +
                        kt->__per_cpu_offset[i], KVADDR,
                        &cpunumber, sizeof(int),
                        "cpu number (per_cpu)", FAULT_ON_ERROR);
                if (cpunumber != cpus)
                        break;
                cpus++;
        }
This works well when the kernel's array has fewer real per_cpu_offsets
than its own NR_CPUS, since the kernel preloads its array with a pointer
(BOOT_PERCPU_OFFSET) and when this loop runs past the real
per_cpu_offset pointers and tries to use the BOOT_PERCPU_OFFSET, it
reads a bogus value for cpunumber and terminates.
But when the kernel's table is full of valid per_cpu_offset pointers,
this loop continues off the end of that into the part of crash's
__per_cpu_offset array that has the 0x0 initial values, and dies with:
crash: invalid kernel virtual address: cc08  type: "cpu number (per_cpu)"
The cc08 comes from the symbol_value of per_cpu__cpu_number:
000000000000cc08 D per_cpu__cpu_number
Bottom line:  Crash is assuming an insufficient array termination for
the kernel's __per_cpu_offset array (a pointer that points to an invalid
cpu_number).
The included patch adds an additional loop termination so that crash
doesn't run off the end of what it loaded from the dump.  It just checks
for a NULL 0x0 value in kt->__per_cpu_offset[i].
Bob Montgomery,
Working at HP
                                
15 years, 11 months