On Wed, 2009-11-11 at 14:52 +0000, Dave Anderson wrote:
 ----- "Bob Montgomery" <bob.montgomery(a)hp.com> wrote:
 
 > I have a dump from a 2.6.31-based x86_64 system where the number of
 > "possible" cpus equals the system's NR_CPUS (32).  
 > On that system, the __per_cpu_offset table in the kernel consists of 32
 > valid offset pointers. 
 I have a similar-but-different fix queued for this, but instead of
 checking for a NULL kt->__per_cpu_offset[i] entry, it changes the
 readmem() call to RETURN_ON_ERROR|QUIET instead of FAULT_ON_ERROR
 like this:
 
                 if (!readmem(symbol_value("per_cpu__cpu_number") +
                     kt->__per_cpu_offset[i],
                     KVADDR, &cpunumber, sizeof(int),
                     "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
                         break; 
 That should prevent the failure you're seeing. 
I did that first, and thought it was sort of cheating :-)
 But another question is what happens in the (extremely) rare case of a
 non-CONFIG_SMP kernel.  In that case, the kt->__per_cpu_offset[] array
 would be all NULL, and the symbol_value("per_cpu__cpu_number")
 call would return the qualified unity-mapped address.  So the
 virtual address calculation should work in x86_64_per_cpu_init(),
 and the loop wouldn't even be entered in x86_64_get_smp_cpus().
 
 That being said, I don't think I've seen a recent x86_64 kernel
 that was not compiled with CONFIG_SMP, so I can't confirm that this
 path has ever been tested.
 
 So for sanity's sake, maybe your patch should be applied as well,
 but it should also check whether the "i" index is non-zero?
So like this?
+               if (i && (kt->__per_cpu_offset[i] == NULL))
+                       break;
So it's always OK to try the readmem() on the first element of
the array, and RETURN_ON_ERROR would deal with anything going
wrong there, although that case (cpus == 0) would presumably
indicate a real problem with the dump, right?
Thanks,
Bob M.