On Fri, Apr 15, 2011 at 12:13:51PM -0400, Dave Anderson wrote:
 
 
 ----- Original Message -----
 > (4/15/2011 09:04), Dave Anderson wrote:
 > >
 > >
 > > ----- Original Message -----
 > >> Hi Dave, and company,
 > >>
 > >> I get this error trying to open a dump of a large system:
 > >>
 > >> crash: compressed kdump: invalid nr_cpus value: 640
 > >> crash: vmcore: not a supported file format
 > >>
 > >> The message is from diskdump.c:
 > >> if (sizeof(*header) + sizeof(void *) * header->nr_cpus > block_size ||
 > >>     header->nr_cpus <= 0) {
 > >>          error(INFO, "%s: invalid nr_cpus value: %d\n",
 > >>
 > >> block_size is the page size of 4096
 > >> struct disk_dump_header looks like 464 bytes
 > >> void * is 8
 > >> So it looks like 454 is the maximum number of cpus.
 > >> 464 + 454*8 -> 4096
 > >>
 > >> Is this intentional?
 > >> It looks like a restriction that no one ever complained about. But
 > >> there are systems (Altix UV) with 2048 CPUs.
 > >>
 > >> Is there an easy fix?
 > >>
 > >> -Cliff
 > >
 > > To be honest, I don't know, I didn't design or write that code.
 > 
 > Yes, this is intentional for RHEL4/diskdump. In the RHEL4 kernel,
 > disk_dump_header is defined as follows.
 > 
 > struct disk_dump_header {
 > char signature[8]; /* = "DISKDUMP" */
 > (snip)
 > int nr_cpus; /* Number of CPUs */
 > struct task_struct *tasks[NR_CPUS];
 > };
 > 
 > The maximum number of logical CPUs in RHEL4 is 32 (x86) or 64 (x86_64),
 > so this does not cause any problem.
 > 
 > On the other hand, as you and Dave said, this causes a limitation
 > problem in RHEL5/RHEL6 kernels. But as far as I know, makedumpfile does
 > not use this "tasks" member, so we can skip the check there.
 > 
 > if (is_diskdump?)
 >     if (sizeof(*header) + sizeof(void *) * header->nr_cpus > block_size ||
 >         header->nr_cpus <= 0) {
 >             error(INFO, "%s: invalid nr_cpus value: %d\n",
 >             goto err;
 >     }
 > 
 > Something like this. Dave, Ohmichi-san, what do you think?
 > 
 > Thanks,
 > Takao Indoh
 
 Looking at a couple sample compressed kdumps, it does appear that
 they do not set up the tasks[] array, and that the sub_header still
 starts at the "header->block_size".  That being the case, your
 proposal looks like a nice, simple fix!
 
 Cliff, can you enclose that piece of code with something like:
 
         if (DISKDUMP_VALID()) {
             if (sizeof(*header) + sizeof(void *) * header->nr_cpus > block_size ||
                 header->nr_cpus <= 0) {
                     error(INFO, "%s: invalid nr_cpus value: %d\n",
                             DISKDUMP_VALID() ? "diskdump" : "compressed kdump",
                             header->nr_cpus);
                     goto err;
             }
         }
 
 I suppose it should still check for (header->nr_cpus <= 0), but I'll
 let Ken'ichi confirm that. 
Yes indeed.  That seems to work for the large cpu count case.
Thanks.
-Cliff
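
For anyone following the arithmetic, here is a minimal sketch of the limit and of the guarded check discussed above. It is not the crash source: the 464-byte header size, 8-byte pointer and 4096-byte block are the figures quoted in this thread, and max_cpus_in_block() and nr_cpus_ok() are hypothetical names made up for illustration. Keeping the (nr_cpus <= 0) test for both formats is an assumption, pending confirmation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Figures quoted in the thread; assumptions, not read from a real dump. */
#define HEADER_SIZE  464u   /* reported sizeof(struct disk_dump_header) */
#define POINTER_SIZE 8u     /* sizeof(void *) on x86_64 */
#define BLOCK_SIZE   4096u  /* page size */

/* Largest nr_cpus for which header plus task pointers fit in one block. */
static size_t max_cpus_in_block(size_t header_size, size_t ptr_size,
                                size_t block_size)
{
    if (ptr_size == 0 || header_size > block_size)
        return 0;
    return (block_size - header_size) / ptr_size;
}

/*
 * Hypothetical stand-in for the header check: the "fits in one block"
 * test applies only to diskdump files, while nr_cpus <= 0 is rejected
 * for both formats (an assumption, see above).
 */
static bool nr_cpus_ok(bool is_diskdump, int nr_cpus,
                       size_t header_size, size_t block_size)
{
    if (nr_cpus <= 0)
        return false;
    if (is_diskdump &&
        header_size + POINTER_SIZE * (size_t)nr_cpus > block_size)
        return false;
    return true;
}
```

With these figures, max_cpus_in_block(HEADER_SIZE, POINTER_SIZE, BLOCK_SIZE) gives 454, and a 640-cpu header is rejected when treated as a diskdump but accepted when treated as a compressed kdump.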
 > > And you're right. Although dumpfiles with that many cpus are highly
 > > unusual, looking at the code, it certainly does appear that the
 > > disk_dump_header plus the task pointers for each cpu must fit in a
 > > "block_size", or page size, and that the sub_header is the first item
 > > following the contents of the first page:
 > >
 > > ---
 > >   ^ disk_dump_header
 > >   |     task on cpu 0
 > > page ...
 > >   |     task on cpu x-1
 > >   V task on cpu x
 > > ---
 > >         sub_header
 > >         bitmap
 > >         remaining stuff...
 > >
 > > Since your dump is presumably a compressed kdump, I'm wondering
 > > what the makedumpfile code did in your case? Did it round up the
 > > location of the sub_header to a page boundary?
 > >
 > > I've cc'd this to Ken'ichi Ohmichi (makedumpfile), and to Takao Indoh
 > > (original compressed diskdump) for their input.
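
The first-page layout in the diagram above can be sketched as byte offsets. This is an illustration only: struct first_block_layout and describe_first_block() are hypothetical names, the sizes passed in are the ones quoted in this thread, and placing the sub_header at exactly block_size follows the observation about the sample compressed kdumps rather than any format specification.

```c
#include <stddef.h>

/* Byte offsets within the dump file for the items in the first block. */
struct first_block_layout {
    size_t tasks_start;      /* first task pointer, right after the header */
    size_t tasks_end;        /* one past the last task pointer */
    size_t sub_header_start; /* assumed to sit at block_size */
    size_t overflow;         /* bytes by which tasks[] would pass the block */
};

/* Hypothetical helper: where a fully populated tasks[] array would land. */
static struct first_block_layout
describe_first_block(size_t header_size, size_t ptr_size,
                     size_t nr_cpus, size_t block_size)
{
    struct first_block_layout l;

    l.tasks_start = header_size;
    l.tasks_end = header_size + ptr_size * nr_cpus;
    l.sub_header_start = block_size;
    l.overflow = l.tasks_end > block_size ? l.tasks_end - block_size : 0;
    return l;
}
```

For a 464-byte header, 8-byte pointers, 640 cpus and a 4096-byte block, the array would end at offset 5584 and run 1488 bytes into the sub_header, which is why the check can only be dropped if the array is never written.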
-- 
Cliff Wickman
SGI
cpw(a)sgi.com
(651) 683-3824