Dave,
Thanks for the feedback. As per our discussion, I am attaching the patch; I
have tested it and it is working. Please have a look at it and let me know
your opinion.
Thanks,
Sharyathi N
Dave Anderson wrote:
sharyathi nagesh wrote:
> Hi
> I am seeing this problem with the crash tool on a system with NUMA nodes.
> crash exits with an error message, and no further analysis of the dump is possible.
> =====
> Error message:
>
> cassinilp1:~ # crash
>
> crash 4.0-3.14
> Copyright (C) 2002, 2003, 2004, 2005, 2006 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005 Fujitsu Limited
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
>
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "powerpc64-unknown-linux-gnu"...
>
> crash: numnodes out of sync with pgdat_list?
>
> =====
> The system configuration is as follows:
>
> Node 0 Memory:
> Node 1 Memory:
> Node 2 Memory:
> Node 3 Memory:
> Node 4 Memory: 0x0-0x180000000
>
> Node 0 CPUs: 0
> Node 1 CPUs:
> Node 2 CPUs:
> Node 3 CPUs:
> Node 4 CPUs: 1
> =====
> The failure comes from this consistency check in the dump_memory_nodes()
> function in memory.c:
>
>     if (n != vt->numnodes)
>         error(FATAL, "numnodes out of sync with pgdat_list?\n");
>
> The problem is the mismatch between node_online_map and the number of nodes
> found by traversing pgdat_list.
> The node_online_map bits are set differently in kernel versions 2.6.16 and 2.6.19.
> In the earlier version, every bit from bit 0 up to bit n, where n is the last
> node to which memory is assigned, is set to '1'. In the later version, a node
> is considered online if it has memory or CPUs allocated (or both).
>
> So I need your suggestion on how to go about fixing the problem.
> A few ideas I had:
> 1) If KERNEL_VERSION <= 2.6.16, increment vt->numnodes only if the bits of
>    node_online_map and cpu_online_map are set; if KERNEL_VERSION > 2.6.16,
>    use only node_online_map. (This would only partly solve the problem.)
> 2) Or, as in node_table_init(), raise the error only when CRASHDEBUG(2) is
>    set; otherwise, update vt->numnodes with 'n'.
>
> Please let me know your opinion.
> Regards
> Sharyathi Nagesh
>
>
Hi Sharyathi,
Thanks a lot for debugging this.
I prefer your idea (2), which, if it works OK in your case, will not break
any other currently working incarnations.
Also, just to clarify: when you say "Raise the error...", node_table_init()
only makes an "error(NOTE, ...)" call, so you would simply get a "NOTE: ..."
message displayed if CRASHDEBUG(2) is set, and the crash session would still
continue. That's also what we would want in this case, unlike the
session-ending "error(FATAL, ...)" error that you're seeing now.
Thanks,
Dave
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility