Thanks Dave,
As always we all appreciate the huge effort you put into this for us.
Will keep an eye out for this.
Laurence
----- Original Message -----
From: "Dave Anderson" <anderson(a)redhat.com>
To: "Discussion list for crash utility usage, maintenance and development"
<crash-utility(a)redhat.com>
Sent: Thursday, May 5, 2011 5:18:04 PM
Subject: [Crash-utility] HEAD'S UP -- problem with kernels built with gcc-4.6.0
This is a heads-up for those of you who are working with kernels
that were compiled with the new gcc-4.6.0.
I had thought that gcc-4.6.0 was painful only as far as compiling
the crash utility was concerned, where there were a bunch of new
"error: variable <variable> set but not used [-Werror=unused-but-set-variable]"
messages that I fixed in crash-5.1.2 and -5.1.3. And you may be aware that
those for-the-most-part useless warnings recently caused an LKML shitstorm
w/respect to building kernels.
But it's worse than that -- there is a problem with crash's embedded gdb
determining the member offsets of the (large) pglist_data structure if
the kernel was compiled with gcc-4.6.0. This is not specific to the
gdb-7.0 version that is built into crash; it happens with all gdb
versions as far as I can tell, certainly with gdb-7.2-48.el6
and gdb-7.2.50.20110328-31.fc15.
The problem is most clearly seen with "struct -o pglist_data", which
dumps the structure, showing the offset of each member.
For comparison, here is the output from a (good) 2.6.38-rc4 kernel
that was compiled with gcc-4.5.1:
crash> help -k | grep gcc_version
gcc_version: 4.5.1
crash> struct -o pglist_data
struct pglist_data {
[0x0] struct zone node_zones[4];
[0x1c00] struct zonelist node_zonelists[2];
[0x13e40] int nr_zones;
[0x13e44] spinlock_t node_size_lock;
[0x13e48] long unsigned int node_start_pfn;
[0x13e50] long unsigned int node_present_pages;
[0x13e58] long unsigned int node_spanned_pages;
[0x13e60] int node_id;
[0x13e68] wait_queue_head_t kswapd_wait;
[0x13e80] struct task_struct *kswapd;
[0x13e88] int kswapd_max_order;
[0x13e8c] enum zone_type classzone_idx;
}
SIZE: 0x13f00
crash>
While here is the output from a 2.6.38.2-9.fc15 kernel that
was compiled with gcc-4.6.0:
crash> help -k | grep gcc_version
gcc_version: 4.6.0
crash> struct -o pglist_data
struct pglist_data {
[0x0] struct zone node_zones[4];
[0x1c00] struct zonelist node_zonelists[2];
[0x0] int nr_zones;
[0x0] spinlock_t node_size_lock;
[0x0] long unsigned int node_start_pfn;
[0x0] long unsigned int node_present_pages;
[0x0] long unsigned int node_spanned_pages;
[0x0] int node_id;
[0x0] wait_queue_head_t kswapd_wait;
[0x0] struct task_struct *kswapd;
[0x0] int kswapd_max_order;
[0x0] enum zone_type classzone_idx;
}
SIZE: 0x13f00
crash>
It's interesting that it gets the size correct, but the member offset
values beyond the node_zonelists[] array are returned as 0.
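A quick way to spot this symptom in saved "struct -o" output is to look for members whose offset drops below that of an earlier member. The following is only a minimal sketch: the function names are hypothetical, it is not part of crash, and the heuristic would misfire on structures containing unions with nested structs, where a lower offset after a higher one can be legitimate.

```python
import re

# Matches member lines of crash "struct -o" output, e.g. "[0x13e40] int nr_zones;"
MEMBER_RE = re.compile(r"^\s*\[(0x[0-9a-fA-F]+)\]\s+(.*?);\s*$")

def parse_struct_offsets(text):
    """Return a list of (offset, declaration) pairs from "struct -o" output."""
    members = []
    for line in text.splitlines():
        m = MEMBER_RE.match(line)
        if m:
            members.append((int(m.group(1), 16), m.group(2)))
    return members

def suspicious_members(members):
    """Return declarations whose offset is lower than an earlier member's offset."""
    bad = []
    highest = 0
    for offset, decl in members:
        if offset < highest:
            bad.append(decl)
        highest = max(highest, offset)
    return bad

# Shortened excerpt of the gcc-4.6.0 output shown above.
sample = """\
struct pglist_data {
   [0x0] struct zone node_zones[4];
   [0x1c00] struct zonelist node_zonelists[2];
   [0x0] int nr_zones;
   [0x0] spinlock_t node_size_lock;
}
SIZE: 0x13f00
"""

print(suspicious_members(parse_struct_offsets(sample)))
```

Run against the gcc-4.5.1 output, the same check reports nothing, since the offsets there only increase.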
Taking the crash utility out of the picture, the problem can be seen
by simply running "gdb vmlinux".
For example, using the good kernel from the first example above:
$ gdb vmlinux-2.6.38-rc4
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/vmlinux-2.6.38-rc4...done.
(gdb) ptype struct pglist_data
type = struct pglist_data {
struct zone node_zones[4];
struct zonelist node_zonelists[2];
int nr_zones;
spinlock_t node_size_lock;
long unsigned int node_start_pfn;
long unsigned int node_present_pages;
long unsigned int node_spanned_pages;
int node_id;
wait_queue_head_t kswapd_wait;
struct task_struct *kswapd;
int kswapd_max_order;
enum zone_type classzone_idx;
}
(gdb) p &((struct pglist_data *)(0x0)).node_zonelists[0]
$1 = (struct zonelist *) 0x1c00
(gdb) p &((struct pglist_data *)(0x0)).nr_zones
$2 = (int *) 0x13e40
(gdb) p &((struct pglist_data *)(0x0)).node_size_lock
$3 = (spinlock_t *) 0x13e44
(gdb) p &((struct pglist_data *)(0x0)).node_start_pfn
$4 = (long unsigned int *) 0x13e48
(gdb) p &((struct pglist_data *)(0x0)).node_present_pages
$5 = (long unsigned int *) 0x13e50
(gdb) p &((struct pglist_data *)(0x0)).node_spanned_pages
$6 = (long unsigned int *) 0x13e58
(gdb) p &((struct pglist_data *)(0x0)).node_id
$7 = (int *) 0x13e60
(gdb) p &((struct pglist_data *)(0x0)).kswapd
$8 = (struct task_struct **) 0x13e80
(gdb) p &((struct pglist_data *)(0x0)).kswapd_max_order
$9 = (int *) 0x13e88
(gdb) p &((struct pglist_data *)(0x0)).classzone_idx
$10 = (enum zone_type *) 0x13e8c
(gdb)
And then with the kernel compiled with gcc-4.6.0:
# gdb vmlinux-2.6.38.2-9.fc15
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/vmlinux-2.6.38.2-9.fc15...done.
(gdb) ptype struct pglist_data
type = struct pglist_data {
struct zone node_zones[4];
struct zonelist node_zonelists[2];
int nr_zones;
spinlock_t node_size_lock;
long unsigned int node_start_pfn;
long unsigned int node_present_pages;
long unsigned int node_spanned_pages;
int node_id;
wait_queue_head_t kswapd_wait;
struct task_struct *kswapd;
int kswapd_max_order;
enum zone_type classzone_idx;
}
(gdb) p &((struct pglist_data *)(0x0)).node_zonelists[0]
$1 = (struct zonelist *) 0x1c00
(gdb) p &((struct pglist_data *)(0x0)).nr_zones
$2 = (int *) 0x0
(gdb) p &((struct pglist_data *)(0x0)).node_size_lock
$3 = (spinlock_t *) 0x0
(gdb) p &((struct pglist_data *)(0x0)).node_start_pfn
$4 = (long unsigned int *) 0x0
(gdb) p &((struct pglist_data *)(0x0)).node_present_pages
$5 = (long unsigned int *) 0x0
(gdb) p &((struct pglist_data *)(0x0)).node_spanned_pages
$6 = (long unsigned int *) 0x0
(gdb) p &((struct pglist_data *)(0x0)).node_id
$7 = (int *) 0x0
(gdb) p &((struct pglist_data *)(0x0)).kswapd_wait
$8 = (wait_queue_head_t *) 0x0
(gdb) p &((struct pglist_data *)(0x0)).kswapd
$9 = (struct task_struct **) 0x0
(gdb) p &((struct pglist_data *)(0x0)).kswapd_max_order
$10 = (int *) 0x0
(gdb) p &((struct pglist_data *)(0x0)).classzone_idx
$11 = (enum zone_type *) 0x0
(gdb)
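The two gdb sessions above use the usual null-pointer trick: taking the address of a member through a struct pointer cast from 0 yields that member's offset. For anyone who wants to compare such transcripts mechanically, here is a rough sketch. The helper names are hypothetical, and it assumes the transcript has exactly the "(gdb) p &(...)" and "$N = (type *) 0x..." line shapes seen above.

```python
import re

# "(gdb) p &((struct pglist_data *)(0x0)).nr_zones" -> captures "nr_zones"
CMD_RE = re.compile(r"^\(gdb\) p &\(\(struct \w+ \*\)\(0x0\)\)\.(\w+)")
# "$2 = (int *) 0x13e40" -> captures "0x13e40"
RESULT_RE = re.compile(r"^\$\d+ = \(.*\) (0x[0-9a-fA-F]+)$")

def offset_command(struct_name, member):
    """Build the gdb expression used in the sessions above for one member."""
    return "p &((struct %s *)(0x0)).%s" % (struct_name, member)

def parse_transcript(text):
    """Map member name -> offset by pairing each command with its result line."""
    offsets = {}
    member = None
    for line in text.splitlines():
        line = line.strip()
        cmd = CMD_RE.match(line)
        if cmd:
            member = cmd.group(1)
            continue
        res = RESULT_RE.match(line)
        if res and member is not None:
            offsets[member] = int(res.group(1), 16)
            member = None
    return offsets

def mismatches(good, bad):
    """Return the members whose offsets differ between two parsed transcripts."""
    return sorted(m for m in good if m in bad and good[m] != bad[m])

# Shortened excerpts of the two sessions above.
good_session = """\
(gdb) p &((struct pglist_data *)(0x0)).node_zonelists[0]
$1 = (struct zonelist *) 0x1c00
(gdb) p &((struct pglist_data *)(0x0)).nr_zones
$2 = (int *) 0x13e40
"""
bad_session = """\
(gdb) p &((struct pglist_data *)(0x0)).node_zonelists[0]
$1 = (struct zonelist *) 0x1c00
(gdb) p &((struct pglist_data *)(0x0)).nr_zones
$2 = (int *) 0x0
"""

print(mismatches(parse_transcript(good_session), parse_transcript(bad_session)))
```

Note that comparing offsets across two different kernel builds is only meaningful when the structure layout is expected to be the same, which appears to be the case for pglist_data here given the identical SIZE.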
Anyway, given that the pglist_data structure is crucial to the
crash utility, the bogus offset data leads to errors such as
the wrong MEMORY value shown here on a 4GB system:
crash> sys
KERNEL: vmlinux-2.6.38.2-9.fc15.gz
DUMPFILE: vmcore.compressed
CPUS: 12
DATE: Thu May 5 16:01:44 2011
UPTIME: 00:02:45
LOAD AVERAGE: 1.26, 0.51, 0.20
TASKS: 171
NODENAME: amd-toonie2-02.lab.bos.redhat.com
RELEASE: 2.6.38.2-9.fc15.x86_64
VERSION: #1 SMP Wed Mar 30 16:55:57 UTC 2011
MACHINE: x86_64 (2400 Mhz)
MEMORY: 680 KB
PANIC: ""
crash>
Bogus "kmem -n" node data gets output:
crash> kmem -n
NODE  SIZE    PGLIST_DATA     BOOTMEM_DATA     NODE_ZONES
 170   170  ffff88003ffec000      ----      ffff88003ffec000
                                            ffff88003ffec700
                                            ffff88003ffece00
                                            ffff88003ffed500
    MEM_MAP       START_PADDR  START_MAPNR
ffffea0000002530     aa000         170

ZONE  NAME       SIZE      MEM_MAP       START_PADDR  START_MAPNR
  0   DMA        4080  ffffea0000000380      10000         16
  1   DMA32    258048  ffffea0000038000    1000000       4096
  2   Normal        0                 0          0          0
  3   Movable       0                 0          0          0
...
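The bogus numbers are at least consistent with each other. Assuming 4 KB pages (the usual x86_64 setting, not stated in the output above), a node size of 170 pages works out to the 680 KB that "sys" reports, and 170 pages also corresponds to the aa000 START_PADDR. That suggests, but does not prove, that the same word is being read for several members. A small arithmetic check:

```python
PAGE_SIZE = 4096  # assumption: 4 KB base pages, the usual x86_64 setting

node_pages = 170  # the NODE SIZE value printed by "kmem -n" above

# Memory size implied by a 170-page node, in KB.
memory_kb = node_pages * PAGE_SIZE // 1024

# Physical address of page frame 170.
start_paddr = hex(node_pages * PAGE_SIZE)

# Page count a 4GB system would hold if every page were counted.
pages_in_4gb = 4 * 2**30 // PAGE_SIZE

print(memory_kb)      # 680
print(start_paddr)    # 0xaa000
print(pages_in_4gb)   # 1048576
```

So the reported node size is several orders of magnitude below what a 4GB system should show.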
And on a system configured with CONFIG_SLUB, "kmem -s" fails miserably:
crash> kmem -s
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
kmem: page_to_nid: cannot determine node for pages: ffffea0000cb2bc0
kmem: page_to_nid: cannot determine node for pages: ffffea0000cece18
ffff880037bd0300 UDPLITEv6 984 0 0 0 32k
kmem: page_to_nid: cannot determine node for pages: ffffea0000cc8640
ffff880037bd0100 tw_sock_TCPv6 280 0 0 0 8k
kmem: page_to_nid: cannot determine node for pages: ffffea0000cd70c0
kmem: page_to_nid: cannot determine node for pages: ffffea0000c6bb90
kmem: page_to_nid: cannot determine node for pages: ffffea0000ca5bf0
ffff880037a7f100 dm_raid1_read_record 1064 0 0 0 32k
ffff880037a7f000 kcopyd_job 368 0 0 0 8k
ffff880037a7ef00 dm_uevent 2608 0 0 0 32k
ffff880037a7ee00 dm_rq_target_io 400 0 0 0 8k
kmem: page_to_nid: cannot determine node for pages: ffffea0000c304e0
...
And there may be other problems that I'm not aware of that are associated
with the pglist_data data structure members specifically -- and perhaps with
other data structures as well?
I filed a bugzilla with gdb, although it may well be a bug in
the debuginfo data created by gcc-4.6.0. We'll see what happens...
In the meantime, I do have a workaround kludge for pglist_data members that
will be included in the upcoming crash-5.1.5 release.
Annoyed to no end,
Dave