Hello Everyone,
I am analysing a kernel crash dump (vmcore) captured from a RHEL-5
kernel (2.6.18-371.4.1.el5) and found that the value of the
"NR_WRITEBACK" counter is negative (-126).
$ rpm -q crash
crash-7.0.6-2.el6.x86_64
crash> sys | grep -e RELEASE -e MACHINE -e MEMORY
RELEASE: 2.6.18-371.4.1.el5
MACHINE: x86_64 (3000 Mhz)
MEMORY: 31.5 GB
crash> kmem -z | grep -e ZONE -e NR_WRITEBACK
NODE: 0 ZONE: 0 ADDR: ffff810000032000 NAME: "DMA"
NR_WRITEBACK: 0
NODE: 0 ZONE: 1 ADDR: ffff810000032b00 NAME: "DMA32"
NR_WRITEBACK: 0
NODE: 0 ZONE: 2 ADDR: ffff810000033600 NAME: "Normal"
NR_WRITEBACK: -126 <<<<
NODE: 0 ZONE: 3 ADDR: ffff810000034100 NAME: "HighMem"
crash> kmem -V | grep -e NR_WRITEBACK
NR_WRITEBACK: -126 <<<<
crash> vm_stat
vm_stat = $1 =
{{
counter = 1106459
}, {
counter = 2940354
}, {
counter = 6341366
}, {
counter = 301750
}, {
counter = 245858
}, {
counter = 438
}, {
counter = -126 // NR_WRITEBACK <<<<
}, {
counter = 0
}, {
counter = 0
}, {
counter = 19687071384
}, {
counter = 0
}, {
counter = 0
}, {
counter = 29247123
}, {
counter = 19687071384
}, {
counter = 0
}}
Since we're running a 64-bit kernel and the counters are signed longs,
a counter overflow is very unlikely. I would appreciate pointers and
suggestions on how to determine the *cause* of the negative counter
from the vmcore.
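Additional information: the following commit removed the check that
clamped negative nr_writeback values in node_read_meminfo():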
$ git show ce866b34ae1b7f1ce60234cf65855886ac7e7d30
[..]
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 6fed520..a7b3dcb 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -49,9 +49,6 @@ static ssize_t node_read_meminfo(struct sys_device * dev, char * buf)
get_page_state_node(&ps, nid);
__get_zone_counts(&active, &inactive, &free, NODE_DATA(nid));
- /* Check for negative values in these approximate counters */
- if ((long)ps.nr_writeback < 0)
- ps.nr_writeback = 0;
n = sprintf(buf, "\n"
"Node %d MemTotal: %8lu kB\n"
[..]
Thank you!
--
BKS