On Wed, Jan 08, 2014 at 03:32:01PM -0500, Dave Anderson wrote:
----- Original Message -----
>
>
>
> I have proposed a patch to makedumpfile to (optionally) exclude from
> a dump the page structures representing excluded pages.
> The idea being that a system with terabytes of system memory has
> millions of pages of page structures. And most of them are unneeded.
>
> That patch thread begins here:
>
> http://marc.info/?l=kexec&m=138853299130125&w=2
>
> Dave [Anderson] raised these crash-related issues;
> Although I'm sure you tested this, I find it amazing that
> the "kmem -[fF]" option is the only command option
> that is affected?
> If I'm not mistaken, this would be the first time that legitimate
> kernel data would be excluded from the dump, and the user would
> have no obvious way of knowing that it had been done, correct?
> If it were encoded in the header somewhere, at least a
> warning message could be printed during crash initialization.
> ...
> Right, but look at all of the other page struct offsets in addition to
> page.lru that are used. The page.flags usage comes to mind, and for
> example, what would "kmem -p" display for the missing pages?
> Or "kmem <address>"? And would "kmem -i" display
> invalid data?
> I'm just speculating off the top of my head, but the page structure is
> such a fundamental data structure with several of its fields being used,
> just enter "help -o page_" to see all of its potential member usages.
>
> So I am submitting two patches for your consideration, should the patch
> to exclude unused vmemmap pages be taken into makedumpfile.
>
> - [PATCH 1/2] crash: initial note of excluded page structures
> This one makes crash startup look like this:
> This program has absolutely no warranty. Enter "help warranty" for
> details.
>
> NOTE: Unused vmemmap page structures are excluded from this dump.
> GNU gdb (GDB) 7.6
> Copyright (C) 2013 Free Software Foundation, Inc.
>
> - [PATCH 2/2] crash: kmem warnings for excluded page structures
> This patch modifies kmem options -f, -F, -s addr, -S addr, and -i.
> Those are the only options that I could detect looking for excluded pages.
>
> This patch applies on top of the first, and adds some warnings to the
> output of these kmem options. For example:
>
> crash> kmem -f
> Note: kmem -f may fail because unused page structures are excluded from
> this dump.
> NODE
> 0
> ZONE NAME SIZE FREE MEM_MAP START_PADDR START_MAPNR
> 0 DMA 4095 3934 ffffea0000000038 1000 0
> AREA SIZE FREE_AREA_STRUCT BLOCKS PAGES
> 0 4k ffff880000013068 2 2
> ...
>
> crash> kmem -i
> PAGES TOTAL PERCENTAGE
> TOTAL MEM 128008147 488.3 GB ----
> FREE 127599276 486.8 GB 99% of TOTAL MEM
> USED 408871 1.6 GB 0% of TOTAL MEM
> SHARED 11049 43.2 MB 0% of TOTAL MEM
> BUFFERS 5722 22.4 MB 0% of TOTAL MEM
> CACHED 44638 174.4 MB 0% of TOTAL MEM
> SLAB 62139 242.7 MB 0% of TOTAL MEM
>
> TOTAL SWAP 4893032 18.7 GB ----
> SWAP USED 0 0 0% of TOTAL SWAP
> SWAP FREE 4893032 18.7 GB 100% of TOTAL SWAP
>
> Note: 1970727 free pages not found (excluded); results are incomplete.
> Unused page structures are excluded from this dump.
>
> -Cliff Wickman
> cpw(a)sgi.com
Cliff,
Can you make this patch far simpler? I would prefer that an
error message *follows* the gdb banner, i.e., where a number of
current warnings get displayed. You seem to be showing it just
above the gdb banner, where it kind of gets lost.

Okay, I'll move the message.

Secondly, I'm not sure where/how you are determining the 1970727
pages that are excluded. Is it possible to put that in the
early warning message?

The 1970727 pages were counted by the second patch. That is, I
put a counter in the readmem path that counted 'excluded' errors.
So there were 1970727 such errors during the execution of
kmem -i.
It is probably not worth it to make a place for a new number
in the dump header. Knowing how many million pages were
excluded won't help if 'your' problem page was one of them.

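For readers following along, the counting approach described above (a counter in the read path that is bumped on every 'excluded' error) can be sketched roughly as follows. This is only an illustration under stated assumptions: read_page_struct(), scan_pages(), the READ_EXCLUDED status and the odd-PFN rule are all hypothetical stand-ins, not crash's actual readmem() interface or the code in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; crash's real readmem() differs. */
enum read_status { READ_OK = 0, READ_EXCLUDED = -1 };

/* Bumped once for every read that hits an excluded page structure. */
static unsigned long excluded_reads;

/*
 * Stand-in for the dump reader. For illustration only, odd page
 * frame numbers are treated as excluded from the dump.
 */
static enum read_status read_page_struct(unsigned long pfn, void *buf, size_t len)
{
    assert(buf != NULL);
    if (pfn & 1) {
        excluded_reads++;
        return READ_EXCLUDED;
    }
    memset(buf, 0, len);   /* pretend the page structure was read */
    return READ_OK;
}

/* Walk PFNs in [start, end) and return how many page structs were readable. */
static unsigned long scan_pages(unsigned long start, unsigned long end)
{
    char buf[64];
    unsigned long found = 0;

    for (unsigned long pfn = start; pfn < end; pfn++)
        if (read_page_struct(pfn, buf, sizeof buf) == READ_OK)
            found++;
    return found;
}
```

A caller would reset excluded_reads to zero before a command such as kmem -i runs, call the scan, and print a "results are incomplete" note afterwards if the counter is nonzero. Note that the count only reflects the reads that one command happened to attempt, not the total number of structures excluded from the dump.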
Thirdly, I'm not convinced that the kmem locations where you
are adding per-option warning messages are going to be the only
place where problems may arise. For that reason, I would
not even bother putting them in all those various locations,
but rather listing the "known" commands that may fail in the
early warning message. So do something like:

Drop the 2nd patch then. If you think it might be more misleading
than helpful, I'm fine with just the initialization warning.

$ crash vmlinux vmcore
crash 7.0.4
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: ominous warning message indicating (number) page structures have been
excluded, and that the kmem command will be affected...
KERNEL: /usr/lib/debug/lib/modules/3.9.10-100.fc17.x86_64/vmlinux
DUMPFILE: /dev/crash
CPUS: 4
DATE: Wed Jan 8 15:27:14 2014
UPTIME: 57 days, 05:38:32
...
Dave
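
The single initialization-time warning suggested above could look something like the sketch below. It assumes the dump header carries a flag bit saying that page structures were excluded; the struct dump_header layout, the DUMP_EXCLUDED_VMEMMAP bit and format_vmemmap_warning() are hypothetical names invented for this example, and the list of kmem options is the one named in the second patch, not an exhaustive list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag bit and header layout, for illustration only. */
#define DUMP_EXCLUDED_VMEMMAP 0x1UL

struct dump_header {
    unsigned long status;
};

/*
 * Fill 'out' with a warning if the header says page structures were
 * excluded. Returns 1 if a warning was written, 0 otherwise (in which
 * case 'out' is set to the empty string).
 */
static int format_vmemmap_warning(const struct dump_header *dh,
                                  char *out, size_t outlen)
{
    assert(dh != NULL && out != NULL && outlen > 0);

    if (!(dh->status & DUMP_EXCLUDED_VMEMMAP)) {
        out[0] = '\0';
        return 0;
    }
    snprintf(out, outlen,
             "WARNING: unused vmemmap page structures are excluded from this dump;\n"
             "         kmem -f, -F, -s, -S and -i may fail or be incomplete.\n");
    return 1;
}
```

The caller would print the returned text after the gdb banner, alongside the other startup warnings, so that it is not lost above the banner.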
--
Cliff Wickman
SGI
cpw(a)sgi.com
(651) 683-3824