On Thu, Jan 09, 2014 at 04:11:40PM -0500, Dave Anderson wrote:
> > > > > > Version 2
> > > > > > - Moves the warning to this point:
> > > > > > ...
> > > > > > This GDB was configured as
"x86_64-unknown-linux-gnu"...
> > > > > >
> > > > > > WARNING: All unused vmemmap page structures are
excluded from
> > > > > > this
> > > > > > dump.
> > > > > > This will cause failures of the kmem command
if it
> > > > > > attempts to
> > > > > > walk any list of free pages (with options -f
-F -i -s
> > > > > > and
> > > > > > -S).
> > > > > >
> > > > > > SYSTEM MAP: /boot/System.map-2.6.32-cpw
> > > > > > ...
> > > > > >
> > > > > > - Drop patch 2, which added warnings to individual kmem
options.
> > > > > >
> > > > > > - Feel free to change the wording of the warning.
> > > > > >
> > > > > > And this patch is contingent upon the acceptance of a
change to the
> > > > > > makedumpfile command.
> > > > > >
http://marc.info/?l=kexec&m=138853299130125&w=2
> > > > > >
> > > > > >
> > > > > > If makedumpfile excludes unused page structures it will
flag that
> > > > > > fact in the dump header.
> > > > > > (There are about 3.67 million pages full of page structures
for
> > > > > > every tera byte of system memory. The great bulk of
those
> > > > > > page structures are not needed.)
> > > > > > Their exclusion is a makedumpfile option.
> > > > > >
> > > > > > Crash will display a note during initialization if such
structures
> > > > > > have been excluded. Crash commands that walk page
freelists, for
> > > > > > example, will fail. So the note will help the user
understand why.
> > > > > >
> > > > > > Signed-off-by: Cliff Wickman <cpw(a)sgi.com>
> > > > > > ---
> > > > > > defs.h | 1 +
> > > > > > diskdump.c | 3 +++
> > > > > > diskdump.h | 1 +
> > > > > > main.c | 6 ++++++
> > > > > > 4 files changed, 11 insertions(+)
> > > > > >
> > > > > > Index: crash-7.0.4/diskdump.c
> > > > > >
===================================================================
> > > > > > --- crash-7.0.4.orig/diskdump.c
> > > > > > +++ crash-7.0.4/diskdump.c
> > > > > > @@ -749,6 +749,9 @@ restart:
> > > > > > dd->valid_pages[i]++;
> > > > > > }
> > > > > >
> > > > > > + if (header->status & DUMP_DH_EXCLUDED_VMEMMAP)
> > > > > > + pc->flags2 |= VMEXCLUDED;
> > > > > > +
> > > > > > return TRUE;
> > > > > >
> > > > > > err:
> > > > > > Index: crash-7.0.4/diskdump.h
> > > > > >
===================================================================
> > > > > > --- crash-7.0.4.orig/diskdump.h
> > > > > > +++ crash-7.0.4/diskdump.h
> > > > > > @@ -84,6 +84,7 @@ struct kdump_sub_header {
> > > > > > #define DUMP_DH_COMPRESSED_ZLIB 0x1 /* page is
compressed with
> > > > > > zlib
> > > > > > */
> > > > > > #define DUMP_DH_COMPRESSED_LZO 0x2 /* page is
compressed with
> > > > > > lzo
> > > > > > */
> > > > > > #define DUMP_DH_COMPRESSED_SNAPPY 0x4 /* page is
compressed with
> > > > > > snappy
> > > > > > */
> > > > > > +#define DUMP_DH_EXCLUDED_VMEMMAP 0x8 /* unused vmemmap
pages are
> > > > > > excluded */
> > > > > >
> > > > > > /* descriptor of each page for vmcore */
> > > > > > typedef struct page_desc {
> > > > > > Index: crash-7.0.4/defs.h
> > > > > >
===================================================================
> > > > > > --- crash-7.0.4.orig/defs.h
> > > > > > +++ crash-7.0.4/defs.h
> > > > > > @@ -505,6 +505,7 @@ struct program_context {
> > > > > > #define VMCOREINFO (0x400ULL)
> > > > > > #define ALLOW_FP (0x800ULL)
> > > > > > #define REM_PAUSED_F (0x1000ULL)
> > > > > > +#define VMEXCLUDED (0x2000ULL)
> > > > > > #define REMOTE_PAUSED() (pc->flags2 &
REM_PAUSED_F)
> > > > > > char *cleanup;
> > > > > > char *namelist_orig;
> > > > > > Index: crash-7.0.4/main.c
> > > > > >
===================================================================
> > > > > > --- crash-7.0.4.orig/main.c
> > > > > > +++ crash-7.0.4/main.c
> > > > > > @@ -662,6 +662,12 @@ main_loop(void)
> > > > > > } else
> > > > > > SIGACTION(SIGINT, restart, &pc->sigaction,
NULL);
> > > > > >
> > > > > > + if (pc->flags2 & VMEXCLUDED)
> > > > > > + fprintf(fp,
> > > > > > + "WARNING: All unused vmemmap page structures
are excluded
> > > > > > from
> > > > > > this dump.\n"
> > > > > > + " This will cause failures of the kmem
command if it
> > > > > > attempts
> > > > > > to\n"
> > > > > > + " walk any list of free pages (with options
-f -F -i -s and
> > > > > > -S).\n\n");
> > > > > > +
> > > > > > /*
> > > > > > * Display system statistics and current
context.
> > > > > > */
> > > > > >
> > > > >
> > > > > This patch looks reasonable.
> > > > >
> > > > > BTW, what happens if you enter "kmem <address>"
alone with no option?
> > > > > It should fail as well, no?
> > > >
> > > > Yes, it does.
> > > >
> > > > crash> kmem 133360
> > > > kmem: page excluded: kernel virtual address: ffffea0000004350 type:
> > > > "page.lru.next"
> > > >
> > > > crash> kmem ffff8897baae7540
> > > > CACHE NAME OBJSIZE ALLOCATED TOTAL
SLABS
> > > > SSIZE
> > > > ffff88c7bec70040 task_struct 2656 4578 5361
1787
> > > > 8k
> > > > SLAB MEMORY TOTAL ALLOCATED FREE
> > > > ffff8897baae6040 ffff8897baae6080 3 2 1
> > > > FREE / [ALLOCATED]
> > > > [ffff8897baae7540]
> > > >
> > > > kmem: page excluded: kernel virtual address: ffffea0000007028 type:
"first list entry"
> > > >
> > > > Another way to search a freelist, that I didn't notice.
> > > > Do you want to add that to the warning?
> > >
> > > I'm thinking that there might be others as well. Like how does
"kmem -p" handle
> > > the missing page structs?
> >
> > kmem -p gives the appearance of working, but is unable to read the
> > mem_map pages. It is giving false results.
> > This is an example of what you feared.
> >
> > If we do this:
> >
> > ---
> > memory.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: crash-7.0.4/memory.c
> > ===================================================================
> > --- crash-7.0.4.orig/memory.c
> > +++ crash-7.0.4/memory.c
> > @@ -5657,7 +5657,7 @@ fill_mem_map_cache(ulong pp, ulong ppend
> > * Try to read it in one fell swoop.
> > */
> > if (readmem(pp, KVADDR, page_cache, SIZE(page) * PGMM_CACHED,
> > - "page struct cache", RETURN_ON_ERROR|QUIET))
> > + "page struct cache", FAULT_ON_ERROR))
> > return;
> >
> > /*
> >
> > Then we see the immediate failure:
> >
> > crash> kmem -p
> > PAGE PHYSICAL MAPPING INDEX CNT FLAGS
> > kmem: page excluded: kernel virtual address: ffffea0000004000 type:
"page
> > struct cache"
> >
> > What should be done about that?
>
> I'd leave it alone -- at least it doesn't fail miserably like the other
commands.
>
> > I wonder why the reads of the mem_map were done with RETURN_ON_ERROR|QUIET
> > to begin with?
>
> It's done to guard against a vmemmap boundary (real page structs vs. unmapped
> page structs) that happens to occur within the 512-page-size chunk of memory
> that's being cached. So first it tries to read the full chunk in one
readmem(),
> but if that fails, it breaks up the read requests, zeroing out the unmapped pages.
>
> Dave
Okay. I'll just make issue a more general warning like:
WARNING: All unused vmemmap page structures are excluded from this dump.
This will cause failures in any command that accesses page
structures of pages that are not included in the dump. This
is particularly likely when using several options of the
kmem command.
If that looks reasonable to you.
Dave, the excluding vmemmap obviously affect a user's investigation,
is this really acceptable for you ?
There is no chance to re-capture the same dump image,
I think we should be more carefully about filtering out
since it's an irreversible change.
How many users want to get such a broken dump image even if
it could be gotten faster ?
Why we capture dump images, it's for analyzing, of course.
At least, we should supply alternatives to the affected commands.
Thanks
Atsushi Kumagai
-Cliff
>
> >
> > -Cliff
> >
> > >
> > > It might be better to just generally warn that results may be
unpredictable
> > > because any command that accesses page structures, such as kmem, may
fail.
> > > By listing specific options, it makes it sound like that they are the
only
> > > ones, so it may be better to leave it open-ended.
> > >
> > > Dave
> >
> > --
> > Cliff Wickman
> > SGI
> > cpw(a)sgi.com
> > (651) 683-3824
> >
--
Cliff Wickman
SGI
cpw(a)sgi.com
(651) 683-3824
--
Crash-utility mailing list
Crash-utility(a)redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility