Re: [Crash-utility] kmem: WARNING: cannot find mem_map page for address

Tuesday, 18 December 2012

----- Original Message -----
...
 Hi Dave,

 On 12/17/12 11:23, Dave Anderson wrote:
 >>> Right -- I would never expect error() to be called while inside
 >>> an open_tmpfile() operation.  Normally the behind-the-scenes data
 >>> is parsed, and if anything is to be displayed while open_tmpfile()
 >>> is still in play, it would be fprintf()'ed using pc->saved_fp.
 >>
 >> I think the aesthetically pleasing solution is an "i_am_playing_with_tmpfile()"
 >> call that says it isn't closed and crash functions shouldn't be using it.
 >> Plus a parallel "i_am_done_with_tmpfile()" that gets implied by "close_tmpfile()".
 >> I can supply a patch, if you like.  Probably with less verbose function names.
 > 
 > If pc->tmpfile is non-NULL, then open_tmpfile() is in use.  What would be
 > the purpose of the extra functions?

 It would be to allow the client code that is processing that temp file to emit
 warning/info messages without disrupting the reading of that file pointer.
 To me, that doesn't seem unreasonable.  You run some code that emits output
 to a temp file and you reprocess those data.  You surely do not want such
 messages showing up in the file you are re-processing.  And you cannot
 call close_tmpfile() because it calls ftruncate().

 So, what is your recommendation for how to reprocess diverted output
 wherein you might occasionally want to say something during that
 reprocessing?

 Three solutions come to mind:

 1. Juggle file pointers before and after the __error() function call
 (please say, "No.") 
No.

...
 2. Create my own temporary file and fiddle the global "fp" and "pc" state so it
     gets used while I am gathering data and crash code doesn't know about it later.
     (I insist the answer must be, "No." because there is too much fiddling with
     intricate crash state.) 
No.

...
 3. These two functions that I am suggesting:

 void
 resume_tmpfile(void)
 {
 	int ret ATTRIBUTE_UNUSED;

 	if (pc->tmpfile)
 		error(FATAL, "recursive temporary file usage\n");

 	if (!pc->tmp_fp)
 		error(FATAL, "temporary file not ready\n");

 	rewind(pc->tmp_fp);
 	pc->tmpfile = pc->tmp_fp;
 	pc->saved_fp = fp;
 	fp = pc->tmpfile;
 }

 void
 sequester_tmpfile(void)
 {
 	int ret ATTRIBUTE_UNUSED;

 	if (pc->tmpfile) {
 		fflush(pc->tmpfile);
 		rewind(pc->tmpfile);
 		pc->tmpfile = NULL;
 		fp = pc->saved_fp;
 	} else
 		error(FATAL, "trying to sequester an unopened temporary file\n");
 } 
And no...

When open_tmpfile() is in play and you want to print something, you can
always use fprintf(pc->saved_fp, ...) as is done everywhere now.
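
As a rough illustration of that pattern, here is a minimal, self-contained
sketch.  It is a toy model, not crash code: model_pc, model_fp,
model_open_tmpfile(), model_close_tmpfile() and model_reprocess() are
hypothetical stand-ins for pc, fp, open_tmpfile() and close_tmpfile(), and the
model simply fclose()s its capture file rather than reusing it.  The point it
shows is that messages written to the saved stream never land in the file
being re-read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for crash's program context; not the real definition. */
struct model_context {
	FILE *tmpfile;   /* non-NULL while a capture is active */
	FILE *saved_fp;  /* where output went before the capture began */
};

static struct model_context model_pc;
static FILE *model_fp;   /* plays the role of crash's global "fp" */

/* Begin capturing: remember the old stream, divert model_fp to a tmpfile. */
void model_open_tmpfile(FILE *user_stream)
{
	assert(model_pc.tmpfile == NULL);
	model_pc.tmpfile = tmpfile();
	assert(model_pc.tmpfile != NULL);
	model_pc.saved_fp = user_stream;
	model_fp = model_pc.tmpfile;
}

/* End capturing: drop the tmpfile and restore the previous stream. */
void model_close_tmpfile(void)
{
	fclose(model_pc.tmpfile);
	model_pc.tmpfile = NULL;
	model_fp = model_pc.saved_fp;
}

/*
 * Capture two lines, then re-read them.  Each message about a parsed
 * line goes to saved_fp, so it is not added to the file being read.
 * Returns the number of captured lines seen.
 */
int model_reprocess(FILE *user_stream)
{
	char line[128];
	int lines = 0;

	model_open_tmpfile(user_stream);
	fprintf(model_fp, "first\nsecond\n");   /* the diverted output */
	fflush(model_fp);
	rewind(model_pc.tmpfile);

	while (fgets(line, sizeof(line), model_pc.tmpfile)) {
		line[strcspn(line, "\n")] = '\0';
		lines++;
		fprintf(model_pc.saved_fp, "note: parsed \"%s\"\n", line);
	}

	model_close_tmpfile();
	return lines;
}
```

Calling model_reprocess(stdout) prints one note per captured line to the
terminal, while the capture file itself only ever holds the two data lines.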

That being said, if you truly desire to use error() during an open_tmpfile()
operation, then that anomaly should be handled in the error() function.

So, if error() is called during open_tmpfile(), then the message should
be displayed as it is done now, which is to pc->stdpipe (i.e., the current 
more/less scroller if it is in effect), or to stdout if not:

        if (pc->stdpipe) {
                fprintf(pc->stdpipe, "%s%s%s %s%s",
                        new_line ? "\n" : "",
                        type == CONT ? spacebuf : pc->curcmd,
                        type == CONT ? " " : ":",
                        type == WARNING ? "WARNING: " :
                        type == NOTE ? "NOTE: " : "",
                        buf);
                fflush(pc->stdpipe);
        } else {
                fprintf(stdout, "%s%s%s %s%s",
                        new_line || end_of_line ? "\n" : "",
                        type == WARNING ? "WARNING" :
                        type == NOTE ? "NOTE" :
                        type == CONT ? spacebuf : pc->curcmd,
                        type == CONT ? " " : ":",
                        buf, end_of_line ? "\n" : "");
                fflush(stdout);
        }

and if the output is currently being redirected to a file or to a pipe,
then it is also issued to those end-points here:

        if ((fp != stdout) && (fp != pc->stdpipe)) {
                fprintf(fp, "%s%s%s %s", new_line ? "\n" : "",
                        type == WARNING ? "WARNING" :
                        type == NOTE ? "NOTE" :
                        type == CONT ? spacebuf : pc->curcmd,
                        type == CONT ? " " : ":",
                        buf);
                fflush(fp);
        }

It's that "duplication" above that you're seeing.

And I am simply suggesting that the if statement above should be:

        if ((fp != stdout) && (fp != pc->stdpipe) && (fp != pc->tmpfile)) {

because you obviously don't want the message intermingled with your open_tmpfile()
output.
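
To make the effect of the extra test concrete, here is a small stand-alone
sketch that compares the two conditions.  It is a toy model only:
struct model_streams, echo_current() and echo_suggested() are hypothetical
names, and the struct just mirrors the three streams that the condition looks
at (fp, pc->stdpipe, pc->tmpfile).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the streams the condition inspects; not crash's real struct. */
struct model_streams {
	FILE *fp;       /* current output stream */
	FILE *stdpipe;  /* scroller pipe, or NULL if none */
	FILE *tmpfile;  /* active capture file, or NULL if none */
};

/* Condition as it stands: echo to fp unless it is stdout or the scroller. */
bool echo_current(const struct model_streams *s)
{
	return s->fp != stdout && s->fp != s->stdpipe;
}

/* Suggested condition: also skip the echo when fp is the capture file. */
bool echo_suggested(const struct model_streams *s)
{
	return s->fp != stdout && s->fp != s->stdpipe && s->fp != s->tmpfile;
}
```

With fp pointing at the capture file, echo_current() is true (the message is
written into the captured data) and echo_suggested() is false.  With fp
pointing at an ordinary redirection target, both remain true, so redirected
output still gets the message.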

...

 I sequester the file after doing the data gathering and resume it
 after I am done reprocessing it.  It might be worth putting in a little jig
 to ensure that open/close_tmpfile work reasonably, too.  (I would guess
 that either would cancel the sequestration.)

 >>> I'm not sure, other than it doesn't seem to be able to find ffffea001bb1d1e8
 >>
 >> I was able to figure that out.  I also printed out the "kmem -v" table and
 >> sorted the result.  The result with "kmem -n"
 >>
 >> [...]
 >> 66  ffff88087fffa420  ffffea0000000000  ffffea0007380000  2162688
 >> 67  ffff88087fffa430  ffffea0000000000  ffffea0007540000  2195456
 >> 132608  ffff88083c9bdb98  ffff88083c9bdd98  ffff8840e49bdd98  4345298944
 >> 132609  ffff88083c9bdba8  ffff88083c9796c0  ffff8840e4b396c0  4345331712
 >> [...]
 >>
 >> viz. it ain't there.  Which is quite interesting, because if the lustre
 >> cluster file system structure "cfs_trace_data" actually pointed off into
 >> unmapped memory, it would have fallen over long, long before the point
 >> where it did fall over.
 > 
 > I don't see the vmemmap range in the "kmem -v" output.  It is mapped
 > kernel memory, but AFAIK it's not kept in the kernel's "vmlist" list.
 > Do you see that range in your "kmem -v" output?

 Also no.  "kmem -v" and "kmem -n" both show the same memory mappings
 (as best as _my_ memory serves, that is.  For certain, neither has a mapping
 for 0xffffea001bb1d1e8.)

 > OK so you say you cannot get the mappings for it, but what
 > does "vtop 0xffffea001bb1d1e8" show?

 This:

 > crash> vtop 0xffffea001bb1d1e8
 > VIRTUAL           PHYSICAL
 > ffffea001bb1d1e8  879b1d1e8
 > 
 > PML4 DIRECTORY: ffffffff817e7000
 > PAGE DIRECTORY: 87fdf7067
 >    PUD: 87fdf7000 => 87fdf6067
 >    PMD: 87fdf66e8 => 8000000879a001e3
 >   PAGE: 879a00000  (2MB)
 > 
 >       PTE         PHYSICAL   FLAGS
 > 8000000879a001e3  879a00000  (PRESENT|RW|ACCESSED|DIRTY|PSE|GLOBAL|NX)

 But given:

 > Sorry -- that's irrelevant.  You want to access the physical
 > memory that the odd vmemmap page address references (not the
 > physical page behind the page structure itself).

 Exactly right.  I need to be able to see the binary bits for that page so I can
 pull them in and write them back out to a file of just those bits.  From there,
 we'll be formatting a text file showing the lustre trace log.

 Thank you so much!  Regards, Bruce 
Right... seems like it should be such a simple thing to do...   :-(

I don't understand what's going on, but I'm presuming that even if the
vmemmap-type address doesn't fit into the "advertised" vmemmap range, 
that the kernel's __page_to_pfn() macro should still work to get the
pfn represented by the page:

 #elif defined(CONFIG_SPARSEMEM)
 /*
  * Note: section's mem_map is encoded to reflect its start_pfn.
  * section[i].section_mem_map == mem_map's address - start_pfn;
  */
 #define __page_to_pfn(pg)                                       \
 ({      const struct page *__pg = (pg);                         \
         int __sec = page_to_section(__pg);                      \
         (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \
 })

Maybe you could play around with emulating that macro w/crash, and see what
comes up?
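
For what it's worth, the pointer subtraction in that macro can be emulated with
plain integer arithmetic, roughly as sketched below.  The helper names
emulate_page_to_pfn() and pfn_to_phys() are hypothetical, and the caller is
assumed to supply the section's section_mem_map value with its low flag bits
already stripped, plus sizeof(struct page) for the kernel in question.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Emulate the pointer subtraction in __page_to_pfn() with integers.
 *   page_addr:        address of the struct page
 *   coded_mem_map:    the section's section_mem_map, flag bits stripped
 *   page_struct_size: sizeof(struct page) on the target kernel
 * Integer division truncates, so a page_addr that is not a whole number
 * of structs past coded_mem_map is silently rounded down.
 */
uint64_t emulate_page_to_pfn(uint64_t page_addr, uint64_t coded_mem_map,
			     uint64_t page_struct_size)
{
	assert(page_struct_size != 0);
	assert(page_addr >= coded_mem_map);
	return (page_addr - coded_mem_map) / page_struct_size;
}

/* Convert a pfn to a physical address; page_shift is 12 for 4KB pages. */
uint64_t pfn_to_phys(uint64_t pfn, unsigned int page_shift)
{
	return pfn << page_shift;
}
```

As a sanity check against the "kmem -n" row for section 66 quoted earlier, and
assuming its columns are NR, SECTION, CODED_MEM_MAP, MEM_MAP and PFN with a
56-byte struct page: emulate_page_to_pfn(0xffffea0007380000,
0xffffea0000000000, 56) gives 2162688, which matches the PFN column of that
row.  The same call could then be tried with the odd page address and the
coded mem_map of whichever section it is believed to belong to.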

Dave
