New subject: RFC: string search for crash

Wednesday, 2 February 2011

----- Original Message -----
...
 On Wed, 2011-02-02 at 14:49 +0000, Dave Anderson wrote:
 >
 > ----- Original Message -----
 > > Here is a patch to add string search to crash-5.1.1.
 > >
 > > It requires the my previous patch for the parse_line routine.
 > >
 > > It searches for the specified strings, and for half or more of the
 > > string appearing at the start and end of search blocks (usually pages)
 > > in case the string spans a page boundary.
 > >

 > First, this is a really nifty feature...

 Thanks.

 > Second, I appreciate that you created a new string_search() function
 > instead of attempting to merge it into the existing search()
 > function.

 Way too ugly otherwise.

 >
 > Third, given that you're searching for specific string, why not show
 > the actual byte-aligned address as the starting point, and then just
 > display the string contents until the next non-ASCII character, or some
 > other delimiter like a 48 byte limit or whatever?

 That's actually the easier thing to do and just a preference decision.
 I like looking at aligned addresses for using with subsequent rd
 commands. And it's also sometimes useful to see some pre-string context.
 But rounding to the previous long aligned address is really random with
 respect to pre-string context. There's probably a better way to provide
 that. 
Actually, rounding back down to the previous long-aligned address might
be reasonable.

...

 > I understand that the page-crossing issue is a PITA, but with respect
 > to a user searching memory for strings, the page-crossing issue is
 > seemingly irrelevant. In other words, why this:
 >
 > > ffff880125f72000: ger_event.......................................
 >
 > instead of displaying something like this:
 >
 >   ffff880125f71ffd: danger_event
 >
 > So I guess I'm just wondering why the "dan" at the end of the page
 > cannot be displayed? It just doesn't seem user-friendly to force
 > the user to understand the half-of-the-string-at-a-page-boundary
 > business. Since you do recognize that string exists and crosses
 > a page boundary, shouldn't you be able to display the first part?

 I started working on my search stuff directly on dumpfiles, which meant
 I was searching through a collection of physical pages. When searching
 physical pages, there is no reason to believe that the end of page N has
 anything logically to do with the beginning of page N+1. So looking
 across to the next page for the continuation of a string didn't seem
 strictly correct or useful. (This was never even a consideration when
 searching for a long or int; they don't cross page boundaries.) 
Right, certainly it doesn't make sense with physical pages, and to a
somewhat lesser extent, with unity-mapped pages.  But with user virtual,
the mapped kernel virtual region (x86_64 and ia64), and vmalloc addresses,
strings would cross page boundaries.  So I wonder whether it's worth the
effort to do something like I suggest since you always know what kind of
memory is being searched?  I know, I know, it's easy for *me* to say...  

But that leads to another issue.  As I mentioned before, the -p
implementation I've got uses its own function, so to marry that
with the string search capability kind of mucks with your patch.
I'm wondering whether your string search function could be called
from both the virtual and physical search functions near the bottom
where the page is searched?  Just something to think about. 

...

 If I'm searching by virtual address, then N+1 does follow N, but I don't
 know that at the time I'm working on page N. The current search
 strategy is "choose a page, load the page, search the page". So if I
 find a promising first part of a string at the end of a page, the search
 loop would need a memory to carry my position in the string(s) across to
 the next page, if I can determine that the next page follows
 contiguously in the current search type. Not impossible, but not
 trivial, I think. Possibly easier to pre-read the following page to the
 length of the longest search string onto the end of an extended page
 buffer and then search for the full string to the end of that extended
 page buffer before loading the next page (and part of the one after
 that). But you still have to deal with the possiblity that the "next"
 page isn't contiguous to the one you're working on. 
Right, perhaps the "last-page-contents-and-its-address" could always be saved?
And you could always attempt to read the next page if you see the beginning
of a search string.

...

 When you called out the example above, a first-of-page hit on
 "ger_event" when searching for the string "danger", my first thought
was
 that you had found a bug. The "dan" at the end of the previous page
 should also have met the criteria for half of the string "danger", and
 showed up as an end-of-page hit. But (whew!), here's what's in memory
 there:

 crash-5.1.1str> rd ffff880125f71ff0 8
 ffff880125f71ff0: 000000000ed7ab73 676972745f64656c s.......led_trig
 ffff880125f72000: 6e6576655f726567 0000000000000074 ger_event.......
 ffff880125f72010: 0000000000000000 0000000000000000 ................
 ffff880125f72020: 0000000000000000 0000000000000000 ................

 Which serves as a reminder that the "part of a string" match is just
 there to suggest possibilities that might turn out to be false
 positives.

 > Lastly, I'm still planning to add the remaining "search" command
 > updates for the -KVM flags and the "missing" x86_64 segment bug,
 > and so I may need you to re-work this patch to fit into an updated
 > version of search(). I've been delayed getting crash-5.1.2 out
 > the door because of all of the recent patch postings, and it may
 > be worth waiting to get this piece in until 5.1.3 -- is that OK
 > with you?

 No rush. 
OK good.

Thanks,
  Dave

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [Crash-utility] RFC: string search for crash