----- Original Message -----
> > > On Tue, 2018-06-26 at 11:59 -0400, Dave Anderson wrote:
> > > >
> > > > ----- Original Message -----
> > > > > On Tue, 2018-06-26 at 11:27 -0400, Dave Anderson wrote:
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > On Tue, 2018-06-26 at 15:34 +0100, Jeremy Harris wrote:
> > > > > > > > On 06/26/2018 03:29 PM, David Wysochanski wrote:
> > > > > > > > > On Tue, 2018-06-26 at 09:21 -0400, Dave Anderson wrote:
> > > > > > > > > > Yes, by default all list entries encountered are put in the
> > > > > > > > > > built-in hash queue, specifically for the purpose of determining
> > > > > > > > > > whether there are duplicate entries.  So if it's still running,
> > > > > > > > > > it hasn't found any.
> > > > > > > > > >
> > > > > > > > > > To avoid the use of the hashing feature, try entering "set hash off"
> > > > > > > > > > before kicking off the command.  But of course if it finds any, it
> > > > > > > > > > will loop forever.
> > > > > > > > > >
> > > > > > > > > Ah ok yeah I forgot about the built-in list loop detection!
> > > > > > > >
> > > > > > > > For a storage-less method of list loop-detection: run two walkers
> > > > > > > > down the list, advancing two versus one elements.  If you ever
> > > > > > > > match the same element location after starting, you have a loop.
> > > > > > >
> > > > > > > I agree some algorithm [1] without a hash table may be better,
> > > > > > > especially for larger lists.
> > > > > >
> > > > > > I'll await your patch...
> > > > > >
> > > > >
> > > > > Do you see any advantage to keeping the hash table for loop detection,
> > > > > or would you accept a patch that removes it completely in favor of
> > > > > another algorithm?
> > > >
> > > > For maintenance sake, it's probably worth keeping the hash queue option
> > > > in place, primarily since there are a few dozen other internal facilities
> > > > besides the user "list" command that use the do_list() function.
> > > >
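As an aside on Jeremy's two-walker suggestion above: it is the classic Floyd cycle-detection ("tortoise and hare") algorithm.  A minimal user-space sketch on a simple singly linked list (the struct node type and names here are illustrative, not crash's internal list handling, which follows next-pointer offsets inside dump memory):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative node type, not crash's internal representation */
struct node {
    struct node *next;
};

/* Floyd cycle detection: the "hare" advances two links for every one
 * the "tortoise" takes.  O(1) memory, unlike a hash queue of visited
 * entries; if the two walkers ever meet, the list contains a loop. */
static bool list_has_loop(const struct node *head)
{
    const struct node *tortoise = head;
    const struct node *hare = head;

    while (hare && hare->next) {
        tortoise = tortoise->next;       /* one step  */
        hare = hare->next->next;         /* two steps */
        if (tortoise == hare)
            return true;                 /* walkers met: loop */
    }
    return false;                        /* hare reached NULL: no loop */
}
```

One trade-off versus the hash queue: Floyd only reports that a loop exists, not which entry was the first duplicate, so the hash-based bookkeeping still has value where callers want that detail.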
> > > Ok.  I was actually thinking about replacing all the callers of
> > > hq_enter() with a new algorithm.  There are 32 of them, so it would be
> > > a non-trivial patch, though in the longer term it may mean simpler
> > > maintenance.  It probably matters more now that vmcores of hundreds of
> > > GB, with correspondingly larger lists, are common, though for some of
> > > the callers a replacement may not be useful.
> > >
> > > I haven't read far enough to tell whether there are instances where
> > > dumping out the hash table would be more valuable, or whether it is
> > > used solely for loop detection.
> > >
> > > Also, it's unclear to me that entries would be evenly distributed into
> > > buckets.  HQ_INDEX() does little more than a simple mod, so it's
> > > possible the slowdown I saw was due to uneven distribution by the hash
> > > function, and we could improve that there.  Still, it would be good to
> > > have an option with no memory usage for larger lists.
> >
> > Could be.  That's what the "crash --hash <count>" command line option
> > may alleviate.  But remember that we changed the default number of
> > hash queue heads to 32768.
>
> FWIW, the memory usage does not seem to be freed, or at least I could
> not find it yet.
Yep, you're right. I was thinking that the hq_close() done by restore_sanity()
would free the memory even if you end the command with Ctrl-c, but it does
keep it around for the next user.
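The keep-it-around behavior can be sketched generically like this; hq_open_sketch()/hq_close_sketch() are invented names for illustration, not crash's actual hq_open()/hq_close() implementations:

```c
#include <stdlib.h>

/* Illustrative hash-queue state that persists across commands */
struct hashq_sketch {
    unsigned long *entries;   /* allocation retained between commands */
    size_t capacity;
    size_t count;
};

static struct hashq_sketch hq;

/* Allocate on first use; later opens reuse the existing buffer */
static int hq_open_sketch(size_t capacity)
{
    if (hq.entries == NULL) {
        hq.entries = malloc(capacity * sizeof(*hq.entries));
        if (hq.entries == NULL)
            return -1;
        hq.capacity = capacity;
    }
    hq.count = 0;
    return 0;
}

/* "Close" only resets the count; the memory is NOT freed, so the
 * next command finds the buffer already in place */
static void hq_close_sketch(void)
{
    hq.count = 0;
}
```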
Dave