On 06/29/2018 11:29 AM, David Mair wrote:
On 06/26/2018 08:34 AM, Jeremy Harris wrote:
> On 06/26/2018 03:29 PM, David Wysochanski wrote:
>> On Tue, 2018-06-26 at 09:21 -0400, Dave Anderson wrote:
>>> Yes, by default all list entries encountered are put in the built-in
>>> hash queue, specifically for the purpose of determining whether there
>>> are duplicate entries. So if it's still running, it hasn't found any.
>>>
>>> To avoid the use of the hashing feature, try entering "set hash off"
>>> before kicking off the command. But of course if it finds any, it
>>> will loop forever.
>>>
>>
>> Ah ok yeah I forgot about the built-in list loop detection!
>
> For a storage-less method of list loop-detection: run two walkers
> down the list, advancing two versus one elements. If you ever
> match the same element location after starting, you have a loop.
I experimented with this idea in userspace C++ yesterday to explore the
effect of different step sizes and ratios.
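For concreteness, a minimal sketch of the harness idea, assuming a trivial
singly linked node type (the names and details here are illustrative rather
than the actual test code):

    #include <cstddef>

    struct Node {
        Node *next = nullptr;
    };

    struct Result {
        bool loopFound;
        std::size_t steps;   // node advances by both walkers combined
    };

    // Two walkers start at the head.  Each iteration the long walker
    // advances longStep nodes and the short walker advances shortStep
    // nodes (longStep > shortStep >= 1).  The comparison is made after
    // every single node advance of the long walker, so a meeting inside
    // the loop cannot be stepped over.
    Result detect_loop(const Node *head,
                       std::size_t shortStep, std::size_t longStep)
    {
        const Node *shortWalker = head;
        const Node *longWalker = head;
        std::size_t steps = 0;

        while (longWalker) {
            for (std::size_t i = 0; i < longStep && longWalker; ++i) {
                longWalker = longWalker->next;
                ++steps;
                if (longWalker && longWalker == shortWalker)
                    return {true, steps};    // caught the short walker: loop
            }
            for (std::size_t i = 0; i < shortStep && shortWalker; ++i) {
                shortWalker = shortWalker->next;
                ++steps;
            }
        }
        return {false, steps};               // fell off the end: no loop
    }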
...
The difference comes from how much wasted work the long step walker does
inside the loop before there is a short step walker for it to find. You can
try other values, but they pan out in the same directions: the overhead to
reach the point where a loop is even detectable is lowest when the long step
and the short step differ by very little, i.e. when the ratio of long step to
short step is as close to 1 as possible, and when the short step is small
enough that it is unlikely to be larger than the left margin (the stretch of
nodes before the loop starts, which you can't know until you have found the
loop). Small but similar values for both are therefore the best compromise.
Taking a million-node list with a left margin of five hundred thousand nodes,
the overheads for a few ratios of long step to short step come out as:
  Ratio (long step : short step)    Overhead (total steps, both walkers)
   2:1                              1.500M
   4:1                              2.500M
  10:1                              5.500M
   3:2                              1.250M
   5:4                              1.125M
  99:98                             1.005M

Both walkers have to cover the left margin, so the total for the two walkers
can never be less than 1M for the described list.
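As a quick cross-check on those figures (a sketch only; the function name and
the use of doubles are my choices, not anything from the actual test code):

    // Node advances made by the two walkers together before the short
    // step walker first enters the loop, i.e. before a match is even
    // possible (ignoring rounding): left margin * (1 + long/short).
    double steps_until_detectable(double leftMargin,
                                  double longStep, double shortStep)
    {
        return leftMargin * (1.0 + longStep / shortStep);
    }

    // steps_until_detectable(500e3,  2,  1) ->  1.5e6
    // steps_until_detectable(500e3, 10,  1) ->  5.5e6
    // steps_until_detectable(500e3, 99, 98) -> ~1.005e6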
...
...or to put it another way: the wasted steps spent by the long step walker
iterating the loop while the short step walker has yet to reach the loop come
to

    left margin * ((long step - short step) / short step)

For the same ratios in the same case, the long step walker's wasted effort in
the loop becomes:
  Ratio (long step : short step)    Overhead (additional steps by the long walker)
   2:1                              500k
   4:1                              1.50M
  10:1                              4.50M
   3:2                              250k
   5:4                              125k
  99:98                             ~5102
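The two tables are really two views of the same quantity: subtract the
unavoidable 1M (both walkers covering the left margin) from the first table
and you get the second. In the same illustrative style as the helper above:

    // Extra node advances the long step walker makes inside the loop
    // while the short step walker is still in the left margin:
    //   left margin * (long - short) / short
    // which equals steps_until_detectable(...) minus 2 * left margin.
    double wasted_long_walker_steps(double leftMargin,
                                    double longStep, double shortStep)
    {
        return leftMargin * (longStep - shortStep) / shortStep;
    }

    // wasted_long_walker_steps(500e3, 10,  1) -> 4.5e6
    // wasted_long_walker_steps(500e3, 99, 98) -> ~5102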
Then, once both walkers are within the loop, the long step walker is
guaranteed to walk the loop at least once; it was already stepping around the
loop before the short step walker could reach it. That means the "additional
steps by the long walker" figures above also count as saved effort whenever
the loop is longer than they are, because they are progress toward the loop
traversal the long step walker has to complete before it can find the short
step walker. So for very long loops, very high ratios of long step to short
step, with a very small short step, find the match fastest once both walkers
are inside the loop. For very short loops it probably doesn't make much
difference, but if the match is only being checked for by one of the walkers
then it is probably best that it be the long step walker, and that the short
step be as close as possible to one (this minimizes how far into the loop, or
how many iterations of the loop, the short step walker can be when it is
found). Since the size of the loop isn't guaranteed to be long or short for
any given case, the optimal approach is probably to favor the smallest
possible short step and a long step one greater than that.
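In terms of the sketch near the top of this mail (detect_loop is my
illustrative name, not the real code), that conclusion is just the classic
tortoise-and-hare stepping:

    // smallest short step, long step one greater
    auto [found, steps] = detect_loop(head, /*shortStep=*/1, /*longStep=*/2);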
--
David.