Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SimpleCachingIteratorAggregate doesn't handle nested traversals #49

Closed
ghost opened this issue Dec 27, 2023 · 17 comments
Closed

SimpleCachingIteratorAggregate doesn't handle nested traversals #49

ghost opened this issue Dec 27, 2023 · 17 comments
Labels

Comments

@ghost
Copy link

ghost commented Dec 27, 2023

Steps required to reproduce the problem

This test shows the expected behavior and it fails on the last assert, since the actual result is 3 instead of 6.

final class SimpleCachingIteratorTest extends TestCase
{
    public function testSimpleCachingIteratorInnerTraversals(): void
    {
        $iter = new SimpleCachingIteratorAggregate(new ArrayIterator([1, 2, 3, 4, 5, 6]));

        $total = 0;

        foreach ($iter as $key => $item) {
            if ($key === 2) {
                $firstTotal = $this->traverseAll($iter);

                self::assertSame(6, $firstTotal);
            }

            ++$total;
        }

        self::assertSame(6, $total);
    }

    private function traverseAll(iterable $iter): int
    {
        $total = 0;

        foreach ($iter as $item) {
            ++$total;
        }

        return $total;
    }
}

Expected Result

It is expected that one may iterate through iterator until the end regardless of whether inner iterations happen or not,

Actual Result

Test fails on final self::assertSame(6, $total);, because outer loop iterated only over 3 items instead of 6.

@drupol
Copy link
Contributor

drupol commented Dec 27, 2023

Thanks for reporting the issue, I'll try to have a look tonight. Have you already made some checks to try to find where the issue comes from?

@ghost
Copy link
Author

ghost commented Dec 27, 2023

I guess the inner loop iterates through the same iterator until the end, thereby leaving outer loop without remaining results

@drupol
Copy link
Contributor

drupol commented Dec 28, 2023

A possible solution to this (but I don't like it) would be to consume the cache in the constructor, as such:

    public function __construct(Iterator $iterator, bool $cacheNow = false)
    {
        $this->iterator = new CachingIterator(
            $iterator,
            CachingIterator::FULL_CACHE
        );

        if ($cacheNow) {
            iterator_to_array($this->iterator);
        }
    }

Could you let me know if this would fix the issue for you?

Eventually, we could add a method in the class:

    public function consume(): void {
        iterator_to_array($this->iterator);
    }

Thx

@ghost
Copy link
Author

ghost commented Dec 28, 2023

Yes, though if cached right away we'll lose lazy nature of given collection. For my specific case I can't really use it like this. Anyway, adding $cacheNow flag doesn't seem to be a robust solution, but rather some obscure appendix.

Maybe it would make sense to keep track of iterated items in the getIterator method and yield remaining not yet iterated elements. I'll try to think up a solution

@rela589n
Copy link

rela589n commented Dec 28, 2023

Hi there!

Actually I have found the solution. Could you check it out?

The solution comes to keeping iteration map in the stack trace so that cached state may be quickly compared with items having already been iterated over. Anything extra found in the cache is the result of side effects coming from the inner loop. These values must be yielded as well.

    /** @return Generator<array-key|TKey, T> */
    public function getIterator(): Generator
    {
        $iterationMap = [];
        $initiallyCachedItems = $this->iterator->getCache();

        yield from $initiallyCachedItems;

        if ($this->iterator->hasNext()) {
            $iterationMap = array_fill_keys(array_keys($initiallyCachedItems), true);
        }

        while ($this->iterator->hasNext()) {
            $this->iterator->next();

            $iterationMap[$key = $this->iterator->key()] = true;

            yield $key => $this->iterator->current();

            $latestCachedItems = $this->iterator->getCache();

            if (count($latestCachedItems) === count($iterationMap)) {
                // No extra items were added to the cache since last yield iteration

                continue;
            }

            // The cache pointer has shifted since last iteration (code inside loop traversed part of cache iterator),
            // hence these items must be yielded as well

            $extraCachedItems = array_diff_key($latestCachedItems, $iterationMap);

            $iterationMap += array_fill_keys(array_keys($extraCachedItems), true);

            yield from $extraCachedItems;
        }
    }

Since array_diff is expensive operation, it should not to be executed on every iteration step. The code without this count($cachedItems) === count($iterationMap) condition will surely work correctly, though in basic case (no inner loop side-effects) if it is omitted, performance will significantly decrease (big O(n^2)). Thus, there's an early continue statement in case if cached items count has never changed. Hence, current approach still maintains big O(n) for the basic case, O(n) for average case, and O (n^2) for the worst case - when inner loop breaks after each next item of the outer loop (though, having such inner loop executed for each outer iteration is already O (n^2), therefore I guess it is fine since final time complexity is unchanged).

Also, regarding memory usage, no extra memory is allocated once iteration has completed due to if ($this->iterator->hasNext() statement.

Thank you for your time and have a nice evening.

@rela589n
Copy link

Have you had a chance to check it out?
Should I submit a pull request?

@drupol
Copy link
Contributor

drupol commented Dec 30, 2023

Hello there,

I don't have time to test and give some feedback. Don't worry, as soon as I'll get back on a computer I will check it out.

By the way, I wrote you an email early December, did you get it?

@rela589n
Copy link

By the way, I wrote you an email early December, did you get it?

I'm not sure regarding email, since I don't see anything in my mailbox. BTW, where could've you get my address?
Or are you talking regarding loophp/collection#314 discussion?

Anyway, sorry for disappearing from my side.

@drupol
Copy link
Contributor

drupol commented Dec 30, 2023

By the way, I wrote you an email early December, did you get it?

I'm not sure regarding email, since I don't see anything in my mailbox. BTW, where could've you get my address? Or are you talking regarding loophp/collection#314 discussion?

Anyway, sorry for disappearing from my side.

I've emailed you on the 6th of December, at 9.34, on the address: z...k@g....com

Basically, I was worried to not have any more feedback, I was wondering if it was me who did something wrong.

@drupol
Copy link
Contributor

drupol commented Dec 30, 2023

Hello,

I quickly checked your proposal, but I'm afraid this is not going to make it.

  • $iterationMap[$key = $this->iterator->key()] = true; this is going to fail as soon as keys are different from string|int.
  • At each while-loop the count function is used twice... this is going to be a nightmare, performance wise.
  • Usage of array_diff_key, array_fill_keys, array_keys... also not a good idea performance wise.

In overall, I have the feeling that this proposal is going to fix an issue that pretty much nobody will ever encounter. IMHO, the best option we have here is to update the documentation and mention that it doesn't work with nested traversals, unless we use the method I proposed in #49 (comment)

@rela589n
Copy link

Thanks for your response!

$iterationMap[$key = $this->iterator->key()] = true; this is going to fail as soon as keys are different from string|int.

Isn't purpose of SimpleCachingIteratorAggregate to be used only with int|string keys, since CachingIterator::getCache returns an array, and keys of an array can only be int|string?

@drupol
Copy link
Contributor

drupol commented Jan 2, 2024

Isn't purpose of SimpleCachingIteratorAggregate to be used only with int|string keys, since CachingIterator::getCache returns an array, and keys of an array can only be int|string?

Absolutely, you're correct. The SimpleCachingIteratorAggregate is indeed intended primarily for use with int|string keys, given that CachingIterator::getCache yields an array which inherently supports only these key types.

My experience in designing iterators for a broad spectrum of types momentarily diverted my attention. Thank you for pointing out the oversight in my previous response.

@rela589n
Copy link

rela589n commented Jan 6, 2024

Regarding the usage of array_diff_key, array_fill_keys, array_keys, I guess there may be an alternative solution that requires no additional memory. It was not obvious at the outset, but it floated up after your previous response.

Instead of keeping track of all previously iterated items, it is possible to keep track of last yielded key only (since keys of array are unique). Hence, if last yielded key is not the same as last key in cached items array, inner iterator has already been iterated over and remaining items have to be yielded from the cache.

Please, let me know your thoughts on this approach.

@drupol
Copy link
Contributor

drupol commented Jan 6, 2024

Keeping track of the last item is definitely a much better solution... going from $\mathcal{O}(n)$ to $\mathcal{O}(1)$, of course !!

Copy link

Since this issue has not had any activity within the last 5 days, I have marked it as stale.
I will close it if no further activity occurs within the next 5 days.

@rela589n
Copy link

Hi there again. Could you check out #50 ?

@github-actions github-actions bot removed the stale label Jan 15, 2024
Copy link

Since this issue has not had any activity within the last 5 days, I have marked it as stale.
I will close it if no further activity occurs within the next 5 days.

@github-actions github-actions bot added the stale label Jan 21, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants