Chains, Buckets, and Iteration

Written by Matt Tolman

Published: Sep. 20, 2022

Estimated reading time: 9 min read

Back in high school, I worked on robots with a few other students. When we built our robots, we had motors and wheels that were a foot or two apart. To transfer the energy from our motor to our wheels, we used chains.

Diagram of a robot where the wheel and morotr are separated by a large distance and connected by a chain.

As the motor spun, it iterated through the links in the chain sequentially. The motor never skipped a link in the chain, it always itereated in order.

When we wanted the robot to go backwards, the motor whould change its direction of iteration. Instead of going clockwise through the chain, it would go counter-clockwise. Even then, it never skipped a link.

When working on our robots, we often had buckets with parts or other odds and ends.

Whenever we needed a part, we'd go to a bucket and see if it was there. Usually, we knew which bucket we needed (e.g.g fourth one from the right) and we'd go straight to that bucket. In those cases, we didn't need to search around.

When we did search, we had a lot of flexibility. There isn't an inherit property of buckets which restricts how we could iterate through them. We could (and did) start our search anywhere, skip a few buckets we didn't think had it, go back when we realized no other bucket had it, etc.

In contrast, due to the nature of how sprockets interact with a chain of links, there is a restriction on how the motor can iterate through the chain. It can gto forwards or backwards, but it must go in order. It cannot skip parts of the chain.

In computer science terminology, the buckets are like an array and the chain is like a doubly linked list. With buckets (and arrays), you can go forwards and backwards, but you can also jump around. Meanwhile, with the linked list you can only go forwards (and sometimes backwards). There is no arbitrary jumping.

Why does it matter if something is array-like or list-like? Because it dictates how code can be written and how well code performs.

These types of constraints appear in not only data structures, but also in abstractions around data structures. For example, consider JavaScript's iterator. It is an abstraction around an underlying data source.

JavaScript iterators have a "next" method which returns both the next item in the iterator and whether the iteratir so finished[1]. While on the surface this seems perfectly fine, it does have an issue. If I want to get the 5^th item from an iterator, I have to get the 1^st item, and then the 2^nd item, and then the 3^rd item, and then the 4^th item, and then finally I can get the 5^th item. In short, I have to go sequentially through the iterator to get the item that I wanted. This is like the chain where I have to go through sequentially instead of jumping around.

The lack of jumping around can cause issues when developing libraries (speaking from experience[2]). The inability to jump around can cause issues with performance on large collections. If a program needs the 1,200,000^th item, why wast time iterating through 1,199,999 items? Arrays allow users (including other programs) to jump to the data they want. Iterators require programs to go sequentially through the data until it finds what it's looking for. For large data sets, this can become expensive. This can get more frustrating when the data structure underneath an abstraction has a better way to quickly retrieve the data (e.g. a binary search tree), but the abstraction forces a slower method.

Another problem arises when iterators don't have an end. Unlike most data structures, iterators allow for a special type of sequence called an "infinite sequence." An infinite sequence sounds like what it is, an infinite number of values. If you were to try to convert one of these sequences into an array, you'd get an infinite loop (and use an infinite amount of memory). The simplest way to make an infinite iterator in JavaScript is to have the "next" function never report that it's done. Unfortunately, there is no external indicator for whether an iterator is infinite. The programmer just has to "know" if something is infinite or not (or at least if it can be). And, that programmer cannot be the library developer. It has to be the application developer - who is probably using a collection of libraries from different developers. If the application developer is wrong, then there will be an infinite loop. And if the application developer doesn't understand what is going on, then the blame will most likely fall on one or more library developers.

These limitations often make iterators less suitable than arrays, which is partly why arrays are still very prevelant (especially resizable arrays).

That said, we can still improve iterators and fix some of the issues. These solutions come from the C++ community, which has five types of iterators[3].

Output
Input
Forward
Bidirectional
Random Access

Output iterators are for pushing values somewhere, such as pushing strings to stdout. This is similar to publishers in a pub/sub context.

Input iterators are for receiving values from somewhere, such as receiving keystrokes from the user. They are similar to subscribers in a pub/sub context. Input iterators can only get the next value, and they can only get values once. They should not be reused or forked, similar to Java streams[4].

Forward iterators are for forward access to the next value. It's like iterating over a singly-linked list where you can only go in one direction. However, unlike input iterators, a forward iterator can be restarted from the beginning to be reused. This makes forward iterators like the sand in an hourglass. The sand only goes from the top to the bottom (one direction). However, you can flip the hour glass over and restart from the beginning.

Bidirectional iterators are like the chain on the robot. They can go forwards and backwards, but still have to iterate sequentially. This is similar to iterating over a doubly linked list.

Diagram of a doubly-linked list where nodes are linked to the previous and next nodes.

Random Access iterators are the most powerful type of iterator in C++. THey allow bidirectional iteration (forwards and backwards), but they also allow skipping around or accessing elements by a key - which is very powerful. Does a program want the 1,200,000^th element? No problem! Just skip directly to it, no need to evaluate the first 1,199,999 elements. Now the program wants the 100^th element? Sure thing! Just skip to it.

Furthermore, unlike all the other iterators, a random access iterator can calculate the distance to any other iterator's position (on the same collection). For library implementers, this is handy since it allows quickly detcting sizes for partitioning and counting when given the current position and end of the sequence (or subsequence). With these tools, more data structures can be efficiently abstracted over with iterators.

But C++ isn't done. Remember the issue with infinite squences? C++ addresses this too by having iterator ranges.

Instead of having "is this done?" and "is this not done?", in C++ you can pass a pair of iterators to define a range. This pair has a start and end, so the library author knows that there is a fixed size (though they may not know what the size is). The range mechanic also allows for a lot more flexibility from the application side as well. Only want to sort the first 100 items? Well, pass a range to std::sort that encompases only those 100 items.

With unranged iterators (like JavaScript's) we can emulate C++ ranges with functions like "drop"[5] and "take"[6] from Clojure. However, the disadvantage of these emulation functions is that those ranges are implicit, meaning developers till need to know if the iterator they're working with is bounded or not. With C++, ranges are explicit, which allows libraries to guarantee that iterators are bounded.

Finally, there's one last trick C++ developers have, and that is the ability to change algorithms based on the type of the incoming iterator. This allows library authors to create a library which works with any iterator, but which also takes advantage of more performant algorithsm when available - something that we don't have in JavaScript.

For example, forward and bidirectional iterators don't know the distance to other iterators. This means that to count a range's size, we have to iterate over every element and count them one by one, which is $O(n)$ . However, when given a random access iterator, we can calculate the size instead, which could be as fast as $O(1)$ (e.g. subtract indexes for an array).

By allowing libraries to adapt their algorithms based on the kind of incoming iterator, library authors are able to give adaptive performance. This allows them to give the fastest performance available inside the constraints of the incoming abstraction (and underlying data structure), rather than giving consistently slow $O(n)$ performance for everything.

JavaScript iterators aren't perfect. But, with some tweakes they could be made much better. The solution isn't to throw out iterators, but to look at what other langauges are doing and then use some of those ideas to build upon what we have.

Bibliography

[1] "Iterators and generators," Mozilla Developer Network. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Iterators_and_Generators (accessed Sep. 20, 2022).
[2] "@tofurama3000/hat-js," NPM. https://www.npmjs.com/package/@tofurama3000/hat-js (accessed Sep. 20, 2022).
[3] "iterator" cplusplus.com. https://cplusplus.com/reference/iterator/ (accessed Sep. 20, 2022).
[4] "Interface Stream<T>" Java Platform Standard Ed. 8. https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html (accessed Sep. 20, 2022).
[5] "drop" ClojureDocs. https://clojuredocs.org/clojure.core/drop (accessed Sep. 20, 2022).
[6] "take" ClojureDocs. https://clojuredocs.org/clojure.core/take (accessed Sep. 20, 2022).