For loops – Python iteration under the hood
Discover how iteration works under the hood and how you can tap into this for your own code
Here is a very simple for loop:
names = 'Jo', 'Joe', 'Joseph', 'Johan', 'Johanna', 'Josephine', 'Joey'
for name in names:
    print(name.upper())
Output:
JO
JOE
JOSEPH
JOHAN
JOHANNA
JOSEPHINE
JOEY
If this looks strange to you, first read my article on for loops, then come back here.
Here is another way of iterating over the same names:
name_iterator = iter(names)
while True:
    try:
        name = next(name_iterator)
    except StopIteration:
        break
    print(name.upper())
This gives the same output as before. This is because it does exactly the same thing as a for loop. This is how a for loop works under the hood.
Let’s break this down.
‘names’ is a tuple and, like any Python tuple or list, it is an ‘iterable’. It is something which we could start to iterate over if we wanted to. The key word here is ‘start’.
Imagine a single book. It is iterable – one or more people could choose to read it page by page. Pages are ordered. There is a first page and a last page. And for each page we know which page is next. The only exception is the last page, which doesn’t have a next page but I’ll come back to that soon.
Say I want to read the book from start to end, in page order. In Python terms, I want to iterate over the pages. For this I need two things:
- The book
- Some way to remember what page I’m on
The combination of these two is an ‘iterator’ – something which is currently being read, in order, usually until the very last item.
If someone else wants to read the same book in parallel, they would need their own bookmark. We couldn’t share the bookmark because we want to be able to read at different speeds.
The ‘iter’ function creates an iterator from an iterable. In the case of the book, it would use the book and a bookmark to do this. In the case of our names, it uses an internal index to remember how far it has got.
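To see the 'one bookmark per reader' idea in code, here is a small sketch. The names fast_reader and slow_reader are made up for this illustration; each call to iter() gives a separate iterator with its own position.

```python
names = 'Jo', 'Joe', 'Joseph'

# Two readers over the same names, each with their own bookmark.
fast_reader = iter(names)
slow_reader = iter(names)

print(next(fast_reader))  # Jo
print(next(fast_reader))  # Joe

# The slow reader has not moved yet, so it still starts at the first name.
print(next(slow_reader))  # Jo
```

Advancing one iterator has no effect on the other, just like two people reading the same book with separate bookmarks.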
Now that we’ve got our book iterator (book plus bookmark) how do we use it?
To get the first page we call the ‘next’ function. This will ask the book iterator for the next page, starting with the first page, and return it to us.
In the while loop shown earlier, the line
name = next(name_iterator)
returns the next name, starting with the first name, then the second, etc.
We keep reading our book, page after page. But how do we know when we’ve reached the last page?
When you ask for the next element (page, name, etc.) and the iterator is exhausted (when the last element has already been returned), Python raises an exception. If we didn’t handle this with try/except, the program would stop with an error message.
Why doesn’t it just tell us? Why do we need an exception? To tell us, it would have to return a special value meaning ‘nothing left’. But what if that special value was one of the items in our list? Then the iteration would stop too early because we mistakenly thought we were done.
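The 'special value' problem is easy to reproduce. next() accepts an optional second argument which it returns instead of raising an exception. The sketch below uses None as that special value, with a made-up values list that happens to contain None as a real item.

```python
values = [1, None, 3]  # None is a genuine item in this list
value_iterator = iter(values)

collected = []
while True:
    # With a default, next() returns None instead of raising StopIteration.
    value = next(value_iterator, None)
    if value is None:
        # Wrong conclusion: this None is a real item, not 'nothing left'.
        break
    collected.append(value)

print(collected)  # [1]
```

The loop stops after the first item and never reaches 3. An exception can't be confused with an item in this way.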
So instead of returning a special value, Python raises a ‘StopIteration’ exception. We just need to be ready to handle this when using an iterator. Like this:
try:
    name = next(name_iterator)
except StopIteration:
    break
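To tie the book analogy to code, here is a minimal sketch of a book and its iterator. Book and BookIterator are hypothetical classes invented for this illustration, not part of Python. The sketch assumes the standard iterator protocol: iter() calls the object's __iter__ method and next() calls the iterator's __next__ method.

```python
class Book:
    """Hypothetical iterable: holds the pages, but no reading position."""

    def __init__(self, pages):
        self.pages = list(pages)

    def __iter__(self):
        # Every call to iter(book) hands out a new iterator with its own bookmark.
        return BookIterator(self)


class BookIterator:
    """Hypothetical iterator: the book plus a bookmark."""

    def __init__(self, book):
        self.book = book
        self.bookmark = 0  # index of the next page to read

    def __iter__(self):
        # An iterator returns itself, so it can be used in a for loop too.
        return self

    def __next__(self):
        if self.bookmark >= len(self.book.pages):
            raise StopIteration  # we are past the last page
        page = self.book.pages[self.bookmark]
        self.bookmark += 1
        return page


book = Book(['Page one', 'Page two', 'Page three'])
for page in book:
    print(page.upper())
```

Because Book follows the protocol, the for loop works on it just as it does on the names.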
What can you do with this? How is this useful?
Sometimes all you want is the first element of an iterator. In that case, just call next() on it once.
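A quick sketch of this, using a shortened names tuple. Note that a tuple or list is an iterable, not an iterator, so it needs to go through iter() first. The empty no_names tuple is made up here to show what happens when there is no first element.

```python
names = 'Jo', 'Joe', 'Joseph'

# Wrap the tuple in iter() first, then ask for one element.
first_name = next(iter(names))
print(first_name)  # Jo

# With nothing to return, next() raises StopIteration
# unless we pass a default as the second argument.
no_names = ()
first_or_default = next(iter(no_names), 'nobody')
print(first_or_default)  # nobody
```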
Or you can consume two iterators in parallel:
names = 'Jo', 'Joe', 'Joseph', 'Johan', 'Johanna', 'Josephine', 'Joey'
ages = 25, 26, 27, 28, 29, 30
name_iterator = iter(names)
age_iterator = iter(ages)
while True:
    try:
        name = next(name_iterator)
        age = next(age_iterator)
    except StopIteration:
        break
    print(name.upper(), age)
Output:
JO 25
JOE 26
JOSEPH 27
JOHAN 28
JOHANNA 29
JOSEPHINE 30
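There are seven names but only six ages, so the loop ends when the ages run out and ‘Joey’ is never printed. Note that this is only an example to illustrate using next() on two iterators. A better way to get the same result is zip():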
for name, age in zip(names, ages):
    print(name.upper(), age)
This gives the same output. But zip() always advances all of its iterators at the same speed. What if we need to progress through two ages for every name? Then we are back to calling next() ourselves.
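Here is a sketch of the 'two ages per name' case. The data is shortened and the pairing of two ages with one name is invented purely to show the iterators moving at different speeds.

```python
names = 'Jo', 'Joe', 'Joseph'
ages = 25, 26, 27, 28, 29, 30

name_iterator = iter(names)
age_iterator = iter(ages)

rows = []
while True:
    try:
        name = next(name_iterator)
        # The age iterator advances twice for every single name.
        first_age = next(age_iterator)
        second_age = next(age_iterator)
    except StopIteration:
        break
    rows.append((name.upper(), first_age, second_age))

for row in rows:
    print(*row)
# JO 25 26
# JOE 27 28
# JOSEPH 29 30
```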
Say we’ve got two lists, both already sorted. We want to merge them into a new, also sorted, list. We would need to iterate over both lists, but can’t use zip() because we don’t move through them at the same speed. We may need the first 10 elements of list A followed by the first 5 elements of list B, etc.
I may show the Python code to solve this in a later article. Or take this on as a Python challenge yourself 🙂
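If you would rather peek at one possible approach now, here is a minimal sketch. It is only one way to do it. The function name merge_sorted is made up, and it assumes both inputs are already sorted in ascending order. To detect the end of each list it passes a freshly created object() as the default to next(), a marker which can't be an item in either list.

```python
def merge_sorted(list_a, list_b):
    """Merge two already sorted lists into a new sorted list (a sketch)."""
    iterator_a = iter(list_a)
    iterator_b = iter(list_b)
    end = object()  # unique marker, guaranteed not to be in either list

    merged = []
    item_a = next(iterator_a, end)
    item_b = next(iterator_b, end)

    # Keep taking the smaller of the two current items.
    while item_a is not end and item_b is not end:
        if item_a <= item_b:
            merged.append(item_a)
            item_a = next(iterator_a, end)
        else:
            merged.append(item_b)
            item_b = next(iterator_b, end)

    # One list has run out; copy whatever is left of the other.
    if item_a is not end:
        merged.append(item_a)
        merged.extend(iterator_a)
    if item_b is not end:
        merged.append(item_b)
        merged.extend(iterator_b)
    return merged


print(merge_sorted([1, 2, 3, 10], [4, 5, 11]))  # [1, 2, 3, 4, 5, 10, 11]
```

In real code, the standard library's heapq.merge() already does this job for any number of sorted inputs.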
In short: when you can, use a for loop. When you must, create your own iterator and use next() and StopIteration.