Python – Count and learn
Here is a simple problem: Count how many times each character occurs in a string.
And one more thing: Do this using Python.
There are many different solutions to this and each shows us a different aspect of Python. This is an excellent problem to learn more about Python.
Note: The idea for this problem came whilst creating a Python training video comparing the speed of different approaches to common programming tasks. Members of LinkedIn’s Python Developers Community have added many more solutions. Thanks to everyone who has contributed.
General solution – Count individual characters, return a dictionary
def count_characters(some_string):
result = {}
for character in some_string:
result[character] = result[character] + 1
return result
The obvious way to store the result is to use a dictionary. When done, each key/value pair specifies the count (the value) for one of the characters (the key).
We loop over each character in some_string, and increase the count for that character.
Unfortunately it isn’t quite that simple. This gives a ‘KeyError’ for the key ‘S’. To calculate the new total we first need the old total (result[character]). However, we don’t have a value for ‘S’ yet, so this fails.
This is a very common situation when working with dictionaries. How do you deal with non-existing keys?
Python gives us many different ways to handle this. Let’s go through them.
Solution 1 – Dictionary, check list membership
def count_characters(some_string):
result = {}
for character in some_string:
if character in result:
result[character] += 1
else:
result[character] = 1
return result
Here we create a new entry if necessary, by testing for ‘character in result’. This is shorthand for ‘character in result.keys()’.
The first time we process a specific character (e.g. the first ‘S’ or the first ‘e’) we add a new entry in the dictionary and set it to 1. When we process that character again (e.g. the second ‘e’) we increase it.
I have abbreviated “result[character] = result[character] + 1” to “result[character] += 1”. This is an in-place operation. Both versions do exactly the same.
Solution 2 – Dictionary, ditto, using a ternary operator
def count_characters(some_string):
result = {}
for character in some_string:
result[character] = result[character] + 1 \
if character in result \
else 1
return result
(Note: The ternary operator would be clearer if it was on a single line. Unfortunately that doesn’t fit on this web page)
This is the same as the previous version, using a ternary operation.
If you haven’t seen this before, read lines 5 to 7 carefully – as if it is an English sentence.
It looks like it shouldn’t work, but it does. Hopefully you get the syntax from reading the code. If you’re not sure you understand it, start Python and try it out.
Solution 3 – Dictionary, using .get()
def count_characters(some_string):
result = {}
for character in some_string:
result[character] = result.get(character, 0) + 1
return result
Dictionaries have many useful methods, including .get()
<dictionary>.get(key, default) returns dictionary[key] if the key exists, otherwise it returns the default.
result.get(character, 0) returns the current count for that character, if we’ve already started counting it. If it is a new character it returns zero.
Solution 4 – Dictionary, using .setdefault()
def count_characters(some_string):
result = {}
for character in some_string:
result.setdefault(character, 0)
result[character] += 1
return result
Another useful dictionary method, .setdefault()
If the key is not already in the dictionary, <dictionary>.setdefault(key, value) creates a new entry, key:value.
If the key is in the dictionary, .setdefault() does nothing.
Solution 5 – Dictionary, using error handling
def count_characters(some_string):
result = {}
for character in some_string:
try:
result[character] += 1
except KeyError:
result[character] = 1
return result
As you know, we can’t just do “result[character] += 1”. If character isn’t in the dictionary we get a KeyError.
One way to fix this is to catch the KeyError and take appropriate action. This is an example of the “Easier to ask for forgiveness than permission” (EAFP) approach.
In the first version (count01.py) we checked if the character is in the dictionary before trying to access it, using the “Loop before you leap” approach (LBYL).
EAFP is a very common approach in Python. In other languages LBYL is more popular.
Solution 6 – Dictionary, defaultdict
import collections
def count_characters(some_string):
result = collections.defaultdict(int)
for character in some_string:
result[character] += 1
return result
Handling missing entries in a dictionary is such a common pattern that we’ve got a solution in Python’s standard library.
The collections module contains a defaultdict type. This works like a normal Python dictionary, plus we can tell it the value for undefined keys.
When creating a defaultdict we give it a function which creates the default value. Here we give it the ‘int’ function. int() returns an integer with a value of zero.
If you want to start with a different value, you can use a simple lambda function. For instance: collections.defaultdict(lambda: 5). The lambda function always returns a value of 5. Any missing key/value will be given a value of 5.
This code: result = collections.defaultdict(int); print(result[1]) will create a new entry in result, with the key of 1 and a value of zero, then print out zero.
Solution 7 – Dictionary, seeded with zeros
def count_characters(some_string):
result = {}
for character in set(some_string):
result[character] = 0
for character in some_string:
result[character] += 1
return result
Here we create a dictionary with a value of zero for each unique character in some_string. Once that is done we are ready to start counting.
Solution 8 – Dictionary, pre-built using dictionary comprehension
def count_characters(some_string):
result = {character: 0 for character in set(some_string)}
for character in some_string:
result[character] += 1
return result
Like the previous version, but now we use dictionary comprehension to create the initial dictionary.
Solution 9 – <string>.count() method
def count_characters(some_string):
result = {}
for character in set(some_string):
result[character] = some_string.count(character)
return result
Strings have a very handy method, .count(). You give it a substring and it returns how many times the substring appears in the string.
This sounds perfect for our problem. However, we need to know which substrings (in our chase, characters) we need to count. We use set(some_string) to get a set of unique characters.
“for character in some_string” would have given the same end result. It would take a bit longer because some characters would be counted multiple times.
Solution 10 – <string>.count(), dictionary comprehension
def count_characters(some_string):
return {
character: some_string.count(character)
for character in set(some_string)
}
We can create the dictionary directly, using dictionary comprehension.
For complex list, set or dictionary comprehension statements I like to break up the line into its logical components to make it easier to read.
Side note – Procedural versus declarative
In “procedural” programming we give Python the procedure, the steps to take to solve the problem. The first solution, and many of the other solutions above, are procedural.
In “declarative” programming we tell Python what we want, the required end result, and leave it up to Python to work out how to do it. Solution 10 is an example of declarative programming.
Solution 11 – Sort then count
def count_characters(some_string):
result = {}
characters = sorted(some_string)
previous_character = None
count = 0
for character in characters:
if character == previous_character:
count += 1
else:
if count:
result[previous_character] = count
count = 1
previous_character = character
if count:
result[previous_character] = count
return result
As humans we may solve this sort of problem by grouping the characters first, and then counting the number of characters in each group. This works especially well when counting physical objects.
A similar approach is to sort the characters first. We then start at the first character. As long as we see the same character, we keep adding one. When we get to a new character we record our count and start again with the new character.
This is a nice example of a very natural approach when we do this by hand, but which is tricky to get right in code.
Solution 12 – Functional Programming
from functools import reduce
def count_characters(some_string):
return reduce(
lambda result, character: {**result, character: result.get(character, 0) + 1},
some_string,
{}
)
According to Wikipedia, “In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that map values to other values, rather than a sequence of imperative statements which update the running state of the program.”
Python is not a Functional Programming (FP) language but it does allow you to write using the FP style.
The reduce function above starts with an empty dictionary (the last argument), iterates over some_string, and calls the lambda function with the empty dictionary and the first character. The result of each function call is fed back into the lambda function for the next character, etc. The final result is then returned.
Solution 13 – Taking large steps
def count_characters(some_string):
unique_characters = set(some_string)
character_count = [
some_string.count(letter)
for letter in unique_characters
]
result = dict(zip(unique_characters, character_count))
return result
This version uses many of the previous ideas and breaks them up into three large steps.
The zip function is like having multiple columns of data (e.g. the unique_characters and the character_count columns) and return their content one row at a time. The first row (tuple) consists of (first unique_character and first character count), etc.
The dict function treats each tuple as a key:value pair and creates a new dictionary.
Solution 14 – The Counter class
import collections
count_characters = collections.Counter
Another useful entry in the collections module. Counter is a special type of dictionary. When you create a new instance of it and give it an iterator, it counts the elements of the iterator.
In short, collections.Counter can do all the work for us.
So far we’ve created a count_characters function to do this. For the sake of consistency we will do something similar here.
In Python functions and classes are first class objects. This means that we can use them like any other type of objects, such as integers or strings.
The last line creates a new variable name called ‘count_character’ and points it at (binds it to) collections.Counter
We can now use count_characters in the same way as the previous count_characters functions.
Conclusion
The Zen of Python (try: “import this”) states:
There should be one-- and preferably only one --obvious way to do it
Although that way may not be obvious at first unless you're Dutch..
The more you get to know Python, the more solutions you will find for the sample problem. And the more some solutions will seem better than others – whether you are Dutch or not.
And if you want to learn Python from a Dutchman, get in touch