The obvious solution
Say we want to write the following program:
Given a directory, find the file containing the most non-empty lines.
Even with something as simple as this, if we charge ahead without much thought we may soon find ourselves in a mess. Let’s make a mess now and see what we can learn from it.
So first we’ll create our function, findMaxFile. According to the requirements, we’re going to need to find files in a directory. Easy:
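Here is a minimal sketch of this step. It assumes Node.js’s callback-style fs.readdir, uses a hypothetical helper name, listFiles, and assumes the directory holds only regular files:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: list the entries of `dir` as full paths, following
// the Node.js error-first callback convention cb(err, result).
// Assumes `dir` contains only regular files (no subdirectories).
function listFiles (dir, cb) {
  fs.readdir(dir, function (err, names) {
    if (err) return cb(err);
    // readdir gives bare names, so prefix each one with the directory path
    cb(null, names.map(function (name) { return path.join(dir, name); }));
  });
}
```

fs.readdir hands back bare names, which is why the sketch joins each one onto dir before passing the list on.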
Now that we’ve got the list of filenames from the directory we need to load all of those files:
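Loading them one at a time might look something like this sketch. The names eachFile, onFile and done are hypothetical, and each file is assumed to be UTF-8 text read with fs.readFile:

```javascript
const fs = require('fs');

// Hypothetical sketch: read the files one after another, handing each
// file's text to `onFile` as it arrives, then calling `done(err)` at the end.
function eachFile (filenames, onFile, done) {
  let i = 0;
  function next () {
    if (i >= filenames.length) return done(null); // every file has been read
    const filename = filenames[i++];
    fs.readFile(filename, 'utf8', function (err, text) {
      if (err) return done(err);                  // stop at the first failure
      onFile(filename, text);
      next();                                     // only then start the next read
    });
  }
  next();
}
```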
Then we simply count the number of non-empty lines in each file, record the maximum, and send back the corresponding filename. Seems pretty straightforward.
However, when we put down the hammer and step back to admire our handiwork, this is what we’re looking at:
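A tangled version in the same spirit might look like the following sketch. It is our own, built only on Node’s fs and path modules. Directory listing, file loading, line counting and max-tracking are all interleaved inside nested callbacks:

```javascript
const fs = require('fs');
const path = require('path');

// A deliberately tangled sketch: listing the directory, loading files,
// counting lines and tracking the maximum all happen in one function.
function findMaxFile (dir, cb) {
  fs.readdir(dir, function (err, names) {
    if (err) return cb(err);

    let maxCount = -1;   // highest count seen so far
    let maxFile = null;  // file that produced it
    let i = 0;

    // Read one file at a time, moving on from inside each readFile callback
    function next () {
      if (i >= names.length) return cb(null, maxFile);
      const filename = path.join(dir, names[i++]);

      fs.readFile(filename, 'utf8', function (err, text) {
        if (err) return cb(err);

        const lines = text.split('\n');
        let count = 0;
        for (let j = 0; j < lines.length; j++) {
          if (lines[j] === '') continue; // skip empty lines
          count++;
        }

        if (count > maxCount) {
          maxCount = count;
          maxFile = filename;
        }
        next();
      });
    }

    next();
  });
}
```

It works, but changing any one concern (say, how lines are counted) means editing the middle of this nest.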
From innocent beginnings we’ve ended up with a big chunk of code. As soon as our product gets a few “real-life business requirements” (e.g. “On Tuesdays find the file with the second-highest number of lines”) we’ll start to see the cracks. We’ll be stuck with code that is hard to read, hard to test, and brittle to change. Unless we do something about it.
At this point there are plenty of steps we could take to massage out some of the knots. callbackhell.com is full of excellent advice that you should read if you haven’t already. However, massaging alone won’t address the underlying issues. We need to rethink.
The way we translate our requirements into functions can have a huge impact on the end result. There are a few techniques we’ll look at in this example:
- Use abstractions to simplify the problem
- Transform data in steps to avoid conditional branching
- Restrict knowledge of the outside world
Let’s start again and see how applying those techniques changes things.
1. Use abstractions to simplify the problem
Before we start refactoring code it can be helpful to take a fresh look at the requirements, and see whether we can do any simplification there. To recap, our requirements are:
Given a directory, find the file...
For our purposes we could say that a directory is just a list of files, right? We could reframe the problem as:
Given a list of files, find the one with the most non-empty lines.
So we can simplify by splitting the work into two separate functions:
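A sketch of that split, assuming Node’s callback APIs. The wrapper name findMaxFileInDir is hypothetical, and findMaxFile keeps a naive body at this stage; only its input has changed from a directory to a list of file paths:

```javascript
const fs = require('fs');
const path = require('path');

// Directory-aware part (hypothetical name): turn a directory into a list
// of file paths, then hand off. This is its only job.
function findMaxFileInDir (dir, cb) {
  fs.readdir(dir, function (err, names) {
    if (err) return cb(err);
    const filenames = names.map(function (name) { return path.join(dir, name); });
    findMaxFile(filenames, cb);
  });
}

// Directory-free part: only ever sees a list of file paths.
// Its body is still the one-file-at-a-time version for now.
function findMaxFile (filenames, cb) {
  let maxCount = -1;
  let maxFile = null;
  let i = 0;
  function next () {
    if (i >= filenames.length) return cb(null, maxFile);
    const filename = filenames[i++];
    fs.readFile(filename, 'utf8', function (err, text) {
      if (err) return cb(err);
      let count = 0;
      for (const line of text.split('\n')) {
        if (line !== '') count++;
      }
      if (count > maxCount) {
        maxCount = count;
        maxFile = filename;
      }
      next();
    });
  }
  next();
}
```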
This might seem moot, but consider that in our main function we’ve gone from dealing with “directories and files” to just “files”. We have one less thing to consider, and the code will be cleaner as a result.
We can make a similar simplification when we look at the second half of the requirements:
find the file containing the most non-empty lines.
When you think about it, we’re not really concerned with counting lines in a file; we’re counting lines of a string! Strings are much simpler to work with, and we can easily factor out that logic without any reference to loading files:
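Factored out as a plain string function, still in the loop-and-condition style that the next section looks at, the counting might be sketched as:

```javascript
// Counting works on a plain string; no files involved.
// Loop, condition and `continue`, as in the tangled version.
function countNonEmptyLines (str) {
  const lines = str.split('\n');
  let count = 0;
  for (let i = 0; i < lines.length; i++) {
    if (lines[i] === '') continue; // skip empty lines
    count++;
  }
  return count;
}
```

For example, countNonEmptyLines('a\n\nb\n') returns 2, and nothing about it involves the file system.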
2. Transform data in steps to avoid conditional branching
Let’s continue looking at the countNonEmptyLines function. In our original example we had:
This does the job, and it looks simple and familiar. But loops (e.g. for) and conditions (e.g. if) can quickly become very complex. Every time you see an if you must consider each possible outcome and trace the steps. Throw in a continue as well, and you’ll need to account for that too. And each new branch compounds the complexity of the ones before it.
There is a better way we can achieve the same goal, by transforming the data in steps rather than piece-by-piece. If we want to count non-empty lines we could use something like this:
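Assembled from the split, filter(Boolean) and .length steps this section describes, a sketch of that version:

```javascript
// Count non-empty lines by transforming the data in steps:
// string -> array of lines -> array of non-empty lines -> count.
function countNonEmptyLines (str) {
  return str
    .split('\n')      // string -> array of strings
    .filter(Boolean)  // drop empty strings ('' is falsy)
    .length;          // how many lines remain
}
```

Note that filter(Boolean) only drops truly empty strings: a line containing just spaces, or a stray '\r' left over from Windows line endings, still counts as non-empty.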
There is less mental overhead here because at each step we can picture clearly how the data has been transformed:
- We start with a string.
- Then we split('\n') to transform that into an array of strings.
- Then we filter(Boolean) to remove any empty strings from that array.
- And finally we use .length to tell us how many lines remain.
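To make those intermediate values concrete, here is a small trace of the three steps on a sample string of our own:

```javascript
// Trace the split / filter(Boolean) / .length steps on a sample string.
const text = 'first line\n\nsecond line\n';

const lines = text.split('\n');
// lines is ['first line', '', 'second line', '']
// (the trailing newline produces a final empty string)

const nonEmpty = lines.filter(Boolean);
// nonEmpty is ['first line', 'second line']

const count = nonEmpty.length;
// count is 2

console.log(lines, nonEmpty, count);
```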
We can apply the same technique when we process files. In the original example we were loading one file at a time, each time counting the lines and then recording that number if a new maximum was found:
It’s a familiar pattern, but we can simplify things by first loading all files, then counting lines of all files, and finally locating the maximum:
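The counting and max-finding steps can be sketched on contents that have already been loaded. indexOfMax is the name this article uses, but the body here is our own guess, and indexOfMostLines is a hypothetical name for the glue:

```javascript
// Step 2: count non-empty lines of one string.
function countNonEmptyLines (str) {
  return str.split('\n').filter(Boolean).length;
}

// Step 3: index of the largest number (first one wins on ties);
// -1 for an empty array, because Math.max() of nothing is -Infinity.
function indexOfMax (numbers) {
  return numbers.indexOf(Math.max(...numbers));
}

// Given every file's contents (step 1 already done),
// count all of them, then locate the maximum.
function indexOfMostLines (contents) {
  const counts = contents.map(countNonEmptyLines);
  return indexOfMax(counts);
}

// Counts are [2, 0, 3], so the index of the maximum is 2
console.log(indexOfMostLines(['a\nb', '', 'x\n\ny\nz'])); // 2
```

Math.max(...numbers) spreads the whole array into arguments, so for very large arrays a reduce-based indexOfMax would be the safer choice.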
This also improves our ability to write tests since each step is a function that can be tested independently of the others.
3. Restrict knowledge of the outside world
A common feature of the first two techniques is that they help us write functions that have little or no knowledge of the context they are being used in:
- findMaxFile doesn’t know (or care) if the files came from a directory listing or somewhere else.
- countNonEmptyLines doesn’t need to know that the string came from a file.
- indexOfMax only deals in arrays of numbers.
When a function becomes tangled with its context you have to think much harder about how to use it correctly. The less outside-knowledge a function has, the more confidently you can use it.
Putting it together
Finally, let’s take the functions we’ve written and see if we can stick them together to solve the problem (the complete code is at https://github.com/joshwnj/untangling-callbacks):
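One way the pieces could be assembled is sketched below. findMaxFile, countNonEmptyLines and indexOfMax are names from this article, but their bodies here are our own; loadFiles and findMaxFileInDir are hypothetical, and loadFiles is hand-rolled with fs.readFile rather than taken from a library:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical: load every file as UTF-8 text; call back once with the
// contents in the same order as `filenames`.
function loadFiles (filenames, cb) {
  const contents = new Array(filenames.length);
  let pending = filenames.length;
  let failed = false;
  if (pending === 0) return process.nextTick(cb, null, contents); // stay async
  filenames.forEach(function (filename, i) {
    fs.readFile(filename, 'utf8', function (err, text) {
      if (failed) return;                        // an error was already reported
      if (err) { failed = true; return cb(err); }
      contents[i] = text;                        // store by index to keep order
      if (--pending === 0) cb(null, contents);
    });
  });
}

function countNonEmptyLines (str) {
  return str.split('\n').filter(Boolean).length;
}

function indexOfMax (numbers) {
  return numbers.indexOf(Math.max(...numbers));
}

// The requirement in a few lines: load files, count lines, pick the max.
// With no files, indexOfMax returns -1 and the result is undefined.
function findMaxFile (filenames, cb) {
  loadFiles(filenames, function (err, contents) {
    if (err) return cb(err);
    const counts = contents.map(countNonEmptyLines);
    cb(null, filenames[indexOfMax(counts)]);
  });
}

// The only part that knows about directories.
function findMaxFileInDir (dir, cb) {
  fs.readdir(dir, function (err, names) {
    if (err) return cb(err);
    findMaxFile(names.map(function (name) { return path.join(dir, name); }), cb);
  });
}
```

Usage would be findMaxFileInDir('.', function (err, file) { ... }). Each helper can be swapped or tested on its own, and the only callback plumbing left is in findMaxFile and findMaxFileInDir.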
In just a few lines of the findMaxFile function we can see that we’re loading files, counting the lines and sending back the maximum. We can see the original requirements shining through: they are no longer tangled up in the implementation. When we find the right way to frame our code as independent composable functions, we also often discover that plain old callbacks are not so bad after all. Callbacks are just functions, so if we can untangle the functions, callbacks are a natural fit.
There is much more that could be learned on this topic than will ever fit in my brain. If you’ve found some useful techniques, or have tried applying these ones, please share!