There is a popular belief that big functions are bad. Martin Fowler considers long functions a code smell [1]. In the overwhelming majority of cases big functions really are bad.
Here, however, we are going to argue that line count may not be the best metric. We will discuss alternative metrics, as well as the cases where it doesn’t make sense to follow them.
How to Identify a Big Function
Line count is not a good measure of complexity, and one example makes this obvious. If you have a function made of uniform pieces, like a long function that handles every possible input of a state machine, then its length is not a problem. In this case you understand the structure of the function. And this is the crucial part: structural complexity is what we should strive to reduce.
Measure of Function’s Complexity
One of the most popular metrics for measuring structural complexity of a function is cyclomatic complexity. Cyclomatic complexity is a metric of the control flow graph of a function. Here is a formula for cyclomatic complexity:
M = E - N + 2P
- E - the number of edges in the control flow graph,
- N - the number of nodes in the control flow graph,
- P - the number of connected components.
The control flow graph of a single subroutine is always connected, so P = 1. Therefore the formula simplifies to:
M = E - N + 2
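To make the formula concrete, here is a minimal sketch that applies it to a control flow graph given as a list of edges. The function name `cyclomatic_complexity` and the node labels are assumptions made for this illustration; real tools build the graph from the source code for you.

```python
def cyclomatic_complexity(edges, components=1):
    """Apply M = E - N + 2P to a graph given as (source, target) edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


# Hypothetical control flow graph of a function with a single if/else.
if_else_cfg = [
    ("entry", "cond"),
    ("cond", "then"),
    ("cond", "else"),
    ("then", "exit"),
    ("else", "exit"),
]

# Hypothetical control flow graph of a function with no branches at all.
straight_line_cfg = [("entry", "body"), ("body", "exit")]

print(cyclomatic_complexity(if_else_cfg))        # 5 edges - 5 nodes + 2 = 2
print(cyclomatic_complexity(straight_line_cfg))  # 2 edges - 3 nodes + 2 = 1
```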
This metric is often calculated by static analysis tools. However, when you write code you rarely have a calculator at hand that shows the cyclomatic complexity of your function.
Instead, programmers develop their own set of hints that tell them when cyclomatic complexity goes up. Every decision point in your code, such as an if, a loop condition, or a case in a switch, increases cyclomatic complexity by 1.
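The rule of thumb above can be sketched as a rough counter built on Python’s standard `ast` module. This is only an approximation under stated assumptions: the set of node types counted as decision points is my choice, the whole snippet is treated as one function, and real analyzers may count differently. The names `estimate_complexity` and `SAMPLE` are hypothetical.

```python
import ast

# Node types treated as decision points in this sketch (an assumption;
# static analysis tools differ in what exactly they count).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def estimate_complexity(source: str) -> int:
    """Estimate cyclomatic complexity as 1 + the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has two short-circuit branches.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
'''

print(estimate_complexity(SAMPLE))  # two ifs and one loop: 1 + 3 = 4
```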
The nice property of cyclomatic complexity is that it measures structural complexity of a function instead of its size. If a long function has no branches or loops then the cyclomatic complexity of the function is 1.
A common recommendation, going back to Thomas McCabe, who introduced the metric, is to keep the cyclomatic complexity of a function at 10 or below.
This concept gives a basic idea of what to look for when you want to subdivide your functions into smaller pieces.
When to Extract a Function
Loops and Branches
From the definition of cyclomatic complexity we can conclude that the first candidates for extraction are loops and branches. Loops and branches place the heaviest cognitive load on the reader.
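As an illustration, here is a small sketch of the same calculation before and after its loop and branches are extracted. The order fields, the fee constant, and all function names are invented for this example.

```python
EXPRESS_FEE_CENTS = 500  # hypothetical flat fee, for illustration only


def total_due_inline(orders):
    """Before: the loop and both branches live in one body."""
    total = 0
    for order in orders:
        if order["status"] == "cancelled":
            continue
        if order["express"]:
            total += order["amount_cents"] + EXPRESS_FEE_CENTS
        else:
            total += order["amount_cents"]
    return total


def order_cost(order):
    """Extracted branch: the cost of a single order."""
    if order["express"]:
        return order["amount_cents"] + EXPRESS_FEE_CENTS
    return order["amount_cents"]


def active_orders(orders):
    """Extracted filter: skip cancelled orders."""
    return [order for order in orders if order["status"] != "cancelled"]


def total_due(orders):
    """After: the top level reads as a single sentence."""
    return sum(order_cost(order) for order in active_orders(orders))
```

Both versions compute the same total. The difference is that each extracted function carries only one decision, so the reader can take them in one at a time.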
Another motivation for extracting a piece of code into a separate function is duplication. Duplication is not just two pieces of code looking the same. Duplication is when two separate pieces of code need to be changed for the same reason and at the same time. It arises when the same information is used in two places and that link is not expressed through language constructs.
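A minimal sketch of this idea: the two functions below look nothing alike, yet both depend on the same piece of information, so it is expressed once through a named constant. The limit of 20 and all names here are assumptions made for the example.

```python
# Hypothetical limit; the single place where this piece of knowledge lives.
MAX_USERNAME_LENGTH = 20


def is_valid_username(name: str) -> bool:
    """Validation depends on the limit."""
    return 0 < len(name) <= MAX_USERNAME_LENGTH


def truncate_for_display(name: str) -> str:
    """Display code depends on the same limit, through the same constant."""
    return name[:MAX_USERNAME_LENGTH]
```

If the literal 20 were written in both functions, they would have to change together whenever the limit changed, which is exactly the kind of duplication described above.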
At the same time, similar functions may evolve completely independently in the future. In this case their similarity is merely a coincidence and shouldn’t be considered duplication. Merging those similar pieces of code into one function would be a mistake. This mistake is easy to fix somewhere in the guts of your application, but it is far more costly to fix in public-facing interfaces.
Even if a piece of code has no duplicate, it may still make sense to extract it for the purpose of documenting it. Instead of writing a comment you can extract a new function with a descriptive name. Developers overlook comments more often than actual code, so a well-chosen function name is likely to stay relevant longer than a comment.
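Here is a small sketch of replacing a comment with a name. Instead of an inline expression annotated with "compute the customer’s age", the calculation gets its own function. The discount rule, the age threshold, and the function names are invented for illustration.

```python
from datetime import date

SENIOR_AGE = 65  # hypothetical threshold


def age_in_years(birth_date: date, today: date) -> int:
    """The name documents what the arithmetic below is for."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def ticket_price(base_price_cents: int, birth_date: date, today: date) -> int:
    """Seniors pay half price in this made-up example."""
    if age_in_years(birth_date, today) >= SENIOR_AGE:
        return base_price_cents // 2
    return base_price_cents
```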
Well-Known Patterns
If a piece of code is actually a well-known algorithm or design pattern, that is also a good reason to extract it. If the extracted code is named properly, an experienced reader can skip the well-known parts and focus on the parts that are specific to your code.
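For example, a binary search buried inside a lookup routine can be pulled out under its usual name. The sketch below is illustrative: `find_price` and the shape of the price table are assumptions, and in real Python code the standard `bisect` module would be a reasonable alternative to a hand-written search.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items)
    while low < high:
        mid = (low + high) // 2
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid
    if low < len(sorted_items) and sorted_items[low] == target:
        return low
    return -1


def find_price(price_table, product_id):
    """Look up a price in a table of (product_id, price) pairs sorted by id."""
    ids = [pid for pid, _ in price_table]
    index = binary_search(ids, product_id)
    return None if index == -1 else price_table[index][1]
```

A reader who knows binary search can skip the first function entirely and concentrate on what `find_price` does with the result.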
Dense Subgraph of Dependencies
Let’s imagine that we build a graph of dependencies between the lines of a function. Line X depends on line Y if line Y defines a variable that is used in line X.
It makes sense to extract a smaller function from a bigger one if the extracted code’s degree in the dependency graph, that is, the number of dependencies crossing the boundary between the extracted lines and the rest of the function, is relatively small. If that degree is small, then the number of input and output parameters of the extracted function will be small too. We achieve encapsulation by doing so.
If, on the other hand, the piece of code being extracted is highly connected with the rest of the code, we will need to pass every variable it uses as an argument. In this case the extracted code gains very little encapsulation.
At the same time, if the whole function is very highly connected, it can be difficult to find a piece of code to extract. In this case subdividing the function may only complicate things. Before subdividing such a highly connected function we can try reordering statements in order to create more tightly connected clusters of lines that we can extract.
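The idea can be sketched by counting the dependency edges that cross the boundary of a candidate block. Everything below is a hypothetical illustration: the six-line function exists only in the comment, its dependencies are written out by hand, and `boundary_degree` is a name I made up.

```python
def boundary_degree(dependencies, block):
    """Count dependency edges with exactly one endpoint inside the block."""
    block = set(block)
    return sum((user in block) != (definer in block)
               for user, definer in dependencies)


# Hypothetical function body, one statement per line:
#   1: a = read()
#   2: b = parse(a)
#   3: c = normalize(b)
#   4: d = validate(c)
#   5: e = load_config()
#   6: result = combine(d, e)
# Each pair is (line that uses a variable, line that defines it).
DEPENDENCIES = [(2, 1), (3, 2), (4, 3), (6, 4), (6, 5)]

print(boundary_degree(DEPENDENCIES, {2, 3, 4}))  # 2: one input (a), one output (d)
print(boundary_degree(DEPENDENCIES, {3, 5}))     # 3: a poorer candidate
```

Lines 2 to 4 form a good candidate: the extracted function would take `a` and return `d`. Lines 3 and 5 share nothing with each other and are tied to the rest of the code on every side.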
Functions with a Single Purpose
Many authors suggest that a function should have only one purpose. It means that once you see a function with more than one purpose, you need to split it up.
However, few of them elaborate on how to find this purpose. I would say that the most important thing is to look for it in the first place. If you wrote your function with a single purpose in mind, no matter what that purpose was, then other people looking at your code will see this purpose.
Writing code is all about communication. As long as you clearly communicate your intention, you are good. Robert Martin has a great chapter about functions in his book Clean Code [2].
When a Long Function Is Not a Problem
Contrary to popular belief, long functions may not be a problem if they actually consist of several pieces, each with an isolated scope. It can even be easier to read the sections of a bigger function one after another, because you don’t need to jump between the definitions of smaller functions.
If all or nearly all variables in the function have a small scope, then the long function that encloses them is not a problem. In this case it’s relatively easy to follow the control flow of the function. Sometimes it even makes sense not to extract these self-contained parts into separate functions. Having to jump back and forth between small functions in order to understand the purpose of the bigger function is not a pleasant experience either.
Uniformly structured functions are also not a problem. If you have a long function with a switch that has a thousand cases, then there is very little you can do about it. Extracting groups of cases into separate functions may only obscure the point.
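A minimal sketch of such a uniform function is a state machine transition table written as a chain of cases. The media player, its states, and its events are invented for this example, and Python’s if chain stands in for a switch.

```python
def next_state(state, event):
    """Return the next state of a hypothetical media player."""
    if state == "stopped" and event == "play":
        return "playing"
    if state == "playing" and event == "pause":
        return "paused"
    if state == "playing" and event == "stop":
        return "stopped"
    if state == "paused" and event == "play":
        return "playing"
    if state == "paused" and event == "stop":
        return "stopped"
    # Any other combination leaves the state unchanged.
    return state
```

Every added transition raises the cyclomatic complexity, yet the function stays easy to read, because each case has the same shape and none of them interacts with the others.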
1. Martin Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley, 2019
2. Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship, Prentice Hall, 2008. Chapter 3, Functions