How To Avoid Flaky Tests - becomingbetterdeveloper

How often did you hear from the software engineers, “Ah, it’s a flaky test, just ignore it”? Probably each of us has experienced or even said that.

Flaky tests are annoying. Your feature is complete, the colleagues helped with the code review, and you are just a step behind hitting the Merge button. But that single test keeps failing, you restart the continuous integration job in the pipeline, and wait for a miracle. If you are lucky, your pull request will be green. If not, you have to repeat the flow once again.

Flaky tests can waste time, delay feature delivery, and impact team mood. Usually, the flakiness comes from a few sources in the code. Let’s explore how we can make the tests rock solid and reliable.

Order Dependency

The common error leading to test inconsistency is not preserving the order. That is possible when operating on unordered collections like a Set.

For example, imagine that the code returns a list of client names. The list is hardcoded for simplicity. Check the following piece of code.

def get_client_names():
    return ["James", "Anna", "Tony"]

Now we add a test for the function.

assert get_client_names() == ["James", "Anna", "Tony"]

This test should pass as we keep the item’s order of the asserted expression. If we change the order, the test will fail in the example below.

assert get_client_names() == ["Tony", "James", "Anna"]

However, if we change the type of the collection from a list to a set, the test will start passing.

assert set(get_client_names()) == set(["Tony", "James", "Anna"])

That is because different collections compare the data differently. In our case, the list is an ordered collection. Meanwhile, the set is an unordered collection containing unique elements. When we compare sets, we ensure the elements are the same, ignoring the order. But when we compare the list, we check if the order is preserved. Otherwise, the test will fail.

Keep this in mind, as different programming languages can have different implementations of the collections. When you write tests, always ask yourself: do you want to assert the precise order of the elements, or is it enough to check the inclusion of the elements, ignoring the order?

Time Dependency

Another common mistake causing flakiness in tests is incorrectly working with time.

Consider the following example.

def create_user(name: str) -> dict:
    return {
      "name": name,
      "created_at": datetime.now()
    }

..

timestamp = datetime.now()
user = create_user("James")
assert user == {"name": "James", "created_at": timestamp }

When it comes to testing, you will struggle. There is a high chance that the assertion will fail for the property created_at. Sometimes the test will pass, but sometimes it will not.

All of that is because of the dependency of the datetime.now(). It is impossible to get the exact timestamp that the function returns.

This is a common mistake when working with time. How can we resolve it?

There are a few options to bypass dependency limitations.

The first idea is to utilize a third-party library for mocking dates and times. For instance, the library FreezeGun overrides the timestamp in tests for Python. It allows setting a specific date and time for the function to return.

However, what should you do if there is no library you can add to the project?

The second idea is to follow dependency injection principles.

def create_user(name: str, timestamp: datetime = datetime.now()) -> dict:
    return {
      "name": name,
      "created_at": timestamp
    }

...

timestamp = datetime.now()
user = create_user("James", timestamp)
assert user == {"name": "James", "created_at": timestamp }

In this example, we inject the timestamp instead of creating one inside our function. Now we can pass any value and resolve the flakiness connected to the date.

Integration with Libraries

When you integrate third-party libraries into the project, you might need help with implementing tests. All of that depends on the framework and the libraries you are working with.

For example, Ruby and Python will allow you to mock many dependencies and you will not see any difference. But other programming languages like Java will not always allow you to override the behavior in tests.

How do we keep the necessary library and implement unit tests seamlessly?

The best approach is to apply the design pattern Facade or any similar structural pattern. It encapsulates the usage of the library and other implementation details. The facade class can be easily mocked in tests.

The bonus point is that when you decide to replace the library with another one, your interface will remain the same. All replacements will happen within the wrapper class. A common example is to have a wrapper class for the networking library.

Flaky tests in code are a sign of bad architectural design. Just patching a single test will not resolve all flakiness. We need to tackle the problem from a different angle and recognize the issue before it appears.

The problems stated above are the most common mistakes in unit and integration tests. If you learn to prevent them, your team will make a big step in code quality. Happy coding!

Looking for how to grow as a software developer?

Do you want to learn the essential principles of a successful engineer?
Are you curious about how to achieve the next level in your career?

My book Unlock the Code offers a comprehensive list of steps to boost 
your professional life. Get your copy now!