To DRY or not to DRY?

Published November 13, 2018

Today’s post is written by guest author Damien Beaufils. Damien is a passionate developer and a software crafter. Convinced that well-designed software is at least as important as working software, he works as a Tech Lead in agile projects, or as a trainer on software development practices such as Test Driven Development. You can find Damien on his Twitter and LinkedIn accounts.

Also interested in having your writing on Fluent C++? Check out the guest posting area!

—

You have been hearing it since you started programming: you have to remove, remove and remove code duplication!

Why? If you have ever worked on a legacy project riddled with code duplication, you know that the same bug had to be fixed in several places, which drove you crazy. And that’s before we even talk about introducing new features.

Even quality tools like SonarQube report a code duplication percentage, with a heavy hint: if you have duplication, it’s bad. And if your manager sees these percentages, he may show up asking you “Why are we having 6% duplication on this project? You need to do something about it!”.

And in the end, they’re right: removing code duplication eliminates redundancy, which makes your product easier to maintain and easier to extend with new features.
It’s the famous DRY principle: “Don’t Repeat Yourself”.

Yes, BUT.

The full definition of DRY, as written in The Pragmatic Programmer by Andy Hunt and Dave Thomas, is this: “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.”

DRY is about knowledge, business domain, use cases. Not about code.

This means that there are cases where code is exactly the same in multiple places, but you should not remove this duplication. You should intentionally keep it in the code. DRY is about removing duplication of code that is technically the same AND that belongs to the same business domain.

Recognizing code that is technically the same is simple: your IDE or tools like SonarQube can analyze the code for you.

Recognizing code that is technically the same but belongs to a different business domain requires human analysis and thinking, and this is your responsibility as a software development professional.

In my opinion, there are at least 2 cases where you should not eliminate duplication. These cases represent small portions of your code, but they have a major impact on your product, its scalability and its correctness.

Case 1: removing duplication “a priori”, a.k.a. DRY instead of KISS

Say that you are implementing a new feature in your code. If you work with an Agile methodology, you rely on a user story to understand the feature that the business wants and to implement it in code. The functionality described in the user story is simple for now, but it will become more and more complex with other user stories in the future.

And while developing the first user stories, you come across a case of code duplication: the code you just wrote is identical in every way to the code of another feature developed in the previous days. Your reflex: DRY. You find a way to reuse the same piece of code for the two features, whether the duplicated code is complex or not.

A few weeks later, the business domain of your application is getting more complex and the two features that shared this identical code both evolve in their own directions.

A commonly observed bias is that we want to keep the refactoring we did previously. After all, it was useful and legitimate, and we also spent time factoring the code out, so we want to keep our return on investment (loss aversion bias).

But the features are now so different that you find yourself passing more and more arguments to the method and introducing conditions that execute one part of the code or another, depending on the arguments.

Examples:

C doSomething(std::optional<A> const& a, std::optional<B> const& b) {
    // do something common for both objects
    if (a) {
        // do something specific for a
    } else {
        // do something specific for b
    }
    // ... build and return the resulting C
}

with the possible calls:

doSomething(a, std::nullopt);
doSomething(std::nullopt, b);

C doSomething(A const& a, B const& b, bool isA) {
    // do something common for both objects
    if (isA) {
        // do something specific for a
    } else {
        // do something specific for b
    }
    // ... build and return the resulting C
}

with the possible calls:

doSomething(a, b, true);
doSomething(a, b, false);

In these cases, it can be beneficial to recreate the duplication for each of the calling methods, so that each copy implements only the use case it needs, rather than cramming every imaginable use case into the same method. This also makes the code easier to test and easier for the next developers to understand.
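Here is a minimal sketch of what recreating the duplication could look like. The types A, B and C are left abstract in the examples above, so the ones below are hypothetical stand-ins, and the "common" step is a made-up string prefix. Each use case gets its own function with no optional arguments and no boolean flag, and the common step is copied on purpose so the two functions can evolve independently.

```cpp
#include <string>

// Hypothetical stand-ins for the A, B and C types of the examples above.
struct A { int value; };
struct B { std::string label; };
struct C { std::string description; };

// One function per use case: no std::optional parameters, no bool flag.
C doSomethingForA(A const& a) {
    std::string const common = "processed: ";     // common step, duplicated on purpose
    return C{common + std::to_string(a.value)};   // step specific to A
}

C doSomethingForB(B const& b) {
    std::string const common = "processed: ";     // same common step, kept separate
    return C{common + b.label};                   // step specific to B
}
```

Calling doSomethingForA(A{42}) returns a C whose description is "processed: 42". If the common step later proves to be genuinely the same knowledge for both use cases, it can still be extracted into a small helper that both functions call, without reintroducing the flag.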

Case 2: different lifecycles

Imagine that you are working on an online sales website.

On this website, a customer can browse your catalog and order products. Once an order has been delivered, the customer can download an invoice from his account: this invoice is generated automatically with the current date and up-to-date company information (logo, address, capital, etc.). The invoice shows the date of purchase and the products ordered, with their quantities and prices.

In your code base, a product is a class with simple fields: identifier, name, description, picture, price. You will probably have a database to store these values, and the product table will be very similar to the class in your code. A product is used in the catalog, and in the invoice, to retrieve descriptions and prices. Everything is fine, it’s simple.

The website is live in production, and customers order products. It works perfectly, and the company generates profits. Several months go by, and the website evolves: new products are added, others are removed, others have their price reduced during promotional campaigns. All of this is done easily through a back office that you developed. Life is beautiful.

Then one day, the accounting department of your company calls up the IT department in charge of the website, and politely asks “What’s this mess with the invoices?!?”.

Your first reaction: look at the logs. Nothing, no exceptions.

Your second reaction: look at the status of servers and resources. Everything is fine.

When you ask the accounting department for details, they explain that several customers who downloaded an invoice for an order placed several months ago ran into a problem and contacted customer service: the total on the invoice downloaded today does not match the amount debited when they ordered.

Why? Because the ordered products have meanwhile dropped in price.
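The mechanism fits in a few lines. This is a simplified illustration under stated assumptions, not the website’s actual code: the names Product, Catalog, OrderLine and invoiceTotalInCents are hypothetical, and prices are stored as integer cents. The invoice is built at download time by looking up each ordered product in the current catalog, so a price change made in the back office silently changes the total of past orders.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model: one Product class shared by the catalog and the invoice.
struct Product {
    int id;
    std::string name;
    std::string description;
    std::string picture;
    long long priceInCents;  // current catalog price, editable from the back office
};

using Catalog = std::map<int, Product>;

// An order only remembers which products were bought and how many.
struct OrderLine {
    int productId;
    int quantity;
};

// The total is recomputed from the *current* catalog every time an invoice
// is downloaded, so later price changes leak into old invoices.
long long invoiceTotalInCents(std::vector<OrderLine> const& order, Catalog const& catalog) {
    long long total = 0;
    for (auto const& line : order) {
        total += catalog.at(line.productId).priceInCents * line.quantity;
    }
    return total;
}
```

With a product priced at 2000 cents and ordered twice, the function returns 4000; after the back office lowers that product’s price to 1500 cents, the very same order reports 3000.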

But an invoice should be immutable. It must show what the customer has purchased exactly, at the price he paid for it. It does not matter if the product has dropped in price since then.

The accounting department goes on and asks: “is this a bug that was added with the website update last week?”.

With a chill running down your spine, you answer “no, it has always been like that”, knowing deep down what it means: all the invoices ever downloaded since the opening of the website are potentially incorrect. This represents a legal risk for the company, and a hit to its image.

Until you fix the bug, the accounting department has to manually reissue the invoice for every customer who wants to download one. And to do this, they have to find the price of each product at the time the customer purchased it.

Meanwhile, your communications department prepares an e-mail to all customers, informing them of a possible error in their invoices and inviting them to contact customer service with any questions.

Customer service is then overloaded.

In short, everyone knows that all this is the fault of IT.

Why did this happen? Because of the strong coupling introduced by reusing the same Product class in your product catalog and in your invoice, in the name of DRY.

Yes, the code was exactly the same, but the data lifecycles were radically different: a product in a catalog can be changed at any time, while a product in an invoice is immutable, ad vitam aeternam.
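One way to separate the two lifecycles, sketched here with hypothetical types (CatalogProduct, InvoiceLine, snapshot and invoiceTotalInCents are illustrative names, and prices are integer cents), is to give the invoice its own representation of a purchased product and copy the name and unit price into it when the order is placed. The fields are duplicated on purpose: CatalogProduct keeps changing, while InvoiceLine is a frozen snapshot whose members are const.

```cpp
#include <string>
#include <vector>

// Hypothetical catalog-side type: mutable, its price can change at any time.
struct CatalogProduct {
    int id;
    std::string name;
    long long priceInCents;
};

// Hypothetical invoice-side type: deliberately duplicates name and price,
// but as an immutable snapshot taken when the order is placed.
struct InvoiceLine {
    std::string const productName;
    long long const unitPriceInCents;  // price actually paid, frozen at order time
    int const quantity;
};

// Copies the relevant catalog data into an invoice line at purchase time.
InvoiceLine snapshot(CatalogProduct const& product, int quantity) {
    return InvoiceLine{product.name, product.priceInCents, quantity};
}

// The total depends only on the invoice's own data, never on the catalog.
long long invoiceTotalInCents(std::vector<InvoiceLine> const& lines) {
    long long total = 0;
    for (auto const& line : lines) {
        total += line.unitPriceInCents * line.quantity;
    }
    return total;
}
```

Lowering a CatalogProduct’s price after snapshot has been called leaves every existing InvoiceLine, and therefore every invoice total, unchanged.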

All because you wanted to remove code duplication.

Conclusion

DRY: Don’t Repeat Yourself. In the world of software development, this rule targets code that is technically the same and that belongs to the same business domain: code that has the same lifecycle and the same impacts.

Technically identical code can be detected by you or by your tools. To recognize code that belongs to the same business domain, you have to ask your Product Owner.

In the case of the product in the catalog vs. the product in the invoice, if one of the development team members had asked “Do you agree that if we change the name or price of the product, it will be changed in the catalog but also in the next invoice downloaded by the customer?”, the Product Owner would immediately have flagged the error, and it would have cost the company nothing.

And if SonarQube or other similar tools tell you that you have code duplication, identify and explain healthy duplication to your team, and to anyone worrying about it.
