What is duplicate code?
Duplicate code, as the name suggests, is a line or block of code that is repeated within the same file or elsewhere in the same project. Many people consider code duplication acceptable, but in reality it causes bigger problems for your software than you might expect. Even blocks of code that are not identical but perform the same function count as duplication.
The main cause of duplicate code is copy-and-paste programming. Developers justify the practice by arguing that the copied section already works, and they rename variables and rearrange the structure to make it look original. They overlook that many of the programs they borrow from were written for academic purposes, not for commercial development, and so contain redundant code that is there to make the program easier to understand.
Efficiency is the last priority in an academic environment, while it is the first in the commercial sphere. Developers also resort to copy-and-paste programming when they do not know the language well. Let’s talk about the four major ways in which code duplication harms your software development.
Top 4 reasons why code duplication is harmful:
1. Duplicate code makes your program lengthy and bulky:
Many programmers feel that if the software is working properly, there is no reason to fix code duplication. That overlooks the fact that you are unnecessarily making your software bulky. You might argue that a few extra blocks of code take only a few milliseconds to run. We agree, but only if the software is used a handful of times. Code written for commercial purposes or web applications is executed thousands or even millions of times every minute.
Each millisecond of delay adds up to longer waits and greater space requirements, both on the user’s local machine and on your servers. Well-written code with as little duplication as possible helps your program run faster and occupy less space. The days when users were willing to wait are also gone. Now everything needs to be fast and smooth.
2. Duplication decreases your code quality:
Having duplication is fine as long as you plan to throw your software away soon. Code quality is a necessity if your software is to survive for long. Duplicate code is a code smell, and it increases the technical debt associated with your code. The cost of repaying this debt is the time and money needed to pay a developer to simplify or de-duplicate the code. The interest is the decreased productivity of developers in the meantime.
Sometimes it is impossible to refactor a duplicated block, but the aim should be to reduce technical debt as much as possible. Doing so raises the quality of your code.
3. Shotgun Surgery:
Suppose you write buggy code. You do a code review to find the issue and then fix it. Now replace this scenario with buggy code that has been duplicated. You have to fix every location where that code appears, which costs you time, efficiency, and sometimes your temper. This situation is called shotgun surgery, and it is pretty nasty for any developer.
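To make the problem concrete, here is a minimal sketch of a fix that reached only one of two pasted copies. The function names, the discount rule, and the amounts are all hypothetical and exist only for illustration.

```python
# Hypothetical scenario: a 10% discount rule was pasted into two functions.
# The intended rule is "discount at 10000 cents or more".


def invoice_total(prices_in_cents):
    """Copy 1: already fixed to use `>=`."""
    total = sum(prices_in_cents)
    if total >= 10000:
        total = total * 90 // 100
    return total


def cart_preview_total(prices_in_cents):
    """Copy 2: missed during the fix, so it still uses the buggy `>`."""
    total = sum(prices_in_cents)
    if total > 10000:
        total = total * 90 // 100
    return total


order = [6000, 4000]
print(invoice_total(order))       # the fixed copy applies the discount
print(cart_preview_total(order))  # the forgotten copy does not
```

For the same order, the two copies now disagree, which is exactly the kind of inconsistency that appears when a fix has to be repeated by hand in every location.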
4. Increases security risks:
Admittedly, this is more of a copy-and-paste programming issue than a duplicate code issue, but since most repetitive code is produced by copying and pasting, it deserves a mention. When someone takes code from somewhere else and adds it to their software, they tend to forget about the holes and loose ends that the copied code has. Every copy of a flawed block gives attackers another opening to exploit, which makes your software vulnerable.
How to detect code duplication?
By now you might be convinced that duplicate code is harmful to your software. But how do you check whether your software has duplicate code? After all, it consists of multiple files with hundreds of lines of code in each. Going through every line one by one is not a feasible solution unless you have just one file with a few lines of code.
This is where automated tools come in handy. With a code review tool like Codegrip, you can find duplication within a few seconds, irrespective of the number of files or lines of code. Codegrip shows not only the duplication percentage but also the file path and line number of each block, along with the file path and line number of its duplicate. Apart from duplication, Codegrip also shows bugs, code smells, security vulnerabilities, and code coverage of the codebase.
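To give a rough idea of what automated detection involves, here is a minimal sketch of one naive approach: slide a fixed-size window over the non-blank lines of every file and group the windows whose text is identical. This is an illustration under stated assumptions, not a description of how Codegrip or any other tool works; the function name `find_duplicate_blocks`, the `window` parameter, and the sample files are all hypothetical.

```python
from collections import defaultdict


def find_duplicate_blocks(files, window=3):
    """Return groups of (file name, start line) whose `window` lines match.

    `files` maps a file name to its source text. Lines are stripped of
    surrounding whitespace and blank lines are skipped before comparing.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Keep the original 1-based line number next to each cleaned line.
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if line.strip()
        ]
        for i in range(len(lines) - window + 1):
            block = tuple(content for _, content in lines[i:i + window])
            seen[block].append((name, lines[i][0]))
    # A block is a duplicate only if it was seen in more than one place.
    return [locations for locations in seen.values() if len(locations) > 1]


# Hypothetical input: two small files that share a three-line block.
files = {
    "a.py": "x = 1\ny = 2\nz = x + y\nprint(z)\n",
    "b.py": "print('start')\nx = 1\ny = 2\nz = x + y\n",
}
duplicates = find_duplicate_blocks(files)
print(duplicates)
```

A sketch like this only catches blocks that match exactly after whitespace is stripped. It misses copies in which variables were renamed, which is one reason dedicated tools are worth using.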
Don’t Repeat Yourself (DRY):
Given the serious risks and challenges that come with duplicate code, there is one well-known way of reducing it. By following the DRY, or Don’t Repeat Yourself, principle, you stay away from duplicate code as often as you can. Instead, you replace the duplicate code with abstractions or use data normalization. To reduce duplication within a function, you can use loops and trees.
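As a small illustration of replacing repetition inside a function with a loop, consider the sketch below. The field names and function names are hypothetical and chosen only for the example.

```python
# Repetitive version: the same check is written out once per field.
def validate_user_wet(user):
    errors = []
    if not user.get("name"):
        errors.append("name is required")
    if not user.get("email"):
        errors.append("email is required")
    if not user.get("country"):
        errors.append("country is required")
    return errors


# DRY version: the rule is stated once and a loop applies it to each field.
REQUIRED_FIELDS = ("name", "email", "country")


def validate_user(user):
    return [
        f"{field} is required"
        for field in REQUIRED_FIELDS
        if not user.get(field)
    ]


print(validate_user({"name": "Ada"}))
```

Both versions report the same errors, but adding a new required field to the DRY version means changing one tuple instead of pasting another `if` block.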
While DRY can be applied to most kinds of duplicated code, in some places repetition becomes a necessity, as in many instances when using Java. Hardcore coders also have trouble sticking to the DRY principle because of their WET, or We Enjoy Typing, approach to programming. But even if you enjoy typing, the WET principle just translates to Waste Everyone’s Time.
How to remove duplicate code?
Duplicated lines of code should be replaced with a single method. The fix is to extract the common behavior into a method and have every former copy delegate to it.
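Here is a minimal sketch of that extraction, assuming two functions that used to clean up an email address with the same pasted expression. All names are hypothetical.

```python
def normalize_email(raw_email):
    """Extracted helper: the one place that knows how an email is cleaned up."""
    return raw_email.strip().lower()


def register(users, raw_email):
    # Previously this function contained its own copy of strip().lower().
    email = normalize_email(raw_email)
    users.add(email)
    return email


def is_registered(users, raw_email):
    # The second former copy now delegates to the same helper.
    return normalize_email(raw_email) in users


users = set()
register(users, "  Ada@Example.com ")
print(is_registered(users, "ADA@example.com"))
```

If the cleanup rule ever changes, only `normalize_email` has to be edited, and both callers pick up the change.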
If duplicate code is present in:
- The same method, create a local variable and reuse it.
- The same class, use the Extract Method refactoring.
- Subclasses of the same hierarchy, use Extract Method and then Pull Up Method.
- Two different classes, move the shared code into an object that both classes use.
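The subclass case in the list above can be sketched as follows. The class names and the salary calculation are hypothetical; the point is only that a method once duplicated in each subclass now lives in the shared parent.

```python
class Employee:
    """Base class that owns the method pulled up from the subclasses."""

    def __init__(self, name, monthly_salary):
        self.name = name
        self.monthly_salary = monthly_salary

    def annual_salary(self):
        # Previously duplicated in Engineer and Manager; now defined once.
        return self.monthly_salary * 12


class Engineer(Employee):
    def title(self):
        return f"Engineer {self.name}"


class Manager(Employee):
    def title(self):
        return f"Manager {self.name}"


print(Engineer("Ada", 5000).annual_salary())
print(Manager("Grace", 7000).annual_salary())
```

Each subclass keeps only what is genuinely different about it, while the shared behavior is inherited.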
Modules with similar algorithms are the hardest to find, as duplication detectors generally cannot recognize them. A solution is to use design patterns. The Template Method design pattern can be applied to algorithms within the same class hierarchy, whereas the Strategy design pattern can be applied to algorithms in different class hierarchies. Polymorphism should be used in place of switch cases and if-else chains that test for the same condition.
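As one possible illustration of the Template Method pattern, the sketch below keeps the shared steps of an export algorithm in a base class and lets subclasses supply only the parts that differ. The exporter classes and the row format are hypothetical.

```python
from abc import ABC, abstractmethod


class ReportExporter(ABC):
    def export(self, rows):
        """Template method: the shared algorithm is written here, once."""
        lines = [self.header()]
        lines.extend(self.format_row(row) for row in rows)
        return "\n".join(lines)

    @abstractmethod
    def header(self):
        """Return the first line of the report."""

    @abstractmethod
    def format_row(self, row):
        """Return one formatted line for a single row."""


class CsvExporter(ReportExporter):
    def header(self):
        return "name,score"

    def format_row(self, row):
        return f"{row['name']},{row['score']}"


class TextExporter(ReportExporter):
    def header(self):
        return "NAME SCORE"

    def format_row(self, row):
        return f"{row['name']} {row['score']}"


rows = [{"name": "Ada", "score": 3}]
for exporter in (CsvExporter(), TextExporter()):
    print(exporter.export(rows))
```

Because callers simply invoke `export` on whichever exporter they hold, there is also no need for an if-else chain that checks the output format in several places.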
Advantages of removing code duplications
- Merging duplicate lines of code simplifies the structure of the code and reduces file size.
- It also increases the maintainability of the code and reduces the technical debt over time.
- It reduces the number of security vulnerabilities.
- It keeps your code clean, which in turn helps you deliver feature support and new updates more quickly.
When can you ignore duplications?
There are very few cases where you would want to ignore duplicate code. I recommend keeping your duplication percentage to a minimum. Every company has its own threshold for duplication percentage that matches its quality standards.
Having 0% duplication is very difficult, and it gets tougher as the number of source files increases. The most important thing is to follow proper coding principles like DRY and to have an efficient code review process in place. It is important to conduct code reviews regularly so that technical debt does not build up later on. Even if you outsource your projects, you need to check for duplication because, in the end, it is your responsibility to maintain and update the software down the line.