Make Bad Code Good
Refactor Broken Java Code for Fun and Profit
Summary
Every big project contains a section of indispensable, incomplete, and incomprehensible code. Careers rise and fall working on it, and the product lives or dies according to its quality. In this article, John Farrell suggests practical steps for revising, refactoring, and rebuilding bad code so that it performs the functions required of it, and becomes good code that you're proud to maintain. By following these steps, you can save the project and become a well-paid, respected team player with a corner office, without Losing Your Sanity or Working Late.
(4,800 Words)
By Dr. John Farrell
o, you've inherited some code It's 50,000 lines of the oldest code on the project The authors have left the company and will not return your calls It's undocumented, badly designed, and buggy;... yet the rest of the team uses IT all the time. there's absolutely no option but to make it work, and what's your job. What do you do?
Do not stress. This article will make some concrete suggestions as to the sorts of things you can do, from the simple and mindless to the complex and dangerous. The process can be mostly evolutionary and absolutely safe. In a couple of months, colleagues .
Some of the suggestions that follow are nothing new Indeed, concepts such as documenting code and following coding standards have been drummed into programmers since they went to school However, the usual reasons given are motherhood statements -.. Things you would not say the Opposite of (Be a good boy, write understandable code, for example). I've Tried to Explain Why you would.
Start with the easy things When making bad code good, it's best to start with easy changes, to make sure you do not break anything. A large lump of bad code breaks easily, and nobody can be expected to fix bugs in it without some PREPARATION.
Fix the Javadoc comments You never realize a comment's importance until you need to read it. Even if nobody else reads it (and usually, they do not), Javadoc comments are important for the code authors to help them remember what the code / class / Parameter Should Do. PIECES OF INFORMATION Particularly Important to include Are:
Whether an object parameter may be null, and what it means if it is Whether a parameter needs to be mutable or not Whether a return value is mutable Whether a return value may be null Whether changes to the return value affect the returner's internal state
Fixing the Javadoc comments does not mean just adding them where there were none before. You should also delete misleading and trivial comments (such as documenting setters and getters) as they engender a false sense of security. Important information exists for inclusion in Javadoc comments SO There's no point Wasting the time of potential readers.
When you write new methods and classes, try to include some Javadoc comments (just a few) to explain what they do. Even if it's obvious to you, the next person to get the code might thank you for the effort. If you don ' t know what to write in the Javadoc comments, you have no business writing the code.Javadoc comments can also serve as a record of what you learned about the code. When you figure out what a particularly tricky, clever, or ugly method does ( especially a private or protected method with no implied contract), write a comment to record it for posterity. This expands the amount of the code under your control and helps you later when you're trying to decipher something related.
Of course, Javadoc comments should only explain the effect of a method, never the implementation (because you might change that). If code within a method needs explanation, use standard code comments. A school of thought exists that says with good enough identifiers you do not need code comments at all. I'm not that extreme, but I take the point. Indeed, many code comments do not tell you anything new. I urge you to leave those uninformative comments out, as they only serve to make Simple Code Verbose and Intimidating.
Fix the code formatting Code formatting standards always spark contention. I'm in favor of coding standards, as long as they're the ones I already use, of course. If you take over a piece of badly formatted code (different from how you would do it), and you're having trouble reading it, you'll want to reformat it. Having a project-wide coding standard makes it less defensible for people to reformat code to their personal unreadable style and more likely that any particular piece of code makes sense to you the first time you read it. The most widely accepted coding standard is probably Sun's.The advantage of reformatting badly formatted, badly written code is that as you change it, you leave a clear trail indicating where you've been. If you find yourself looking at a badly formatted method, you know not to trust it because the expert (that's you) has not put a seal of approval on it. moreover, if you reformat code, at least you know you've Read it. You May Not Have Understood IT, But Y OU Might NOTICE REPEATED Sections of Code, Calls To Obsolete Methods, or Any Number Obviously Bad Things. If You're Lucky, You Might Find A Bug Caused by Indentation That Doesn't Match The Brackets.
While you're messing with the code, you might like to rename identifiers to be meaningful. I have almost given up using abbreviations in identifier names because I find them difficult to understand and hard to pronounce. There's no point having an easy-to- Type Identifier if it makes so little sense that You can't Remember it. and please, use allcaps identifiers Only for firm variables!
Finally, I think it's just dishonest to claim that code that does not meet project coding standards is good code.Follow project conventions Your project, if it is a big one, probably includes some conventions Examples that I have encountered include.:
Error logging and tracing:. The project may specify a standard interface to the logging system Code that does not use the standard interface may write directly to System.out, causing messy output that can not be turned off by the logging system configuration With. a standard logging interface, logging implementations can be changed as the project evolves without breaking existing code Access to resources:.. There may be conventions for location of images to display in the user interface code that does not comply with the conventions, for example by referring to absolute file locations rather than the class path, will break as soon as the project needs to be packaged for release Use of CORBA:.. Distributed object systems, such as CORBA, add complexity to a project CORBA's limited capability to pass objects as Arguments CAN, Unless The Proprammers Are Vigilant, Restead of Proper Objects. Project Standards Can Require That Corba Objects Be wrapped in an OO / CORBA impedance mismatch adapter, which restricts the CORBA types to the remote access layer where they belong. Similarly, ORB instance creation is a task that belongs in the remote access layer, not in general code.
Changing bad code to follow these project conventions yields low-hanging fruit in the form of immediate results. Inventing conventions where they do not exist proves more difficult, but in all likelihood is worth the effort.
Refactoring With any luck, by fixing the Javadoc comments, reformatting the code, and changing the code to comply with project conventions, you've learnt something about the code Armed with this experience, you're prepared to change it -. The refactoring process. In case you're unfamiliar with the term, refactoring means changing the design of existing code. (See Resources for more on refactoring.) Write automated unit tests Unit tests are one of those things that we all think of as being done by the mythical perfect programmer, yet something for which the rest of us just never find time (but read Test Infected). However, with some experience, you'll find that unit tests save you time. Automated unit tests help you go fast. Here's HOW.
In all likelihood, if you are working on a large piece of bad code, it's part of a bigger project. It might be a distributed system with assorted servers and associated processes, and some sort of user interface. The only way to tell whether your code works is to start up some of the system, try a few things, and see if it works. Starting all these processes, and remembering the whole sequence, wastes your time. You're employed to fix bugs, not wait for computers. The solution: write small, quick tests that you can run quickly and that will tell you whether or not the code works A JUnit test gives you a green light or a red light Is not that easier than checking all the output to make.. SURE IT IS CORRECT? WITH PROPER Procedures, Testing The Correctness of The Code Becomes An Absolute No-Brainer.
Of course, writing unit tests is not easy, which is one of the reasons they're not as popular as they should be. Writing unit tests for distributed systems is very difficult indeed. However, writing correct code without testing proves even harder than that , so the easy way out is to write unit tests. you'll find that if you start with unit tests for simple parts of the code that have small interfaces, you'll eventually have the skills to tackle larger and more complex parts of the code. And almost certainly, you'll find some embarrassingly stupid bugs along the way.Unit tests are very important because they tell you when you break things. And things are very likely to be broken, given the daring nature of the upcoming refactorings. By The Way, IF You Break Something, And Somebody Else Detects It First, Your Unit Tests Need To Be Improved. Other People Are Handy Like That.
You are not gonna need it One of the Extreme Programming mantras says, "You are not gonna need it." That is, unless you have a requirement for some code today, you have no reason to write it, and to make your job easier, you should not Unnecessary code becomes particularly easy to detect in code already in use -.. it's the bits nobody calls because they do not work Usually, the requirement has lapsed because nobody expects it to ever work in this. Case, It's Easiest to delete the code. Think of every line you dele NERER NEED To Understand, Document, Reformat, or Refactor.
Also, find ideas that were started but never completed, and delete them. You'll find that a lot of refactorings or design decisions that you do not understand were made only to accommodate changes that never saw the light of day, and by deprecating those features, you can simplify the design. If it later turns out that you need the feature back, you can put it in and do it right.Another fertile source of unnecessary code includes completed but nonetheless unused features. These are a consequence of designing .
Break Up Big Classes with big classes the usual causes Are:
No clear definition of what the class should do, resulting in it doing rather a lot of different things Out-of-control inner classes Numerous static and instance methods Excessive numbers of convenience methods Cut-and-pasted code
My rule of thumb states that 500 lines of code amounts to a big class. Classes over 1,000 lines are bad and must be fixed, and classes over 2,000 lines equal acts of terrorism. In the paragraphs below you'll find suggestions for five areas on How to Break Up Big Classes.
. Firstly, extract the named inner classes Some people write inner classes so they can achieve friend classes as you would in C ; for those people there is no solution but to stop them from writing Java However, developers create many inner classes simply to put. them near the code from which they are used. in such situations the inner classes can easily be extracted. However, tiny, anonymous inner classes are usually so intimately attached to their containing class that they can safely be left in the code. If they really are a problem, give them a name and move them out.Secondly, examine the class's static and instance (nonstatic) methods. If you have a lot of both, your class may be trying to be two different things. Perhaps the static methods implement some sort of singleton, while the instance methods implement things referred to by the singleton. If that's the case, it will not be much effort to break the class in two, with one taking over the responsibilities of the singleton and th e other taking over the responsibilities of the instances. You might also find some static methods that just provide some useful functionality. They can be extracted into a tools class. The difficult part about this sort of refactoring is finding all the references to this class that No longer compile. Refer to the "Tools" section Below for Some Ideas.
A third idea:. Closely examine the class's instance variables If two or more must be kept in sync (for example, a map from As to Bs and the reverse map from Bs to As), you should extract those variables into a new class. The new class's methods will replace any direct access to those instance variables in the original class. By placing all of the objects' behavior into one class, the relationship between those variables will be more aptly defined and more difficult to break.Fourthly, if your class is one of several implementations of an interface, you might find that all implementations of that class share some common behavior, but the code to implement it is duplicated in each class - a sure sign that you need an abstract superclass that implements the shared behavior. By taking the shared behavior up into the superclass, you're removing code from two classes, and coincidentally documenting the expected behavior of implementations of the interface. At the very least, you can write an Empty Abstract Superclass, Extend It in Each Implementation, And Move Shared Behavior Into The Superclass As You Resolve Incibilities.
When dealing with interfaces of classes, it's best to keep them small. The Law of Demeter says that you should never write code that invokes a getter on an object returned from a getter, for example x.getBlah (). GetFoo (). Instead , x's class should have a getFoo () method that does that for you. I think that law is wrong. If x.getBlah () returns a complex object, the public interface expands way out of control. Furthermore, changes to the interface of x.getBlah () cause changes to the interface of x For that reason, I suggest exactly the opposite:.. Convenience methods with simple alternatives should be removed from the public interface of the class If you suspect that a public method does not need to Be public, you can do an experiment by deprecating it. You Won't break the build, and you'll soon find out ket.................... ..
Finally, it's my conviction that big classes with multiple instance variables and synchronized methods represent reliable sources of deadlocks. Programmers often use synchronization to protect against concurrent access to instance variables that could corrupt relationships among those variables. For example, consider the case above where we had a map and its reverse as instance variables of a class. Obviously, synchronization should be used to ensure that updates to the pair of variables remain consistent. If the class has another, separate pair of related variables, synchronization should be used to ensure their consistency as well. However, as the instance has only one synchronization lock, access to the separate pairs of dependent variables is sequentialized as well, although not required. With the inevitable addition of a little more complexity, the unnecessary synchronization will result in a deadlock When all this is being attempted was to protect the instance variables. The R lated variables should be extracted into subclasses, where they can have their own synchronization locks.Break up big methods Just as big classes prove difficult to understand, so do big methods, whose usual causes include:
Too Many Options, Causeing the Method to Do Too Many Things Not Enough Support from Other Methods, Causeing The Method To Do Tasks At a Lower Level Than It Should Overly Complicated Exception Handling
My rule of thumb for methods:.. If I can not fit the entire thing on one screen, it's too long It's an entirely practical rule with no theoretical basis, but it seems right and works for me Remember though, when you read a method of 100 statements for the first time, it looks like 100 unrelated statements, and it is only after some study that you can see the internal structure. Life would be easier for you and for others if you made that internal structure explicit.The most obvious target when attacking overgrown methods is code repeated in multiple methods. you would have noticed these when you were reformatting the code. A typical case occurs when a getter method may throw an exception, for example to indicate that the value is not available. Of course, everywhere you need to get the value, you handle the exception and log an error message. This means that in every place you get the value, instead of one simple function invocation you have five or six lines of code. Duplicate those in 10 Methods Where You NEED To Look Up That Value, And You Have 50 or 60 Lines of Bad Code, Even Though All you are try to do is get some value.
To solve the problem, you can start by extracting the five or six lines of code into a new private (or protected) method. Decide on some way of handling the exceptional condition, for example by returning a default value, or by throwing another exception type that is propagated by the caller. Code the exception logging and handling only in the new method, and fix the callers to simply invoke the new method. This suggestion seems entirely trivial and it's basic object-oriented practice, but I have seen good programmers fall into the trap many times The best defense:. refuse to write any line of code twice, and always make a new method in preference to duplicating even a small piece of code.Loops serve as another obvious target for removal from a method Often. a method will call another method to get a Collection back from it, and immediately iterate over the Collection This is known as mixed levels of abstraction -. the caller does not need to look inside the collection to get it, but t he caller does need to look inside to use it. It's cleaner to write a method that will do the iteration over the collection for you, so the original method can use the collection without being concerned about its implementation. By the way, studying the Java Collections API CAN SAVE You a Lot of Effort in this Type of Code.
Once you've extracted any loops that do not belong, it'll be easier to read the method. You'll be able to see what is being done to what without the distraction of loop variables, indentation, and Iterator declarations. You might then notice that a chunk of the method only works with one of the parameters, and the results of that computation are combined with the values of the other parameters separately. This is a perfect opportunity to extract the operations on that parameter into a separate method . If you give the new method a good name, it'll be much clearer as to what's happening.Moving on, exceptions represent an excellent source of long methods and hard-to-read code. As exceptions are a new thing for most people (since Java was invented, or C became popular), there's not much folklore on how to use them. Even worse, as exceptions usually indicate that something has gone wrong, and you hardly ever know why, you do not know what to do When you do get one. this generally results in Overly Long, Complex, And Useless Exception-Handling Code.
With exceptions, it's best to realize that you do not have to handle every one. If you do not know what to do with a ClassNotFoundException, or an InterruptedException, do not do anything. You're allowed to pass them back to your caller. Of course, that changes the method's signature, which is not always possible, so it's best to declare your methods as throwing a nested exception (an exception that has another exception as its cause). that way, your caller can discover that The FooException You Threw Was Caused by a classNotFoundException, AND DEAL WITH IT APPRIATELY (OR RETHROW IT).
It's advisable for your foo package to have a generic FooException to throw from any method, just in case something goes wrong. Then your caller has to catch only FooException. For particular problems, you can implement FooNotFoundException and IllegalFooException. As long as these subclass FooException , you will not have to change your method signature, and the caller can include enough detail to make sensible responses to the problem.Now consider the following code, which uses the Reflection API in a fairly simple way:
public Object getInstance (String s) throws FooException {try {Class c = Class.forName (s); Class [] argTypes = {Integer.TYPE}; Constructor constr = c.getConstructor (argTypes); Object [] args = {new Integer (0)}; return constr.newInstance (args);} catch (ClassNotFoundException ex) {throw new FooException ( "class not found", ex);} catch (NoSuchMethodException ex) {throw new FooException ( "no such constructor" , ex);} catch (InvocationTargetException ex) {throw new FooException ( "exception in constructor", ex);} catch (IllegalAccessException ex) {throw new FooException ( "can not access constructor", ex);} catch (InstantiationException EX) {throw new fooException ("Something Bad", EX);}}
In the example above, I consider the five catch statements in a row to be bad code, as they're essentially useless, they take up a lot of space, and if I ever change FooException I might have to modify them. Instead, I Recommend Putting Them in a mathod by themselves, like tHIS:
public Object newGetInstance (String s) throws FooException {try {Class c = Class.forName (s); Class [] argTypes = {Integer.TYPE}; Constructor constr = c.getConstructor (argTypes); Object [] args = {new Integer (0)}; return constr.newInstance (args);} catch (Exception ex) {throwFooException (ex, "constructing new" s); return null;}} private void throwFooException (Throwable ex1, String explain) throws FooException {explain = toString () ":" explain; try {throw ex1;} catch (ClassNotFoundException ex) {throw new FooException (explain, ex);} catch (NoSuchMethodException ex) {throw new FooException (explain, ex); } catch (InvocationTargetException ex) {ex = ex1.getTargetException (); throw new FooException (explain, ex);} catch (IllegalAccessException ex) {throw new FooException (explain, ex);} catch (InstantiationException ex) {throw new FooException (Explain, EX);} catch (Exceptio n ex) {throw new fooException (explain, ex);}}
I claim that almost all of the class's exceptions will be converted into FooException, and almost always in the same mindless way, so you might as well do it explicitly and efficiently. The throwFooException () method can be reused many times. For each invocation, you can change 10 lines of exception-handling code into 3. in addition, you automatically add contextual information (toString ()), providing useful debugging information in all FooExceptions. If you're ever required to improve that debugging information, you only have to do it in one place Remember the lesson from Animal Farm:. ". Less code good, more code bad" (That's not exactly what George Orwell wrote ...) Rearchitecting As you increase your knowledge of the code, making it more comprehensible and better factored, you'll inevitably find some bits you do not like. When you started this project, you were not really capable of changing much, but now you've played with the code, and it's not as damaged as it Once Was. Along The Wa Y, You Have Seen Some Bad Things and had some ideas on how to fix them. it's time to make some service change.
Rewrite code you do not understand If you know what the code should do, and it seems to do it, but you can not figure out quite how, it's bad code. If the original author did something tricky, some comments should have been Provided. More Likely Though, The Author Was Clueless and The Code Really Is A Disaster. REWRITE IT.
By rewriting the code, you have the opportunity to do it the easy way, and to include useful comments. Start by writing down in Javadoc comments what the new code does (just to make sure that you know). Then you can take advantage of the byproducts of your refactoring: call this new method that does that, use that new data structure to store this The new code will make more sense to everybody and will probably run faster as well your unit tests will tell you whether your new code.. .. works Go into your bug tracking system and close all the bugs you have fixed.Move to a layered architecture Let's hope that your inherited, formerly bad code now looks better It's time to look at the bigger picture: how the packages relate to each other. Using some sort of tool that tells you what classes invoke what methods (that information can be extracted from the class files), find out the interclass dependencies. from those dependencies, infer the interpackage dependencies. If you can not draw a hierar Chy (WITHOUT Any Loops!) of what packages Deprond on what Other packages, you have some architectural work to do.
Using your instincts, a code browser, and a great deal of guesswork, figure out what such an architecture might look like one fine day. Then identify the particular problem spots. A particular class that frequently refers to other packages or a particular constant referred to Other packages hints......................
After a lot of consideration, you can derive a package hierarchy that looks sensible. Although it may require a lot of work to achieve the clean architecture, it gives you a goal to work toward. Your unit tests have clarified what the code should do, and now the architecture allocates the responsibility to do it to packages and classes. All you have to do is fill in the code.Tools All this cutting and pasting, extracting, and creating does not come without a cost. you will find yourself mindlessly changing parameter lists more frequently than you would like. you will have more compilation errors than you ever dreamed possible. you will (sometimes) insert more bugs than you take out. To help you through the boring bits, you need good tool support.
Assuming you have a good editor with search and replace functionality (preferably on multiple files at a time), you need a fast compiler to quickly detect what you've broken. Multiple invocations of javac probably will not run fast enough, instead, to Avoid The Startup Cost, You May Turn To a Tool Like Jikes Or Something That Keeps Javac Running In The Background.
A good IDE, one that compiles fast and saves you from referring to the Javadoc all the time, will stand you in good stead. I was never a fan of Java IDEs, but they seem to have evolved into useful tools. Being able to compile WITHOUT CHANGING Windows and Losing Concentration Proves Vital.
Moreover, a reverse-engineering tool would be useful for determining the architecture of the system. As implied above, I do not think the architecture of the system is anything to be concerned about when you start. After all, there's not much you can . do about it However, when you have the small details under control, you'll almost certainly need automated support to figure out the existing architecture Look for the buzzword "round-trip engineering" on the Web.Note:. I will not make any specific recommendations for tools, as I have not used most of the sophisticated ones. I'm usually happy with find, grep, and a Python interpreter. I've cobbled together utilities to do most of the things I need.
Conclusion if you're just starting out with bad code, you've got a long Way to go. But don't fret ,hen The going gets rough sense of superiority and a more than average amount of job security. moreover, you can learn from the mistakes of others (but blame them when you mess up). you can run your unit tests and say, "It works now." and you Can Write on Your Resumé, "i can make bad code good!"
That's got to beh.
Acknowledgements I would like to thank all the bad programmers in the world, without whom life would present no challenges. I would also like to thank all the good programmers in the world, without whom sympathetic shoulders to cry upon would not be found. Working on Bad Code Can Be Frustrating, And Kindred Spirits Will Keep You Sane.
Further, I would like to acknowledge the contribution of the Extreme Programming inventors, and all contributors on the WikiWikiWeb, without whom the evils of Big Design Up Front and the value of refactoring and unit tests would not be duly recognized.I would like to thank ian Clatworthy, Darryl Collins, Christian Larney, Simon McMullen, and Thomas Maslen for their comments on the first draft;. they have made this a better article I would also like to thank David O'Brien for his comments on my English - it's .
About the authorDr. John Farrell has been a Java programmer since the days when even programmers thought it would be mostly used for writing fancy Webpages. He currently maintains a distributed object server infrastructure that stretches from JDBC through CORBA to just below the GUI layer. It . was formerly bad code, but he made it good Before being a Java programmer, John completed a PhD in functional programming, which gave him an appreciation of elegant languages that run slowly; and worked as a C programmer, during which time he gained an . appreciation of automatic garbage collection He currently spreads the word that Java runs fast enough for real applications, if done right; and that good object-oriented design almost always represents the right way On the second Monday of every month, he is an active. Heckler at The Australian Java User Group Meetings in Brisbane, Australia.
Resources
Nested exceptions are such a good idea that Sun is considering adding them to the next release of the JDK Read "Java Tip 91: Use Nested Exceptions in a Multitiered Environment". Terren Suydam (JavaWorld): http://www.javaworld.com /JAVAWORLD/javatips/jw-javatip91.html for more junit tips and tricks, read "junit best practices" Andy Schneider (javaworld, december 21, 2000): http://www.javaworld.com/jw-12-2000/ jw-1221-junit.html Java coding standards differ from project to project, but one widely accepted set is Sun's own "Code Conventions for the Java Programming Language": http://java.sun.com/docs/codeconv/html/ CodeConvTOC.doc.html The Law of Demeter is an object-oriented design principle of debatable value http://www.c2.com/cgi/wiki?LawOfDemeter Jikes, IBM's natively compiled Java compiler, is an order of magnitude faster than javac : http://www10.software.ibm.com/developerWorks/opensource/jikes junit is an automated test framework derived from a Similar Smalltalk Tool: http://www.c 2.com/cgi/wiki?JavaUnit Refactoring is one of the fundamental practices of Extreme Programming and is absolutely essential to understand if you are working with existing code Read Refactoring:. Improving the Design of Existing Code by Martin Fowler (Addison-Wesley, 1999): http://www.amazon.com/exec/obidos/ASIN/0201485672/javaworld "Test Infected: Programmers Love Writing Tests" - Kent Beck and Erich Gamma's inspiring article will make you enthusiastic to start writing automated unit tests , if I haven't succeededed: http://members.pingnet.ch/gamma/junit.htm the wikiwiki web is the Web '