The hardest thing in computer science

I firmly believe that the hardest thing in computer science is naming things.
I saw it many times during code reviews or analyzing legacy code, over and over:

  1. Wrong (vague or meaningless) names are often the consequence of bad design. For instance, it is impossible to find a good, concise name for a class/module which holds too many responsibilities.
  2. When something (e.g. a class) is named badly, programmers tend to care less about it. You have probably seen generic "Service" classes, that grow very fast, as more and more methods are added to them.
  3. Wrong names erase very important parts of the domain language from the code. The consequence is, that it is very hard to truly understand the domain model, especially when you are not the author of the code. This leads to even more shallow understanding of the system and the business. And that is the shortest route to even more bad code.

"(...) when you program, you have to think about how someone will read your code, not just how a computer will interpret it."
— Kent Beck

Just imagine what would happen if, in real life, most of the names were vague, meaningless, or simply wrong. Not a good prospect, right? Yet you can see it in the source code very often.

I decided to describe some common naming anti-patterns that I observed during my career. I think that eliminating them can really improve readability and, in turn, the overall quality of any codebase.

Parameters and arguments are not the same thing

Let's consider the following method signature:

boolean isInGroup(Person person, Group group);  

It is very common to see an usage similar to the following:

if (isInGroup(person, group)) {  
    ...

The thing is, that the actual argument passed to a method is always much more specific than the corresponding parameter. A parameter might be generic, while an argument is a concrete value. It has its context, it has its intention. Therefore it deserves a much more specific name than just a repetition of the parameter name.

Just see how much more information we get, if the above code is rewritten as follows:

if (isInGroup(amendmentRequestor, documentReviewers)) {  
    ...

The idea is very simple – an argument is more than a parameter which it fits. Surprisingly often, though, programmers seem to forgot about it.

Do not let an interface name pollute the name of your implementation

The difference between parameters and arguments might be taken to another level when you consider interfaces and their implementations. If you read Dependency Inversion Principle (SOLID) carefully, it is clear that an interface is what a class (or a system or whatsoever) exposes for another components to express what it expects from them. Implementing an interface is meeting such expectations (i.e. making a class or another system something that you can "plug in" to this interface).

Sadly, many programmers see it other way around, as if an interface was something that an implementation exposes to tell other classes about its capabilities.

"Could you pass me the charger, please?" "Sure!" — real life conversation

"Could you pass me an electrical outlet implementation?" "WAT!?" — if our daily language was as sloppy as the code often is

This misunderstanding is another source of very bad names that lead to bad design.

Let me make an example. I once dealt with two systems that were integrated with each other. The first system was sending some messages to the second one. The name of the second system was very vague, it just reflected a generic concept in the first system.

The first system was basically a router which was preprocessing some messages and sending them to departments (Department was an actual name used in the domain model of the first system). Most of these departments were external organizations with some web-service APIs.

But here comes the second system. It wasn't 3rd party, it was maintained by the same group of people who maintained the first one. From the perspective of the first system, it was dealt with as any Department. So it was named "MessageDepartment" (seriously). The name reflected just the fact that it accepted messages and was plugged into a system that dealt with Departments. It wasn't any external department, just another subsystem of a larger whole. Yes, it accepted messages, but it would be much more informative to tell what it actually did with them.

This extremely generic and vague name caused more and more functionality being added to the system. The developers ended up with adding many (too many) responsibilities to a single module. Among others, it included document/message browsing and management, review/verification process, notification feeds and duplicate detection. All of these were quite big and deserved a separate module/subsystem to preserve cohesion.

It was clearly visible that something was wrong when you tried to describe that thing: it was difficult to find a concise definition/name. What the system became was just a big bag with different, unrelated functions.

I firmly believe that the design would have been much better, if the system had been named accurately from the beginning. I.e. if the emphasis was put on what it really was (e.g. a notice board) rather than what it was integrated with. It would not have been so easy to randomly put new things into it so, hopefully, the functionality would have been split into separate modules.

Express expectations through parameter names

Let's consider the following code snippets:

if (document.getType() == NOTE) {  
    archive(document);    
}
...

private void archive(Document document) {  
   ...
}

The good thing here is that a private method has been nicely extracted. Its name is quite OK too. Actually it could have been a proper code. The problem was that the logic in the private method was valid only for Notes (a type of Documents). It has not been used for any other documents (and should not have been). Therefore it would be great if the parameter was named more precisely, for example as follows:

private void archive(Document note) {  
   ...
}

Variable or field is much more than its type

How of then do you see code like this:

Person person = new Person("Bob");  

In some cases it might be OK, but in many others variable names may (and should) carry additional information. For example:

Person scrumMaster = new Person("Bob");  

Another example I once found:

class InterviewResult {  
    private Person person;
    ...
}

Can you tell me what this person field really is? You probably could if you took a look on its usage, but you would not have to if somebody cared enough to write just:

class InterviewResult {  
    private Person candidate;
    ...
}

Another, more complex example:

List<InterviewResult> result = interviewResults.stream() // (1)  
                                 .filter(r -> r.getStatus() != REJECTED)
                                 .collect(toList());

Map<Long, Map<InterviewStatus, InterviewResult>> groupedIRs =  
                    result.stream().collect( // (2)
                          groupingBy(result -> result.getCandidateId()), 
                          groupingBy(result -> result.getStatus()));

for (Map<InterviewStatus, List<InterviewResult>>  
        interviewStatus2InterviewResult : groupedIRs.values()) { // (3)

    // some processing, which I'll skip for your sanity
    // as it wasn't any better than the code above
}

It is an actual snippet that I once saw during a code review. At least three names (marked as (1), (2) and (3) in the comments above) could be improved here. But let's focus on the last one (interviewStatus2InterviewResult). It tells that the variable is a mapping of statuses to full interview results. It is true (at least), but do we really need that? The type of the variable tells the same and is clear enough, I think (Map<InterviewStatus, List<InterviewResult>>). The really important fact is that the map contains interview results for a single candidate. It is a consequence of the grouping operation few lines above. The code is not clear, so a good name would be even more helpful here.

I think that this code requires much more refactoring in general, but even just naming improvements make it noticeably less ugly:

List<InterviewResult> notRejectedInterviews =  
                allInterviewResults.stream() // (1)
                                   .filter(r -> r.getStatus() != REJECTED)
                                   .collect(toList());

Map<Long, Map<InterviewStatus, InterviewResult>> resultsByCandidate =  
                    notRejectedInterviews.stream().collect( // (2)
                          groupingBy(result -> result.getCandidateId()), 
                          groupingBy(result -> result.getStatus()));

for (Map<InterviewStatus, List<InterviewResult>>  
      resultsByStatusForACandidate : resultsByCandidate.values()) { // (3)

    // ...
}

Context of a method invocation is more than just the method itself

In another system I worked on, I once found a method called advancedSearch. There was nothing wrong with the method itself (let's not worry about what "advanced" actually meant).

The problem was in other place. The system had an on/off flag for duplicate detection (i.e. whether or not it should throw an exception if a duplicated item was submitted). The flag was named enableAdvancedSearch. How would you interpret it without knowing the explanation I gave above? I bet it would not be obvious. This advancedSearch method was actually used in other parts of the system too. The badly named flag wasn't about disabling/enabling the method/feature itself, but rather one particular context of using it.

The intention (i.e. checking for duplicates) was much more important than the means of executing it (i.e. by invoking advancedSearch). Therefore the proper name would be enableDuplicatePrevention.

Name something if you can

Logical expressions, literals, lambdas, even code snippets – they are all anonymous. Sometimes (very often, I think) it is a good idea to just name them. The easiest way to do that is to extract a method. Let's consider the following snippet as an example:

// some processing
if (statuses.contains(RED)) {  
    status = RED;
} else if (statuses.contains(YELLOW)) {
    status = YELLOW;
} else {
    status = DONE;
}
// more work

Why should a reader be forced to even look on these 7 lines of code when all he or she wants is to get a general idea about what is going on at the high level?

All that happens in these lines is just selecting the highest value out of a collection of status codes. It is a simple logic, but it would be easier to follow if it was explicitly named. And doing that is as easy as extracting a private method (e.g. status = theHighestStatusOf(statuses)).

Another example:

long countRed = source.getRecords().stream()  
                    .filter(e -> e.hasColor("red"))
                    .collect(counting());

Which in my opinion would be much better if a private method was extracted:

long countRed = countByColor("red");

The idea is simple. Just remember to be careful, because no name at all is actually better than a wrong name.

Summary

It is, for sure, not a complete list of naming anti-patterns. I hope it is just a good starting point for improving names in the code and finding more rules about them. Improving names improves readability, which positively affects the overall code quality. Which, I hope, is the goal for the most of us. Thanks for reading and I hope you will find it useful!


Photo credit: Geoffrey Fairchild


comments powered by Disqus