This article is part number 7 of the Readability series.


One of the best things you can do to improve the readability of your code is to avoid comments. “Uh, what?” I hear you say. “That goes against all good coding guidelines, which tell you to comment your code heavily!”

Right. Except that the real goal is to have only valuable comments, and achieving that is a (very) complex matter. The code should document itself at all times, and do so clearly. Redundant or useless comments hinder readability: their presence suggests that the code is not clear enough to describe itself, and they tend to be obsolete, confusing, or riddled with typos and grammar mistakes.

Here is my suggestion: whenever you feel like writing a comment, stop, relax and ask yourself this question:

Is the comment explaining why the code behaves as it does, or is the comment outlining how the code works?

If your answer is that the comment details the rationale behind the algorithm (aka the “why”), then your comment is probably valuable and should stay. Otherwise, if the comment does not describe any rationale or some obscure corner case, there is something to be improved about your code.
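
To make the distinction concrete, here is a minimal sketch. The two functions, their names, and the rationale given in the second comment are all hypothetical and invented purely for illustration. The first comment restates how the code works, whereas the second records why the code is written that way.

```python
# All names and the stated rationale are hypothetical, for illustration only.

def average_latency_how(samples_ms):
    # Sum the samples and divide by the number of samples.  (A "how"
    # comment: it only repeats what the next line already says.)
    return sum(samples_ms) / float(len(samples_ms))


def average_latency_why(samples_ms):
    # Drop the first sample: in this made-up scenario it includes a one-time
    # warm-up cost and would skew the average.  (A "why" comment: the code
    # alone cannot tell the reader this.)
    steady_state = samples_ms[1:]
    return sum(steady_state) / float(len(steady_state))
```

Deleting the first comment loses nothing; deleting the second would leave the next reader wondering whether the slice is a bug.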

Before moving on, remember that the compiler cannot validate your comments: you can write anything you like in a comment, and the explanation will eventually get out of sync with the code (if it is not already by the time you check it in). Comments are bound to get stale, contain typos or other inaccuracies, and no tool will help you in making them correct.

Having said that, let’s look at a few cases where comments can be converted into self-explanatory code so that the comments can be removed safely.

Statement of various sequential steps

First of all, my pet peeve: comments that denote the various chunks, or steps, of an algorithm or function. Take a look at this function to print a table with all columns vertically aligned:

def print_table(table):
    # Early abort if the table is empty.
    if len(table) == 0:
        return

    # First, compute the widths of the columns.
    widths = [0] * len(table[0])
    for row in table:
        for i in range(len(row)):
            if widths[i] < len(row[i]):
                widths[i] = len(row[i])

    # Now, format every row as a textual line.
    lines = []
    for row in table:
        line = ''
        for i in range(len(row)):
            padding = ' ' * (widths[i] - len(row[i]))
            line += row[i] + padding
            if i != len(row) - 1:
                line += ' '
        lines.append(line)

    # And, finally, print the results:
    for line in lines:
        print(line)

In this code snippet, the comments visually mark the different logical steps in the code. That’s fine… except that you can write this same code in a way that renders the comments irrelevant. Note, also, that these comments provide no value to the reader other than a small hint of what the code does.

Consider what happens if you split every chunk into a separate function, and give each auxiliary function a name that clearly describes the step being taken:

def calculate_column_widths(table):
    assert len(table) > 0
    widths = [0] * len(table[0])
    for row in table:
        for i in range(len(row)):
            if widths[i] < len(row[i]):
                widths[i] = len(row[i])
    return widths

def format_table(table, widths):
    lines = []
    for row in table:
        line = ''
        for i in range(len(row)):
            padding = ' ' * (widths[i] - len(row[i]))
            line += row[i] + padding
            if i != len(row) - 1:
                line += ' '
        lines.append(line)
    return lines

def print_table(table):
    if len(table) == 0:
        return

    widths = calculate_column_widths(table)
    lines = format_table(table, widths)
    for line in lines:
        print(line)

Your complex algorithm is now trivial to follow: the function calls make the various steps self-descriptive, and the arguments and return values make the data dependencies between them explicit. Oh, and by the way, your code is now easier to unit-test too.
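
As a rough sketch of that last point, the helpers can now be exercised without capturing any printed output. The block below repeats the two helpers (using range so that it also runs on Python 3) and adds tests based on the standard unittest module. The test class, its method names and the sample table are assumptions made for illustration.

```python
import unittest


def calculate_column_widths(table):
    assert len(table) > 0
    widths = [0] * len(table[0])
    for row in table:
        for i in range(len(row)):
            if widths[i] < len(row[i]):
                widths[i] = len(row[i])
    return widths


def format_table(table, widths):
    lines = []
    for row in table:
        line = ''
        for i in range(len(row)):
            padding = ' ' * (widths[i] - len(row[i]))
            line += row[i] + padding
            if i != len(row) - 1:
                line += ' '
        lines.append(line)
    return lines


class TableHelpersTest(unittest.TestCase):  # Hypothetical test names.
    def test_widths_are_the_longest_cell_per_column(self):
        table = [['a', 'bbb'], ['cc', 'd']]
        self.assertEqual([2, 3], calculate_column_widths(table))

    def test_cells_are_padded_to_the_column_width(self):
        table = [['a', 'bbb'], ['cc', 'd']]
        # The last column is padded too, hence the trailing spaces.
        self.assertEqual(['a  bbb', 'cc d  '], format_table(table, [2, 3]))
```

Saved in a module, these tests can be run with python -m unittest followed by the module name. Note that the test names describe the expected behavior, so they need no comments of their own either.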

Statement of assumptions in the code

Another common kind of useless comment is the one that details assumptions of the code:

def calculate_memory_size(pid):
    current_memory = ...
    # MINIMUM_SIZE ensures that we never return 0.
    return max(current_memory, MINIMUM_SIZE)

(I have seen forms of this in the real world.) Note that comments like this can be translated into code, usually in the form of assertions. Doing so results in something that the compiler or the runtime can actually validate for you. So, instead, you could do this:

def calculate_memory_size(pid):
    current_memory = ...
    result_size = max(current_memory, MINIMUM_SIZE)
    assert result_size > 0
    return result_size
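
To see the assertion earn its keep, here is a self-contained sketch. It takes the current memory and the minimum size as parameters, which is an assumption made only so that the example can run without the elided measurement code; the function name is hypothetical too. Keep in mind that Python skips assert statements when run with the -O flag, so they check assumptions during development rather than guard production behavior.

```python
def clamp_memory_size(current_memory, minimum_size):
    # Hypothetical helper: same logic as above, with the inputs passed in.
    result_size = max(current_memory, minimum_size)
    assert result_size > 0, 'minimum_size must be positive'
    return result_size


# A sane minimum silently covers a zero reading.
print(clamp_memory_size(0, 4096))

# If someone later sets the minimum to 0, the assumption breaks loudly
# instead of being described by a comment that nobody re-reads.
try:
    clamp_memory_size(0, 0)
except AssertionError as error:
    print('assumption violated: %s' % error)
```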

Statement of obvious remarks

Everybody has seen variants of these:

i += 2  # Add two to i.
return True  # Returns false.

These are comments you would probably not write because you have been taught that these qualify as really bad comments. Right. However, it is very easy to fall into the trap of writing comments that follow this same pattern but aren’t as obvious. For example:

# An empty string is an error.
if len(input) == 0:
    raise InvalidInputError(input)

The comment here is completely redundant, because the following piece of code says exactly the same thing in a concise manner.

Repetition of the code

Comments that restate exactly what some piece of code does are often problematic. Consider this:

# The package base size is composed of the header (128K),
# the list of files (256K) and the tail marker (16K).
PACKAGE_BASE_SIZE = (128 + 256 + 16) * units.KiB

This comment is interesting… but it is horrible: any time you update the definition of PACKAGE_BASE_SIZE—say to add a new component or to change a particular value—you must also change the comment. The comment is restating the code. So, instead, we could do this:

_PACKAGE_HEADER_SIZE = 128 * units.KiB
_PACKAGE_FILES_LIST_SIZE = 256 * units.KiB
_PACKAGE_TAIL_SIZE = 16 * units.KiB
PACKAGE_BASE_SIZE = (_PACKAGE_HEADER_SIZE +
                     _PACKAGE_FILES_LIST_SIZE +
                     _PACKAGE_TAIL_SIZE)

Or:

PACKAGE_BASE_SIZE = (
    128 +  # Header size.
    256 +  # Files list size.
    16     # Tail size.
) * units.KiB

These are just some examples to get you thinking about the art of avoiding useless comments, and I hope you find them interesting. If you have noticed a case that I missed, please write about it!
