To Comment or Not to Comment?
No, you should not stop writing code comments.
The Internet is undeniably a tremendous technology that has propelled the sharing of information and the exchange of ideas to a level never reached before. It nevertheless brings its share of frustration, as when we read the words of someone whose reasoning and conclusions are diametrically opposed to our own. This can create a real sense of urgency, an urge to react one way or another, because you have just witnessed a crime of lèse-majesté: someone, somewhere on the Internet, is wrong!
I found myself in this situation while reading an article entitled “Stop Writing Code Comments”, whose author argues that we should stop writing, or even simply ignore, comments in the source code of a program. Of course, the topic isn’t new, and I have already discussed it at length with many colleagues. Some of them are also largely in favour of keeping comments to a minimum, because they see them at best as a waste of time and at worst as a source of confusion. But advocating the total elimination of comments is a line I didn’t think anyone would cross.
Even though my heart skipped a beat and I felt the urge to refute a pamphlet that could only be a bunch of nonsense, I still took the trouble to study my opponent’s argument. Doesn’t “The Art of War” teach us to know our enemy as well as ourselves? Besides, it would be rather presumptuous to claim to hold the one and only truth. If this author feels that comments in a program are totally useless, that feeling must rest on some objective basis. So I took a deep breath and went through the post carefully, to fully understand its ins and outs.
I was surprised to find that the premises of his argument made sense, and that it was rather the recommendations he drew from them that were shaky. The author defends the idea of self-documenting code: every programmer must strive to write code that is as clear and expressive as possible, so that there is as little need as possible to explain what it does. The name of each variable, class or function must leave no doubt about its purpose. Functions must be small and serve a single objective, stated explicitly by their name. As soon as its logic becomes too complex, a function must be broken down into smaller, simpler pieces, each encapsulated in its own routine. All these good practices aim to make the code as easy as possible to read, to test and to modify later.
I can only stress the relevance and importance of this point of view, because I firmly believe that good code shouldn’t merely do what it is supposed to do; it must also be extremely readable, well structured and flexible, so as to minimize its complexity at every level of reading. By this I mean that any piece of source code must be engineered to reduce the mental workload of its reader as much as possible. Given that over the life of a piece of software its code will be read far more often than it is written, this saves a lot of time at every stage of its existence, while drastically reducing the chance of leaving bugs lurking in the corners.
In fact, this principle goes well beyond our debate on comments, and I will allow myself a short digression here. The primary goal of software engineering, and of writing code, is the management of complexity. This is easy to forget when writing a program of a few hundred lines, but it is quite a different matter for larger or more sophisticated software. Some of you may already have witnessed the organic growth of programs that gradually turned into a labyrinthine black box that no one could fully understand any more. The golden rule for fighting this phenomenon lies mainly in the systematic, hierarchical modularization of the program’s constituents. That’s why the development of a system goes through the design of modules that are themselves broken down into sub-modules, then into classes, then into functions. That’s why you have to spend time refining how those elements work at a higher level of abstraction before coding them, and why you have to spend time designing the interfaces that connect them. Everywhere, you must make sure that each part is highly cohesive (its elements work together towards the same objective) while the coupling between parts (i.e. their interactions) remains weak. Only then does it become humanly possible to manage code whose effects span several orders of magnitude of complexity, from the manipulation of single bits to that of hundreds of megabytes of information.
Among all the tools that keep the complexity of code at a humanly acceptable level, self-documenting code is surely the spearhead of the programmer’s arsenal; I don’t want to debate this point here. My problem lies with the next step, which consists in demonizing comments: not only denying their usefulness, but also asserting that they do more harm than good.
Criticisms of comments are numerous and well known, and they feed an eternal debate about what the quintessence of the art of software engineering should be. For example, comments are said to make code harder to read, because natural language is much less precise than any programming language. Code also changes without anyone bothering to update the comments that accompany it, which makes reading even more confusing and difficult. Comments then look like a waste of time, since they force you to write or read twice as much information, part of which may be completely obsolete and wrong. Moreover, the ultimate source of knowledge, the one that really matters and never lies, is the code itself; why should we bother with an additional workload that is vain and potentially harmful?
All the pitfalls I have just mentioned are real; there is no point in denying it. But these problems only arise when comments are badly used, by programmers inexperienced in the art of writing them. And the fact that some clumsy people have hurt themselves with a shovel or a pickaxe doesn’t make these tools an absolute evil. Imagine the situation:
“Using tools to dig a hole? You must be crazy; they’re a source of annoyance and endless hassle. You have to handle them with care and maintain them, not to mention the risk of getting hurt. Personally, I’ll tell you, there is no hole that I cannot dig with my fingers and a teaspoon. You don’t believe me? Look at this one-cubic-metre hole, dug entirely in three days with elbow grease. So put away your tools and other devilish excavators; a real worker uses only his muscles and a teaspoon.”
If you find my story a bit of a caricature, here is another one that is true this time. Several people have already told me, with total conviction, that there is no situation that cannot be fully debugged with printf; that a true, pure programmer doesn’t need an IDE, just a text editor like vi and, of course, printf. One of them even claimed that the mouse was a waste of time, since everything can be done faster with keyboard shortcuts. I have always been puzzled by this kind of statement. The unshakeable conviction with which they were uttered even made me doubt myself at first, leading me to believe that I was the incompetent one, who didn’t understand how to debug a multitasking process manipulating complex data structures without breakpoints, variable inspection tools, or even a way to examine the contents of memory. But the truth is very different: these people have probably never gone head-to-head with problems other than those I myself consider child’s play. I don’t mean that you should never use printf; it is an option that is often relevant. What I mean is that there are also many more difficult situations, and that it would be very naive to ignore them.
The situations cited as illustrations in the article banning comments are of the same kind: simplistic. They are all trivial examples stressing the uselessness of writing a comment such as “The title of the CD” next to a variable called “title”, or “This function sends an email” for a function called “SendEmail()”. I’m not saying that I never come across such banalities, but being rather used to writing algorithms that solve complicated problems, I remain very perplexed. I always state in a comment the acceptable range and the physical units of any variable tied to a physical process. This makes it possible to see at a glance that the variable “speed” is expressed in m/s and not in km/h, without having to read a hundred lines of code looking for a clue. Should I now refrain from this kind of triviality and rely only on self-documenting code? Should I now change the name of my variable “speed” to something less prosaic like “speed_positive_lessthan150_meterbysecond”?
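A minimal sketch of such a comment, written in Python for illustration. The function, its names and the 150 m/s bound are hypothetical and only echo the example above; the point is that the units and the accepted range sit next to the variable they describe.

```python
# Hypothetical example: the names and the 150 m/s limit are illustrative only.

MAX_SPEED_M_PER_S = 150.0  # upper bound of the accepted range, in m/s


def braking_distance(speed: float, deceleration: float) -> float:
    """Return the distance needed to come to a stop, in metres.

    speed:        current speed in m/s (not km/h), 0 <= speed < 150
    deceleration: constant braking deceleration in m/s^2, strictly positive
    """
    if not 0.0 <= speed < MAX_SPEED_M_PER_S:
        raise ValueError("speed must be in [0, 150) m/s")
    if deceleration <= 0.0:
        raise ValueError("deceleration must be strictly positive")
    # Constant deceleration: v^2 = 2 * a * d, hence d = v^2 / (2 * a).
    return speed * speed / (2.0 * deceleration)
```

A reader who calls braking_distance(20.0, 5.0) knows from the comment alone that the arguments are 20 m/s and 5 m/s^2, and that the result is in metres.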
In the same way, I always try to write a paragraph summarizing the main lines of what each function does, in addition to giving it an explicit name. The reason is simple: many of them are quite abstract and cannot be summed up in three words. For example, imagine that you code a procedure that searches for a root of a polynomial using Newton’s iterative method. Personally, I think it’s a good idea to summarize what this function does and to provide a pointer to additional information. Should I now rename my “FindPolynomialRoot()” function to something more explicit like
“FindPolynomialRoot_UseNewtonApproximation_MoreOnWikipedia_Newtons_method()”
to avoid these abject comments?
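As an illustration, here is one way such a function and its summary could look in Python. This is only a sketch under assumptions: the signature, the parameter names, the default tolerance and the coefficient ordering are invented for the example, and the name is adapted to Python naming conventions.

```python
from typing import Sequence


def find_polynomial_root(
    coefficients: Sequence[float],
    initial_guess: float,
    tolerance: float = 1e-12,
    max_iterations: int = 50,
) -> float:
    """Find one real root of a polynomial with Newton's iterative method.

    The polynomial is given by its coefficients, highest degree first:
    [1, 0, -2] stands for x^2 - 2. Starting from initial_guess, each step
    replaces x by x - p(x) / p'(x), until the correction becomes smaller
    than tolerance. Convergence depends on the starting point and is not
    guaranteed. Background: the Wikipedia article "Newton's method".
    """
    x = initial_guess
    for _ in range(max_iterations):
        # Horner's scheme: evaluate p(x) and p'(x) in a single pass.
        value, slope = 0.0, 0.0
        for c in coefficients:
            slope = slope * x + value
            value = value * x + c
        if slope == 0.0:
            raise ZeroDivisionError("flat tangent: choose another initial guess")
        step = value / slope
        x -= step
        if abs(step) < tolerance:
            return x
    raise RuntimeError("no convergence within max_iterations")
```

The name says what the function returns; the summary says how it gets there, what the inputs mean and where the method can fail, none of which fits in an identifier.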
Even if I am pushing the point (again), the reality is quite different. A good programmer must always find an acceptable compromise among all the tools that can help him manage the complexity of a program. Self-documenting code is one of them, and a very important one, but it is frequently far from sufficient. Beware of all those who claim to have found the grail and advocate an absolutist policy, because the truth is often in the nuance, never in the extremes. Well applied, comments can and should help you better structure and document your code. Rejecting them, as the author does in his article because a programmer “is not a documentor”, amounts to stating that no documentation other than the code is necessary. Exit software architectures, UML diagrams, test plans, detailed models, pseudocode and descriptions of algorithmic and mathematical methods; all you need to know is in the code. And while we are at it, perhaps for a true, pure programmer all there is to know is in the code in binary form.
More seriously, the code contains only what the compiler needs to know, but certainly not everything you, as a human, need to know. Comments should be viewed and used as intermediate documentation between the code and the other design documents. They should therefore not duplicate the code, but rather shed light at a higher level of abstraction, a level that no self-documenting code can reach. Comments should always describe the programmer’s intent, and definitely not rephrase what the code does. They must not be redundant and certainly not obsolete. Because yes, comments must be written together with the code, and changing one means changing the other. To get things right, tell yourself that by reading only the comments of a function, you must be able not only to learn what the function does, but also to grasp the broad lines of how it does it. In this sense, comments must always tell a story that can be read while ignoring the lines of code with which they are interleaved. Higher-level comments can also help the reader understand how functions or classes relate to each other, by summarizing the general operation or by punctuating certain parts of the code, just as a table of contents and chapter titles structure a book. And if you’ve never felt the need to comment your code, maybe it’s just because you’ve never written a book. There is nothing pejorative about that, but it is good to know it before claiming to have understood everything and fighting for the eradication of comments.
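The sketch below, in Python, tries to show what comments that state intent and read as a story might look like. The function and its names are hypothetical and chosen only for illustration; reading the comments alone, without the code, should still convey what is done and why.

```python
from typing import Iterable, List


def moving_average(samples: Iterable[float], window: int) -> List[float]:
    """Smooth a noisy signal with a sliding average over `window` samples."""
    # Guard: a zero or negative window has no meaning for an average.
    if window <= 0:
        raise ValueError("window must be a positive number of samples")

    values = list(samples)
    smoothed: List[float] = []
    running_sum = 0.0

    for index, value in enumerate(values):
        # Keep a running sum so each output costs O(1) rather than O(window).
        running_sum += value
        # Once the window is full, forget the sample that just slid out of it.
        if index >= window:
            running_sum -= values[index - window]
        # Emit nothing before the first full window, so that every output
        # averages exactly `window` samples and the start is not biased.
        if index >= window - 1:
            smoothed.append(running_sum / window)

    return smoothed
```

None of these comments restates a line of code; each one records a decision (why a running sum, why the first outputs are skipped) that the code by itself cannot express.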
To close this little note, I would like to mention Steve McConnell’s book “Code Complete”. This one-of-a-kind book takes a long and broad look at the often overlooked topic of managing complexity in computer programs. And if you only read one passage, make it “The Commento”, which you’ll find in Section 32.3.