What Makes Application Code Maintainable?
Coding isn't about just making things work.
Maintainability largely isn't about coding style items like how many spaces one indents, or whether one avoids i,j,k in their simple loop counters.
Following a style guideline does not ensure understandable code (any more than the rules of grammar ensure clear writing).
Coding is about making relationships and dependencies as clear as possible, given how generalized/abstract the code has to be.
Following someone else's code is harder than understanding your own code.
The basic problem is that programming often involves knowing where (in a haystack full of code and indirect references) the pieces that relate to each other are located.
In large systems, race conditions, unforeseen critical sections, buffer overruns, deadlock and state dependencies (e.g. did this get set before we started using it?) can get knotty. Without the right foresight, a bunch of little pieces of logic can easily add up to a great big confusing mess.
Code is unmaintainable when it takes longer to understand it - than it does to rewrite it.
The ultimate test of code maintainability is whether it lives on (to be used, useful and enhanced) well past its originators.
1. Debugging Capability:
It is impossible to understand a flow of logic performed on variables - without visualizing examples of data within those variables.
In fact, it's impossible to learn any (e.g. math, physics, ...) generalization, without familiarity with the concrete examples upon which the generalization was induced.
- To figure out/learn a large system it's essential to be able to see how different samples of data flow through the system - either via direct debug or emulation.
- The easier you make this visualization (via debugging, emulation, variable naming conventions, comments), the easier it will be to understand the code.
- In the end, the average customer doesn't care if an application is developed in GWBASIC, Visual Basic, C#, or Assembly for that matter, as long as it looks good, runs good, and works well.
- Among modern programming languages, debugging capability is much more important than language particulars.
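As a minimal sketch of watching sample data flow through a system, the snippet below records the value after every processing step. The names trace_pipeline, strip_whitespace and to_upper are hypothetical stand-ins for illustration, not part of any real system.

```python
def trace_pipeline(sample, steps):
    """Run one sample through each step, recording the value after every step."""
    history = [("input", sample)]
    value = sample
    for step in steps:
        value = step(value)
        history.append((step.__name__, value))
    return history


def strip_whitespace(text):
    return text.strip()


def to_upper(text):
    return text.upper()


# Print each intermediate value so the data flow can be seen, not guessed at.
for step_name, value in trace_pipeline("  abc ", [strip_whitespace, to_upper]):
    print(f"{step_name:>16}: {value!r}")
```

Feeding in a few contrasting samples (an empty string, a very long one, one with unexpected characters) tends to reveal more than re-reading the code does.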
2. Variable Scope and Type:
How a variable is scoped and typed is just as important as what it represents. Try to make it apparent where data is being changed, and what type of data it is.
- When it comes to tracking down bugs, being able to quickly see a variable's type and scope (in a haystack full of variables) is as important as what the variable represents.
- Using an abbreviated level of (what programmers refer to as) Hungarian Notation helps make wrong code look wrong.
- e.g., a variable called "result" could be a boolean (bOK, is_ok), an enumeration, a file handle, an array... It's OK if the "result" variable is used right there and its meaning is obvious, but code often grows so that the original "result" variable winds up being many lines of code away - and easily gets mixed up with all those other "result" variables in other name spaces - as you search through the code trying to figure out what's going on.
- In my experience, weakly typed languages (which I like) without some variable type and scope (e.g. gStatus_msgs vs. status_msgs) naming conventions tend to produce code that's unnecessarily harder to follow, enhance, and debug.
- Which subroutines are doing what to the state (or values of the global variables) in the system?
- Ideally, variables should be as local as possible, and subroutines should be as small and specific as possible. But there's a trade-off between these two:
- It's easy to make subroutines small if they reference lots of global variables.
- Conversely, it's easy to make most of the variables local when the subroutines become huge.
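One possible reading of an abbreviated scope and type convention is sketched below. The prefixes (g_ for global, b_ for boolean, n_ for number, s_ for string) and the function names are assumptions chosen for illustration. The text itself only gives gStatus_msgs, status_msgs, bOK and is_ok as examples.

```python
g_status_msgs = []  # "g_" prefix: module-level (global) state shared by the code below


def record_status(s_msg):
    """Append one message to the global log; the g_ prefix flags the side effect."""
    g_status_msgs.append(s_msg)


def summarize(status_msgs):
    """Work only on the list passed in; no g_ prefix means no shared state is touched."""
    n_count = len(status_msgs)   # "n_": a number
    b_ok = n_count == 0          # "b_": a boolean
    return b_ok, n_count


record_status("pump offline")
b_ok, n_count = summarize(list(g_status_msgs))
print(b_ok, n_count)
```

Searching for "g_" then lists every place that touches global state, which speaks to the question of which subroutines are doing what to the state of the system.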
3. Language Usage:
Some view software primarily as a mathematical construct. I view it more as language.
- There are two major aspects to writing code in any language: how is the logic structured, and what (if any) are the naming conventions?
- e.g. verbs for methods and nouns for variables, etc.
Granted, programmers make dozens of little micro design decisions in a typical day, and you can't be productive if you vacillate over each one. But there are a couple of places where a little more pause and thought have big effects on maintainability:
Try to use consistent naming.
It's incredible how little thought goes into consistent naming - yet this has a HUGE bang for the buck in making code and end user functionality understandable. Some common examples are:
- My latest Kindle Fire HDX software uses the term "Add to Home" to add a book or app to the main screen, and the term "Remove from Favorites" to remove from the same screen. Why could they not choose between "Home" and "Favorites", since they are the exact same thing? Pick either one and be consistent.
- While there is little problem picking up this pattern on a simple Kindle - it can be a much bigger problem on a large software package when it's done all over the place.
- Whenever I've built a PC, the board is called one thing on the box cover, another thing in the paper or pdf instructions, and goes by a third name on the CD directory holding the board drivers.
Sometimes they mix marketing names, internal company project names, and chipset names when describing the same thing.
From what your POST variables are called (it's helpful if their names correspond in some way to the database table columns that they will update), to having a button labeled "Copy to Cart" named "copy_to_cart" in the POST, to programmers who call a stream or file handle one thing on Friday and choose another name for it somewhere else on Monday (and yes, I admit I have done this too. But knowing this is bad, I usually catch it and correct it) - it really helps make code understandable to (at least) think about how to be as consistent as possible in naming variables, table columns, tables, etc.
- e.g.
Please don't call a table column "total_sales_price" and read it into a variable called "total_price" then rename it "sales_total_price" in a subroutine argument, while the same thing is called "sales_price_sum" by the next person coding a related data structure.
A module named "customer_leads.php" which manages a table called "customer_leads" is much easier to follow.
- e.g.
Please don't call something "parameter" when it's consistently a "next_entry" (next user, next screen, next recipe,...). The same goes for "id" (as opposed to "order_number"), "check_and_update" for "setup_next_screen".
- e.g. If you have a string variable and an enumerated variable that represent the same thing, please give them the same root name. e.g. job1_string and job1_enum are easier to follow than id and param.
Once again - everyone's code has inconsistencies. But if one has experience picking up other people's code, they will realize how much they detract from maintainability - and will be on the lookout for them. They will revise them whenever they get the chance to view their coding with a fresh eye (which is usually on Monday morning).
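A minimal sketch of keeping one name from the table column through to the subroutine argument is shown below. The orders table, its columns, and both function names are hypothetical; only the name total_sales_price comes from the example above.

```python
import sqlite3


def fetch_total_sales_price(conn, order_number):
    """Read the total_sales_price column into a variable of the same name."""
    row = conn.execute(
        "SELECT total_sales_price FROM orders WHERE order_number = ?",
        (order_number,),
    ).fetchone()
    total_sales_price = row[0]
    return total_sales_price


def format_total_sales_price(total_sales_price):
    """The argument keeps the column's name, so a search finds every use."""
    return f"${total_sales_price:,.2f}"


# Hypothetical in-memory table, used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_number INTEGER PRIMARY KEY, total_sales_price REAL)"
)
conn.execute("INSERT INTO orders VALUES (1001, 1249.5)")
print(format_total_sales_price(fetch_total_sales_price(conn, 1001)))
```

A single search for total_sales_price now turns up the column, the query, the variable and the formatter together.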
Think out warning and error messages.
e.g. "Invalid Input" or "Communication error" does not cut it when the problem is "Unrecognized user" or "Invalid password".
- I have put in upwards of several days of effort (near project completion) making sure that a technician can understand logged warning and error messages.
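As a rough sketch of messages that name the actual problem, the code below raises a distinct, specific error for each failure. The exception classes, check_login, and the plain dictionary of users are all hypothetical, and the plain-text password comparison is for illustration only. The messages are aimed at a technician's log. What gets shown to an end user is a separate decision.

```python
class LoginError(Exception):
    """Base class for the hypothetical login failures below."""


class UnrecognizedUserError(LoginError):
    pass


class InvalidPasswordError(LoginError):
    pass


def check_login(users, user_name, password):
    """Raise an error that says what actually went wrong, and for whom."""
    if user_name not in users:
        raise UnrecognizedUserError(f"Unrecognized user: {user_name!r}")
    if users[user_name] != password:
        raise InvalidPasswordError(f"Invalid password for user: {user_name!r}")
    return True


users = {"kfreed": "s3cret"}
try:
    check_login(users, "nobody", "x")
except LoginError as error:
    print(f"LOGIN FAILED: {error}")
```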
Knowing the programming language isn't enough to develop good software (any more than being good at crossword puzzles means that one can write well).
- The variable and subroutine names within the software application are a form of vocabulary.
- Their preconditions (or which subroutines have to be called in what order with what parameters) are a form of grammar.
- Over many lines of code in a large system these can easily add up to a foreign language (for anyone else).
- One major purpose of a code review is to make sure that someone else can understand your code and can act as a support backup.
- One benefit of an education in the hard sciences is that it shows you can follow someone else's (easily misinterpreted) reasoning.
Program logic has analogies to language prose, e.g.:
- Readable:
if (can_machine_start(user_id)) then
-vs.-
Obscure:
if (check_conditions(2, user_id)) then
- Readable:
start_scan(row, column, length, direction)
-vs.-
Obscure:
start_scan(product[lineno-offsetY], product[lineno+offsetX], sizeof(buffer1[lineno]), read_io(control34[lineno]))
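One way to get from the obscure form to the readable one, without rewriting the generic routine, is to wrap it in a named function. This is a sketch under assumptions: the meaning given to the number 2, the ConditionSet enumeration, and the authorized_user_ids argument are all invented for illustration.

```python
from enum import IntEnum


class ConditionSet(IntEnum):
    MACHINE_START = 2  # hypothetical meaning of the magic number 2


def check_conditions(condition_set, user_id, authorized_user_ids):
    """Generic entry point, standing in for the obscure call."""
    if condition_set == ConditionSet.MACHINE_START:
        return user_id in authorized_user_ids
    return False


def can_machine_start(user_id, authorized_user_ids):
    """Readable wrapper: the name states the question being asked."""
    return check_conditions(ConditionSet.MACHINE_START, user_id, authorized_user_ids)


if can_machine_start(7, {7, 9}):
    print("machine may start")
```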
I believe those who can write straightforward, salient prose to boil down a complex situation - are also good at designing and writing straightforward, understandable code to solve a technical problem.
The more intelligent you are, the easier it is for others to follow your code.
- Richard Feynman's physics lectures come to mind - as an example of a very intelligent person explaining some very difficult concepts.
- Writing well is hard work. If others cannot follow your code, it's more likely you're a babbling idiot than a profound genius.
The more you struggle to learn new things outside of your comfort zone, the more you will appreciate how to structure code and end user features so that others can more easily understand them.
- People who've worked deep in their comfort zone for a long time often don't appreciate that where they see obvious certainty - others with wider experience see ambiguity. If you don't buy this - try getting out of your comfort zone and learning something like a foreign language someday (then go overseas and have some genius of a little kid treat you as if you were slow-witted). It's good to appreciate that this is often what it's like to work on other people's code.
The most common mistake I've seen is when management tries to address (difficult to follow, "high entropy") code support by looking for an expert in that programming language (rather than someone experienced in that type of application).
- Getting someone who is at least familiar with code that does that type of stuff (aka: familiar with that "application domain") will get you further, faster.
4. Commenting:
- Commenting documents the low level design of the software.
- Commenting not only tells "why" (or purpose) of code, it also makes the "how" easier to visualize.
- and ultimately, you have to understand "how" the code works in order to debug and enhance it.
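A small sketch of comments that carry both the "why" and the "how" follows. The function and the sensor scenario are hypothetical.

```python
def average_reading(readings):
    # Why: sensors occasionally report None when a read times out; a single
    # dropped read should not abort the whole averaging run.
    # How: filter out the None entries first, then average what is left.
    valid_readings = [r for r in readings if r is not None]
    if not valid_readings:
        # Why: an average of nothing is undefined; returning None tells the
        # caller "no data" instead of raising ZeroDivisionError.
        return None
    return sum(valid_readings) / len(valid_readings)


print(average_reading([1.0, None, 3.0]))
```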
5. Don't Over Generalize.
The LESS a programmer understands the application domain (and its likely paths of change), the MORE generalized they tend to write the code.
- Reducing a chaotic set of requirements to simple categories, and then coming up with an easily understood, obvious design - takes far more up front thought than an overly generalized (and therefore overly complex) solution.
- In software: making simple things complex - is much easier than - making complex things simple.
- Most programmers learn a new system through searching (or "greps"). As elegant as it sometimes seems, using object oriented polymorphism or hiding things such as function calls in tables of indirect references often makes searching out who calls what, and who changes which data - all that much more difficult to figure out. Was that extra generality really worth it? Did those indirect references really save that much over just using straightforward code?
- But don't under generalize either. In my experience, the most common place where this occurs is the simple copying of files between computers via disk mapping/mounts. If those files are important, I have found network disk mounts tend to be unreliable in 24x7 automated environments (and I prefer to use a more complicated "intelligent ftp" approach). As part of this: always assume the network will be unavailable at times. Buffer and forward those files automatically when the destination becomes reachable.
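The "buffer and forward" idea can be sketched roughly as follows. This is not the "intelligent ftp" approach itself: it uses two local directories as stand-ins for the local buffer and the remote destination, and the names forward_pending_files, outbox_dir and destination_dir are hypothetical.

```python
import shutil
from pathlib import Path


def forward_pending_files(outbox_dir, destination_dir):
    """Move buffered files to the destination; leave them queued on any failure."""
    outbox_dir, destination_dir = Path(outbox_dir), Path(destination_dir)
    forwarded = []
    if not destination_dir.is_dir():
        return forwarded  # destination unreachable: keep everything buffered
    for pending_file in sorted(outbox_dir.iterdir()):
        if not pending_file.is_file():
            continue
        try:
            # Copy under a temporary name, then rename, so a reader at the
            # destination never picks up a half-written file.
            partial_file = destination_dir / (pending_file.name + ".part")
            shutil.copy2(pending_file, partial_file)
            partial_file.replace(destination_dir / pending_file.name)
            pending_file.unlink()  # remove from the buffer only after success
            forwarded.append(pending_file.name)
        except OSError:
            break  # destination dropped out mid-run; retry on the next call
    return forwarded
```

Calling this on a timer means files accumulate in the outbox while the network is down and drain automatically once the destination becomes reachable again.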
6. Don't Over Reuse Code.
There is a (rarely recognized) tradeoff between code reuse and code cloning:
It is the tradeoff between using more if-then-else statements throughout a single module
vs.
repeating some of the code (among separate modules) to keep their flow of control simpler.
- Reuse Pro:
- You only have to change things in one place.
- You don't have to reinvent the wheel every time you enhance a system.
- The code has fewer lines and takes up less memory.
- Reuse Con:
- Reusable code often contains extra complication to vary what it does - based on who called it.
- Conversely, extra pre & post condition checks and data structure setup often surround calls to reused functionality.
- Common/reused code can become fragile; e.g., change it to work in three places and you can easily break it for the other four. It takes more effort to test it.
- If ever there is a system where fixing one bug breaks other functionality - it is in a system whose common code has been made too complex by covering too wide a span of "common" functionality.
- An excessive number of granular function calls (to reused modules) to avoid repeating some code - can obfuscate how the main flow works.
- This is analogous to using concise, sophisticated words in prose that the reader has to lookup every other sentence. The main train of thought/narrative flow gets lost.
- Cloning Pro:
- Repeating some code allows you to more easily segment code modules by purpose (e.g., different login modules for different systems).
- Repeating code lets different (yet somewhat similar) modules evolve independently, producing less code maintenance entropy.
- The extra memory and disk space used by repeated code are cheap. Programmer time to understand code (especially someone else's code) is expensive.
- Fewer lines of code are NOT the same thing as keeping it simple.
- Cloning Con:
- Many programmers that are new to a system wind up repeating code that's already there. This produces cloned modules that repeat code by programmer rather than by purpose.
- Higher level changes require you to search/grep through the system to uncover other instances of the duplicated logic.
Less code (squashed down to the nth degree) does NOT always equate to simpler, easier to follow code.
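The tradeoff can be illustrated with a small, hypothetical login example: one shared routine that branches on who called it, next to separate routines cloned by purpose. The system names ("lab", "kiosk") and their rules are invented for this sketch.

```python
def login_reused(system_name, user_name, password, badge_id=None):
    """Reuse: one shared routine, so every caller's special case lives in here."""
    if system_name == "lab":
        if not badge_id:
            return False
        return bool(user_name) and bool(password)
    elif system_name == "kiosk":
        return bool(user_name)  # the kiosk skips the password
    else:
        return bool(user_name) and bool(password)


def login_lab(user_name, password, badge_id):
    """Cloning by purpose: lab login needs a badge as well as a password."""
    return bool(badge_id) and bool(user_name) and bool(password)


def login_kiosk(user_name):
    """Cloning by purpose: kiosk login needs only a user name."""
    return bool(user_name)


print(login_reused("lab", "ann", "pw", "B1"), login_lab("ann", "pw", "B1"))
```

The cloned versions repeat the bool(user_name) check, but each one can be read, and changed, without knowing the rules of the other systems.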
Variations of this tradeoff:
- Should we have entirely separate builds for (e.g.) separate machines, for (e.g.) separate corporate locations, etc.?
- or -
Should we put all the code into one bigger build?
- i.e., Does the build simplicity of having a single build outweigh the additional programming logic (of more if-then-else statements)?
- Should we develop a single responsive web page (which handles displays of different sizes)?
- or -
Should we have an entirely separate page for (e.g.) mobile devices?
7. Code Cleanup Can Be Overdone:
Cleanup can encrypt and overly fragment the code, making it harder to understand.
A Common Scenario:
Cleaning up code by localizing variables and creating common modules is great, but cleanup refinements can be overdone. Doing things like replacing text strings with enumerated variables (for performance), compressing code down too tersely, and eliminating all repeated code often fragments the narrative flow and/or encrypts the code. This is somewhat akin to effectively semi-compiling it after development to reduce its bulk. While the result is still far from a reduction all the way down to machine language (which is indeed compressed), prior knowledge of the unencrypted development source code holds a definite advantage (and is a common way for programmers to appear exceptionally intelligent). Yet the code wasn't like that just after development, because it had to be simple and straightforward enough for the originator to understand it.
Reasonable code cleanup and clean modularity aside, the next person shouldn't have to tease it back apart to unencrypt it (or recombine it from various modules) - in order to re-understand it.
- Computer memory and disk space are cheap. Programmer time to pick up, debug and enhance a system is expensive.
Least Favorite Software Engineering Philosophies:
Given the above, it should be fairly apparent that I disagree with some of what's out there. Two main counter philosophies are:
- Information Hiding
- I would have preferred this to be: "Make information modular and obvious".
- I disagree with the whole idea of "Information Hiding" (as practiced by those whose code I had to support).
- The major debates on the concept of "information hiding" revolve around the best way to develop code - and ignore the additional cross module state dependencies that arise during debug, maintenance and enhancement.
- Unless the code is proprietary, there's no reason to hide anything from someone trying to figure it out - especially after its initial development.
- Avoid Encodings (aka: Hungarian variable prefixing)
- As described in "Clean Code, A Handbook of Agile Software Craftsmanship", Robert C. Martin, Prentice Hall, 2009.
- I think limited encodings are a good idea. As previously stated: being able to quickly see a variable's type and scope (in a haystack full of variables) is as important as what the variable represents.
- Just because the compiler doesn't enforce typing doesn't mean you shouldn't do it.
- e.g.: Finding text in a numeric field is a common error (made easier to find if you know that text doesn't belong there from the variable's naming convention).
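A minimal sketch of how a naming convention can help find text in a numeric field is given below. It assumes a hypothetical "n_" prefix for numeric fields and "s_" for strings. The function name and the sample record are illustrative only.

```python
def find_text_in_numeric_fields(record):
    """Return the names of 'n_'-prefixed fields whose values are not numbers."""
    return [
        field_name
        for field_name, value in record.items()
        if field_name.startswith("n_")
        # bool is a subclass of int in Python, so reject it explicitly.
        and (isinstance(value, bool) or not isinstance(value, (int, float)))
    ]


record = {"n_quantity": "12", "n_price": 9.5, "s_name": "bolt"}
print(find_text_in_numeric_fields(record))
```

Because the name says what belongs in the field, the check needs no schema, and a person reading a debug dump can spot the same mistake by eye.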
Finally:
Getting people to agree on what "good software" is - goes a long way towards being able to cohesively work together.
Finding help can be difficult when code is not easily understood by others.
Copyright ©2014 by Ken Freed. All rights reserved.