What follows are my rules of thumb for good software development practices. There are a lot of environments and methodologies and these often result in different practices. I have tried to assemble a list of rules that apply to almost any kind of development.
4: Build Rock Solid Components
6: Push Logic to the Lowest Level
11: Build Things to be Configurable
Anything that happens more than once in your application should be implemented as a class or function or macro or something. You should never implement any procedure, any logic, any process, any style, any constant, any configuration more than once. The reason is very simple. Eventually it will change. If you have implemented it more than once then you have to go back and find all of the places where you implemented it. You never will find them all. This is one of the most common sources of bugs.
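As a minimal sketch of this rule, assume a hypothetical sales tax rule (the names `SALES_TAX_RATE`, `price_with_tax` and `invoice_total` are invented for illustration). The rate and the formula each exist in exactly one place, so a change to either is a one-line edit:

```python
from decimal import Decimal

# Hypothetical business constant, defined in exactly one place.
SALES_TAX_RATE = Decimal("0.07")


def price_with_tax(net_price: Decimal) -> Decimal:
    """Return the price including tax, rounded to cents."""
    gross = net_price * (1 + SALES_TAX_RATE)
    return gross.quantize(Decimal("0.01"))


def invoice_total(net_prices: list[Decimal]) -> Decimal:
    """Total an invoice by reusing price_with_tax instead of repeating the formula."""
    return sum((price_with_tax(p) for p in net_prices), Decimal("0.00"))
```

If the rate or the rounding rule changes, nothing that calls `price_with_tax` has to be found or edited.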
Exception handling is one of the things that separates average programmers from great programmers. All of those things that you think will never happen will happen. You may think the odds of something happening are 1 in a million, but if your code executes a million times you should expect it to happen. And your estimate of 1 in a million is probably closer to 1 in 1000. Unless you are writing code that you expect no one to use, all kinds of things are going to happen that you never expected. How your software handles those situations is the difference between a good developer and a poor developer.
Good exception handling requires two things. First, you must be able to recognize what potential exceptions might happen. This requires both spending the time to consider the question and the insight to anticipate the possible exceptions. This is the hard part, and it requires real effort to do that analysis. Second, you have to have the discipline to actually implement code to handle the exceptions. Most developers will not do this. All great developers do.
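Here is one way those two steps might look in practice. This is only a sketch: the setting, the file and the fallback value `DEFAULT_RETRY_COUNT` are hypothetical. The point is that each thing that could go wrong is named and then handled on purpose:

```python
from pathlib import Path

DEFAULT_RETRY_COUNT = 3  # hypothetical fallback value


def parse_retry_count(text: str) -> int:
    """Turn file content into a positive integer, or fall back to the default."""
    try:
        value = int(text.strip())
    except ValueError:
        # Anticipated: the file contains something that is not a number.
        return DEFAULT_RETRY_COUNT
    if value < 1:
        # Anticipated: a number that parses but makes no sense as a retry count.
        return DEFAULT_RETRY_COUNT
    return value


def read_retry_count(path: Path) -> int:
    """Read the retry count from a text file, tolerating a missing or unreadable file."""
    try:
        text = path.read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        # Anticipated: missing file, no permission, a directory, or bytes that are not UTF-8.
        return DEFAULT_RETRY_COUNT
    return parse_retry_count(text)
```

Whether falling back to a default is the right response depends on the application. Sometimes the better choice is to log the problem or refuse to start.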
When it comes to software development there always are a lot of ways to skin the cat. And while there are always some wrong ways there is often not a single best way. Additionally there are a lot of ways that are not the best way but are still workable. Sometimes you can argue endlessly about which way is the best way. I will give one answer to help resolve that decision. The best way is the simplest way. If you have two options that will provide the same result use the one that is simplest.
A lot of times keeping it simple means identifying what factors are relevant and which are not and then only dealing with the relevant ones. Do not add logic or code that does not contribute anything to the outcome. Spending a little more time understanding the logic of what you are coding will help you identify which factors are irrelevant. You’ll end up writing a simpler algorithm and you will probably finish it more quickly.
Sometimes keeping it simple means not having modules or layers that are not necessary. Don’t adopt a complex architecture if a simple one will do. Don’t build a 4 tier system if a 3 tier system will accomplish your objective just as well. Don’t build a 3 tier system if a 2 tier system will work just as well. In the 90s and 00s I successfully avoided DLL hell because I did not use DLLs. If you can accomplish something with one class instead of three, that is probably simpler as long as the single class is not overly complicated. Two simple classes are better than a single class with a complicated and confusing list of methods.
The only thing that complexity adds to your project is opportunity for things to go wrong. Keep it simple.
In other words unit testing is critical to good development.
The easiest way to build a big system is to break it down into components. It’s difficult to build a system out of 1000 different parts. If the parts can be assembled into 50 components with 20 parts each then all you have to do is assemble the 50 components. This is much easier. But this only works if the components can be trusted. That means you have to know what each component is doing and you have to be confident that the component is doing what it claims to be doing. This doesn’t just mean a component behaves well under good circumstances. You have to know that your components will behave predictably when exceptions are encountered. This truly is one of the most important facets of building quality software.
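A small sketch of what "trusted" can mean, using a hypothetical `average` component. Its behavior on bad input is documented and tested, not just its behavior on good input. The tests are plain functions with `assert` so they can be run directly or by a test runner:

```python
def average(values: list[float]) -> float:
    """Return the arithmetic mean; raise ValueError for an empty list."""
    if not values:
        raise ValueError("average() requires at least one value")
    return sum(values) / len(values)


def test_average_of_typical_input() -> None:
    assert average([2.0, 4.0, 6.0]) == 4.0


def test_average_rejects_empty_input() -> None:
    try:
        average([])
    except ValueError:
        return  # the documented, predictable failure
    raise AssertionError("expected ValueError for empty input")
```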
I will admit that I was a late adopter of revision control. Working on a lot of projects independently, it seemed pointless. I was wrong. Even if you work by yourself, the ability to look at and revert to past revisions is valuable. It provides a very quick way to evaluate where new bugs came from and an accurate way to roll back changes when necessary. There are a lot of advantages to revision control systems for groups as well as individual developers. I'll list just a few; I'm sure there are many more:
It is a pretty common and accepted practice to build applications in layers. Usually the top layer is the user interface and the bottom is a data source, usually a database. There can be any number of layers between them depending on the architecture of the application. In general most applications take the form of an inverted pyramid with a single data source at the bottom and multiple user interfaces at the top.
By pushing logic and rules to the lowest level possible you reduce the number of modules that have to implement them. You accomplish consistent implementation and by doing it just once you make your application simpler. If you have multiple applications that all access a single web service then implementing rules in the web service (sometimes) prevents you from having to implement them in all of the applications. Rules enforced in a web service will never be broken by a front end developer. If you have multiple web services that access a single database then implementing rules in the database (sometimes) prevents you from having to implement them in the web service. Rules that are enforced by the database will never be broken by either a front end or a back end developer.
Most projects that you will work on will have parts that are straightforward and parts where there is potential for complications. Complications might come in the form of compatibility issues, calculations, data definitions and rules, capacity or licensing. You want to tackle these things first because they are the ones that have the potential to either kill your project or take a long time. Putting off a part of the project that could potentially take a long time increases the risk that you will miss deadlines. Doing them early increases the chance of making your deadlines, not just because you are dealing with the time consuming stuff earlier in the project but also because you are dealing earlier with the stuff that might make you change direction. As a result you minimize the probability of having to rewrite.
You don’t want to build a system around a set of data and then afterwards find out that the data is not available how or when you need it. You don’t want to build a system using tools that you later realize that you can’t license. You don’t want to build a system around a process that you later realize is not valid. You don’t want to get five months into a six-month project and then realize that the module that you are just beginning will take three months, not one month, because it turns out to be more complicated than you thought.
One of the keys to good development is identifying the difficult or risky parts of the project and attacking those first.
I’ve often heard a developer say “It would be easier just to rewrite it.” I’ve said it myself. It is rarely true. The developer who created the code had knowledge gained through the experience of building the application that you probably do not possess. There are rules, exceptions, preferences, behaviors and idiosyncrasies that he learned. There are quirks in the logic that he has already solved. By throwing out his code and starting over you are losing all of that information.
Almost every time a developer starts into a project he thinks it is going to be easier than it turns out to be. That is exactly where the developer who says “It would be easier just to rewrite it” stands. It is only through executing the project that these issues come up. This is usually more time consuming than unraveling even convoluted code. You will have to relearn it the hard way. Almost every time I have seen a developer rewrite something because they thought it would be faster they miss a lot of the functionality that was in the original code. They either end up spending a lot longer than they estimated fixing their own code to do things the original code included or the end users end up losing a lot of features that they had in the earlier version.
Therefore, unless the code you are considering rewriting is small and well defined, it is usually much easier just to fix the code.
Developers hate to document. Expecting them to provide a lot of documentation is probably not reasonable. What is important then is to figure out what kind of documentation is the most useful and make sure that you have that. The single most important documentation defines the interfaces between modules or layers.
I prefer IPO documentation (input, processing, output). If I know what the inputs are, what the module will do and what the output or return data will be then I have enough information whether I am building that module or whether I am writing code that uses it. In an object oriented world I don’t need to know how a module does what it does, I only need to know what it does, what it requires and what it will return to me. And to build a module in an object oriented world I don’t need to know what the rest of the system is doing, I only need to know what my object is supposed to do, what will be provided to it and what is expected from it.
IPO is what you need to know to build the pieces and to assemble them into a working system.
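As an illustration, IPO documentation can live directly in a docstring. The function below is a hypothetical example chosen only to show the layout. The three headings are the useful part, and a caller can use the function without reading its body:

```python
def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Compute the fixed monthly payment for a loan.

    Input:
        principal   -- amount borrowed, greater than 0
        annual_rate -- yearly interest rate as a fraction (0.06 means 6%), 0 or more
        months      -- number of monthly payments, at least 1
    Processing:
        Applies the standard annuity formula; with a zero rate, divides the
        principal evenly across the months.
    Output:
        The payment per month, rounded to 2 decimal places.
        Raises ValueError if any input is out of range.
    """
    if principal <= 0 or annual_rate < 0 or months < 1:
        raise ValueError("principal > 0, annual_rate >= 0 and months >= 1 are required")
    if annual_rate == 0:
        return round(principal / months, 2)
    monthly_rate = annual_rate / 12
    return round(principal * monthly_rate / (1 - (1 + monthly_rate) ** -months), 2)
```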
Database constraints allow you to implement rules about your data that do not require coding on your part. They require no debugging or testing. All you have to do is turn them on. And except in a few rare situations there is no noticeable performance impact.
Yeah, I know what some of you are thinking: If you build your software properly you won’t allow bad data into the database in the first place. I suppose that’s true, if you never make a mistake, and no one else on your team ever makes a mistake, and if the people who follow you when you leave never make a mistake, and your DBAs never make a mistake. When you think about it that way it seems a little more likely that a mistake will eventually be made. Development is not a test of manhood. Smart developers don’t do things the hard way or the risky way just to prove they can. They use the available tools as effectively as possible.
Cleaning up after a bug has allowed bad data into your database can be a very difficult process. Database constraints will in most cases prevent this from happening and, I repeat, without any coding, debugging or testing on your part. It is rule checking provided to you for free by your database. Why would anyone not use them?
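To make this concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module. The `member` table and its columns are hypothetical. The rules are declared once in the table definition, and the database rejects any row that breaks them no matter which program tries the insert:

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE member (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        age   INTEGER NOT NULL CHECK (age >= 0)
    )
    """
)
with conn:  # commit the seed row so later rollbacks do not undo it
    conn.execute("INSERT INTO member (email, age) VALUES (?, ?)", ("ann@example.com", 34))


def try_insert(email: Optional[str], age: int) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO member (email, age) VALUES (?, ?)", (email, age))
        return True
    except sqlite3.IntegrityError:
        return False
```

A negative age, a duplicate email and a missing email are all refused by the database itself. The application code contains no validation logic at all.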
Some programmers will tell you it is harder to make things configurable. They are just wrong. Is it really harder to code “for x = 1 to 20” than to code “for x = 1 to MAX_SOMETHING”? Obviously it is not. But saying that making something configurable is not harder is not really a compelling reason to do it. What we really need to explain is why making things configurable is actually easier. The reason is change.
Whatever you are programming today is going to change at some point. I don’t care what people tell you about your requirements being set in stone: it is going to change, if not now then a month from now or a year from now. And while the initial coding of a configurable program may not be easier, the changes are a LOT easier, so much easier that you will come out ahead after just the first change. But what you are coding is not likely to change just once, it will probably change many times.
The reason that changing a configurable system is easier is that you only have to change one thing, the configuration setting. This is really a specific implementation of rule 1: Never Do Anything Twice. If your code is not configurable then you have to figure out all of the places where it is going to change. You’ll miss some of them and that will result in bugs. Even if you don’t miss any of them you’ll spend a lot of time looking for all of the places that need to be changed.
When I say make things configurable I am not necessarily saying that everything needs to be user configurable. You don’t necessarily need to build a front end where users can manage every single setting. But minimally you ought to have some sort of settings file or list of constants that will allow the developer or a power user to configure the application.
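A minimal version of such a settings list might look like the sketch below. The setting names and the JSON format are assumptions made for illustration. Defaults live in one class, and a developer or power user can override them without touching the code that uses them:

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    """Every tunable value lives here (the names are hypothetical)."""
    max_login_attempts: int = 5
    page_size: int = 20
    report_title: str = "Monthly Report"


def load_settings(json_text: str = "{}") -> Settings:
    """Build Settings from JSON text, keeping defaults for missing keys."""
    overrides = json.loads(json_text)
    known = {f.name for f in fields(Settings)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return Settings(**overrides)


def first_page(items: list[str], settings: Settings) -> list[str]:
    """The page length comes from configuration, not from a literal in the code."""
    return items[: settings.page_size]
```

Rejecting unknown keys is a design choice in this sketch. It catches a misspelled setting name early, where silently ignoring it would hide the mistake.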
There is a lot about test driven development that I like but I have one big issue. If you are building something with a reasonable level of complexity, it’s not enough to build a test and then write code that fulfills the test requirements. You can never really write a test that would check for every possibility. You really have to understand the problem that you are trying to solve and then understand how and why your solution solves that problem. You need to have an understanding of what can go wrong and apply good development methods to avoid those circumstances. This is true regardless of whether you are tasked with writing a single function or an entire system.
I had a professor who used to say “The only thing passing a test measures is the ability to pass the test.” He was speaking of testing in a scholastic setting. Passing a test does not measure your actual knowledge or ability but hopefully a well written test can approximate that. Some of my teacher friends tell me that this is the problem with government mandated tests for high school graduation. If teacher performance is based on how many of their students pass the test then they focus on teaching what is required to pass the test, not necessarily what is required to be successful adults. The same is true with software development. I’ve actually watched developers implement invalid logic because it produces the outcome required by the test instead of understanding the logic of what the code is really supposed to do.
It’s great to have a set of tests before you start development as long as your development is not focussed simply on passing the tests. Being too focussed on passing an acceptance test will result in software that is accepted but is not very robust.
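A familiar illustration of the gap between passing the test and understanding the problem is the leap year rule. The acceptance cases here are hypothetical. Both functions pass them, but only the second one implements the actual Gregorian rule:

```python
def is_leap_year_to_pass_test(year: int) -> bool:
    """Written only to satisfy the acceptance cases below; wrong for most century years."""
    return year % 4 == 0


def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Hypothetical acceptance test data: both implementations agree with it.
ACCEPTANCE_CASES = {2024: True, 2023: False}
```

The two functions disagree on 1900, which was not a leap year. Nothing in the acceptance cases would ever reveal that.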
I used to have a rule of thumb that I never wanted to write any chunk of code that I could not see on my screen without scrolling. That would be about 20 to 30 lines of code. If the chunk of software that you are coding is larger than that then you need to shrink it by implementing functions or classes or something. This is a rule of thumb, not a rule set in stone.
The basis for that rule was that if a piece of code is so large that I needed to do a lot of scrolling to see it all then it is going to be more difficult to understand if I am seeing it for the first time or if I am trying to recall what I was doing after a period.
The reality is that whether I am trying to understand existing code or trying to write new code, it is far easier to understand a small chunk than a large chunk. The best way to build a large system is not to build big chunks. It is far easier to build small chunks which are used to build other small chunks which are used to build other small chunks…
One of the benefits of any good development methodology is reuse. Reuse is a lot easier if you build generic classes and components. You are not likely to reuse a function that converts Tokyo time to Honolulu time. You’re much more likely to reuse a class that will convert a time from any given time zone to any other time zone. And probably that application that is just handling times in Tokyo and Honolulu will be expanded to handle other time zones. You’re not very likely to reuse a user interface widget that accepts input of a company id number in whatever format that would be. You’re much more likely to reuse a generic masked input widget. Of course both a masked edit and a date class are available in most development environments. This only proves the case that generic components are much more reusable.
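The time zone example can be sketched in a few lines. The function names are hypothetical. Fixed UTC offsets stand in for real time zone data, which works here only because neither Japan nor Hawaii observes daylight saving time:

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Fixed offsets are an assumption that holds for these two zones only;
# real code would more likely take zones from a time zone database.
TOKYO = timezone(timedelta(hours=9), "JST")
HONOLULU = timezone(timedelta(hours=-10), "HST")


def tokyo_to_honolulu(naive: datetime) -> datetime:
    """The narrow version: useful for exactly one pair of zones."""
    return naive.replace(tzinfo=TOKYO).astimezone(HONOLULU)


def convert_time(naive: datetime, source: tzinfo, target: tzinfo) -> datetime:
    """The generic version: interpret a naive time in `source`, express it in `target`."""
    return naive.replace(tzinfo=source).astimezone(target)
```

The generic function is no longer or harder to write than the specific one, and it already covers the Tokyo to Honolulu case.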
I once had to write a program to take serial input and stream it into a database. Instead of building a program that was specific to that particular application, that specific protocol, that specific database, I built a generic serial streaming program. I ended up selling it to a lot of other people who had the same problem, all because I built generic software.
At some point someone is going to have to read your code and figure out what you are doing. It may even be you if you stick around long enough. I can say from experience that it is aggravating to look at your own code written a few years earlier and try to figure out what you were thinking. Imagine the frustration of trying to figure out someone else’s logic.
This is partially about including comments, but it is more about style. Here are a few guidelines:
You may be the world’s leading master of whatever language you are using. Demonstrate it by writing code that is robust and elegant, not hard to read. In other words be a professional.
If you ever work on applications that use a database you ought to learn SQL. SQL is probably the simplest and most powerful language you will ever use and almost every database that you use will use SQL.
I know that it is becoming popular to use frameworks that isolate the developer from the database and this certainly has its place. If you are writing an application where your logic is related to a single table at any point in time then using framework classes to encapsulate those tables is a good idea. If you are writing modules that edit a person or a product or an organization record this is probably the easiest way to get that done.
However there are cases where being able to write a complex query or update statement will make your code considerably more simple and efficient. SQL can do things with data in a single instruction that sometimes require dozens of instructions in a higher level language. And it does it more efficiently because it can result in considerably less traffic between the database and the platform that is executing your program.
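The difference can be sketched with SQLite and a hypothetical `product` table. Both functions apply the same price change. The first does the work row by row in the application, the second hands it to the database as one statement. With an in-memory database the traffic saving is not visible, but against a database server each row in the first version would cost a round trip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product (category, price) VALUES (?, ?)",
    [("book", 10.0), ("book", 20.0), ("toy", 8.0)],
)


def raise_prices_row_by_row(category: str, factor: float) -> None:
    """Fetch every row, compute in Python, write each row back."""
    rows = conn.execute(
        "SELECT id, price FROM product WHERE category = ?", (category,)
    ).fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE product SET price = ? WHERE id = ?", (price * factor, product_id)
        )


def raise_prices_in_sql(category: str, factor: float) -> None:
    """Let the database do the same work in a single statement."""
    conn.execute(
        "UPDATE product SET price = price * ? WHERE category = ?", (factor, category)
    )
```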
But learning SQL has another substantial advantage: encapsulating data rules and relationships within the database itself. Writing views and stored procedures allows you to put an object oriented face on your database that shields applications or reporting tools from the database implementation. Building relationships and rules into views allows you to change those easily without ever having to edit the applications or reports. You simply need to rebuild a view or a procedure.
I worked on a member application where they were frequently redefining what a member was. Not a problem: I just rebuilt the members view whenever the rules changed. Any screens or pages or reports that were based on membership were changed instantly. Often they would change how they formatted names (include a middle name, include a middle initial, include prefixes or suffixes). Again not a problem. I had a view that assembled a member’s full name. Rebuilding the view would affect how names were shown everywhere.
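The full name case might look something like this in SQLite. This is a sketch with hypothetical table, view and function names, not the schema of the application described above. The code that reads names never changes, only the view does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        middle_name TEXT,
        last_name TEXT NOT NULL
    );
    INSERT INTO person (first_name, middle_name, last_name)
    VALUES ('Maria', 'Elena', 'Santos');

    CREATE VIEW member_name AS
    SELECT id, first_name || ' ' || last_name AS full_name FROM person;
    """
)


def full_names() -> list[str]:
    """What every screen and report would call; it knows nothing about the format."""
    return [row[0] for row in conn.execute("SELECT full_name FROM member_name ORDER BY id")]


def rebuild_view_with_middle_initial() -> None:
    """Change the naming rule in one place by rebuilding the view."""
    conn.executescript(
        """
        DROP VIEW member_name;
        CREATE VIEW member_name AS
        SELECT id,
               first_name
               || COALESCE(' ' || substr(middle_name, 1, 1) || '.', '')
               || ' ' || last_name AS full_name
        FROM person;
        """
    )
```

Before the rebuild `full_names()` returns names without a middle initial, afterwards with one, and no caller has been edited.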
Using stored procedures takes this a step further. Placing security validation or business rules in stored procedures shields that logic from a less knowledgeable developer who is editing the application to make an unrelated change. Implementing database updates that need to be implemented as a transaction involving several updates is more secure in a stored procedure. This is not something that you will need to do in every application but on the occasions when you do it is invaluable.
For the same reason that we like to encapsulate the model in an MVC architecture it makes just as much sense sometimes to encapsulate some of this at an even lower level, in the database itself.
This is an important part of writing readable code, important enough to warrant its own rule. You can save yourself a lot of time commenting if you just use intelligent naming. If a variable represents the start date of something call it StartDate (or start_date or whatever convention you are using). Don’t call it date or date1 or sd. If a method is used to determine if a person is an adult call it IsAdult, returning true if they are an adult or false if they are not. You can tell from the name exactly what it does and what it returns. If you call it CheckAdultStatus someone can guess what it does but not what it returns. They still have to look it up in order to read your code. If you have a style that is used to format error messages call it ErrorMsgStyle, not BigRedText and certainly never style12. If I am placing a column title on the page I want to use a style whose name indicates that it is for a column title. I don’t want to have to know to use bold and blue. Besides, at some point we may decide to change the column titles to italic and violet. If the style is named ColumnTitle instead of LargeBoldBlue this is not a problem. Save is an acceptable name for a class method if it is clear what is being saved. If there is more than one thing that could be saved then it is not precise enough.
Here are some guidelines that I like to use. I don’t think that it is important that you use my guidelines as much as you have guidelines that increase the understandability of your code.
1. Names for anything that refer to a binary state should be named as a question in the form IsSomething or HasSomething. This clearly states what the variable represents and also how to interpret the value. Good examples: IsAdult, IsExpired, HasChildren. Bad examples: AdultStatus, ExpirationStatus, ParentalStatus.
2. Names for variables (other than binaries) should be nouns as they refer to information, not actions, and should precisely describe the information that they represent. Examples: EmployeeID, FirstName, EmployeeCount, EmployeeList. Bad examples: x, getcount, employees (does this refer to a list, a count or something else?). Admittedly I will sometimes use something like x or cnt as a loop variable as long as the loop only consists of a few lines.
3. Function or method names that do something other than just returning data should be named as a verb or action phrase that precisely describes what is being done. You should use conventions in order to have commonality. Functions that read something from a database should all be named ReadSomething or FetchSomething or something like that.
4. Class methods that simply return a value related to the class (accessors) should be named as variables and those rules apply here.
5. Styles should be named after their function instead of their appearance or implementation.
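The guidelines above can be sketched in a small class. Python's own convention is snake_case, so IsAdult becomes `is_adult`, but the principle is the same. The `Person` class and the `ADULT_AGE_YEARS` threshold are hypothetical:

```python
from dataclasses import dataclass
from datetime import date

ADULT_AGE_YEARS = 18  # hypothetical threshold; the right value depends on your rules


@dataclass
class Person:
    first_name: str        # a noun that says exactly what it holds
    birth_date: date       # not "date" or "date1"
    child_count: int = 0   # clearly a count, not a list

    def age_on(self, today: date) -> int:
        """Accessor-style name: a noun phrase describing the value returned."""
        had_birthday = (today.month, today.day) >= (
            self.birth_date.month,
            self.birth_date.day,
        )
        return today.year - self.birth_date.year - (0 if had_birthday else 1)

    def is_adult(self, today: date) -> bool:
        """Binary state phrased as a question, so the return value is obvious."""
        return self.age_on(today) >= ADULT_AGE_YEARS

    @property
    def has_children(self) -> bool:
        """Another yes/no question answered by its own name."""
        return self.child_count > 0
```

A call such as `person.is_adult(today)` reads as a question with a true or false answer, which is what guideline 1 asks for.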