Error reporting is a tricky business. Get it right and your users will be able to correct their actions quickly. Get it wrong and you will be the cause of your users' frustration. Because, face it: (a) users will make mistakes and (b) your application will encounter erroneous conditions due to bugs or unpredicted scenarios.
Understand usage errors and application errors
From the programmer's point of view, there are many types of errors that an application can detect; but, from the point of view of the user, errors can be classified in two broad categories: usage errors and application errors. Let's take a look at what these are before diving into how to handle them.
Usage errors are the kind of errors that can only be caused by the user due to a misunderstanding of the program interface or due to the fat-fingering of a command. These include cases like specifying an unknown option, providing an argument to an option that does not accept any, trying to invoke an unknown subcommand, or making a mistake in the number or type of the positional arguments to the application. These errors are easily fixable just by re-executing the program with the right syntax.
Application errors are "everything else that is not a usage error". For example: failure to open a file for writing, failure to allocate memory, or failure to connect to a remote server. It is true that some of these can be caused by the user: for example, if the application takes a path to an input file and the user provides a path to an unreadable or non-existent file, the application will fail. However, there is nothing wrong with the invocation of the application: the application is behaving just as the user requested and encountered an error with an external system.
Once you grasp the difference between these categories, you are armed with the key knowledge that will improve any error handling you implement — even in GUI applications!
Guiding the user to get help
Whenever the user encounters an error, he will either be able to immediately fix the problem given enough experience or he will need to look for additional details elsewhere. As the designer of the application, it is your job to make these pleasant experiences:
In the case of usage errors, the most useful information that the user can receive is an explicit error message indicating what is wrong with his command and an additional message explaining how to obtain further help. This 1- or 2-line long message is terse and tells the user all he needs to know to fix the problem: novice users will find details on how to see the full instructions while experienced users will quickly realize their mistake by reading the explicit error message.
For example: a message such as "Unknown option" is suboptimal because it does not indicate which option was wrong and it does not mention how to see which options are valid. A better alternative is to say "Unknown option --path; type 'myapp help' for details."
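To make the difference concrete, here is a minimal Python sketch of a usage error that names the offending option and points to further help. The program name `myapp` comes from the message above; the `UsageError` class, the `KNOWN_OPTIONS` set, the helper names, and the exit code 2 are assumptions made up for this illustration.

```python
import sys

PROG = "myapp"  # program name taken from the example message in the text
KNOWN_OPTIONS = {"--verbose", "--output"}  # hypothetical set of valid options


class UsageError(Exception):
    """Raised when the command line itself is malformed."""


def check_options(args):
    """Raise UsageError naming the first option that is not recognized."""
    for arg in args:
        if arg.startswith("--") and arg not in KNOWN_OPTIONS:
            raise UsageError(f"Unknown option {arg}")


def format_usage_error(error):
    """Build the one-line message: what is wrong, plus where to get help."""
    return f"{error}; type '{PROG} help' for details."


def main(args):
    try:
        check_options(args)
    except UsageError as error:
        print(format_usage_error(error), file=sys.stderr)
        return 2  # assumption: a distinct exit code for usage errors
    return 0
```

Calling `main(["--path"])` prints the improved message to stderr and returns 2, while a valid invocation such as `main(["--verbose"])` returns 0. A real program would pass the result to `sys.exit`.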
In the case of application errors, you should print a description of the problem the application encountered. This is trickier than you might think: the description needs to be accurate enough to pinpoint the root cause of the problem and not be ambiguous, while at the same time it needs to provide enough context to relate such root cause to what the user asked for.
For example: a "No such file or directory" error message, on its own, is useless in general. Which file does not exist? An input file that the user provided on the command line or a supporting file that the application attempted to load in the background? A better way to convey such an error would be to say: "Cannot open configuration file /etc/myapp.conf: No such file or directory."
Generating error messages for usage errors is easy because those errors are captured in the presentation layer and handled at that level. However, application errors may come from deep layers of your software stack: unless you have been careful to annotate those errors during their propagation with enough details to generate a message with enough context, you won't be able to do so when the time to print it comes.
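One way to carry that context along is to annotate the error at the layer that knows what the file was for. The following Python sketch is only an illustration: the `AppError` class and the `read_file` and `load_config` functions are hypothetical names, and the two layers stand in for whatever your real stack looks like.

```python
class AppError(Exception):
    """An application error whose message is meant to be shown to the user."""


def read_file(path):
    """Low layer: knows the path, but not why the file is being read."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def load_config(path):
    """Middle layer: knows that the file is the configuration file."""
    try:
        return read_file(path)
    except OSError as error:
        # Annotate the low-level error with the context this layer has,
        # keeping the original exception attached as the cause.
        raise AppError(
            f"Cannot open configuration file {path}: {error.strerror}"
        ) from error
```

By the time the error reaches the presentation layer, `str(error)` already answers both "what failed" and "why", and the original `OSError` is still available through `__cause__` for debugging.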
A common trend nowadays in languages that dump a stack trace in the face of errors is to delegate error reporting to just this: raise an exception from an inner layer of the software stack, let it propagate all the way out, and let the interpreter print it out along with the stack trace.
This is wrong. Your application should explicitly handle all errors that it is expected to raise and you should think hard about the best way to report each case. A single message summarizing the problem along with a suggestion on how to correct it is much better than dumping a cryptic and useless stack trace. (Pro tip: Implementing integration tests for these error cases will force you to think about what the best course of action for each error is.)
But I agree with you: handling all errors is hard, especially in languages that implement exceptions. It's all too easy to miss the handling of a specific exception raised many abstraction levels below, and that's OK as long as you consider these cases to be bugs. For this reason, I often implement a catch-all handler that spots these unhandled exceptions, prints debugging information and asks the user to report the problem to the right issue tracker.
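A catch-all handler of this kind could look roughly like the Python sketch below. Treat every name as an assumption: `AppError`, `run_guarded`, and the issue tracker URL are placeholders, and the exit code 70 borrows the EX_SOFTWARE convention from BSD's sysexits.h rather than anything this article prescribes.

```python
import sys
import traceback

ISSUE_TRACKER = "https://example.com/myapp/issues"  # hypothetical URL


class AppError(Exception):
    """An expected error whose message is already suitable for the user."""


def run_guarded(main, args, stderr=None):
    """Run main(args), turning exceptions into messages and exit codes."""
    if stderr is None:
        stderr = sys.stderr
    try:
        return main(args)
    except AppError as error:
        # Expected failure: one clear message, no stack trace.
        print(f"myapp: {error}", file=stderr)
        return 1
    except Exception:
        # Unexpected failure: this is a bug, so say so and help the user
        # report it. The stack trace is debugging data, not the message.
        print("myapp: internal error; this is a bug.", file=stderr)
        print(f"Please report it at {ISSUE_TRACKER} with the details below:",
              file=stderr)
        traceback.print_exc(file=stderr)
        return 70  # assumption: EX_SOFTWARE from BSD sysexits.h
```

The point of the split is that the stack trace only ever appears next to an explicit "this is a bug" message, so it supplements the error report instead of replacing it.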
Never consider stack traces to be a replacement for error messages.
Errors go to stderr
stdout and stderr are a topic that we will get into in more depth later in the series. But, for now, all you should keep in mind is that error messages belong in stderr. Make sure to send them there, even when writing shell scripts!
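In Python, sending a message to stderr is a matter of passing `file=sys.stderr` to `print`; in a shell script the equivalent is redirecting the output with `>&2`. The small helper below is a sketch, and both the `report_error` name and the `myapp:` prefix are assumptions chosen for illustration.

```python
import sys


def report_error(message, stream=None):
    """Write an error message to stderr, or to the given stream."""
    if stream is None:
        # Looked up at call time so that a redirected sys.stderr is honored.
        stream = sys.stderr
    print(f"myapp: {message}", file=stream)
```

With errors on stderr, running the program as `myapp > out.txt` keeps the error visible on the terminal and keeps it out of the captured output.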
I want to help the user when there is an error!
That's an excellent goal and is what made me write this article in the first place. However, you will have to wait until Thursday for further details, as Thursday's post will focus on describing how to offer help and how not to. The conditions in which help is offered depend on the types of errors that the application can yield and thus I had to describe error handling first!