A coworker just sent me some Python code for review and, among such code, there was the addition of a function similar to:
def PathWithCurrentDate(prefix, now=None):
"""Extend a path with a year/month/day subdirectory layout.

Args:
prefix: string, The path to extend with the date subcomponents.
now: datetime.date, The date to use for the path; if None, use
the current date.

Returns:
string, The new computed path with the date appended.
"""
path = os.path.join(prefix, '%Y', '%m', '%d')
if now:
return now.strftime(path)
else:
return datetime.datetime.now().strftime(path)
The purpose of this function, as the docstring says, is to simplify the construction of a path that lays out files on disk depending on a given date.

This function works just fine... but it has a serious design problem (in my opinion) that you only see when you try to write unit tests for such function (guess what, the code to review did not include any unit tests for this). If I ask you to write tests for PathWithCurrentDate, how would you do that? You would need to consider these cases (at the very very least):
  • Passing now=None correctly fetches the current date. To write such a test, we must stub out the call to datetime.datetime.now() so that our test is deterministic. This is easy to do with helper libraries but does not count as trivial to me.
  • Could datetime.datetime.now() raise an exception? If so, test that the exception is correctly propagated to match the function contract.
  • Passing an actual date to now works. We know this is a different code path that does not call datetime.datetime.now(), but still we must stub it out to ensure that the test is not going through that past in case the current date actually matches the date hardcoded in the test as an argument to now.
My point is: why is such a trivial function so complex to validate? Why such a trivial function needs to depend on external state? Things become more obvious if we take a look at a caller of this function:
def BackupTree(source, destination):
path = PathWithCurrentDate(destination)
CreateArchive(source, os.path.join(path, 'archive.tar.gz'))
Now, question again: how do we test this? Our tests would look like:
def testOk(self):
# Why do we even have to do this?
... create stub for datetime.datetime.now() to return a fake date ...

CreateArchive('/foo', '/backups/prefix')
... validate that the archive was generated in the fake date directory ...
Having to stub out the call to datetime.datetime.now() before calling CreateArchive is a really, really weird thing at first glance. To be able to write this test, you must have deep insight of how the auxiliary functions called within the function work to know what dependencies on external state they have. Lots of black magic involved.

All this said, the above may not seem like a big issue because, well, a call to datetime.datetime.now() is cheap. But imagine that the call being performed deep inside the dependency tree was more expensive and dealt with some external state that is hard to mock out.

The trick to make this simpler and clearer is to apply a form of Dependency injection (or, rather, "value injection"). We want the PathWithCurrentDate function to be a simple data manipulation routine that has no dependencies on external state (i.e. make it purely functional). The easiest way to do so is to remove the now=None code path and pass the date in right from the most external caller (aka, the main() program). For example (skipping docstrings for brevity):
def PathWithCurrentDate(prefix, now):
path = os.path.join(prefix, '%Y', '%m', '%d')
return now.strftime(path)

def BackupTree(source, destination, backup_date):
path = PathWithCurrentDate(destination, backup_date)
CreateArchive(source, os.path.join(path, 'archive.tar.gz'))
With this approach, the dependency on datetime.datetime.now() (aka, a dependency on global state) completely vanishes from the code. The code paths to validate are less, and they are much simpler to test. There is no need to stub out a function call seemingly unused by BackupTree.

Another advantage of this approach can be seen if we were to have multiple functions accessing the same path. In this case, we would need to ensure that all calls receive the exact same date... what if the program kept running past 12AM and the "now" value changed? It is trivial to reason about this feature if the code does not have hidden queries to "now" (aka global state) within the code... but it becomes tricky to ensure our code is right if we can't easily audit where the "now" value is queried from!

The "drawback", as some will think, is that the caller of any of these functions must do more work on its own to provide the correct arguments to the called functions. "And if I always want the backup to be created on the current directory, why can't the backup function decide on itself?", they may argue. But, to me, the former is definitely not a drawback and the latter... is troublesome as explained in this post.

Another "drawback", as some others would say, is that testing is not a goal. Indeed it is not: testing is only a means to "correct" code, but it is also true that having testable code often improves (internal) APIs and overall design.

To conclude: the above is an over-simplistic case study and my explanations will surely not convince anyone to stop doing black evil "clever" magic from within functions (and, worse, from within constructors). You will only realize that the above makes any sense when you start unit-testing your code. Start today! :-)

Go to posts index

Comments from the original Blogger-hosted post: