Thoughts on Excel

Excel is powerful but complexity-challenged. Up to a point, Excel lets you do interesting things easily, mainly thanks to:

  • Being declarative. You specify what a cell is, not how or when to calculate it (VBA macros excepted). You don’t have to worry about control flow or side-effects (much).

  • Having a tight read-evaluate-print loop. You see immediately that changes do what you expect them to or spot the obvious error.

  • Being visually transparent. You can see the value of each cell, inspect its definition, and trace its precedents and dependents.

Those are truly valuable, and the spreadsheet model deserves much of its popularity. Simple models are often an order of magnitude easier to write in Excel than in an imperative/object-oriented programming language like Java or Python (not to mention C, C++ etc.).
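To make the "declarative" point concrete, here is a minimal sketch of the spreadsheet model: cells are defined by what they are, in any order, and a small evaluator works out when to calculate them. Python is used only as widely readable notation. The cell names, the figures, and the `evaluate` helper are all made up for illustration, and the sketch assumes the sheet has no circular references.

```python
from typing import Callable, Dict, Optional, Union

# A cell is either a constant or a formula. A formula receives a `get`
# function that looks up other cells by name (hypothetical design).
Formula = Callable[[Callable[[str], float]], float]
Cell = Union[float, int, Formula]


def evaluate(sheet: Dict[str, Cell], name: str,
             cache: Optional[Dict[str, float]] = None) -> float:
    """Evaluate one cell on demand, computing its precedents first.

    Assumes the sheet contains no circular references.
    """
    if cache is None:
        cache = {}
    if name not in cache:
        cell = sheet[name]
        if callable(cell):
            # Formula: hand it a lookup that evaluates precedents lazily.
            cache[name] = cell(lambda ref: evaluate(sheet, ref, cache))
        else:
            # Constant: store it as a float.
            cache[name] = float(cell)
    return cache[name]


# Definitions can appear in any order; only the dependencies matter.
sheet: Dict[str, Cell] = {
    "profit": lambda get: get("revenue") - get("costs"),
    "revenue": lambda get: get("price") * get("units"),
    "costs": 600.0,
    "price": 12.5,
    "units": 80,
}

print(evaluate(sheet, "profit"))  # 400.0
```

Changing an input, for example `evaluate({**sheet, "price": 15.0}, "profit")`, gives the new result without any thought about evaluation order, which is roughly what editing a cell in Excel feels like.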

The challenges of complexity

Then something happens as your models increase in complexity. You notice that you make more mistakes – often in your references or formula copying. Then you notice that if you want to change an assumption, you have to do so in lots of places (and at 4 am, you usually miss one or two). Then calculations slow down. Then you realise you’ve forgotten what those four numbers on the side of that table do, but removing them changes the output…

Using Excel effectively quickly becomes an exercise in discipline. Sure, you can sum the payments in this field in a cell to the left of the table because there’s more space there, but you won’t because calculations should always flow down and to the right. You can change the formula in that table mid-way down to reflect that you don’t have any reliable forecast data after 2020, but you won’t because you know you or the analyst after you will forget about it when more forecast data is added. You can just calculate the sum of $A$4:$A$14, but you won’t until you’ve named that range “category_revenues”. You develop design patterns, layout conventions, add-ons, and macros to help you contain complexity. That buys you enough power to compete with other Excel users.

(What you’re doing is trying to keep the growth of possible interconnections linear rather than exponential by informally modularising parts of your spreadsheet.)

… and lack of recursion

In addition to being complexity-challenged, Excel has significant difficulty handling certain thoughts – recursive algorithms in particular (and that is a huge class). In Excel you tend to create a table for each iteration of a recursive algorithm (say a yearly revenue forecast) and somehow decide what the maximum number of iterations is. If you simply wanted to cluster products with k-means, you would have to resort to VBA when you would have liked to use a recursive (declarative) function.
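As a rough illustration of what such a recursive function might look like, here is a sketch of k-means written as "the clustering is the fixed point of the update step", so nobody has to pick a maximum number of iterations up front. It is one-dimensional for brevity, Python is used only as readable notation, and the names `assign`, `kmeans`, and the sample `prices` are invented for this example.

```python
from typing import List, Sequence, Tuple


def assign(points: Sequence[float],
           centroids: Tuple[float, ...]) -> List[List[float]]:
    """Group each point with its nearest centroid (ties go to the first)."""
    clusters: List[List[float]] = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points: Sequence[float],
           centroids: Tuple[float, ...]) -> Tuple[float, ...]:
    """Return centroids that no longer move under the update step."""
    clusters = assign(points, centroids)
    # New centroid = mean of its cluster; an empty cluster keeps its old one.
    updated = tuple(
        sum(c) / len(c) if c else old
        for c, old in zip(clusters, centroids)
    )
    if updated == centroids:
        return centroids  # fixed point reached: stop
    return kmeans(points, updated)  # otherwise recurse on the new centroids


prices = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(kmeans(prices, (1.0, 2.0)))  # (2.0, 11.0)
```

The recursion depth equals the number of update steps, so a real implementation in Python would probably use a loop. The point here is only that the definition itself needs no iteration table.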

Together these severely limit the potential for code reuse. This is more serious than it sounds. Without code reuse you have to implement analysis from scratch (Excel’s built-in functions aside) each time. You don’t generalise your algorithms because they’re only used for a particular purpose. You only have time to implement simple algorithms. You don’t invest time making your model as good as you would if it were going to be used 100, 1,000, or 1,000,000 times. You need to think about the implementation of all parts of your model because they might affect each other.

Civilization advances by extending the number of important operations which we can perform without thinking about them. — A. N. Whitehead

What to do?

In practice, one of two things tends to happen when you push Excel to its limits: you decide simple solutions are good enough, or you go get an imperative/object-oriented language like C++, Java, C#, Ruby, Python or R. The former is common and understandable; the latter has problems. Sure, you can do pretty much anything you want now, but you’ve given up what makes Excel good. Things often take longer to develop and you can’t get comfortable that things are correct (even though you understand the code).

Since my consulting days I’ve been dreaming up “a perfect Excel”, including explicitly connected encapsulated modules, raw-text representations, URIs, and Git versioning. I’ve considered Python or Ruby as the underlying language, but that’s absolutely wrong. A spreadsheet is declarative, not imperative, and a pure functional language is the way to go; Haskell is the strongest candidate (with things like higher-order functions, almost magical strong inferred typing, lazy evaluation, pattern matching (not the regexp kind), and monads). Not surprisingly, I’m not the only one who’s had that idea. It could overhaul analysis in academia, banking, consulting, and industry; that’s incredibly exciting stuff.