Thoughts on Excel

Excel is powerful, but complexity challenged. Up to a point, Excel lets you do interesting things easily; mainly thanks to:

  • Being declarative. You specify what a cell is, not how or when to calculate it (VBA macros excepted). You don’t have to worry about control flow or side-effects (much).

  • Having a tight read-evaluate-print loop. You see immediately that changes do what you expect them to or spot the obvious error.

  • Being visually transparent. You can see the value of each cell, inspect its definition, and trace it from its dependants.

Those are truly valuable and the spreadsheet model deserves much of its popularity. Simple models are often an order of magnitude easier to write in Excel than an imperative or object-oriented programming language like Java or Python (not to mention C, C++, and the likes).

The challenges of complexity

But something happens as your models increase in complexity. You notice that you make more mistakes – often in your references or formula copying. You notice that if you want to change an assumption, you have to do so in lots of places (and at 4 a.m. in the morning, you usually miss one or two). Calculations slow down. You realise you’ve forgotten what those four numbers on the side of that table do, but removing them changes the output…

Using Excel effectively quickly becomes an exercise in discipline. Sure, you can sum the payments in this field in a cell to the left of the table because there’s more space there, but you don’t because calculations should always flow down and to the right. You can change the formula in that table mid-way down to reflect that you don’t have any reliable forecast data after 2020, but you don’t because you know you or the analyst after you will forget about it when more forecast data is added. You can just calculate the sum of \$A\$4:A\$14 but you don’t before you name that range category_revenues. You develop design patterns, layout conventions, add-ons, and macros to help you contain complexity. That buys you enough power to compete with other Excel users.

(What you’re doing is trying to keep the growth of possible interconnections linear rather than exponential by informally modularising parts of your spreadsheet.)

… and lack of recursion

In addition to being complexity challenged, Excel has significant difficulty handling certain thoughts – recursive algorithms in particular (a huge class). In Excel, you tend to create a table for each iteration of a recursive algorithm (say a yearly revenue forecast) and somehow decide what the maximum number of iterations are. If you wanted to simply k-means cluster products you would have to resort to VBA when you would have liked to use a recursive (declarative) function.

These severely limit the potential for code reuse. That is more serious than it sounds. Without code reuse you have to implement analysis from scratch (Excel’s built-in functions) each time. You don’t generalise your algorithms because they’re only used for a particular purpose. You only have time to implement simple algorithms. You don’t invest the time to make your model as good as you would if it were going to be used 100, 1,000, or 1,000,000 times. You need to think about the implementation of all parts of your model because they might affect each other.

<footer>
    Alfred North Whitehead
</footer>

What to do?

In practice, one of two things tend to happen when you push Excel to its limits: you decide simple solutions are good enough or you go get an imperative/object-oriented language like C++, Java, C#, Ruby, Python or R. The former is common and fair, the latter has problems. Sure, you can do pretty much anything you want now, but you’ve given up what makes Excel good. Things often take longer to develop and you can’t get comfortable that things are correct (even though you understand the code).

Since my consulting days I’ve been dreaming up “a perfect Excel”, including explicitly connected encapsulated modules, raw-text representations, URIs, and Git versioning. I’ve considered Python or Ruby as the underlying language but those would be poor choices. A spreadsheet is declarative, not imperative. A pure functional language is the way to go; Haskell is the strongest candidate, with things like higher-order functions, almost magical strong inferred typing, lazy evaluation, pattern matching (not the regexp kind), and monads. Not surprisingly, I’m not the only one who’s had that idea (see Lisper and Malmström). It could overhaul analysis in academia, banking, consulting, and industry; that’s incredibly exciting stuff.

References

Björn Lisper and Johan Malmström, "Haxcel: A Spreadsheet Interface to Haskell" 14th Int. Workshop on the Implementation of Functional Languages ()