At DCG, when we work with clients we often conclude that spreadsheets may be the most appropriate way to manage a simple metrics program. However, I was reminded this week (based on an incident at a client which caused me to look up the appropriate section of "Competing on Analytics" by Thomas H. Davenport and Jeanne G. Harris) that there are a couple of common problems with basing your metrics program on spreadsheet repositories:
- Errors - Raymond Panko wrote a much referenced article back in 1998 based on his research which suggested that 20-40% of user-created spreadsheets contain errors. The more spreadsheets, the more errors.
- Multiple versions of the truth - too often there is not a single, multi-user spreadsheet but multiple versions being spread like a virus across an organization by email, each with its own few unique data values.
- Unintended uses - the data in spreadsheets are very difficult to keep under control when key data elements are changed to meet new needs. The knock-on effect on linked data can be catastrophic but not necessarily visible.
Worst of all, its almost impossible to debug or do a data integrity check on even a mildly complex spreadsheet. The lesson - use spreadsheets as much and as often as you like but do not attempt to build a serious metrics or estimating application on them if you want it to have a useful life in your organization of more than one year!