Too often we do great analysis, but it’s a flash in the pan, a snowflake of inspiration that is almost impossible to recapture. That’s why I’d like to offer a few basic thoughts on documenting analytical work, particularly if you’re using a tool like Microsoft Excel:
- Save a clean copy of the data. Note the source of the data and any parameters used to filter the data from that source.
- If this is an Excel workbook, create a new tab, move it to the front like a cover page, and record this key information there.
- If this is data meant to be machine-readable, add comments or other human-readable notations to capture this information.
- Save a working copy of the data. This is the file that will capture all your changes as you go. If you want, create a set of “checkpoints” by saving the file at various stages. If you can, use a version control system like Git to track your data. This works best when the data is in a text format like CSV or TSV; it’s less helpful with Excel files.
- Keep a list of key steps. Note any transformations of the data, including formulas, copying & pasting, or other key operations. It takes some time to figure out what to document, but you should capture enough detail to redo the analysis as if you don’t remember a single step (because you won’t, trust me).
- Redo the steps to make sure they’re complete. We all love arriving at an answer and love announcing it in a triumphant email right before we go home, but take a moment to go back over the steps and make sure they’re correct and that anyone trying to redo the analysis could reasonably follow what we’ve done.
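
If your data lives in CSV files rather than an Excel workbook, the habits above could be scripted along these lines. This is only a rough sketch under my own assumptions, not a prescribed workflow: the folder names (`clean`, `working`), the `steps.md` log, the `.provenance.json` note, and every function name here are hypothetical choices made for illustration.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, to spot accidental edits later."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def save_clean_copy(raw_file: Path, project_dir: Path, source: str, filters: dict) -> Path:
    """Copy the raw file into project_dir/clean and write a provenance note beside it."""
    clean_dir = project_dir / "clean"
    clean_dir.mkdir(parents=True, exist_ok=True)
    clean_file = clean_dir / raw_file.name
    shutil.copy2(raw_file, clean_file)

    # The "cover page": where the data came from and how it was filtered.
    note = {
        "source": source,
        "filters": filters,
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "sha256": file_sha256(clean_file),
    }
    note_file = clean_dir / (raw_file.name + ".provenance.json")
    note_file.write_text(json.dumps(note, indent=2), encoding="utf-8")
    return clean_file


def start_working_copy(clean_file: Path, project_dir: Path) -> Path:
    """Copy the clean file into project_dir/working; all changes happen to this copy."""
    working_dir = project_dir / "working"
    working_dir.mkdir(parents=True, exist_ok=True)
    working_file = working_dir / clean_file.name
    shutil.copy2(clean_file, working_file)
    return working_file


def log_step(project_dir: Path, description: str) -> None:
    """Append one numbered, timestamped line to project_dir/steps.md."""
    log_file = project_dir / "steps.md"
    existing = log_file.read_text(encoding="utf-8").splitlines() if log_file.exists() else []
    number = len(existing) + 1
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(f"{number}. [{stamp}] {description}\n")
```

In use, you would call `save_clean_copy` once when the data arrives, `start_working_copy` before touching anything, and `log_step` after each transformation. Comparing `file_sha256` of the clean copy against the value stored in the provenance note is a quick way to confirm the original was never modified.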
If this all seems like a burden, just imagine being asked to re-verify your numbers after someone important writes a news article or gives a speech that includes your findings, and then discovering you made a mistake. As I like to say in my classes, check twice and you’ll sleep easier.